# Feature: Compositional Workflow Architecture with Worktree Isolation, Test Resolution, and Review Resolution

## Feature Description

Transform the agent-work-orders system from a centralized orchestrator pattern to a compositional script-based architecture that enables parallel execution through git worktrees, automatic test failure resolution with retry logic, and a comprehensive review phase with blocker issue patching.

This architecture change enables running up to 15 work orders simultaneously (one per slot in each port range) in isolated worktrees with deterministic port allocation, while maintaining complete SDLC coverage from planning through testing and review.

The system will support:

- **Worktree-based isolation**: Each work order runs in its own git worktree under `trees//` instead of temporary clones
- **Port allocation**: Deterministic backend (9100-9114) and frontend (9200-9214) port assignment based on work order ID
- **Test phase with resolution**: Automatic retry loop (max 4 attempts) that resolves failed tests using AI-powered fixes
- **Review phase with resolution**: Captures screenshots, compares implementation vs spec, categorizes issues (blocker/tech_debt/skippable), and automatically patches blocker issues (max 3 attempts)
- **File-based state**: Simple JSON state management (`adw_state.json`) instead of in-memory repository
- **Compositional scripts**: Independent workflow scripts (plan, build, test, review, doc, ship) that can be run separately or together

## User Story

As a developer managing multiple concurrent features
I want to run multiple agent work orders in parallel with isolated environments
So that I can scale development velocity without conflicts or resource contention, while ensuring all code passes tests and review before deployment

## Problem Statement

The current agent-work-orders architecture has several critical limitations:

1. **No Parallelization**: GitBranchSandbox creates temporary clones that get cleaned up, preventing safe parallel execution of multiple work orders
2. **No Test Coverage**: Missing test workflow step - implementations are committed and PR'd without validation
3. **No Automated Test Resolution**: When tests fail, there's no retry/fix mechanism to automatically resolve failures
4. **No Review Phase**: No automated review of implementation against specifications with screenshot capture and blocker detection
5. **Centralized Orchestration**: Monolithic orchestrator makes it difficult to run individual phases (e.g., just test, just review) independently
6. **In-Memory State**: State management in WorkOrderRepository is not persistent across service restarts
7. **No Port Management**: No system for allocating unique ports for parallel instances

These limitations prevent scaling development workflows and ensuring code quality before PRs are created.

## Solution Statement

Implement a compositional workflow architecture inspired by the ADW (AI Developer Workflow) pattern with the following components.

See the examples in `PRPs/examples/*` and read them before starting.

1. **GitWorktreeSandbox**: Replace GitBranchSandbox with worktree-based isolation that shares the same repo but has independent working directories
2. **Port Allocation System**: Deterministic port assignment (backend: 9100-9114, frontend: 9200-9214) based on work order ID hash
3. **File-Based State Management**: JSON state files for persistence and debugging
4. **Test Workflow Module**: New `test_workflow.py` with automatic resolution and retry logic (4 attempts)
5. **Review Workflow Module**: New `review_workflow.py` with screenshot capture, spec comparison, and blocker patching (3 attempts)
6. **Compositional Scripts**: Independent workflow operations that can be composed or run individually
7. **Enhanced WorkflowStep Enum**: Add RESOLVE_TEST and RESOLVE_REVIEW steps alongside the existing TEST and REVIEW steps
8. **Resolution Commands**: New Claude commands `/resolve_failed_test` and `/resolve_failed_review` for AI-powered fixes

## Relevant Files

### Core Workflow Files

- `python/src/agent_work_orders/workflow_engine/workflow_orchestrator.py` - Main orchestrator that needs refactoring for compositional approach
  - Currently: Monolithic execute_workflow with sequential steps
  - Needs: Modular workflow composition with test/review phases
- `python/src/agent_work_orders/workflow_engine/workflow_operations.py` - Atomic workflow operations
  - Currently: classify_issue, build_plan, implement_plan, create_commit, create_pull_request
  - Needs: Add test_workflow, review_workflow, resolve_test, resolve_review operations
- `python/src/agent_work_orders/models.py` - Data models including WorkflowStep enum
  - Currently: WorkflowStep has CLASSIFY, PLAN, IMPLEMENT, COMMIT, REVIEW, TEST, CREATE_PR
  - Needs: Add RESOLVE_TEST, RESOLVE_REVIEW steps

### Sandbox Management Files

- `python/src/agent_work_orders/sandbox_manager/git_branch_sandbox.py` - Current temp clone implementation
  - Problem: Creates temp dirs, no parallelization support
  - Will be replaced by: GitWorktreeSandbox
- `python/src/agent_work_orders/sandbox_manager/sandbox_factory.py` - Factory for creating sandboxes
  - Needs: Add GitWorktreeSandbox creation logic
- `python/src/agent_work_orders/sandbox_manager/sandbox_protocol.py` - Sandbox interface
  - May need: Port allocation methods

### State Management Files

- `python/src/agent_work_orders/state_manager/work_order_repository.py` - Current in-memory state
  - Currently: In-memory dictionary with async methods
  - Needs: File-based JSON persistence option
- `python/src/agent_work_orders/config.py` - Configuration
  - Needs: Port range configuration, worktree base directory

### Command Files

- `python/.claude/commands/agent-work-orders/test.md` - Currently just a hello world test
  - Needs: Comprehensive test suite runner that returns JSON with failed tests
- `python/.claude/commands/agent-work-orders/implementor.md` - Implementation command
  - May need: Context about test requirements

### New Files

#### Worktree Management

- `python/src/agent_work_orders/sandbox_manager/git_worktree_sandbox.py` - New worktree-based sandbox
- `python/src/agent_work_orders/utils/worktree_operations.py` - Worktree CRUD operations
- `python/src/agent_work_orders/utils/port_allocation.py` - Port management utilities

#### Test Workflow

- `python/src/agent_work_orders/workflow_engine/test_workflow.py` - Test execution with resolution
- `python/.claude/commands/agent-work-orders/test_runner.md` - Run test suite, return JSON
- `python/.claude/commands/agent-work-orders/resolve_failed_test.md` - Fix failed test given JSON

#### Review Workflow

- `python/src/agent_work_orders/workflow_engine/review_workflow.py` - Review with screenshot capture
- `python/.claude/commands/agent-work-orders/review_runner.md` - Run review against spec
- `python/.claude/commands/agent-work-orders/resolve_failed_review.md` - Patch blocker issues
- `python/.claude/commands/agent-work-orders/create_patch_plan.md` - Generate patch plan for issue

#### State Management

- `python/src/agent_work_orders/state_manager/file_state_repository.py` - JSON file-based state
- `python/src/agent_work_orders/models/workflow_state.py` - State data models

#### Documentation

- `docs/compositional-workflows.md` - Architecture documentation
- `docs/worktree-management.md` - Worktree operations guide
- `docs/test-resolution.md` - Test workflow documentation
- `docs/review-resolution.md` - Review workflow documentation

## Implementation Plan

### Phase 1: Foundation - Worktree Isolation and Port Allocation

Establish the core infrastructure for parallel execution through git worktrees and deterministic port allocation. This phase creates the foundation for all subsequent phases.
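
The deterministic port allocation this phase relies on could look roughly like the sketch below. It is not the project's implementation: the SHA-256 slot mapping, the decision to give backend and frontend the same slot index, and the injectable `is_available` parameter are assumptions made for illustration. The sketch uses `hashlib` rather than Python's built-in `hash()`, because `hash()` on strings is salted per process and would not give the same ports after a restart.

```python
import hashlib
import socket
from typing import Callable

BACKEND_PORT_START = 9100   # backend range 9100-9114
FRONTEND_PORT_START = 9200  # frontend range 9200-9214
PORT_RANGE_SIZE = 15


def get_ports_for_work_order(work_order_id: str) -> tuple[int, int]:
    """Map a work order ID to a stable (backend, frontend) port pair."""
    digest = hashlib.sha256(work_order_id.encode("utf-8")).digest()
    slot = int.from_bytes(digest[:8], "big") % PORT_RANGE_SIZE
    return BACKEND_PORT_START + slot, FRONTEND_PORT_START + slot


def is_port_available(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if the port can be bound on the given host right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def find_next_available_ports(
    work_order_id: str,
    max_attempts: int = 15,
    is_available: Callable[[int], bool] = is_port_available,  # hypothetical hook for testing
) -> tuple[int, int]:
    """Start at the hashed slot and probe following slots, wrapping around the range."""
    backend, _ = get_ports_for_work_order(work_order_id)
    start_slot = backend - BACKEND_PORT_START
    for offset in range(max_attempts):
        slot = (start_slot + offset) % PORT_RANGE_SIZE
        candidate = (BACKEND_PORT_START + slot, FRONTEND_PORT_START + slot)
        if is_available(candidate[0]) and is_available(candidate[1]):
            return candidate
    raise RuntimeError(f"no free port pair for {work_order_id} after {max_attempts} attempts")
```

With only 15 slots, two work order IDs will often hash to the same slot, so the linear probe is the normal path rather than a rare fallback. A port can also be taken between the availability check and actual use, which is why the plan lists that race as an edge case.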
**Key Deliverables**:

- GitWorktreeSandbox implementation
- Port allocation system
- Worktree management utilities
- `.ports.env` file generation
- Updated sandbox factory

### Phase 2: File-Based State Management

Replace in-memory state repository with file-based JSON persistence for durability and debuggability across service restarts.

**Key Deliverables**:

- FileStateRepository implementation
- WorkflowState models
- State migration utilities
- JSON serialization/deserialization
- Backward compatibility layer

### Phase 3: Test Workflow with Resolution

Implement comprehensive test execution with automatic failure resolution and retry logic.

**Key Deliverables**:

- test_workflow.py module
- test_runner.md command (returns JSON array of test results)
- resolve_failed_test.md command (takes test JSON, fixes issue)
- Retry loop (max 4 attempts)
- Test result parsing and formatting
- Integration with orchestrator

### Phase 4: Review Workflow with Resolution

Add review phase with screenshot capture, spec comparison, and automatic blocker patching.

**Key Deliverables**:

- review_workflow.py module
- review_runner.md command (compares implementation vs spec)
- resolve_failed_review.md command (patches blocker issues)
- Screenshot capture integration
- Issue severity categorization (blocker/tech_debt/skippable)
- Retry loop (max 3 attempts)
- R2 upload integration (optional)

### Phase 5: Compositional Refactoring

Refactor the centralized orchestrator into composable workflow scripts that can be run independently.

**Key Deliverables**:

- Modular workflow composition
- Independent script execution
- Workflow step dependencies
- Enhanced error handling
- Workflow resumption support

## Step by Step Tasks

### Step 1: Create Worktree Sandbox Implementation

Create the core GitWorktreeSandbox class that manages git worktrees for isolated execution.
- Create `python/src/agent_work_orders/sandbox_manager/git_worktree_sandbox.py`
- Implement `GitWorktreeSandbox` class with:
  - `__init__(repository_url, sandbox_identifier)` - Initialize with worktree path calculation
  - `setup()` - Create worktree under `trees//` from origin/main
  - `cleanup()` - Remove worktree using `git worktree remove`
  - `execute_command(command, timeout)` - Execute commands in worktree context
  - `get_git_branch_name()` - Query current branch in worktree
- Handle existing worktree detection and validation
- Add logging for all worktree operations
- Write unit tests for GitWorktreeSandbox in `python/tests/agent_work_orders/sandbox_manager/test_git_worktree_sandbox.py`

### Step 2: Implement Port Allocation System

Create deterministic port allocation based on work order ID to enable parallel instances.

- Create `python/src/agent_work_orders/utils/port_allocation.py`
- Implement functions:
  - `get_ports_for_work_order(work_order_id) -> Tuple[int, int]` - Calculate ports from ID hash (backend: 9100-9114, frontend: 9200-9214)
  - `is_port_available(port: int) -> bool` - Check if port is bindable
  - `find_next_available_ports(work_order_id, max_attempts=15) -> Tuple[int, int]` - Find available ports with offset
  - `create_ports_env_file(worktree_path, backend_port, frontend_port)` - Generate `.ports.env` file
- Add port range configuration to `python/src/agent_work_orders/config.py`
- Write unit tests for port allocation in `python/tests/agent_work_orders/utils/test_port_allocation.py`

### Step 3: Create Worktree Management Utilities

Build helper utilities for worktree CRUD operations.
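
One of these utilities, the three-way validation (state, filesystem, git), might be sketched as follows. The signature here differs from the planned `validate_worktree(work_order_id, state)`: the extra `known_worktrees` argument and the `list_git_worktrees` helper are assumptions added so the check can be exercised without a real repository. The only git call used is `git worktree list --porcelain`, whose output contains one `worktree <path>` line per worktree.

```python
import os
import subprocess
from typing import Optional


def parse_worktree_list(porcelain: str) -> list[str]:
    """Extract worktree paths from `git worktree list --porcelain` output."""
    prefix = "worktree "
    return [line[len(prefix):] for line in porcelain.splitlines() if line.startswith(prefix)]


def list_git_worktrees(repo_root: str) -> list[str]:
    """Ask git which worktrees it currently tracks for the repository."""
    result = subprocess.run(
        ["git", "worktree", "list", "--porcelain"],
        cwd=repo_root,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_worktree_list(result.stdout)


def validate_worktree(
    work_order_id: str,
    state: dict,
    known_worktrees: list[str],  # hypothetical parameter: paths reported by git
) -> tuple[bool, Optional[str]]:
    """Check that state, filesystem, and git all agree the worktree exists."""
    recorded = state.get("worktree_path")
    if not recorded:
        return False, f"state for {work_order_id} has no worktree_path"
    if not os.path.isdir(recorded):
        return False, f"filesystem: {recorded} does not exist"
    # Compare resolved paths so symlinked directories still match.
    known = {os.path.realpath(path) for path in known_worktrees}
    if os.path.realpath(recorded) not in known:
        return False, f"git does not list {recorded} as a worktree"
    return True, None
```

A caller would typically pass `list_git_worktrees(repo_root)` as the third argument. Returning `(ok, error)` instead of raising follows the tuple-style return values used by the other worktree functions in this plan.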
- Create `python/src/agent_work_orders/utils/worktree_operations.py`
- Implement functions:
  - `create_worktree(work_order_id, branch_name, logger) -> Tuple[str, Optional[str]]` - Create worktree and return path or error
  - `validate_worktree(work_order_id, state) -> Tuple[bool, Optional[str]]` - Three-way validation (state, filesystem, git)
  - `get_worktree_path(work_order_id) -> str` - Calculate absolute worktree path
  - `remove_worktree(work_order_id, logger) -> Tuple[bool, Optional[str]]` - Clean up worktree
  - `setup_worktree_environment(worktree_path, backend_port, frontend_port, logger)` - Create .ports.env
- Handle git fetch operations before worktree creation
- Add comprehensive error handling and logging
- Write unit tests for worktree operations in `python/tests/agent_work_orders/utils/test_worktree_operations.py`

### Step 4: Update Sandbox Factory

Modify the sandbox factory to support creating GitWorktreeSandbox instances.

- Update `python/src/agent_work_orders/sandbox_manager/sandbox_factory.py`
- Add GIT_WORKTREE case to `create_sandbox()` method
- Integrate port allocation during sandbox creation
- Pass port configuration to GitWorktreeSandbox
- Update SandboxType enum in models.py to promote GIT_WORKTREE from placeholder
- Write integration tests for sandbox factory with worktrees

### Step 5: Implement File-Based State Repository

Create file-based state management for persistence and debugging.
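
A minimal sketch of such a repository is shown below, under several assumptions: it is synchronous (the existing repository uses async methods), the `<work_order_id>.json` file naming is hypothetical, and only a subset of the planned methods is included. The write-to-temp-then-`os.replace` pattern is one way to avoid leaving a half-written state file behind if the process dies mid-write.

```python
import json
import os
import tempfile
from typing import Optional


class FileStateRepository:
    """Stores one JSON file per work order in a state directory (illustrative only)."""

    def __init__(self, state_directory: str) -> None:
        self.state_directory = state_directory
        os.makedirs(state_directory, exist_ok=True)

    def _path(self, work_order_id: str) -> str:
        # File naming is an assumption made for this sketch.
        return os.path.join(self.state_directory, f"{work_order_id}.json")

    def save_state(self, work_order_id: str, state_data: dict) -> None:
        # Write to a temp file in the same directory, then atomically swap it in.
        fd, tmp_path = tempfile.mkstemp(dir=self.state_directory, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as handle:
                json.dump(state_data, handle, indent=2)
            os.replace(tmp_path, self._path(work_order_id))
        except BaseException:
            if os.path.exists(tmp_path):
                os.unlink(tmp_path)
            raise

    def load_state(self, work_order_id: str) -> Optional[dict]:
        try:
            with open(self._path(work_order_id), encoding="utf-8") as handle:
                return json.load(handle)
        except FileNotFoundError:
            return None

    def list_states(self) -> list[str]:
        names = os.listdir(self.state_directory)
        return sorted(name[: -len(".json")] for name in names if name.endswith(".json"))

    def update_status(self, work_order_id: str, status: str, **kwargs) -> None:
        # Read-modify-write; concurrent writers would still need locking.
        state = self.load_state(work_order_id) or {}
        state.update({"status": status, **kwargs})
        self.save_state(work_order_id, state)
```

The atomic replace protects readers from torn files, but it does not serialize concurrent read-modify-write cycles, so the concurrent-access edge case in the testing strategy still needs its own handling.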
- Create `python/src/agent_work_orders/state_manager/file_state_repository.py`
- Implement `FileStateRepository` class:
  - `__init__(state_directory: str)` - Initialize with state directory path
  - `save_state(work_order_id, state_data)` - Write JSON to `/.json`
  - `load_state(work_order_id) -> Optional[dict]` - Read JSON from file
  - `list_states() -> List[str]` - List all work order IDs with state files
  - `delete_state(work_order_id)` - Remove state file
  - `update_status(work_order_id, status, **kwargs)` - Update specific fields
  - `save_step_history(work_order_id, step_history)` - Persist step history
- Add state directory configuration to config.py
- Create state models in `python/src/agent_work_orders/models/workflow_state.py`
- Write unit tests for file state repository

### Step 6: Update WorkflowStep Enum

Add new workflow steps for test and review resolution.

- Update `python/src/agent_work_orders/models.py`
- Add to WorkflowStep enum:
  - `RESOLVE_TEST = "resolve_test"` - Test failure resolution step
  - `RESOLVE_REVIEW = "resolve_review"` - Review issue resolution step
- Update `StepHistory.get_current_step()` to include new steps in sequence:
  - Updated sequence: CLASSIFY → PLAN → FIND_PLAN → GENERATE_BRANCH → IMPLEMENT → COMMIT → TEST → RESOLVE_TEST (if needed) → REVIEW → RESOLVE_REVIEW (if needed) → CREATE_PR
- Write unit tests for updated step sequence logic

### Step 7: Create Test Runner Command

Build Claude command to execute test suite and return structured JSON results.
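
On the Python side, the structured results from this command could be consumed roughly as in the sketch below. The field names follow the JSON structure defined for this step; the `TestResult` dataclass layout, the omission of the `logger` argument, and the choice to raise `ValueError` on malformed output are assumptions for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class TestResult:
    test_name: str
    test_file: str
    passed: bool
    execution_command: str
    error: Optional[str] = None


def parse_test_results(output: str) -> tuple[list[TestResult], int, int]:
    """Parse the runner's JSON array into (results, passed_count, failed_count)."""
    try:
        raw = json.loads(output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"test runner did not return valid JSON: {exc}") from exc
    if not isinstance(raw, list):
        raise ValueError("expected a JSON array of test results")

    results = [
        TestResult(
            test_name=item["test_name"],
            test_file=item["test_file"],
            passed=bool(item["passed"]),
            execution_command=item["execution_command"],
            error=item.get("error"),
        )
        for item in raw
    ]
    # Failed tests first; False sorts before True and the sort is stable.
    results.sort(key=lambda result: result.passed)
    passed_count = sum(result.passed for result in results)
    return results, passed_count, len(results) - passed_count
```

Raising on invalid JSON keeps the parser strict and leaves it to the caller to decide whether a malformed response counts as a failed attempt or aborts the workflow.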
- Create `python/.claude/commands/agent-work-orders/test_runner.md` (listed under New Files above)
- Command should:
  - Execute backend tests: `cd python && uv run pytest tests/ -v --tb=short`
  - Execute frontend tests: `cd archon-ui-main && npm test`
  - Parse test results from output
  - Return JSON array with structure:

    ```json
    [
      {
        "test_name": "string",
        "test_file": "string",
        "passed": boolean,
        "error": "optional string",
        "execution_command": "string"
      }
    ]
    ```

- Include test purpose and reproduction command
- Sort failed tests first
- Handle timeout and command errors gracefully
- Test the command manually with sample repositories

### Step 8: Create Resolve Failed Test Command

Build Claude command to analyze and fix failed tests given test JSON.

- Create `python/.claude/commands/agent-work-orders/resolve_failed_test.md`
- Command takes single argument: test result JSON object
- Command should:
  - Parse test failure information
  - Analyze root cause of failure
  - Read relevant test file and code under test
  - Implement fix (code change or test update)
  - Re-run the specific failed test to verify fix
  - Report success/failure
- Include examples of common test failure patterns
- Add constraints (don't skip tests, maintain test coverage)
- Test the command with sample failed test JSONs

### Step 9: Implement Test Workflow Module

Create the test workflow module with automatic resolution and retry logic.
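
The retry logic for this module can be sketched independently of the executor and command loader. In the sketch below, `run_tests` and `resolve_failed_test` are hypothetical callables standing in for the agent invocations, results are plain dictionaries, and the loop is synchronous for brevity. An "attempt" is counted as one run of the test suite, which is one reading of "max 4 attempts".

```python
from typing import Callable


def run_tests_with_resolution(
    run_tests: Callable[[], list[dict]],          # hypothetical: runs the suite, returns results
    resolve_failed_test: Callable[[dict], None],  # hypothetical: asks the agent to fix one test
    max_attempts: int = 4,
) -> tuple[list[dict], int, int]:
    """Run tests, resolve failures, and re-run until green or attempts are exhausted."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    attempt = 0
    while True:
        attempt += 1
        results = run_tests()
        failed = [result for result in results if not result["passed"]]

        # Stop when everything passes or no attempts remain.
        if not failed or attempt >= max_attempts:
            return results, len(results) - len(failed), len(failed)

        # Otherwise try to fix each failure, then loop to re-run the suite.
        for test in failed:
            resolve_failed_test(test)
```

Because the full suite is re-run after every resolution pass, a fix that breaks a previously passing test shows up as a new failure on the next attempt and is handled by the same loop.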
- Create `python/src/agent_work_orders/workflow_engine/test_workflow.py`
- Implement functions:
  - `run_tests(executor, command_loader, work_order_id, working_dir) -> StepExecutionResult` - Execute test suite
  - `parse_test_results(output, logger) -> Tuple[List[TestResult], int, int]` - Parse JSON output
  - `resolve_failed_test(executor, command_loader, test_json, work_order_id, working_dir) -> StepExecutionResult` - Fix single test
  - `run_tests_with_resolution(executor, command_loader, work_order_id, working_dir, max_attempts=4) -> Tuple[List[TestResult], int, int]` - Main retry loop
- Implement retry logic:
  - Run tests, check for failures
  - If failures exist and attempts < max_attempts: resolve each failed test
  - Re-run tests after resolution
  - Stop if all tests pass or max attempts reached
- Add TestResult model to models.py
- Write comprehensive unit tests for test workflow

### Step 10: Add Test Workflow Operation

Create atomic operation for test execution in workflow_operations.py.

- Update `python/src/agent_work_orders/workflow_engine/workflow_operations.py`
- Add function:

  ```python
  async def execute_tests(
      executor: AgentCLIExecutor,
      command_loader: ClaudeCommandLoader,
      work_order_id: str,
      working_dir: str,
  ) -> StepExecutionResult:
      ...  # body described below
  ```

- Function should:
  - Call `run_tests_with_resolution()` from test_workflow.py
  - Return StepExecutionResult with test summary
  - Include pass/fail counts in output
  - Log detailed test results
- Add TESTER constant to agent_names.py
- Write unit tests for execute_tests operation

### Step 11: Integrate Test Phase in Orchestrator

Add test phase to workflow orchestrator between COMMIT and CREATE_PR steps.
- Update `python/src/agent_work_orders/workflow_engine/workflow_orchestrator.py`
- After commit step (line ~236), add:

  ```python
  # Step 7: Run tests with resolution
  test_result = await workflow_operations.execute_tests(
      self.agent_executor,
      self.command_loader,
      agent_work_order_id,
      sandbox.working_dir,
  )
  step_history.steps.append(test_result)
  await self.state_repository.save_step_history(agent_work_order_id, step_history)

  if not test_result.success:
      raise WorkflowExecutionError(f"Tests failed: {test_result.error_message}")

  bound_logger.info("step_completed", step="test")
  ```

- Update step numbering (PR creation becomes step 8)
- Add test failure handling strategy
- Write integration tests for full workflow with test phase

### Step 12: Create Review Runner Command

Build Claude command to review implementation against spec with screenshot capture.

- Create `python/.claude/commands/agent-work-orders/review_runner.md`
- Command takes arguments: spec_file_path, work_order_id
- Command should:
  - Read specification from spec_file_path
  - Analyze implementation in codebase
  - Start application (if UI component)
  - Capture screenshots of key UI flows
  - Compare implementation against spec requirements
  - Categorize issues by severity: "blocker" | "tech_debt" | "skippable"
  - Return JSON with structure:

    ```json
    {
      "review_passed": boolean,
      "review_issues": [
        {
          "issue_title": "string",
          "issue_description": "string",
          "issue_severity": "blocker|tech_debt|skippable",
          "affected_files": ["string"],
          "screenshots": ["string"]
        }
      ],
      "screenshots": ["string"]
    }
    ```

- Include review criteria and severity definitions
- Test command with sample specifications

### Step 13: Create Resolve Failed Review Command

Build Claude command to patch blocker issues from review.
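
Only blocker issues are meant to reach this command, so the caller needs to filter the review output first. The sketch below shows one way to do that using the review JSON fields defined in Step 12. The function names `select_blockers` and `review_passes` are hypothetical, and rejecting unknown severities with `ValueError` is a design assumption, not something the plan specifies.

```python
import json

KNOWN_SEVERITIES = {"blocker", "tech_debt", "skippable"}


def select_blockers(review_output: str) -> list[dict]:
    """Return only the issues whose severity is 'blocker'."""
    review = json.loads(review_output)
    issues = review.get("review_issues", [])
    for issue in issues:
        severity = issue.get("issue_severity")
        if severity not in KNOWN_SEVERITIES:
            raise ValueError(f"unknown issue_severity: {severity!r}")
    return [issue for issue in issues if issue["issue_severity"] == "blocker"]


def review_passes(review_output: str) -> bool:
    """A review passes when no blockers remain; tech_debt and skippable are allowed."""
    return not select_blockers(review_output)
```

Deriving the pass/fail decision from the issue list, rather than trusting the `review_passed` flag alone, guards against a response where the flag and the issues disagree.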
- Create `python/.claude/commands/agent-work-orders/resolve_failed_review.md`
- Command takes single argument: review issue JSON object
- Command should:
  - Parse review issue details
  - Create patch plan addressing the issue
  - Implement the patch (code changes)
  - Verify patch resolves the issue
  - Report success/failure
- Include constraints (only fix blocker issues, maintain functionality)
- Add examples of common review issue patterns
- Test command with sample review issues

### Step 14: Implement Review Workflow Module

Create the review workflow module with automatic blocker patching.

- Create `python/src/agent_work_orders/workflow_engine/review_workflow.py`
- Implement functions:
  - `run_review(executor, command_loader, spec_file, work_order_id, working_dir) -> ReviewResult` - Execute review
  - `parse_review_results(output, logger) -> ReviewResult` - Parse JSON output
  - `resolve_review_issue(executor, command_loader, issue_json, work_order_id, working_dir) -> StepExecutionResult` - Patch single issue
  - `run_review_with_resolution(executor, command_loader, spec_file, work_order_id, working_dir, max_attempts=3) -> ReviewResult` - Main retry loop
- Implement retry logic:
  - Run review, check for blocker issues
  - If blockers exist and attempts < max_attempts: resolve each blocker
  - Re-run review after patching
  - Stop if no blockers or max attempts reached
  - Allow tech_debt and skippable issues to pass
- Add ReviewResult and ReviewIssue models to models.py
- Write comprehensive unit tests for review workflow

### Step 15: Add Review Workflow Operation

Create atomic operation for review execution in workflow_operations.py.
- Update `python/src/agent_work_orders/workflow_engine/workflow_operations.py`
- Add function:

  ```python
  async def execute_review(
      executor: AgentCLIExecutor,
      command_loader: ClaudeCommandLoader,
      spec_file: str,
      work_order_id: str,
      working_dir: str,
  ) -> StepExecutionResult:
      ...  # body described below
  ```

- Function should:
  - Call `run_review_with_resolution()` from review_workflow.py
  - Return StepExecutionResult with review summary
  - Include blocker count in output
  - Log detailed review results
- Add REVIEWER constant to agent_names.py
- Write unit tests for execute_review operation

### Step 16: Integrate Review Phase in Orchestrator

Add review phase to workflow orchestrator between TEST and CREATE_PR steps.

- Update `python/src/agent_work_orders/workflow_engine/workflow_orchestrator.py`
- After test step, add:

  ```python
  # Step 8: Run review with resolution
  review_result = await workflow_operations.execute_review(
      self.agent_executor,
      self.command_loader,
      plan_file or "",
      agent_work_order_id,
      sandbox.working_dir,
  )
  step_history.steps.append(review_result)
  await self.state_repository.save_step_history(agent_work_order_id, step_history)

  if not review_result.success:
      raise WorkflowExecutionError(f"Review failed: {review_result.error_message}")

  bound_logger.info("step_completed", step="review")
  ```

- Update step numbering (PR creation becomes step 9)
- Add review failure handling strategy
- Write integration tests for full workflow with review phase

### Step 17: Refactor Orchestrator for Composition

Refactor workflow orchestrator to support modular composition.
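
As a rough illustration of modular composition, phases can be treated as an ordered list of named coroutines that the orchestrator runs in sequence, skipping any that configuration disables. Everything in this sketch is hypothetical: the `Phase` alias, the `PhaseError` exception, and the free function `execute_workflow` are stand-ins, not the orchestrator's actual API.

```python
import asyncio
from typing import Awaitable, Callable

# A phase is a name plus an async callable that receives shared workflow context.
Phase = tuple[str, Callable[[dict], Awaitable[None]]]


class PhaseError(RuntimeError):
    """Raised when a phase fails; records which phases had already completed."""

    def __init__(self, phase: str, completed: list[str]) -> None:
        super().__init__(f"phase '{phase}' failed after completing {completed}")
        self.phase = phase
        self.completed = completed


async def execute_workflow(
    phases: list[Phase],
    context: dict,
    skip: frozenset[str] = frozenset(),
) -> list[str]:
    """Run phases in order, skipping disabled ones; return the names that ran."""
    completed: list[str] = []
    for name, run_phase in phases:
        if name in skip:
            continue
        try:
            await run_phase(context)
        except Exception as exc:
            # Surface which phase failed so a later run can resume from it.
            raise PhaseError(name, completed) from exc
        completed.append(name)
    return completed
```

Keeping the list of completed phases on the error is one possible basis for workflow resumption: a later invocation could pass those names as `skip` and continue from the failed phase.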
- Update `python/src/agent_work_orders/workflow_engine/workflow_orchestrator.py`
- Extract workflow phases into separate methods:
  - `_execute_planning_phase()` - classify → plan → find_plan → generate_branch
  - `_execute_implementation_phase()` - implement → commit
  - `_execute_testing_phase()` - test → resolve_test (if needed)
  - `_execute_review_phase()` - review → resolve_review (if needed)
  - `_execute_deployment_phase()` - create_pr
- Update `execute_workflow()` to compose phases:

  ```python
  await self._execute_planning_phase(...)
  await self._execute_implementation_phase(...)
  await self._execute_testing_phase(...)
  await self._execute_review_phase(...)
  await self._execute_deployment_phase(...)
  ```

- Add phase-level error handling and recovery
- Support skipping phases via configuration
- Write unit tests for each phase method

### Step 18: Add Configuration for New Features

Add configuration options for worktrees, ports, and new workflow phases.

- Update `python/src/agent_work_orders/config.py`
- Add configuration:

  ```python
  # Worktree configuration
  WORKTREE_BASE_DIR: str = os.getenv("WORKTREE_BASE_DIR", "trees")

  # Port allocation
  BACKEND_PORT_RANGE_START: int = int(os.getenv("BACKEND_PORT_START", "9100"))
  BACKEND_PORT_RANGE_END: int = int(os.getenv("BACKEND_PORT_END", "9114"))
  FRONTEND_PORT_RANGE_START: int = int(os.getenv("FRONTEND_PORT_START", "9200"))
  FRONTEND_PORT_RANGE_END: int = int(os.getenv("FRONTEND_PORT_END", "9214"))

  # Test workflow
  MAX_TEST_RETRY_ATTEMPTS: int = int(os.getenv("MAX_TEST_RETRY_ATTEMPTS", "4"))
  ENABLE_TEST_PHASE: bool = os.getenv("ENABLE_TEST_PHASE", "true").lower() == "true"

  # Review workflow
  MAX_REVIEW_RETRY_ATTEMPTS: int = int(os.getenv("MAX_REVIEW_RETRY_ATTEMPTS", "3"))
  ENABLE_REVIEW_PHASE: bool = os.getenv("ENABLE_REVIEW_PHASE", "true").lower() == "true"
  ENABLE_SCREENSHOT_CAPTURE: bool = os.getenv("ENABLE_SCREENSHOT_CAPTURE", "true").lower() == "true"

  # State management
  STATE_STORAGE_TYPE: str = os.getenv("STATE_STORAGE_TYPE", "memory")  # "memory" or "file"
  FILE_STATE_DIRECTORY: str = os.getenv("FILE_STATE_DIRECTORY", "agent-work-orders-state")
  ```

- Update `.env.example` with new configuration options
- Document configuration in README

### Step 19: Create Documentation

Document the new compositional architecture and workflows.

- Create `docs/compositional-workflows.md`:
  - Architecture overview
  - Compositional design principles
  - Phase composition examples
  - Error handling and recovery
  - Configuration guide
- Create `docs/worktree-management.md`:
  - Worktree vs temporary clone comparison
  - Parallelization capabilities
  - Port allocation system
  - Cleanup and maintenance
- Create `docs/test-resolution.md`:
  - Test workflow overview
  - Retry logic explanation
  - Test resolution examples
  - Troubleshooting failed tests
- Create `docs/review-resolution.md`:
  - Review workflow overview
  - Screenshot capture setup
  - Issue severity definitions
  - Blocker patching process
  - R2 upload configuration

### Step 20: Run Validation Commands

Execute all validation commands to ensure the feature works correctly with zero regressions.
- Run backend tests: `cd python && uv run pytest tests/agent_work_orders/ -v`
- Run backend linting: `cd python && uv run ruff check src/agent_work_orders/`
- Run type checking: `cd python && uv run mypy src/agent_work_orders/`
- Test worktree creation manually:

  ```bash
  cd python
  python -c "
  from src.agent_work_orders.utils.worktree_operations import create_worktree
  from src.agent_work_orders.utils.structured_logger import get_logger
  logger = get_logger('test')
  path, err = create_worktree('test-wo-123', 'test-branch', logger)
  print(f'Path: {path}, Error: {err}')
  "
  ```

- Test port allocation:

  ```bash
  cd python
  python -c "
  from src.agent_work_orders.utils.port_allocation import get_ports_for_work_order
  backend, frontend = get_ports_for_work_order('test-wo-123')
  print(f'Backend: {backend}, Frontend: {frontend}')
  "
  ```

- Create test work order with new workflow:

  ```bash
  curl -X POST http://localhost:8181/agent-work-orders \
    -H "Content-Type: application/json" \
    -d '{
      "repository_url": "https://github.com/your-test-repo",
      "sandbox_type": "git_worktree",
      "workflow_type": "agent_workflow_plan",
      "user_request": "Add a new feature with tests"
    }'
  ```

- Verify worktree created under `trees//`
- Verify `.ports.env` created in worktree
- Monitor workflow execution through all phases
- Verify test phase runs and resolves failures
- Verify review phase runs and patches blockers
- Verify PR created successfully
- Clean up test worktrees: `git worktree prune`

## Testing Strategy

### Unit Tests

**Worktree Management**:

- Test worktree creation with valid repository
- Test worktree creation with invalid branch
- Test worktree validation (three-way check)
- Test worktree cleanup
- Test handling of existing worktrees

**Port Allocation**:

- Test deterministic port assignment from work order ID
- Test port availability checking
- Test finding next available ports with collision
- Test port range boundaries (9100-9114, 9200-9214)
- Test `.ports.env` file generation

**Test Workflow**:

- Test parsing valid test result JSON
- Test parsing malformed test result JSON
- Test retry loop with all tests passing
- Test retry loop with some tests failing then passing
- Test retry loop reaching max attempts
- Test individual test resolution

**Review Workflow**:

- Test parsing valid review result JSON
- Test parsing malformed review result JSON
- Test retry loop with no blocker issues
- Test retry loop with blockers then resolved
- Test retry loop reaching max attempts
- Test issue severity filtering

**State Management**:

- Test saving state to JSON file
- Test loading state from JSON file
- Test updating specific state fields
- Test handling missing state files
- Test concurrent state access

### Integration Tests

**End-to-End Workflow**:

- Test complete workflow with worktree sandbox: classify → plan → implement → commit → test → review → PR
- Test test phase with intentional test failure and resolution
- Test review phase with intentional blocker issue and patching
- Test parallel execution of multiple work orders with different ports
- Test workflow resumption after failure
- Test cleanup of worktrees after completion

**Sandbox Integration**:

- Test command execution in worktree context
- Test git operations in worktree
- Test branch creation in worktree
- Test worktree isolation (parallel instances don't interfere)

**State Persistence**:

- Test state survives service restart (file-based)
- Test state migration from memory to file
- Test state corruption recovery

### Edge Cases

**Worktree Edge Cases**:

- Worktree already exists (should reuse or fail gracefully)
- Git repository unreachable (should fail setup)
- Insufficient disk space for worktree (should fail with clear error)
- Worktree removal fails (should log error and continue)
- Maximum worktrees reached (15 concurrent) - should queue or fail

**Port Allocation Edge Cases**:

- All ports in range occupied (should fail with error)
- Port becomes occupied between allocation and use (should retry)
- Invalid port range in configuration (should fail validation)

**Test Workflow Edge Cases**:

- Test command times out (should mark as failed)
- Test command returns invalid JSON (should fail gracefully)
- All tests fail and none can be resolved (should fail after max attempts)
- Test resolution introduces new failures (should continue with retry loop)

**Review Workflow Edge Cases**:

- Review command crashes (should fail gracefully)
- Screenshot capture fails (should continue review without screenshots)
- Review finds only skippable issues (should pass)
- Blocker patch introduces new blocker (should continue with retry loop)
- Spec file not found (should fail with clear error)

**State Management Edge Cases**:

- State file corrupted (should fail with recovery suggestion)
- State directory not writable (should fail with permission error)
- Concurrent access to same state file (should handle with locking or fail safely)

## Acceptance Criteria

- [ ] GitWorktreeSandbox successfully creates and manages worktrees under `trees//`
- [ ] Port allocation deterministically assigns unique ports (backend: 9100-9114, frontend: 9200-9214) based on work order ID
- [ ] Multiple work orders (at least 3) can run in parallel without port or filesystem conflicts
- [ ] `.ports.env` file is created in each worktree with correct port configuration
- [ ] Test workflow successfully runs test suite and returns structured JSON results
- [ ] Test workflow automatically resolves failed tests up to 4 attempts
- [ ] Test workflow stops retrying when all tests pass
- [ ] Review workflow successfully reviews implementation against spec
- [ ] Review workflow captures screenshots (when enabled)
- [ ] Review workflow categorizes issues by severity (blocker/tech_debt/skippable)
- [ ] Review workflow automatically patches blocker issues up to 3 attempts
- [ ] Review workflow allows tech_debt and skippable issues to pass
- [ ] WorkflowStep enum includes TEST, RESOLVE_TEST, REVIEW, RESOLVE_REVIEW steps
- [ ] Workflow orchestrator executes all phases: planning → implementation → testing → review → deployment
- [ ] File-based state repository persists state to JSON files
- [ ] State survives service restarts when using file-based storage
- [ ] Configuration supports enabling/disabling test and review phases
- [ ] All existing tests pass with zero regressions
- [ ] New unit tests achieve >80% code coverage for new modules
- [ ] Integration tests verify end-to-end workflow with parallel execution
- [ ] Documentation covers compositional architecture, worktrees, test resolution, and review resolution
- [ ] Cleanup of worktrees works correctly (git worktree remove + prune)
- [ ] Error messages are clear and actionable for all failure scenarios

## Validation Commands

Execute every command to validate the feature works correctly with zero regressions.

### Backend Tests

- `cd python && uv run pytest tests/agent_work_orders/ -v --tb=short` - Run all agent work orders tests
- `cd python && uv run pytest tests/agent_work_orders/sandbox_manager/ -v` - Test sandbox management
- `cd python && uv run pytest tests/agent_work_orders/workflow_engine/ -v` - Test workflow engine
- `cd python && uv run pytest tests/agent_work_orders/utils/ -v` - Test utilities

### Code Quality

- `cd python && uv run ruff check src/agent_work_orders/` - Check code quality
- `cd python && uv run mypy src/agent_work_orders/` - Type checking

### Manual Worktree Testing

```bash
# Test worktree creation
cd python
python -c "
from src.agent_work_orders.utils.worktree_operations import create_worktree, validate_worktree, remove_worktree
from src.agent_work_orders.utils.structured_logger import get_logger
logger = get_logger('test')

# Create worktree
path, err = create_worktree('test-wo-123', 'test-branch', logger)
print(f'Created worktree at: {path}')
assert err is None, f'Error: {err}'

# Validate worktree
from src.agent_work_orders.state_manager.file_state_repository import FileStateRepository
state_repo = FileStateRepository('test-state')
state_data = {'worktree_path': path}
valid, err = validate_worktree('test-wo-123', state_data)
assert valid, f'Validation failed: {err}'

# Remove worktree
success, err = remove_worktree('test-wo-123', logger)
assert success, f'Removal failed: {err}'
print('Worktree lifecycle test passed!')
"
```

### Manual Port Allocation Testing

```bash
cd python
python -c "
from src.agent_work_orders.utils.port_allocation import get_ports_for_work_order, find_next_available_ports, is_port_available

backend, frontend = get_ports_for_work_order('test-wo-123')
print(f'Ports for test-wo-123: Backend={backend}, Frontend={frontend}')
assert 9100 <= backend <= 9114, f'Backend port out of range: {backend}'
assert 9200 <= frontend <= 9214, f'Frontend port out of range: {frontend}'

# Test availability check
available = is_port_available(backend)
print(f'Backend port {backend} available: {available}')

# Test finding next available
next_backend, next_frontend = find_next_available_ports('test-wo-456')
print(f'Next available ports: Backend={next_backend}, Frontend={next_frontend}')
print('Port allocation test passed!')
"
```

### Integration Testing

```bash
# Start agent work orders service
docker compose up -d archon-server

# Create work order with worktree sandbox
curl -X POST http://localhost:8181/agent-work-orders \
  -H "Content-Type: application/json" \
  -d '{
    "repository_url": "https://github.com/coleam00/archon",
    "sandbox_type": "git_worktree",
    "workflow_type": "agent_workflow_plan",
    "user_request": "Fix issue #123"
  }'

# Verify worktree created
ls -la trees/

# Monitor workflow progress
watch -n 2 'curl -s http://localhost:8181/agent-work-orders | jq'

# Verify .ports.env in worktree
cat trees//.ports.env

# After completion, verify cleanup
git worktree list
```

### Parallel Execution Testing

```bash
# Create 3 work orders simultaneously
for i in 1 2 3; do
  curl -X POST http://localhost:8181/agent-work-orders \
    -H "Content-Type: application/json" \
    -d "{
      \"repository_url\": \"https://github.com/coleam00/archon\",
      \"sandbox_type\": \"git_worktree\",
      \"workflow_type\": \"agent_workflow_plan\",
      \"user_request\": \"Parallel test $i\"
    }" &
done

wait

# Verify all worktrees exist
ls -la trees/

# Verify different ports allocated
for dir in trees/*/; do
  echo "Worktree: $dir"
  cat "$dir/.ports.env"
  echo "---"
done
```

## Notes

### Architecture Decision: Compositional vs Centralized

This feature implements Option B (compositional refactoring) because:

1. **Scalability**: Compositional design enables running individual phases (e.g., just test or just review) without full workflow
2. **Debugging**: Independent scripts are easier to test and debug in isolation
3. **Flexibility**: Users can compose custom workflows (e.g., skip review for simple PRs)
4. **Maintainability**: Smaller, focused modules are easier to maintain than monolithic orchestrator
5. **Parallelization**: Worktree-based approach inherently supports compositional execution

### Performance Considerations

- **Worktree Creation**: Worktrees are faster than clones (~2-3x) because they share the same .git directory
- **Port Allocation**: Hash-based allocation is deterministic but may have collisions; fallback to linear search adds minimal overhead
- **Retry Loops**: Test (4 attempts) and review (3 attempts) retry limits prevent infinite loops while allowing reasonable resolution attempts
- **State I/O**: File-based state adds disk I/O but enables persistence; consider eventual move to database for high-volume deployments

### Future Enhancements

1. **Database State**: Replace file-based state with PostgreSQL/Supabase for better concurrent access and querying
2. **WebSocket Updates**: Stream test/review progress to UI in real-time
3. **Screenshot Upload**: Integrate R2/S3 for screenshot storage and PR comments with images
4. **Workflow Resumption**: Support resuming failed workflows from last successful step
5.
**Custom Workflows**: Allow users to define custom workflow compositions via config 6. **Metrics**: Add OpenTelemetry instrumentation for workflow performance monitoring 7. **E2E Testing**: Add Playwright/Cypress integration for UI-focused review 8. **Distributed Execution**: Support running work orders across multiple machines ### Migration Path For existing deployments: 1. **Backward Compatibility**: Keep GitBranchSandbox working alongside GitWorktreeSandbox 2. **Gradual Migration**: Default to GIT_BRANCH, opt-in to GIT_WORKTREE via configuration 3. **State Migration**: Provide utility to migrate in-memory state to file-based state 4. **Cleanup**: Add command to clean up old temporary clones: `rm -rf /tmp/agent-work-orders/*` ### Dependencies New dependencies to add via `uv add`: - (None required - uses existing git, pytest, claude CLI) ### Related Issues/PRs - #XXX - Original agent-work-orders MVP implementation - #XXX - Worktree isolation discussion - #XXX - Test phase feature request - #XXX - Review automation proposal