archon/PRPs/specs/compositional-workflow-architecture.md
Last commit 1c0020946b by Rasmus Widing (2025-10-16): feat: Implement phases 3-5 of compositional workflow architecture
Completes the implementation of test/review workflows with automatic resolution
and integrates them into the orchestrator.

**Phase 3: Test Workflow with Resolution**
- Created test_workflow.py with automatic test failure resolution
- Implements retry loop with max 4 attempts (configurable via MAX_TEST_RETRY_ATTEMPTS)
- Parses JSON test results and resolves failures one by one
- Uses existing test.md and resolve_failed_test.md commands
- Added run_tests() and resolve_test_failure() to workflow_operations.py

**Phase 4: Review Workflow with Resolution**
- Created review_workflow.py with automatic blocker issue resolution
- Implements retry loop with max 3 attempts (configurable via MAX_REVIEW_RETRY_ATTEMPTS)
- Categorizes issues by severity (blocker/tech_debt/skippable)
- Only blocks on blocker issues - tech_debt and skippable allowed to pass
- Created review_runner.md and resolve_failed_review.md commands
- Added run_review() and resolve_review_issue() to workflow_operations.py
- Supports screenshot capture for UI review (configurable via ENABLE_SCREENSHOT_CAPTURE)

**Phase 5: Compositional Integration**
- Updated workflow_orchestrator.py to integrate test and review phases
- Test phase runs between commit and PR creation (if ENABLE_TEST_PHASE=true)
- Review phase runs after tests (if ENABLE_REVIEW_PHASE=true)
- Both phases are optional and controlled by config flags
- Step history tracks test and review execution results
- Proper error handling and logging for all phases

**Supporting Changes**
- Updated agent_names.py to add REVIEWER constant
- Added configuration flags to config.py for test/review phases
- All new code follows structured logging patterns
- Maintains compatibility with existing workflow steps

**Files Changed**: 19 files, 3035+ lines
- New: test_workflow.py, review_workflow.py, review commands
- Modified: orchestrator, workflow_operations, agent_names, config
- Phases 1-2 files (worktree, state, port allocation) also staged

The implementation is complete and ready for testing. All phases now support
parallel execution via worktree isolation with deterministic port allocation.


Feature: Compositional Workflow Architecture with Worktree Isolation, Test Resolution, and Review Resolution

Feature Description

Transform the agent-work-orders system from a centralized orchestrator pattern to a compositional script-based architecture that enables parallel execution through git worktrees, automatic test failure resolution with retry logic, and a comprehensive review phase with blocker issue patching. This architecture change enables running 15+ work orders simultaneously in isolated worktrees with deterministic port allocation, while maintaining complete SDLC coverage from planning through testing and review.

The system will support:

  • Worktree-based isolation: Each work order runs in its own git worktree under trees/<work_order_id>/ instead of temporary clones
  • Port allocation: Deterministic backend (9100-9114) and frontend (9200-9214) port assignment based on work order ID
  • Test phase with resolution: Automatic retry loop (max 4 attempts) that resolves failed tests using AI-powered fixes
  • Review phase with resolution: Captures screenshots, compares implementation vs spec, categorizes issues (blocker/tech_debt/skippable), and automatically patches blocker issues (max 3 attempts)
  • File-based state: Simple JSON state management (adw_state.json) instead of in-memory repository
  • Compositional scripts: Independent workflow scripts (plan, build, test, review, doc, ship) that can be run separately or together

User Story

As a developer managing multiple concurrent features, I want to run multiple agent work orders in parallel with isolated environments, so that I can scale development velocity without conflicts or resource contention while ensuring all code passes tests and review before deployment.

Problem Statement

The current agent-work-orders architecture has several critical limitations:

  1. No Parallelization: GitBranchSandbox creates temporary clones that get cleaned up, preventing safe parallel execution of multiple work orders
  2. No Test Coverage: Missing test workflow step - implementations are committed and PR'd without validation
  3. No Automated Test Resolution: When tests fail, there's no retry/fix mechanism to automatically resolve failures
  4. No Review Phase: No automated review of implementation against specifications with screenshot capture and blocker detection
  5. Centralized Orchestration: Monolithic orchestrator makes it difficult to run individual phases (e.g., just test, just review) independently
  6. In-Memory State: State management in WorkOrderRepository is not persistent across service restarts
  7. No Port Management: No system for allocating unique ports for parallel instances

These limitations prevent scaling development workflows and ensuring code quality before PRs are created.

Solution Statement

Implement a compositional workflow architecture inspired by the ADW (AI Developer Workflow) pattern, with the following components (read the reference examples under PRPs/examples/ before implementing):

  1. GitWorktreeSandbox: Replace GitBranchSandbox with worktree-based isolation that shares the same repo but has independent working directories
  2. Port Allocation System: Deterministic port assignment (backend: 9100-9114, frontend: 9200-9214) based on work order ID hash
  3. File-Based State Management: JSON state files for persistence and debugging
  4. Test Workflow Module: New test_workflow.py with automatic resolution and retry logic (4 attempts)
  5. Review Workflow Module: New review_workflow.py with screenshot capture, spec comparison, and blocker patching (3 attempts)
  6. Compositional Scripts: Independent workflow operations that can be composed or run individually
  7. Enhanced WorkflowStep Enum: Add TEST, RESOLVE_TEST, REVIEW, RESOLVE_REVIEW steps
  8. Resolution Commands: New Claude commands /resolve_failed_test and /resolve_failed_review for AI-powered fixes

Relevant Files

Core Workflow Files

  • python/src/agent_work_orders/workflow_engine/workflow_orchestrator.py - Main orchestrator that needs refactoring for compositional approach

    • Currently: Monolithic execute_workflow with sequential steps
    • Needs: Modular workflow composition with test/review phases
  • python/src/agent_work_orders/workflow_engine/workflow_operations.py - Atomic workflow operations

    • Currently: classify_issue, build_plan, implement_plan, create_commit, create_pull_request
    • Needs: Add test_workflow, review_workflow, resolve_test, resolve_review operations
  • python/src/agent_work_orders/models.py - Data models including WorkflowStep enum

    • Currently: WorkflowStep has CLASSIFY, PLAN, IMPLEMENT, COMMIT, REVIEW, TEST, CREATE_PR
    • Needs: Add RESOLVE_TEST, RESOLVE_REVIEW steps

Sandbox Management Files

  • python/src/agent_work_orders/sandbox_manager/git_branch_sandbox.py - Current temp clone implementation

    • Problem: Creates temp dirs, no parallelization support
    • Will be replaced by: GitWorktreeSandbox
  • python/src/agent_work_orders/sandbox_manager/sandbox_factory.py - Factory for creating sandboxes

    • Needs: Add GitWorktreeSandbox creation logic
  • python/src/agent_work_orders/sandbox_manager/sandbox_protocol.py - Sandbox interface

    • May need: Port allocation methods

State Management Files

  • python/src/agent_work_orders/state_manager/work_order_repository.py - Current in-memory state

    • Currently: In-memory dictionary with async methods
    • Needs: File-based JSON persistence option
  • python/src/agent_work_orders/config.py - Configuration

    • Needs: Port range configuration, worktree base directory

Command Files

  • python/.claude/commands/agent-work-orders/test.md - Currently just a hello world test

    • Needs: Comprehensive test suite runner that returns JSON with failed tests
  • python/.claude/commands/agent-work-orders/implementor.md - Implementation command

    • May need: Context about test requirements

New Files

Worktree Management

  • python/src/agent_work_orders/sandbox_manager/git_worktree_sandbox.py - New worktree-based sandbox
  • python/src/agent_work_orders/utils/worktree_operations.py - Worktree CRUD operations
  • python/src/agent_work_orders/utils/port_allocation.py - Port management utilities

Test Workflow

  • python/src/agent_work_orders/workflow_engine/test_workflow.py - Test execution with resolution
  • python/.claude/commands/agent-work-orders/test_runner.md - Run test suite, return JSON
  • python/.claude/commands/agent-work-orders/resolve_failed_test.md - Fix failed test given JSON

Review Workflow

  • python/src/agent_work_orders/workflow_engine/review_workflow.py - Review with screenshot capture
  • python/.claude/commands/agent-work-orders/review_runner.md - Run review against spec
  • python/.claude/commands/agent-work-orders/resolve_failed_review.md - Patch blocker issues
  • python/.claude/commands/agent-work-orders/create_patch_plan.md - Generate patch plan for issue

State Management

  • python/src/agent_work_orders/state_manager/file_state_repository.py - JSON file-based state
  • python/src/agent_work_orders/models/workflow_state.py - State data models

Documentation

  • docs/compositional-workflows.md - Architecture documentation
  • docs/worktree-management.md - Worktree operations guide
  • docs/test-resolution.md - Test workflow documentation
  • docs/review-resolution.md - Review workflow documentation

Implementation Plan

Phase 1: Foundation - Worktree Isolation and Port Allocation

Establish the core infrastructure for parallel execution through git worktrees and deterministic port allocation. This phase creates the foundation for all subsequent phases.

Key Deliverables:

  • GitWorktreeSandbox implementation
  • Port allocation system
  • Worktree management utilities
  • .ports.env file generation
  • Updated sandbox factory

Phase 2: File-Based State Management

Replace in-memory state repository with file-based JSON persistence for durability and debuggability across service restarts.

Key Deliverables:

  • FileStateRepository implementation
  • WorkflowState models
  • State migration utilities
  • JSON serialization/deserialization
  • Backward compatibility layer

Phase 3: Test Workflow with Resolution

Implement comprehensive test execution with automatic failure resolution and retry logic.

Key Deliverables:

  • test_workflow.py module
  • test_runner.md command (returns JSON array of test results)
  • resolve_failed_test.md command (takes test JSON, fixes issue)
  • Retry loop (max 4 attempts)
  • Test result parsing and formatting
  • Integration with orchestrator

Phase 4: Review Workflow with Resolution

Add review phase with screenshot capture, spec comparison, and automatic blocker patching.

Key Deliverables:

  • review_workflow.py module
  • review_runner.md command (compares implementation vs spec)
  • resolve_failed_review.md command (patches blocker issues)
  • Screenshot capture integration
  • Issue severity categorization (blocker/tech_debt/skippable)
  • Retry loop (max 3 attempts)
  • R2 upload integration (optional)

Phase 5: Compositional Refactoring

Refactor the centralized orchestrator into composable workflow scripts that can be run independently.

Key Deliverables:

  • Modular workflow composition
  • Independent script execution
  • Workflow step dependencies
  • Enhanced error handling
  • Workflow resumption support

Step by Step Tasks

Step 1: Create Worktree Sandbox Implementation

Create the core GitWorktreeSandbox class that manages git worktrees for isolated execution; a minimal sketch follows the checklist below.

  • Create python/src/agent_work_orders/sandbox_manager/git_worktree_sandbox.py
  • Implement GitWorktreeSandbox class with:
    • __init__(repository_url, sandbox_identifier) - Initialize with worktree path calculation
    • setup() - Create worktree under trees/<sandbox_identifier>/ from origin/main
    • cleanup() - Remove worktree using git worktree remove
    • execute_command(command, timeout) - Execute commands in worktree context
    • get_git_branch_name() - Query current branch in worktree
  • Handle existing worktree detection and validation
  • Add logging for all worktree operations
  • Write unit tests for GitWorktreeSandbox in python/tests/agent_work_orders/sandbox_manager/test_git_worktree_sandbox.py
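
A minimal sketch of the class, assuming an async interface and the trees/ base directory from Step 18; the exact sandbox protocol signatures, logging hooks, and error types are assumptions here, not the project's existing API:

    import asyncio
    import shutil
    from pathlib import Path

    class GitWorktreeSandbox:
        """Isolated working directory backed by a git worktree under trees/<sandbox_identifier>/."""

        def __init__(self, repository_url: str, sandbox_identifier: str, base_dir: str = "trees") -> None:
            self.repository_url = repository_url
            self.sandbox_identifier = sandbox_identifier
            self.working_dir = str(Path(base_dir) / sandbox_identifier)

        async def _run(self, *args: str, cwd: str | None = None, timeout: float = 300.0) -> tuple[int, str, str]:
            # Helper used for both git plumbing and workflow commands.
            proc = await asyncio.create_subprocess_exec(
                *args, cwd=cwd,
                stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE,
            )
            stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=timeout)
            return proc.returncode or 0, stdout.decode(), stderr.decode()

        async def setup(self) -> None:
            # Reuse an existing worktree directory; otherwise fetch and create one from origin/main.
            if Path(self.working_dir).is_dir():
                return
            await self._run("git", "fetch", "origin")
            code, _, err = await self._run("git", "worktree", "add", self.working_dir, "origin/main")
            if code != 0:
                raise RuntimeError(f"git worktree add failed: {err.strip()}")

        async def execute_command(self, command: str, timeout: float = 300.0) -> tuple[int, str, str]:
            # Run an arbitrary shell command inside the worktree.
            return await self._run("bash", "-lc", command, cwd=self.working_dir, timeout=timeout)

        async def get_git_branch_name(self) -> str:
            _, out, _ = await self._run("git", "rev-parse", "--abbrev-ref", "HEAD", cwd=self.working_dir)
            return out.strip()

        async def cleanup(self) -> None:
            code, _, _ = await self._run("git", "worktree", "remove", "--force", self.working_dir)
            if code != 0 and Path(self.working_dir).exists():
                shutil.rmtree(self.working_dir, ignore_errors=True)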

Step 2: Implement Port Allocation System

Create deterministic port allocation based on work order ID to enable parallel instances; a sketch of the allocation logic follows the checklist below.

  • Create python/src/agent_work_orders/utils/port_allocation.py
  • Implement functions:
    • get_ports_for_work_order(work_order_id) -> Tuple[int, int] - Calculate ports from ID hash (backend: 9100-9114, frontend: 9200-9214)
    • is_port_available(port: int) -> bool - Check if port is bindable
    • find_next_available_ports(work_order_id, max_attempts=15) -> Tuple[int, int] - Find available ports with offset
    • create_ports_env_file(worktree_path, backend_port, frontend_port) - Generate .ports.env file
  • Add port range configuration to python/src/agent_work_orders/config.py
  • Write unit tests for port allocation in python/tests/agent_work_orders/utils/test_port_allocation.py
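
One way the deterministic mapping could work, assuming a SHA-256 hash of the work order ID modulo the 15-slot range; the exact hash and the collision-probing strategy are implementation details, not mandated by this spec:

    import hashlib
    import socket

    BACKEND_PORT_RANGE = (9100, 9114)
    FRONTEND_PORT_RANGE = (9200, 9214)

    def get_ports_for_work_order(work_order_id: str) -> tuple[int, int]:
        # Deterministic: the same work order ID always hashes to the same offset.
        span = BACKEND_PORT_RANGE[1] - BACKEND_PORT_RANGE[0] + 1  # 15 slots
        offset = int(hashlib.sha256(work_order_id.encode()).hexdigest(), 16) % span
        return BACKEND_PORT_RANGE[0] + offset, FRONTEND_PORT_RANGE[0] + offset

    def is_port_available(port: int) -> bool:
        # A port is considered available if we can bind to it on localhost.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind(("127.0.0.1", port))
                return True
            except OSError:
                return False

    def find_next_available_ports(work_order_id: str, max_attempts: int = 15) -> tuple[int, int]:
        # Fall back to a linear probe over the range if the hashed slot is taken.
        backend, frontend = get_ports_for_work_order(work_order_id)
        span = BACKEND_PORT_RANGE[1] - BACKEND_PORT_RANGE[0] + 1
        for attempt in range(max_attempts):
            b = BACKEND_PORT_RANGE[0] + (backend - BACKEND_PORT_RANGE[0] + attempt) % span
            f = FRONTEND_PORT_RANGE[0] + (frontend - FRONTEND_PORT_RANGE[0] + attempt) % span
            if is_port_available(b) and is_port_available(f):
                return b, f
        raise RuntimeError("No free port pair in the configured ranges")

    def create_ports_env_file(worktree_path: str, backend_port: int, frontend_port: int) -> None:
        with open(f"{worktree_path}/.ports.env", "w") as fh:
            fh.write(f"BACKEND_PORT={backend_port}\nFRONTEND_PORT={frontend_port}\n")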

Step 3: Create Worktree Management Utilities

Build helper utilities for worktree CRUD operations; a sketch of the three-way validation follows the checklist below.

  • Create python/src/agent_work_orders/utils/worktree_operations.py
  • Implement functions:
    • create_worktree(work_order_id, branch_name, logger) -> Tuple[str, Optional[str]] - Create worktree and return path or error
    • validate_worktree(work_order_id, state) -> Tuple[bool, Optional[str]] - Three-way validation (state, filesystem, git)
    • get_worktree_path(work_order_id) -> str - Calculate absolute worktree path
    • remove_worktree(work_order_id, logger) -> Tuple[bool, Optional[str]] - Clean up worktree
    • setup_worktree_environment(worktree_path, backend_port, frontend_port, logger) - Create .ports.env
  • Handle git fetch operations before worktree creation
  • Add comprehensive error handling and logging
  • Write unit tests for worktree operations in python/tests/agent_work_orders/utils/test_worktree_operations.py
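
For example, the three-way validation could check that the persisted state, the filesystem, and git's own worktree list all agree; the WORKTREE_BASE_DIR default and the shape of the state dict are assumptions:

    import subprocess
    from pathlib import Path

    WORKTREE_BASE_DIR = "trees"  # assumed default, overridable via config

    def get_worktree_path(work_order_id: str) -> str:
        return str(Path(WORKTREE_BASE_DIR).resolve() / work_order_id)

    def validate_worktree(work_order_id: str, state: dict) -> tuple[bool, str | None]:
        """Three-way check: state record, filesystem, and git must all agree."""
        expected = state.get("worktree_path")
        if not expected:
            return False, "state has no worktree_path"
        if not Path(expected).is_dir():
            return False, f"worktree directory missing: {expected}"
        # Ask git (run from the main repository root) which worktrees it knows about.
        result = subprocess.run(
            ["git", "worktree", "list", "--porcelain"], capture_output=True, text=True
        )
        if str(Path(expected).resolve()) not in result.stdout:
            return False, "git does not list this path as a worktree"
        return True, None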

Step 4: Update Sandbox Factory

Modify the sandbox factory to support creating GitWorktreeSandbox instances.

  • Update python/src/agent_work_orders/sandbox_manager/sandbox_factory.py
  • Add GIT_WORKTREE case to create_sandbox() method
  • Integrate port allocation during sandbox creation
  • Pass port configuration to GitWorktreeSandbox
  • Update SandboxType enum in models.py to promote GIT_WORKTREE from placeholder
  • Write integration tests for sandbox factory with worktrees

Step 5: Implement File-Based State Repository

Create file-based state management for persistence and debugging; a sketch of the repository follows the checklist below.

  • Create python/src/agent_work_orders/state_manager/file_state_repository.py
  • Implement FileStateRepository class:
    • __init__(state_directory: str) - Initialize with state directory path
    • save_state(work_order_id, state_data) - Write JSON to <state_dir>/<work_order_id>.json
    • load_state(work_order_id) -> Optional[dict] - Read JSON from file
    • list_states() -> List[str] - List all work order IDs with state files
    • delete_state(work_order_id) - Remove state file
    • update_status(work_order_id, status, **kwargs) - Update specific fields
    • save_step_history(work_order_id, step_history) - Persist step history
  • Add state directory configuration to config.py
  • Create state models in python/src/agent_work_orders/models/workflow_state.py
  • Write unit tests for file state repository
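
A synchronous sketch of the repository (the real implementation would mirror the async interface of the existing WorkOrderRepository); the atomic temp-file rename is an assumption about how crash-time corruption could be avoided:

    import json
    from pathlib import Path
    from typing import Any

    class FileStateRepository:
        """Persists one JSON document per work order under a state directory."""

        def __init__(self, state_directory: str) -> None:
            self._dir = Path(state_directory)
            self._dir.mkdir(parents=True, exist_ok=True)

        def _path(self, work_order_id: str) -> Path:
            return self._dir / f"{work_order_id}.json"

        def save_state(self, work_order_id: str, state_data: dict[str, Any]) -> None:
            # Write to a temp file and rename so a crash mid-write cannot corrupt the state file.
            tmp = self._path(work_order_id).with_suffix(".json.tmp")
            tmp.write_text(json.dumps(state_data, indent=2, default=str))
            tmp.replace(self._path(work_order_id))

        def load_state(self, work_order_id: str) -> dict[str, Any] | None:
            path = self._path(work_order_id)
            return json.loads(path.read_text()) if path.exists() else None

        def update_status(self, work_order_id: str, status: str, **kwargs: Any) -> None:
            state = self.load_state(work_order_id) or {}
            state.update({"status": status, **kwargs})
            self.save_state(work_order_id, state)

        def list_states(self) -> list[str]:
            return [p.stem for p in self._dir.glob("*.json")]

        def delete_state(self, work_order_id: str) -> None:
            self._path(work_order_id).unlink(missing_ok=True)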

Step 6: Update WorkflowStep Enum

Add new workflow steps for test and review resolution; the resulting enum is sketched after the checklist below.

  • Update python/src/agent_work_orders/models.py
  • Add to WorkflowStep enum:
    • RESOLVE_TEST = "resolve_test" - Test failure resolution step
    • RESOLVE_REVIEW = "resolve_review" - Review issue resolution step
  • Update StepHistory.get_current_step() to include new steps in sequence:
    • Updated sequence: CLASSIFY → PLAN → FIND_PLAN → GENERATE_BRANCH → IMPLEMENT → COMMIT → TEST → RESOLVE_TEST (if needed) → REVIEW → RESOLVE_REVIEW (if needed) → CREATE_PR
  • Write unit tests for updated step sequence logic
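
The resulting enum could look as follows; the existing member values are assumed from the sequence above, and only RESOLVE_TEST and RESOLVE_REVIEW are new:

    from enum import Enum

    class WorkflowStep(str, Enum):
        CLASSIFY = "classify"
        PLAN = "plan"
        FIND_PLAN = "find_plan"
        GENERATE_BRANCH = "generate_branch"
        IMPLEMENT = "implement"
        COMMIT = "commit"
        TEST = "test"
        RESOLVE_TEST = "resolve_test"      # new: test failure resolution
        REVIEW = "review"
        RESOLVE_REVIEW = "resolve_review"  # new: review issue resolution
        CREATE_PR = "create_pr"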

Step 7: Create Test Runner Command

Build Claude command to execute test suite and return structured JSON results.

  • Update python/.claude/commands/agent-work-orders/test_runner.md
  • Command should:
    • Execute backend tests: cd python && uv run pytest tests/ -v --tb=short
    • Execute frontend tests: cd archon-ui-main && npm test
    • Parse test results from output
    • Return JSON array with structure:
      [
        {
          "test_name": "string",
          "test_file": "string",
          "passed": boolean,
          "error": "optional string",
          "execution_command": "string"
        }
      ]
      
    • Include test purpose and reproduction command
    • Sort failed tests first
    • Handle timeout and command errors gracefully
  • Test the command manually with sample repositories

Step 8: Create Resolve Failed Test Command

Build Claude command to analyze and fix failed tests given test JSON.

  • Create python/.claude/commands/agent-work-orders/resolve_failed_test.md
  • Command takes single argument: test result JSON object
  • Command should:
    • Parse test failure information
    • Analyze root cause of failure
    • Read relevant test file and code under test
    • Implement fix (code change or test update)
    • Re-run the specific failed test to verify fix
    • Report success/failure
  • Include examples of common test failure patterns
  • Add constraints (don't skip tests, maintain test coverage)
  • Test the command with sample failed test JSONs

Step 9: Implement Test Workflow Module

Create the test workflow module with automatic resolution and retry logic; the retry loop is sketched after the checklist below.

  • Create python/src/agent_work_orders/workflow_engine/test_workflow.py
  • Implement functions:
    • run_tests(executor, command_loader, work_order_id, working_dir) -> StepExecutionResult - Execute test suite
    • parse_test_results(output, logger) -> Tuple[List[TestResult], int, int] - Parse JSON output
    • resolve_failed_test(executor, command_loader, test_json, work_order_id, working_dir) -> StepExecutionResult - Fix single test
    • run_tests_with_resolution(executor, command_loader, work_order_id, working_dir, max_attempts=4) -> Tuple[List[TestResult], int, int] - Main retry loop
  • Implement retry logic:
    • Run tests, check for failures
    • If failures exist and attempts < max_attempts: resolve each failed test
    • Re-run tests after resolution
    • Stop if all tests pass or max attempts reached
  • Add TestResult model to models.py
  • Write comprehensive unit tests for test workflow
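
A sketch of the retry loop, assuming the run_tests, parse_test_results, and resolve_failed_test helpers listed above (the logger argument is omitted for brevity), a StepExecutionResult with an output attribute, and a plain-dataclass stand-in for the TestResult model:

    import json
    from dataclasses import dataclass, asdict

    @dataclass
    class TestResult:
        test_name: str
        test_file: str
        passed: bool
        error: str | None = None
        execution_command: str = ""

    async def run_tests_with_resolution(
        executor, command_loader, work_order_id: str, working_dir: str, max_attempts: int = 4
    ) -> tuple[list[TestResult], int, int]:
        """Run the suite, resolve each failure, re-run, and stop when green or attempts run out."""
        results: list[TestResult] = []
        passed = failed = 0
        for attempt in range(1, max_attempts + 1):
            step = await run_tests(executor, command_loader, work_order_id, working_dir)
            results, passed, failed = parse_test_results(step.output)
            if failed == 0 or attempt == max_attempts:
                break
            # Resolve failures one at a time, then loop back and re-run the whole suite.
            for test in results:
                if not test.passed:
                    await resolve_failed_test(
                        executor, command_loader, json.dumps(asdict(test)),
                        work_order_id, working_dir,
                    )
        return results, passed, failed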

Step 10: Add Test Workflow Operation

Create atomic operation for test execution in workflow_operations.py.

  • Update python/src/agent_work_orders/workflow_engine/workflow_operations.py
  • Add function:
    async def execute_tests(
        executor: AgentCLIExecutor,
        command_loader: ClaudeCommandLoader,
        work_order_id: str,
        working_dir: str,
    ) -> StepExecutionResult
    
  • Function should:
    • Call run_tests_with_resolution() from test_workflow.py
    • Return StepExecutionResult with test summary
    • Include pass/fail counts in output
    • Log detailed test results
  • Add TESTER constant to agent_names.py
  • Write unit tests for execute_tests operation

Step 11: Integrate Test Phase in Orchestrator

Add test phase to workflow orchestrator between COMMIT and CREATE_PR steps.

  • Update python/src/agent_work_orders/workflow_engine/workflow_orchestrator.py

  • After commit step (line ~236), add:

    # Step 7: Run tests with resolution
    test_result = await workflow_operations.execute_tests(
        self.agent_executor,
        self.command_loader,
        agent_work_order_id,
        sandbox.working_dir,
    )
    step_history.steps.append(test_result)
    await self.state_repository.save_step_history(agent_work_order_id, step_history)
    
    if not test_result.success:
        raise WorkflowExecutionError(f"Tests failed: {test_result.error_message}")
    
    bound_logger.info("step_completed", step="test")
    
  • Update step numbering (PR creation becomes step 8)

  • Add test failure handling strategy

  • Write integration tests for full workflow with test phase

Step 12: Create Review Runner Command

Build Claude command to review implementation against spec with screenshot capture.

  • Create python/.claude/commands/agent-work-orders/review_runner.md
  • Command takes arguments: spec_file_path, work_order_id
  • Command should:
    • Read specification from spec_file_path
    • Analyze implementation in codebase
    • Start application (if UI component)
    • Capture screenshots of key UI flows
    • Compare implementation against spec requirements
    • Categorize issues by severity: "blocker" | "tech_debt" | "skippable"
    • Return JSON with structure:
      {
        "review_passed": boolean,
        "review_issues": [
          {
            "issue_title": "string",
            "issue_description": "string",
            "issue_severity": "blocker|tech_debt|skippable",
            "affected_files": ["string"],
            "screenshots": ["string"]
          }
        ],
        "screenshots": ["string"]
      }
      
  • Include review criteria and severity definitions
  • Test command with sample specifications

Step 13: Create Resolve Failed Review Command

Build Claude command to patch blocker issues from review.

  • Create python/.claude/commands/agent-work-orders/resolve_failed_review.md
  • Command takes single argument: review issue JSON object
  • Command should:
    • Parse review issue details
    • Create patch plan addressing the issue
    • Implement the patch (code changes)
    • Verify patch resolves the issue
    • Report success/failure
  • Include constraints (only fix blocker issues, maintain functionality)
  • Add examples of common review issue patterns
  • Test command with sample review issues

Step 14: Implement Review Workflow Module

Create the review workflow module with automatic blocker patching; the retry loop is sketched after the checklist below.

  • Create python/src/agent_work_orders/workflow_engine/review_workflow.py
  • Implement functions:
    • run_review(executor, command_loader, spec_file, work_order_id, working_dir) -> ReviewResult - Execute review
    • parse_review_results(output, logger) -> ReviewResult - Parse JSON output
    • resolve_review_issue(executor, command_loader, issue_json, work_order_id, working_dir) -> StepExecutionResult - Patch single issue
    • run_review_with_resolution(executor, command_loader, spec_file, work_order_id, working_dir, max_attempts=3) -> ReviewResult - Main retry loop
  • Implement retry logic:
    • Run review, check for blocker issues
    • If blockers exist and attempts < max_attempts: resolve each blocker
    • Re-run review after patching
    • Stop if no blockers or max attempts reached
    • Allow tech_debt and skippable issues to pass
  • Add ReviewResult and ReviewIssue models to models.py
  • Write comprehensive unit tests for review workflow
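
The review retry loop mirrors the test loop but keys off issue severity. The sketch below assumes the run_review, parse_review_results, and resolve_review_issue helpers listed above, a StepExecutionResult with an output attribute, and dataclass stand-ins for the ReviewResult/ReviewIssue models shaped after the Step 12 JSON schema:

    import json
    from dataclasses import dataclass, asdict, field

    @dataclass
    class ReviewIssue:
        issue_title: str
        issue_description: str
        issue_severity: str  # "blocker" | "tech_debt" | "skippable"
        affected_files: list[str] = field(default_factory=list)
        screenshots: list[str] = field(default_factory=list)

    @dataclass
    class ReviewResult:
        review_passed: bool
        review_issues: list[ReviewIssue] = field(default_factory=list)
        screenshots: list[str] = field(default_factory=list)

    async def run_review_with_resolution(
        executor, command_loader, spec_file: str, work_order_id: str,
        working_dir: str, max_attempts: int = 3,
    ) -> ReviewResult:
        """Re-run the review until no blocker issues remain or attempts are exhausted."""
        review = ReviewResult(review_passed=False)
        for attempt in range(1, max_attempts + 1):
            step = await run_review(executor, command_loader, spec_file, work_order_id, working_dir)
            review = parse_review_results(step.output)
            blockers = [i for i in review.review_issues if i.issue_severity == "blocker"]
            if not blockers or attempt == max_attempts:
                break
            # Patch each blocker, then loop back and review again; tech_debt and
            # skippable issues are reported but never block the workflow.
            for issue in blockers:
                await resolve_review_issue(
                    executor, command_loader, json.dumps(asdict(issue)),
                    work_order_id, working_dir,
                )
        return review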

Step 15: Add Review Workflow Operation

Create atomic operation for review execution in workflow_operations.py.

  • Update python/src/agent_work_orders/workflow_engine/workflow_operations.py
  • Add function:
    async def execute_review(
        executor: AgentCLIExecutor,
        command_loader: ClaudeCommandLoader,
        spec_file: str,
        work_order_id: str,
        working_dir: str,
    ) -> StepExecutionResult
    
  • Function should:
    • Call run_review_with_resolution() from review_workflow.py
    • Return StepExecutionResult with review summary
    • Include blocker count in output
    • Log detailed review results
  • Add REVIEWER constant to agent_names.py
  • Write unit tests for execute_review operation

Step 16: Integrate Review Phase in Orchestrator

Add review phase to workflow orchestrator between TEST and CREATE_PR steps.

  • Update python/src/agent_work_orders/workflow_engine/workflow_orchestrator.py

  • After test step, add:

    # Step 8: Run review with resolution
    review_result = await workflow_operations.execute_review(
        self.agent_executor,
        self.command_loader,
        plan_file or "",
        agent_work_order_id,
        sandbox.working_dir,
    )
    step_history.steps.append(review_result)
    await self.state_repository.save_step_history(agent_work_order_id, step_history)
    
    if not review_result.success:
        raise WorkflowExecutionError(f"Review failed: {review_result.error_message}")
    
    bound_logger.info("step_completed", step="review")
    
  • Update step numbering (PR creation becomes step 9)

  • Add review failure handling strategy

  • Write integration tests for full workflow with review phase

Step 17: Refactor Orchestrator for Composition

Refactor workflow orchestrator to support modular composition; a config-gated composition sketch follows the checklist below.

  • Update python/src/agent_work_orders/workflow_engine/workflow_orchestrator.py
  • Extract workflow phases into separate methods:
    • _execute_planning_phase() - classify → plan → find_plan → generate_branch
    • _execute_implementation_phase() - implement → commit
    • _execute_testing_phase() - test → resolve_test (if needed)
    • _execute_review_phase() - review → resolve_review (if needed)
    • _execute_deployment_phase() - create_pr
  • Update execute_workflow() to compose phases:
    await self._execute_planning_phase(...)
    await self._execute_implementation_phase(...)
    await self._execute_testing_phase(...)
    await self._execute_review_phase(...)
    await self._execute_deployment_phase(...)
    
  • Add phase-level error handling and recovery
  • Support skipping phases via configuration
  • Write unit tests for each phase method
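
With the phases extracted, the top-level method reduces to composition. The sketch below shows one way the "skip phases via configuration" bullet could look, assuming the ENABLE_TEST_PHASE and ENABLE_REVIEW_PHASE flags from Step 18 are reachable via self.config:

    async def execute_workflow(self, agent_work_order_id: str) -> None:
        # Each phase appends to the step history and persists it before the next
        # phase starts, which is what later makes resumption possible.
        await self._execute_planning_phase(agent_work_order_id)
        await self._execute_implementation_phase(agent_work_order_id)
        if self.config.ENABLE_TEST_PHASE:
            await self._execute_testing_phase(agent_work_order_id)
        if self.config.ENABLE_REVIEW_PHASE:
            await self._execute_review_phase(agent_work_order_id)
        await self._execute_deployment_phase(agent_work_order_id)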

Step 18: Add Configuration for New Features

Add configuration options for worktrees, ports, and new workflow phases.

  • Update python/src/agent_work_orders/config.py

  • Add configuration:

    # Worktree configuration
    WORKTREE_BASE_DIR: str = os.getenv("WORKTREE_BASE_DIR", "trees")
    
    # Port allocation
    BACKEND_PORT_RANGE_START: int = int(os.getenv("BACKEND_PORT_START", "9100"))
    BACKEND_PORT_RANGE_END: int = int(os.getenv("BACKEND_PORT_END", "9114"))
    FRONTEND_PORT_RANGE_START: int = int(os.getenv("FRONTEND_PORT_START", "9200"))
    FRONTEND_PORT_RANGE_END: int = int(os.getenv("FRONTEND_PORT_END", "9214"))
    
    # Test workflow
    MAX_TEST_RETRY_ATTEMPTS: int = int(os.getenv("MAX_TEST_RETRY_ATTEMPTS", "4"))
    ENABLE_TEST_PHASE: bool = os.getenv("ENABLE_TEST_PHASE", "true").lower() == "true"
    
    # Review workflow
    MAX_REVIEW_RETRY_ATTEMPTS: int = int(os.getenv("MAX_REVIEW_RETRY_ATTEMPTS", "3"))
    ENABLE_REVIEW_PHASE: bool = os.getenv("ENABLE_REVIEW_PHASE", "true").lower() == "true"
    ENABLE_SCREENSHOT_CAPTURE: bool = os.getenv("ENABLE_SCREENSHOT_CAPTURE", "true").lower() == "true"
    
    # State management
    STATE_STORAGE_TYPE: str = os.getenv("STATE_STORAGE_TYPE", "memory")  # "memory" or "file"
    FILE_STATE_DIRECTORY: str = os.getenv("FILE_STATE_DIRECTORY", "agent-work-orders-state")
    
  • Update .env.example with new configuration options

  • Document configuration in README

Step 19: Create Documentation

Document the new compositional architecture and workflows.

  • Create docs/compositional-workflows.md:

    • Architecture overview
    • Compositional design principles
    • Phase composition examples
    • Error handling and recovery
    • Configuration guide
  • Create docs/worktree-management.md:

    • Worktree vs temporary clone comparison
    • Parallelization capabilities
    • Port allocation system
    • Cleanup and maintenance
  • Create docs/test-resolution.md:

    • Test workflow overview
    • Retry logic explanation
    • Test resolution examples
    • Troubleshooting failed tests
  • Create docs/review-resolution.md:

    • Review workflow overview
    • Screenshot capture setup
    • Issue severity definitions
    • Blocker patching process
    • R2 upload configuration

Step 20: Run Validation Commands

Execute all validation commands to ensure the feature works correctly with zero regressions.

  • Run backend tests: cd python && uv run pytest tests/agent_work_orders/ -v
  • Run backend linting: cd python && uv run ruff check src/agent_work_orders/
  • Run type checking: cd python && uv run mypy src/agent_work_orders/
  • Test worktree creation manually:
    cd python
    python -c "
    from src.agent_work_orders.utils.worktree_operations import create_worktree
    from src.agent_work_orders.utils.structured_logger import get_logger
    logger = get_logger('test')
    path, err = create_worktree('test-wo-123', 'test-branch', logger)
    print(f'Path: {path}, Error: {err}')
    "
    
  • Test port allocation:
    cd python
    python -c "
    from src.agent_work_orders.utils.port_allocation import get_ports_for_work_order
    backend, frontend = get_ports_for_work_order('test-wo-123')
    print(f'Backend: {backend}, Frontend: {frontend}')
    "
    
  • Create test work order with new workflow:
    curl -X POST http://localhost:8181/agent-work-orders \
      -H "Content-Type: application/json" \
      -d '{
        "repository_url": "https://github.com/your-test-repo",
        "sandbox_type": "git_worktree",
        "workflow_type": "agent_workflow_plan",
        "user_request": "Add a new feature with tests"
      }'
    
  • Verify worktree created under trees/<work_order_id>/
  • Verify .ports.env created in worktree
  • Monitor workflow execution through all phases
  • Verify test phase runs and resolves failures
  • Verify review phase runs and patches blockers
  • Verify PR created successfully
  • Clean up test worktrees: git worktree remove trees/<work_order_id>, then git worktree prune to drop any stale entries

Testing Strategy

Unit Tests

Worktree Management:

  • Test worktree creation with valid repository
  • Test worktree creation with invalid branch
  • Test worktree validation (three-way check)
  • Test worktree cleanup
  • Test handling of existing worktrees

Port Allocation:

  • Test deterministic port assignment from work order ID
  • Test port availability checking
  • Test finding next available ports with collision
  • Test port range boundaries (9100-9114, 9200-9214)
  • Test .ports.env file generation
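
For instance, the determinism and range checks above could be a single pytest; the import path mirrors the module layout proposed in Step 2:

    from src.agent_work_orders.utils.port_allocation import get_ports_for_work_order

    def test_ports_are_deterministic_and_in_range():
        first = get_ports_for_work_order("wo-abc")
        second = get_ports_for_work_order("wo-abc")
        assert first == second, "same work order ID must always map to the same ports"
        backend, frontend = first
        assert 9100 <= backend <= 9114
        assert 9200 <= frontend <= 9214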

Test Workflow:

  • Test parsing valid test result JSON
  • Test parsing malformed test result JSON
  • Test retry loop with all tests passing
  • Test retry loop with some tests failing then passing
  • Test retry loop reaching max attempts
  • Test individual test resolution

Review Workflow:

  • Test parsing valid review result JSON
  • Test parsing malformed review result JSON
  • Test retry loop with no blocker issues
  • Test retry loop with blockers then resolved
  • Test retry loop reaching max attempts
  • Test issue severity filtering

State Management:

  • Test saving state to JSON file
  • Test loading state from JSON file
  • Test updating specific state fields
  • Test handling missing state files
  • Test concurrent state access

Integration Tests

End-to-End Workflow:

  • Test complete workflow with worktree sandbox: classify → plan → implement → commit → test → review → PR
  • Test test phase with intentional test failure and resolution
  • Test review phase with intentional blocker issue and patching
  • Test parallel execution of multiple work orders with different ports
  • Test workflow resumption after failure
  • Test cleanup of worktrees after completion

Sandbox Integration:

  • Test command execution in worktree context
  • Test git operations in worktree
  • Test branch creation in worktree
  • Test worktree isolation (parallel instances don't interfere)

State Persistence:

  • Test state survives service restart (file-based)
  • Test state migration from memory to file
  • Test state corruption recovery

Edge Cases

Worktree Edge Cases:

  • Worktree already exists (should reuse or fail gracefully)
  • Git repository unreachable (should fail setup)
  • Insufficient disk space for worktree (should fail with clear error)
  • Worktree removal fails (should log error and continue)
  • Maximum worktrees reached (15 concurrent) - should queue or fail

Port Allocation Edge Cases:

  • All ports in range occupied (should fail with error)
  • Port becomes occupied between allocation and use (should retry)
  • Invalid port range in configuration (should fail validation)

Test Workflow Edge Cases:

  • Test command times out (should mark as failed)
  • Test command returns invalid JSON (should fail gracefully)
  • All tests fail and none can be resolved (should fail after max attempts)
  • Test resolution introduces new failures (should continue with retry loop)

Review Workflow Edge Cases:

  • Review command crashes (should fail gracefully)
  • Screenshot capture fails (should continue review without screenshots)
  • Review finds only skippable issues (should pass)
  • Blocker patch introduces new blocker (should continue with retry loop)
  • Spec file not found (should fail with clear error)

State Management Edge Cases:

  • State file corrupted (should fail with recovery suggestion)
  • State directory not writable (should fail with permission error)
  • Concurrent access to same state file (should handle with locking or fail safely)
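
If locking is chosen for the concurrent-access case, an advisory lock on a sidecar file is one POSIX-only option (fcntl is unavailable on Windows); this is a sketch, not a committed design. save_state() could then wrap its write in this context manager:

    import fcntl
    from contextlib import contextmanager
    from pathlib import Path

    @contextmanager
    def state_file_lock(state_path: Path):
        # Exclusive advisory lock on a sidecar .lock file; a second writer blocks
        # here instead of interleaving writes and corrupting the JSON document.
        lock_path = state_path.with_suffix(state_path.suffix + ".lock")
        with open(lock_path, "w") as lock_file:
            fcntl.flock(lock_file, fcntl.LOCK_EX)
            try:
                yield
            finally:
                fcntl.flock(lock_file, fcntl.LOCK_UN)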

Acceptance Criteria

  • GitWorktreeSandbox successfully creates and manages worktrees under trees/<work_order_id>/
  • Port allocation deterministically assigns unique ports (backend: 9100-9114, frontend: 9200-9214) based on work order ID
  • Multiple work orders (at least 3) can run in parallel without port or filesystem conflicts
  • .ports.env file is created in each worktree with correct port configuration
  • Test workflow successfully runs test suite and returns structured JSON results
  • Test workflow automatically resolves failed tests up to 4 attempts
  • Test workflow stops retrying when all tests pass
  • Review workflow successfully reviews implementation against spec
  • Review workflow captures screenshots (when enabled)
  • Review workflow categorizes issues by severity (blocker/tech_debt/skippable)
  • Review workflow automatically patches blocker issues up to 3 attempts
  • Review workflow allows tech_debt and skippable issues to pass
  • WorkflowStep enum includes TEST, RESOLVE_TEST, REVIEW, RESOLVE_REVIEW steps
  • Workflow orchestrator executes all phases: planning → implementation → testing → review → deployment
  • File-based state repository persists state to JSON files
  • State survives service restarts when using file-based storage
  • Configuration supports enabling/disabling test and review phases
  • All existing tests pass with zero regressions
  • New unit tests achieve >80% code coverage for new modules
  • Integration tests verify end-to-end workflow with parallel execution
  • Documentation covers compositional architecture, worktrees, test resolution, and review resolution
  • Cleanup of worktrees works correctly (git worktree remove + prune)
  • Error messages are clear and actionable for all failure scenarios

Validation Commands

Execute every command to validate the feature works correctly with zero regressions.

Backend Tests

  • cd python && uv run pytest tests/agent_work_orders/ -v --tb=short - Run all agent work orders tests
  • cd python && uv run pytest tests/agent_work_orders/sandbox_manager/ -v - Test sandbox management
  • cd python && uv run pytest tests/agent_work_orders/workflow_engine/ -v - Test workflow engine
  • cd python && uv run pytest tests/agent_work_orders/utils/ -v - Test utilities

Code Quality

  • cd python && uv run ruff check src/agent_work_orders/ - Check code quality
  • cd python && uv run mypy src/agent_work_orders/ - Type checking

Manual Worktree Testing

# Test worktree creation
cd python
python -c "
from src.agent_work_orders.utils.worktree_operations import create_worktree, validate_worktree, remove_worktree
from src.agent_work_orders.utils.structured_logger import get_logger
logger = get_logger('test')

# Create worktree
path, err = create_worktree('test-wo-123', 'test-branch', logger)
print(f'Created worktree at: {path}')
assert err is None, f'Error: {err}'

# Persist state, then validate the worktree against it
from src.agent_work_orders.state_manager.file_state_repository import FileStateRepository
state_repo = FileStateRepository('test-state')
state_data = {'worktree_path': path}
state_repo.save_state('test-wo-123', state_data)
valid, err = validate_worktree('test-wo-123', state_data)
assert valid, f'Validation failed: {err}'

# Remove worktree
success, err = remove_worktree('test-wo-123', logger)
assert success, f'Removal failed: {err}'
print('Worktree lifecycle test passed!')
"

Manual Port Allocation Testing

cd python
python -c "
from src.agent_work_orders.utils.port_allocation import get_ports_for_work_order, find_next_available_ports, is_port_available
backend, frontend = get_ports_for_work_order('test-wo-123')
print(f'Ports for test-wo-123: Backend={backend}, Frontend={frontend}')
assert 9100 <= backend <= 9114, f'Backend port out of range: {backend}'
assert 9200 <= frontend <= 9214, f'Frontend port out of range: {frontend}'

# Test availability check
available = is_port_available(backend)
print(f'Backend port {backend} available: {available}')

# Test finding next available
next_backend, next_frontend = find_next_available_ports('test-wo-456')
print(f'Next available ports: Backend={next_backend}, Frontend={next_frontend}')
print('Port allocation test passed!')
"

Integration Testing

# Start agent work orders service
docker compose up -d archon-server

# Create work order with worktree sandbox
curl -X POST http://localhost:8181/agent-work-orders \
  -H "Content-Type: application/json" \
  -d '{
    "repository_url": "https://github.com/coleam00/archon",
    "sandbox_type": "git_worktree",
    "workflow_type": "agent_workflow_plan",
    "user_request": "Fix issue #123"
  }'

# Verify worktree created
ls -la trees/

# Monitor workflow progress
watch -n 2 'curl -s http://localhost:8181/agent-work-orders | jq'

# Verify .ports.env in worktree
cat trees/<work_order_id>/.ports.env

# After completion, verify cleanup
git worktree list

Parallel Execution Testing

# Create 3 work orders simultaneously
for i in 1 2 3; do
  curl -X POST http://localhost:8181/agent-work-orders \
    -H "Content-Type: application/json" \
    -d "{
      \"repository_url\": \"https://github.com/coleam00/archon\",
      \"sandbox_type\": \"git_worktree\",
      \"workflow_type\": \"agent_workflow_plan\",
      \"user_request\": \"Parallel test $i\"
    }" &
done
wait

# Verify all worktrees exist
ls -la trees/

# Verify different ports allocated
for dir in trees/*/; do
  echo "Worktree: $dir"
  cat "$dir/.ports.env"
  echo "---"
done

Notes

Architecture Decision: Compositional vs Centralized

This feature implements Option B (compositional refactoring) because:

  1. Scalability: Compositional design enables running individual phases (e.g., just test or just review) without full workflow
  2. Debugging: Independent scripts are easier to test and debug in isolation
  3. Flexibility: Users can compose custom workflows (e.g., skip review for simple PRs)
  4. Maintainability: Smaller, focused modules are easier to maintain than monolithic orchestrator
  5. Parallelization: Worktree-based approach inherently supports compositional execution

Performance Considerations

  • Worktree Creation: Creating a worktree is roughly 2-3x faster than a full clone because worktrees share the parent repository's .git object store
  • Port Allocation: Hash-based allocation is deterministic but may have collisions; fallback to linear search adds minimal overhead
  • Retry Loops: Test (4 attempts) and review (3 attempts) retry limits prevent infinite loops while allowing reasonable resolution attempts
  • State I/O: File-based state adds disk I/O but enables persistence; consider an eventual move to a database for high-volume deployments

Future Enhancements

  1. Database State: Replace file-based state with PostgreSQL/Supabase for better concurrent access and querying
  2. WebSocket Updates: Stream test/review progress to UI in real-time
  3. Screenshot Upload: Integrate R2/S3 for screenshot storage and PR comments with images
  4. Workflow Resumption: Support resuming failed workflows from last successful step
  5. Custom Workflows: Allow users to define custom workflow compositions via config
  6. Metrics: Add OpenTelemetry instrumentation for workflow performance monitoring
  7. E2E Testing: Add Playwright/Cypress integration for UI-focused review
  8. Distributed Execution: Support running work orders across multiple machines

Migration Path

For existing deployments:

  1. Backward Compatibility: Keep GitBranchSandbox working alongside GitWorktreeSandbox
  2. Gradual Migration: Default to GIT_BRANCH, opt-in to GIT_WORKTREE via configuration
  3. State Migration: Provide utility to migrate in-memory state to file-based state
  4. Cleanup: Add command to clean up old temporary clones: rm -rf /tmp/agent-work-orders/*

Dependencies

New dependencies to add via uv add:

  • (None required - uses existing git, pytest, claude CLI)

Related Issues

  • #XXX - Original agent-work-orders MVP implementation
  • #XXX - Worktree isolation discussion
  • #XXX - Test phase feature request
  • #XXX - Review automation proposal