Completes the implementation of test/review workflows with automatic resolution and integrates them into the orchestrator.

**Phase 3: Test Workflow with Resolution**
- Created test_workflow.py with automatic test failure resolution
- Implements retry loop with max 4 attempts (configurable via MAX_TEST_RETRY_ATTEMPTS)
- Parses JSON test results and resolves failures one by one
- Uses existing test.md and resolve_failed_test.md commands
- Added run_tests() and resolve_test_failure() to workflow_operations.py

**Phase 4: Review Workflow with Resolution**
- Created review_workflow.py with automatic blocker issue resolution
- Implements retry loop with max 3 attempts (configurable via MAX_REVIEW_RETRY_ATTEMPTS)
- Categorizes issues by severity (blocker/tech_debt/skippable)
- Only blocks on blocker issues - tech_debt and skippable allowed to pass
- Created review_runner.md and resolve_failed_review.md commands
- Added run_review() and resolve_review_issue() to workflow_operations.py
- Supports screenshot capture for UI review (configurable via ENABLE_SCREENSHOT_CAPTURE)

**Phase 5: Compositional Integration**
- Updated workflow_orchestrator.py to integrate test and review phases
- Test phase runs between commit and PR creation (if ENABLE_TEST_PHASE=true)
- Review phase runs after tests (if ENABLE_REVIEW_PHASE=true)
- Both phases are optional and controlled by config flags
- Step history tracks test and review execution results
- Proper error handling and logging for all phases

**Supporting Changes**
- Updated agent_names.py to add REVIEWER constant
- Added configuration flags to config.py for test/review phases
- All new code follows structured logging patterns
- Maintains compatibility with existing workflow steps

**Files Changed**: 19 files, 3035+ lines
- New: test_workflow.py, review_workflow.py, review commands
- Modified: orchestrator, workflow_operations, agent_names, config
- Phases 1-2 files (worktree, state, port allocation) also staged

The implementation is complete and ready for testing. All phases now support parallel execution via worktree isolation with deterministic port allocation.
Feature: Compositional Workflow Architecture with Worktree Isolation, Test Resolution, and Review Resolution
Feature Description
Transform the agent-work-orders system from a centralized orchestrator pattern to a compositional script-based architecture that enables parallel execution through git worktrees, automatic test failure resolution with retry logic, and a comprehensive review phase with blocker issue patching. This architectural change supports running 15+ work orders simultaneously in isolated worktrees with deterministic port allocation, while maintaining complete SDLC coverage from planning through testing and review.
The system will support:
- Worktree-based isolation: Each work order runs in its own git worktree under `trees/<work_order_id>/` instead of a temporary clone
- Port allocation: Deterministic backend (9100-9114) and frontend (9200-9214) port assignment based on work order ID
- Test phase with resolution: Automatic retry loop (max 4 attempts) that resolves failed tests using AI-powered fixes
- Review phase with resolution: Captures screenshots, compares implementation vs spec, categorizes issues (blocker/tech_debt/skippable), and automatically patches blocker issues (max 3 attempts)
- File-based state: Simple JSON state management (`adw_state.json`) instead of an in-memory repository
- Compositional scripts: Independent workflow scripts (plan, build, test, review, doc, ship) that can be run separately or together
User Story
As a developer managing multiple concurrent features,
I want to run multiple agent work orders in parallel with isolated environments,
So that I can scale development velocity without conflicts or resource contention, while ensuring all code passes tests and review before deployment.
Problem Statement
The current agent-work-orders architecture has several critical limitations:
- No Parallelization: GitBranchSandbox creates temporary clones that get cleaned up, preventing safe parallel execution of multiple work orders
- No Test Coverage: Missing test workflow step - implementations are committed and PR'd without validation
- No Automated Test Resolution: When tests fail, there's no retry/fix mechanism to automatically resolve failures
- No Review Phase: No automated review of implementation against specifications with screenshot capture and blocker detection
- Centralized Orchestration: Monolithic orchestrator makes it difficult to run individual phases (e.g., just test, just review) independently
- In-Memory State: State management in WorkOrderRepository is not persistent across service restarts
- No Port Management: No system for allocating unique ports for parallel instances
These limitations prevent scaling development workflows and ensuring code quality before PRs are created.
Solution Statement
Implement a compositional workflow architecture inspired by the ADW (AI Developer Workflow) pattern, with the following components (see the examples under `PRPs/examples/*` for reference):
- GitWorktreeSandbox: Replace GitBranchSandbox with worktree-based isolation that shares the same repo but has independent working directories
- Port Allocation System: Deterministic port assignment (backend: 9100-9114, frontend: 9200-9214) based on work order ID hash
- File-Based State Management: JSON state files for persistence and debugging
- Test Workflow Module: New `test_workflow.py` with automatic resolution and retry logic (4 attempts)
- Review Workflow Module: New `review_workflow.py` with screenshot capture, spec comparison, and blocker patching (3 attempts)
- Compositional Scripts: Independent workflow operations that can be composed or run individually
- Enhanced WorkflowStep Enum: Add TEST, RESOLVE_TEST, REVIEW, RESOLVE_REVIEW steps
- Resolution Commands: New Claude commands `/resolve_failed_test` and `/resolve_failed_review` for AI-powered fixes
Relevant Files
Core Workflow Files
- `python/src/agent_work_orders/workflow_engine/workflow_orchestrator.py` - Main orchestrator that needs refactoring for compositional approach
  - Currently: Monolithic execute_workflow with sequential steps
  - Needs: Modular workflow composition with test/review phases
- `python/src/agent_work_orders/workflow_engine/workflow_operations.py` - Atomic workflow operations
  - Currently: classify_issue, build_plan, implement_plan, create_commit, create_pull_request
  - Needs: Add test_workflow, review_workflow, resolve_test, resolve_review operations
- `python/src/agent_work_orders/models.py` - Data models including WorkflowStep enum
  - Currently: WorkflowStep has CLASSIFY, PLAN, IMPLEMENT, COMMIT, REVIEW, TEST, CREATE_PR
  - Needs: Add RESOLVE_TEST, RESOLVE_REVIEW steps
Sandbox Management Files
- `python/src/agent_work_orders/sandbox_manager/git_branch_sandbox.py` - Current temp clone implementation
  - Problem: Creates temp dirs, no parallelization support
  - Will be replaced by: GitWorktreeSandbox
- `python/src/agent_work_orders/sandbox_manager/sandbox_factory.py` - Factory for creating sandboxes
  - Needs: Add GitWorktreeSandbox creation logic
- `python/src/agent_work_orders/sandbox_manager/sandbox_protocol.py` - Sandbox interface
  - May need: Port allocation methods
State Management Files
- `python/src/agent_work_orders/state_manager/work_order_repository.py` - Current in-memory state
  - Currently: In-memory dictionary with async methods
  - Needs: File-based JSON persistence option
- `python/src/agent_work_orders/config.py` - Configuration
  - Needs: Port range configuration, worktree base directory
Command Files
- `python/.claude/commands/agent-work-orders/test.md` - Currently just a hello world test
  - Needs: Comprehensive test suite runner that returns JSON with failed tests
- `python/.claude/commands/agent-work-orders/implementor.md` - Implementation command
  - May need: Context about test requirements
New Files
Worktree Management
- `python/src/agent_work_orders/sandbox_manager/git_worktree_sandbox.py` - New worktree-based sandbox
- `python/src/agent_work_orders/utils/worktree_operations.py` - Worktree CRUD operations
- `python/src/agent_work_orders/utils/port_allocation.py` - Port management utilities
Test Workflow
- `python/src/agent_work_orders/workflow_engine/test_workflow.py` - Test execution with resolution
- `python/.claude/commands/agent-work-orders/test_runner.md` - Run test suite, return JSON
- `python/.claude/commands/agent-work-orders/resolve_failed_test.md` - Fix failed test given JSON
Review Workflow
- `python/src/agent_work_orders/workflow_engine/review_workflow.py` - Review with screenshot capture
- `python/.claude/commands/agent-work-orders/review_runner.md` - Run review against spec
- `python/.claude/commands/agent-work-orders/resolve_failed_review.md` - Patch blocker issues
- `python/.claude/commands/agent-work-orders/create_patch_plan.md` - Generate patch plan for issue
State Management
- `python/src/agent_work_orders/state_manager/file_state_repository.py` - JSON file-based state
- `python/src/agent_work_orders/models/workflow_state.py` - State data models
Documentation
- `docs/compositional-workflows.md` - Architecture documentation
- `docs/worktree-management.md` - Worktree operations guide
- `docs/test-resolution.md` - Test workflow documentation
- `docs/review-resolution.md` - Review workflow documentation
Implementation Plan
Phase 1: Foundation - Worktree Isolation and Port Allocation
Establish the core infrastructure for parallel execution through git worktrees and deterministic port allocation. This phase creates the foundation for all subsequent phases.
Key Deliverables:
- GitWorktreeSandbox implementation
- Port allocation system
- Worktree management utilities
- `.ports.env` file generation
- Updated sandbox factory
Phase 2: File-Based State Management
Replace in-memory state repository with file-based JSON persistence for durability and debuggability across service restarts.
Key Deliverables:
- FileStateRepository implementation
- WorkflowState models
- State migration utilities
- JSON serialization/deserialization
- Backward compatibility layer
Phase 3: Test Workflow with Resolution
Implement comprehensive test execution with automatic failure resolution and retry logic.
Key Deliverables:
- test_workflow.py module
- test_runner.md command (returns JSON array of test results)
- resolve_failed_test.md command (takes test JSON, fixes issue)
- Retry loop (max 4 attempts)
- Test result parsing and formatting
- Integration with orchestrator
Phase 4: Review Workflow with Resolution
Add review phase with screenshot capture, spec comparison, and automatic blocker patching.
Key Deliverables:
- review_workflow.py module
- review_runner.md command (compares implementation vs spec)
- resolve_failed_review.md command (patches blocker issues)
- Screenshot capture integration
- Issue severity categorization (blocker/tech_debt/skippable)
- Retry loop (max 3 attempts)
- R2 upload integration (optional)
Phase 5: Compositional Refactoring
Refactor the centralized orchestrator into composable workflow scripts that can be run independently.
Key Deliverables:
- Modular workflow composition
- Independent script execution
- Workflow step dependencies
- Enhanced error handling
- Workflow resumption support
Step by Step Tasks
Step 1: Create Worktree Sandbox Implementation
Create the core GitWorktreeSandbox class that manages git worktrees for isolated execution.
- Create `python/src/agent_work_orders/sandbox_manager/git_worktree_sandbox.py`
- Implement `GitWorktreeSandbox` class (see the sketch at the end of this step) with:
  - `__init__(repository_url, sandbox_identifier)` - Initialize with worktree path calculation
  - `setup()` - Create worktree under `trees/<sandbox_identifier>/` from origin/main
  - `cleanup()` - Remove worktree using `git worktree remove`
  - `execute_command(command, timeout)` - Execute commands in worktree context
  - `get_git_branch_name()` - Query current branch in worktree
- Handle existing worktree detection and validation
- Add logging for all worktree operations
- Write unit tests for GitWorktreeSandbox in `python/tests/agent_work_orders/sandbox_manager/test_git_worktree_sandbox.py`
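A minimal sketch of the worktree lifecycle methods, assuming subprocess-based git invocations. The `_run_git` helper, the branch naming in `setup()`, and the `--force` flag on removal are illustrative choices rather than part of the specification; `execute_command` is omitted here since it will reuse the project's existing command executor.

```python
# Illustrative sketch only - the real class will follow the project's
# executor and structured-logging conventions.
import subprocess
from pathlib import Path


class GitWorktreeSandbox:
    def __init__(self, repository_url: str, sandbox_identifier: str):
        self.repository_url = repository_url
        self.sandbox_identifier = sandbox_identifier
        # Worktrees live under trees/<sandbox_identifier>/ in the main repo
        self.working_dir = Path("trees") / sandbox_identifier

    def _run_git(self, *args: str) -> str:
        # Hypothetical helper: run git at the repo root and return stdout
        result = subprocess.run(
            ["git", *args], capture_output=True, text=True, check=True
        )
        return result.stdout.strip()

    def setup(self) -> None:
        # Fetch latest refs, then create the worktree from origin/main on a
        # work-order-specific branch (branch naming here is illustrative)
        self._run_git("fetch", "origin")
        self._run_git(
            "worktree", "add", "-b", f"wo-{self.sandbox_identifier}",
            str(self.working_dir), "origin/main",
        )

    def cleanup(self) -> None:
        # Remove the worktree; --force tolerates a dirty working directory
        self._run_git("worktree", "remove", "--force", str(self.working_dir))

    def get_git_branch_name(self) -> str:
        # Query the branch currently checked out in this worktree
        return subprocess.run(
            ["git", "-C", str(self.working_dir), "branch", "--show-current"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
```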
Step 2: Implement Port Allocation System
Create deterministic port allocation based on work order ID to enable parallel instances.
- Create `python/src/agent_work_orders/utils/port_allocation.py`
- Implement functions (sketched at the end of this step):
  - `get_ports_for_work_order(work_order_id) -> Tuple[int, int]` - Calculate ports from ID hash (backend: 9100-9114, frontend: 9200-9214)
  - `is_port_available(port: int) -> bool` - Check if port is bindable
  - `find_next_available_ports(work_order_id, max_attempts=15) -> Tuple[int, int]` - Find available ports with offset
  - `create_ports_env_file(worktree_path, backend_port, frontend_port)` - Generate `.ports.env` file
- Add port range configuration to `python/src/agent_work_orders/config.py`
- Write unit tests for port allocation in `python/tests/agent_work_orders/utils/test_port_allocation.py`
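A minimal sketch of the allocation scheme under the assumption of the 9100-9114 / 9200-9214 ranges above: a stable hash picks the slot, availability is checked by attempting to bind, and a linear probe falls back to the next free slot. The variable names written into `.ports.env` (BACKEND_PORT/FRONTEND_PORT) are assumptions.

```python
# Illustrative sketch - deterministic port allocation with availability fallback.
import hashlib
import socket
from pathlib import Path
from typing import Tuple

BACKEND_START, BACKEND_END = 9100, 9114
FRONTEND_START, FRONTEND_END = 9200, 9214
RANGE_SIZE = BACKEND_END - BACKEND_START + 1  # 15 slots


def get_ports_for_work_order(work_order_id: str) -> Tuple[int, int]:
    # Stable hash of the work order ID picks the same slot on every run
    digest = hashlib.sha256(work_order_id.encode()).hexdigest()
    slot = int(digest, 16) % RANGE_SIZE
    return BACKEND_START + slot, FRONTEND_START + slot


def is_port_available(port: int) -> bool:
    # A port is considered available if we can bind to it locally
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind(("127.0.0.1", port))
            return True
        except OSError:
            return False


def find_next_available_ports(work_order_id: str, max_attempts: int = 15) -> Tuple[int, int]:
    # Linear probe from the hashed slot when the preferred ports are taken
    backend, frontend = get_ports_for_work_order(work_order_id)
    for offset in range(max_attempts):
        b = BACKEND_START + (backend - BACKEND_START + offset) % RANGE_SIZE
        f = FRONTEND_START + (frontend - FRONTEND_START + offset) % RANGE_SIZE
        if is_port_available(b) and is_port_available(f):
            return b, f
    raise RuntimeError("No available ports in configured ranges")


def create_ports_env_file(worktree_path: str, backend_port: int, frontend_port: int) -> None:
    # Writes the .ports.env file the worktree's services read at startup
    content = f"BACKEND_PORT={backend_port}\nFRONTEND_PORT={frontend_port}\n"
    (Path(worktree_path) / ".ports.env").write_text(content)
```

Hash-then-probe keeps allocation deterministic for reruns of the same work order while still tolerating a port that happens to be held by another process.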
Step 3: Create Worktree Management Utilities
Build helper utilities for worktree CRUD operations.
- Create `python/src/agent_work_orders/utils/worktree_operations.py`
- Implement functions (the three-way validation check is sketched at the end of this step):
  - `create_worktree(work_order_id, branch_name, logger) -> Tuple[str, Optional[str]]` - Create worktree and return path or error
  - `validate_worktree(work_order_id, state) -> Tuple[bool, Optional[str]]` - Three-way validation (state, filesystem, git)
  - `get_worktree_path(work_order_id) -> str` - Calculate absolute worktree path
  - `remove_worktree(work_order_id, logger) -> Tuple[bool, Optional[str]]` - Clean up worktree
  - `setup_worktree_environment(worktree_path, backend_port, frontend_port, logger)` - Create `.ports.env`
- Handle git fetch operations before worktree creation
- Add comprehensive error handling and logging
- Write unit tests for worktree operations in `python/tests/agent_work_orders/utils/test_worktree_operations.py`
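A sketch of the three-way validation only, assuming the state record stores the worktree location under a `worktree_path` key; the key name and the exact git invocation are illustrative.

```python
# Illustrative sketch - the three-way check: state record, filesystem, git metadata.
import subprocess
from pathlib import Path
from typing import Optional, Tuple


def validate_worktree(work_order_id: str, state: dict) -> Tuple[bool, Optional[str]]:
    # 1. State: the work order must have a recorded worktree path
    worktree_path = state.get("worktree_path")  # key name is an assumption
    if not worktree_path:
        return False, f"No worktree path recorded in state for {work_order_id}"

    # 2. Filesystem: the directory must actually exist on disk
    if not Path(worktree_path).is_dir():
        return False, f"Worktree directory missing: {worktree_path}"

    # 3. Git: git itself must still list the path as a registered worktree
    output = subprocess.run(
        ["git", "worktree", "list", "--porcelain"],
        capture_output=True, text=True, check=True,
    ).stdout
    resolved = str(Path(worktree_path).resolve())
    registered = any(
        line.startswith("worktree ") and line.split(" ", 1)[1] == resolved
        for line in output.splitlines()
    )
    if not registered:
        return False, f"Path is not a registered git worktree: {worktree_path}"

    return True, None
```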
Step 4: Update Sandbox Factory
Modify the sandbox factory to support creating GitWorktreeSandbox instances.
- Update `python/src/agent_work_orders/sandbox_manager/sandbox_factory.py`
- Add GIT_WORKTREE case to `create_sandbox()` method
- Integrate port allocation during sandbox creation
- Pass port configuration to GitWorktreeSandbox
- Update SandboxType enum in models.py to promote GIT_WORKTREE from placeholder
- Write integration tests for sandbox factory with worktrees
Step 5: Implement File-Based State Repository
Create file-based state management for persistence and debugging.
- Create `python/src/agent_work_orders/state_manager/file_state_repository.py`
- Implement `FileStateRepository` class (sketched at the end of this step):
  - `__init__(state_directory: str)` - Initialize with state directory path
  - `save_state(work_order_id, state_data)` - Write JSON to `<state_dir>/<work_order_id>.json`
  - `load_state(work_order_id) -> Optional[dict]` - Read JSON from file
  - `list_states() -> List[str]` - List all work order IDs with state files
  - `delete_state(work_order_id)` - Remove state file
  - `update_status(work_order_id, status, **kwargs)` - Update specific fields
  - `save_step_history(work_order_id, step_history)` - Persist step history
- Add state directory configuration to config.py
- Create state models in `python/src/agent_work_orders/models/workflow_state.py`
- Write unit tests for file state repository
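A minimal sketch of the repository core, assuming one JSON file per work order as listed above; the temp-file-then-rename write is an illustrative choice for avoiding partially written state.

```python
# Illustrative sketch - one JSON file per work order under the state directory.
import json
from pathlib import Path
from typing import List, Optional


class FileStateRepository:
    def __init__(self, state_directory: str):
        self.state_dir = Path(state_directory)
        self.state_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, work_order_id: str) -> Path:
        return self.state_dir / f"{work_order_id}.json"

    def save_state(self, work_order_id: str, state_data: dict) -> None:
        # Write through a temp file then rename, so readers never see partial JSON
        tmp = self.state_dir / f"{work_order_id}.json.tmp"
        tmp.write_text(json.dumps(state_data, indent=2))
        tmp.replace(self._path(work_order_id))

    def load_state(self, work_order_id: str) -> Optional[dict]:
        path = self._path(work_order_id)
        if not path.exists():
            return None
        return json.loads(path.read_text())

    def list_states(self) -> List[str]:
        return [p.stem for p in self.state_dir.glob("*.json")]

    def update_status(self, work_order_id: str, status: str, **kwargs) -> None:
        # Merge new fields into the existing record rather than overwriting it
        state = self.load_state(work_order_id) or {}
        state.update({"status": status, **kwargs})
        self.save_state(work_order_id, state)
```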
Step 6: Update WorkflowStep Enum
Add new workflow steps for test and review resolution.
- Update `python/src/agent_work_orders/models.py`
- Add to WorkflowStep enum (see the sketch at the end of this step):
  - `RESOLVE_TEST = "resolve_test"` - Test failure resolution step
  - `RESOLVE_REVIEW = "resolve_review"` - Review issue resolution step
- Update `StepHistory.get_current_step()` to include new steps in sequence:
  - Updated sequence: CLASSIFY → PLAN → FIND_PLAN → GENERATE_BRANCH → IMPLEMENT → COMMIT → TEST → RESOLVE_TEST (if needed) → REVIEW → RESOLVE_REVIEW (if needed) → CREATE_PR
- Write unit tests for updated step sequence logic
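For orientation, the extended enum might look roughly like this; only the two RESOLVE_* members are specified by this plan, and the other member values are assumed from the existing model.

```python
# Illustrative sketch - existing member values are assumed; only the two
# RESOLVE_* additions are specified by this plan.
from enum import Enum


class WorkflowStep(str, Enum):
    CLASSIFY = "classify"
    PLAN = "plan"
    FIND_PLAN = "find_plan"
    GENERATE_BRANCH = "generate_branch"
    IMPLEMENT = "implement"
    COMMIT = "commit"
    TEST = "test"
    RESOLVE_TEST = "resolve_test"      # new: test failure resolution step
    REVIEW = "review"
    RESOLVE_REVIEW = "resolve_review"  # new: review issue resolution step
    CREATE_PR = "create_pr"
```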
Step 7: Create Test Runner Command
Build Claude command to execute test suite and return structured JSON results.
- Update `python/.claude/commands/agent-work-orders/test_runner.md`
- Command should:
  - Execute backend tests: `cd python && uv run pytest tests/ -v --tb=short`
  - Execute frontend tests: `cd archon-ui-main && npm test`
  - Parse test results from output
  - Return JSON array with structure:

    ```
    [
      {
        "test_name": "string",
        "test_file": "string",
        "passed": boolean,
        "error": "optional string",
        "execution_command": "string"
      }
    ]
    ```

  - Include test purpose and reproduction command
  - Sort failed tests first
  - Handle timeout and command errors gracefully
- Test the command manually with sample repositories
Step 8: Create Resolve Failed Test Command
Build Claude command to analyze and fix failed tests given test JSON.
- Create `python/.claude/commands/agent-work-orders/resolve_failed_test.md`
- Command takes a single argument: test result JSON object
- Command should:
- Parse test failure information
- Analyze root cause of failure
- Read relevant test file and code under test
- Implement fix (code change or test update)
- Re-run the specific failed test to verify fix
- Report success/failure
- Include examples of common test failure patterns
- Add constraints (don't skip tests, maintain test coverage)
- Test the command with sample failed test JSONs
Step 9: Implement Test Workflow Module
Create the test workflow module with automatic resolution and retry logic.
- Create `python/src/agent_work_orders/workflow_engine/test_workflow.py`
- Implement functions:
  - `run_tests(executor, command_loader, work_order_id, working_dir) -> StepExecutionResult` - Execute test suite
  - `parse_test_results(output, logger) -> Tuple[List[TestResult], int, int]` - Parse JSON output
  - `resolve_failed_test(executor, command_loader, test_json, work_order_id, working_dir) -> StepExecutionResult` - Fix single test
  - `run_tests_with_resolution(executor, command_loader, work_order_id, working_dir, max_attempts=4) -> Tuple[List[TestResult], int, int]` - Main retry loop
- Implement retry logic (modeled in the sketch at the end of this step):
  - Run tests, check for failures
  - If failures exist and attempts < max_attempts: resolve each failed test
  - Re-run tests after resolution
  - Stop if all tests pass or max attempts reached
- Add TestResult model to models.py
- Write comprehensive unit tests for test workflow
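A simplified model of the retry loop described above: the real functions take the executor/command_loader arguments listed earlier, but here the run and resolve steps are injected as callables and test results are plain dicts so the control flow can be read in isolation.

```python
# Simplified model of the retry loop (not the real module): test results are
# dicts with a "passed" key, and the run/resolve steps are injected callables.
from typing import Awaitable, Callable, Dict, List, Tuple


async def run_tests_with_resolution(
    run_tests: Callable[[], Awaitable[List[Dict]]],
    resolve_failed_test: Callable[[Dict], Awaitable[None]],
    max_attempts: int = 4,
) -> Tuple[List[Dict], int, int]:
    tests: List[Dict] = []
    passed = failed = 0
    for attempt in range(1, max_attempts + 1):
        tests = await run_tests()
        passed = sum(1 for t in tests if t["passed"])
        failed = len(tests) - passed

        if failed == 0 or attempt == max_attempts:
            # Either everything passes, or we are out of attempts
            return tests, passed, failed

        # Fix each failing test, then loop back and re-run the whole suite
        for test in tests:
            if not test["passed"]:
                await resolve_failed_test(test)

    return tests, passed, failed
```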
Step 10: Add Test Workflow Operation
Create atomic operation for test execution in workflow_operations.py.
- Update `python/src/agent_work_orders/workflow_engine/workflow_operations.py`
- Add function:

  ```python
  async def execute_tests(
      executor: AgentCLIExecutor,
      command_loader: ClaudeCommandLoader,
      work_order_id: str,
      working_dir: str,
  ) -> StepExecutionResult
  ```

- Function should:
  - Call `run_tests_with_resolution()` from test_workflow.py
  - Return StepExecutionResult with test summary
  - Include pass/fail counts in output
  - Log detailed test results
- Add TESTER constant to agent_names.py
- Write unit tests for execute_tests operation
Step 11: Integrate Test Phase in Orchestrator
Add test phase to workflow orchestrator between COMMIT and CREATE_PR steps.
- Update `python/src/agent_work_orders/workflow_engine/workflow_orchestrator.py`
- After commit step (line ~236), add:

  ```python
  # Step 7: Run tests with resolution
  test_result = await workflow_operations.execute_tests(
      self.agent_executor,
      self.command_loader,
      agent_work_order_id,
      sandbox.working_dir,
  )
  step_history.steps.append(test_result)
  await self.state_repository.save_step_history(agent_work_order_id, step_history)

  if not test_result.success:
      raise WorkflowExecutionError(f"Tests failed: {test_result.error_message}")

  bound_logger.info("step_completed", step="test")
  ```

- Update step numbering (PR creation becomes step 8)
- Add test failure handling strategy
- Write integration tests for full workflow with test phase
Step 12: Create Review Runner Command
Build Claude command to review implementation against spec with screenshot capture.
- Create `python/.claude/commands/agent-work-orders/review_runner.md`
- Command takes arguments: spec_file_path, work_order_id
- Command should:
  - Read specification from spec_file_path
  - Analyze implementation in codebase
  - Start application (if UI component)
  - Capture screenshots of key UI flows
  - Compare implementation against spec requirements
  - Categorize issues by severity: "blocker" | "tech_debt" | "skippable"
  - Return JSON with structure:

    ```
    {
      "review_passed": boolean,
      "review_issues": [
        {
          "issue_title": "string",
          "issue_description": "string",
          "issue_severity": "blocker|tech_debt|skippable",
          "affected_files": ["string"],
          "screenshots": ["string"]
        }
      ],
      "screenshots": ["string"]
    }
    ```
- Include review criteria and severity definitions
- Test command with sample specifications
Step 13: Create Resolve Failed Review Command
Build Claude command to patch blocker issues from review.
- Create `python/.claude/commands/agent-work-orders/resolve_failed_review.md`
- Command takes a single argument: review issue JSON object
- Command should:
- Parse review issue details
- Create patch plan addressing the issue
- Implement the patch (code changes)
- Verify patch resolves the issue
- Report success/failure
- Include constraints (only fix blocker issues, maintain functionality)
- Add examples of common review issue patterns
- Test command with sample review issues
Step 14: Implement Review Workflow Module
Create the review workflow module with automatic blocker patching.
- Create `python/src/agent_work_orders/workflow_engine/review_workflow.py`
- Implement functions:
  - `run_review(executor, command_loader, spec_file, work_order_id, working_dir) -> ReviewResult` - Execute review
  - `parse_review_results(output, logger) -> ReviewResult` - Parse JSON output
  - `resolve_review_issue(executor, command_loader, issue_json, work_order_id, working_dir) -> StepExecutionResult` - Patch single issue
  - `run_review_with_resolution(executor, command_loader, spec_file, work_order_id, working_dir, max_attempts=3) -> ReviewResult` - Main retry loop
- Implement retry logic (modeled in the sketch at the end of this step):
  - Run review, check for blocker issues
  - If blockers exist and attempts < max_attempts: resolve each blocker
  - Re-run review after patching
  - Stop if no blockers or max attempts reached
  - Allow tech_debt and skippable issues to pass
- Add ReviewResult and ReviewIssue models to models.py
- Write comprehensive unit tests for review workflow
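A simplified model of the review retry loop, mirroring the test loop above; issues are plain dicts with an `issue_severity` field, and only blockers trigger resolution or failure, so tech_debt and skippable issues never block the review.

```python
# Simplified model of the review retry loop (not the real module): only
# "blocker" issues are resolved or cause failure; other severities pass.
from typing import Awaitable, Callable, Dict, List, Tuple


async def run_review_with_resolution(
    run_review: Callable[[], Awaitable[List[Dict]]],
    resolve_review_issue: Callable[[Dict], Awaitable[None]],
    max_attempts: int = 3,
) -> Tuple[bool, List[Dict]]:
    issues: List[Dict] = []
    for attempt in range(1, max_attempts + 1):
        issues = await run_review()
        blockers = [i for i in issues if i["issue_severity"] == "blocker"]

        if not blockers:
            # tech_debt and skippable issues may remain - the review still passes
            return True, issues

        if attempt == max_attempts:
            # Blockers remain after the final attempt - the review fails
            return False, issues

        # Patch each blocker, then loop back and re-run the review
        for issue in blockers:
            await resolve_review_issue(issue)

    return False, issues
```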
Step 15: Add Review Workflow Operation
Create atomic operation for review execution in workflow_operations.py.
- Update `python/src/agent_work_orders/workflow_engine/workflow_operations.py`
- Add function:

  ```python
  async def execute_review(
      executor: AgentCLIExecutor,
      command_loader: ClaudeCommandLoader,
      spec_file: str,
      work_order_id: str,
      working_dir: str,
  ) -> StepExecutionResult
  ```

- Function should:
  - Call `run_review_with_resolution()` from review_workflow.py
  - Return StepExecutionResult with review summary
  - Include blocker count in output
  - Log detailed review results
- Add REVIEWER constant to agent_names.py
- Write unit tests for execute_review operation
Step 16: Integrate Review Phase in Orchestrator
Add review phase to workflow orchestrator between TEST and CREATE_PR steps.
- Update `python/src/agent_work_orders/workflow_engine/workflow_orchestrator.py`
- After the test step, add:

  ```python
  # Step 8: Run review with resolution
  review_result = await workflow_operations.execute_review(
      self.agent_executor,
      self.command_loader,
      plan_file or "",
      agent_work_order_id,
      sandbox.working_dir,
  )
  step_history.steps.append(review_result)
  await self.state_repository.save_step_history(agent_work_order_id, step_history)

  if not review_result.success:
      raise WorkflowExecutionError(f"Review failed: {review_result.error_message}")

  bound_logger.info("step_completed", step="review")
  ```

- Update step numbering (PR creation becomes step 9)
- Add review failure handling strategy
- Write integration tests for full workflow with review phase
Step 17: Refactor Orchestrator for Composition
Refactor workflow orchestrator to support modular composition.
- Update `python/src/agent_work_orders/workflow_engine/workflow_orchestrator.py`
- Extract workflow phases into separate methods:
  - `_execute_planning_phase()` - classify → plan → find_plan → generate_branch
  - `_execute_implementation_phase()` - implement → commit
  - `_execute_testing_phase()` - test → resolve_test (if needed)
  - `_execute_review_phase()` - review → resolve_review (if needed)
  - `_execute_deployment_phase()` - create_pr
- Update `execute_workflow()` to compose phases (a config-gated sketch follows this step):

  ```python
  await self._execute_planning_phase(...)
  await self._execute_implementation_phase(...)
  await self._execute_testing_phase(...)
  await self._execute_review_phase(...)
  await self._execute_deployment_phase(...)
  ```

- Add phase-level error handling and recovery
- Support skipping phases via configuration
- Write unit tests for each phase method
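A sketch of how the composed `execute_workflow()` could gate the optional phases on the flags added in Step 18; the config attribute names follow that step, while the phase method bodies are elided.

```python
# Illustrative sketch - phase composition with config-gated test/review phases.
# config.ENABLE_TEST_PHASE / ENABLE_REVIEW_PHASE come from Step 18; the
# phase methods are defined elsewhere in the orchestrator.
async def execute_workflow(self, agent_work_order_id: str) -> None:
    await self._execute_planning_phase(agent_work_order_id)
    await self._execute_implementation_phase(agent_work_order_id)

    if self.config.ENABLE_TEST_PHASE:
        # test → resolve_test retry loop (max MAX_TEST_RETRY_ATTEMPTS)
        await self._execute_testing_phase(agent_work_order_id)

    if self.config.ENABLE_REVIEW_PHASE:
        # review → resolve_review retry loop (max MAX_REVIEW_RETRY_ATTEMPTS)
        await self._execute_review_phase(agent_work_order_id)

    await self._execute_deployment_phase(agent_work_order_id)
```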
Step 18: Add Configuration for New Features
Add configuration options for worktrees, ports, and new workflow phases.
- Update `python/src/agent_work_orders/config.py`
- Add configuration:

  ```python
  # Worktree configuration
  WORKTREE_BASE_DIR: str = os.getenv("WORKTREE_BASE_DIR", "trees")

  # Port allocation
  BACKEND_PORT_RANGE_START: int = int(os.getenv("BACKEND_PORT_START", "9100"))
  BACKEND_PORT_RANGE_END: int = int(os.getenv("BACKEND_PORT_END", "9114"))
  FRONTEND_PORT_RANGE_START: int = int(os.getenv("FRONTEND_PORT_START", "9200"))
  FRONTEND_PORT_RANGE_END: int = int(os.getenv("FRONTEND_PORT_END", "9214"))

  # Test workflow
  MAX_TEST_RETRY_ATTEMPTS: int = int(os.getenv("MAX_TEST_RETRY_ATTEMPTS", "4"))
  ENABLE_TEST_PHASE: bool = os.getenv("ENABLE_TEST_PHASE", "true").lower() == "true"

  # Review workflow
  MAX_REVIEW_RETRY_ATTEMPTS: int = int(os.getenv("MAX_REVIEW_RETRY_ATTEMPTS", "3"))
  ENABLE_REVIEW_PHASE: bool = os.getenv("ENABLE_REVIEW_PHASE", "true").lower() == "true"
  ENABLE_SCREENSHOT_CAPTURE: bool = os.getenv("ENABLE_SCREENSHOT_CAPTURE", "true").lower() == "true"

  # State management
  STATE_STORAGE_TYPE: str = os.getenv("STATE_STORAGE_TYPE", "memory")  # "memory" or "file"
  FILE_STATE_DIRECTORY: str = os.getenv("FILE_STATE_DIRECTORY", "agent-work-orders-state")
  ```

- Update `.env.example` with new configuration options
- Document configuration in README
Step 19: Create Documentation
Document the new compositional architecture and workflows.
- Create `docs/compositional-workflows.md`:
  - Architecture overview
  - Compositional design principles
  - Phase composition examples
  - Error handling and recovery
  - Configuration guide
- Create `docs/worktree-management.md`:
  - Worktree vs temporary clone comparison
  - Parallelization capabilities
  - Port allocation system
  - Cleanup and maintenance
- Create `docs/test-resolution.md`:
  - Test workflow overview
  - Retry logic explanation
  - Test resolution examples
  - Troubleshooting failed tests
- Create `docs/review-resolution.md`:
  - Review workflow overview
  - Screenshot capture setup
  - Issue severity definitions
  - Blocker patching process
  - R2 upload configuration
Step 20: Run Validation Commands
Execute all validation commands to ensure the feature works correctly with zero regressions.
- Run backend tests: `cd python && uv run pytest tests/agent_work_orders/ -v`
- Run backend linting: `cd python && uv run ruff check src/agent_work_orders/`
- Run type checking: `cd python && uv run mypy src/agent_work_orders/`
- Test worktree creation manually:

  ```bash
  cd python
  python -c "
  from src.agent_work_orders.utils.worktree_operations import create_worktree
  from src.agent_work_orders.utils.structured_logger import get_logger
  logger = get_logger('test')
  path, err = create_worktree('test-wo-123', 'test-branch', logger)
  print(f'Path: {path}, Error: {err}')
  "
  ```

- Test port allocation:

  ```bash
  cd python
  python -c "
  from src.agent_work_orders.utils.port_allocation import get_ports_for_work_order
  backend, frontend = get_ports_for_work_order('test-wo-123')
  print(f'Backend: {backend}, Frontend: {frontend}')
  "
  ```

- Create test work order with new workflow:

  ```bash
  curl -X POST http://localhost:8181/agent-work-orders \
    -H "Content-Type: application/json" \
    -d '{
      "repository_url": "https://github.com/your-test-repo",
      "sandbox_type": "git_worktree",
      "workflow_type": "agent_workflow_plan",
      "user_request": "Add a new feature with tests"
    }'
  ```

- Verify worktree created under `trees/<work_order_id>/`
- Verify `.ports.env` created in worktree
- Monitor workflow execution through all phases
- Verify test phase runs and resolves failures
- Verify review phase runs and patches blockers
- Verify PR created successfully
- Clean up test worktrees: `git worktree prune`
Testing Strategy
Unit Tests
Worktree Management:
- Test worktree creation with valid repository
- Test worktree creation with invalid branch
- Test worktree validation (three-way check)
- Test worktree cleanup
- Test handling of existing worktrees
Port Allocation:
- Test deterministic port assignment from work order ID
- Test port availability checking
- Test finding next available ports with collision
- Test port range boundaries (9100-9114, 9200-9214)
- Test `.ports.env` file generation
Test Workflow:
- Test parsing valid test result JSON
- Test parsing malformed test result JSON
- Test retry loop with all tests passing
- Test retry loop with some tests failing then passing
- Test retry loop reaching max attempts
- Test individual test resolution
Review Workflow:
- Test parsing valid review result JSON
- Test parsing malformed review result JSON
- Test retry loop with no blocker issues
- Test retry loop with blockers then resolved
- Test retry loop reaching max attempts
- Test issue severity filtering
State Management:
- Test saving state to JSON file
- Test loading state from JSON file
- Test updating specific state fields
- Test handling missing state files
- Test concurrent state access
Integration Tests
End-to-End Workflow:
- Test complete workflow with worktree sandbox: classify → plan → implement → commit → test → review → PR
- Test test phase with intentional test failure and resolution
- Test review phase with intentional blocker issue and patching
- Test parallel execution of multiple work orders with different ports
- Test workflow resumption after failure
- Test cleanup of worktrees after completion
Sandbox Integration:
- Test command execution in worktree context
- Test git operations in worktree
- Test branch creation in worktree
- Test worktree isolation (parallel instances don't interfere)
State Persistence:
- Test state survives service restart (file-based)
- Test state migration from memory to file
- Test state corruption recovery
Edge Cases
Worktree Edge Cases:
- Worktree already exists (should reuse or fail gracefully)
- Git repository unreachable (should fail setup)
- Insufficient disk space for worktree (should fail with clear error)
- Worktree removal fails (should log error and continue)
- Maximum worktrees reached (15 concurrent) - should queue or fail
Port Allocation Edge Cases:
- All ports in range occupied (should fail with error)
- Port becomes occupied between allocation and use (should retry)
- Invalid port range in configuration (should fail validation)
Test Workflow Edge Cases:
- Test command times out (should mark as failed)
- Test command returns invalid JSON (should fail gracefully)
- All tests fail and none can be resolved (should fail after max attempts)
- Test resolution introduces new failures (should continue with retry loop)
Review Workflow Edge Cases:
- Review command crashes (should fail gracefully)
- Screenshot capture fails (should continue review without screenshots)
- Review finds only skippable issues (should pass)
- Blocker patch introduces new blocker (should continue with retry loop)
- Spec file not found (should fail with clear error)
State Management Edge Cases:
- State file corrupted (should fail with recovery suggestion)
- State directory not writable (should fail with permission error)
- Concurrent access to same state file (should handle with locking or fail safely)
Acceptance Criteria
- GitWorktreeSandbox successfully creates and manages worktrees under `trees/<work_order_id>/`
- Port allocation deterministically assigns unique ports (backend: 9100-9114, frontend: 9200-9214) based on work order ID
- Multiple work orders (at least 3) can run in parallel without port or filesystem conflicts
- `.ports.env` file is created in each worktree with correct port configuration
- Test workflow successfully runs test suite and returns structured JSON results
- Test workflow automatically resolves failed tests up to 4 attempts
- Test workflow stops retrying when all tests pass
- Review workflow successfully reviews implementation against spec
- Review workflow captures screenshots (when enabled)
- Review workflow categorizes issues by severity (blocker/tech_debt/skippable)
- Review workflow automatically patches blocker issues up to 3 attempts
- Review workflow allows tech_debt and skippable issues to pass
- WorkflowStep enum includes TEST, RESOLVE_TEST, REVIEW, RESOLVE_REVIEW steps
- Workflow orchestrator executes all phases: planning → implementation → testing → review → deployment
- File-based state repository persists state to JSON files
- State survives service restarts when using file-based storage
- Configuration supports enabling/disabling test and review phases
- All existing tests pass with zero regressions
- New unit tests achieve >80% code coverage for new modules
- Integration tests verify end-to-end workflow with parallel execution
- Documentation covers compositional architecture, worktrees, test resolution, and review resolution
- Cleanup of worktrees works correctly (git worktree remove + prune)
- Error messages are clear and actionable for all failure scenarios
Validation Commands
Execute every command to validate the feature works correctly with zero regressions.
Backend Tests
- `cd python && uv run pytest tests/agent_work_orders/ -v --tb=short` - Run all agent work orders tests
- `cd python && uv run pytest tests/agent_work_orders/sandbox_manager/ -v` - Test sandbox management
- `cd python && uv run pytest tests/agent_work_orders/workflow_engine/ -v` - Test workflow engine
- `cd python && uv run pytest tests/agent_work_orders/utils/ -v` - Test utilities
Code Quality
- `cd python && uv run ruff check src/agent_work_orders/` - Check code quality
- `cd python && uv run mypy src/agent_work_orders/` - Type checking
Manual Worktree Testing
```bash
# Test worktree creation
cd python
python -c "
from src.agent_work_orders.utils.worktree_operations import create_worktree, validate_worktree, remove_worktree
from src.agent_work_orders.utils.structured_logger import get_logger
logger = get_logger('test')

# Create worktree
path, err = create_worktree('test-wo-123', 'test-branch', logger)
print(f'Created worktree at: {path}')
assert err is None, f'Error: {err}'

# Validate worktree
from src.agent_work_orders.state_manager.file_state_repository import FileStateRepository
state_repo = FileStateRepository('test-state')
state_data = {'worktree_path': path}
valid, err = validate_worktree('test-wo-123', state_data)
assert valid, f'Validation failed: {err}'

# Remove worktree
success, err = remove_worktree('test-wo-123', logger)
assert success, f'Removal failed: {err}'
print('Worktree lifecycle test passed!')
"
```
Manual Port Allocation Testing
```bash
cd python
python -c "
from src.agent_work_orders.utils.port_allocation import get_ports_for_work_order, find_next_available_ports, is_port_available

backend, frontend = get_ports_for_work_order('test-wo-123')
print(f'Ports for test-wo-123: Backend={backend}, Frontend={frontend}')
assert 9100 <= backend <= 9114, f'Backend port out of range: {backend}'
assert 9200 <= frontend <= 9214, f'Frontend port out of range: {frontend}'

# Test availability check
available = is_port_available(backend)
print(f'Backend port {backend} available: {available}')

# Test finding next available
next_backend, next_frontend = find_next_available_ports('test-wo-456')
print(f'Next available ports: Backend={next_backend}, Frontend={next_frontend}')
print('Port allocation test passed!')
"
```
Integration Testing
```bash
# Start agent work orders service
docker compose up -d archon-server

# Create work order with worktree sandbox
curl -X POST http://localhost:8181/agent-work-orders \
  -H "Content-Type: application/json" \
  -d '{
    "repository_url": "https://github.com/coleam00/archon",
    "sandbox_type": "git_worktree",
    "workflow_type": "agent_workflow_plan",
    "user_request": "Fix issue #123"
  }'

# Verify worktree created
ls -la trees/

# Monitor workflow progress
watch -n 2 'curl -s http://localhost:8181/agent-work-orders | jq'

# Verify .ports.env in worktree
cat trees/<work_order_id>/.ports.env

# After completion, verify cleanup
git worktree list
```
Parallel Execution Testing
```bash
# Create 3 work orders simultaneously
for i in 1 2 3; do
  curl -X POST http://localhost:8181/agent-work-orders \
    -H "Content-Type: application/json" \
    -d "{
      \"repository_url\": \"https://github.com/coleam00/archon\",
      \"sandbox_type\": \"git_worktree\",
      \"workflow_type\": \"agent_workflow_plan\",
      \"user_request\": \"Parallel test $i\"
    }" &
done
wait

# Verify all worktrees exist
ls -la trees/

# Verify different ports allocated
for dir in trees/*/; do
  echo "Worktree: $dir"
  cat "$dir/.ports.env"
  echo "---"
done
```
Notes
Architecture Decision: Compositional vs Centralized
This feature implements Option B (compositional refactoring) because:
- Scalability: Compositional design enables running individual phases (e.g., just test or just review) without full workflow
- Debugging: Independent scripts are easier to test and debug in isolation
- Flexibility: Users can compose custom workflows (e.g., skip review for simple PRs)
- Maintainability: Smaller, focused modules are easier to maintain than monolithic orchestrator
- Parallelization: Worktree-based approach inherently supports compositional execution
Performance Considerations
- Worktree Creation: Worktrees are faster than clones (~2-3x) because they share the same .git directory
- Port Allocation: Hash-based allocation is deterministic but may have collisions; fallback to linear search adds minimal overhead
- Retry Loops: Test (4 attempts) and review (3 attempts) retry limits prevent infinite loops while allowing reasonable resolution attempts
- State I/O: File-based state adds disk I/O but enables persistence; consider eventual move to database for high-volume deployments
Future Enhancements
- Database State: Replace file-based state with PostgreSQL/Supabase for better concurrent access and querying
- WebSocket Updates: Stream test/review progress to UI in real-time
- Screenshot Upload: Integrate R2/S3 for screenshot storage and PR comments with images
- Workflow Resumption: Support resuming failed workflows from last successful step
- Custom Workflows: Allow users to define custom workflow compositions via config
- Metrics: Add OpenTelemetry instrumentation for workflow performance monitoring
- E2E Testing: Add Playwright/Cypress integration for UI-focused review
- Distributed Execution: Support running work orders across multiple machines
Migration Path
For existing deployments:
- Backward Compatibility: Keep GitBranchSandbox working alongside GitWorktreeSandbox
- Gradual Migration: Default to GIT_BRANCH, opt-in to GIT_WORKTREE via configuration
- State Migration: Provide utility to migrate in-memory state to file-based state
- Cleanup: Add command to clean up old temporary clones: `rm -rf /tmp/agent-work-orders/*`
Dependencies
New dependencies to add via uv add:
- (None required - uses existing git, pytest, claude CLI)
Related Issues/PRs
- #XXX - Original agent-work-orders MVP implementation
- #XXX - Worktree isolation discussion
- #XXX - Test phase feature request
- #XXX - Review automation proposal