archon/PRPs/specs/compositional-workflow-architecture.md
Rasmus Widing 1c0020946b feat: Implement phases 3-5 of compositional workflow architecture
Completes the implementation of test/review workflows with automatic resolution
and integrates them into the orchestrator.

**Phase 3: Test Workflow with Resolution**
- Created test_workflow.py with automatic test failure resolution
- Implements retry loop with max 4 attempts (configurable via MAX_TEST_RETRY_ATTEMPTS)
- Parses JSON test results and resolves failures one by one
- Uses existing test.md and resolve_failed_test.md commands
- Added run_tests() and resolve_test_failure() to workflow_operations.py

**Phase 4: Review Workflow with Resolution**
- Created review_workflow.py with automatic blocker issue resolution
- Implements retry loop with max 3 attempts (configurable via MAX_REVIEW_RETRY_ATTEMPTS)
- Categorizes issues by severity (blocker/tech_debt/skippable)
- Only blocks on blocker issues - tech_debt and skippable allowed to pass
- Created review_runner.md and resolve_failed_review.md commands
- Added run_review() and resolve_review_issue() to workflow_operations.py
- Supports screenshot capture for UI review (configurable via ENABLE_SCREENSHOT_CAPTURE)

**Phase 5: Compositional Integration**
- Updated workflow_orchestrator.py to integrate test and review phases
- Test phase runs between commit and PR creation (if ENABLE_TEST_PHASE=true)
- Review phase runs after tests (if ENABLE_REVIEW_PHASE=true)
- Both phases are optional and controlled by config flags
- Step history tracks test and review execution results
- Proper error handling and logging for all phases

**Supporting Changes**
- Updated agent_names.py to add REVIEWER constant
- Added configuration flags to config.py for test/review phases
- All new code follows structured logging patterns
- Maintains compatibility with existing workflow steps

**Files Changed**: 19 files, 3035+ lines
- New: test_workflow.py, review_workflow.py, review commands
- Modified: orchestrator, workflow_operations, agent_names, config
- Phases 1-2 files (worktree, state, port allocation) also staged

The implementation is complete and ready for testing. All phases now support
parallel execution via worktree isolation with deterministic port allocation.
2025-10-16 19:18:03 +03:00


# Feature: Compositional Workflow Architecture with Worktree Isolation, Test Resolution, and Review Resolution
## Feature Description
Transform the agent-work-orders system from a centralized orchestrator pattern to a compositional, script-based architecture that enables parallel execution through git worktrees, automatic test failure resolution with retry logic, and a comprehensive review phase with blocker issue patching. This change enables running up to 15 work orders simultaneously (one per slot in each 15-port range) in isolated worktrees with deterministic port allocation, while maintaining complete SDLC coverage from planning through testing and review.
The system will support:
- **Worktree-based isolation**: Each work order runs in its own git worktree under `trees/<work_order_id>/` instead of temporary clones
- **Port allocation**: Deterministic backend (9100-9114) and frontend (9200-9214) port assignment based on work order ID
- **Test phase with resolution**: Automatic retry loop (max 4 attempts) that resolves failed tests using AI-powered fixes
- **Review phase with resolution**: Captures screenshots, compares implementation vs spec, categorizes issues (blocker/tech_debt/skippable), and automatically patches blocker issues (max 3 attempts)
- **File-based state**: Simple JSON state management (`adw_state.json`) instead of in-memory repository
- **Compositional scripts**: Independent workflow scripts (plan, build, test, review, doc, ship) that can be run separately or together
## User Story
As a developer managing multiple concurrent features
I want to run multiple agent work orders in parallel with isolated environments
So that I can scale development velocity without conflicts or resource contention, while ensuring all code passes tests and review before deployment
## Problem Statement
The current agent-work-orders architecture has several critical limitations:
1. **No Parallelization**: GitBranchSandbox creates temporary clones that get cleaned up, preventing safe parallel execution of multiple work orders
2. **No Test Coverage**: Missing test workflow step - implementations are committed and PR'd without validation
3. **No Automated Test Resolution**: When tests fail, there's no retry/fix mechanism to automatically resolve failures
4. **No Review Phase**: No automated review of implementation against specifications with screenshot capture and blocker detection
5. **Centralized Orchestration**: Monolithic orchestrator makes it difficult to run individual phases (e.g., just test, just review) independently
6. **In-Memory State**: State management in WorkOrderRepository is not persistent across service restarts
7. **No Port Management**: No system for allocating unique ports for parallel instances
These limitations prevent scaling development workflows and ensuring code quality before PRs are created.
## Solution Statement
Implement a compositional workflow architecture inspired by the ADW (AI Developer Workflow) pattern with the following components. Read the examples under `PRPs/examples/*` before starting.
1. **GitWorktreeSandbox**: Replace GitBranchSandbox with worktree-based isolation that shares the same repo but has independent working directories
2. **Port Allocation System**: Deterministic port assignment (backend: 9100-9114, frontend: 9200-9214) based on work order ID hash
3. **File-Based State Management**: JSON state files for persistence and debugging
4. **Test Workflow Module**: New `test_workflow.py` with automatic resolution and retry logic (4 attempts)
5. **Review Workflow Module**: New `review_workflow.py` with screenshot capture, spec comparison, and blocker patching (3 attempts)
6. **Compositional Scripts**: Independent workflow operations that can be composed or run individually
7. **Enhanced WorkflowStep Enum**: Add RESOLVE_TEST and RESOLVE_REVIEW steps alongside the existing TEST and REVIEW steps
8. **Resolution Commands**: New Claude commands `/resolve_failed_test` and `/resolve_failed_review` for AI-powered fixes
## Relevant Files
### Core Workflow Files
- `python/src/agent_work_orders/workflow_engine/workflow_orchestrator.py` - Main orchestrator that needs refactoring for compositional approach
  - Currently: Monolithic execute_workflow with sequential steps
  - Needs: Modular workflow composition with test/review phases
- `python/src/agent_work_orders/workflow_engine/workflow_operations.py` - Atomic workflow operations
  - Currently: classify_issue, build_plan, implement_plan, create_commit, create_pull_request
  - Needs: Add test_workflow, review_workflow, resolve_test, resolve_review operations
- `python/src/agent_work_orders/models.py` - Data models including WorkflowStep enum
  - Currently: WorkflowStep has CLASSIFY, PLAN, IMPLEMENT, COMMIT, REVIEW, TEST, CREATE_PR
  - Needs: Add RESOLVE_TEST, RESOLVE_REVIEW steps
### Sandbox Management Files
- `python/src/agent_work_orders/sandbox_manager/git_branch_sandbox.py` - Current temp clone implementation
  - Problem: Creates temp dirs, no parallelization support
  - Will be replaced by: GitWorktreeSandbox
- `python/src/agent_work_orders/sandbox_manager/sandbox_factory.py` - Factory for creating sandboxes
  - Needs: Add GitWorktreeSandbox creation logic
- `python/src/agent_work_orders/sandbox_manager/sandbox_protocol.py` - Sandbox interface
  - May need: Port allocation methods
### State Management Files
- `python/src/agent_work_orders/state_manager/work_order_repository.py` - Current in-memory state
  - Currently: In-memory dictionary with async methods
  - Needs: File-based JSON persistence option
- `python/src/agent_work_orders/config.py` - Configuration
  - Needs: Port range configuration, worktree base directory
### Command Files
- `python/.claude/commands/agent-work-orders/test.md` - Currently just a hello world test
  - Needs: Comprehensive test suite runner that returns JSON with failed tests
- `python/.claude/commands/agent-work-orders/implementor.md` - Implementation command
  - May need: Context about test requirements
### New Files
#### Worktree Management
- `python/src/agent_work_orders/sandbox_manager/git_worktree_sandbox.py` - New worktree-based sandbox
- `python/src/agent_work_orders/utils/worktree_operations.py` - Worktree CRUD operations
- `python/src/agent_work_orders/utils/port_allocation.py` - Port management utilities
#### Test Workflow
- `python/src/agent_work_orders/workflow_engine/test_workflow.py` - Test execution with resolution
- `python/.claude/commands/agent-work-orders/test_runner.md` - Run test suite, return JSON
- `python/.claude/commands/agent-work-orders/resolve_failed_test.md` - Fix failed test given JSON
#### Review Workflow
- `python/src/agent_work_orders/workflow_engine/review_workflow.py` - Review with screenshot capture
- `python/.claude/commands/agent-work-orders/review_runner.md` - Run review against spec
- `python/.claude/commands/agent-work-orders/resolve_failed_review.md` - Patch blocker issues
- `python/.claude/commands/agent-work-orders/create_patch_plan.md` - Generate patch plan for issue
#### State Management
- `python/src/agent_work_orders/state_manager/file_state_repository.py` - JSON file-based state
- `python/src/agent_work_orders/models/workflow_state.py` - State data models
#### Documentation
- `docs/compositional-workflows.md` - Architecture documentation
- `docs/worktree-management.md` - Worktree operations guide
- `docs/test-resolution.md` - Test workflow documentation
- `docs/review-resolution.md` - Review workflow documentation
## Implementation Plan
### Phase 1: Foundation - Worktree Isolation and Port Allocation
Establish the core infrastructure for parallel execution through git worktrees and deterministic port allocation. This phase creates the foundation for all subsequent phases.
**Key Deliverables**:
- GitWorktreeSandbox implementation
- Port allocation system
- Worktree management utilities
- `.ports.env` file generation
- Updated sandbox factory
### Phase 2: File-Based State Management
Replace in-memory state repository with file-based JSON persistence for durability and debuggability across service restarts.
**Key Deliverables**:
- FileStateRepository implementation
- WorkflowState models
- State migration utilities
- JSON serialization/deserialization
- Backward compatibility layer
### Phase 3: Test Workflow with Resolution
Implement comprehensive test execution with automatic failure resolution and retry logic.
**Key Deliverables**:
- test_workflow.py module
- test_runner.md command (returns JSON array of test results)
- resolve_failed_test.md command (takes test JSON, fixes issue)
- Retry loop (max 4 attempts)
- Test result parsing and formatting
- Integration with orchestrator
### Phase 4: Review Workflow with Resolution
Add review phase with screenshot capture, spec comparison, and automatic blocker patching.
**Key Deliverables**:
- review_workflow.py module
- review_runner.md command (compares implementation vs spec)
- resolve_failed_review.md command (patches blocker issues)
- Screenshot capture integration
- Issue severity categorization (blocker/tech_debt/skippable)
- Retry loop (max 3 attempts)
- R2 upload integration (optional)
### Phase 5: Compositional Refactoring
Refactor the centralized orchestrator into composable workflow scripts that can be run independently.
**Key Deliverables**:
- Modular workflow composition
- Independent script execution
- Workflow step dependencies
- Enhanced error handling
- Workflow resumption support
## Step by Step Tasks
### Step 1: Create Worktree Sandbox Implementation
Create the core GitWorktreeSandbox class that manages git worktrees for isolated execution.
- Create `python/src/agent_work_orders/sandbox_manager/git_worktree_sandbox.py`
- Implement `GitWorktreeSandbox` class with:
  - `__init__(repository_url, sandbox_identifier)` - Initialize with worktree path calculation
  - `setup()` - Create worktree under `trees/<sandbox_identifier>/` from origin/main
  - `cleanup()` - Remove worktree using `git worktree remove`
  - `execute_command(command, timeout)` - Execute commands in worktree context
  - `get_git_branch_name()` - Query current branch in worktree
- Handle existing worktree detection and validation
- Add logging for all worktree operations
- Write unit tests for GitWorktreeSandbox in `python/tests/agent_work_orders/sandbox_manager/test_git_worktree_sandbox.py`
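
The git commands behind `setup()` and `cleanup()` could be sketched as follows. This is a minimal illustration, not the planned class: the helper names (`worktree_path`, `add_worktree_command`, `remove_worktree_command`, `run_git`) are hypothetical, and the use of `--force` on removal is an assumption about the desired cleanup behavior.

```python
import subprocess
from pathlib import Path


def worktree_path(repo_root: str, sandbox_identifier: str, base_dir: str = "trees") -> Path:
    """Return the worktree location: <repo_root>/trees/<sandbox_identifier>."""
    return Path(repo_root) / base_dir / sandbox_identifier


def add_worktree_command(path: Path, branch_name: str, start_point: str = "origin/main") -> list[str]:
    """Build `git worktree add -b <branch> <path> <start_point>`."""
    return ["git", "worktree", "add", "-b", branch_name, str(path), start_point]


def remove_worktree_command(path: Path) -> list[str]:
    """Build `git worktree remove --force <path>`.

    --force is an assumption: it also removes worktrees with uncommitted changes.
    """
    return ["git", "worktree", "remove", "--force", str(path)]


def run_git(command: list[str], repo_root: str, timeout: int = 60) -> tuple[bool, str]:
    """Run a git command inside repo_root and return (success, stdout or stderr)."""
    result = subprocess.run(
        command, cwd=repo_root, capture_output=True, text=True, timeout=timeout
    )
    if result.returncode != 0:
        return False, result.stderr.strip()
    return True, result.stdout.strip()
```

Keeping command construction separate from execution makes the unit tests listed above straightforward, since the argument lists can be asserted without touching a real repository.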
### Step 2: Implement Port Allocation System
Create deterministic port allocation based on work order ID to enable parallel instances.
- Create `python/src/agent_work_orders/utils/port_allocation.py`
- Implement functions:
  - `get_ports_for_work_order(work_order_id) -> Tuple[int, int]` - Calculate ports from ID hash (backend: 9100-9114, frontend: 9200-9214)
  - `is_port_available(port: int) -> bool` - Check if port is bindable
  - `find_next_available_ports(work_order_id, max_attempts=15) -> Tuple[int, int]` - Find available ports with offset
  - `create_ports_env_file(worktree_path, backend_port, frontend_port)` - Generate `.ports.env` file
- Add port range configuration to `python/src/agent_work_orders/config.py`
- Write unit tests for port allocation in `python/tests/agent_work_orders/utils/test_port_allocation.py`
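
One way the functions above could fit together is sketched below. It assumes SHA-256 as the hash (Python's built-in `hash()` is salted per process for strings, so it would not be deterministic across runs), and it assumes the variable names `BACKEND_PORT` and `FRONTEND_PORT` inside `.ports.env`; neither choice is fixed by this spec.

```python
import hashlib
import socket
from pathlib import Path

BACKEND_PORT_START = 9100
FRONTEND_PORT_START = 9200
PORT_SLOTS = 15  # 9100-9114 and 9200-9214


def slot_for_work_order(work_order_id: str) -> int:
    """Map a work order ID to a stable slot in [0, PORT_SLOTS)."""
    digest = hashlib.sha256(work_order_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % PORT_SLOTS


def get_ports_for_work_order(work_order_id: str) -> tuple[int, int]:
    """Return the (backend, frontend) pair for the work order's slot."""
    slot = slot_for_work_order(work_order_id)
    return BACKEND_PORT_START + slot, FRONTEND_PORT_START + slot


def is_port_available(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if the port can currently be bound on host."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def find_next_available_ports(work_order_id: str, max_attempts: int = PORT_SLOTS) -> tuple[int, int]:
    """Probe slots starting at the hashed one, wrapping around the range."""
    base = slot_for_work_order(work_order_id)
    for offset in range(max_attempts):
        slot = (base + offset) % PORT_SLOTS
        backend, frontend = BACKEND_PORT_START + slot, FRONTEND_PORT_START + slot
        if is_port_available(backend) and is_port_available(frontend):
            return backend, frontend
    raise RuntimeError("No free backend/frontend port pair in the configured ranges")


def create_ports_env_file(worktree_path: str, backend_port: int, frontend_port: int) -> Path:
    """Write .ports.env into the worktree (variable names are assumed)."""
    env_file = Path(worktree_path) / ".ports.env"
    env_file.write_text(f"BACKEND_PORT={backend_port}\nFRONTEND_PORT={frontend_port}\n")
    return env_file
```

Because only 15 slots exist, two different work order IDs can hash to the same slot; the offset probing in `find_next_available_ports` is what keeps parallel instances from colliding in that case.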
### Step 3: Create Worktree Management Utilities
Build helper utilities for worktree CRUD operations.
- Create `python/src/agent_work_orders/utils/worktree_operations.py`
- Implement functions:
  - `create_worktree(work_order_id, branch_name, logger) -> Tuple[str, Optional[str]]` - Create worktree and return path or error
  - `validate_worktree(work_order_id, state) -> Tuple[bool, Optional[str]]` - Three-way validation (state, filesystem, git)
  - `get_worktree_path(work_order_id) -> str` - Calculate absolute worktree path
  - `remove_worktree(work_order_id, logger) -> Tuple[bool, Optional[str]]` - Clean up worktree
  - `setup_worktree_environment(worktree_path, backend_port, frontend_port, logger)` - Create .ports.env
- Handle git fetch operations before worktree creation
- Add comprehensive error handling and logging
- Write unit tests for worktree operations in `python/tests/agent_work_orders/utils/test_worktree_operations.py`
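
The three-way validation (state, filesystem, git) might look like the sketch below. The signature is simplified relative to the one listed above: the git worktree list is passed in rather than looked up by `work_order_id`, and the state key `worktree_path` is an assumption.

```python
import os
import subprocess
from typing import Optional


def parse_worktree_list(porcelain_output: str) -> list[str]:
    """Extract worktree paths from `git worktree list --porcelain` output."""
    prefix = "worktree "
    return [
        line[len(prefix):]
        for line in porcelain_output.splitlines()
        if line.startswith(prefix)
    ]


def list_git_worktrees(repo_root: str) -> list[str]:
    """Ask git which worktrees are registered for the repository."""
    result = subprocess.run(
        ["git", "worktree", "list", "--porcelain"],
        cwd=repo_root, capture_output=True, text=True, check=True,
    )
    return parse_worktree_list(result.stdout)


def validate_worktree(state: dict, known_worktrees: list[str]) -> tuple[bool, Optional[str]]:
    """Check state, then filesystem, then git; return (valid, error)."""
    path = state.get("worktree_path")  # hypothetical state key
    if not path:
        return False, "No worktree_path recorded in state"
    if not os.path.isdir(path):
        return False, f"Worktree directory not found: {path}"
    registered = {os.path.realpath(p) for p in known_worktrees}
    if os.path.realpath(path) not in registered:
        return False, f"Worktree not registered with git: {path}"
    return True, None
```

Checking in this order gives the most specific error first: a missing state entry is reported before the filesystem or git is consulted.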
### Step 4: Update Sandbox Factory
Modify the sandbox factory to support creating GitWorktreeSandbox instances.
- Update `python/src/agent_work_orders/sandbox_manager/sandbox_factory.py`
- Add GIT_WORKTREE case to `create_sandbox()` method
- Integrate port allocation during sandbox creation
- Pass port configuration to GitWorktreeSandbox
- Update SandboxType enum in models.py to promote GIT_WORKTREE from placeholder
- Write integration tests for sandbox factory with worktrees
### Step 5: Implement File-Based State Repository
Create file-based state management for persistence and debugging.
- Create `python/src/agent_work_orders/state_manager/file_state_repository.py`
- Implement `FileStateRepository` class:
  - `__init__(state_directory: str)` - Initialize with state directory path
  - `save_state(work_order_id, state_data)` - Write JSON to `<state_dir>/<work_order_id>.json`
  - `load_state(work_order_id) -> Optional[dict]` - Read JSON from file
  - `list_states() -> List[str]` - List all work order IDs with state files
  - `delete_state(work_order_id)` - Remove state file
  - `update_status(work_order_id, status, **kwargs)` - Update specific fields
  - `save_step_history(work_order_id, step_history)` - Persist step history
- Add state directory configuration to config.py
- Create state models in `python/src/agent_work_orders/models/workflow_state.py`
- Write unit tests for file state repository
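
A minimal, synchronous sketch of such a repository follows. It covers a subset of the methods listed above and assumes an atomic write strategy (temporary file plus `os.replace`), which this spec does not mandate; the existing repository uses async methods, so a real implementation would likely wrap these calls.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


class FileStateRepository:
    """Stores one JSON file per work order under state_directory."""

    def __init__(self, state_directory: str) -> None:
        self.state_directory = Path(state_directory)
        self.state_directory.mkdir(parents=True, exist_ok=True)

    def _path(self, work_order_id: str) -> Path:
        return self.state_directory / f"{work_order_id}.json"

    def save_state(self, work_order_id: str, state_data: dict) -> None:
        # Write to a temp file in the same directory, then rename over the
        # target, so a crash mid-write does not leave a truncated state file.
        fd, tmp_name = tempfile.mkstemp(dir=self.state_directory, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as handle:
                json.dump(state_data, handle, indent=2)
            os.replace(tmp_name, self._path(work_order_id))
        except BaseException:
            if os.path.exists(tmp_name):
                os.unlink(tmp_name)
            raise

    def load_state(self, work_order_id: str) -> Optional[dict]:
        try:
            return json.loads(self._path(work_order_id).read_text(encoding="utf-8"))
        except FileNotFoundError:
            return None

    def list_states(self) -> list[str]:
        return sorted(p.stem for p in self.state_directory.glob("*.json"))

    def delete_state(self, work_order_id: str) -> None:
        self._path(work_order_id).unlink(missing_ok=True)

    def update_status(self, work_order_id: str, status: str, **kwargs) -> None:
        state = self.load_state(work_order_id) or {}
        state.update({"status": status, **kwargs})
        self.save_state(work_order_id, state)
```

The atomic rename protects against partial writes but not against two processes updating the same work order at once; that case still needs locking or a single-writer rule.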
### Step 6: Update WorkflowStep Enum
Add new workflow steps for test and review resolution.
- Update `python/src/agent_work_orders/models.py`
- Add to WorkflowStep enum:
  - `RESOLVE_TEST = "resolve_test"` - Test failure resolution step
  - `RESOLVE_REVIEW = "resolve_review"` - Review issue resolution step
- Update `StepHistory.get_current_step()` to include new steps in sequence:
  - Updated sequence: CLASSIFY → PLAN → FIND_PLAN → GENERATE_BRANCH → IMPLEMENT → COMMIT → TEST → RESOLVE_TEST (if needed) → REVIEW → RESOLVE_REVIEW (if needed) → CREATE_PR
- Write unit tests for updated step sequence logic
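
The conditional steps in the updated sequence could be modeled as in the sketch below. Only the two `RESOLVE_*` string values come from this spec; the other enum values, the `next_step` helper, and the rule that a resolve step always leads back to a re-run of its parent step are assumptions for illustration.

```python
from enum import Enum
from typing import Optional


class WorkflowStep(str, Enum):
    CLASSIFY = "classify"  # assumed value
    PLAN = "plan"  # assumed value
    FIND_PLAN = "find_plan"  # assumed value
    GENERATE_BRANCH = "generate_branch"  # assumed value
    IMPLEMENT = "implement"  # assumed value
    COMMIT = "commit"  # assumed value
    TEST = "test"  # assumed value
    RESOLVE_TEST = "resolve_test"
    REVIEW = "review"  # assumed value
    RESOLVE_REVIEW = "resolve_review"
    CREATE_PR = "create_pr"  # assumed value


S = WorkflowStep
LINEAR_SEQUENCE = [
    S.CLASSIFY, S.PLAN, S.FIND_PLAN, S.GENERATE_BRANCH,
    S.IMPLEMENT, S.COMMIT, S.TEST, S.REVIEW, S.CREATE_PR,
]
RESOLUTION_FOR = {S.TEST: S.RESOLVE_TEST, S.REVIEW: S.RESOLVE_REVIEW}
RETRY_TARGET = {S.RESOLVE_TEST: S.TEST, S.RESOLVE_REVIEW: S.REVIEW}


def next_step(last_step: Optional[WorkflowStep], last_succeeded: bool = True) -> Optional[WorkflowStep]:
    """Return the step to run next, or None when the workflow should stop."""
    if last_step is None:
        return LINEAR_SEQUENCE[0]
    if last_step in RETRY_TARGET:
        # After a resolution attempt, re-run the step that failed.
        return RETRY_TARGET[last_step]
    if not last_succeeded:
        # Only TEST and REVIEW have an automatic resolution step.
        return RESOLUTION_FOR.get(last_step)
    index = LINEAR_SEQUENCE.index(last_step)
    return LINEAR_SEQUENCE[index + 1] if index + 1 < len(LINEAR_SEQUENCE) else None
```

This helper does not enforce the retry limits; those belong to the test and review workflow modules.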
### Step 7: Create Test Runner Command
Build Claude command to execute test suite and return structured JSON results.
- Create `python/.claude/commands/agent-work-orders/test_runner.md`
- Command should:
  - Execute backend tests: `cd python && uv run pytest tests/ -v --tb=short`
  - Execute frontend tests: `cd archon-ui-main && npm test`
  - Parse test results from output
  - Return JSON array with structure:
    ```json
    [
      {
        "test_name": "string",
        "test_file": "string",
        "passed": boolean,
        "error": "optional string",
        "execution_command": "string"
      }
    ]
    ```
- Include test purpose and reproduction command
- Sort failed tests first
- Handle timeout and command errors gracefully
- Test the command manually with sample repositories
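
On the consuming side, the JSON array above could be parsed as sketched here. The field names follow the structure shown in this step; the `logger` parameter from the planned `parse_test_results` signature is omitted for brevity, and raising `ValueError` on malformed output is an assumed error strategy.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class TestResult:
    test_name: str
    test_file: str
    passed: bool
    execution_command: str
    error: Optional[str] = None


def parse_test_results(output: str) -> tuple[list[TestResult], int, int]:
    """Parse the runner's JSON array; return (results, passed_count, failed_count)."""
    try:
        raw = json.loads(output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Test runner did not return valid JSON: {exc}") from exc
    if not isinstance(raw, list):
        raise ValueError("Expected a JSON array of test results")
    results = [
        TestResult(
            test_name=item["test_name"],
            test_file=item["test_file"],
            passed=bool(item["passed"]),
            execution_command=item["execution_command"],
            error=item.get("error"),
        )
        for item in raw
    ]
    # False sorts before True, so failed tests come first.
    results.sort(key=lambda result: result.passed)
    passed_count = sum(1 for result in results if result.passed)
    return results, passed_count, len(results) - passed_count
```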
### Step 8: Create Resolve Failed Test Command
Build Claude command to analyze and fix failed tests given test JSON.
- Create `python/.claude/commands/agent-work-orders/resolve_failed_test.md`
- Command takes single argument: test result JSON object
- Command should:
  - Parse test failure information
  - Analyze root cause of failure
  - Read relevant test file and code under test
  - Implement fix (code change or test update)
  - Re-run the specific failed test to verify fix
  - Report success/failure
- Include examples of common test failure patterns
- Add constraints (don't skip tests, maintain test coverage)
- Test the command with sample failed test JSONs
### Step 9: Implement Test Workflow Module
Create the test workflow module with automatic resolution and retry logic.
- Create `python/src/agent_work_orders/workflow_engine/test_workflow.py`
- Implement functions:
  - `run_tests(executor, command_loader, work_order_id, working_dir) -> StepExecutionResult` - Execute test suite
  - `parse_test_results(output, logger) -> Tuple[List[TestResult], int, int]` - Parse JSON output
  - `resolve_failed_test(executor, command_loader, test_json, work_order_id, working_dir) -> StepExecutionResult` - Fix single test
  - `run_tests_with_resolution(executor, command_loader, work_order_id, working_dir, max_attempts=4) -> Tuple[List[TestResult], int, int]` - Main retry loop
- Implement retry logic:
  - Run tests, check for failures
  - If failures exist and attempts < max_attempts: resolve each failed test
  - Re-run tests after resolution
  - Stop if all tests pass or max attempts reached
- Add TestResult model to models.py
- Write comprehensive unit tests for test workflow
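
The retry logic above can be sketched with the test runner and resolver injected as callables, which keeps the loop testable without an agent executor. The signature is simplified relative to the planned one, and counting each test run as one attempt (so `max_attempts=4` means at most four runs and three resolution rounds) is an interpretation of this spec.

```python
import asyncio
from typing import Awaitable, Callable

# Each result is a dict shaped like one entry of the test runner's JSON array.
RunTests = Callable[[], Awaitable[list[dict]]]
ResolveTest = Callable[[dict], Awaitable[None]]


async def run_tests_with_resolution(
    run_tests: RunTests,
    resolve_failed_test: ResolveTest,
    max_attempts: int = 4,
) -> tuple[list[dict], int, int]:
    """Run tests, resolve failures one by one, and re-run, up to max_attempts runs."""
    results: list[dict] = []
    for attempt in range(1, max_attempts + 1):
        results = await run_tests()
        failed = [result for result in results if not result["passed"]]
        if not failed or attempt == max_attempts:
            break  # all green, or no attempts left to verify another fix
        for test in failed:
            await resolve_failed_test(test)
    passed_count = sum(1 for result in results if result["passed"])
    return results, passed_count, len(results) - passed_count
```

The loop always ends on a test run rather than on a resolution, so the returned counts reflect the state of the code after the last fix that was actually verified.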
### Step 10: Add Test Workflow Operation
Create atomic operation for test execution in workflow_operations.py.
- Update `python/src/agent_work_orders/workflow_engine/workflow_operations.py`
- Add function:
```python
async def execute_tests(
    executor: AgentCLIExecutor,
    command_loader: ClaudeCommandLoader,
    work_order_id: str,
    working_dir: str,
) -> StepExecutionResult:
    """Run the test suite with automatic resolution and summarize the outcome."""
    ...
```
- Function should:
  - Call `run_tests_with_resolution()` from test_workflow.py
  - Return StepExecutionResult with test summary
  - Include pass/fail counts in output
  - Log detailed test results
- Add TESTER constant to agent_names.py
- Write unit tests for execute_tests operation
### Step 11: Integrate Test Phase in Orchestrator
Add test phase to workflow orchestrator between COMMIT and CREATE_PR steps.
- Update `python/src/agent_work_orders/workflow_engine/workflow_orchestrator.py`
- After commit step (line ~236), add:
```python
# Step 7: Run tests with resolution
test_result = await workflow_operations.execute_tests(
    self.agent_executor,
    self.command_loader,
    agent_work_order_id,
    sandbox.working_dir,
)
step_history.steps.append(test_result)
await self.state_repository.save_step_history(agent_work_order_id, step_history)
if not test_result.success:
    raise WorkflowExecutionError(f"Tests failed: {test_result.error_message}")
bound_logger.info("step_completed", step="test")
```
- Update step numbering (PR creation becomes step 8)
- Add test failure handling strategy
- Write integration tests for full workflow with test phase
### Step 12: Create Review Runner Command
Build Claude command to review implementation against spec with screenshot capture.
- Create `python/.claude/commands/agent-work-orders/review_runner.md`
- Command takes arguments: spec_file_path, work_order_id
- Command should:
  - Read specification from spec_file_path
  - Analyze implementation in codebase
  - Start application (if UI component)
  - Capture screenshots of key UI flows
  - Compare implementation against spec requirements
  - Categorize issues by severity: "blocker" | "tech_debt" | "skippable"
  - Return JSON with structure:
    ```json
    {
      "review_passed": boolean,
      "review_issues": [
        {
          "issue_title": "string",
          "issue_description": "string",
          "issue_severity": "blocker|tech_debt|skippable",
          "affected_files": ["string"],
          "screenshots": ["string"]
        }
      ],
      "screenshots": ["string"]
    }
    ```
- Include review criteria and severity definitions
- Test command with sample specifications
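
A sketch of how the review JSON above might be parsed and filtered by severity is shown below. The model and field names mirror the structure in this step; the `blockers` and `should_block` helpers are hypothetical conveniences, and rejecting unknown severities is an assumed policy.

```python
import json
from dataclasses import dataclass, field

VALID_SEVERITIES = {"blocker", "tech_debt", "skippable"}


@dataclass
class ReviewIssue:
    issue_title: str
    issue_description: str
    issue_severity: str
    affected_files: list[str] = field(default_factory=list)
    screenshots: list[str] = field(default_factory=list)


@dataclass
class ReviewResult:
    review_passed: bool
    review_issues: list[ReviewIssue] = field(default_factory=list)
    screenshots: list[str] = field(default_factory=list)

    @property
    def blockers(self) -> list[ReviewIssue]:
        return [i for i in self.review_issues if i.issue_severity == "blocker"]

    @property
    def should_block(self) -> bool:
        # tech_debt and skippable issues are allowed to pass.
        return bool(self.blockers)


def parse_review_results(output: str) -> ReviewResult:
    """Parse the review runner's JSON object into typed models."""
    data = json.loads(output)
    issues = []
    for raw in data.get("review_issues", []):
        severity = raw["issue_severity"]
        if severity not in VALID_SEVERITIES:
            raise ValueError(f"Unknown issue severity: {severity}")
        issues.append(
            ReviewIssue(
                issue_title=raw["issue_title"],
                issue_description=raw["issue_description"],
                issue_severity=severity,
                affected_files=list(raw.get("affected_files", [])),
                screenshots=list(raw.get("screenshots", [])),
            )
        )
    return ReviewResult(
        review_passed=bool(data["review_passed"]),
        review_issues=issues,
        screenshots=list(data.get("screenshots", [])),
    )
```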
### Step 13: Create Resolve Failed Review Command
Build Claude command to patch blocker issues from review.
- Create `python/.claude/commands/agent-work-orders/resolve_failed_review.md`
- Command takes single argument: review issue JSON object
- Command should:
  - Parse review issue details
  - Create patch plan addressing the issue
  - Implement the patch (code changes)
  - Verify patch resolves the issue
  - Report success/failure
- Include constraints (only fix blocker issues, maintain functionality)
- Add examples of common review issue patterns
- Test command with sample review issues
### Step 14: Implement Review Workflow Module
Create the review workflow module with automatic blocker patching.
- Create `python/src/agent_work_orders/workflow_engine/review_workflow.py`
- Implement functions:
  - `run_review(executor, command_loader, spec_file, work_order_id, working_dir) -> ReviewResult` - Execute review
  - `parse_review_results(output, logger) -> ReviewResult` - Parse JSON output
  - `resolve_review_issue(executor, command_loader, issue_json, work_order_id, working_dir) -> StepExecutionResult` - Patch single issue
  - `run_review_with_resolution(executor, command_loader, spec_file, work_order_id, working_dir, max_attempts=3) -> ReviewResult` - Main retry loop
- Implement retry logic:
  - Run review, check for blocker issues
  - If blockers exist and attempts < max_attempts: resolve each blocker
  - Re-run review after patching
  - Stop if no blockers or max attempts reached
  - Allow tech_debt and skippable issues to pass
- Add ReviewResult and ReviewIssue models to models.py
- Write comprehensive unit tests for review workflow
### Step 15: Add Review Workflow Operation
Create atomic operation for review execution in workflow_operations.py.
- Update `python/src/agent_work_orders/workflow_engine/workflow_operations.py`
- Add function:
```python
async def execute_review(
executor: AgentCLIExecutor,
command_loader: ClaudeCommandLoader,
spec_file: str,
work_order_id: str,
working_dir: str,
) -> StepExecutionResult
```
- Function should:
  - Call `run_review_with_resolution()` from review_workflow.py
  - Return StepExecutionResult with review summary
  - Include blocker count in output
  - Log detailed review results
- Add REVIEWER constant to agent_names.py
- Write unit tests for execute_review operation
### Step 16: Integrate Review Phase in Orchestrator
Add review phase to workflow orchestrator between TEST and CREATE_PR steps.
- Update `python/src/agent_work_orders/workflow_engine/workflow_orchestrator.py`
- After test step, add:
```python
# Step 8: Run review with resolution
review_result = await workflow_operations.execute_review(
    self.agent_executor,
    self.command_loader,
    plan_file or "",
    agent_work_order_id,
    sandbox.working_dir,
)
step_history.steps.append(review_result)
await self.state_repository.save_step_history(agent_work_order_id, step_history)
if not review_result.success:
    raise WorkflowExecutionError(f"Review failed: {review_result.error_message}")
bound_logger.info("step_completed", step="review")
```
- Update step numbering (PR creation becomes step 9)
- Add review failure handling strategy
- Write integration tests for full workflow with review phase
### Step 17: Refactor Orchestrator for Composition
Refactor workflow orchestrator to support modular composition.
- Update `python/src/agent_work_orders/workflow_engine/workflow_orchestrator.py`
- Extract workflow phases into separate methods:
  - `_execute_planning_phase()` - classify → plan → find_plan → generate_branch
  - `_execute_implementation_phase()` - implement → commit
  - `_execute_testing_phase()` - test → resolve_test (if needed)
  - `_execute_review_phase()` - review → resolve_review (if needed)
  - `_execute_deployment_phase()` - create_pr
- Update `execute_workflow()` to compose phases:
```python
await self._execute_planning_phase(...)
await self._execute_implementation_phase(...)
await self._execute_testing_phase(...)
await self._execute_review_phase(...)
await self._execute_deployment_phase(...)
```
- Add phase-level error handling and recovery
- Support skipping phases via configuration
- Write unit tests for each phase method
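
Phase composition with configuration-based skipping and phase-level error handling could take a shape like the sketch below. `Phase`, `PhaseError`, and `execute_phases` are hypothetical names used only for illustration; in the orchestrator the `enabled` flags would presumably come from settings such as `ENABLE_TEST_PHASE` and `ENABLE_REVIEW_PHASE`.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class Phase:
    name: str
    run: Callable[[], Awaitable[None]]
    enabled: bool = True


class PhaseError(Exception):
    """Wraps any failure with the name of the phase that raised it."""

    def __init__(self, phase: str, cause: Exception) -> None:
        super().__init__(f"Phase '{phase}' failed: {cause}")
        self.phase = phase


async def execute_phases(phases: list[Phase]) -> list[str]:
    """Run enabled phases in order; return the names of those that completed."""
    completed: list[str] = []
    for phase in phases:
        if not phase.enabled:
            continue  # skipped via configuration
        try:
            await phase.run()
        except Exception as exc:
            raise PhaseError(phase.name, exc) from exc
        completed.append(phase.name)
    return completed
```

Recording the failing phase name on the exception gives a resumption mechanism something concrete to restart from.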
### Step 18: Add Configuration for New Features
Add configuration options for worktrees, ports, and new workflow phases.
- Update `python/src/agent_work_orders/config.py`
- Add configuration:
```python
# Worktree configuration
WORKTREE_BASE_DIR: str = os.getenv("WORKTREE_BASE_DIR", "trees")
# Port allocation
BACKEND_PORT_RANGE_START: int = int(os.getenv("BACKEND_PORT_START", "9100"))
BACKEND_PORT_RANGE_END: int = int(os.getenv("BACKEND_PORT_END", "9114"))
FRONTEND_PORT_RANGE_START: int = int(os.getenv("FRONTEND_PORT_START", "9200"))
FRONTEND_PORT_RANGE_END: int = int(os.getenv("FRONTEND_PORT_END", "9214"))
# Test workflow
MAX_TEST_RETRY_ATTEMPTS: int = int(os.getenv("MAX_TEST_RETRY_ATTEMPTS", "4"))
ENABLE_TEST_PHASE: bool = os.getenv("ENABLE_TEST_PHASE", "true").lower() == "true"
# Review workflow
MAX_REVIEW_RETRY_ATTEMPTS: int = int(os.getenv("MAX_REVIEW_RETRY_ATTEMPTS", "3"))
ENABLE_REVIEW_PHASE: bool = os.getenv("ENABLE_REVIEW_PHASE", "true").lower() == "true"
ENABLE_SCREENSHOT_CAPTURE: bool = os.getenv("ENABLE_SCREENSHOT_CAPTURE", "true").lower() == "true"
# State management
STATE_STORAGE_TYPE: str = os.getenv("STATE_STORAGE_TYPE", "memory") # "memory" or "file"
FILE_STATE_DIRECTORY: str = os.getenv("FILE_STATE_DIRECTORY", "agent-work-orders-state")
```
- Update `.env.example` with new configuration options
- Document configuration in README
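
Since the port ranges become configurable, a startup check along these lines may be worth adding. This is a sketch only: `validate_port_ranges` is a hypothetical helper, and the rule that both ranges must have the same number of ports assumes that each work order takes one backend port and one frontend port from the same slot.

```python
def validate_port_ranges(
    backend_start: int, backend_end: int, frontend_start: int, frontend_end: int
) -> int:
    """Return the number of parallel slots the ranges allow, or raise ValueError."""
    for label, start, end in (
        ("backend", backend_start, backend_end),
        ("frontend", frontend_start, frontend_end),
    ):
        if not (1 <= start <= end <= 65535):
            raise ValueError(f"Invalid {label} port range: {start}-{end}")
    backend_size = backend_end - backend_start + 1
    frontend_size = frontend_end - frontend_start + 1
    if backend_size != frontend_size:
        raise ValueError("Backend and frontend ranges must have the same number of ports")
    if backend_start <= frontend_end and frontend_start <= backend_end:
        raise ValueError("Backend and frontend port ranges overlap")
    return backend_size
```

With the defaults above (9100-9114 and 9200-9214) the function returns 15, which matches the stated limit on concurrent worktrees.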
### Step 19: Create Documentation
Document the new compositional architecture and workflows.
- Create `docs/compositional-workflows.md`:
  - Architecture overview
  - Compositional design principles
  - Phase composition examples
  - Error handling and recovery
  - Configuration guide
- Create `docs/worktree-management.md`:
  - Worktree vs temporary clone comparison
  - Parallelization capabilities
  - Port allocation system
  - Cleanup and maintenance
- Create `docs/test-resolution.md`:
  - Test workflow overview
  - Retry logic explanation
  - Test resolution examples
  - Troubleshooting failed tests
- Create `docs/review-resolution.md`:
  - Review workflow overview
  - Screenshot capture setup
  - Issue severity definitions
  - Blocker patching process
  - R2 upload configuration
### Step 20: Run Validation Commands
Execute all validation commands to ensure the feature works correctly with zero regressions.
- Run backend tests: `cd python && uv run pytest tests/agent_work_orders/ -v`
- Run backend linting: `cd python && uv run ruff check src/agent_work_orders/`
- Run type checking: `cd python && uv run mypy src/agent_work_orders/`
- Test worktree creation manually:
```bash
cd python
python -c "
from src.agent_work_orders.utils.worktree_operations import create_worktree
from src.agent_work_orders.utils.structured_logger import get_logger
logger = get_logger('test')
path, err = create_worktree('test-wo-123', 'test-branch', logger)
print(f'Path: {path}, Error: {err}')
"
```
- Test port allocation:
```bash
cd python
python -c "
from src.agent_work_orders.utils.port_allocation import get_ports_for_work_order
backend, frontend = get_ports_for_work_order('test-wo-123')
print(f'Backend: {backend}, Frontend: {frontend}')
"
```
- Create test work order with new workflow:
```bash
curl -X POST http://localhost:8181/agent-work-orders \
  -H "Content-Type: application/json" \
  -d '{
    "repository_url": "https://github.com/your-test-repo",
    "sandbox_type": "git_worktree",
    "workflow_type": "agent_workflow_plan",
    "user_request": "Add a new feature with tests"
  }'
```
- Verify worktree created under `trees/<work_order_id>/`
- Verify `.ports.env` created in worktree
- Monitor workflow execution through all phases
- Verify test phase runs and resolves failures
- Verify review phase runs and patches blockers
- Verify PR created successfully
- Clean up test worktrees: `git worktree prune`
## Testing Strategy
### Unit Tests
**Worktree Management**:
- Test worktree creation with valid repository
- Test worktree creation with invalid branch
- Test worktree validation (three-way check)
- Test worktree cleanup
- Test handling of existing worktrees

**Port Allocation**:
- Test deterministic port assignment from work order ID
- Test port availability checking
- Test finding next available ports with collision
- Test port range boundaries (9100-9114, 9200-9214)
- Test `.ports.env` file generation

**Test Workflow**:
- Test parsing valid test result JSON
- Test parsing malformed test result JSON
- Test retry loop with all tests passing
- Test retry loop with some tests failing then passing
- Test retry loop reaching max attempts
- Test individual test resolution

**Review Workflow**:
- Test parsing valid review result JSON
- Test parsing malformed review result JSON
- Test retry loop with no blocker issues
- Test retry loop with blockers then resolved
- Test retry loop reaching max attempts
- Test issue severity filtering

**State Management**:
- Test saving state to JSON file
- Test loading state from JSON file
- Test updating specific state fields
- Test handling missing state files
- Test concurrent state access
### Integration Tests
**End-to-End Workflow**:
- Test complete workflow with worktree sandbox: classify → plan → implement → commit → test → review → PR
- Test test phase with intentional test failure and resolution
- Test review phase with intentional blocker issue and patching
- Test parallel execution of multiple work orders with different ports
- Test workflow resumption after failure
- Test cleanup of worktrees after completion

**Sandbox Integration**:
- Test command execution in worktree context
- Test git operations in worktree
- Test branch creation in worktree
- Test worktree isolation (parallel instances don't interfere)

**State Persistence**:
- Test state survives service restart (file-based)
- Test state migration from memory to file
- Test state corruption recovery
### Edge Cases
**Worktree Edge Cases**:
- Worktree already exists (should reuse or fail gracefully)
- Git repository unreachable (should fail setup)
- Insufficient disk space for worktree (should fail with clear error)
- Worktree removal fails (should log error and continue)
- Maximum worktrees reached (15 concurrent) - should queue or fail

**Port Allocation Edge Cases**:
- All ports in range occupied (should fail with error)
- Port becomes occupied between allocation and use (should retry)
- Invalid port range in configuration (should fail validation)

**Test Workflow Edge Cases**:
- Test command times out (should mark as failed)
- Test command returns invalid JSON (should fail gracefully)
- All tests fail and none can be resolved (should fail after max attempts)
- Test resolution introduces new failures (should continue with retry loop)

**Review Workflow Edge Cases**:
- Review command crashes (should fail gracefully)
- Screenshot capture fails (should continue review without screenshots)
- Review finds only skippable issues (should pass)
- Blocker patch introduces new blocker (should continue with retry loop)
- Spec file not found (should fail with clear error)
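
The pass/fail rule for review issues (only blockers block) can be expressed in a few lines. The sketch below assumes each issue is a dict with a `severity` key holding one of the three categories named in this spec. The function names are hypothetical, and treating an unrecognised severity as a blocker is a conservative assumption rather than documented behaviour.

```python
SEVERITIES = ("blocker", "tech_debt", "skippable")


def partition_by_severity(issues):
    """Group review issues into one list per severity."""
    buckets = {severity: [] for severity in SEVERITIES}
    for issue in issues:
        severity = issue.get("severity")
        # Assumption: anything unrecognised is handled as a blocker.
        key = severity if severity in buckets else "blocker"
        buckets[key].append(issue)
    return buckets


def review_passes(issues):
    """A review passes when it contains no blocker issues."""
    return not partition_by_severity(issues)["blocker"]
```

With this rule, a review that reports only `tech_debt` and `skippable` issues passes without entering the resolution loop.
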
**State Management Edge Cases**:
- State file corrupted (should fail with recovery suggestion)
- State directory not writable (should fail with permission error)
- Concurrent access to same state file (should handle with locking or fail safely)
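
For the state edge cases, one common technique is to write to a temporary file and atomically rename it over the target, so a reader never sees a half-written file. The sketch below assumes one JSON file per work order; `write_state_atomic`, `read_state`, and `StateCorruptedError` are hypothetical names and not part of `FileStateRepository`.

```python
import json
import os
import tempfile
from pathlib import Path


class StateCorruptedError(Exception):
    """Raised when a state file exists but is not valid JSON."""


def write_state_atomic(path, state):
    """Write state to a temp file in the same directory, then swap it in."""
    path = Path(path)
    # Raises PermissionError here if the state directory is not writable.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(state, tmp, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_name, path)  # atomic when both paths share a filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def read_state(path):
    """Return the stored state, None if the file is missing."""
    path = Path(path)
    if not path.exists():
        return None
    try:
        return json.loads(path.read_text())
    except json.JSONDecodeError as exc:
        raise StateCorruptedError(
            f"State file {path} is not valid JSON; restore it from a backup "
            "or delete it to start from a clean state"
        ) from exc
```

Atomic replacement protects readers from partial writes, but it does not serialise concurrent read-modify-write updates; those would still need a lock or a single writer.
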
## Acceptance Criteria
- [ ] GitWorktreeSandbox successfully creates and manages worktrees under `trees/<work_order_id>/`
- [ ] Port allocation deterministically assigns unique ports (backend: 9100-9114, frontend: 9200-9214) based on work order ID
- [ ] Multiple work orders (at least 3) can run in parallel without port or filesystem conflicts
- [ ] `.ports.env` file is created in each worktree with correct port configuration
- [ ] Test workflow successfully runs test suite and returns structured JSON results
- [ ] Test workflow automatically resolves failed tests in up to 4 attempts
- [ ] Test workflow stops retrying when all tests pass
- [ ] Review workflow successfully reviews implementation against spec
- [ ] Review workflow captures screenshots (when enabled)
- [ ] Review workflow categorizes issues by severity (blocker/tech_debt/skippable)
- [ ] Review workflow automatically patches blocker issues in up to 3 attempts
- [ ] Review workflow allows tech_debt and skippable issues to pass
- [ ] WorkflowStep enum includes TEST, RESOLVE_TEST, REVIEW, RESOLVE_REVIEW steps
- [ ] Workflow orchestrator executes all phases: planning → implementation → testing → review → PR creation
- [ ] File-based state repository persists state to JSON files
- [ ] State survives service restarts when using file-based storage
- [ ] Configuration supports enabling/disabling test and review phases
- [ ] All existing tests pass with zero regressions
- [ ] New unit tests achieve >80% code coverage for new modules
- [ ] Integration tests verify end-to-end workflow with parallel execution
- [ ] Documentation covers compositional architecture, worktrees, test resolution, and review resolution
- [ ] Cleanup of worktrees works correctly (git worktree remove + prune)
- [ ] Error messages are clear and actionable for all failure scenarios
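
To make the step-related criteria concrete, the sketch below shows one possible shape for the step enum and for selecting steps based on the enable/disable configuration. Only the member names `TEST`, `RESOLVE_TEST`, `REVIEW`, and `RESOLVE_REVIEW` come from this spec; the other members, the string values, and the `planned_steps` helper with its two flags are assumptions for illustration.

```python
from enum import Enum


class WorkflowStep(str, Enum):
    # Member names other than the four test/review steps are hypothetical.
    CLASSIFY = "classify"
    PLAN = "plan"
    IMPLEMENT = "implement"
    COMMIT = "commit"
    TEST = "test"
    RESOLVE_TEST = "resolve_test"      # entered only when a test fails
    REVIEW = "review"
    RESOLVE_REVIEW = "resolve_review"  # entered only when a blocker is found
    CREATE_PR = "create_pr"


def planned_steps(enable_test_phase=True, enable_review_phase=True):
    """Return the happy-path step order for the given configuration."""
    steps = [
        WorkflowStep.CLASSIFY,
        WorkflowStep.PLAN,
        WorkflowStep.IMPLEMENT,
        WorkflowStep.COMMIT,
    ]
    if enable_test_phase:
        steps.append(WorkflowStep.TEST)
    if enable_review_phase:
        steps.append(WorkflowStep.REVIEW)
    steps.append(WorkflowStep.CREATE_PR)
    return steps
```

With both flags disabled, the sequence reduces to classify, plan, implement, commit, and PR creation, so the new phases stay optional.
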
## Validation Commands
Execute every command to validate the feature works correctly with zero regressions.
### Backend Tests
- `cd python && uv run pytest tests/agent_work_orders/ -v --tb=short` - Run all agent work orders tests
- `cd python && uv run pytest tests/agent_work_orders/sandbox_manager/ -v` - Test sandbox management
- `cd python && uv run pytest tests/agent_work_orders/workflow_engine/ -v` - Test workflow engine
- `cd python && uv run pytest tests/agent_work_orders/utils/ -v` - Test utilities
### Code Quality
- `cd python && uv run ruff check src/agent_work_orders/` - Check code quality
- `cd python && uv run mypy src/agent_work_orders/` - Type checking
### Manual Worktree Testing
```bash
# Test worktree creation
cd python
python -c "
from src.agent_work_orders.utils.worktree_operations import create_worktree, validate_worktree, remove_worktree
from src.agent_work_orders.utils.structured_logger import get_logger
logger = get_logger('test')
# Create worktree (check for an error before using the returned path)
path, err = create_worktree('test-wo-123', 'test-branch', logger)
assert err is None, f'Error: {err}'
print(f'Created worktree at: {path}')
# Validate worktree
state_data = {'worktree_path': path}
valid, err = validate_worktree('test-wo-123', state_data)
assert valid, f'Validation failed: {err}'
# Remove worktree
success, err = remove_worktree('test-wo-123', logger)
assert success, f'Removal failed: {err}'
print('Worktree lifecycle test passed!')
"
```
### Manual Port Allocation Testing
```bash
cd python
python -c "
from src.agent_work_orders.utils.port_allocation import get_ports_for_work_order, find_next_available_ports, is_port_available
backend, frontend = get_ports_for_work_order('test-wo-123')
print(f'Ports for test-wo-123: Backend={backend}, Frontend={frontend}')
assert 9100 <= backend <= 9114, f'Backend port out of range: {backend}'
assert 9200 <= frontend <= 9214, f'Frontend port out of range: {frontend}'
# Test availability check
available = is_port_available(backend)
print(f'Backend port {backend} available: {available}')
# Test finding next available
next_backend, next_frontend = find_next_available_ports('test-wo-456')
print(f'Next available ports: Backend={next_backend}, Frontend={next_frontend}')
print('Port allocation test passed!')
"
```
### Integration Testing
```bash
# Start agent work orders service
docker compose up -d archon-server
# Create work order with worktree sandbox
curl -X POST http://localhost:8181/agent-work-orders \
  -H "Content-Type: application/json" \
  -d '{
    "repository_url": "https://github.com/coleam00/archon",
    "sandbox_type": "git_worktree",
    "workflow_type": "agent_workflow_plan",
    "user_request": "Fix issue #123"
  }'
# Verify worktree created
ls -la trees/
# Monitor workflow progress
watch -n 2 'curl -s http://localhost:8181/agent-work-orders | jq'
# Verify .ports.env in worktree (replace <work_order_id> with the actual work order ID)
cat trees/<work_order_id>/.ports.env
# After completion, verify cleanup
git worktree list
```
### Parallel Execution Testing
```bash
# Create 3 work orders simultaneously
for i in 1 2 3; do
  curl -X POST http://localhost:8181/agent-work-orders \
    -H "Content-Type: application/json" \
    -d "{
      \"repository_url\": \"https://github.com/coleam00/archon\",
      \"sandbox_type\": \"git_worktree\",
      \"workflow_type\": \"agent_workflow_plan\",
      \"user_request\": \"Parallel test $i\"
    }" &
done
wait
# Verify all worktrees exist
ls -la trees/
# Verify different ports allocated
for dir in trees/*/; do
  echo "Worktree: $dir"
  cat "$dir/.ports.env"
  echo "---"
done
```
## Notes
### Architecture Decision: Compositional vs Centralized
This feature implements Option B (compositional refactoring) because:
1. **Scalability**: Compositional design enables running individual phases (e.g., just test or just review) without full workflow
2. **Debugging**: Independent scripts are easier to test and debug in isolation
3. **Flexibility**: Users can compose custom workflows (e.g., skip review for simple PRs)
4. **Maintainability**: Smaller, focused modules are easier to maintain than monolithic orchestrator
5. **Parallelization**: Worktree-based approach inherently supports compositional execution
### Performance Considerations
- **Worktree Creation**: Creating a worktree is roughly 2-3x faster than a fresh clone because it reuses the main repository's `.git` object store instead of copying it
- **Port Allocation**: Hash-based allocation is deterministic, but two work order IDs can hash to the same slot; the linear-search fallback resolves such collisions with minimal overhead
- **Retry Loops**: The test (4 attempts) and review (3 attempts) retry limits prevent infinite loops while still allowing a reasonable number of resolution attempts
- **State I/O**: File-based state adds disk I/O but enables persistence; consider moving to a database for high-volume deployments
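
How often the hash-based allocation needs its fallback can be estimated with the standard birthday-problem calculation. The sketch below assumes the hash spreads work order IDs uniformly over the 15 slots, which is an idealisation; `collision_probability` is a hypothetical helper, not part of the port allocation module.

```python
def collision_probability(work_orders, slots=15):
    """Probability that at least two work orders hash to the same slot."""
    if work_orders > slots:
        return 1.0  # pigeonhole: more work orders than slots
    no_collision = 1.0
    for i in range(work_orders):
        no_collision *= (slots - i) / slots
    return 1.0 - no_collision


print(round(collision_probability(3), 3))  # 0.191
```

Under that assumption, three parallel work orders collide in about 19% of cases, so the fallback path is a normal code path and not a rare corner case.
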
### Future Enhancements
1. **Database State**: Replace file-based state with PostgreSQL/Supabase for better concurrent access and querying
2. **WebSocket Updates**: Stream test/review progress to UI in real-time
3. **Screenshot Upload**: Integrate R2/S3 for screenshot storage and PR comments with images
4. **Workflow Resumption**: Support resuming failed workflows from last successful step
5. **Custom Workflows**: Allow users to define custom workflow compositions via config
6. **Metrics**: Add OpenTelemetry instrumentation for workflow performance monitoring
7. **E2E Testing**: Add Playwright/Cypress integration for UI-focused review
8. **Distributed Execution**: Support running work orders across multiple machines
### Migration Path
For existing deployments:
1. **Backward Compatibility**: Keep GitBranchSandbox working alongside GitWorktreeSandbox
2. **Gradual Migration**: Default to GIT_BRANCH, opt-in to GIT_WORKTREE via configuration
3. **State Migration**: Provide utility to migrate in-memory state to file-based state
4. **Cleanup**: Add command to clean up old temporary clones: `rm -rf /tmp/agent-work-orders/*`
### Dependencies
New dependencies to add via `uv add`:
- (None required - uses existing git, pytest, claude CLI)
### Related Issues/PRs
- #XXX - Original agent-work-orders MVP implementation
- #XXX - Worktree isolation discussion
- #XXX - Test phase feature request
- #XXX - Review automation proposal