Files
archon/PRPs/specs/fix-jsonl-result-extraction-and-argument-passing.md
Rasmus Widing 9a60d6ae89 sauce aow
2025-10-16 19:17:18 +03:00

29 KiB

Feature: Fix JSONL Result Extraction and Argument Passing

Feature Description

Fix critical integration issues between Agent Work Orders system and Claude CLI that prevent workflow execution from completing successfully. The system currently fails to extract the actual result text from Claude CLI's JSONL output stream and doesn't properly pass arguments to command files using the $ARGUMENTS placeholder pattern.

These fixes enable the atomic workflow execution pattern to work end-to-end by ensuring clean data flow between workflow steps.

User Story

As a developer using the Agent Work Orders system I want workflows to execute successfully end-to-end So that I can automate development tasks via GitHub issues without manual intervention

Problem Statement

The first real-world test of the atomic workflow execution system (work order wo-18d08ae8, repository: https://github.com/Wirasm/dylan.git, issue #1) revealed two critical failures that prevent workflow completion:

Problem 1: JSONL Result Not Extracted

  • workflow_operations.py uses result.stdout.strip() to get agent output
  • result.stdout contains the entire JSONL stream (multiple lines of JSON messages)
  • The actual agent result is in the "result" field of the final JSONL message with type:"result"
  • Consequence: Downstream steps receive JSONL garbage instead of clean output

Observed Example:

# What we're currently doing (WRONG):
issue_class = result.stdout.strip()
# Gets: '{"type":"session_started","session_id":"..."}\n{"type":"result","result":"/feature","is_error":false}'

# What we should do (CORRECT):
issue_class = result.result_text.strip()
# Gets: "/feature"

Problem 2: $ARGUMENTS Placeholder Not Replaced

  • Command files use $ARGUMENTS placeholder for dynamic content (ADW pattern)
  • AgentCLIExecutor.build_command() appends args to prompt but doesn't replace placeholder
  • Claude CLI receives literal "$ARGUMENTS" text instead of actual issue JSON
  • Consequence: Agents cannot access input data needed to perform their task

Observed Failure:

Step 1 (Classifier): ✅ Executed BUT ❌ Wrong Output
- Agent response: "I need to see the GitHub issue content. The $ARGUMENTS placeholder shows {}"
- Output: Full JSONL stream instead of "/feature", "/bug", or "/chore"
- Session ID: 06f225c7-bcd8-436c-8738-9fa744c8eee6

Step 2 (Planner): ❌ Failed Immediately
- Received JSONL as issue_class: {"type":"result"...}
- Error: "Unknown issue class: {JSONL output...}"
- Workflow halted - cannot proceed without clean classification

Solution Statement

Implement two critical fixes to enable proper Claude CLI integration:

Fix 1: Extract result_text from JSONL Output

  • Add result_text field to CommandExecutionResult model
  • Extract the "result" field value from JSONL's final result message in AgentCLIExecutor
  • Update all workflow_operations.py functions to use result.result_text instead of result.stdout
  • Preserve stdout for debugging (contains full JSONL stream)

Fix 2: Replace $ARGUMENTS and Positional Placeholders

  • Modify AgentCLIExecutor.build_command() to replace $ARGUMENTS with actual arguments
  • Support both $ARGUMENTS (all args) and $1, $2, $3 (positional args)
  • Pre-process command file content before passing to Claude CLI
  • Remove old code that appended "Arguments: ..." to end of prompt

This enables atomic workflows to execute correctly with clean data flow between steps.

Relevant Files

Use these files to implement the feature:

Core Models - Add result extraction field

  • python/src/agent_work_orders/models.py:180-190 - CommandExecutionResult model needs result_text field to store extracted result

Agent Executor - Implement JSONL parsing and argument replacement

  • python/src/agent_work_orders/agent_executor/agent_cli_executor.py:25-88 - build_command() needs $ARGUMENTS replacement logic (line 61-62 currently just appends args)
  • python/src/agent_work_orders/agent_executor/agent_cli_executor.py:90-236 - execute_async() needs result_text extraction (around line 170-175)
  • python/src/agent_work_orders/agent_executor/agent_cli_executor.py:337-363 - _extract_result_message() already extracts result dict, need to get "result" field value

Workflow Operations - Use extracted result_text instead of stdout

  • python/src/agent_work_orders/workflow_engine/workflow_operations.py:26-79 - classify_issue() line 51 uses result.stdout.strip()
  • python/src/agent_work_orders/workflow_engine/workflow_operations.py:82-155 - build_plan() line 133 uses result.stdout
  • python/src/agent_work_orders/workflow_engine/workflow_operations.py:158-213 - find_plan_file() line 185 uses result.stdout
  • python/src/agent_work_orders/workflow_engine/workflow_operations.py:216-267 - implement_plan() line 245 uses result.stdout
  • python/src/agent_work_orders/workflow_engine/workflow_operations.py:270-326 - generate_branch() line 299 uses result.stdout
  • python/src/agent_work_orders/workflow_engine/workflow_operations.py:329-385 - create_commit() line 358 uses result.stdout
  • python/src/agent_work_orders/workflow_engine/workflow_operations.py:388-444 - create_pull_request() line 417 uses result.stdout

Tests - Update and add test coverage

  • python/tests/agent_work_orders/test_models.py - Add tests for CommandExecutionResult with result_text field
  • python/tests/agent_work_orders/test_agent_executor.py - Add tests for result extraction and argument replacement
  • python/tests/agent_work_orders/test_workflow_operations.py:1-398 - Update ALL mocks to include result_text field (currently missing)

Command Files - Examples using $ARGUMENTS that need to work

  • .claude/commands/agent-work-orders/classify_issue.md:19-21 - Uses $ARGUMENTS placeholder
  • .claude/commands/agent-work-orders/feature.md - Uses $ARGUMENTS placeholder
  • .claude/commands/agent-work-orders/bug.md - Uses positional $1, $2, $3

New Files

No new files needed - all changes are modifications to existing files.

Implementation Plan

Phase 1: Foundation - Model Enhancement

Add the result_text field to CommandExecutionResult so we can store the extracted result value separately from the raw JSONL stdout. This is a backward-compatible change.

Phase 2: Core Implementation - Result Extraction

Implement the logic to parse JSONL output and extract the "result" field value into result_text during command execution in AgentCLIExecutor.

Phase 3: Core Implementation - Argument Replacement

Implement placeholder replacement logic in build_command() to support $ARGUMENTS and $1, $2, $3 patterns in command files.

Phase 4: Integration - Update Workflow Operations

Update all 7 workflow operation functions to use result_text instead of stdout for cleaner data flow between atomic steps.

Phase 5: Testing and Validation

Comprehensive test coverage for both fixes and end-to-end validation with actual workflow execution.

Step by Step Tasks

IMPORTANT: Execute every step in order, top to bottom.

Add result_text Field to CommandExecutionResult Model

  • Open python/src/agent_work_orders/models.py
  • Locate the CommandExecutionResult class (line 180)
  • Add new optional field after stdout:
    result_text: str | None = None
    
  • Add inline comment above the field: # Extracted result text from JSONL "result" field (if available)
  • Verify the model definition is complete and properly formatted
  • Save the file

Implement Result Text Extraction in execute_async()

  • Open python/src/agent_work_orders/agent_executor/agent_cli_executor.py
  • Locate the execute_async() method
  • Find the section around line 170-175 where _extract_result_message() is called
  • After line 173 result_message = self._extract_result_message(stdout_text), add:
    # Extract result text from JSONL result message
    result_text: str | None = None
    if result_message and "result" in result_message:
        result_value = result_message.get("result")
        # Convert result to string (handles both str and other types)
        result_text = str(result_value) if result_value is not None else None
    else:
        result_text = None
    
  • Update the CommandExecutionResult instantiation (around line 191) to include the new field:
    result = CommandExecutionResult(
        success=success,
        stdout=stdout_text,
        result_text=result_text,  # NEW: Add this line
        stderr=stderr_text,
        exit_code=process.returncode or 0,
        session_id=session_id,
        error_message=error_message,
        duration_seconds=duration,
    )
    
  • Add debug logging after extraction (before the result object is created):
    if result_text:
        self._logger.debug(
            "result_text_extracted",
            result_text_preview=result_text[:100] if len(result_text) > 100 else result_text,
            work_order_id=work_order_id
        )
    
  • Save the file

Implement $ARGUMENTS Placeholder Replacement in build_command()

  • Still in python/src/agent_work_orders/agent_executor/agent_cli_executor.py
  • Locate the build_command() method (line 25-88)
  • Find the section around line 60-62 that handles arguments
  • Replace the current args handling code:
    # OLD CODE TO REMOVE:
    # if args:
    #     prompt_text += f"\n\nArguments: {', '.join(args)}"
    
    # NEW CODE:
    # Replace argument placeholders in prompt text
    if args:
        # Replace $ARGUMENTS with first arg (or all args joined if multiple)
        prompt_text = prompt_text.replace("$ARGUMENTS", args[0] if len(args) == 1 else ", ".join(args))
    
        # Replace positional placeholders ($1, $2, $3, etc.)
        for i, arg in enumerate(args, start=1):
            prompt_text = prompt_text.replace(f"${i}", arg)
    
  • Save the file

Update classify_issue() to Use result_text

  • Open python/src/agent_work_orders/workflow_engine/workflow_operations.py
  • Locate the classify_issue() function (starts at line 26)
  • Find line 50-51 that extracts issue_class
  • Replace with:
    # OLD: if result.success and result.stdout:
    #         issue_class = result.stdout.strip()
    
    # NEW: Use result_text which contains the extracted result
    if result.success and result.result_text:
        issue_class = result.result_text.strip()
    
  • Verify the rest of the function logic remains unchanged
  • Save the file

Update build_plan() to Use result_text

  • Still in python/src/agent_work_orders/workflow_engine/workflow_operations.py
  • Locate the build_plan() function (starts at line 82)
  • Find line 133 in the success case
  • Replace output=result.stdout or "" with:
    output=result.result_text or result.stdout or ""
    
  • Note: We use fallback to stdout for backward compatibility during transition
  • Save the file

Update find_plan_file() to Use result_text

  • Still in python/src/agent_work_orders/workflow_engine/workflow_operations.py
  • Locate the find_plan_file() function (starts at line 158)
  • Find line 185 that checks stdout
  • Replace with:
    # OLD: if result.success and result.stdout and result.stdout.strip() != "0":
    #         plan_file_path = result.stdout.strip()
    
    # NEW: Use result_text
    if result.success and result.result_text and result.result_text.strip() != "0":
        plan_file_path = result.result_text.strip()
    
  • Save the file

Update implement_plan() to Use result_text

  • Still in python/src/agent_work_orders/workflow_engine/workflow_operations.py
  • Locate the implement_plan() function (starts at line 216)
  • Find line 245 in the success case
  • Replace output=result.stdout or "" with:
    output=result.result_text or result.stdout or ""
    
  • Save the file

Update generate_branch() to Use result_text

  • Still in python/src/agent_work_orders/workflow_engine/workflow_operations.py
  • Locate the generate_branch() function (starts at line 270)
  • Find line 298-299 that extracts branch_name
  • Replace with:
    # OLD: if result.success and result.stdout:
    #         branch_name = result.stdout.strip()
    
    # NEW: Use result_text
    if result.success and result.result_text:
        branch_name = result.result_text.strip()
    
  • Save the file

Update create_commit() to Use result_text

  • Still in python/src/agent_work_orders/workflow_engine/workflow_operations.py
  • Locate the create_commit() function (starts at line 329)
  • Find line 357-358 that extracts commit_message
  • Replace with:
    # OLD: if result.success and result.stdout:
    #         commit_message = result.stdout.strip()
    
    # NEW: Use result_text
    if result.success and result.result_text:
        commit_message = result.result_text.strip()
    
  • Save the file

Update create_pull_request() to Use result_text

  • Still in python/src/agent_work_orders/workflow_engine/workflow_operations.py
  • Locate the create_pull_request() function (starts at line 388)
  • Find line 416-417 that extracts pr_url
  • Replace with:
    # OLD: if result.success and result.stdout:
    #         pr_url = result.stdout.strip()
    
    # NEW: Use result_text
    if result.success and result.result_text:
        pr_url = result.result_text.strip()
    
  • Save the file
  • Verify all 7 workflow operations now use result_text

Add Model Tests for result_text Field

  • Open python/tests/agent_work_orders/test_models.py
  • Add new test function at the end of the file:
    def test_command_execution_result_with_result_text():
        """Test CommandExecutionResult includes result_text field"""
        result = CommandExecutionResult(
            success=True,
            stdout='{"type":"result","result":"/feature"}',
            result_text="/feature",
            stderr=None,
            exit_code=0,
            session_id="session-123",
        )
        assert result.result_text == "/feature"
        assert result.stdout == '{"type":"result","result":"/feature"}'
        assert result.success is True
    
    def test_command_execution_result_without_result_text():
        """Test CommandExecutionResult works without result_text (backward compatibility)"""
        result = CommandExecutionResult(
            success=True,
            stdout="raw output",
            stderr=None,
            exit_code=0,
        )
        assert result.result_text is None
        assert result.stdout == "raw output"
    
  • Save the file

Add Agent Executor Tests for Result Extraction

  • Open python/tests/agent_work_orders/test_agent_executor.py
  • Add new test function:
    @pytest.mark.asyncio
    async def test_execute_async_extracts_result_text():
        """Test that result text is extracted from JSONL output"""
        executor = AgentCLIExecutor()
    
        # Mock subprocess that returns JSONL with result
        jsonl_output = '{"type":"session_started","session_id":"test-123"}\n{"type":"result","result":"/feature","is_error":false}'
    
        with patch("asyncio.create_subprocess_shell") as mock_subprocess:
            mock_process = AsyncMock()
            mock_process.communicate = AsyncMock(return_value=(jsonl_output.encode(), b""))
            mock_process.returncode = 0
            mock_subprocess.return_value = mock_process
    
            result = await executor.execute_async(
                "claude --print",
                "/tmp/test",
                prompt_text="test prompt",
                work_order_id="wo-test"
            )
    
            assert result.success is True
            assert result.result_text == "/feature"
            assert result.session_id == "test-123"
            assert '{"type":"result"' in result.stdout
    
  • Save the file

Add Agent Executor Tests for Argument Replacement

  • Still in python/tests/agent_work_orders/test_agent_executor.py
  • Add new test functions:
    def test_build_command_replaces_arguments_placeholder():
        """Test that $ARGUMENTS placeholder is replaced with actual arguments"""
        executor = AgentCLIExecutor()
    
        # Create temp command file with $ARGUMENTS
        import tempfile
        with tempfile.NamedTemporaryFile(mode='w', suffix='.md', delete=False) as f:
            f.write("Classify this issue:\\n\\n$ARGUMENTS")
            temp_file = f.name
    
        try:
            command, prompt = executor.build_command(
                temp_file,
                args=['{"title": "Add feature", "body": "description"}']
            )
    
            assert "$ARGUMENTS" not in prompt
            assert '{"title": "Add feature"' in prompt
            assert "Classify this issue:" in prompt
        finally:
            import os
            os.unlink(temp_file)
    
    def test_build_command_replaces_positional_arguments():
        """Test that $1, $2, $3 are replaced with positional arguments"""
        executor = AgentCLIExecutor()
    
        import tempfile
        with tempfile.NamedTemporaryFile(mode='w', suffix='.md', delete=False) as f:
            f.write("Issue: $1\\nWorkOrder: $2\\nData: $3")
            temp_file = f.name
    
        try:
            command, prompt = executor.build_command(
                temp_file,
                args=["42", "wo-test", '{"title":"Test"}']
            )
    
            assert "$1" not in prompt
            assert "$2" not in prompt
            assert "$3" not in prompt
            assert "Issue: 42" in prompt
            assert "WorkOrder: wo-test" in prompt
            assert 'Data: {"title":"Test"}' in prompt
        finally:
            import os
            os.unlink(temp_file)
    
  • Save the file

Update All Workflow Operations Test Mocks

  • Open python/tests/agent_work_orders/test_workflow_operations.py
  • Find every CommandExecutionResult mock and add result_text field
  • Update test_classify_issue_success (line 27-34):
    mock_executor.execute_async = AsyncMock(
        return_value=CommandExecutionResult(
            success=True,
            stdout='{"type":"result","result":"/feature"}',
            result_text="/feature",  # ADD THIS
            stderr=None,
            exit_code=0,
            session_id="session-123",
        )
    )
    
  • Repeat for all other test functions:
    • test_build_plan_feature_success (line 93-100) - add result_text="Plan created successfully"
    • test_build_plan_bug_success (line 128-135) - add result_text="Bug plan created"
    • test_find_plan_file_success (line 180-187) - add result_text="specs/issue-42-wo-test-planner-feature.md"
    • test_find_plan_file_not_found (line 213-220) - add result_text="0"
    • test_implement_plan_success (line 243-250) - add result_text="Implementation completed"
    • test_generate_branch_success (line 274-281) - add result_text="feat-issue-42-wo-test-add-feature"
    • test_create_commit_success (line 307-314) - add result_text="implementor: feat: add user authentication"
    • test_create_pull_request_success (line 339-346) - add result_text="https://github.com/owner/repo/pull/123"
  • Save the file

Run Model Unit Tests

  • Execute: cd python && uv run pytest tests/agent_work_orders/test_models.py::test_command_execution_result_with_result_text -v
  • Verify test passes
  • Execute: cd python && uv run pytest tests/agent_work_orders/test_models.py::test_command_execution_result_without_result_text -v
  • Verify test passes

Run Agent Executor Unit Tests

  • Execute: cd python && uv run pytest tests/agent_work_orders/test_agent_executor.py::test_execute_async_extracts_result_text -v
  • Verify result extraction test passes
  • Execute: cd python && uv run pytest tests/agent_work_orders/test_agent_executor.py::test_build_command_replaces_arguments_placeholder -v
  • Verify $ARGUMENTS replacement test passes
  • Execute: cd python && uv run pytest tests/agent_work_orders/test_agent_executor.py::test_build_command_replaces_positional_arguments -v
  • Verify positional argument test passes

Run Workflow Operations Unit Tests

  • Execute: cd python && uv run pytest tests/agent_work_orders/test_workflow_operations.py -v
  • Verify all 9+ tests pass with updated mocks
  • Check for any assertion failures related to result_text

Run Full Test Suite

  • Execute: cd python && uv run pytest tests/agent_work_orders/ -v
  • Target: 100% of tests pass
  • If any tests fail, fix them immediately before proceeding
  • Execute: cd python && uv run pytest tests/agent_work_orders/ --cov=src/agent_work_orders --cov-report=term-missing
  • Verify >80% coverage for modified files

Run Type Checking

  • Execute: cd python && uv run mypy src/agent_work_orders/models.py
  • Verify no type errors in models
  • Execute: cd python && uv run mypy src/agent_work_orders/agent_executor/agent_cli_executor.py
  • Verify no type errors in executor
  • Execute: cd python && uv run mypy src/agent_work_orders/workflow_engine/workflow_operations.py
  • Verify no type errors in workflow operations

Run Linting

  • Execute: cd python && uv run ruff check src/agent_work_orders/models.py
  • Execute: cd python && uv run ruff check src/agent_work_orders/agent_executor/agent_cli_executor.py
  • Execute: cd python && uv run ruff check src/agent_work_orders/workflow_engine/workflow_operations.py
  • Fix any linting issues if found

Run End-to-End Integration Test

  • Start server: cd python && uv run uvicorn src.agent_work_orders.main:app --port 8888 &
  • Wait for startup: sleep 5
  • Test health: curl http://localhost:8888/health
  • Create work order:
    WORK_ORDER_ID=$(curl -X POST http://localhost:8888/agent-work-orders \
      -H "Content-Type: application/json" \
      -d '{
        "repository_url": "https://github.com/Wirasm/dylan.git",
        "sandbox_type": "git_branch",
        "workflow_type": "agent_workflow_plan",
        "github_issue_number": "1"
      }' | jq -r '.agent_work_order_id')
    echo "Work Order ID: $WORK_ORDER_ID"
    
  • Monitor: sleep 30
  • Check status: curl http://localhost:8888/agent-work-orders/$WORK_ORDER_ID | jq
  • Check steps: curl http://localhost:8888/agent-work-orders/$WORK_ORDER_ID/steps | jq '.steps[] | {step: .step, agent: .agent_name, success: .success, output: .output[:50]}'
  • Verify:
    • Classifier step shows output: "/feature" (NOT JSONL)
    • Planner step succeeded (received clean classification)
    • All subsequent steps executed
    • Final status is "completed" or shows specific error
  • Inspect logs: ls -la /tmp/agent-work-orders/*/
  • Check artifacts: cat /tmp/agent-work-orders/$WORK_ORDER_ID/outputs/*.jsonl | grep '"result"'
  • Stop server: pkill -f "uvicorn.*8888"

Validation Commands

Execute every command to validate the feature works correctly with zero regressions.

  • cd python && uv run pytest tests/agent_work_orders/test_models.py -v - Verify model tests pass
  • cd python && uv run pytest tests/agent_work_orders/test_agent_executor.py -v - Verify executor tests pass
  • cd python && uv run pytest tests/agent_work_orders/test_workflow_operations.py -v - Verify workflow operations tests pass
  • cd python && uv run pytest tests/agent_work_orders/ -v - All agent work orders tests
  • cd python && uv run pytest - Entire backend test suite (zero regressions)
  • cd python && uv run mypy src/agent_work_orders/ - Type check all modified code
  • cd python && uv run ruff check src/agent_work_orders/ - Lint all modified code
  • End-to-end test: Start server and create work order as documented above
  • Verify classifier returns clean "/feature" not JSONL
  • Verify planner receives correct classification
  • Verify workflow completes successfully

Testing Strategy

Unit Tests

CommandExecutionResult Model

  • Test result_text field accepts string values
  • Test result_text field accepts None (optional)
  • Test model serialization with result_text
  • Test backward compatibility (result_text=None works)

AgentCLIExecutor Result Extraction

  • Test extraction from valid JSONL with result field
  • Test extraction when result is string
  • Test extraction when result is number (should stringify)
  • Test extraction when result is object (should stringify)
  • Test no extraction when JSONL has no result message
  • Test no extraction when result message missing "result" field
  • Test handles malformed JSONL gracefully

AgentCLIExecutor Argument Replacement

  • Test $ARGUMENTS with single argument
  • Test $ARGUMENTS with multiple arguments
  • Test $1, $2, $3 positional replacement
  • Test mixed placeholders in one file
  • Test no replacement when args is None
  • Test no replacement when args is empty
  • Test command without placeholders

Workflow Operations

  • Test each operation uses result_text
  • Test each operation handles None result_text
  • Test fallback to stdout works
  • Test clean output flows to next step

Integration Tests

Complete Workflow

  • Test full workflow with real JSONL parsing
  • Test classifier → planner data flow
  • Test each step receives clean input
  • Test step history contains result_text values
  • Test error handling when result_text is None

Error Scenarios

  • Test malformed JSONL output
  • Test missing result field in JSONL
  • Test agent returns error in result
  • Test $ARGUMENTS not in command file (should still work)

Edge Cases

JSONL Parsing

  • Result message not last in stream
  • Multiple result messages
  • Result with is_error:true
  • Result value is null
  • Result value is boolean true/false
  • Result value is large object
  • Result value contains newlines

Argument Replacement

  • $ARGUMENTS appears multiple times
  • Positional args exceed provided args count
  • Args contain special characters
  • Args contain literal $ character
  • Very long arguments (>10KB)
  • Empty string arguments

Backward Compatibility

  • Old commands without placeholders
  • Workflow handles result_text=None gracefully
  • stdout still accessible for debugging

Acceptance Criteria

Core Functionality:

  • CommandExecutionResult model has result_text field
  • result_text extracted from JSONL "result" field
  • $ARGUMENTS placeholder replaced with arguments
  • $1, $2, $3 positional placeholders replaced
  • All 7 workflow operations use result_text
  • stdout preserved for debugging (backward compatible)

Test Results:

  • All existing tests pass (zero regressions)
  • New model tests pass
  • New executor tests pass
  • Updated workflow operations tests pass
  • >80% test coverage for modified files

Code Quality:

  • Type checking passes with no errors
  • Linting passes with no warnings
  • Code follows existing patterns
  • Docstrings updated where needed

End-to-End:

  • Classifier returns clean output: "/feature", "/bug", or "/chore"
  • Planner receives correct issue class (not JSONL)
  • All workflow steps execute successfully
  • Step history shows clean result_text values
  • Logs show result extraction working
  • Complete workflow creates PR

Validation Commands

# Unit Tests
cd python && uv run pytest tests/agent_work_orders/test_models.py -v
cd python && uv run pytest tests/agent_work_orders/test_agent_executor.py -v
cd python && uv run pytest tests/agent_work_orders/test_workflow_operations.py -v

# Full Suite
cd python && uv run pytest tests/agent_work_orders/ -v --tb=short
cd python && uv run pytest tests/agent_work_orders/ --cov=src/agent_work_orders --cov-report=term-missing
cd python && uv run pytest  # All backend tests

# Quality Checks
cd python && uv run mypy src/agent_work_orders/
cd python && uv run ruff check src/agent_work_orders/

# Integration Test
cd python && uv run uvicorn src.agent_work_orders.main:app --port 8888 &
sleep 5
curl http://localhost:8888/health | jq

# Create test work order
WORK_ORDER=$(curl -X POST http://localhost:8888/agent-work-orders \
  -H "Content-Type: application/json" \
  -d '{"repository_url":"https://github.com/Wirasm/dylan.git","sandbox_type":"git_branch","workflow_type":"agent_workflow_plan","github_issue_number":"1"}' \
  | jq -r '.agent_work_order_id')

echo "Work Order: $WORK_ORDER"
sleep 20

# Check execution
curl http://localhost:8888/agent-work-orders/$WORK_ORDER | jq
curl http://localhost:8888/agent-work-orders/$WORK_ORDER/steps | jq '.steps[] | {step, agent_name, success, output}'

# Verify logs
ls /tmp/agent-work-orders/*/outputs/
cat /tmp/agent-work-orders/*/outputs/*.jsonl | grep '"result"'

# Cleanup
pkill -f "uvicorn.*8888"

Notes

Design Decisions:

  • Preserve stdout containing raw JSONL for debugging
  • result_text is the new preferred field for clean output
  • Fallback to stdout in some workflow operations (defensive)
  • Support both $ARGUMENTS and $1, $2, $3 for flexibility
  • Backward compatible - optional fields, graceful fallbacks

Why This Fixes the Issue:

Before Fix:
  Classifier stdout: '{"type":"result","result":"/feature","is_error":false}'
  Planner receives:  '{"type":"result","result":"/feature","is_error":false}' ❌
  Error: "Unknown issue class: {JSONL...}"

After Fix:
  Classifier stdout:      '{"type":"result","result":"/feature","is_error":false}'
  Classifier result_text: "/feature"
  Planner receives:       "/feature" ✅
  Success: Clean classification flows to next step

Claude CLI JSONL Format:

{"type":"session_started","session_id":"abc-123"}
{"type":"text","text":"I'm analyzing..."}
{"type":"result","result":"/feature","is_error":false}

Future Improvements:

  • Add result_json field for structured data
  • Support more placeholder patterns (${ISSUE_NUMBER}, etc.)
  • Validate command files have required placeholders
  • Add metrics for result_text extraction success rate
  • Consider streaming result extraction for long-running agents

Migration Path:

  1. Add result_text field (backward compatible)
  2. Extract in executor (backward compatible)
  3. Update workflow operations (backward compatible - fallback)
  4. Deploy and validate
  5. Future: Remove stdout usage entirely