Commit Graph

117 Commits

Author SHA1 Message Date
sean-eskerium
068018a6a3 Update work order table to show branch name, and the commit operations count bug that is showing commits of the whole main branch vs. the work order changes. 2025-10-31 22:42:00 -04:00
sean-eskerium
a292ce2dfb Code review updates and moving the prp-review step to before the Commit. 2025-10-31 22:21:40 -04:00
sean-eskerium
95791456cd Merge remote-tracking branch 'origin/feat/agent_work_orders' into ui/agent-work-order 2025-10-25 14:32:33 -04:00
Rasmus Widing
bd6613014b feat: add supabase persistence for agent work orders 2025-10-24 20:37:57 +03:00
Rasmus Widing
71393520dc feat: add repository configuration system with defensive validation
- Add archon_configured_repositories table migration with production-ready sandbox type constraints
- Implement SupabaseWorkOrderRepository for CRUD operations with comprehensive error handling
- Add defensive validation in _row_to_model with detailed logging for invalid enum values
- Implement granular exception handling (409 duplicates, 422 validation, 502 GitHub API errors)
- Document async/await pattern for interface consistency across repository implementations
- Add Supabase health check to verify table existence
- Expand test coverage from 10 to 17 tests with error handling and edge case validation
- Add supabase dependency to agent-work-orders group
- Enable ENABLE_AGENT_WORK_ORDERS flag in docker-compose for production deployment
2025-10-24 20:01:15 +03:00
Rasmus Widing
6a8e784aab feat: make agent work orders an optional feature
Add ENABLE_AGENT_WORK_ORDERS configuration flag to allow disabling the agent work orders microservice. Service discovery now gracefully handles unavailable services, and health checks return appropriate status when feature is disabled.

Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-24 15:56:34 +03:00
Rasmus Widing
8728c67448 fix: linting issues in agent work orders tests
- Sort imports consistently
- Remove unused imports (pytest, MagicMock, patch, etc.)
- Update to datetime.UTC alias from timezone.utc
- Fix formatting and organization issues
2025-10-24 00:07:32 +03:00
Rasmus Widing
d80a12f395 refactor: port allocation from dual ports to flexible port ranges
- Change from fixed backend/frontend ports to 10-port ranges per work order
- Support 20 concurrent work orders (200 ports: 9000-9199)
- Add port availability checking with flexible allocation
- Make git_worktree default sandbox type
- Standardize API routes with /api/ prefix
- Add comprehensive port allocation tests
- Update environment file generation with PORT_0-PORT_9 variables
- Maintain backward compatibility with BACKEND_PORT/FRONTEND_PORT aliases
2025-10-23 23:17:43 +03:00
Rasmus Widing
799d5a9dd7 Revert "chore: remove example workflow directory"
This reverts commit c2a568e08c.
2025-10-23 22:38:46 +03:00
Rasmus Widing
c2a568e08c chore: remove example workflow directory 2025-10-23 22:37:15 +03:00
sean-eskerium
a378c43cee Merge pull request #810 from coleam00/fix/bug-report-repository-url
fix: Update bug report to use centralized repository configuration
2025-10-23 06:58:39 -04:00
Rasmus Widing
b1a5c06844 feat: add github authentication for agent work orders pr creation 2025-10-23 12:57:12 +03:00
Rasmus Widing
f07cefd1a1 feat: add agent work orders microservice with hybrid deployment 2025-10-23 12:46:57 +03:00
leex279
13796abbe8 feat: Improve discovery system with SSRF protection and optimize file detection
## Backend Improvements

### Discovery Service
- Fix SSRF protection: Use requests.Session() for max_redirects parameter
- Add comprehensive IP validation (_is_safe_ip, _resolve_and_validate_hostname)
- Add hostname DNS resolution validation before requests
- Fix llms.txt link following to crawl ALL same-domain pages (not just llms.txt files)
- Remove unused file variants: llms.md, llms.markdown, sitemap_index.xml, sitemap-index.xml
- Optimize DISCOVERY_PRIORITY based on real-world usage research
- Update priority: llms.txt > llms-full.txt > sitemap.xml > robots.txt

### URL Handler
- Fix .well-known path to be case-sensitive per RFC 8615
- Remove llms.md, llms.markdown, llms.mdx from variant detection
- Simplify link collection patterns to only .txt files (most common)
- Update llms_variants list to only include spec-compliant files

### Crawling Service
- Add tldextract for proper root domain extraction (handles .co.uk, .com.au, etc.)
- Replace naive domain extraction with robust get_root_domain() function
- Add tldextract>=5.0.0 to dependencies

## Frontend Improvements

### Type Safety
- Extend ActiveOperation type with discovery fields (discovered_file, discovered_file_type, linked_files)
- Remove all type casting (operation as any) from CrawlingProgress component
- Add proper TypeScript types for discovery information

### Security
- Create URL validation utility (urlValidation.ts)
- Only render clickable links for validated HTTP/HTTPS URLs
- Reject unsafe protocols (javascript:, data:, vbscript:, file:)
- Display invalid URLs as plain text instead of links

## Testing
- Update test mocks to include history and url attributes for redirect checking
- Fix .well-known case sensitivity tests (must be lowercase per RFC 8615)
- Update discovery priority tests to match new order
- Remove tests for deprecated file variants

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-19 15:31:08 +02:00
leex279
fe95a0ab00 feat: Add Markdown issue template to support bug report pre-filling
GitHub's YAML templates (.yml) don't support URL parameter pre-filling, but
Markdown templates (.md) do. This adds a structured bug report template that
allows the automated bug reporter to pre-fill all user-submitted data.

Changes:
- Create .github/ISSUE_TEMPLATE/auto_bug_report.md template
- Update bug_report_api.py to use template=auto_bug_report.md parameter
- Update tests to verify template parameter is included in URL
- Add explanatory comments about YAML vs Markdown template differences

Benefits:
- Users see a structured bug report template (not generic issue form)
- All bug report data is pre-filled from the UI form
- Template provides consistent formatting and organization
- Better UX than generic issue creation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-18 12:27:18 +02:00
leex279
2f6ad22235 fix: Remove template parameter from bug report URL to enable field pre-filling
GitHub's issue creation URL does not support the 'template' parameter for
pre-filling fields. When a template is specified, GitHub ignores other URL
parameters like title and body, preventing user-submitted data from being
pre-filled in the issue form.

Changes:
- Remove 'template=bug_report.yml' parameter (non-existent template)
- Remove 'labels' parameter (not supported via URL)
- Keep only 'title' and 'body' parameters for proper pre-filling
- Add explanatory comment about GitHub's URL parameter limitations
- Update tests to verify URL structure (no template parameter)

Now when users click "Report Bug", the GitHub issue form will be properly
pre-filled with their title and detailed bug report information.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-17 23:18:25 +02:00
leex279
a68cbec12e fix: Update bug report to use centralized repository configuration from version.py
Fixes #802

The bug report feature was redirecting users to the old repository URL
(dynamous-community/Archon-V2-Alpha) instead of the current repository
(coleam00/Archon). This occurred because hardcoded default values in the
bug report API were not updated during the Alpha-to-Beta rebranding.

Changes:
- Import GITHUB_REPO_OWNER and GITHUB_REPO_NAME from version.py
- Update GitHubService.__init__() to construct default from constants
- Update health check endpoint to use same centralized default
- Add comprehensive integration tests for bug report URL generation
- Document repository configuration in CLAUDE.md

The fix ensures single source of truth for repository information and
maintains backward compatibility with GITHUB_REPO environment variable override.

All tests pass (7/7) validating correct repository URL usage.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-17 23:11:26 +02:00
leex279
8ab6c754fe fix: Improve path detection and add progress validation
- Replace dot-based file detection with explicit extension checking
  in discovery service to correctly handle versioned directories
  like /docs.v2
- Add comprehensive validation for start_progress and end_progress
  parameters in crawl_markdown_file to ensure they are valid
  numeric values in range [0, 100] with start < end
- Validation runs before any async work or progress reporting begins
- Clear error messages indicate which parameter is invalid and why

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-17 22:57:55 +02:00
leex279
cdf4323534 feat: Implement llms.txt link following with discovery priority fix
Implements complete llms.txt link following functionality that crawls
linked llms.txt files on the same domain/subdomain, along with critical
bug fixes for discovery priority and variant detection.

Backend Core Functionality:
- Add _is_same_domain_or_subdomain method for subdomain matching
- Fix is_llms_variant to detect .txt files in /llms/ directories
- Implement llms.txt link extraction and following logic
- Add two-phase discovery: prioritize ALL llms.txt before sitemaps
- Enhanced progress reporting with discovery metadata

Critical Bug Fixes:
- Discovery priority: Fixed sitemap.xml being found before llms.txt
- is_llms_variant: Now matches /llms/guides.txt, /llms/swift.txt, etc.
- These were blocking bugs preventing link following from working

Frontend UI:
- Add discovery and linked files display to CrawlingProgress component
- Update progress types to include discoveredFile, linkedFiles fields
- Add new crawl types: llms_txt_with_linked_files, discovery_*
- Add "discovery" to ProgressStatus enum and active statuses

Testing:
- 8 subdomain matching unit tests (test_crawling_service_subdomain.py)
- 7 integration tests for link following (test_llms_txt_link_following.py)
- All 15 tests passing
- Validated against real Supabase llms.txt structure (1 main + 8 linked)

Files Modified:
Backend:
- crawling_service.py: Core link following logic (lines 744-788, 862-920)
- url_handler.py: Fixed variant detection (lines 633-665)
- discovery_service.py: Two-phase discovery (lines 137-214)
- 2 new comprehensive test files

Frontend:
- progress/types/progress.ts: Updated types with new fields
- progress/components/CrawlingProgress.tsx: Added UI sections

Real-world testing: Crawling supabase.com/docs now discovers
/docs/llms.txt and automatically follows 8 linked llms.txt files,
indexing complete documentation from all files.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-17 22:05:15 +02:00
leex279
a03ce1e4fd fix: Respect llms.txt priority over robots.txt sitemap declarations
Remove the special case that gave robots.txt sitemap declarations highest
priority, which incorrectly overrode the global priority order. Now properly
respects the intended priority: llms-full.txt > llms.txt > llms.md > llms.mdx >
sitemap.xml > robots.txt.

This fixes the issue where supabase.com/docs would return sitemap.xml instead
of llms.txt even though both files exist at /docs/ and llms.txt should have
higher priority.

Changes:
- Removed robots.txt early return that bypassed priority order
- Updated test to verify llms files take precedence over robots.txt sitemaps
- All discovery now follows consistent DISCOVERY_PRIORITY order

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-17 19:37:14 +02:00
leex279
8777e9456c feat: Prioritize same-directory discovery for llms.txt and sitemaps
Improve discovery logic to check the same directory as the base URL first before
falling back to root-level and subdirectories. This ensures files like
https://supabase.com/docs/llms.txt are found when crawling
https://supabase.com/docs.

Changes:
- Check same directory as base_url first (e.g., /docs/llms.txt for /docs URL)
- Fall back to root-level urljoin behavior
- Include base directory name in subdirectory checks (e.g., /docs subdirectory)
- Maintain priority order: same-dir > root > subdirectories
- Log discovery location for better debugging

This addresses cases where documentation directories contain their own llms.txt
or sitemap files that should take precedence over root-level files.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-17 19:26:24 +02:00
leex279
e5160dde5c fix: Address CodeRabbit feedback for discovery service
- Preserve URL case in robots.txt parsing by only lowercasing the sitemap: prefix check
- Add support for relative sitemap paths in robots.txt using urljoin()
- Fix HTML meta tag parsing to use case-insensitive regex instead of lowercasing content
- Add URL scheme validation for discovered sitemaps (http/https only)
- Fix discovery target domain filtering to use discovered URL's domain instead of input URL
- Clean up whitespace and improve dict comprehension usage

These changes improve discovery reliability and prevent URL corruption while maintaining
backward compatibility with existing discovery behavior.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-17 19:03:25 +02:00
Rasmus Widing
edf3a51fa5 fix: resolve agent work orders api routing and defensive coding
- add trailing slashes to agent-work-orders endpoints to prevent FastAPI mount() redirects
- add defensive null check for repository_url in detail view
- fix backend routes to use relative paths with app.mount()
- resolves ERR_NAME_NOT_RESOLVED when accessing agent work orders
2025-10-17 09:52:58 +03:00
Rasmus Widing
fd81505908 refactor: simplify workflow to user-selectable 6-command architecture
Simplifies the workflow orchestrator from hardcoded 11-step atomic operations
to user-selectable 6-command workflow with context passing.

Core changes:
- WorkflowStep enum: 11 steps → 6 commands (create-branch, planning, execute, commit, create-pr, prp-review)
- workflow_orchestrator.py: 367 lines → 200 lines with command stitching loop
- Remove workflow_type field, add selected_commands parameter
- Simplify agent names from 11 → 6 constants
- Remove test/review phase config flags (now optional commands)

Deletions:
- Remove test_workflow.py, review_workflow.py, workflow_phase_tracker.py
- Remove 32 old command files from .claude/commands
- Remove PRPs/specs and PRD files from version control
- Update .gitignore to exclude specs, features, and validation markdown files

Breaking changes:
- AgentWorkOrder no longer has workflow_type field
- CreateAgentWorkOrderRequest now uses selected_commands instead of workflow_type
- WorkflowStep enum values incompatible with old step history

56 files changed, 625 insertions(+), 15,007 deletions(-)
2025-10-16 19:18:32 +03:00
Rasmus Widing
1c0020946b feat: Implement phases 3-5 of compositional workflow architecture
Completes the implementation of test/review workflows with automatic resolution
and integrates them into the orchestrator.

**Phase 3: Test Workflow with Resolution**
- Created test_workflow.py with automatic test failure resolution
- Implements retry loop with max 4 attempts (configurable via MAX_TEST_RETRY_ATTEMPTS)
- Parses JSON test results and resolves failures one by one
- Uses existing test.md and resolve_failed_test.md commands
- Added run_tests() and resolve_test_failure() to workflow_operations.py

**Phase 4: Review Workflow with Resolution**
- Created review_workflow.py with automatic blocker issue resolution
- Implements retry loop with max 3 attempts (configurable via MAX_REVIEW_RETRY_ATTEMPTS)
- Categorizes issues by severity (blocker/tech_debt/skippable)
- Only blocks on blocker issues - tech_debt and skippable allowed to pass
- Created review_runner.md and resolve_failed_review.md commands
- Added run_review() and resolve_review_issue() to workflow_operations.py
- Supports screenshot capture for UI review (configurable via ENABLE_SCREENSHOT_CAPTURE)

**Phase 5: Compositional Integration**
- Updated workflow_orchestrator.py to integrate test and review phases
- Test phase runs between commit and PR creation (if ENABLE_TEST_PHASE=true)
- Review phase runs after tests (if ENABLE_REVIEW_PHASE=true)
- Both phases are optional and controlled by config flags
- Step history tracks test and review execution results
- Proper error handling and logging for all phases

**Supporting Changes**
- Updated agent_names.py to add REVIEWER constant
- Added configuration flags to config.py for test/review phases
- All new code follows structured logging patterns
- Maintains compatibility with existing workflow steps

**Files Changed**: 19 files, 3035+ lines
- New: test_workflow.py, review_workflow.py, review commands
- Modified: orchestrator, workflow_operations, agent_names, config
- Phases 1-2 files (worktree, state, port allocation) also staged

The implementation is complete and ready for testing. All phases now support
parallel execution via worktree isolation with deterministic port allocation.
2025-10-16 19:18:03 +03:00
Rasmus Widing
9a60d6ae89 sauce aow 2025-10-16 19:17:18 +03:00
leex279
968e5b73fe Add SSL verification and response size limits to discovery service
- Enable SSL certificate verification (verify=True) for all HTTP requests
- Implement streaming with size limits (10MB default) to prevent memory exhaustion
- Add _read_response_with_limit() helper for secure response reading
- Update all test mocks to support streaming API with iter_content()
- Fix test assertions to expect new security parameters
- Enforce deterministic rounding in progress mapper tests

Security improvements:
- Prevents MITM attacks through SSL verification
- Guards against DoS via oversized responses
- Ensures proper resource cleanup with response.close()

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-14 22:31:19 +02:00
leex279
d696918ff0 Merge main into feature/automatic-discovery-llms-sitemap-430
Resolved merge conflicts by integrating features from both branches:
- Added page_storage_ops service initialization from main
- Merged link text extraction with discovery mode features
- Preserved discovery single-file mode and domain filtering
- Maintained link text fallbacks for title extraction

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-11 09:31:24 +02:00
Cole Medin
77e9342c27 Updating title exxtraction for llms.txt 2025-10-10 18:16:03 -05:00
sean-eskerium
7c3823e08f Fixes: crawl code storage issue with <think> tags for ollama models. (#775)
* Fixes: crawl code storage issue with <think> tags for ollama models.

* updates from code rabbit review
2025-10-10 17:09:53 -05:00
Cole Medin
bfd0a84f64 RAG Enhancements (Page Level Retrieval) (#767)
* Initial commit for RAG by document

* Phase 2

* Adding migrations

* Fixing page IDs for chunk metadata

* Fixing unit tests, adding tool to list pages for source

* Fixing page storage upsert issues

* Max file length for retrieval

* Fixing title issue

* Fixing tests
2025-10-09 19:39:27 -05:00
Wirasm
489415d723 Fix: Database timeout when deleting large sources (#737)
* fix: implement CASCADE DELETE for source deletion timeout issue

- Add migration 009 to add CASCADE DELETE constraints to foreign keys
- Simplify delete_source() to only delete parent record
- Database now handles cascading deletes efficiently
- Fixes timeout issues when deleting sources with thousands of pages

* chore: update complete_setup.sql to include CASCADE DELETE constraints

- Add ON DELETE CASCADE to foreign keys in initial setup
- Include migration 009 in the migrations tracking
- Ensures new installations have CASCADE DELETE from the start
2025-10-09 17:52:06 +03:00
Josh
a580fdfe66 Feature/LLM-Providers-UI-Polished (#736)
* Add Anthropic and Grok provider support

* feat: Add crucial GPT-5 and reasoning model support for OpenRouter

- Add requires_max_completion_tokens() function for GPT-5, o1, o3, Grok-3 series
- Add prepare_chat_completion_params() for reasoning model compatibility
- Implement max_tokens → max_completion_tokens conversion for reasoning models
- Add temperature handling for reasoning models (must be 1.0 default)
- Enhanced provider validation and API key security in provider endpoints
- Streamlined retry logic (3→2 attempts) for faster issue detection
- Add failure tracking and circuit breaker analysis for debugging
- Support OpenRouter format detection (openai/gpt-5-nano, openai/o1-mini)
- Improved Grok provider empty response handling with structured fallbacks
- Enhanced contextual embedding with provider-aware model selection

Core provider functionality:
- OpenRouter, Grok, Anthropic provider support with full embedding integration
- Provider-specific model defaults and validation
- Secure API connectivity testing endpoints
- Provider context passing for code generation workflows

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fully working model providers, addressing securtiy and code related concerns, throughly hardening our code

* added multiprovider support, embeddings model support, cleaned the pr, need to fix health check, asnyico tasks errors, and contextual embeddings error

* fixed contextual embeddings issue

* - Added inspect-aware shutdown handling so get_llm_client always closes the underlying AsyncOpenAI / httpx.AsyncClient while the loop is   still alive, with defensive logging if shutdown happens late (python/src/server/services/llm_provider_service.py:14, python/src/server/    services/llm_provider_service.py:520).

* - Restructured get_llm_client so client creation and usage live in separate try/finally blocks; fallback clients now close without         logging spurious Error creating LLM client when downstream code raises (python/src/server/services/llm_provider_service.py:335-556).    - Close logic now sanitizes provider names consistently and awaits whichever aclose/close coroutine the SDK exposes, keeping the loop      shut down cleanly (python/src/server/services/llm_provider_service.py:530-559).                                                                                                                                                                                                       Robust JSON Parsing                                                                                                                                                                                                                                                                   - Added _extract_json_payload to strip code fences / extra text returned by Ollama before json.loads runs, averting the markdown-induced   decode errors you saw in logs (python/src/server/services/storage/code_storage_service.py:40-63).                                          - Swapped the direct parse call for the sanitized payload and emit a debug preview when cleanup alters the content (python/src/server/     services/storage/code_storage_service.py:858-864).

* added provider connection support

* added provider api key not being configured warning

* Updated get_llm_client so missing OpenAI keys automatically fall back to Ollama (matching existing tests) and so unsupported providers     still raise the legacy ValueError the suite expects. The fallback now reuses _get_optimal_ollama_instance and rethrows ValueError(OpenAI  API key not found and Ollama fallback failed) when it cant connect.  Adjusted test_code_extraction_source_id.py to accept the new optional argument on the mocked extractor (and confirm its None when         present).

* Resolved a few needed code rabbit suggestion   - Updated the knowledge API key validation to call create_embedding with the provider argument and removed the hard-coded OpenAI fallback  (python/src/server/api_routes/knowledge_api.py).                                                                                           - Broadened embedding provider detection so prefixed OpenRouter/OpenAI model names route through the correct client (python/src/server/    services/embeddings/embedding_service.py, python/src/server/services/llm_provider_service.py).                                             - Removed the duplicate helper definitions from llm_provider_service.py, eliminating the stray docstring that was causing the import-time  syntax error.

* updated via code rabbit PR review, code rabbit in my IDE found no issues and no nitpicks with the updates! what was done:    Credential service now persists the provider under the uppercase key LLM_PROVIDER, matching the read path (no new EMBEDDING_PROVIDER     usage introduced).                                                                                                                          Embedding batch creation stops inserting blank strings, logging failures and skipping invalid items before they ever hit the provider    (python/src/server/services/embeddings/embedding_service.py).                                                                               Contextual embedding prompts use real newline characters everywhereboth when constructing the batch prompt and when parsing the         models response (python/src/server/services/embeddings/contextual_embedding_service.py).                                                   Embedding provider routing already recognizes OpenRouter-prefixed OpenAI models via is_openai_embedding_model; no further change needed  there.                                                                                                                                      Embedding insertion now skips unsupported vector dimensions instead of forcing them into the 1536-column, and the backoff loop uses      await asyncio.sleep so we no longer block the event loop (python/src/server/services/storage/code_storage_service.py).                      RAG settings props were extended to include LLM_INSTANCE_NAME and OLLAMA_EMBEDDING_INSTANCE_NAME, and the debug log no longer prints     API-key prefixes (the rest of the TanStack refactor/EMBEDDING_PROVIDER support remains deferred).

* test fix

* enhanced Openrouters parsing logic to automatically detect reasoning models and parse regardless of json output or not. this commit creates a robust way for archons parsing to work throughly with openrouter automatically, regardless of the model youre using, to ensure proper functionality with out breaking any generation capabilities!

* updated ui llm interface, added seprate embeddings provider, made the system fully capabale of mix and matching llm providers (local and non local) for chat & embeddings. updated the ragsettings.tsx ui mainly, along with core functionality

* added warning labels and updated ollama health checks

* ready for review, fixed som error warnings and consildated ollama status health checks

* fixed FAILED test_async_embedding_service.py

* code rabbit fixes

* Separated the code-summary LLM provider from the embedding provider, so code example storage now forwards a dedicated embedding provider override end-to-end without hijacking the embedding pipeline. this fixes code rabbits (Preserve provider override in create_embeddings_batch) suggesting

* - Swapped API credential storage to booleans so decrypted keys never sit in React state (archon-ui-main/src/components/
  settings/RAGSettings.tsx).
  - Normalized Ollama instance URLs and gated the metrics effect on real state changes to avoid mis-counts and duplicate
  fetches (RAGSettings.tsx).
  - Tightened crawl progress scaling and indented-block parsing to handle min_length=None safely (python/src/server/
  services/crawling/code_extraction_service.py:160, python/src/server/services/crawling/code_extraction_service.py:911).
  - Added provider-agnostic embedding rate-limit retries so Google and friends back off gracefully (python/src/server/
  services/embeddings/embedding_service.py:427).
  - Made the orchestration registry async + thread-safe and updated every caller to await it (python/src/server/services/
  crawling/crawling_service.py:34, python/src/server/api_routes/knowledge_api.py:1291).

* Update RAGSettings.tsx - header for 'LLM Settings' is now 'LLM Provider Settings'

* (RAG Settings)

  - Ollama Health Checks & Metrics
      - Added a 10-second timeout to the health fetch so it doesn't hang.
      - Adjusted logic so metric refreshes run for embedding-only Ollama setups too.
      - Initial page load now checks Ollama if either chat or embedding provider uses it.
      - Metrics and alerts now respect which provider (chat/embedding) is currently selected.
  - Provider Sync & Alerts
      - Fixed a sync bug so the very first provider change updates settings as expected.
      - Alerts now track the active provider (chat vs embedding) rather than only the LLM provider.
      - Warnings about missing credentials now skip whichever provider is currently selected.
  - Modals & Types
      - Normalize URLs before handing them to selection modals to keep consistent data.
      - Strengthened helper function types (getDisplayedChatModel, getModelPlaceholder, etc.).

 (Crawling Service)

  - Made the orchestration registry lock lazy-initialized to avoid issues in Python 3.12 and wrapped registry commands
  (register, unregister) in async calls. This keeps things thread-safe even during concurrent crawling and cancellation.

* - migration/complete_setup.sql:101 seeds Google/OpenRouter/Anthropic/Grok API key rows so fresh databases expose every
  provider by default.
  - migration/0.1.0/009_add_provider_placeholders.sql:1 backfills the same rows for existing Supabase instances and
  records the migration.
  - archon-ui-main/src/components/settings/RAGSettings.tsx:121 introduces a shared credentialprovider map,
  reloadApiCredentials runs through all five providers, and the status poller includes the new keys.
  - archon-ui-main/src/components/settings/RAGSettings.tsx:353 subscribes to the archon:credentials-updated browser event
  so adding/removing a key immediately refetches credential status and pings the corresponding connectivity test.
  - archon-ui-main/src/components/settings/RAGSettings.tsx:926 now treats missing Anthropic/OpenRouter/Grok keys as
  missing, preventing stale connected badges when a key is removed.

* - archon-ui-main/src/components/settings/RAGSettings.tsx:90 adds a simple display-name map and reuses one red alert
  style.
  - archon-ui-main/src/components/settings/RAGSettings.tsx:1016 now shows exactly one red banner when the active provider
  - Removed the old duplicate Missing API Key Configuration block, so the panel no longer stacks two warnings.

* Update credentialsService.ts default model

* updated the google embedding adapter for multi dimensional rag querying

* thought this micro fix in the google embedding pushed with the embedding update the other day, it didnt. pushing now

---------

Co-authored-by: Chillbruhhh <joshchesser97@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-10-05 13:49:09 -05:00
leex279
d3cecd2b1d Merge branch 'main' into feature/automatic-discovery-llms-sitemap-430 2025-09-22 22:35:36 +02:00
Cole Medin
3ff3f7f2dc Migrations and version APIs (#718)
* Preparing migration folder for the migration alert implementation

* Migrations and version APIs initial

* Touching up update instructions in README and UI

* Unit tests for migrations and version APIs

* Splitting up the Ollama migration scripts

* Removing temporary PRPs

---------

Co-authored-by: Rasmus Widing <rasmus.widing@gmail.com>
2025-09-22 12:25:58 +03:00
Josh
394ac1befa Feat:Openrouter/Anthropic/grok-support (#231)
* Add Anthropic and Grok provider support

* feat: Add crucial GPT-5 and reasoning model support for OpenRouter

- Add requires_max_completion_tokens() function for GPT-5, o1, o3, Grok-3 series
- Add prepare_chat_completion_params() for reasoning model compatibility
- Implement max_tokens → max_completion_tokens conversion for reasoning models
- Add temperature handling for reasoning models (must be 1.0 default)
- Enhanced provider validation and API key security in provider endpoints
- Streamlined retry logic (3→2 attempts) for faster issue detection
- Add failure tracking and circuit breaker analysis for debugging
- Support OpenRouter format detection (openai/gpt-5-nano, openai/o1-mini)
- Improved Grok provider empty response handling with structured fallbacks
- Enhanced contextual embedding with provider-aware model selection

Core provider functionality:
- OpenRouter, Grok, Anthropic provider support with full embedding integration
- Provider-specific model defaults and validation
- Secure API connectivity testing endpoints
- Provider context passing for code generation workflows

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fully working model providers, addressing securtiy and code related concerns, throughly hardening our code

* added multiprovider support, embeddings model support, cleaned the pr, need to fix health check, asnyico tasks errors, and contextual embeddings error

* fixed contextual embeddings issue

* - Added inspect-aware shutdown handling so get_llm_client always closes the underlying AsyncOpenAI / httpx.AsyncClient while the loop is   still alive, with defensive logging if shutdown happens late (python/src/server/services/llm_provider_service.py:14, python/src/server/    services/llm_provider_service.py:520).

* - Restructured get_llm_client so client creation and usage live in separate try/finally blocks; fallback clients now close without         logging spurious Error creating LLM client when downstream code raises (python/src/server/services/llm_provider_service.py:335-556).    - Close logic now sanitizes provider names consistently and awaits whichever aclose/close coroutine the SDK exposes, keeping the loop      shut down cleanly (python/src/server/services/llm_provider_service.py:530-559).                                                                                                                                                                                                       Robust JSON Parsing                                                                                                                                                                                                                                                                   - Added _extract_json_payload to strip code fences / extra text returned by Ollama before json.loads runs, averting the markdown-induced   decode errors you saw in logs (python/src/server/services/storage/code_storage_service.py:40-63).                                          - Swapped the direct parse call for the sanitized payload and emit a debug preview when cleanup alters the content (python/src/server/     services/storage/code_storage_service.py:858-864).

* added provider connection support

* added provider api key not being configured warning

* Updated get_llm_client so missing OpenAI keys automatically fall back to Ollama (matching existing tests) and so unsupported providers     still raise the legacy ValueError the suite expects. The fallback now reuses _get_optimal_ollama_instance and rethrows ValueError(OpenAI  API key not found and Ollama fallback failed) when it cant connect.  Adjusted test_code_extraction_source_id.py to accept the new optional argument on the mocked extractor (and confirm its None when         present).

* Resolved a few needed code rabbit suggestion   - Updated the knowledge API key validation to call create_embedding with the provider argument and removed the hard-coded OpenAI fallback  (python/src/server/api_routes/knowledge_api.py).                                                                                           - Broadened embedding provider detection so prefixed OpenRouter/OpenAI model names route through the correct client (python/src/server/    services/embeddings/embedding_service.py, python/src/server/services/llm_provider_service.py).                                             - Removed the duplicate helper definitions from llm_provider_service.py, eliminating the stray docstring that was causing the import-time  syntax error.

* updated via code rabbit PR review, code rabbit in my IDE found no issues and no nitpicks with the updates! what was done:    Credential service now persists the provider under the uppercase key LLM_PROVIDER, matching the read path (no new EMBEDDING_PROVIDER     usage introduced).                                                                                                                          Embedding batch creation stops inserting blank strings, logging failures and skipping invalid items before they ever hit the provider    (python/src/server/services/embeddings/embedding_service.py).                                                                               Contextual embedding prompts use real newline characters everywhereboth when constructing the batch prompt and when parsing the         models response (python/src/server/services/embeddings/contextual_embedding_service.py).                                                   Embedding provider routing already recognizes OpenRouter-prefixed OpenAI models via is_openai_embedding_model; no further change needed  there.                                                                                                                                      Embedding insertion now skips unsupported vector dimensions instead of forcing them into the 1536-column, and the backoff loop uses      await asyncio.sleep so we no longer block the event loop (python/src/server/services/storage/code_storage_service.py).                      RAG settings props were extended to include LLM_INSTANCE_NAME and OLLAMA_EMBEDDING_INSTANCE_NAME, and the debug log no longer prints     API-key prefixes (the rest of the TanStack refactor/EMBEDDING_PROVIDER support remains deferred).

* test fix

* enhanced Openrouters parsing logic to automatically detect reasoning models and parse regardless of json output or not. this commit creates a robust way for archons parsing to work throughly with openrouter automatically, regardless of the model youre using, to ensure proper functionality with out breaking any generation capabilities!

---------

Co-authored-by: Chillbruhhh <joshchesser97@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-09-22 10:36:30 +03:00
John Fitzpatrick
d4e80a945a fix: Change Ollama default URL to host.docker.internal for Docker compatibility
- Changed default Ollama URL from localhost:11434 to host.docker.internal:11434
- This allows Docker containers to connect to Ollama running on the host machine
- Updated in backend services, frontend components, migration scripts, and documentation
- Most users run Archon in Docker but Ollama as a local binary, making this a better default
2025-09-20 13:36:33 -07:00
Cole Medin
e9c08d2fe9 Updating RAG SIMILARITY_THRESHOLD to 0.05 2025-09-20 13:57:13 -05:00
Cole Medin
b1085a53df Removing junk from sitemap and full site (recursive) crawls (#711)
* Removing junk from sitemap and full site (recursive) crawls

* Small typo fix for result.markdown
2025-09-20 12:58:42 -05:00
Cole Medin
c3be65322b Improved MCP and global rules instructions (#705) 2025-09-20 12:58:20 -05:00
leex279
597fc86c39 fix: Skip link extraction for discovery targets (single-file mode)
When a file is selected through discovery, it should be crawled as a single file without
following any links contained within it. This preserves the efficiency gains of the
discovery feature.

Changes:
- Skip link extraction when is_discovery_target is true for link collection files
- Return sitemap metadata without crawling URLs when is_discovery_target is true
- Add clear logging to indicate single-file mode is active

This ensures discovered files (llms.txt, sitemap.xml, etc.) are processed as single
authoritative sources rather than starting recursive crawls, which aligns with the
PR's objective of efficient single-file discovery and crawling.
2025-09-20 13:55:15 +02:00
leex279
c1677a9220 fix: Skip discovery when user provides direct discovery file URLs
When a user directly provides a URL to a discovery file (sitemap.xml, llms.txt, robots.txt, etc.),
the system now skips the discovery phase and uses the provided file directly.

This prevents unnecessary discovery attempts and respects the user's explicit choice.

Changes:
- Check if the URL is already a discovery target before running discovery
- Skip discovery for: sitemap files, llms variants, robots.txt, well-known files, and any .txt files
- Add logging to indicate when discovery is skipped

Example: When crawling 'xyz.com/sitemap.xml' directly, the system will now use that sitemap
instead of trying to discover a different file like llms.txt
2025-09-20 13:34:07 +02:00
leex279
7f74aea476 fix: Discovery now respects given URL path and fix method signature mismatches
Two critical fixes for the automatic discovery feature:

1. Discovery Service path handling:
   - Changed from always using root domain (/) to respecting given URL path
   - e.g., for 'supabase.com/docs', now checks 'supabase.com/docs/robots.txt'
   - Previously incorrectly checked 'supabase.com/robots.txt'
   - Fixed all urljoin calls to use relative paths instead of absolute paths

2. Method signature mismatches:
   - Removed start_progress and end_progress parameters from crawl_batch_with_progress
   - Removed same parameters from crawl_recursive_with_progress
   - Fixed all calls to these methods to match the strategy implementations

These fixes ensure discovery works correctly for subdirectory URLs and prevents TypeError crashes during crawling.
2025-09-20 13:06:41 +02:00
leex279
2be44d1bf4 fix: Resolve syntax error from merge conflict resolution
- Fixed duplicate else statement in crawling_service.py
- Corrected indentation for extracted links processing logic
- Syntax now validates successfully
2025-09-20 09:33:47 +02:00
leex279
8072066ee6 Merge main into feature/automatic-discovery-llms-sitemap-430
- Resolved conflicts in progress_mapper.py to include discovery stage (3-4%)
- Resolved conflicts in crawling_service.py to maintain both discovery feature and main improvements
- Resolved conflicts in test_progress_mapper.py to include tests for discovery stage
- Kept all optimizations and improvements from main
- Maintained discovery feature functionality with proper integration
2025-09-20 09:27:36 +02:00
DIY Smart Code
6abb8831f7 fix: enable code examples extraction for manual file uploads (#626)
* fix: enable code examples extraction for manual file uploads

- Add extract_code_examples parameter to upload API endpoint (default: true)
- Integrate CodeExtractionService into DocumentStorageService.upload_document()
- Add code extraction after document storage with progress tracking
- Map code extraction progress to 85-95% range in upload progress
- Include code_examples_stored in upload results and logging
- Support extract_code_examples in batch document upload via store_documents()
- Handle code extraction errors gracefully without failing upload

Fixes issue where code examples were only extracted for URL crawls
but not for manual file uploads, despite using the same underlying
CodeExtractionService that supports both HTML and text formats.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: Fix code extraction for uploaded markdown files

- Provide file content in both html and markdown fields for crawl_results
- This ensures markdown files (.md) use the correct text file extraction path
- The CodeExtractionService checks html_content first for text files
- Fixes issue where uploaded .md files didn't extract code examples properly

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* debug: Add comprehensive logging to trace code extraction issue

- Add detailed debug logging to upload code extraction flow
- Log extract_code_examples parameter value
- Log crawl_results structure and content length
- Log progress callbacks from extraction service
- Log final extraction count with more context
- Enhanced error logging with full stack traces

This will help identify exactly where the extraction is failing for uploaded files.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: Remove invalid start_progress/end_progress parameters

The extract_and_store_code_examples method doesn't accept start_progress
and end_progress parameters, causing TypeError during file uploads.

This was the root cause preventing code extraction from working - the
method was failing with a signature mismatch before any extraction logic
could run.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: Preserve code blocks across PDF page boundaries

PDF extraction was breaking markdown code blocks by inserting page separators:

```python
def hello():
--- Page 2 ---
    return "world"
```

This made code blocks unrecognizable to extraction patterns.

Solution:
- Add _preserve_code_blocks_across_pages() function
- Detect split code blocks using regex pattern matching
- Remove page separators that appear within code blocks
- Apply to both pdfplumber and PyPDF2 extraction paths

Now PDF uploads should properly extract code examples just like markdown files.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: Add PDF-specific code extraction for files without markdown delimiters

Root cause: PDFs lose markdown code block delimiters (``` ) during text extraction,
making standard markdown patterns fail to detect code.

Solution:
1. Add _extract_pdf_code_blocks() method with plain-text code detection patterns:
   - Python import blocks and function definitions
   - YAML configuration blocks
   - Shell command sequences
   - Multi-line indented code blocks

2. Add PDF detection logic in _extract_code_blocks_from_documents()
3. Set content_type properly for PDF files in storage service
4. Add debug logging to PDF text extraction process

This allows extraction of code from PDFs that contain technical documentation
with code examples, even when markdown formatting is lost during PDF->text conversion.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: Enhanced PDF code extraction to match markdown extraction results

Problem: PDF extraction only found 1 code example vs 9 from same markdown content
Root cause: PDF extraction patterns were too restrictive and specific

Enhanced solution:
1. **Multi-line code block detection**: Scans for consecutive "code-like" lines
   - Variable assignments, imports, function calls, method calls
   - Includes comments, control flow, YAML keys, shell commands
   - Handles indented continuation lines and empty lines within blocks

2. **Smarter block boundary detection**:
   - Excludes prose lines with narrative indicators
   - Allows natural code block boundaries
   - Preserves context around extracted blocks

3. **Comprehensive pattern coverage**:
   - Python scripts and functions
   - YAML configuration blocks
   - Shell command sequences
   - JavaScript functions

This approach should extract the same ~9 code examples from PDFs as from
markdown files, since it detects code patterns without relying on markdown delimiters.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: Simplify PDF extraction to section-based approach

Changed from complex line-by-line analysis to simpler section-based approach:

1. Split PDF content by natural boundaries (paragraphs, page breaks)
2. Score each section for code vs prose indicators
3. Extract sections that score high on code indicators
4. Add comprehensive logging to debug section classification

Code indicators include:
- Python imports, functions, classes (high weight)
- Variable assignments, method calls (medium weight)
- Package management commands, lambda functions

This should better match the 9 code examples found in markdown version
by treating each logical code segment as a separate extractable block.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: Add explicit HTML file detection and extraction path

Problem: HTML files (0 code examples extracted) weren't being routed to HTML extraction

Root cause: HTML files (.html, .htm) weren't explicitly detected, so they fell through
to generic extraction logic instead of using the robust HTML code block patterns.

Solution:
1. Add HTML file detection: is_html_file = source_url.endswith(('.html', '.htm'))
2. Add explicit HTML extraction path before fallback logic
3. Set proper content_type: "text/html" for HTML files in storage service
4. Ensure HTML content is passed to _extract_html_code_blocks method

The existing HTML extraction already has comprehensive patterns for:
- <pre><code class="lang-python"> (syntax highlighted)
- <pre><code> (standard)
- Various code highlighting libraries (Prism, highlight.js, etc.)

This should now extract all code blocks from HTML files just like URL crawls do.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: Add HTML tag cleanup and proper code extraction for HTML files

Problem: HTML uploads had 0 code examples and contained HTML tags in RAG chunks

Solution:
1. **HTML Tag Cleanup**: Added _clean_html_to_text() function that:
   - Preserves code blocks by temporarily replacing them with placeholders
   - Removes all HTML tags, scripts, styles from prose content
   - Converts HTML structure (headers, paragraphs, lists) to clean text
   - Restores code blocks as markdown format (```language)
   - Cleans HTML entities (&lt;, &gt;, etc.)

2. **Unified Text Processing**: HTML files now processed as text files since they:
   - Have clean text for RAG chunking (no HTML tags)
   - Have markdown-style code blocks for extraction
   - Use existing text file extraction path

3. **Content Type Mapping**: Set text/markdown for cleaned HTML files

Result: HTML files now extract code examples like markdown files while providing
clean text for RAG without HTML markup pollution.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* feat: Add HTML file support to upload dialog

- Add .html and .htm to accepted file types in AddKnowledgeDialog
- Users can now see and select HTML files in the file picker by default
- HTML files will be processed with tag cleanup and code extraction

Previously HTML files had to be manually typed or dragged, now they appear
in the standard file picker alongside other supported formats.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: Prevent HTML extraction path confusion in crawl_results payload

Problem: Setting both 'markdown' and 'html' fields to same content could trigger
HTML extraction regexes when we want text/markdown extraction.

Solution:
- markdown: Contains cleaned plaintext/markdown content
- html: Empty string to prevent HTML extraction path
- content_type: Proper type (application/pdf, text/markdown, text/plain)

This ensures HTML files (now cleaned to markdown format) use the text file
extraction path with backtick patterns, not HTML regex patterns.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-09-18 20:06:48 +03:00
John C Fitzpatrick
9ffca825ff feat: Universal clipboard utility with improved copy functionality (#663)
* feat: Add universal clipboard utility with enhanced copy functionality

- Add comprehensive clipboard utility (src/utils/clipboard.ts) with:
  - Modern Clipboard API with automatic fallback to document.execCommand
  - Cross-browser compatibility and security context handling
  - Detailed error reporting and debugging capabilities
  - Support for secure (HTTPS) and insecure (HTTP/localhost) contexts

- Update components to use new clipboard utility:
  - BugReportModal: Enhanced copy functionality with error handling
  - CodeViewerModal: Improved copy-to-clipboard for code snippets
  - IDEGlobalRules: Robust clipboard operations for rule copying
  - McpConfigSection: Enhanced config and command copying
  - DocumentCard: Reliable ID copying functionality
  - KnowledgeInspector: Improved content copying
  - ButtonPlayground: Enhanced CSS style copying

- Benefits:
  - Consistent copy behavior across all browser environments
  - Better error handling and user feedback
  - Improved accessibility and security context support
  - Enhanced debugging capabilities

Fixes #662

* fix: Improve clipboard utility robustness and add missing host configuration

Clipboard utility improvements:
- Prevent textarea element leak in clipboard fallback with proper cleanup
- Add SSR compatibility with typeof guards for navigator/document
- Use finally block to ensure cleanup in all error cases

Host configuration fixes:
- Update MCP API to use ARCHON_HOST environment variable instead of hardcoded localhost
- Add ARCHON_HOST to docker-compose environment variables
- Ensures MCP configuration shows correct hostname in different deployment environments

Addresses CodeRabbit feedback and restores missing host functionality

* fix: Use relative URLs for Vite proxy in development

- Update getApiUrl() to return empty string when VITE_API_URL is unset
- Ensures all API requests use relative paths (/api/...) in development
- Prevents bypassing Vite proxy with absolute URLs (host:port)
- Maintains existing functionality for explicit VITE_API_URL configuration
- Fix TypeScript error by using bracket notation for environment access

Addresses CodeRabbit feedback about dev setup relying on Vite proxy

* fix: Resolve TypeScript error in API configuration

Use proper type assertion to access VITE_API_URL environment variable

* Address PR review comments: Move clipboard utility to features architecture

- Move clipboard.ts from src/utils/ to src/features/shared/utils/
- Remove copyTextToClipboard backward compatibility function (dead code)
- Update all import statements to use new file location
- Maintain full clipboard functionality with modern API and fallbacks

Addresses:
- Review comment r2348420743: Move to new architecture location
- Review comment r2348422625: Remove unused backward compatibility function

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix SSR safety issue in clipboard utility

- Add typeof navigator !== 'undefined' guard before accessing navigator.clipboard
- Add typeof document !== 'undefined' guard before using document.execCommand fallback
- Ensure proper error handling when running in server-side environment
- Maintain existing functionality while preventing ReferenceError during SSR/prerender

Addresses CodeRabbit feedback: Navigator access needs SSR-safe guards

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-09-18 10:04:46 -07:00
Wirasm
89fa9b4b49 Include description in tasks polling ETag (#698)
* Include description in tasks polling ETag

* Align tasks endpoint headers with HTTP cache expectations
2025-09-18 15:18:53 +03:00
Wirasm
f4ad785439 refactor: Phase 2 Query Keys Standardization - Complete TanStack Query v5 patterns implementation (#692)
* refactor: complete Phase 2 Query Keys Standardization

Standardize query keys across all features following vertical slice architecture,
ensuring they mirror backend API structure exactly with no backward compatibility.

Key Changes:
- Refactor all query key factories to follow consistent patterns
- Move progress feature from knowledge/progress to top-level /features/progress
- Create shared query patterns for consistency (DISABLED_QUERY_KEY, STALE_TIMES)
- Remove all hardcoded stale times and disabled keys
- Update all imports after progress feature relocation

Query Key Factories Standardized:
- projectKeys: removed task-related keys (tasks, taskCounts)
- taskKeys: added dual nature support (global via lists(), project-scoped via byProject())
- knowledgeKeys: removed redundant methods (details, summary)
- progressKeys: new top-level feature with consistent factory
- documentKeys: full factory pattern with versions support
- mcpKeys: complete with health endpoint

Shared Patterns Implementation:
- STALE_TIMES: instant (0), realtime (3s), frequent (5s), normal (30s), rare (5m), static (∞)
- DISABLED_QUERY_KEY: consistent disabled query pattern across all features
- Removed unused createQueryOptions helper

Testing:
- Added comprehensive tests for progress hooks
- Updated all test mocks to include new STALE_TIMES values
- All 81 feature tests passing

Documentation:
- Created QUERY_PATTERNS.md guide for future implementations
- Clear patterns, examples, and migration checklist

Breaking Changes:
- Progress imports moved from knowledge/progress to progress
- Query key structure changes (cache will reset)
- No backward compatibility maintained

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: establish single source of truth for tags in metadata

- Remove ambiguous top-level tags field from KnowledgeItem interface
- Update all UI components to use metadata.tags exclusively
- Fix mutations to correctly update tags in metadata object
- Remove duplicate tags field from backend KnowledgeSummaryService
- Fix test setup issue with QueryClient instance in knowledge tests
- Add TODO comments for filter-blind optimistic updates (Phase 3)

This eliminates the ambiguity identified in Phase 2 where both item.tags
and metadata.tags existed, establishing metadata.tags as the single
source of truth across the entire stack.

* fix: comprehensive progress hooks improvements

- Integrate useSmartPolling for all polling queries
- Fix memory leaks from uncleaned timeouts
- Replace string-based error checking with status codes
- Remove TypeScript any usage with proper types
- Fix unstable dependencies with sorted JSON serialization
- Add staleTime to document queries for consistency

* feat: implement flexible assignee system for dynamic agents

- Changed assignee from restricted enum to flexible string type
- Renamed "AI IDE Agent" to "Coding Agent" for clarity
- Enhanced ComboBox with Radix UI best practices:
  - Full ARIA compliance (roles, labels, keyboard nav)
  - Performance optimizations (memoization, useCallback)
  - Improved UX (auto-scroll, keyboard shortcuts)
  - Fixed event bubbling preventing unintended modal opens
- Updated MCP server docs to reflect flexible assignee capability
- Removed unnecessary UI elements (arrows, helper text)
- Styled ComboBox to match priority selector aesthetic

This allows external MCP clients to create and assign custom sub-agents
dynamically, supporting advanced agent orchestration workflows.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: complete Phase 2 summariesPrefix usage for cache consistency

- Fix all knowledgeKeys.summaries() calls to use summariesPrefix() for operations targeting multiple summary caches
- Update cancelQueries, getQueriesData, setQueriesData, invalidateQueries, and refetchQueries calls
- Fix critical cache invalidation bug where filtered summaries weren't being cleared
- Update test expectations to match new factory patterns
- Address CodeRabbit review feedback on cache stability issues

This completes the Phase 2 Query Keys Standardization work documented in PRPs/local/frontend-state-management-refactor.md

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: update MCP task tools documentation for Coding Agent rename

Update task assignee documentation from "AI IDE Agent" to "Coding Agent"
to match frontend changes for consistency across the system.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: implement assignee filtering in MCP find_tasks function

Add missing implementation for filter_by="assignee" that was documented
but not coded. The filter now properly passes the assignee parameter to
the backend API, matching the existing pattern used for status filtering.

Fixes documentation/implementation mismatch identified by CodeRabbit.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: Phase 2 cleanup - address review comments and improve code quality

Changes made:
- Reduced smart polling interval from 60s to 5s for background tabs (better responsiveness)
- Fixed cache coherence bug in knowledge queries (missing limit parameter)
- Standardized "Coding Agent" naming (was inconsistently "AI IDE Agent")
- Improved task queries with 2s polling, type safety, and proper invalidation
- Enhanced combobox accessibility with proper ARIA attributes and IDs
- Delegated useCrawlProgressPolling to useActiveOperations (removed duplication)
- Added exact: true to progress query removals (prevents sibling removal)
- Fixed invalid Tailwind class ml-4.5 to ml-4

All changes align with Phase 2 query key standardization goals and improve
overall code quality, accessibility, and performance.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-09-18 11:05:03 +03:00
DIY Smart Code
c45842f0bb feat: decouple task priority from task order (#652)
* feat: decouple task priority from task order

This implements a dedicated priority system that operates independently
from the existing task_order system, allowing users to set task priority
without affecting visual drag-and-drop positioning.

## Changes Made

### Database
- Add priority column to archon_tasks table with enum type (critical, high, medium, low)
- Create database migration with safe enum handling and data backfill
- Add proper indexing for performance

### Backend
- Update UpdateTaskRequest to include priority field
- Add priority validation in TaskService with enum checking
- Include priority field in task list responses and ETag generation
- Fix cache invalidation for priority updates

### Frontend
- Update TaskPriority type from "urgent" to "critical" for consistency
- Add changePriority method to useTaskActions hook
- Update TaskCard to use direct priority field instead of task_order conversion
- Update TaskEditModal priority form to use direct priority values
- Fix TaskPriorityComponent to use correct priority enum values
- Update buildTaskUpdates to include priority field changes
- Add priority field to Task interface as required field
- Update test fixtures to include priority field

## Key Features
-  Users can change task priority without affecting drag-and-drop order
-  Users can drag tasks to reorder without changing priority level
-  Priority persists correctly in database with dedicated column
-  All existing priority functionality continues working identically
-  Cache invalidation works properly for priority changes
-  Both TaskCard priority button and TaskEditModal priority work

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* feat: add priority column to complete_setup.sql for fresh installations

- Add task_priority enum type (low, medium, high, critical)
- Add priority column to archon_tasks table with default 'medium'
- Add index for priority column performance
- Add documentation comment for priority field

This ensures fresh installations include the priority system without
needing to run the separate migration.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: include priority field in task creation payload

When creating new tasks via TaskEditModal, the buildCreateRequest function
was not including the priority field, causing new tasks to fall back to
the database default ('medium') instead of respecting the user's selected
priority in the modal.

Added priority: localTask.priority || 'medium' to ensure the user's
chosen priority is sent to the API during task creation.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* feat: make priority migration safe and idempotent

Replaced destructive DROP TYPE CASCADE with safe migration patterns:

- Use DO blocks with EXCEPTION handling for enum and column creation
- Prevent conflicts with complete_setup.sql for fresh installations
- Enhanced backfill logic to preserve user-modified priorities
- Only update tasks that haven't been modified (updated_at = created_at)
- Add comprehensive error handling with informative notices
- Migration can now be run multiple times safely

This ensures the migration works for both existing installations
(incremental migration) and fresh installations (complete_setup.sql)
without data loss or conflicts.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* feat: enforce NOT NULL constraint on priority column

Data integrity improvements:

Migration (add_priority_column_to_tasks.sql):
- Add column as nullable first with DEFAULT 'medium'
- Update any NULL values to 'medium'
- Set NOT NULL constraint to enforce application invariants
- Safe handling for existing columns with proper constraint checking

Complete Setup (complete_setup.sql):
- Priority column now DEFAULT 'medium' NOT NULL for fresh installations
- Ensures consistency between migration and fresh install paths

Both paths now enforce priority field as required, matching the
frontend Task interface where priority is a required field.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* feat: add priority support to task creation API

Complete priority support for task creation:

API Routes (projects_api.py):
- Add priority field to CreateTaskRequest Pydantic model
- Pass request.priority to TaskService.create_task call

Task Service (task_service.py):
- Add priority parameter to create_task method signature
- Add priority validation using existing validate_priority method
- Include priority field in database INSERT task_data
- Include priority field in API response task object

This ensures that new tasks created via TaskEditModal respect the
user's selected priority instead of falling back to database default.

Validation ensures only valid priority values (low, medium, high, critical)
are accepted and stored in the database.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* feat: implement clean slate priority migration (no backward compatibility)

Remove all task_order to priority mapping logic for true decoupling:

- All existing tasks get 'medium' priority (clean slate)
- No complex CASE logic or task_order relationships
- Users explicitly set priorities as needed after migration
- Truly independent priority and visual ordering systems
- Simpler, safer migration with no coupling logic

This approach prioritizes clean architecture over preserving
implied user intentions from the old coupled system.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* refactor: rename TaskPriority.tsx to TaskPriorityComponent.tsx for consistency

- Renamed file to match the exported component name
- Updated import in index.ts barrel export
- Maintains consistency with other component naming patterns

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Rasmus Widing <rasmus.widing@gmail.com>
2025-09-17 13:44:25 +03:00