archon

mirror of https://github.com/coleam00/Archon.git synced 2025-12-23 18:29:18 -05:00

Author	SHA1	Message	Date
sean-eskerium	0013336ee3	Merge branch 'main' of https://github.com/coleam00/Archon into feature/ui-style-guide	2025-10-09 20:42:08 -04:00
sean-eskerium	ad82f6e9f6	Another round of Coderabbit feedback.	2025-10-09 20:40:47 -04:00
Cole Medin	bfd0a84f64	RAG Enhancements (Page Level Retrieval) (#767 ) * Initial commit for RAG by document * Phase 2 * Adding migrations * Fixing page IDs for chunk metadata * Fixing unit tests, adding tool to list pages for source * Fixing page storage upsert issues * Max file length for retrieval * Fixing title issue * Fixing tests	2025-10-09 19:39:27 -05:00
sean-eskerium	c3f42504ea	code rabbit updates	2025-10-09 20:19:51 -04:00
sean-eskerium	98946817b4	Merge remote-tracking branch 'origin/main' into feature/ui-style-guide	2025-10-09 17:43:43 -04:00
sean-eskerium	02533dc37c	Fixing Code Rabbit suggestions.	2025-10-09 16:23:32 -04:00
DIY Smart Code	e6d538fdd8	Merge pull request #769 from coleam00/crawl4ai-update chore: update crawl4ai from 0.6.2 to 0.7.4	2025-10-09 21:52:36 +02:00
sean-eskerium	daf915c083	Fixes from biome and consistency review.	2025-10-09 14:26:37 -04:00
sean-eskerium	4e6116fa2f	Fix consistency and biome formatting issues	2025-10-09 13:49:12 -04:00
sean-eskerium	9e4c7eaf4e	Updating documentation and the review command refinement.	2025-10-09 13:35:24 -04:00
sean-eskerium	db538a5f46	Remove dead code	2025-10-09 12:14:36 -04:00
sean-eskerium	5c7924f43d	Merge main into feature/ui-style-guide - Resolved package-lock.json conflict - Kept Tailwind 4.1.2 upgrade from feature branch - Merged main's updates (react-icons, file reorganization, new features) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-09 11:53:27 -04:00
sean-eskerium	6f173e403d	remove prp docs	2025-10-09 11:49:41 -04:00
sean-eskerium	bebe4c1037	candidate for release	2025-10-09 11:49:03 -04:00
Wirasm	489415d723	Fix: Database timeout when deleting large sources (#737 ) * fix: implement CASCADE DELETE for source deletion timeout issue - Add migration 009 to add CASCADE DELETE constraints to foreign keys - Simplify delete_source() to only delete parent record - Database now handles cascading deletes efficiently - Fixes timeout issues when deleting sources with thousands of pages * chore: update complete_setup.sql to include CASCADE DELETE constraints - Add ON DELETE CASCADE to foreign keys in initial setup - Include migration 009 in the migrations tracking - Ensures new installations have CASCADE DELETE from the start	2025-10-09 17:52:06 +03:00
DIY Smart Code	00fe2599ad	Delete python/test_url_resolution_fix.py	2025-10-09 16:05:37 +02:00
DIY Smart Code	f9a506b9c9	Delete CRAWL4AI_UPDATE.md	2025-10-09 16:04:58 +02:00
sean-eskerium	2e68403db0	update styles of the primitives.	2025-10-09 09:51:50 -04:00
sean-eskerium	80992ca975	Epgrade to Tailwind 4	2025-10-09 09:31:47 -04:00
sean-eskerium	70b6e70a95	trying to make the ui reviews programmatic	2025-10-09 07:59:54 -04:00
sean-eskerium	4cb7c46d6e	fixing document browser and updating primitive tab styles.	2025-10-09 00:15:29 -04:00
sean-eskerium	17ca62ceb4	refining	2025-10-08 23:43:43 -04:00
sean-eskerium	5b839a1465	command for UI review, and settings to use primitives.	2025-10-08 18:38:12 -04:00
sean-eskerium	0727245c9d	Udate the projects layout. And style guide.	2025-10-08 17:37:29 -04:00
leex279	8deee6fd7a	chore: update crawl4ai from 0.6.2 to 0.7.4 Updates crawl4ai dependency to latest stable version with performance and stability improvements. Key improvements in 0.7.4: - LLM-powered table extraction with intelligent chunking - Fixed dispatcher bug for better concurrent processing - Resolved browser manager race conditions - Enhanced URL processing and proxy support All existing tests pass (18/18). No breaking changes identified. API remains backward compatible. ⚠️ IMPORTANT: URL Resolution Bug Status A critical bug in v0.6.2 where ../../ paths only go up ONE directory instead of TWO has been documented (see crawler-test branch). Status in v0.7.4 is UNKNOWN - testing required before production deployment. Test script provided: python/test_url_resolution_fix.py Related issues fixed in v0.7.x: - #570: General relative URL handling - #1268: URLs after redirects - #1323: Trailing slash base URL handling 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-08 22:27:15 +02:00
sean-eskerium	6e86fd0d9b	updates to style guide components	2025-10-08 13:50:04 -04:00
Josh	a580fdfe66	Feature/LLM-Providers-UI-Polished (#736 ) * Add Anthropic and Grok provider support * feat: Add crucial GPT-5 and reasoning model support for OpenRouter - Add requires_max_completion_tokens() function for GPT-5, o1, o3, Grok-3 series - Add prepare_chat_completion_params() for reasoning model compatibility - Implement max_tokens → max_completion_tokens conversion for reasoning models - Add temperature handling for reasoning models (must be 1.0 default) - Enhanced provider validation and API key security in provider endpoints - Streamlined retry logic (3→2 attempts) for faster issue detection - Add failure tracking and circuit breaker analysis for debugging - Support OpenRouter format detection (openai/gpt-5-nano, openai/o1-mini) - Improved Grok provider empty response handling with structured fallbacks - Enhanced contextual embedding with provider-aware model selection Core provider functionality: - OpenRouter, Grok, Anthropic provider support with full embedding integration - Provider-specific model defaults and validation - Secure API connectivity testing endpoints - Provider context passing for code generation workflows 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * fully working model providers, addressing securtiy and code related concerns, throughly hardening our code * added multiprovider support, embeddings model support, cleaned the pr, need to fix health check, asnyico tasks errors, and contextual embeddings error * fixed contextual embeddings issue * - Added inspect-aware shutdown handling so get_llm_client always closes the underlying AsyncOpenAI / httpx.AsyncClient while the loop is still alive, with defensive logging if shutdown happens late (python/src/server/services/llm_provider_service.py:14, python/src/server/ services/llm_provider_service.py:520). * - Restructured get_llm_client so client creation and usage live in separate try/finally blocks; fallback clients now close without logging spurious Error creating LLM client when downstream code raises (python/src/server/services/llm_provider_service.py:335-556). - Close logic now sanitizes provider names consistently and awaits whichever aclose/close coroutine the SDK exposes, keeping the loop shut down cleanly (python/src/server/services/llm_provider_service.py:530-559). Robust JSON Parsing - Added _extract_json_payload to strip code fences / extra text returned by Ollama before json.loads runs, averting the markdown-induced decode errors you saw in logs (python/src/server/services/storage/code_storage_service.py:40-63). - Swapped the direct parse call for the sanitized payload and emit a debug preview when cleanup alters the content (python/src/server/ services/storage/code_storage_service.py:858-864). * added provider connection support * added provider api key not being configured warning * Updated get_llm_client so missing OpenAI keys automatically fall back to Ollama (matching existing tests) and so unsupported providers still raise the legacy ValueError the suite expects. The fallback now reuses _get_optimal_ollama_instance and rethrows ValueError(OpenAI API key not found and Ollama fallback failed) when it cant connect. Adjusted test_code_extraction_source_id.py to accept the new optional argument on the mocked extractor (and confirm its None when present). * Resolved a few needed code rabbit suggestion - Updated the knowledge API key validation to call create_embedding with the provider argument and removed the hard-coded OpenAI fallback (python/src/server/api_routes/knowledge_api.py). - Broadened embedding provider detection so prefixed OpenRouter/OpenAI model names route through the correct client (python/src/server/ services/embeddings/embedding_service.py, python/src/server/services/llm_provider_service.py). - Removed the duplicate helper definitions from llm_provider_service.py, eliminating the stray docstring that was causing the import-time syntax error. * updated via code rabbit PR review, code rabbit in my IDE found no issues and no nitpicks with the updates! what was done: Credential service now persists the provider under the uppercase key LLM_PROVIDER, matching the read path (no new EMBEDDING_PROVIDER usage introduced). Embedding batch creation stops inserting blank strings, logging failures and skipping invalid items before they ever hit the provider (python/src/server/services/embeddings/embedding_service.py). Contextual embedding prompts use real newline characters everywhereboth when constructing the batch prompt and when parsing the models response (python/src/server/services/embeddings/contextual_embedding_service.py). Embedding provider routing already recognizes OpenRouter-prefixed OpenAI models via is_openai_embedding_model; no further change needed there. Embedding insertion now skips unsupported vector dimensions instead of forcing them into the 1536-column, and the backoff loop uses await asyncio.sleep so we no longer block the event loop (python/src/server/services/storage/code_storage_service.py). RAG settings props were extended to include LLM_INSTANCE_NAME and OLLAMA_EMBEDDING_INSTANCE_NAME, and the debug log no longer prints API-key prefixes (the rest of the TanStack refactor/EMBEDDING_PROVIDER support remains deferred). * test fix * enhanced Openrouters parsing logic to automatically detect reasoning models and parse regardless of json output or not. this commit creates a robust way for archons parsing to work throughly with openrouter automatically, regardless of the model youre using, to ensure proper functionality with out breaking any generation capabilities! * updated ui llm interface, added seprate embeddings provider, made the system fully capabale of mix and matching llm providers (local and non local) for chat & embeddings. updated the ragsettings.tsx ui mainly, along with core functionality * added warning labels and updated ollama health checks * ready for review, fixed som error warnings and consildated ollama status health checks * fixed FAILED test_async_embedding_service.py * code rabbit fixes * Separated the code-summary LLM provider from the embedding provider, so code example storage now forwards a dedicated embedding provider override end-to-end without hijacking the embedding pipeline. this fixes code rabbits (Preserve provider override in create_embeddings_batch) suggesting * - Swapped API credential storage to booleans so decrypted keys never sit in React state (archon-ui-main/src/components/ settings/RAGSettings.tsx). - Normalized Ollama instance URLs and gated the metrics effect on real state changes to avoid mis-counts and duplicate fetches (RAGSettings.tsx). - Tightened crawl progress scaling and indented-block parsing to handle min_length=None safely (python/src/server/ services/crawling/code_extraction_service.py:160, python/src/server/services/crawling/code_extraction_service.py:911). - Added provider-agnostic embedding rate-limit retries so Google and friends back off gracefully (python/src/server/ services/embeddings/embedding_service.py:427). - Made the orchestration registry async + thread-safe and updated every caller to await it (python/src/server/services/ crawling/crawling_service.py:34, python/src/server/api_routes/knowledge_api.py:1291). * Update RAGSettings.tsx - header for 'LLM Settings' is now 'LLM Provider Settings' * (RAG Settings) - Ollama Health Checks & Metrics - Added a 10-second timeout to the health fetch so it doesn't hang. - Adjusted logic so metric refreshes run for embedding-only Ollama setups too. - Initial page load now checks Ollama if either chat or embedding provider uses it. - Metrics and alerts now respect which provider (chat/embedding) is currently selected. - Provider Sync & Alerts - Fixed a sync bug so the very first provider change updates settings as expected. - Alerts now track the active provider (chat vs embedding) rather than only the LLM provider. - Warnings about missing credentials now skip whichever provider is currently selected. - Modals & Types - Normalize URLs before handing them to selection modals to keep consistent data. - Strengthened helper function types (getDisplayedChatModel, getModelPlaceholder, etc.). (Crawling Service) - Made the orchestration registry lock lazy-initialized to avoid issues in Python 3.12 and wrapped registry commands (register, unregister) in async calls. This keeps things thread-safe even during concurrent crawling and cancellation. * - migration/complete_setup.sql:101 seeds Google/OpenRouter/Anthropic/Grok API key rows so fresh databases expose every provider by default. - migration/0.1.0/009_add_provider_placeholders.sql:1 backfills the same rows for existing Supabase instances and records the migration. - archon-ui-main/src/components/settings/RAGSettings.tsx:121 introduces a shared credentialprovider map, reloadApiCredentials runs through all five providers, and the status poller includes the new keys. - archon-ui-main/src/components/settings/RAGSettings.tsx:353 subscribes to the archon:credentials-updated browser event so adding/removing a key immediately refetches credential status and pings the corresponding connectivity test. - archon-ui-main/src/components/settings/RAGSettings.tsx:926 now treats missing Anthropic/OpenRouter/Grok keys as missing, preventing stale connected badges when a key is removed. * - archon-ui-main/src/components/settings/RAGSettings.tsx:90 adds a simple display-name map and reuses one red alert style. - archon-ui-main/src/components/settings/RAGSettings.tsx:1016 now shows exactly one red banner when the active provider - Removed the old duplicate Missing API Key Configuration block, so the panel no longer stacks two warnings. * Update credentialsService.ts default model * updated the google embedding adapter for multi dimensional rag querying * thought this micro fix in the google embedding pushed with the embedding update the other day, it didnt. pushing now --------- Co-authored-by: Chillbruhhh <joshchesser97@gmail.com> Co-authored-by: Claude <noreply@anthropic.com>	2025-10-05 13:49:09 -05:00
leex279	d3cecd2b1d	Merge branch 'main' into feature/automatic-discovery-llms-sitemap-430	2025-09-22 22:35:36 +02:00
sean-eskerium	2150076f14	Making the style guide a feature to turn on and off.	2025-09-22 12:51:44 -04:00
Wirasm	63a92cf7d7	refactor: reorganize features/shared directory for better maintainability (#730 ) * refactor: reorganize features/shared directory structure - Created organized subdirectories for better code organization: - api/ - API clients and HTTP utilities (renamed apiWithEtag.ts to apiClient.ts) - config/ - Configuration files (queryClient, queryPatterns) - types/ - Shared type definitions (errors) - utils/ - Pure utility functions (optimistic, clipboard) - hooks/ - Shared React hooks (already existed) - Updated all import paths across the codebase (~40+ files) - Updated all AI documentation in PRPs/ai_docs/ to reflect new structure - All tests passing, build successful, no functional changes This improves maintainability and follows vertical slice architecture patterns. Co-Authored-By: Claude <noreply@anthropic.com> * fix: address PR review comments and code improvements - Update imports to use @/features alias path for optimistic utils - Fix optimistic upload item replacement by matching on source_id instead of id - Clean up test suite naming and remove meta-terms from comments - Only set Content-Type header on requests with body - Add explicit TypeScript typing to useProjectFeatures hook - Complete Phase 4 improvements with proper query typing * fix: address additional PR review feedback - Clear feature queries when deleting project to prevent cache memory leaks - Update KnowledgeCard comments to follow documentation guidelines - Add explanatory comment for accessibility pattern in KnowledgeCard --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-09-22 14:59:33 +03:00
Wirasm	d3a5c3311a	refactor: move shared hooks from ui/hooks to shared/hooks (#729 ) Reorganize hook structure to follow vertical slice architecture: - Move useSmartPolling, useThemeAware, useToast to features/shared/hooks - Update 38+ import statements across codebase - Update test file mocks to reference new locations - Remove old ui/hooks directory This change aligns shared utilities with the architectural pattern where truly shared code resides in the shared directory. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-authored-by: Claude <noreply@anthropic.com>	2025-09-22 12:54:55 +03:00
Cole Medin	3ff3f7f2dc	Migrations and version APIs (#718 ) * Preparing migration folder for the migration alert implementation * Migrations and version APIs initial * Touching up update instructions in README and UI * Unit tests for migrations and version APIs * Splitting up the Ollama migration scripts * Removing temporary PRPs --------- Co-authored-by: Rasmus Widing <rasmus.widing@gmail.com>	2025-09-22 12:25:58 +03:00
Jonah Gray	7a4c67cf90	fix: resolve TypeScript strict mode errors in providerErrorHandler.ts (#720 ) * fix: resolve TypeScript strict mode errors in providerErrorHandler.ts - Add proper type guards for error object property access - Create ErrorWithStatus and ErrorWithMessage interfaces - Implement hasStatusProperty() and hasMessageProperty() type guards - Replace unsafe object property access with type-safe checks - All 8 TypeScript strict mode errors now resolved - Maintains existing functionality for LLM provider error handling Fixes #686 * fix: apply biome linting improvements to providerErrorHandler.ts - Use optional chaining instead of logical AND for property access - Improve formatting for better readability - Maintain all existing functionality while addressing linter warnings * chore: remove .claude-flow directory - Remove unnecessary .claude-flow metrics files - Clean up repository structure * Add comprehensive test coverage for providerErrorHandler TypeScript strict mode fixes - Added 24 comprehensive tests for parseProviderError and getProviderErrorMessage - Tests cover all error scenarios: basic errors, status codes, structured provider errors, malformed JSON, null/undefined handling, and TypeScript strict mode compliance - Fixed null/undefined handling in parseProviderError to properly return fallback messages - All tests passing (24/24) ensuring TypeScript strict mode fixes work correctly - Validates error handling for OpenAI, Google AI, Anthropic, and other LLM providers Related to PR #720 TypeScript strict mode compliance --------- Co-authored-by: OmniNode CI <noreply@omninode.ai>	2025-09-22 11:23:20 +03:00
Josh	394ac1befa	Feat:Openrouter/Anthropic/grok-support (#231 ) * Add Anthropic and Grok provider support * feat: Add crucial GPT-5 and reasoning model support for OpenRouter - Add requires_max_completion_tokens() function for GPT-5, o1, o3, Grok-3 series - Add prepare_chat_completion_params() for reasoning model compatibility - Implement max_tokens → max_completion_tokens conversion for reasoning models - Add temperature handling for reasoning models (must be 1.0 default) - Enhanced provider validation and API key security in provider endpoints - Streamlined retry logic (3→2 attempts) for faster issue detection - Add failure tracking and circuit breaker analysis for debugging - Support OpenRouter format detection (openai/gpt-5-nano, openai/o1-mini) - Improved Grok provider empty response handling with structured fallbacks - Enhanced contextual embedding with provider-aware model selection Core provider functionality: - OpenRouter, Grok, Anthropic provider support with full embedding integration - Provider-specific model defaults and validation - Secure API connectivity testing endpoints - Provider context passing for code generation workflows 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * fully working model providers, addressing securtiy and code related concerns, throughly hardening our code * added multiprovider support, embeddings model support, cleaned the pr, need to fix health check, asnyico tasks errors, and contextual embeddings error * fixed contextual embeddings issue * - Added inspect-aware shutdown handling so get_llm_client always closes the underlying AsyncOpenAI / httpx.AsyncClient while the loop is still alive, with defensive logging if shutdown happens late (python/src/server/services/llm_provider_service.py:14, python/src/server/ services/llm_provider_service.py:520). * - Restructured get_llm_client so client creation and usage live in separate try/finally blocks; fallback clients now close without logging spurious Error creating LLM client when downstream code raises (python/src/server/services/llm_provider_service.py:335-556). - Close logic now sanitizes provider names consistently and awaits whichever aclose/close coroutine the SDK exposes, keeping the loop shut down cleanly (python/src/server/services/llm_provider_service.py:530-559). Robust JSON Parsing - Added _extract_json_payload to strip code fences / extra text returned by Ollama before json.loads runs, averting the markdown-induced decode errors you saw in logs (python/src/server/services/storage/code_storage_service.py:40-63). - Swapped the direct parse call for the sanitized payload and emit a debug preview when cleanup alters the content (python/src/server/ services/storage/code_storage_service.py:858-864). * added provider connection support * added provider api key not being configured warning * Updated get_llm_client so missing OpenAI keys automatically fall back to Ollama (matching existing tests) and so unsupported providers still raise the legacy ValueError the suite expects. The fallback now reuses _get_optimal_ollama_instance and rethrows ValueError(OpenAI API key not found and Ollama fallback failed) when it cant connect. Adjusted test_code_extraction_source_id.py to accept the new optional argument on the mocked extractor (and confirm its None when present). * Resolved a few needed code rabbit suggestion - Updated the knowledge API key validation to call create_embedding with the provider argument and removed the hard-coded OpenAI fallback (python/src/server/api_routes/knowledge_api.py). - Broadened embedding provider detection so prefixed OpenRouter/OpenAI model names route through the correct client (python/src/server/ services/embeddings/embedding_service.py, python/src/server/services/llm_provider_service.py). - Removed the duplicate helper definitions from llm_provider_service.py, eliminating the stray docstring that was causing the import-time syntax error. * updated via code rabbit PR review, code rabbit in my IDE found no issues and no nitpicks with the updates! what was done: Credential service now persists the provider under the uppercase key LLM_PROVIDER, matching the read path (no new EMBEDDING_PROVIDER usage introduced). Embedding batch creation stops inserting blank strings, logging failures and skipping invalid items before they ever hit the provider (python/src/server/services/embeddings/embedding_service.py). Contextual embedding prompts use real newline characters everywhereboth when constructing the batch prompt and when parsing the models response (python/src/server/services/embeddings/contextual_embedding_service.py). Embedding provider routing already recognizes OpenRouter-prefixed OpenAI models via is_openai_embedding_model; no further change needed there. Embedding insertion now skips unsupported vector dimensions instead of forcing them into the 1536-column, and the backoff loop uses await asyncio.sleep so we no longer block the event loop (python/src/server/services/storage/code_storage_service.py). RAG settings props were extended to include LLM_INSTANCE_NAME and OLLAMA_EMBEDDING_INSTANCE_NAME, and the debug log no longer prints API-key prefixes (the rest of the TanStack refactor/EMBEDDING_PROVIDER support remains deferred). * test fix * enhanced Openrouters parsing logic to automatically detect reasoning models and parse regardless of json output or not. this commit creates a robust way for archons parsing to work throughly with openrouter automatically, regardless of the model youre using, to ensure proper functionality with out breaking any generation capabilities! --------- Co-authored-by: Chillbruhhh <joshchesser97@gmail.com> Co-authored-by: Claude <noreply@anthropic.com>	2025-09-22 10:36:30 +03:00
sean-eskerium	f188d3a27a	First implementation of the style guide.	2025-09-22 02:52:45 -04:00
sean-eskerium	4c910c1471	Merge pull request #721 from coleam00/fix/ollama-default-docker-address fix: Change Ollama default URL to host.docker.internal for Docker compatibility	2025-09-20 15:17:09 -07:00
John Fitzpatrick	aaca437fdc	fix: Update remaining localhost placeholder in OllamaConfigurationPanel Missed updating the placeholder text for new instance URL input field. Changed from localhost:11434 to host.docker.internal:11434 for consistency.	2025-09-20 13:51:24 -07:00
John Fitzpatrick	2f486e5b21	test: Update test expectations for new Ollama default URL Updated test_async_llm_provider_service.py to expect host.docker.internal instead of localhost for Ollama URLs to match the new default configuration.	2025-09-20 13:44:23 -07:00
John Fitzpatrick	d4e80a945a	fix: Change Ollama default URL to host.docker.internal for Docker compatibility - Changed default Ollama URL from localhost:11434 to host.docker.internal:11434 - This allows Docker containers to connect to Ollama running on the host machine - Updated in backend services, frontend components, migration scripts, and documentation - Most users run Archon in Docker but Ollama as a local binary, making this a better default	2025-09-20 13:36:33 -07:00
Cole Medin	035f90e721	Codex mcp instructions (#719 ) * Add Codex MCP configuration instructions - Added Codex as a supported IDE in the MCP configuration UI - Removed Augment (duplicate of Cursor configuration) - Positioned Codex between Gemini and Cursor in the tab order - Added platform-specific configuration support for Windows vs Linux/macOS - Includes step-by-step instructions for installing mcp-remote and configuring Codex - Shows appropriate TOML configuration based on detected platform 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Finalizing Codex instructions --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-09-20 14:51:07 -05:00
Cole Medin	e9c08d2fe9	Updating RAG SIMILARITY_THRESHOLD to 0.05	2025-09-20 13:57:13 -05:00
Cole Medin	b1085a53df	Removing junk from sitemap and full site (recursive) crawls (#711 ) * Removing junk from sitemap and full site (recursive) crawls * Small typo fix for result.markdown	2025-09-20 12:58:42 -05:00
Cole Medin	c3be65322b	Improved MCP and global rules instructions (#705 )	2025-09-20 12:58:20 -05:00
leex279	597fc86c39	fix: Skip link extraction for discovery targets (single-file mode) When a file is selected through discovery, it should be crawled as a single file without following any links contained within it. This preserves the efficiency gains of the discovery feature. Changes: - Skip link extraction when is_discovery_target is true for link collection files - Return sitemap metadata without crawling URLs when is_discovery_target is true - Add clear logging to indicate single-file mode is active This ensures discovered files (llms.txt, sitemap.xml, etc.) are processed as single authoritative sources rather than starting recursive crawls, which aligns with the PR's objective of efficient single-file discovery and crawling.	2025-09-20 13:55:15 +02:00
leex279	c1677a9220	fix: Skip discovery when user provides direct discovery file URLs When a user directly provides a URL to a discovery file (sitemap.xml, llms.txt, robots.txt, etc.), the system now skips the discovery phase and uses the provided file directly. This prevents unnecessary discovery attempts and respects the user's explicit choice. Changes: - Check if the URL is already a discovery target before running discovery - Skip discovery for: sitemap files, llms variants, robots.txt, well-known files, and any .txt files - Add logging to indicate when discovery is skipped Example: When crawling 'xyz.com/sitemap.xml' directly, the system will now use that sitemap instead of trying to discover a different file like llms.txt	2025-09-20 13:34:07 +02:00
leex279	7f74aea476	fix: Discovery now respects given URL path and fix method signature mismatches Two critical fixes for the automatic discovery feature: 1. Discovery Service path handling: - Changed from always using root domain (/) to respecting given URL path - e.g., for 'supabase.com/docs', now checks 'supabase.com/docs/robots.txt' - Previously incorrectly checked 'supabase.com/robots.txt' - Fixed all urljoin calls to use relative paths instead of absolute paths 2. Method signature mismatches: - Removed start_progress and end_progress parameters from crawl_batch_with_progress - Removed same parameters from crawl_recursive_with_progress - Fixed all calls to these methods to match the strategy implementations These fixes ensure discovery works correctly for subdirectory URLs and prevents TypeError crashes during crawling.	2025-09-20 13:06:41 +02:00
leex279	0a2c43f6b4	fix: Update test assertions for proper rounding behavior in progress mapper The progress mapper uses Python's round() function which rounds to nearest even number (banker's rounding). Updated test assertions to match actual rounding behavior: - 3.5 rounds to 4 (not 3) - 7.63 rounds to 8 (not 7) - 9.5 rounds to 10 (not 9) All tests now pass successfully.	2025-09-20 09:47:43 +02:00
leex279	2be44d1bf4	fix: Resolve syntax error from merge conflict resolution - Fixed duplicate else statement in crawling_service.py - Corrected indentation for extracted links processing logic - Syntax now validates successfully	2025-09-20 09:33:47 +02:00
leex279	8072066ee6	Merge main into feature/automatic-discovery-llms-sitemap-430 - Resolved conflicts in progress_mapper.py to include discovery stage (3-4%) - Resolved conflicts in crawling_service.py to maintain both discovery feature and main improvements - Resolved conflicts in test_progress_mapper.py to include tests for discovery stage - Kept all optimizations and improvements from main - Maintained discovery feature functionality with proper integration	2025-09-20 09:27:36 +02:00
Wirasm	37994191fc	refactor: Phase 5 - Remove manual cache invalidations (#707 ) * chore, cleanup leftovers of tanstack refactoring * refactor: Complete Phase 5 - Remove manual cache invalidations - Removed all manual cache invalidations from knowledge queries - Updated task queries to rely on backend consistency - Fixed optimistic update utilities to handle edge cases - Cleaned up unused imports and test utilities - Fixed minor TypeScript issues in UI components Backend now ensures data consistency through proper transaction handling, eliminating the need for frontend cache coordination. * docs: Enhance TODO comment for knowledge optimistic update issue - Added comprehensive explanation of the query key mismatch issue - Documented current behavior and impact on user experience - Listed potential solutions with tradeoffs - Created detailed PRP story in PRPs/local/ for future implementation - References specific line numbers and implementation details This documents a known limitation where optimistic updates to knowledge items are invisible because mutations update the wrong query cache.	2025-09-19 14:26:05 +03:00

1 2 3 4 5 ...

356 Commits