Enhanced validation to catch malformed responses early:
- Validate total_count is non-negative number
- Verify total_count matches embedding_models.length
- Validate first model has required fields (id, provider, dimensions)
- Check dimensions are positive numbers
- Validate provider names are from expected set
- Provide specific error messages for each validation failure
Prevents caching invalid data and provides better debugging information.
Addresses CodeRabbit nitpick comment on PR #852🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Implemented comprehensive validation to prevent crashes from corrupted cache:
- Created isCacheEntry() type guard to validate cache structure
- Parse JSON into unknown type (TypeScript strict mode compliant)
- Validate timestamp is number and data has OpenRouterModelListResponse shape
- Validate each model has all required fields with correct types
- Remove corrupted cache entries to avoid repeated failures
- No 'any' types used, full strict mode compliance
Prevents crashes from malformed cache data while maintaining type safety.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
1. Lazy initialization of baseUrl via getBaseUrl() method
- Allows API URL to be updated at runtime without stale URL issues
2. Runtime validation of API response structure
- Validates embedding_models array exists before caching
- Prevents invalid responses from being cached
Addresses CodeRabbit nitpick comments on PR #852🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Changed grid-cols-3 to grid-cols-4 for embedding provider selection
so all 4 embedding-capable providers (OpenAI, Google, OpenRouter, Ollama)
fit on one line, matching the chat provider layout.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Implements OpenRouter as an embedding provider option, enabling access to multiple
embedding models (OpenAI, Google Gemini, Qwen3, Mistral) through a single API key.
Backend changes:
- Add validate_openrouter_api_key() for API key validation (sk-or-v1- format)
- Add OpenRouterErrorAdapter for error sanitization
- Add openrouter to valid providers in llm_provider_service
- Create openrouter_discovery_service with hardcoded model list
- Create /api/openrouter/models endpoint for model discovery
- Register OpenRouter router in FastAPI main app
Frontend changes:
- Create openrouterService.ts for model discovery API client
- Add OpenRouter to RAGSettings.tsx provider options
- Configure default models with provider prefix (openai/text-embedding-3-small)
- Add OpenRouter to embedding-capable providers list
Documentation:
- Update .env.example with OPENROUTER_API_KEY documentation
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add WorkOrderLogsPanel with SSE streaming support
- Add RealTimeStats component for live metrics
- Add useWorkOrderLogs hook for SSE log streaming
- Add useLogStats hook for real-time statistics
- Update WorkOrderDetailView to display logs panel
- Add comprehensive tests for new components
- Configure Vite test environment
- Fix test mocks to use requests.Session for _check_url_exists
- Add url parameter to create_mock_response to prevent MagicMock issues
- Update all test scenarios to mock both requests.get and session.get
- Remove redundant UNSAFE_PROTOCOLS check in URL validation
- Fix test assertions to match new priority order (llms.txt > llms-full.txt)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
## Backend Improvements
### Discovery Service
- Fix SSRF protection: Use requests.Session() for max_redirects parameter
- Add comprehensive IP validation (_is_safe_ip, _resolve_and_validate_hostname)
- Add hostname DNS resolution validation before requests
- Fix llms.txt link following to crawl ALL same-domain pages (not just llms.txt files)
- Remove unused file variants: llms.md, llms.markdown, sitemap_index.xml, sitemap-index.xml
- Optimize DISCOVERY_PRIORITY based on real-world usage research
- Update priority: llms.txt > llms-full.txt > sitemap.xml > robots.txt
### URL Handler
- Fix .well-known path to be case-sensitive per RFC 8615
- Remove llms.md, llms.markdown, llms.mdx from variant detection
- Simplify link collection patterns to only .txt files (most common)
- Update llms_variants list to only include spec-compliant files
### Crawling Service
- Add tldextract for proper root domain extraction (handles .co.uk, .com.au, etc.)
- Replace naive domain extraction with robust get_root_domain() function
- Add tldextract>=5.0.0 to dependencies
## Frontend Improvements
### Type Safety
- Extend ActiveOperation type with discovery fields (discovered_file, discovered_file_type, linked_files)
- Remove all type casting (operation as any) from CrawlingProgress component
- Add proper TypeScript types for discovery information
### Security
- Create URL validation utility (urlValidation.ts)
- Only render clickable links for validated HTTP/HTTPS URLs
- Reject unsafe protocols (javascript:, data:, vbscript:, file:)
- Display invalid URLs as plain text instead of links
## Testing
- Update test mocks to include history and url attributes for redirect checking
- Fix .well-known case sensitivity tests (must be lowercase per RFC 8615)
- Update discovery priority tests to match new order
- Remove tests for deprecated file variants
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Implements complete llms.txt link following functionality that crawls
linked llms.txt files on the same domain/subdomain, along with critical
bug fixes for discovery priority and variant detection.
Backend Core Functionality:
- Add _is_same_domain_or_subdomain method for subdomain matching
- Fix is_llms_variant to detect .txt files in /llms/ directories
- Implement llms.txt link extraction and following logic
- Add two-phase discovery: prioritize ALL llms.txt before sitemaps
- Enhanced progress reporting with discovery metadata
Critical Bug Fixes:
- Discovery priority: Fixed sitemap.xml being found before llms.txt
- is_llms_variant: Now matches /llms/guides.txt, /llms/swift.txt, etc.
- These were blocking bugs preventing link following from working
Frontend UI:
- Add discovery and linked files display to CrawlingProgress component
- Update progress types to include discoveredFile, linkedFiles fields
- Add new crawl types: llms_txt_with_linked_files, discovery_*
- Add "discovery" to ProgressStatus enum and active statuses
Testing:
- 8 subdomain matching unit tests (test_crawling_service_subdomain.py)
- 7 integration tests for link following (test_llms_txt_link_following.py)
- All 15 tests passing
- Validated against real Supabase llms.txt structure (1 main + 8 linked)
Files Modified:
Backend:
- crawling_service.py: Core link following logic (lines 744-788, 862-920)
- url_handler.py: Fixed variant detection (lines 633-665)
- discovery_service.py: Two-phase discovery (lines 137-214)
- 2 new comprehensive test files
Frontend:
- progress/types/progress.ts: Updated types with new fields
- progress/components/CrawlingProgress.tsx: Added UI sections
Real-world testing: Crawling supabase.com/docs now discovers
/docs/llms.txt and automatically follows 8 linked llms.txt files,
indexing complete documentation from all files.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- add trailing slashes to agent-work-orders endpoints to prevent FastAPI mount() redirects
- add defensive null check for repository_url in detail view
- fix backend routes to use relative paths with app.mount()
- resolves ERR_NAME_NOT_RESOLVED when accessing agent work orders