- Fix discovery service tests to match new single-file return format
- Remove obsolete tests for removed discovery methods
- Update progress mapper tests for new discovery stage ranges
- Fix stage range expectations after adding discovery stage (2,3)
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add DiscoveryService with single-file priority selection
  - Priority: llms-full.txt > llms.txt > llms.md > llms.mdx > sitemap.xml > robots.txt
  - All files contain similar AI/crawling guidance, so only the best one is needed
  - Sitemaps declared in robots.txt take precedence over the order above
  - Fall back to subdirectories for llms files
- Enhance URLHandler with discovery helper methods
  - Add is_robots_txt, is_llms_variant, is_well_known_file, get_base_url methods
  - Follow existing patterns with proper error handling
- Integrate discovery into CrawlingService orchestration
  - When discovery finds a file: crawl ONLY the discovered file (not the main URL)
  - When discovery finds nothing: crawl the main URL normally
  - Fixes an issue where both the main URL and the discovered file were crawled
- Add discovery stage to progress mapping
  - New "discovery" stage in progress flow
  - Clear progress messages for discovered files
- Comprehensive test coverage
  - Tests for priority-based selection logic
  - Tests for robots.txt priority and fallback behavior
  - Updated existing tests for new return formats
Makes crawling more efficient by selecting the single best guidance file
instead of crawling redundant content from multiple similar files.
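The selection and orchestration rules described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the DiscoveryService code: `discover_best_file`, `choose_crawl_target`, the `exists` callback, and the `robots_sitemaps` parameter are hypothetical names, and the subdirectory fallback is left out.

```python
from typing import Callable, Optional, Sequence

# Priority order as stated in this commit message.
PRIORITY = [
    "llms-full.txt",
    "llms.txt",
    "llms.md",
    "llms.mdx",
    "sitemap.xml",
    "robots.txt",
]


def discover_best_file(
    base_url: str,
    exists: Callable[[str], bool],
    robots_sitemaps: Sequence[str] = (),
) -> Optional[str]:
    """Return the single best guidance file URL, or None if nothing is found.

    `exists` is a hypothetical callback that reports whether a URL is reachable;
    `robots_sitemaps` holds sitemap URLs already parsed out of robots.txt.
    """
    # Sitemaps declared in robots.txt take precedence over the static order.
    if robots_sitemaps:
        return robots_sitemaps[0]
    # Otherwise return the first file in priority order that exists.
    for name in PRIORITY:
        candidate = f"{base_url}/{name}"
        if exists(candidate):
            return candidate
    return None


def choose_crawl_target(main_url: str, discovered: Optional[str]) -> str:
    # Crawl only the discovered file when there is one, else the main URL.
    return discovered if discovered else main_url
```

For example, with only `llms.txt` and `sitemap.xml` present at the site root and no sitemap declared in robots.txt, this sketch picks `llms.txt` and the main URL is not crawled.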