mirror of
https://github.com/coleam00/Archon.git
synced 2025-12-24 10:49:27 -05:00
- Preserve URL case in robots.txt parsing by only lowercasing the sitemap: prefix check
- Add support for relative sitemap paths in robots.txt using urljoin()
- Fix HTML meta tag parsing to use case-insensitive regex instead of lowercasing content
- Add URL scheme validation for discovered sitemaps (http/https only)
- Fix discovery target domain filtering to use discovered URL's domain instead of input URL
- Clean up whitespace and improve dict comprehension usage

These changes improve discovery reliability and prevent URL corruption while maintaining backward compatibility with existing discovery behavior.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
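
The robots.txt items in this change (case-preserving prefix check, relative paths via urljoin(), http/https-only validation) could look roughly like the sketch below. This is an illustration under stated assumptions, not the code from the commit: the function name `extract_sitemaps` and its parameters `robots_txt` and `base_url` are hypothetical, and only the standard library's `urllib.parse` is used.

```python
from urllib.parse import urljoin, urlparse


def extract_sitemaps(robots_txt: str, base_url: str) -> list[str]:
    """Return absolute http(s) sitemap URLs declared in a robots.txt body.

    Hypothetical helper for illustration; not taken from the repository.
    """
    sitemaps = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.strip()
        # Lowercase only for the prefix check, so the URL itself keeps its case.
        if not line.lower().startswith("sitemap:"):
            continue
        value = line[len("sitemap:"):].strip()
        if not value:
            continue
        # urljoin resolves relative paths against base_url and returns
        # absolute URLs unchanged.
        candidate = urljoin(base_url, value)
        # Accept only http/https; drop ftp:, javascript:, and similar schemes.
        if urlparse(candidate).scheme in ("http", "https"):
            sitemaps.append(candidate)
    return sitemaps


if __name__ == "__main__":
    robots = (
        "User-agent: *\n"
        "SITEMAP: /Sitemaps/Index.XML\n"
        "sitemap: https://example.com/Path/Map.xml\n"
        "Sitemap: ftp://example.com/map.xml\n"
    )
    for url in extract_sitemaps(robots, "https://example.com/docs/"):
        print(url)
```

Lowercasing the whole line before extracting the value would turn `/Sitemaps/Index.XML` into `/sitemaps/index.xml`, which can point at a different resource on case-sensitive servers; restricting the lowercase comparison to the prefix avoids that.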