Mirror of https://github.com/coleam00/Archon.git (synced 2025-12-24 10:49:27 -05:00)
Backend fixes for crawling stability:

- Add comment clarifying DomainFilter doesn't need init params
- Improve base URL selection in recursive strategy:
  - Check start_urls length before indexing
  - Use appropriate base URL for domain checks
  - Fall back to original_url when start_urls is empty
- Add error handling for domain filter:
  - Wrap is_url_allowed in try/except block
  - Log exceptions and conservatively skip URLs on error
  - Prevent domain filter exceptions from crashing the crawler
- Better handling of relative URL resolution

These changes ensure more robust crawling, especially when:

- start_urls array is empty
- Domain filter encounters unexpected URLs
- Relative links need proper base URL resolution
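
The three behaviours described above (base URL fallback, guarded domain check, relative link resolution) can be sketched roughly as follows. This is an illustration only, not the code from this commit: the helper names `pick_base_url`, `is_allowed_safely`, `collect_links`, the `SameHostFilter` class, and the `is_url_allowed(url, base_url)` signature are all assumptions made for the example.

```python
import logging
from urllib.parse import urljoin, urlparse

logger = logging.getLogger(__name__)


def pick_base_url(start_urls, original_url):
    """Return the first start URL, or original_url when the list is empty."""
    # Check the length before indexing so an empty list cannot raise IndexError.
    return start_urls[0] if len(start_urls) > 0 else original_url


def is_allowed_safely(domain_filter, url, base_url):
    """Ask the filter about a URL; treat any exception as 'not allowed'."""
    try:
        return bool(domain_filter.is_url_allowed(url, base_url))
    except Exception:
        # Conservative choice: log the failure and skip the URL
        # rather than letting the exception stop the crawl.
        logger.exception("Domain filter failed for %s; skipping", url)
        return False


def collect_links(hrefs, page_url, start_urls, original_url, domain_filter):
    """Resolve hrefs against the page URL and keep only the allowed ones."""
    base_url = pick_base_url(start_urls, original_url)
    allowed = []
    for href in hrefs:
        absolute = urljoin(page_url, href)  # resolves relative links
        if is_allowed_safely(domain_filter, absolute, base_url):
            allowed.append(absolute)
    return allowed


class SameHostFilter:
    """Hypothetical stand-in for a domain filter; raises on URLs without a host."""

    def is_url_allowed(self, url, base_url):
        host = urlparse(url).hostname
        if host is None:
            raise ValueError(f"no host in {url!r}")
        return host == urlparse(base_url).hostname


if __name__ == "__main__":
    links = collect_links(
        hrefs=["/docs", "https://other.test/x", "mailto:a@b.c", "guide/intro"],
        page_url="https://example.test/start/",
        start_urls=[],  # empty on purpose: exercises the fallback
        original_url="https://example.test/start/",
        domain_filter=SameHostFilter(),
    )
    print(links)
```

In this sketch the `mailto:` link makes the hypothetical filter raise, and the guarded check logs the error and drops the link while the remaining links are still processed.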