Aligns test expectations with the llms.txt specification, which uses
'pages' rather than 'files' terminology. The implementation correctly
uses "llms_txt_with_linked_pages"; this updates the test to match.
The tldextract package was missing from the 'all' dependency group,
causing CI test failures. It was already in the 'server' group but
needed in 'all' for running unit tests in CI/CD.
- Fix test mocks to use requests.Session for _check_url_exists
- Add url parameter to create_mock_response to prevent MagicMock issues
- Update all test scenarios to mock both requests.get and session.get
- Remove redundant UNSAFE_PROTOCOLS check in URL validation
- Fix test assertions to match new priority order (llms.txt > llms-full.txt)
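A rough sketch of the mocking pattern these bullets describe; create_mock_response is named in the bullets, the rest (test name, example URL) is illustrative rather than the actual test code:

```python
# Hedged sketch of the mocking pattern described above; details are illustrative.
from unittest.mock import MagicMock, patch

def create_mock_response(status_code: int = 200, url: str = "https://example.com/llms.txt"):
    """Give the mock real url/history attributes so redirect checks never see a bare MagicMock."""
    response = MagicMock()
    response.status_code = status_code
    response.url = url
    response.history = []
    return response

def test_discovery_prefers_llms_txt():
    mock_response = create_mock_response()
    # _check_url_exists now goes through a Session, so both entry points are patched.
    with patch("requests.get", return_value=mock_response), \
         patch("requests.Session.get", return_value=mock_response):
        ...  # exercise the discovery code under test
```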
## Backend Improvements
### Discovery Service
- Fix SSRF protection: use requests.Session() so the max_redirects parameter is applied
- Add comprehensive IP validation (_is_safe_ip, _resolve_and_validate_hostname); see the sketch after this list
- Validate hostnames via DNS resolution before issuing requests
- Fix llms.txt link following to crawl ALL same-domain pages (not just llms.txt files)
- Remove unused file variants: llms.md, llms.markdown, sitemap_index.xml, sitemap-index.xml
- Optimize DISCOVERY_PRIORITY based on real-world usage research
- Update priority: llms.txt > llms-full.txt > sitemap.xml > robots.txt
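A minimal sketch of the SSRF hardening described above, using only the standard library; the real _is_safe_ip and _resolve_and_validate_hostname in discovery_service.py may differ in detail:

```python
# Hedged sketch of the SSRF checks described above; actual helpers may differ.
import ipaddress
import socket

import requests

def _is_safe_ip(ip_str: str) -> bool:
    """Reject private, loopback, link-local, reserved, multicast, and unspecified addresses."""
    try:
        ip = ipaddress.ip_address(ip_str)
    except ValueError:
        return False
    return not (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_reserved or ip.is_multicast or ip.is_unspecified
    )

def _resolve_and_validate_hostname(hostname: str) -> bool:
    """Resolve the hostname and require every resolved address to be safe."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return False
    return all(_is_safe_ip(info[4][0]) for info in infos)

# requests.get() has no max_redirects argument; a Session exposes it.
session = requests.Session()
session.max_redirects = 5  # illustrative limit
```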
### URL Handler
- Fix .well-known path to be case-sensitive per RFC 8615
- Remove llms.md, llms.markdown, llms.mdx from variant detection
- Simplify link collection patterns to only .txt files (most common)
- Update llms_variants list to only include spec-compliant files
### Crawling Service
- Add tldextract for proper root domain extraction (handles .co.uk, .com.au, etc.)
- Replace naive domain extraction with robust get_root_domain() function
- Add tldextract>=5.0.0 to dependencies
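A sketch of the root-domain helper described above, assuming the tldextract API; the actual get_root_domain() in crawling_service.py may differ:

```python
# Hedged sketch of the root-domain extraction described above.
import tldextract

def get_root_domain(url: str) -> str:
    """'https://docs.example.co.uk/path' -> 'example.co.uk'."""
    parts = tldextract.extract(url)
    # registered_domain handles multi-part suffixes (.co.uk, .com.au);
    # fall back to the bare domain for hosts without a public suffix.
    return parts.registered_domain or parts.domain
```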
## Frontend Improvements
### Type Safety
- Extend ActiveOperation type with discovery fields (discovered_file, discovered_file_type, linked_files)
- Remove all type casting (operation as any) from CrawlingProgress component
- Add proper TypeScript types for discovery information
### Security
- Create URL validation utility (urlValidation.ts)
- Only render clickable links for validated HTTP/HTTPS URLs
- Reject unsafe protocols (javascript:, data:, vbscript:, file:)
- Display invalid URLs as plain text instead of links
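The utility itself lives in urlValidation.ts (TypeScript); purely to illustrate the allowlist approach (accept http/https rather than denylist unsafe protocols), an equivalent check sketched in Python:

```python
# Python illustration of the allowlist check performed by urlValidation.ts;
# the actual TypeScript implementation may differ.
from urllib.parse import urlparse

def is_safe_link(url: str) -> bool:
    """Only http(s) URLs are rendered as clickable links; everything else stays plain text."""
    try:
        scheme = urlparse(url).scheme.lower()
    except ValueError:
        return False
    return scheme in ("http", "https")
```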
## Testing
- Update test mocks to include history and url attributes for redirect checking
- Fix .well-known case sensitivity tests (must be lowercase per RFC 8615)
- Update discovery priority tests to match new order
- Remove tests for deprecated file variants
GitHub's YAML templates (.yml) don't support URL parameter pre-filling, but
Markdown templates (.md) do. This adds a structured bug report template that
allows the automated bug reporter to pre-fill all user-submitted data.
Changes:
- Create .github/ISSUE_TEMPLATE/auto_bug_report.md template
- Update bug_report_api.py to use template=auto_bug_report.md parameter
- Update tests to verify template parameter is included in URL
- Add explanatory comments about YAML vs Markdown template differences
Benefits:
- Users see a structured bug report template (not generic issue form)
- All bug report data is pre-filled from the UI form
- Template provides consistent formatting and organization
- Better UX than generic issue creation
GitHub's issue creation URL does not support combining the 'template' parameter
with pre-filled fields. When a template is specified, GitHub ignores other URL
parameters like title and body, preventing user-submitted data from being
pre-filled in the issue form.
Changes:
- Remove 'template=bug_report.yml' parameter (non-existent template)
- Remove 'labels' parameter (not supported via URL)
- Keep only 'title' and 'body' parameters for proper pre-filling
- Add explanatory comment about GitHub's URL parameter limitations
- Update tests to verify URL structure (no template parameter)
Now when users click "Report Bug", the GitHub issue form will be properly
pre-filled with their title and detailed bug report information.
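A minimal sketch of the resulting URL construction; the function and parameter names here are illustrative and may not match bug_report_api.py exactly:

```python
# Hedged sketch of the pre-fill URL described above; only title and body are
# passed as query parameters.
from urllib.parse import urlencode

def build_issue_url(owner: str, repo: str, title: str, body: str) -> str:
    params = urlencode({"title": title, "body": body})
    return f"https://github.com/{owner}/{repo}/issues/new?{params}"
```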
Fixes #802
The bug report feature was redirecting users to the old repository URL
(dynamous-community/Archon-V2-Alpha) instead of the current repository
(coleam00/Archon). This occurred because hardcoded default values in the
bug report API were not updated during the Alpha-to-Beta rebranding.
Changes:
- Import GITHUB_REPO_OWNER and GITHUB_REPO_NAME from version.py
- Update GitHubService.__init__() to construct default from constants
- Update health check endpoint to use same centralized default
- Add comprehensive integration tests for bug report URL generation
- Document repository configuration in CLAUDE.md
The fix ensures a single source of truth for repository information and
maintains backward compatibility with the GITHUB_REPO environment variable override.
All tests pass (7/7), validating correct repository URL usage.
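A sketch of the centralized default described above; the real version.py constants and GitHubService signature may differ:

```python
# Hedged sketch of the centralized repository default described above.
import os

GITHUB_REPO_OWNER = "coleam00"  # imported from version.py in the actual code
GITHUB_REPO_NAME = "Archon"

class GitHubService:
    def __init__(self, repo: str | None = None):
        # The GITHUB_REPO env var still overrides, preserving backward compatibility.
        default_repo = f"{GITHUB_REPO_OWNER}/{GITHUB_REPO_NAME}"
        self.repo = repo or os.environ.get("GITHUB_REPO") or default_repo
```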
- Replace dot-based file detection with explicit extension checking
in the discovery service to correctly handle versioned directories
like /docs.v2 (see the sketch after this list)
- Add comprehensive validation for start_progress and end_progress
parameters in crawl_markdown_file to ensure they are valid
numeric values in range [0, 100] with start < end
- Validation runs before any async work or progress reporting begins
- Clear error messages indicate which parameter is invalid and why
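Hedged sketches of the two checks described above; the real code in discovery_service.py and crawl_markdown_file may differ:

```python
# Illustrative allowlist; the actual set of recognized extensions may differ.
FILE_EXTENSIONS = (".txt", ".xml", ".md")

def looks_like_file(path: str) -> bool:
    """Explicit extension check: '/docs.v2' is a directory, '/docs/llms.txt' is a file."""
    return path.lower().endswith(FILE_EXTENSIONS)

def validate_progress_range(start_progress, end_progress) -> None:
    """Runs before any async work; each error names the offending parameter."""
    for name, value in (("start_progress", start_progress), ("end_progress", end_progress)):
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"{name} must be numeric, got {type(value).__name__}")
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be within [0, 100], got {value}")
    if start_progress >= end_progress:
        raise ValueError("start_progress must be less than end_progress")
```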
Implements complete llms.txt link following functionality that crawls
linked llms.txt files on the same domain/subdomain, along with critical
bug fixes for discovery priority and variant detection.
Backend Core Functionality:
- Add _is_same_domain_or_subdomain method for subdomain matching (sketched below)
- Fix is_llms_variant to detect .txt files in /llms/ directories
- Implement llms.txt link extraction and following logic
- Add two-phase discovery: prioritize ALL llms.txt before sitemaps
- Enhanced progress reporting with discovery metadata
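A minimal sketch of the subdomain check referenced above; the actual _is_same_domain_or_subdomain in crawling_service.py may differ:

```python
# Hedged sketch of the same-domain/subdomain matching described above.
from urllib.parse import urlparse

def _is_same_domain_or_subdomain(candidate_url: str, base_url: str) -> bool:
    """True when the candidate is on the same host as the base, or on one of its subdomains."""
    candidate = (urlparse(candidate_url).hostname or "").lower()
    base = (urlparse(base_url).hostname or "").lower()
    return candidate == base or candidate.endswith("." + base)
```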
Critical Bug Fixes:
- Discovery priority: Fixed sitemap.xml being found before llms.txt
- is_llms_variant: Now matches /llms/guides.txt, /llms/swift.txt, etc.
- These were blocking bugs preventing link following from working
Frontend UI:
- Add discovery and linked files display to CrawlingProgress component
- Update progress types to include discoveredFile, linkedFiles fields
- Add new crawl types: llms_txt_with_linked_files, discovery_*
- Add "discovery" to ProgressStatus enum and active statuses
Testing:
- 8 subdomain matching unit tests (test_crawling_service_subdomain.py)
- 7 integration tests for link following (test_llms_txt_link_following.py)
- All 15 tests passing
- Validated against real Supabase llms.txt structure (1 main + 8 linked)
Files Modified:
Backend:
- crawling_service.py: Core link following logic (lines 744-788, 862-920)
- url_handler.py: Fixed variant detection (lines 633-665)
- discovery_service.py: Two-phase discovery (lines 137-214)
- 2 new comprehensive test files
Frontend:
- progress/types/progress.ts: Updated types with new fields
- progress/components/CrawlingProgress.tsx: Added UI sections
Real-world testing: Crawling supabase.com/docs now discovers
/docs/llms.txt and automatically follows 8 linked llms.txt files,
indexing complete documentation from all files.
Remove the special case that gave robots.txt sitemap declarations highest
priority, which incorrectly overrode the global priority order. Now properly
respects the intended priority: llms-full.txt > llms.txt > llms.md > llms.mdx >
sitemap.xml > robots.txt.
This fixes the issue where supabase.com/docs would return sitemap.xml instead
of llms.txt even though both files exist at /docs/ and llms.txt should have
higher priority.
Changes:
- Removed robots.txt early return that bypassed priority order
- Updated test to verify llms files take precedence over robots.txt sitemaps
- All discovery now follows consistent DISCOVERY_PRIORITY order
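A sketch of the consistent ordering this change enforces; the actual DISCOVERY_PRIORITY structure in discovery_service.py may differ:

```python
# Hedged sketch of the global priority order described above.
DISCOVERY_PRIORITY = (
    "llms-full.txt",
    "llms.txt",
    "llms.md",
    "llms.mdx",
    "sitemap.xml",
    "robots.txt",
)

def pick_discovered_file(found: set[str]) -> str | None:
    """Highest-priority hit wins; robots.txt sitemap declarations get no special case."""
    return next((name for name in DISCOVERY_PRIORITY if name in found), None)
```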
Improve discovery logic to check the same directory as the base URL first before
falling back to root-level and subdirectories. This ensures files like
https://supabase.com/docs/llms.txt are found when crawling
https://supabase.com/docs.
Changes:
- Check same directory as base_url first (e.g., /docs/llms.txt for /docs URL)
- Fall back to root-level urljoin behavior
- Include base directory name in subdirectory checks (e.g., /docs subdirectory)
- Maintain priority order: same-dir > root > subdirectories
- Log discovery location for better debugging
This addresses cases where documentation directories contain their own llms.txt
or sitemap files that should take precedence over root-level files.
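A sketch of the candidate ordering described above; the actual discovery code may build these URLs differently:

```python
# Hedged sketch of the same-directory-first candidate ordering.
from urllib.parse import urljoin, urlparse

def candidate_urls(base_url: str, filename: str = "llms.txt") -> list[str]:
    """Same directory as the base URL first, then the site root."""
    base_dir = base_url if base_url.endswith("/") else base_url + "/"
    parsed = urlparse(base_url)
    root = f"{parsed.scheme}://{parsed.netloc}/"
    ordered = [urljoin(base_dir, filename), urljoin(root, filename)]
    # Preserve priority while dropping duplicates (e.g. when the base URL is the root).
    return list(dict.fromkeys(ordered))
```

For https://supabase.com/docs this yields /docs/llms.txt before /llms.txt, matching the priority described above.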
- Preserve URL case in robots.txt parsing by only lowercasing the sitemap: prefix check
- Add support for relative sitemap paths in robots.txt using urljoin()
- Fix HTML meta tag parsing to use case-insensitive regex instead of lowercasing content
- Add URL scheme validation for discovered sitemaps (http/https only)
- Fix discovery target domain filtering to use discovered URL's domain instead of input URL
- Clean up whitespace and improve dict comprehension usage
These changes improve discovery reliability and prevent URL corruption while maintaining
backward compatibility with existing discovery behavior.
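A sketch of the robots.txt parsing behavior described above; the real parser in discovery_service.py may differ:

```python
# Hedged sketch: case-preserving sitemap extraction with relative-path and scheme handling.
from urllib.parse import urljoin, urlparse

def extract_sitemaps(robots_text: str, robots_url: str) -> list[str]:
    sitemaps = []
    for line in robots_text.splitlines():
        stripped = line.strip()
        # Only the "sitemap:" prefix is matched case-insensitively;
        # the URL itself keeps its original case.
        if stripped.lower().startswith("sitemap:"):
            value = stripped[len("sitemap:"):].strip()
            absolute = urljoin(robots_url, value)  # supports relative paths
            if urlparse(absolute).scheme in ("http", "https"):
                sitemaps.append(absolute)
    return sitemaps
```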
- Enable SSL certificate verification (verify=True) for all HTTP requests
- Implement streaming with size limits (10MB default) to prevent memory exhaustion
- Add _read_response_with_limit() helper for secure response reading (sketched below)
- Update all test mocks to support streaming API with iter_content()
- Fix test assertions to expect new security parameters
- Enforce deterministic rounding in progress mapper tests
Security improvements:
- Prevents MITM attacks through SSL verification
- Guards against DoS via oversized responses
- Ensures proper resource cleanup with response.close()
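A sketch of the size-limited streaming read referenced above; the real _read_response_with_limit may differ:

```python
# Hedged sketch of the streamed, size-limited response read described above.
import requests

MAX_RESPONSE_BYTES = 10 * 1024 * 1024  # 10MB default from this change

def _read_response_with_limit(response: requests.Response,
                              limit: int = MAX_RESPONSE_BYTES) -> bytes:
    """Consume a streamed response, aborting once it exceeds the limit."""
    chunks, total = [], 0
    try:
        for chunk in response.iter_content(chunk_size=8192):
            total += len(chunk)
            if total > limit:
                raise ValueError(f"Response exceeded {limit} bytes")
            chunks.append(chunk)
    finally:
        response.close()  # always release the connection
    return b"".join(chunks)

# Usage: body = _read_response_with_limit(session.get(url, stream=True, verify=True, timeout=30))
```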
* feat: Add AI-powered release notes generator with Claude
Implements automated release notes generation using Claude AI for releases and branch comparisons.
## Features
- **GitHub Action Workflow**: Automatically generates release notes when tags are pushed or releases are created
- **Local Testing Script**: Test release note generation locally before pushing
- **Branch Comparison Support**: Compare stable vs main branches or any two refs (tags, branches, commits)
- **Smart Branch Resolution**: Automatically resolves local/remote branches (e.g., stable → origin/stable)
- **Comprehensive Release Notes**: Includes features, improvements, bug fixes, technical changes, statistics, and contributors
## Files Added
- `.github/workflows/release-notes.yml` - GitHub Action for automated release notes
- `.github/RELEASE_NOTES_SETUP.md` - Complete setup guide and usage documentation
- `.github/test-release-notes.sh` - Local testing script with branch comparison support
- `.gitignore` - Exclude local release notes test files
## Usage
### Local Testing
```bash
export ANTHROPIC_API_KEY="sk-ant-..."
./.github/test-release-notes.sh # Compare origin/stable..main
./.github/test-release-notes.sh stable main # Explicit branches
./.github/test-release-notes.sh v1.0.0 v2.0.0 # Compare tags
```
### GitHub Action
1. Add `ANTHROPIC_API_KEY` to repository secrets
2. Push a tag: `git tag v1.0.0 && git push origin v1.0.0`
3. Release notes are automatically generated and added to the GitHub release
## Technical Details
- Uses Claude Sonnet 4 for intelligent content analysis
- Properly escapes JSON using jq for robust handling of special characters
- Supports multiple comparison formats: tags, branches, commit hashes
- Cost: ~$0.003 per release (~$0.036/year for monthly releases)
* refactor: Use Claude Code OAuth token for GitHub Action, keep API key for local testing
Changes the GitHub Actions workflow to use Claude Code OAuth authentication (consistent with claude-review workflow) while keeping direct API key authentication for local testing.
## Changes
### GitHub Actions Workflow
- **Before**: Direct API calls with `ANTHROPIC_API_KEY`
- **After**: Uses `anthropics/claude-code-action@beta` with `CLAUDE_CODE_OAUTH_TOKEN`
### Benefits
- ✅ Consistent authentication with existing `claude-review` workflow
- ✅ Better GitHub integration through Claude Code Action
- ✅ No additional API costs (included in Claude Code subscription)
- ✅ Same secret (`CLAUDE_CODE_OAUTH_TOKEN`) works for both workflows
### Local Testing
- **Unchanged**: Still uses `ANTHROPIC_API_KEY` for direct API calls
- Simple, fast iteration during development
- No dependency on Claude Code Action locally
## Implementation Details
The workflow now:
1. Prepares all release context in a `release-context.md` file
2. Uses Claude Code Action to read the context and generate release notes
3. Writes output to `release_notes.md`
4. Validates the generated file before creating/updating the release
## Documentation Updates
- Updated setup instructions to use `CLAUDE_CODE_OAUTH_TOKEN`
- Added section explaining authentication differences
- Clarified cost implications (OAuth has no additional costs)
- Notes that same token works for both `claude-review` and release notes workflows
---------
Co-authored-by: Claude <noreply@anthropic.com>
Resolved merge conflicts by integrating features from both branches:
- Added page_storage_ops service initialization from main
- Merged link text extraction with discovery mode features
- Preserved discovery single-file mode and domain filtering
- Maintained link text fallbacks for title extraction
* fix: Set explicit PLAYWRIGHT_BROWSERS_PATH to fix browser installation
Fixes Playwright browser not found error during web crawling.
The issue was introduced in the uv migration (9f22659) where the
browser installation path was not explicitly set as a persistent
environment variable.
Changes:
- Add ENV PLAYWRIGHT_BROWSERS_PATH=/ms-playwright
- Add --with-deps flag to playwright install command
- Add comprehensive root cause analysis document
Without this fix, Playwright installed browsers to a default location
at build time but couldn't find them at runtime, causing crawling
operations to fail with "Executable doesn't exist" errors.
* fix: Remove --with-deps flag to prevent build conflicts
The --with-deps flag was causing build failures on some systems because:
- We already manually install all Playwright dependencies (lines 26-49)
- --with-deps attempts to reinstall these packages
- This causes package conflicts and build failures on Windows/WSL
The core fix (ENV PLAYWRIGHT_BROWSERS_PATH) remains the same.
* Delete PLAYWRIGHT_FIX_ANALYSIS.md
---------
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Cole Medin <cole@dynamous.ai>
* Initial commit for RAG by document
* Phase 2
* Adding migrations
* Fixing page IDs for chunk metadata
* Fixing unit tests, adding tool to list pages for source
* Fixing page storage upsert issues
* Max file length for retrieval
* Fixing title issue
* Fixing tests