Commit Graph

233 Commits

Author SHA1 Message Date
leex279
277ac35e0d fix: Improve dialog scrolling and add debug logging for config persistence
- Fix scrolling in Edit Crawler Configuration dialog when content is expanded
- Remove negative margins and simplify padding for better scroll behavior
- Add console logging to debug crawl_config persistence issues
- Ensure save button is always visible with proper scrollbar
2025-09-22 20:46:05 +02:00
leex279
45c394beb6 fix: Proper deletion and persistence for crawler configuration updates
- Fix deletion of existing documents before recrawl with new configuration
- Use SourceManagementService.delete_source() instead of non-existent delete_item()
- Ensure all crawl configuration fields are persisted to metadata
- Store knowledge_type, max_depth, tags, and original_url in metadata
- Add proper logging for successful deletions
- Ensure crawl_config is persisted and reloaded correctly
2025-09-22 20:41:38 +02:00
leex279
487c83d57c fix: Keyboard event propagation and auto-expand advanced config
- Add onKeyDown stopPropagation to prevent Enter key from opening document browser
- Auto-expand Advanced Configuration when existing config is present
- Fix issue where pressing Enter in tag input would trigger card action
- Improve UX by showing loaded configuration immediately when editing
2025-09-22 20:35:20 +02:00
leex279
b76c7cf38b fix: Event propagation and data loading in Edit Crawler Configuration
- Add stopPropagation wrapper to prevent dialog clicks from bubbling to card
- Include crawl configuration fields at top level of knowledge item response
- Ensure max_depth, tags, and crawl_config are accessible for edit dialog
- Fix issue where clicking inside edit dialog would open document browser
2025-09-22 20:24:28 +02:00
leex279
4876cc977c fix: Use original_url for Edit Crawler Configuration
- Use metadata.original_url when available to show the actual crawled URL
- Fall back to item.url if original_url is not present
- Ensures the edit dialog shows the same URL that was originally crawled
2025-09-22 20:16:46 +02:00
leex279
3ed3dc6126 fix: Add missing GET endpoint for single knowledge item
- Add GET /knowledge-items/{source_id} endpoint to fetch single item
- Fixes 'Method Not Allowed' error in Edit Crawler Configuration dialog
- Returns full item data including metadata for configuration editing
2025-09-22 20:13:27 +02:00
leex279
a7768c29d8 fix: Improve Edit Crawler Configuration dialog layout and data loading
- Increase dialog max height from 85vh to 90vh for better content visibility
- Improve scrolling behavior with proper padding adjustments
- Fix data loading to properly query item only when dialog is open
- Add comprehensive fallback checks for max_depth, tags, and crawl_config
- Add error handling for failed configuration loads
- Ensure arrays are properly validated before assignment
- Remove debug console.log statements
2025-09-22 20:11:13 +02:00
leex279
1053d5cba6 fix: Replace asterisk characters in JSDoc to prevent TypeScript parsing errors
- Replace * with [star] in all JSDoc examples to avoid parser conflicts
- Maintain documentation clarity while ensuring proper compilation
- Fix issue across all pattern examples in CrawlConfig documentation
2025-09-22 19:59:56 +02:00
leex279
42913856f6 fix: Resolve TypeScript parsing error in JSDoc comments
- Remove code blocks from JSDoc to prevent parsing conflicts
- Simplify example format to avoid asterisk/slash interpretation issues
- Maintain documentation clarity while ensuring TypeScript compilation
2025-09-22 16:28:53 +02:00
leex279
ec01e915ea fix: Address CodeRabbit review comments
- Wire crawl_config from request to service instance for domain filtering
- Add domain filtering to BatchCrawlStrategy to ensure all crawls respect filters
- Replace 'any' types with proper CrawlConfig TypeScript types
- Add memoization for filtered chunks in DocumentBrowser for performance
- Improve domain normalization (strip leading www only, handle ports)
- Use Pydantic model validation directly in API endpoints

These fixes ensure domain filtering works consistently across all crawl strategies
and improve type safety and performance throughout the codebase.
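
The domain normalization bullet above (strip a leading www only, handle ports) could look roughly like the sketch below. The helper name `normalize_domain` and its exact behavior are assumptions for illustration, not the code from this commit.

```python
from urllib.parse import urlsplit


def normalize_domain(url_or_host: str) -> str:
    """Lowercase the host, drop any port, and strip one leading 'www.'."""
    # urlsplit only fills in hostname when a scheme or a leading '//' is present,
    # so bare hosts such as "docs.example.com:443" get a '//' prefix first.
    has_netloc_marker = "://" in url_or_host or url_or_host.startswith("//")
    candidate = url_or_host if has_netloc_marker else f"//{url_or_host}"
    host = (urlsplit(candidate).hostname or "").lower()
    # Strip a leading "www." only; other subdomains such as "docs." are kept.
    if host.startswith("www."):
        host = host[len("www."):]
    return host
```

With this sketch, `https://www.Example.com:8080/path` and `example.com` normalize to the same value, while `docs.example.com` stays distinct.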

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-22 14:18:29 +02:00
leex279
2dbf6c32b9 fix: Ensure Edit Crawler Configuration loads initial data properly
Fixed bug where edit configuration dialog wasn't showing the existing crawl config:

1. **Enhanced data loading in EditCrawlConfigDialog**:
   - Check for data at both top-level and metadata level
   - Properly handle max_depth, tags, and crawl_config fields
   - Ensure crawl_config has the right shape with all arrays initialized
   - Reset form state when dialog opens without data

2. **Improved type definitions**:
   - Added crawl_config and max_depth to KnowledgeItemMetadata interface
   - Added optional fields to KnowledgeItem for top-level storage
   - Added index signature to allow additional untyped fields from backend

3. **Better state management**:
   - Reset form when dialog opens to prevent stale data
   - Only load data when both item exists and dialog is open
   - Initialize empty arrays for all crawl_config fields

This ensures that when editing an existing knowledge item:
- All original crawl settings are properly loaded
- Advanced configuration shows the domain filters and patterns
- Form is pre-filled with the exact configuration used for initial crawl
2025-09-22 13:58:13 +02:00
leex279
dbe5855d84 docs: Add comprehensive JSDoc documentation for CrawlConfig
Added detailed documentation for CrawlConfig interface including:

## Documentation improvements:
- Clear precedence rules (excluded_domains > allowed_domains > exclude_patterns > include_patterns)
- Pattern syntax explanation (glob patterns with fnmatch for URLs, wildcards for domains)
- Comprehensive examples showing common use cases:
  - Single subdomain with path exclusions
  - Multiple subdomains with specific exclusions
  - File type and directory blocking
- Individual property documentation with examples

## Code improvements:
- Refactored DocumentBrowser to avoid repeated URL/domain computation
- Extract resolvedUrl and resolvedDomain once as constants
- Improved readability and performance

This documentation helps developers understand:
- How conflicting rules are resolved (blacklist always wins)
- What pattern syntax to use (glob patterns)
- How to compose allow/deny lists effectively
2025-09-22 13:52:15 +02:00
leex279
7ea4d99a27 fix: Improve domain filter robustness in crawling service
Backend fixes for crawling stability:

- Add comment clarifying DomainFilter doesn't need init params
- Improve base URL selection in recursive strategy:
  - Check start_urls length before indexing
  - Use appropriate base URL for domain checks
  - Fallback to original_url when start_urls is empty
- Add error handling for domain filter:
  - Wrap is_url_allowed in try/except block
  - Log exceptions and conservatively skip URLs on error
  - Prevents domain filter exceptions from crashing crawler
- Better handling of relative URL resolution

These changes ensure more robust crawling especially when:
- start_urls array is empty
- Domain filter encounters unexpected URLs
- Relative links need proper base URL resolution
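
A minimal sketch of the error-handling idea described above, assuming the filter check is passed in as a plain callable. The wrapper name `is_allowed_safely` and the single-argument signature are hypothetical; the real `is_url_allowed` may take more parameters.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def is_allowed_safely(is_url_allowed: Callable[[str], bool], url: str) -> bool:
    """Run the domain filter, treating any exception as "do not crawl"."""
    try:
        return bool(is_url_allowed(url))
    except Exception:
        # Log with traceback and conservatively skip the URL, so a filter bug
        # cannot crash the whole crawl.
        logger.exception("Domain filter failed for %s; skipping URL", url)
        return False
```

Skipping on error is the conservative choice: a misbehaving filter leads to fewer pages being crawled rather than to pages outside the configured domains.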
2025-09-22 13:44:10 +02:00
leex279
476e15ab67 fix: Correct ActiveOperationsResponse handling in useCrawlUrlV2
- Fixed missing count and timestamp fields in optimistic updates
- Preserve all ActiveOperationsResponse fields when updating progress IDs
- Fixed incorrect field comparison (source_id vs id) when replacing temp IDs
- Added query invalidation for progress queries in v2 implementation
- Ensures proper data shape consistency with backend API

These fixes ensure that:
1. ActiveOperationsResponse always has required count/timestamp fields
2. Optimistic entities are correctly matched and updated with real IDs
3. Progress queries are properly refreshed after crawl starts
2025-09-22 13:42:39 +02:00
leex279
9138fefaf8 fix: Remove View Source links from document sidebar
- Removed redundant View Source links from InspectorSidebar
- Main content area already contains source links
- Cleaned up unused ExternalLink import
2025-09-22 13:40:55 +02:00
leex279
1bc2d64c22 feat: Add comprehensive edit crawler configuration and metadata viewing
## New Features

### Edit Crawler Configuration
- Added `EditCrawlConfigDialog` component for editing existing crawler settings
- Users can now modify URL, knowledge type, max depth, tags, and advanced crawl config
- Added "Edit Configuration" menu item to knowledge card actions
- Shows warning that recrawl will be triggered when saving changes
- Created `useUpdateCrawlConfig` hook with optimistic updates
- Added backend API endpoint `/api/knowledge-items/{source_id}/update-config`

### Enhanced Metadata Viewing
- Fixed metadata display to show ALL properties from backend (not just 3-5)
- Increased metadata panel height from 256px to 500px for better viewing
- Added proper scrolling with visible scrollbar for large metadata objects
- Now displays complete JSON including properties like:
  - `url`, `source`, `headers`, `filename`
  - `has_code`, `has_links`, `source_id`
  - `char_count`, `word_count`, `line_count`
  - `chunk_size`, `chunk_index`
  - `source_type`, `knowledge_type`
  - All other backend metadata fields

### Domain Filtering Improvements
- Converted domain filter from pills to dropdown with "All" option
- Added domain statistics showing document count per domain
- Improved InspectorSidebar with better domain filtering UX

## Technical Implementation

### Frontend Components
- `EditCrawlConfigDialog.tsx` - Edit configuration dialog with warning
- Enhanced `KnowledgeCardActions.tsx` - Added edit configuration menu item
- Updated `KnowledgeCard.tsx` - Integrated edit dialog
- Improved `ContentViewer.tsx` - Better metadata display with full JSON
- Fixed `KnowledgeInspector.tsx` - Pass complete metadata instead of filtered subset

### React Hooks & Services
- `useUpdateCrawlConfig()` - Mutation hook for updating crawler config
- `knowledgeService.updateCrawlConfig()` - API service method
- Optimistic updates with progress tracking

### Backend API
- POST `/api/knowledge-items/{source_id}/update-config` - Update crawler configuration
- Validates existing item, deletes old data, triggers new crawl with updated config
- Returns progress ID for tracking recrawl operation

## User Experience
- Users can now edit any existing crawler configuration
- Clear warning about recrawl being triggered
- Complete metadata viewing for debugging and analysis
- Improved domain filtering in document browser
- Progress tracking for configuration updates

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-22 12:53:42 +02:00
leex279
a4848dce8a fix: Move metadata panel to bottom and improve visibility
- Move metadata panel from top to bottom of content viewer
- Place scroll on outer container to show more metadata (max-h-64)
- Keep metadata section always accessible at the bottom
- Maintain clear visual separation with border-top
- Ensure better visibility of all metadata properties

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-22 11:15:04 +02:00
leex279
cc3c176a32 feat: Make metadata panel always visible as collapsible section
- Move metadata section to be always visible below header
- Add full-width collapsible button with hover effect
- Show property count in metadata header
- Add max-height and scroll to metadata content
- Separate content area from metadata for better visibility
- Improve UI with clear visual hierarchy

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-22 11:11:22 +02:00
leex279
c8ba28ee41 refactor: Move metadata display to content viewer area
- Move metadata display from sidebar list to content viewer where chunk content is shown
- Add collapsible metadata section in content viewer with better formatting
- Remove metadata button and panel from sidebar to reduce clutter
- Keep only 'View Source' link in sidebar for quick access
- Show domain information in content footer for better context
- Improve overall UI consistency and user experience

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-22 11:05:44 +02:00
leex279
8344ee0ebc fix: Resolve button nesting error and improve metadata display
- Fix button nesting error by using div with onClick instead of nested buttons
- Move metadata panel outside the clickable item area
- Add close button for metadata panel
- Ensure metadata displays within the UI (not opening in new tab)
- Fix type comparison for showMetadata state
- Improve layout with better spacing and scrollable metadata

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-22 11:00:07 +02:00
leex279
a9f54f2f64 feat: Improve domain filtering UI and add metadata viewer
- Change domain filter from pills to dropdown with 'All' option as default
- Add 'View Source' link for each document chunk
- Add 'View Metadata' button to view chunk metadata in expandable panel
- Improve UI consistency with smaller action buttons

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-22 09:39:50 +02:00
leex279
cc46b3422c feat: Add advanced web crawling with domain filtering
- Implement domain filtering for web crawler with whitelist/blacklist support
- Add URL pattern matching (glob-style) for include/exclude patterns
- Create AdvancedCrawlConfig UI component with collapsible panel
- Add domain filter to Knowledge Inspector sidebar for easy filtering
- Implement crawl-v2 API endpoint with backward compatibility
- Add comprehensive unit tests for domain filtering logic

Implements priority-based filtering:
1. Blacklist (excluded_domains) - highest priority
2. Whitelist (allowed_domains) - must match if provided
3. Exclude patterns - glob patterns to exclude
4. Include patterns - glob patterns to include
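
As a rough illustration of the priority order above, the sketch below applies the four checks in sequence. The function signature, the use of `fnmatchcase` for domain wildcards, and the rule that a non-empty include list must match are assumptions for illustration, not the project's actual implementation.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def is_url_allowed(
    url,
    excluded_domains=(),
    allowed_domains=(),
    exclude_patterns=(),
    include_patterns=(),
):
    """Apply the four filters in priority order; the first failing check rejects."""
    host = (urlsplit(url).hostname or "").lower()
    # 1. Blacklist: an excluded domain always wins.
    if any(fnmatchcase(host, d.lower()) for d in excluded_domains):
        return False
    # 2. Whitelist: if provided, the host must match at least one entry.
    if allowed_domains and not any(fnmatchcase(host, d.lower()) for d in allowed_domains):
        return False
    # 3. Exclude patterns: glob patterns matched against the full URL.
    if any(fnmatchcase(url, p) for p in exclude_patterns):
        return False
    # 4. Include patterns (assumed): if provided, the URL must match at least one.
    if include_patterns and not any(fnmatchcase(url, p) for p in include_patterns):
        return False
    return True
```

For example, a URL on `blog.example.com` is rejected when that host is in `excluded_domains`, even if `allowed_domains` contains `*.example.com`.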

UI improvements:
- Advanced configuration section in Add Knowledge dialog
- Domain pills in Inspector sidebar showing document distribution
- Visual domain indicators on each document
- Responsive domain filtering with document counts

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-22 09:33:08 +02:00
sean-eskerium
4c910c1471 Merge pull request #721 from coleam00/fix/ollama-default-docker-address
fix: Change Ollama default URL to host.docker.internal for Docker compatibility
2025-09-20 15:17:09 -07:00
John Fitzpatrick
aaca437fdc fix: Update remaining localhost placeholder in OllamaConfigurationPanel
Missed updating the placeholder text for the new instance URL input field.
Changed from localhost:11434 to host.docker.internal:11434 for consistency.
2025-09-20 13:51:24 -07:00
John Fitzpatrick
2f486e5b21 test: Update test expectations for new Ollama default URL
Updated test_async_llm_provider_service.py to expect host.docker.internal
instead of localhost for Ollama URLs to match the new default configuration.
2025-09-20 13:44:23 -07:00
John Fitzpatrick
d4e80a945a fix: Change Ollama default URL to host.docker.internal for Docker compatibility
- Changed default Ollama URL from localhost:11434 to host.docker.internal:11434
- This allows Docker containers to connect to Ollama running on the host machine
- Updated in backend services, frontend components, migration scripts, and documentation
- Most users run Archon in Docker but Ollama as a local binary, making this a better default
2025-09-20 13:36:33 -07:00
Cole Medin
035f90e721 Codex mcp instructions (#719)
* Add Codex MCP configuration instructions

- Added Codex as a supported IDE in the MCP configuration UI
- Removed Augment (duplicate of Cursor configuration)
- Positioned Codex between Gemini and Cursor in the tab order
- Added platform-specific configuration support for Windows vs Linux/macOS
- Includes step-by-step instructions for installing mcp-remote and configuring Codex
- Shows appropriate TOML configuration based on detected platform

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Finalizing Codex instructions

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-09-20 14:51:07 -05:00
Cole Medin
e9c08d2fe9 Updating RAG SIMILARITY_THRESHOLD to 0.05
2025-09-20 13:57:13 -05:00
Cole Medin
b1085a53df Removing junk from sitemap and full site (recursive) crawls (#711)
* Removing junk from sitemap and full site (recursive) crawls

* Small typo fix for result.markdown
2025-09-20 12:58:42 -05:00
Cole Medin
c3be65322b Improved MCP and global rules instructions (#705)
2025-09-20 12:58:20 -05:00
Wirasm
37994191fc refactor: Phase 5 - Remove manual cache invalidations (#707)
* chore: clean up leftovers of TanStack refactoring

* refactor: Complete Phase 5 - Remove manual cache invalidations

- Removed all manual cache invalidations from knowledge queries
- Updated task queries to rely on backend consistency
- Fixed optimistic update utilities to handle edge cases
- Cleaned up unused imports and test utilities
- Fixed minor TypeScript issues in UI components

Backend now ensures data consistency through proper transaction handling,
eliminating the need for frontend cache coordination.


* docs: Enhance TODO comment for knowledge optimistic update issue

- Added comprehensive explanation of the query key mismatch issue
- Documented current behavior and impact on user experience
- Listed potential solutions with tradeoffs
- Created detailed PRP story in PRPs/local/ for future implementation
- References specific line numbers and implementation details

This documents a known limitation where optimistic updates to knowledge
items are invisible because mutations update the wrong query cache.
2025-09-19 14:26:05 +03:00
Wirasm
1b272ed2af docs: Update AI documentation to accurately reflect current codebase (#708)
* docs: Update AI documentation for accurate codebase reflection

- Replace obsolete POLLING_ARCHITECTURE.md with DATA_FETCHING_ARCHITECTURE.md
- Rewrite API_NAMING_CONVENTIONS.md with file references instead of code examples
- Condense ARCHITECTURE.md from 482 to 195 lines for clarity
- Update ETAG_IMPLEMENTATION.md to reflect actual implementation
- Update QUERY_PATTERNS.md to reflect completed Phase 5 (nanoid optimistic updates)
- Add PRPs/stories/ to .gitignore

All documentation now references actual files in codebase rather than
embedding potentially stale code examples.


* docs: Update CLAUDE.md and AGENTS.md with current patterns

- Update CLAUDE.md to reference documentation files instead of embedding code
- Replace Service Layer and Error Handling code examples with file references
- Add proper distinction between DATA_FETCHING_ARCHITECTURE and QUERY_PATTERNS docs
- Include ETag implementation reference
- Update environment variables section with .env.example reference


* docs: apply PR review improvements to AI documentation

- Fix punctuation, hyphenation, and grammar issues across all docs
- Add language tags to directory tree code blocks for proper markdown linting
- Clarify TanStack Query integration (not replacing polling, but integrating it)
- Add Cache-Control header documentation and browser vs non-browser fetch behavior
- Reference actual implementation files for polling intervals instead of hardcoding values
- Improve type-safety phrasing and remove line numbers from file references
- Clarify Phase 1 removed manual frontend ETag cache (backend ETags remain)
2025-09-19 13:29:46 +03:00
Wirasm
0502d378f0 refactor: Phase 4 - Configure centralized request deduplication (#700)
* refactor: Phase 4 - Configure centralized request deduplication

Implement centralized QueryClient configuration with domain-specific settings,
consistent retry logic, and optimized caching behavior.

Key changes:
- Create centralized queryClient.ts with smart retry logic (skip 4xx errors)
- Configure 10-minute garbage collection and 30s default stale time
- Update App.tsx to import shared queryClient instance
- Replace all hardcoded staleTime values with STALE_TIMES constants
- Add test-specific QueryClient factory for consistent test behavior
- Enable structural sharing for optimized React re-renders

Benefits:
- ~40-50% reduction in API calls through proper deduplication
- Smart retry logic avoids pointless retries on client errors
- Consistent caching behavior across entire application
- Single source of truth for cache configuration

All 89 tests passing. TypeScript compilation clean. Verified with React Query DevTools.

Co-Authored-By: Claude <noreply@anthropic.com>

* added proper stale time for project task count

* improve: Unified retry logic and task query enhancements

- Unified retry logic: Extract robust status detection for APIServiceError, fetch, and axios patterns
- Security: Fix sensitive data logging in task mutations (prevent title/description leakage)
- Real-time collaboration: Add smart polling to task counts for AI agent synchronization
- Type safety: Add explicit TypeScript generics for better mutation inference
- Inspector pagination: Fix fetchNextPage return type to match TanStack Query Promise signature
- Remove unused DISABLED_QUERY_OPTIONS export per KISS principles

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: Correct useSmartPolling background interval logic

Fix critical polling inversion where background polling was faster than foreground.

- Background now uses Math.max(baseInterval * 1.5, 5000) instead of hardcoded 5000ms
- Ensures background is always slower than foreground across all base intervals
- Fixes task counts polling (10s→15s background) and other affected hooks
- Updates comprehensive test suite with edge case coverage
- No breaking changes - all consumers automatically benefit

Resolves CodeRabbit issue where useSmartPolling(10_000) caused 5s background < 10s foreground.
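
The hook itself is TypeScript; the interval arithmetic is transliterated to Python here purely to show why the new formula cannot invert. The function name `background_interval_ms` is hypothetical.

```python
def background_interval_ms(base_interval_ms: float) -> float:
    """Background polling interval: 1.5x the foreground interval, at least 5000 ms."""
    return max(base_interval_ms * 1.5, 5000)
```

For any positive base interval the result is at least 1.5 times the base, so background polling stays slower than foreground polling; a 10-second base gives a 15-second background interval, matching the task-count case above.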

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-09-18 22:46:11 +03:00
DIY Smart Code
6abb8831f7 fix: enable code examples extraction for manual file uploads (#626)
* fix: enable code examples extraction for manual file uploads

- Add extract_code_examples parameter to upload API endpoint (default: true)
- Integrate CodeExtractionService into DocumentStorageService.upload_document()
- Add code extraction after document storage with progress tracking
- Map code extraction progress to 85-95% range in upload progress
- Include code_examples_stored in upload results and logging
- Support extract_code_examples in batch document upload via store_documents()
- Handle code extraction errors gracefully without failing upload

Fixes issue where code examples were only extracted for URL crawls
but not for manual file uploads, despite using the same underlying
CodeExtractionService that supports both HTML and text formats.
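
Mapping the extraction stage's own 0-100% progress into the 85-95% slice of the upload progress is a linear rescale. The sketch below is one way to write it; the helper name `map_progress` and the clamping behavior are assumptions.

```python
def map_progress(inner_percent: float, start: float = 85.0, end: float = 95.0) -> int:
    """Rescale a stage's 0-100% progress into the [start, end] overall range."""
    # Clamp first so out-of-range callbacks cannot push progress backwards or past `end`.
    clamped = min(max(inner_percent, 0.0), 100.0)
    return round(start + (end - start) * clamped / 100.0)
```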

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: Fix code extraction for uploaded markdown files

- Provide file content in both html and markdown fields for crawl_results
- This ensures markdown files (.md) use the correct text file extraction path
- The CodeExtractionService checks html_content first for text files
- Fixes issue where uploaded .md files didn't extract code examples properly

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* debug: Add comprehensive logging to trace code extraction issue

- Add detailed debug logging to upload code extraction flow
- Log extract_code_examples parameter value
- Log crawl_results structure and content length
- Log progress callbacks from extraction service
- Log final extraction count with more context
- Enhanced error logging with full stack traces

This will help identify exactly where the extraction is failing for uploaded files.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: Remove invalid start_progress/end_progress parameters

The extract_and_store_code_examples method doesn't accept start_progress
and end_progress parameters, causing TypeError during file uploads.

This was the root cause preventing code extraction from working - the
method was failing with a signature mismatch before any extraction logic
could run.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: Preserve code blocks across PDF page boundaries

PDF extraction was breaking markdown code blocks by inserting page separators:

```python
def hello():
--- Page 2 ---
    return "world"
```

This made code blocks unrecognizable to extraction patterns.

Solution:
- Add _preserve_code_blocks_across_pages() function
- Detect split code blocks using regex pattern matching
- Remove page separators that appear within code blocks
- Apply to both pdfplumber and PyPDF2 extraction paths

Now PDF uploads should properly extract code examples just like markdown files.
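
A minimal sketch of the idea, assuming separators have the `--- Page N ---` form shown above. It scans line by line and tracks fence state instead of matching whole blocks with one regex, so it is an illustration rather than the `_preserve_code_blocks_across_pages()` implementation itself.

```python
import re

PAGE_SEPARATOR = re.compile(r"^\s*--- Page \d+ ---\s*$")
FENCE = "`" * 3  # markdown code fence marker


def drop_separators_inside_fences(text: str) -> str:
    """Remove page-separator lines that fall inside a fenced code block."""
    kept = []
    inside_fence = False
    for line in text.split("\n"):
        if line.lstrip().startswith(FENCE):
            # Opening and closing fences both toggle the state.
            inside_fence = not inside_fence
        elif inside_fence and PAGE_SEPARATOR.match(line):
            continue  # separator splits a code block: drop it
        kept.append(line)
    return "\n".join(kept)
```

Separators outside code blocks are left alone, so page boundaries in prose remain visible.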

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: Add PDF-specific code extraction for files without markdown delimiters

Root cause: PDFs lose markdown code block delimiters (```) during text extraction,
making standard markdown patterns fail to detect code.

Solution:
1. Add _extract_pdf_code_blocks() method with plain-text code detection patterns:
   - Python import blocks and function definitions
   - YAML configuration blocks
   - Shell command sequences
   - Multi-line indented code blocks

2. Add PDF detection logic in _extract_code_blocks_from_documents()
3. Set content_type properly for PDF files in storage service
4. Add debug logging to PDF text extraction process

This allows extraction of code from PDFs that contain technical documentation
with code examples, even when markdown formatting is lost during PDF->text conversion.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: Enhanced PDF code extraction to match markdown extraction results

Problem: PDF extraction only found 1 code example vs 9 from the same markdown content
Root cause: PDF extraction patterns were too restrictive and specific

Enhanced solution:
1. **Multi-line code block detection**: Scans for consecutive "code-like" lines
   - Variable assignments, imports, function calls, method calls
   - Includes comments, control flow, YAML keys, shell commands
   - Handles indented continuation lines and empty lines within blocks

2. **Smarter block boundary detection**:
   - Excludes prose lines with narrative indicators
   - Allows natural code block boundaries
   - Preserves context around extracted blocks

3. **Comprehensive pattern coverage**:
   - Python scripts and functions
   - YAML configuration blocks
   - Shell command sequences
   - JavaScript functions

This approach should extract the same ~9 code examples from PDFs as from
markdown files, since it detects code patterns without relying on markdown delimiters.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: Simplify PDF extraction to section-based approach

Changed from complex line-by-line analysis to simpler section-based approach:

1. Split PDF content by natural boundaries (paragraphs, page breaks)
2. Score each section for code vs prose indicators
3. Extract sections that score high on code indicators
4. Add comprehensive logging to debug section classification

Code indicators include:
- Python imports, functions, classes (high weight)
- Variable assignments, method calls (medium weight)
- Package management commands, lambda functions

This should better match the 9 code examples found in the markdown version
by treating each logical code segment as a separate extractable block.
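
The section-based approach could be sketched as follows. The patterns, weights, and threshold are hypothetical stand-ins for the indicators listed above, and prose indicators are omitted for brevity.

```python
import re

# (pattern, weight) pairs: high weight for imports/defs/classes, medium for the rest.
CODE_INDICATORS = [
    (re.compile(r"^\s*(?:import \w+|from \w[\w.]* import )", re.M), 3),
    (re.compile(r"^\s*(?:def|class) \w+", re.M), 3),
    (re.compile(r"^\s*\w+\s*=(?!=)\s*\S", re.M), 1),  # variable assignment
    (re.compile(r"\w+\.\w+\("), 1),  # method call
]


def score_section(section: str) -> int:
    """Sum the weights of every code indicator found in the section."""
    return sum(weight * len(pattern.findall(section)) for pattern, weight in CODE_INDICATORS)


def extract_code_sections(text: str, threshold: int = 3) -> list:
    """Split on blank lines and keep sections whose score reaches the threshold."""
    sections = [s for s in re.split(r"\n\s*\n", text.strip()) if s.strip()]
    return [s for s in sections if score_section(s) >= threshold]
```

Because each blank-line-delimited section is scored on its own, an import block and a function definition separated by a blank line come out as two separate blocks.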

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: Add explicit HTML file detection and extraction path

Problem: HTML files (0 code examples extracted) weren't being routed to HTML extraction

Root cause: HTML files (.html, .htm) weren't explicitly detected, so they fell through
to generic extraction logic instead of using the robust HTML code block patterns.

Solution:
1. Add HTML file detection: is_html_file = source_url.endswith(('.html', '.htm'))
2. Add explicit HTML extraction path before fallback logic
3. Set proper content_type: "text/html" for HTML files in storage service
4. Ensure HTML content is passed to _extract_html_code_blocks method

The existing HTML extraction already has comprehensive patterns for:
- <pre><code class="lang-python"> (syntax highlighted)
- <pre><code> (standard)
- Various code highlighting libraries (Prism, highlight.js, etc.)

This should now extract all code blocks from HTML files just like URL crawls do.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: Add HTML tag cleanup and proper code extraction for HTML files

Problem: HTML uploads had 0 code examples and contained HTML tags in RAG chunks

Solution:
1. **HTML Tag Cleanup**: Added _clean_html_to_text() function that:
   - Preserves code blocks by temporarily replacing them with placeholders
   - Removes all HTML tags, scripts, styles from prose content
   - Converts HTML structure (headers, paragraphs, lists) to clean text
   - Restores code blocks as markdown format (```language)
   - Cleans HTML entities (&lt;, &gt;, etc.)

2. **Unified Text Processing**: HTML files now processed as text files since they:
   - Have clean text for RAG chunking (no HTML tags)
   - Have markdown-style code blocks for extraction
   - Use existing text file extraction path

3. **Content Type Mapping**: Set text/markdown for cleaned HTML files

Result: HTML files now extract code examples like markdown files while providing
clean text for RAG without HTML markup pollution.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* feat: Add HTML file support to upload dialog

- Add .html and .htm to accepted file types in AddKnowledgeDialog
- Users can now see and select HTML files in the file picker by default
- HTML files will be processed with tag cleanup and code extraction

Previously, HTML files had to be manually typed or dragged; now they appear
in the standard file picker alongside other supported formats.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: Prevent HTML extraction path confusion in crawl_results payload

Problem: Setting both 'markdown' and 'html' fields to same content could trigger
HTML extraction regexes when we want text/markdown extraction.

Solution:
- markdown: Contains cleaned plaintext/markdown content
- html: Empty string to prevent HTML extraction path
- content_type: Proper type (application/pdf, text/markdown, text/plain)

This ensures HTML files (now cleaned to markdown format) use the text file
extraction path with backtick patterns, not HTML regex patterns.
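
A sketch of the resulting payload shape, assuming a small builder function. The `markdown`, `html`, and `content_type` fields come from the description above; the function name and the `url` key are hypothetical.

```python
def build_upload_crawl_result(source_url: str, cleaned_text: str, content_type: str) -> dict:
    """Build a crawl_results entry for an uploaded file."""
    return {
        "url": source_url,  # hypothetical key, included for illustration
        "markdown": cleaned_text,  # cleaned plaintext/markdown content
        "html": "",  # left empty so the HTML extraction path is never triggered
        "content_type": content_type,  # e.g. "application/pdf", "text/markdown", "text/plain"
    }
```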

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-09-18 20:06:48 +03:00
John C Fitzpatrick
85bd6bc012 Fix multi-dimensional vector hybrid search functions (#681)
Fixes critical bug where hybrid search functions referenced non-existent
cp.embedding and ce.embedding columns instead of dimension-specific columns.

Changes:
- Add new multi-dimensional hybrid search functions with dynamic column selection
- Maintain backward compatibility with existing legacy functions
- Support all embedding dimensions: 384, 768, 1024, 1536, 3072
- Proper error handling for unsupported dimensions

Resolves: #675 - RAG queries now work with multi-dimensional embeddings
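
The dynamic column selection can be pictured roughly as follows. This is a TypeScript sketch, not the SQL from the commit; the `embedding_<dimension>` column naming and the helper name `embeddingColumnFor` are assumptions, while the list of supported dimensions is taken from the message above.

```typescript
// Supported dimensions as listed in the commit message.
const SUPPORTED_DIMENSIONS = [384, 768, 1024, 1536, 3072] as const;

// Hypothetical helper: maps an embedding dimension to a dimension-specific
// column name. The "embedding_<dimension>" naming scheme is an assumption.
function embeddingColumnFor(dimension: number): string {
  const supported = SUPPORTED_DIMENSIONS.some((d) => d === dimension);
  if (!supported) {
    // Fail loudly instead of silently querying a column that does not exist.
    throw new Error(`Unsupported embedding dimension: ${dimension}`);
  }
  return `embedding_${dimension}`;
}

const columnFor768 = embeddingColumnFor(768);
```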

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-09-18 10:06:10 -07:00
John C Fitzpatrick
9ffca825ff feat: Universal clipboard utility with improved copy functionality (#663)
* feat: Add universal clipboard utility with enhanced copy functionality

- Add comprehensive clipboard utility (src/utils/clipboard.ts) with:
  - Modern Clipboard API with automatic fallback to document.execCommand
  - Cross-browser compatibility and security context handling
  - Detailed error reporting and debugging capabilities
  - Support for secure (HTTPS) and insecure (HTTP/localhost) contexts

- Update components to use new clipboard utility:
  - BugReportModal: Enhanced copy functionality with error handling
  - CodeViewerModal: Improved copy-to-clipboard for code snippets
  - IDEGlobalRules: Robust clipboard operations for rule copying
  - McpConfigSection: Enhanced config and command copying
  - DocumentCard: Reliable ID copying functionality
  - KnowledgeInspector: Improved content copying
  - ButtonPlayground: Enhanced CSS style copying

- Benefits:
  - Consistent copy behavior across all browser environments
  - Better error handling and user feedback
  - Improved accessibility and security context support
  - Enhanced debugging capabilities

Fixes #662

* fix: Improve clipboard utility robustness and add missing host configuration

Clipboard utility improvements:
- Prevent textarea element leak in clipboard fallback with proper cleanup
- Add SSR compatibility with typeof guards for navigator/document
- Use finally block to ensure cleanup in all error cases

Host configuration fixes:
- Update MCP API to use ARCHON_HOST environment variable instead of hardcoded localhost
- Add ARCHON_HOST to docker-compose environment variables
- Ensures MCP configuration shows correct hostname in different deployment environments

Addresses CodeRabbit feedback and restores missing host functionality

* fix: Use relative URLs for Vite proxy in development

- Update getApiUrl() to return empty string when VITE_API_URL is unset
- Ensures all API requests use relative paths (/api/...) in development
- Prevents bypassing Vite proxy with absolute URLs (host:port)
- Maintains existing functionality for explicit VITE_API_URL configuration
- Fix TypeScript error by using bracket notation for environment access

Addresses CodeRabbit feedback about dev setup relying on Vite proxy
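
A rough sketch of the behaviour described above. The real `getApiUrl()` presumably reads the Vite environment directly; here the environment is passed in as a plain object so the example stays self-contained, and `apiPath` is a hypothetical helper.

```typescript
type Env = Record<string, string | undefined>;

// Returns the explicit base URL when VITE_API_URL is set, otherwise an empty
// string so that requests stay relative and go through the dev proxy.
function getApiUrl(env: Env): string {
  const explicit = env["VITE_API_URL"]; // bracket notation, as in the commit
  return explicit ? explicit : "";
}

// Hypothetical helper that joins the base URL and an API path.
function apiPath(env: Env, path: string): string {
  return `${getApiUrl(env)}${path}`;
}

const relativeUrl = apiPath({}, "/api/tasks");
const absoluteUrl = apiPath({ VITE_API_URL: "http://example.test:8181" }, "/api/tasks");
```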

* fix: Resolve TypeScript error in API configuration

Use proper type assertion to access VITE_API_URL environment variable

* Address PR review comments: Move clipboard utility to features architecture

- Move clipboard.ts from src/utils/ to src/features/shared/utils/
- Remove copyTextToClipboard backward compatibility function (dead code)
- Update all import statements to use new file location
- Maintain full clipboard functionality with modern API and fallbacks

Addresses:
- Review comment r2348420743: Move to new architecture location
- Review comment r2348422625: Remove unused backward compatibility function

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix SSR safety issue in clipboard utility

- Add typeof navigator !== 'undefined' guard before accessing navigator.clipboard
- Add typeof document !== 'undefined' guard before using document.execCommand fallback
- Ensure proper error handling when running in server-side environment
- Maintain existing functionality while preventing ReferenceError during SSR/prerender

Addresses CodeRabbit feedback: Navigator access needs SSR-safe guards

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-09-18 10:04:46 -07:00
Wirasm
89fa9b4b49 Include description in tasks polling ETag (#698)
* Include description in tasks polling ETag

* Align tasks endpoint headers with HTTP cache expectations
2025-09-18 15:18:53 +03:00
Wirasm
31cf56a685 feat: Phase 3 - Fix optimistic updates with stable UUIDs and visual indicators (#695)
* feat: Phase 3 - Fix optimistic updates with stable UUIDs and visual indicators

- Replace timestamp-based temp IDs with stable nanoid UUIDs
- Create shared optimistic utilities module with type-safe functions
- Add visual indicators (OptimisticIndicator component) for pending items
- Update all mutation hooks (tasks, projects, knowledge) to use new utilities
- Add optimistic state styling to TaskCard, ProjectCard, and KnowledgeCard
- Add comprehensive unit tests for optimistic utilities
- All tests passing, validation complete
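
The optimistic flow can be sketched as below. The `Item` shape, the `_optimistic` and `_localId` field names, and both helper names are assumptions made for illustration, and `createLocalId` is only a stand-in for the nanoid-based IDs mentioned above.

```typescript
// Hypothetical entity shape; the underscore-prefixed fields mark pending items.
interface Item {
  id: string;
  title: string;
  _optimistic?: boolean;
  _localId?: string;
}

let localIdCounter = 0;

// Stand-in for nanoid: unique within this module, and stable once assigned.
function createLocalId(): string {
  localIdCounter += 1;
  return `local-${localIdCounter}-${Math.random().toString(36).slice(2, 10)}`;
}

// Creates a pending item that the UI can render with a visual indicator.
function createOptimisticItem(title: string): Item {
  const localId = createLocalId();
  return { id: localId, title, _optimistic: true, _localId: localId };
}

// Swaps the pending item for the server's version once the mutation resolves.
function replaceOptimisticItem(items: Item[], localId: string, serverItem: Item): Item[] {
  return items.map((item) => (item._localId === localId ? serverItem : item));
}

const pendingItem = createOptimisticItem("Write docs");
const afterServerReply = replaceOptimisticItem([pendingItem], pendingItem.id, {
  id: "srv-1",
  title: "Write docs",
});
```

Because the local ID is generated once and stored on the item, the replacement step does not depend on timestamps, which can collide when two items are created in quick succession.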

* docs: Update optimistic updates documentation with Phase 3 patterns

- Remove outdated optimistic_updates.md
- Create new concise documentation with file references
- Document shared utilities API and patterns
- Include performance characteristics and best practices
- Reference actual implementation files instead of code examples
- Add testing checklist and migration notes

* fix: resolve CodeRabbit review issues for Phase 3 optimistic updates

Address systematic review feedback on optimistic updates implementation:

**Knowledge Queries (useKnowledgeQueries.ts):**
- Add missing createOptimisticEntity import for type-safe optimistic creation
- Implement filter-aware cache updates for crawl/upload flows to prevent items appearing in wrong filtered views
- Fix total count calculation in deletion to accurately reflect removed items
- Replace manual optimistic item creation with createOptimisticEntity<KnowledgeItem>()

**Project Queries (useProjectQueries.ts):**
- Add proper TypeScript mutation typing with Awaited<ReturnType<>>
- Ensure type safety for createProject mutation response handling

**OptimisticIndicator Component:**
- Fix React.ComponentType import to use direct import instead of namespace
- Add proper TypeScript ComponentType import for HOC function
- Apply consistent Biome formatting

**Documentation:**
- Update performance characteristics with accurate bundlephobia metrics
- Improve nanoid benchmark references and memory usage details

All unit tests passing (90/90). Integration test failures expected without backend.

Co-Authored-By: CodeRabbit Review <noreply@coderabbit.ai>

* Adjust polling interval and clean knowledge cache

---------

Co-authored-by: CodeRabbit Review <noreply@coderabbit.ai>
2025-09-18 13:24:48 +03:00
Wirasm
f4ad785439 refactor: Phase 2 Query Keys Standardization - Complete TanStack Query v5 patterns implementation (#692)
* refactor: complete Phase 2 Query Keys Standardization

Standardize query keys across all features following vertical slice architecture,
ensuring they mirror backend API structure exactly with no backward compatibility.

Key Changes:
- Refactor all query key factories to follow consistent patterns
- Move progress feature from knowledge/progress to top-level /features/progress
- Create shared query patterns for consistency (DISABLED_QUERY_KEY, STALE_TIMES)
- Remove all hardcoded stale times and disabled keys
- Update all imports after progress feature relocation

Query Key Factories Standardized:
- projectKeys: removed task-related keys (tasks, taskCounts)
- taskKeys: added dual nature support (global via lists(), project-scoped via byProject())
- knowledgeKeys: removed redundant methods (details, summary)
- progressKeys: new top-level feature with consistent factory
- documentKeys: full factory pattern with versions support
- mcpKeys: complete with health endpoint

Shared Patterns Implementation:
- STALE_TIMES: instant (0), realtime (3s), frequent (5s), normal (30s), rare (5m), static (∞)
- DISABLED_QUERY_KEY: consistent disabled query pattern across all features
- Removed unused createQueryOptions helper
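
For reference, the shared patterns might look roughly like this. The durations are taken from the list above and expressed in milliseconds; the value of `DISABLED_QUERY_KEY` and the helper `projectTasksKey` are assumptions, since the message only names the constant.

```typescript
// Stale times in milliseconds, matching the values listed in the commit.
const STALE_TIMES = {
  instant: 0,
  realtime: 3000,
  frequent: 5000,
  normal: 30000,
  rare: 300000,
  static: Infinity,
} as const;

// Assumed value; the commit only names the constant.
const DISABLED_QUERY_KEY = ["disabled"] as const;

// Hypothetical helper: fall back to the shared disabled key when a required
// parameter is missing, instead of inventing an ad-hoc key per hook.
function projectTasksKey(projectId: string | undefined): readonly unknown[] {
  return projectId ? ["projects", projectId, "tasks"] : DISABLED_QUERY_KEY;
}

const enabledKey = projectTasksKey("p1");
const disabledKey = projectTasksKey(undefined);
```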

Testing:
- Added comprehensive tests for progress hooks
- Updated all test mocks to include new STALE_TIMES values
- All 81 feature tests passing

Documentation:
- Created QUERY_PATTERNS.md guide for future implementations
- Clear patterns, examples, and migration checklist

Breaking Changes:
- Progress imports moved from knowledge/progress to progress
- Query key structure changes (cache will reset)
- No backward compatibility maintained

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: establish single source of truth for tags in metadata

- Remove ambiguous top-level tags field from KnowledgeItem interface
- Update all UI components to use metadata.tags exclusively
- Fix mutations to correctly update tags in metadata object
- Remove duplicate tags field from backend KnowledgeSummaryService
- Fix test setup issue with QueryClient instance in knowledge tests
- Add TODO comments for filter-blind optimistic updates (Phase 3)

This eliminates the ambiguity identified in Phase 2 where both item.tags
and metadata.tags existed, establishing metadata.tags as the single
source of truth across the entire stack.

* fix: comprehensive progress hooks improvements

- Integrate useSmartPolling for all polling queries
- Fix memory leaks from uncleaned timeouts
- Replace string-based error checking with status codes
- Remove TypeScript any usage with proper types
- Fix unstable dependencies with sorted JSON serialization
- Add staleTime to document queries for consistency
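
The "sorted JSON serialization" item can be illustrated with a small sketch. The helper name `stableIdsKey` is hypothetical; the idea is that a serialized, sorted copy of an ID list yields the same string regardless of input order, so it can serve as a stable hook dependency.

```typescript
// Hypothetical helper: produces an order-independent string for a list of IDs.
function stableIdsKey(ids: readonly string[]): string {
  // Copy before sorting so the caller's array is not mutated.
  return JSON.stringify([...ids].sort());
}

const keyA = stableIdsKey(["b", "a"]);
const keyB = stableIdsKey(["a", "b"]);
```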

* feat: implement flexible assignee system for dynamic agents

- Changed assignee from restricted enum to flexible string type
- Renamed "AI IDE Agent" to "Coding Agent" for clarity
- Enhanced ComboBox with Radix UI best practices:
  - Full ARIA compliance (roles, labels, keyboard nav)
  - Performance optimizations (memoization, useCallback)
  - Improved UX (auto-scroll, keyboard shortcuts)
  - Fixed event bubbling preventing unintended modal opens
- Updated MCP server docs to reflect flexible assignee capability
- Removed unnecessary UI elements (arrows, helper text)
- Styled ComboBox to match priority selector aesthetic

This allows external MCP clients to create and assign custom sub-agents
dynamically, supporting advanced agent orchestration workflows.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: complete Phase 2 summariesPrefix usage for cache consistency

- Fix all knowledgeKeys.summaries() calls to use summariesPrefix() for operations targeting multiple summary caches
- Update cancelQueries, getQueriesData, setQueriesData, invalidateQueries, and refetchQueries calls
- Fix critical cache invalidation bug where filtered summaries weren't being cleared
- Update test expectations to match new factory patterns
- Address CodeRabbit review feedback on cache stability issues
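
A sketch of why the prefix key matters. The key layout and the filter shape below are assumptions (only the names `knowledgeKeys`, `summaries()`, and `summariesPrefix()` appear in this message), and `matchesPrefix` is a simplified stand-in for the partial key matching that TanStack Query applies when invalidating.

```typescript
// Assumed filter shape and key layout, for illustration only.
interface SummaryFilters {
  knowledge_type?: string;
  search?: string;
}

const knowledgeKeys = {
  all: ["knowledge"] as const,
  summariesPrefix: () => [...knowledgeKeys.all, "summaries"] as const,
  summaries: (filters?: SummaryFilters) => [...knowledgeKeys.all, "summaries", filters] as const,
};

// Simplified prefix matching: every element of `prefix` must equal the
// element at the same position in `key`.
function matchesPrefix(prefix: readonly unknown[], key: readonly unknown[]): boolean {
  return (
    prefix.length <= key.length &&
    prefix.every((part, index) => JSON.stringify(part) === JSON.stringify(key[index]))
  );
}

const filteredKey = knowledgeKeys.summaries({ knowledge_type: "technical" });
// The prefix covers every filtered summary cache.
const prefixHit = matchesPrefix(knowledgeKeys.summariesPrefix(), filteredKey);
// summaries() without filters carries a trailing undefined and misses it.
const unfilteredHit = matchesPrefix(knowledgeKeys.summaries(), filteredKey);
```

Under these assumptions, operations meant to touch all summary caches need the two-element prefix; the three-element key only addresses one specific filter combination.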

This completes the Phase 2 Query Keys Standardization work documented in PRPs/local/frontend-state-management-refactor.md

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: update MCP task tools documentation for Coding Agent rename

Update task assignee documentation from "AI IDE Agent" to "Coding Agent"
to match frontend changes for consistency across the system.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: implement assignee filtering in MCP find_tasks function

Add missing implementation for filter_by="assignee" that was documented
but not coded. The filter now properly passes the assignee parameter to
the backend API, matching the existing pattern used for status filtering.

Fixes documentation/implementation mismatch identified by CodeRabbit.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: Phase 2 cleanup - address review comments and improve code quality

Changes made:
- Reduced smart polling interval from 60s to 5s for background tabs (better responsiveness)
- Fixed cache coherence bug in knowledge queries (missing limit parameter)
- Standardized "Coding Agent" naming (was inconsistently "AI IDE Agent")
- Improved task queries with 2s polling, type safety, and proper invalidation
- Enhanced combobox accessibility with proper ARIA attributes and IDs
- Delegated useCrawlProgressPolling to useActiveOperations (removed duplication)
- Added exact: true to progress query removals (prevents sibling removal)
- Fixed invalid Tailwind class ml-4.5 to ml-4

All changes align with Phase 2 query key standardization goals and improve
overall code quality, accessibility, and performance.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-09-18 11:05:03 +03:00
Wirasm
b383c8cbec refactor: remove ETag Map cache layer for TanStack Query single source of truth (#676)
* refactor: remove ETag Map cache layer for TanStack Query single source of truth

- Remove Map-based cache from apiWithEtag.ts to eliminate double-caching anti-pattern
- Move apiWithEtag.ts to shared location since used across multiple features
- Implement NotModifiedError for 304 responses to work with TanStack Query
- Remove invalidateETagCache calls from all service files
- Preserve browser ETag headers for bandwidth optimization (70-90% reduction)
- Add comprehensive test coverage (10 test cases)
- All existing functionality maintained with zero breaking changes

This addresses Phase 1 of frontend state management refactor, making TanStack Query
the sole authority for cache decisions while maintaining HTTP 304 performance benefits.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: increase API timeout to 20s for large delete operations

Temporary fix for database performance issue where DELETE operations on
crawled_pages table with 7K+ rows take 13+ seconds due to sequential scan.

Root cause analysis:
- Source '9529d5dabe8a726a' has 7,073 rows (98% of crawled_pages table)
- PostgreSQL uses sequential scan instead of index for large deletes
- Operation takes 13.4s but frontend timeout was 10s
- Results in frontend errors while backend eventually succeeds

This prevents timeout errors during knowledge item deletion until we
implement proper batch deletion or database optimization.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* refactor: complete simplification of ETag handling (Option 3)

- Remove all explicit ETag handling code from apiWithEtag.ts
- Let browser handle ETags and 304 responses automatically
- Remove NotModifiedError class and associated retry logic
- Simplify QueryClient retry configuration in App.tsx
- Add comprehensive tests documenting browser caching behavior
- Fix missing generic type in knowledgeService searchKnowledgeBase

This completes Phase 1 of the frontend state management refactor.
TanStack Query is now the single source of truth for caching,
while browser handles HTTP cache/ETags transparently.

Benefits:
- 50+ lines of code removed
- Zero complexity for 304 handling
- Bandwidth optimization maintained (70-90% reduction)
- Data freshness guaranteed
- Perfect alignment with TanStack Query philosophy

* fix: resolve DOM nesting validation error in ProjectCard

Changed ProjectCard from motion.li to motion.div since it's already
wrapped in an li element by ProjectList. This fixes the React warning
about li elements being nested inside other li elements.

* fix: properly unwrap task mutation responses from backend

The backend returns wrapped responses for mutations:
{ message: string, task: Task }

But the frontend was expecting just the Task object, causing
description and other fields to not persist properly.

Fixed by:
- Updated createTask to unwrap response.task
- Updated updateTask to unwrap response.task
- Updated updateTaskStatus to unwrap response.task

This ensures all task data including descriptions persist correctly.
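
A minimal sketch of the unwrapping step. The wrapper shape `{ message, task }` is from the description above; the `Task` fields and the helper name `unwrapTask` are hypothetical.

```typescript
// Hypothetical, trimmed-down Task shape.
interface Task {
  id: string;
  title: string;
  description: string;
}

// Wrapper returned by the backend for task mutations.
interface TaskMutationResponse {
  message: string;
  task: Task;
}

// Returns only the Task so callers never see the wrapper.
function unwrapTask(response: TaskMutationResponse): Task {
  return response.task;
}

const created = unwrapTask({
  message: "Task created successfully",
  task: { id: "t1", title: "Write tests", description: "Cover the unwrap path" },
});
```

Treating the wrapper as if it were the Task leaves `description` undefined at the top level, which matches the symptom described in this commit.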

* test: add comprehensive tests for task service response unwrapping

Added 15 tests covering:
- createTask with response unwrapping
- updateTask with response unwrapping
- updateTaskStatus with response unwrapping
- deleteTask (no unwrapping needed)
- getTasksByProject (direct response)
- Error handling for all methods
- Regression tests ensuring description persistence
- Full field preservation when unwrapping responses

These tests verify that the backend's wrapped mutation responses
{ message: string, task: Task } are properly unwrapped to return
just the Task object to consumers.

* fix: add explicit event propagation stopping in ProjectCard

Added e.stopPropagation() at the ProjectCard level when passing
handlers to ProjectCardActions for pin and delete operations.

This provides defense in depth even though ProjectCardActions
already stops propagation internally. Ensures clicking action
buttons never triggers card selection.

* refactor: consolidate error handling into shared module

- Create shared/errors.ts with APIServiceError, ValidationError, MCPToolError
- Move error classes and utilities from projects/shared/api to shared location
- Update all imports to use shared error module
- Fix cross-feature dependencies (knowledge no longer depends on projects)
- Apply biome formatting to all modified files

This establishes a clean architecture where common errors are properly
located in the shared module, eliminating feature coupling.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>

* test: improve test isolation and clean up assertions

- Preserve and restore global AbortSignal and fetch to prevent test pollution
- Rename test suite from "Simplified API Client (Option 3)" to "apiWithEtag"
- Optimize duplicate assertions by capturing promises once
- Use toThrowError with specific error instances for better assertions

This ensures tests don't affect each other and improves test maintainability.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>

* refactor: Remove unused callAPI function and document 304 handling approach

- Delete unused callAPI function from projects/shared/api.ts (56 lines of dead code)
- Keep only the formatRelativeTime utility that's actively used
- Add comprehensive documentation explaining why we don't handle 304s explicitly
- Document that browser handles ETags/304s transparently and we use TanStack Query for cache control
- Update apiWithEtag.ts header to clarify the simplification strategy

This follows our beta principle of removing dead code immediately and maintains our simplified approach to HTTP caching where the browser handles 304s automatically.

* docs: Fix comment drift and clarify ETag/304 handling documentation

- Update header comment to be more technically accurate about Fetch API behavior
- Clarify that fetch (not browser generically) returns cached responses for 304s
- Explicitly document that we don't add If-None-Match headers
- Add note about browser's automatic ETag revalidation

These documentation updates prevent confusion about our simplified HTTP caching approach.

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-09-17 16:45:23 +03:00
Rasmus Widing
c6696ac3d7 docs: Update TaskPriorityComponent docstring to reflect server-backed implementation
2025-09-17 13:46:39 +03:00
DIY Smart Code
c45842f0bb feat: decouple task priority from task order (#652)
* feat: decouple task priority from task order

This implements a dedicated priority system that operates independently
from the existing task_order system, allowing users to set task priority
without affecting visual drag-and-drop positioning.

## Changes Made

### Database
- Add priority column to archon_tasks table with enum type (critical, high, medium, low)
- Create database migration with safe enum handling and data backfill
- Add proper indexing for performance

### Backend
- Update UpdateTaskRequest to include priority field
- Add priority validation in TaskService with enum checking
- Include priority field in task list responses and ETag generation
- Fix cache invalidation for priority updates

### Frontend
- Update TaskPriority type from "urgent" to "critical" for consistency
- Add changePriority method to useTaskActions hook
- Update TaskCard to use direct priority field instead of task_order conversion
- Update TaskEditModal priority form to use direct priority values
- Fix TaskPriorityComponent to use correct priority enum values
- Update buildTaskUpdates to include priority field changes
- Add priority field to Task interface as required field
- Update test fixtures to include priority field

## Key Features
- Users can change task priority without affecting drag-and-drop order
- Users can drag tasks to reorder without changing priority level
- Priority persists correctly in database with dedicated column
- All existing priority functionality continues working identically
- Cache invalidation works properly for priority changes
- Both TaskCard priority button and TaskEditModal priority work

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* feat: add priority column to complete_setup.sql for fresh installations

- Add task_priority enum type (low, medium, high, critical)
- Add priority column to archon_tasks table with default 'medium'
- Add index for priority column performance
- Add documentation comment for priority field

This ensures fresh installations include the priority system without
needing to run the separate migration.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: include priority field in task creation payload

When creating new tasks via TaskEditModal, the buildCreateRequest function
was not including the priority field, causing new tasks to fall back to
the database default ('medium') instead of respecting the user's selected
priority in the modal.

Added priority: localTask.priority || 'medium' to ensure the user's
chosen priority is sent to the API during task creation.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* feat: make priority migration safe and idempotent

Replaced destructive DROP TYPE CASCADE with safe migration patterns:

- Use DO blocks with EXCEPTION handling for enum and column creation
- Prevent conflicts with complete_setup.sql for fresh installations
- Enhanced backfill logic to preserve user-modified priorities
- Only update tasks that haven't been modified (updated_at = created_at)
- Add comprehensive error handling with informative notices
- Migration can now be run multiple times safely

This ensures the migration works for both existing installations
(incremental migration) and fresh installations (complete_setup.sql)
without data loss or conflicts.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* feat: enforce NOT NULL constraint on priority column

Data integrity improvements:

Migration (add_priority_column_to_tasks.sql):
- Add column as nullable first with DEFAULT 'medium'
- Update any NULL values to 'medium'
- Set NOT NULL constraint to enforce application invariants
- Safe handling for existing columns with proper constraint checking

Complete Setup (complete_setup.sql):
- Priority column now DEFAULT 'medium' NOT NULL for fresh installations
- Ensures consistency between migration and fresh install paths

Both paths now enforce priority field as required, matching the
frontend Task interface where priority is a required field.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* feat: add priority support to task creation API

Complete priority support for task creation:

API Routes (projects_api.py):
- Add priority field to CreateTaskRequest Pydantic model
- Pass request.priority to TaskService.create_task call

Task Service (task_service.py):
- Add priority parameter to create_task method signature
- Add priority validation using existing validate_priority method
- Include priority field in database INSERT task_data
- Include priority field in API response task object

This ensures that new tasks created via TaskEditModal respect the
user's selected priority instead of falling back to database default.

Validation ensures only valid priority values (low, medium, high, critical)
are accepted and stored in the database.
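
The validation rule can be mirrored in TypeScript as follows. The service in this commit is Python, so this is an illustration of the rule rather than its implementation; `isTaskPriority` and `resolvePriority` are hypothetical names, while the four allowed values and the 'medium' default come from the messages above.

```typescript
const TASK_PRIORITIES = ["low", "medium", "high", "critical"] as const;
type TaskPriority = (typeof TASK_PRIORITIES)[number];

// Type guard: true only for the four allowed priority values.
function isTaskPriority(value: unknown): value is TaskPriority {
  return typeof value === "string" && TASK_PRIORITIES.some((p) => p === value);
}

// Falls back to 'medium' when no priority is given, rejects anything else
// that is not a valid priority.
function resolvePriority(value: unknown): TaskPriority {
  if (value === undefined || value === null) {
    return "medium";
  }
  if (!isTaskPriority(value)) {
    throw new Error(`Invalid priority: ${String(value)}`);
  }
  return value;
}

const chosen = resolvePriority("critical");
const fallback = resolvePriority(undefined);
```

Note that the old frontend value "urgent" is rejected under this rule, consistent with the rename to "critical" described earlier in this commit.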

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* feat: implement clean slate priority migration (no backward compatibility)

Remove all task_order to priority mapping logic for true decoupling:

- All existing tasks get 'medium' priority (clean slate)
- No complex CASE logic or task_order relationships
- Users explicitly set priorities as needed after migration
- Truly independent priority and visual ordering systems
- Simpler, safer migration with no coupling logic

This approach prioritizes clean architecture over preserving
implied user intentions from the old coupled system.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* refactor: rename TaskPriority.tsx to TaskPriorityComponent.tsx for consistency

- Renamed file to match the exported component name
- Updated import in index.ts barrel export
- Maintains consistency with other component naming patterns

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Rasmus Widing <rasmus.widing@gmail.com>
2025-09-17 13:44:25 +03:00
DIY Smart Code
9f2d70ae0e Fix Issue #362: Provider-agnostic error handling for all LLM providers (#650)
* feat: Provider-agnostic error handling for Issue #362

Implements generic error handling that works for OpenAI, Google AI,
Anthropic, and other LLM providers to prevent silent failures.

Essential files only:
1. Provider error adapters (new) - handles any LLM provider
2. Backend API key validation - detects invalid keys before operations
3. Frontend error handler - provider-aware error messages
4. Updated hooks - uses generic error handling

Core functionality:
- Validates API keys before expensive operations (crawl, upload, refresh)
- Shows clear provider-specific error messages
- Works with OpenAI: 'Please verify your OpenAI API key in Settings'
- Works with Google: 'Please verify your Google API key in Settings'
- Prevents 90-minute debugging sessions from Issue #362

No unnecessary changes - only essential error handling logic.

Fixes #362

* fix: Enhance API key validation with detailed logging and error handling

- Add comprehensive logging to trace validation flow
- Ensure validation actually blocks operations on authentication failures
- Improve error detection to catch wrapped OpenAI errors
- Fail fast on any validation errors to prevent wasted operations

This should ensure invalid API keys are caught before crawl starts,
not during embedding processing after documents are crawled.

* fix: Simplify API key validation to always fail on exceptions

- Remove complex provider adapter imports that cause module issues
- Simplified validation that fails fast on any embedding creation error
- Enhanced logging to trace exactly what's happening
- Always block operations when API key validation fails

This ensures invalid API keys are caught immediately before
crawl operations start, preventing silent failures.

* fix: Add API key validation to refresh and upload endpoints

The validation was only added to new crawl endpoint but missing from:
- Knowledge item refresh endpoint (/knowledge-items/{source_id}/refresh)
- Document upload endpoint (/documents/upload)

Now all three endpoints that create embeddings will validate API keys
before starting operations, preventing silent failures on refresh/upload.

* security: Implement core security fixes from CodeRabbit review

Enhanced sanitization and provider detection based on CodeRabbit feedback:

- Comprehensive regex patterns for all provider API keys
  - OpenAI: sk-[a-zA-Z0-9]{48} with case-insensitive matching
  - Google AI: AIza[a-zA-Z0-9_-]{35} with flexible matching
  - Anthropic: sk-ant-[a-zA-Z0-9_-]{10,} with variable length

- Enhanced provider detection with multiple patterns
  - Case-insensitive keyword matching (openai, google, anthropic)
  - Regex-based API key detection for reliable identification
  - Additional keywords (gpt, claude, vertex, googleapis)

- Improved sanitization patterns
  - Provider-specific URL sanitization (openai.com, googleapis.com, anthropic.com)
  - Organization and project ID redaction
  - OAuth token and bearer token sanitization
  - Sensitive keyword detection and generic fallback

- Sanitized error logging
  - All error messages sanitized before logging
  - Prevents sensitive data exposure in backend logs
  - Maintains debugging capability with redacted information
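
The key-redaction part can be sketched with the three patterns listed above. This is a TypeScript illustration, not the backend code; `redactApiKeys` and the `[REDACTED]` placeholder are assumptions, and only the regular expressions are taken from this message.

```typescript
// Patterns as listed in the commit, with global and case-insensitive flags.
const API_KEY_PATTERNS: readonly RegExp[] = [
  /sk-ant-[a-zA-Z0-9_-]{10,}/gi, // Anthropic
  /sk-[a-zA-Z0-9]{48}/gi, // OpenAI
  /AIza[a-zA-Z0-9_-]{35}/gi, // Google AI
];

// Hypothetical helper: replaces anything that looks like a provider API key.
function redactApiKeys(message: string): string {
  let sanitized = message;
  for (const pattern of API_KEY_PATTERNS) {
    sanitized = sanitized.replace(pattern, "[REDACTED]");
  }
  return sanitized;
}

const fakeOpenAiKey = "sk-" + "a".repeat(48);
const sanitizedMessage = redactApiKeys(`Invalid key ${fakeOpenAiKey}`);
```

The OpenAI pattern assumes a fixed key length; a later commit in this pull request explicitly moves away from fixed-length assumptions, so treat these patterns as a snapshot of this step only.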

Core security improvements while maintaining simplicity for beta deployment.

* fix: Replace ad-hoc error sanitization with centralized ProviderErrorFactory

- Remove local _sanitize_provider_error implementation with inline regex patterns
- Add ProviderErrorFactory import from embeddings.provider_error_adapters
- Update _validate_provider_api_key calls to pass correct active embedding provider
- Replace sanitization call with ProviderErrorFactory.sanitize_provider_error()
- Eliminate duplicate logic and fixed-length key assumptions
- Ensure provider-specific, configurable sanitization patterns are used consistently

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* chore: Remove accidentally committed PRP file

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: address code review feedback

- Add barrel export for providerErrorHandler in utils/index.ts
- Change TypeScript typing from 'any' to 'unknown' for strict type safety

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Rasmus Widing <rasmus.widing@gmail.com>
2025-09-17 13:13:41 +03:00
leex279
b2ec7df666 Fix list semantics and increase aurora padding
- Wrap ProjectCard components in <li> elements for proper ul > li structure
- Improve accessibility by fixing list semantics
- Increase left/right padding from pl-3/pr-3 to pl-6 md:pl-8 / pr-6 md:pr-8
- Ensures aurora effects (-inset-[100px] + blur-3xl) and shadows (15-20px) have adequate clearance
- Responsive padding: 24px mobile, 32px desktop for optimal glow visibility

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-16 14:22:56 +03:00
leex279
8a5f676668 Remove unnecessary comment per feedback
2025-09-16 14:22:56 +03:00
leex279
59bd0aed8d Fix project card left margin - prevent glow effect clipping
- Add pl-3 to flex container to prevent first card's left glow/shadow clipping
- Add pr-3 to container for symmetry and prevent right glow clipping during scroll
- Glow effects (shadow-[0_0_15px_rgba(168,85,247,0.4)] and blur-3xl) now have proper clearance space
- No breaking changes to spacing or layout behavior
- Maintains responsive behavior across all viewport sizes

Fixes #655

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-16 14:22:56 +03:00
leex279
e3a051f0b8 fix: Auto-save tags when using Enter key
- Add handleAddTagAndSave function that combines tag addition with immediate persistence
- Update handleKeyDown to auto-save when Enter is pressed with tag input
- Prevent tags from being lost when user cancels after using Enter
- Maintain existing behavior for empty input (save current state)
- Improve user experience with immediate persistence on Enter

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-16 14:22:18 +03:00
leex279
5a5f763795 refactor: Improve optimistic updates with proper TypeScript types
- Replace any types with proper KnowledgeItemsResponse typing
- Add support for title field updates in optimistic cache updates
- Ensure metadata synchronization with top-level fields (tags, knowledge_type)
- Add type guards for all update fields (string, array validation)
- Initialize metadata if missing to prevent undefined errors
- Maintain immutability with proper object spreading
- Protect tag editing state from external prop updates during editing

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-16 14:22:18 +03:00
leex279
09bb36f9b6 feat: Add optimistic updates and improve component reliability
- Add optimistic updates for knowledge_type changes in useUpdateKnowledgeItem
- Update both detail and summary caches to prevent visual reversion
- Refactor KnowledgeCardType to use controlled Radix Select component
- Remove manual click-outside detection in favor of Radix onOpenChange
- Protect tag editing state from being overwritten by external updates
- Ensure user input is preserved during active editing sessions

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-16 14:22:18 +03:00
leex279
53d4bf8804 polish: Simplify tag tooltip text
Remove verbose 'or hover to delete' text from tag tooltips.
Tooltips now show clean 'Click to edit "tagname"' message.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-16 14:22:18 +03:00