refactor: Remove Socket.IO and implement HTTP polling architecture

Complete removal of Socket.IO/WebSocket dependencies in favor of simple HTTP polling:

Frontend changes:
- Remove all WebSocket/Socket.IO references from KnowledgeBasePage
- Implement useCrawlProgressPolling hook for progress tracking
- Fix polling hook to prevent ERR_INSUFFICIENT_RESOURCES errors
- Add proper cleanup and state management for completed crawls
- Persist and restore active crawl progress across page refreshes
- Fix agent chat service to handle disabled agents gracefully

Backend changes:
- Remove python-socketio from requirements
- Convert ProgressTracker to in-memory state management
- Add /api/crawl-progress/{id} endpoint for polling
- Initialize ProgressTracker immediately when operations start
- Remove all Socket.IO event handlers and clean up commented-out code
- Simplify agent_chat_api to basic REST endpoints
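
The in-memory tracker and polling endpoint described in the backend changes can be sketched as follows. This is a minimal illustration, not the code from this commit: the class name `InMemoryProgressTracker`, the helper `get_crawl_progress`, and every field name are hypothetical, and the web framework wiring is left out. The one detail taken from the diff's test comment is that an unknown ID yields a `not_found` status rather than an error.

```python
import threading
import time
from typing import Any, Dict, Optional


class InMemoryProgressTracker:
    """Hypothetical stand-in for ProgressTracker: state lives in a process-local dict."""

    # Shared across all instances; guarded by a lock because crawl workers
    # and request handlers may touch it from different threads.
    _states: Dict[str, Dict[str, Any]] = {}
    _lock = threading.Lock()

    def __init__(self, progress_id: str, operation: str = "crawl") -> None:
        self.progress_id = progress_id
        # Register state immediately, so a poll that arrives right after the
        # operation starts already finds an entry.
        with self._lock:
            self._states[progress_id] = {
                "progressId": progress_id,  # assumed field names
                "type": operation,
                "status": "starting",
                "percentage": 0,
                "updatedAt": time.time(),
            }

    def update(self, status: str, percentage: int, **extra: Any) -> None:
        """Record a new status, clamping the percentage to 0..100."""
        with self._lock:
            state = self._states[self.progress_id]
            state.update(extra)
            state["status"] = status
            state["percentage"] = max(0, min(100, percentage))
            state["updatedAt"] = time.time()

    @classmethod
    def get(cls, progress_id: str) -> Optional[Dict[str, Any]]:
        """Return a copy of the stored state, or None if the ID is unknown."""
        with cls._lock:
            state = cls._states.get(progress_id)
            return dict(state) if state is not None else None

    @classmethod
    def clear(cls, progress_id: str) -> None:
        """Drop state for a finished operation."""
        with cls._lock:
            cls._states.pop(progress_id, None)


def get_crawl_progress(progress_id: str) -> Dict[str, Any]:
    """What a polling endpoint handler might return (framework wiring omitted)."""
    state = InMemoryProgressTracker.get(progress_id)
    if state is None:
        return {"progressId": progress_id, "status": "not_found"}
    return state
```

Because the state is held in process memory, it is lost on restart and is not shared between worker processes; that trade-off is inherent to this sketch.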

Bug fixes:
- Fix race condition where progress data wasn't available for polling
- Fix memory leaks from recreating polling callbacks
- Fix crawl progress URL mismatch between frontend and backend
- Add proper error filtering for expected 404s during initialization
- Stop polling when crawl operations complete
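
The client-side behaviour these fixes describe (tolerate 404s while progress is still initializing, then stop once the crawl finishes) can be sketched as a sequential loop. The real hook is frontend code; this Python version only illustrates the control flow. The function `poll_until_done`, the `fetch` callable, and the terminal status names are assumptions, not identifiers from this commit.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

# Assumed names for states after which polling should stop.
TERMINAL_STATUSES = {"completed", "error", "failed", "cancelled"}


def poll_until_done(
    fetch: Callable[[], Tuple[int, Dict[str, Any]]],
    interval: float = 1.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Dict[str, Any]]:
    """Poll `fetch` until a terminal status is seen or attempts run out.

    `fetch` is a hypothetical callable returning (http_status, json_body).
    Requests are issued strictly one after another, so at most one is in
    flight at any time.
    """
    last: Optional[Dict[str, Any]] = None
    for _ in range(max_attempts):
        status_code, body = fetch()
        if status_code == 404 or body.get("status") == "not_found":
            # Expected while the tracker is still being set up: retry quietly.
            pass
        elif status_code != 200:
            raise RuntimeError(f"unexpected HTTP status {status_code}")
        else:
            last = body
            if body.get("status") in TERMINAL_STATUSES:
                return body  # stop polling as soon as the crawl is finished
        sleep(interval)
    return last  # gave up; report the last progress seen, if any
```

Issuing one request at a time with a fixed delay is one way to avoid piling up concurrent requests; whether the actual hook uses this approach is not stated in the commit message.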

This change simplifies the architecture and makes it more robust: there are no
long-lived WebSocket connections left to establish, monitor, or reconnect.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Rasmus Widing
2025-08-28 12:02:30 +03:00
parent af402c24f6
commit 970ba04f0d
16 changed files with 382 additions and 539 deletions

@@ -83,11 +83,12 @@ def test_search_knowledge(client):
     assert response.status_code in [200, 400, 404, 422, 500]
 
 
-def test_websocket_connection(client):
-    """Test WebSocket/Socket.IO endpoint exists."""
-    response = client.get("/socket.io/")
-    # Socket.IO returns specific status codes
-    assert response.status_code in [200, 400, 404]
+def test_polling_endpoint(client):
+    """Test polling endpoints exist for progress tracking."""
+    # Test crawl progress endpoint
+    response = client.get("/api/knowledge/crawl-progress/test-id")
+    # Should return 200 with not_found status or actual progress
+    assert response.status_code in [200, 404, 500]
 
 
 def test_authentication(client):