# Archon V1 - Basic Pydantic AI Agent to Build other Pydantic AI Agents

This is the first iteration of the Archon project: no LangGraph, just a single AI agent, to keep things very simple and introductory.

An intelligent documentation crawler and RAG (Retrieval-Augmented Generation) agent built with Pydantic AI and Supabase, capable of building other Pydantic AI agents. The agent crawls the Pydantic AI documentation, stores the content in a vector database, and provides Pydantic AI agent code by retrieving and analyzing relevant documentation chunks.

## Features

- Pydantic AI documentation crawling and chunking
- Vector database storage with Supabase
- Semantic search using OpenAI embeddings
- RAG-based question answering
- Support for code block preservation
- Streamlit UI for interactive querying

## Prerequisites

- Python 3.11+
- Supabase account and database
- OpenAI API key
- Streamlit (for web interface)

## Installation

1. Clone the repository:

```bash
git clone https://github.com/coleam00/archon.git
cd archon/iterations/v1-single-agent
```

2. Install dependencies (recommended to use a Python virtual environment):

```bash
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```

3. Set up environment variables:
   - Rename `.env.example` to `.env`
   - Edit `.env` with your API keys and preferences:

```env
OPENAI_API_KEY=your_openai_api_key
SUPABASE_URL=your_supabase_url
SUPABASE_SERVICE_KEY=your_supabase_service_key
LLM_MODEL=gpt-4o-mini # or your preferred OpenAI model
```
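
For reference, here is a minimal sketch of how these variables can be read in Python with `python-dotenv` (assuming it is installed; the project's own loading code may differ):

```python
import os

from dotenv import load_dotenv

load_dotenv()  # reads the .env file from the working directory

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
SUPABASE_URL = os.getenv("SUPABASE_URL")
SUPABASE_SERVICE_KEY = os.getenv("SUPABASE_SERVICE_KEY")
LLM_MODEL = os.getenv("LLM_MODEL", "gpt-4o-mini")  # fallback model is an assumption
```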

## Usage

### Database Setup

Execute the SQL commands in `site_pages.sql` to:

1. Create the necessary tables
2. Enable vector similarity search
3. Set up Row Level Security policies

In Supabase, do this by going to the "SQL Editor" tab, pasting the SQL into the editor, and clicking "Run".
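
To sanity-check the setup afterwards, one option is a quick query with the `supabase` Python client; this is an illustrative sketch that assumes the environment variables from the previous step are set:

```python
import os

from dotenv import load_dotenv
from supabase import create_client

load_dotenv()
supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_SERVICE_KEY"])

# An empty list (no rows yet) means the table exists and is reachable.
result = supabase.table("site_pages").select("id").limit(1).execute()
print(result.data)
```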

### Crawl Documentation

To crawl and store documentation in the vector database:

```bash
python crawl_pydantic_ai_docs.py
```

This will:

1. Fetch URLs from the documentation sitemap
2. Crawl each page and split into chunks
3. Generate embeddings and store in Supabase
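
At a high level, that pipeline looks roughly like the sketch below. This is illustrative only, not the script's actual code: the sitemap URL, embedding model, and title/summary handling are assumptions, and the real crawler does considerably more (chunking, retries, richer metadata):

```python
import os
import xml.etree.ElementTree as ET

import requests
from dotenv import load_dotenv
from openai import OpenAI
from supabase import create_client

load_dotenv()
openai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_SERVICE_KEY"])

SITEMAP_URL = "https://ai.pydantic.dev/sitemap.xml"  # assumed sitemap location


def get_urls_from_sitemap(sitemap_url: str) -> list[str]:
    """Fetch the sitemap and return every <loc> entry."""
    root = ET.fromstring(requests.get(sitemap_url, timeout=30).content)
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    return [loc.text for loc in root.findall(".//sm:loc", ns)]


def embed(text: str) -> list[float]:
    """Return a 1536-dimension embedding, matching VECTOR(1536) in site_pages."""
    response = openai_client.embeddings.create(model="text-embedding-3-small", input=text)
    return response.data[0].embedding


def store_chunk(url: str, chunk_number: int, content: str) -> None:
    """Insert one chunk plus its embedding into the site_pages table."""
    supabase.table("site_pages").insert({
        "url": url,
        "chunk_number": chunk_number,
        "title": url,  # placeholder; the real script derives a title and summary
        "summary": "",
        "content": content,
        "metadata": {"source": "pydantic_ai_docs"},
        "embedding": embed(content),
    }).execute()
```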

### Streamlit Web Interface

For an interactive web interface to query the documentation:

```bash
streamlit run streamlit_ui.py
```

The interface will be available at `http://localhost:8501`

## Configuration

### Database Schema

The Supabase database uses the following schema:

```sql
CREATE TABLE site_pages (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    url TEXT,
    chunk_number INTEGER,
    title TEXT,
    summary TEXT,
    content TEXT,
    metadata JSONB,
    embedding VECTOR(1536)
);
```
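
At query time, the agent embeds the user's question and compares it against the `embedding` column. `site_pages.sql` also enables the vector similarity search used for this; the sketch below assumes the search is exposed as an RPC function called `match_site_pages` (check the SQL file for the exact name and parameters):

```python
import os

from dotenv import load_dotenv
from openai import OpenAI
from supabase import create_client

load_dotenv()
openai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_SERVICE_KEY"])

question = "How do I register tools on a Pydantic AI agent?"
query_embedding = openai_client.embeddings.create(
    model="text-embedding-3-small", input=question
).data[0].embedding

# Function name and parameters below are assumptions; adjust to match site_pages.sql.
matches = supabase.rpc(
    "match_site_pages",
    {"query_embedding": query_embedding, "match_count": 5},
).execute()

for row in matches.data:
    print(row["url"], row["title"])
```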

### Chunking Configuration

You can configure chunking parameters in `crawl_pydantic_ai_docs.py`:

```python
chunk_size = 5000  # Characters per chunk
```

The chunker intelligently preserves:

- Code blocks
- Paragraph boundaries
- Sentence boundaries
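
A simplified sketch of this kind of boundary-aware chunking (the actual heuristics in `crawl_pydantic_ai_docs.py` may differ):

```python
def chunk_text(text: str, chunk_size: int = 5000) -> list[str]:
    """Split text into roughly chunk_size-character chunks, preferring to break
    at a code-fence boundary, then a paragraph break, then a sentence end."""
    chunks: list[str] = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            window = text[start:end]
            fence = window.rfind("```")       # last code-fence marker in the window
            paragraph = window.rfind("\n\n")  # last paragraph break
            sentence = window.rfind(". ")     # last sentence end
            # Only break early if the boundary is not too close to the chunk start.
            if fence > chunk_size * 0.3:
                end = start + fence
            elif paragraph > chunk_size * 0.3:
                end = start + paragraph
            elif sentence > chunk_size * 0.3:
                end = start + sentence + 1
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        start = end
    return chunks
```

For example, `chunk_text(page_markdown)` on a long documentation page returns chunks that prefer to break at code-fence and paragraph boundaries rather than mid-sentence.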

## Project Structure

- `crawl_pydantic_ai_docs.py`: Documentation crawler and processor
- `pydantic_ai_expert.py`: RAG agent implementation
- `streamlit_ui.py`: Web interface
- `site_pages.sql`: Database setup commands
- `requirements.txt`: Project dependencies

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.