mirror of
https://github.com/coleam00/Archon.git
synced 2025-12-24 02:39:17 -05:00
Delete PLAYWRIGHT_FIX_ANALYSIS.md
@@ -1,114 +0,0 @@
# Playwright Browser Installation Issue - Root Cause Analysis

## The Problem

When attempting to crawl websites, Playwright fails with the error:

```
playwright._impl._errors.Error: BrowserType.launch: Executable doesn't exist at /root/.cache/ms-playwright/chromium-1187/chrome-linux/chrome
```

## Root Cause

The issue was introduced during the migration from `pip` to `uv` for package management (commit 9f22659).

### What Changed

**Old Dockerfile (pip-based):**
```dockerfile
# Copy Python packages from builder
COPY --from=builder /root/.local /root/.local

# Install Playwright browsers
ENV PATH=/root/.local/bin:$PATH
RUN playwright install chromium
```

**New Dockerfile (uv-based):**
```dockerfile
# Copy the virtual environment from builder
COPY --from=builder /venv /venv

# Install Playwright browsers
ENV PATH=/venv/bin:$PATH
RUN playwright install chromium
```

### Why It Broke

1. **Default Browser Location**: When `PLAYWRIGHT_BROWSERS_PATH` is not explicitly set, Playwright falls back to its per-user default cache, which for `root` on Linux is `/root/.cache/ms-playwright/`.

2. **Build vs Runtime Discrepancy**:
   - At **build time**: Playwright installs browsers to `/root/.cache/ms-playwright/`
   - At **runtime**: Playwright looks for browsers in the same default location but does not find them there, most likely because of Docker layer caching or differences between the build and runtime environments

3. **Missing Environment Variable**: `PLAYWRIGHT_BROWSERS_PATH` was never set with a persistent `ENV` instruction. It was only used within the `RUN` command, and a variable set there does not persist to runtime.

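
The lookup described in the list above can be sketched in a few lines of Python. This is a simplified model for illustration, not Playwright's actual implementation: the function name `resolve_browsers_dir` and its parameters are hypothetical, it covers only the Linux default, and special values of the variable are not modeled.

```python
from pathlib import Path
from typing import Mapping


def resolve_browsers_dir(env: Mapping[str, str], home: str) -> Path:
    """Model where Playwright is expected to look for browsers on Linux.

    Hypothetical helper for illustration only: if PLAYWRIGHT_BROWSERS_PATH
    is set and non-empty it wins; otherwise the per-user cache directory
    <home>/.cache/ms-playwright is used.
    """
    override = env.get("PLAYWRIGHT_BROWSERS_PATH")
    if override:
        return Path(override)
    return Path(home) / ".cache" / "ms-playwright"


# Without the variable, the result depends on the home directory in effect.
print(resolve_browsers_dir({}, "/root"))

# With the variable set, the home directory no longer matters.
print(resolve_browsers_dir({"PLAYWRIGHT_BROWSERS_PATH": "/ms-playwright"}, "/root"))
```

The point of the sketch is that the unset case ties the result to whatever home directory and environment are in effect, while the explicit case does not.
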
### Why It Worked Before

The old pip-based system happened to work because:
- The user home directory (`/root`) was consistent between build and runtime
- The `.cache` directory in `/root` was implicitly included in the Docker layers
- There were fewer environmental differences between build and runtime contexts

However, this was **fragile** and relied on Docker's implicit behavior rather than explicit configuration.

## The Fix

Set `PLAYWRIGHT_BROWSERS_PATH` as a persistent environment variable:

```dockerfile
# Install Playwright browsers
ENV PATH=/venv/bin:$PATH
ENV PLAYWRIGHT_BROWSERS_PATH=/ms-playwright
RUN playwright install chromium
```

### Why This Works

1. **Explicit Location**: `/ms-playwright` is clearly defined and consistent
2. **Build-time**: Playwright installs browsers to `/ms-playwright`
3. **Runtime**: Playwright looks for browsers in `/ms-playwright` (same location!)
4. **Persistence**: The ENV variable persists into the running container

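
One way to confirm that build time and runtime agree is to list the Chromium builds present in the configured directory from inside the running container. The helper below is a minimal sketch: `find_chromium_builds` is a hypothetical name, and the `chromium-*` pattern is an assumption based on the `chromium-1187` directory in the error message above.

```python
import os
from pathlib import Path
from typing import List


def find_chromium_builds(browsers_dir: str) -> List[str]:
    """Return the names of chromium-<revision> directories in browsers_dir.

    An empty list means there is no Chromium build to launch from that
    location (or the directory does not exist at all).
    """
    root = Path(browsers_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.glob("chromium-*") if p.is_dir())


# Read the same variable the Dockerfile sets; /ms-playwright is the
# fallback used here only because it is the value chosen in the fix.
configured = os.environ.get("PLAYWRIGHT_BROWSERS_PATH", "/ms-playwright")
print(configured, find_chromium_builds(configured))
```

If the printed list is empty at runtime, the browsers were installed somewhere other than the path the variable points to.
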
### Why We Don't Use `--with-deps`

The Dockerfile already manually installs all required Playwright system dependencies (lines 26-49). Using `--with-deps` would attempt to reinstall these packages, which can:
- Cause package conflicts
- Fail on certain platforms (especially Windows/WSL)
- Significantly increase build time
- Lead to build failures

## Affected Branches

- ✅ **main branch**: Fixed (commit pending)
- ✅ **feature/advanced-crawl-domain-filtering**: Fixed

## Testing

To verify the fix works:
```bash
# Rebuild and restart the server
docker compose up --build -d archon-server

# Try crawling any website
# It should now work without browser errors
```

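
For a check that does not depend on crawling a real site, a small smoke test can try to launch Chromium directly. The sketch below assumes the `playwright` Python package is installed in the container; the function name `chromium_launches` is hypothetical and not part of the project.

```python
def chromium_launches() -> bool:
    """Try to start headless Chromium once and report whether it worked.

    Playwright is imported inside the function so that a missing package
    or a missing browser executable is reported as a failure instead of
    raising at import time.
    """
    try:
        from playwright.sync_api import sync_playwright

        with sync_playwright() as p:
            browser = p.chromium.launch(headless=True)
            browser.close()
        return True
    except Exception as exc:  # e.g. "Executable doesn't exist at ..."
        print(f"Chromium failed to launch: {exc}")
        return False
```

Calling `chromium_launches()` inside the rebuilt container should return `True` once the browsers are installed where Playwright looks for them. A `False` result prints the underlying error, which would include the missing executable path if the original problem persists.
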
## Lessons Learned

1. **Always set environment variables explicitly** - Don't rely on defaults
2. **ENV vs ARG** - `ENV` variables persist to runtime; `ARG` values exist only at build time
3. **Test after infrastructure changes** - Package manager migrations can have subtle side effects
4. **Document non-obvious requirements** - Playwright's browser path requirement should be explicit

## Related Files

- `python/Dockerfile.server` - Main fix location
- `python/src/server/services/crawling/crawling_service.py` - Crawling service that uses Playwright
- `python/pyproject.toml` - Dependencies including crawl4ai (which uses Playwright)

## References

- Playwright documentation: https://playwright.dev/python/docs/browsers
- Docker ENV vs ARG: https://docs.docker.com/engine/reference/builder/#env
- crawl4ai library: https://github.com/unclecode/crawl4ai