# Cluster Deployment Guide

MCPHub supports cluster deployment, allowing you to run multiple nodes that work together as a unified system. This enables:

- **High Availability**: Distribute MCP servers across multiple nodes for redundancy
- **Load Distribution**: Balance requests across multiple replicas of the same MCP server
- **Sticky Sessions**: Ensure client sessions are routed to the same node consistently
- **Centralized Management**: One coordinator manages the entire cluster

## Architecture

An MCPHub cluster has three operating modes:

1. **Standalone Mode** (default): Single-node operation, no cluster features
2. **Coordinator Mode**: Central node that manages the cluster, routes requests, and maintains session affinity
3. **Node Mode**: Worker nodes that register with the coordinator and run MCP servers

```
┌─────────────────────────────────────────┐
│            Coordinator Node             │
│  - Manages cluster state                │
│  - Routes client requests               │
│  - Maintains session affinity           │
│  - Health monitoring                    │
└───────────┬─────────────────────────────┘
            │
    ┌───────┴───────────────────┐
    │                           │
┌───▼────────┐         ┌────────▼────┐
│   Node 1   │         │   Node 2    │
│  - MCP A   │         │  - MCP A    │
│  - MCP B   │         │  - MCP C    │
└────────────┘         └─────────────┘
```

## Configuration

### Coordinator Configuration

Create or update `mcp_settings.json` on the coordinator node. The coordinator can optionally run MCP servers of its own, like the `example` entry below:

```json
{
  "mcpServers": {
    "example": {
      "command": "npx",
      "args": ["-y", "example-mcp-server"]
    }
  },
  "systemConfig": {
    "cluster": {
      "enabled": true,
      "mode": "coordinator",
      "coordinator": {
        "nodeTimeout": 15000,
        "cleanupInterval": 30000,
        "stickySessionTimeout": 3600000
      },
      "stickySession": {
        "enabled": true,
        "strategy": "consistent-hash",
        "cookieName": "MCPHUB_NODE",
        "headerName": "X-MCPHub-Node"
      }
    }
  }
}
```

**Configuration Options:**

- `nodeTimeout`: Time (ms) before marking a node as unhealthy (default: 15000)
- `cleanupInterval`: Interval (ms) for cleaning up inactive nodes (default: 30000)
- `stickySessionTimeout`: Session affinity timeout (ms) (default: 3600000, i.e. 1 hour)
- `stickySession.enabled`: Enable sticky session routing (default: true)
- `stickySession.strategy`: Session affinity strategy (see the sketch after this list):
  - `consistent-hash`: Hash-based routing (default)
  - `cookie`: Cookie-based routing
  - `header`: Header-based routing
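
To make hash-based affinity concrete, here is a minimal TypeScript sketch. This is an illustration, not MCPHub's actual routing code: it uses plain modulo hashing rather than a full consistent-hash ring (so adding or removing nodes remaps more sessions than true consistent hashing would), and the node IDs are hypothetical:

```typescript
import { createHash } from "node:crypto";

// Hash the session ID onto the node list: the same session ID always
// resolves to the same node as long as the node list is unchanged.
function pickNode(sessionId: string, nodeIds: string[]): string {
  const digest = createHash("sha256").update(sessionId).digest();
  const bucket = digest.readUInt32BE(0) % nodeIds.length; // first 4 bytes as uint
  return nodeIds[bucket];
}

// Same input, same node -- both calls print the same node ID.
const nodes = ["node-1", "node-2", "node-3"];
console.log(pickNode("session-abc123", nodes));
console.log(pickNode("session-abc123", nodes));
```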

### Node Configuration

Create or update `mcp_settings.json` on each worker node:

```json
{
  "mcpServers": {
    "amap": {
      "command": "npx",
      "args": ["-y", "@amap/amap-maps-mcp-server"]
    },
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest", "--headless"]
    }
  },
  "systemConfig": {
    "cluster": {
      "enabled": true,
      "mode": "node",
      "node": {
        "id": "node-1",
        "name": "Worker Node 1",
        "coordinatorUrl": "http://coordinator:3000",
        "heartbeatInterval": 5000,
        "registerOnStartup": true
      }
    }
  }
}
```

**Configuration Options:**

- `node.id`: Unique node identifier (auto-generated if not provided)
- `node.name`: Human-readable node name (defaults to the hostname)
- `node.coordinatorUrl`: URL of the coordinator node (required)
- `node.heartbeatInterval`: Heartbeat interval (ms) (default: 5000; see the sketch after this list)
- `node.registerOnStartup`: Auto-register on startup (default: true)
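
Conceptually, each worker reports to the coordinator every `heartbeatInterval` milliseconds; if heartbeats stop for longer than the coordinator's `nodeTimeout`, the node is marked unhealthy. A TypeScript sketch of that loop follows; the endpoint path and payload here are assumptions for illustration, not a documented MCPHub API:

```typescript
// Node-side heartbeat loop implied by `heartbeatInterval`.
const coordinatorUrl = "http://coordinator:3000"; // node.coordinatorUrl
const heartbeatInterval = 5000; // node.heartbeatInterval (ms)

async function sendHeartbeat(): Promise<void> {
  try {
    // Hypothetical endpoint, shown only to illustrate the mechanism.
    await fetch(`${coordinatorUrl}/api/cluster/heartbeat`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ nodeId: "node-1", timestamp: Date.now() }),
    });
  } catch (err) {
    // If heartbeats keep failing past the coordinator's nodeTimeout
    // (15s by default), this node is marked unhealthy and dropped from routing.
    console.error("heartbeat failed:", err);
  }
}

setInterval(sendHeartbeat, heartbeatInterval);
```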

## Deployment Scenarios

### Scenario 1: Docker Compose

Create a `docker-compose.yml`:

```yaml
version: '3.8'

services:
  coordinator:
    image: samanhappy/mcphub:latest
    ports:
      - "3000:3000"
    volumes:
      - ./coordinator-config.json:/app/mcp_settings.json
      - coordinator-data:/app/data
    environment:
      - NODE_ENV=production

  node1:
    image: samanhappy/mcphub:latest
    volumes:
      - ./node1-config.json:/app/mcp_settings.json
      - node1-data:/app/data
    environment:
      - NODE_ENV=production
    depends_on:
      - coordinator

  node2:
    image: samanhappy/mcphub:latest
    volumes:
      - ./node2-config.json:/app/mcp_settings.json
      - node2-data:/app/data
    environment:
      - NODE_ENV=production
    depends_on:
      - coordinator

volumes:
  coordinator-data:
  node1-data:
  node2-data:
```

Start the cluster:

```bash
docker-compose up -d
```
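
The compose file mounts `coordinator-config.json`, `node1-config.json`, and `node2-config.json` from the host. As one plausible example, `node1-config.json` could combine the node settings shown earlier with the compose service name as the coordinator host:

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest", "--headless"]
    }
  },
  "systemConfig": {
    "cluster": {
      "enabled": true,
      "mode": "node",
      "node": {
        "name": "Node 1",
        "coordinatorUrl": "http://coordinator:3000"
      }
    }
  }
}
```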

### Scenario 2: Kubernetes

Create Kubernetes manifests:

**Coordinator Deployment:**

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcphub-coordinator
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mcphub-coordinator
  template:
    metadata:
      labels:
        app: mcphub-coordinator
    spec:
      containers:
        - name: mcphub
          image: samanhappy/mcphub:latest
          ports:
            - containerPort: 3000
          volumeMounts:
            - name: config
              mountPath: /app/mcp_settings.json
              subPath: mcp_settings.json
      volumes:
        - name: config
          configMap:
            name: mcphub-coordinator-config
---
apiVersion: v1
kind: Service
metadata:
  name: mcphub-coordinator
spec:
  selector:
    app: mcphub-coordinator
  ports:
    - port: 3000
      targetPort: 3000
  type: LoadBalancer
```

**Worker Node Deployment:**

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcphub-node
spec:
  replicas: 3
  selector:
    matchLabels:
      app: mcphub-node
  template:
    metadata:
      labels:
        app: mcphub-node
    spec:
      containers:
        - name: mcphub
          image: samanhappy/mcphub:latest
          volumeMounts:
            - name: config
              mountPath: /app/mcp_settings.json
              subPath: mcp_settings.json
      volumes:
        - name: config
          configMap:
            name: mcphub-node-config
```

Apply the manifests:

```bash
kubectl apply -f coordinator.yaml
kubectl apply -f nodes.yaml
```
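
Both Deployments mount their settings from ConfigMaps (`mcphub-coordinator-config` and `mcphub-node-config`), which must exist before the pods start. A minimal worker ConfigMap might look like the sketch below; note that with `replicas: 3` you should omit `node.id` so each pod gets an auto-generated ID, and use the coordinator Service name as the host:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: mcphub-node-config
data:
  mcp_settings.json: |
    {
      "mcpServers": {
        "playwright": {
          "command": "npx",
          "args": ["@playwright/mcp@latest", "--headless"]
        }
      },
      "systemConfig": {
        "cluster": {
          "enabled": true,
          "mode": "node",
          "node": {
            "coordinatorUrl": "http://mcphub-coordinator:3000"
          }
        }
      }
    }
```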

### Scenario 3: Manual Deployment

**On Coordinator (192.168.1.100):**

```bash
# Install MCPHub
npm install -g @samanhappy/mcphub

# Configure as coordinator
cat > mcp_settings.json <<EOF
{
  "systemConfig": {
    "cluster": {
      "enabled": true,
      "mode": "coordinator"
    }
  }
}
EOF

# Start coordinator
PORT=3000 mcphub
```

**On Node 1 (192.168.1.101):**

```bash
# Install MCPHub
npm install -g @samanhappy/mcphub

# Configure as node
cat > mcp_settings.json <<EOF
{
  "mcpServers": {
    "server1": { "command": "..." }
  },
  "systemConfig": {
    "cluster": {
      "enabled": true,
      "mode": "node",
      "node": {
        "coordinatorUrl": "http://192.168.1.100:3000"
      }
    }
  }
}
EOF

# Start node
PORT=3001 mcphub
```

**On Node 2 (192.168.1.102):**

```bash
# Similar to Node 1, but with PORT=3002
```

## Usage

### Accessing the Cluster

Once the cluster is running, connect AI clients to the coordinator's endpoints:

```
http://coordinator:3000/mcp
http://coordinator:3000/sse
```

The coordinator will:

1. Route requests to the appropriate nodes based on session affinity
2. Load balance across multiple replicas of the same server
3. Automatically fail over to healthy nodes
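
For example, an MCP client configured via JSON might point at the coordinator like this (the exact shape depends on your client; `mcphub` is just a label for the entry):

```json
{
  "mcpServers": {
    "mcphub": {
      "url": "http://coordinator:3000/mcp"
    }
  }
}
```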

### Sticky Sessions

Sticky sessions ensure that a client's requests are routed to the same node throughout their session. This is important for:

- Maintaining conversation context
- Preserving temporary state
- Consistent tool execution

The default strategy is **consistent-hash**, which uses the session ID to determine the target node. Alternative strategies:

- **Cookie-based**: Uses the `MCPHUB_NODE` cookie
- **Header-based**: Uses the `X-MCPHub-Node` header (see the example below)
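
With the header-based strategy, a client (or a proxy in front of the coordinator) can pin a session explicitly. An illustrative request, assuming a registered node with ID `node-1` (a real MCP exchange would carry a full protocol payload; only the header is the point here):

```bash
# Pin this request to node-1 via the configured affinity header
curl -H "X-MCPHub-Node: node-1" http://coordinator:3000/mcp
```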

### Multiple Replicas

You can deploy the same MCP server on multiple nodes for:

- **Load balancing**: Distribute requests across replicas
- **High availability**: Fail over if one node goes down

Example configuration:

**Node 1:**

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}
```

**Node 2:**

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}
```

The coordinator will automatically load balance requests to `playwright` across both nodes.

## Management API

The coordinator exposes cluster management endpoints:

### Get Cluster Status

```bash
curl http://coordinator:3000/api/cluster/status
```

Response:

```json
{
  "success": true,
  "data": {
    "enabled": true,
    "mode": "coordinator",
    "nodeId": "coordinator",
    "stats": {
      "nodes": 3,
      "activeNodes": 3,
      "servers": 5,
      "sessions": 10
    }
  }
}
```
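
Because the response is plain JSON, it scripts well; for example, extracting the active node count with `jq`:

```bash
curl -s http://coordinator:3000/api/cluster/status | jq '.data.stats.activeNodes'
```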

### Get All Nodes

```bash
curl http://coordinator:3000/api/cluster/nodes
```

### Get Server Replicas

```bash
curl http://coordinator:3000/api/cluster/servers/playwright/replicas
```

### Get Session Affinity

```bash
curl http://coordinator:3000/api/cluster/sessions/{sessionId}
```

## Monitoring and Troubleshooting

### Check Node Health

Monitor coordinator logs for registration and heartbeat messages:

```
Node registered: Worker Node 1 (node-1) with 2 servers
```

If a node becomes unhealthy:

```
Marking node node-1 as unhealthy (last heartbeat: 2024-01-01T10:00:00.000Z)
```

### Verify Registration

Check if nodes are registered:

```bash
curl http://coordinator:3000/api/cluster/nodes?active=true
```

### Session Affinity Issues

If sessions aren't sticking to the same node:

1. Verify sticky sessions are enabled in the coordinator config
2. Check that session IDs are being passed correctly
3. Review coordinator logs for session affinity errors

### Network Connectivity

Ensure worker nodes can reach the coordinator:

```bash
# From a worker node
curl http://coordinator:3000/health
```

## Performance Considerations

### Coordinator Load

The coordinator handles:

- Request routing
- Node heartbeats
- Session tracking

For very large clusters (>50 nodes), consider:

- Increasing coordinator resources
- Tuning heartbeat intervals (see the example below)
- Using header-based sticky sessions (lower overhead)
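
For example, to halve heartbeat traffic while keeping the default 3:1 ratio between `nodeTimeout` and the heartbeat interval (so a single delayed heartbeat does not mark a node unhealthy), you might raise both values. The numbers are illustrative, and each fragment is merged into the respective side's full `mcp_settings.json`:

**On each node:**

```json
{
  "systemConfig": {
    "cluster": {
      "node": { "heartbeatInterval": 10000 }
    }
  }
}
```

**On the coordinator:**

```json
{
  "systemConfig": {
    "cluster": {
      "coordinator": { "nodeTimeout": 30000 }
    }
  }
}
```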

### Network Latency

Minimize latency between the coordinator and nodes:

- Deploy in the same datacenter/region
- Use low-latency networking
- Consider coordinator placement near clients

### Session Timeout

Balance session timeout against resource usage:

- Shorter timeout: less memory, more re-routing
- Longer timeout: better stickiness, more memory

The default is 1 hour; adjust it based on your use case (see the example below).
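
For instance, a deployment with short-lived sessions could cut the affinity window to 15 minutes (15 × 60 × 1000 = 900000 ms) in the coordinator's config fragment:

```json
{
  "systemConfig": {
    "cluster": {
      "coordinator": { "stickySessionTimeout": 900000 }
    }
  }
}
```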

## Limitations

1. **Stateful Sessions**: Node-local state is lost if a node fails. Use external storage for persistent state.
2. **Single Coordinator**: Currently only one coordinator is supported. Consider load balancing at the infrastructure level.
3. **Network Partitions**: Nodes that lose their connection to the coordinator will be marked unhealthy.

## Best Practices

1. **Use Groups**: Organize MCP servers into groups for easier management
2. **Monitor Health**: Set up alerts for unhealthy nodes
3. **Version Consistency**: Run the same MCPHub version across all nodes
4. **Resource Planning**: Allocate appropriate resources based on MCP server requirements
5. **Backup Configuration**: Keep the coordinator config backed up
6. **Gradual Rollout**: Test cluster configuration with a small number of nodes first

## See Also

- [Docker Deployment](../deployment/docker.md)
- [Kubernetes Deployment](../deployment/kubernetes.md)
- [High Availability Setup](../deployment/high-availability.md)