# Cluster Deployment Guide

MCPHub supports cluster deployment, allowing you to run multiple nodes that work together as a unified system. This enables:

- **High Availability**: Distribute MCP servers across multiple nodes for redundancy
- **Load Distribution**: Balance requests across multiple replicas of the same MCP server
- **Sticky Sessions**: Ensure client sessions are routed to the same node consistently
- **Centralized Management**: One coordinator manages the entire cluster

## Architecture

An MCPHub cluster has three operating modes:

1. **Standalone Mode** (default): Single-node operation, no cluster features
2. **Coordinator Mode**: Central node that manages the cluster, routes requests, and maintains session affinity
3. **Node Mode**: Worker nodes that register with the coordinator and run MCP servers

```
┌─────────────────────────────────────────┐
│            Coordinator Node             │
│  - Manages cluster state                │
│  - Routes client requests               │
│  - Maintains session affinity           │
│  - Health monitoring                    │
└───────────┬─────────────────────────────┘
            │
    ┌───────┴───────────────────┐
    │                           │
┌───▼────────┐         ┌────────▼────┐
│   Node 1   │         │   Node 2    │
│  - MCP A   │         │  - MCP A    │
│  - MCP B   │         │  - MCP C    │
└────────────┘         └─────────────┘
```

## Configuration

### Coordinator Configuration

Create or update `mcp_settings.json` on the coordinator node. The coordinator can optionally run MCP servers of its own, like the `example` entry below:

```json
{
  "mcpServers": {
    "example": {
      "command": "npx",
      "args": ["-y", "example-mcp-server"]
    }
  },
  "systemConfig": {
    "cluster": {
      "enabled": true,
      "mode": "coordinator",
      "coordinator": {
        "nodeTimeout": 15000,
        "cleanupInterval": 30000,
        "stickySessionTimeout": 3600000
      },
      "stickySession": {
        "enabled": true,
        "strategy": "consistent-hash",
        "cookieName": "MCPHUB_NODE",
        "headerName": "X-MCPHub-Node"
      }
    }
  }
}
```

**Configuration Options:**

- `nodeTimeout`: Time (ms) before marking a node as unhealthy (default: 15000)
- `cleanupInterval`: Interval (ms) for cleaning up inactive nodes (default: 30000)
- `stickySessionTimeout`: Session affinity timeout (ms) (default: 3600000, i.e. 1 hour)
- `stickySession.enabled`: Enable sticky session routing (default: true)
- `stickySession.strategy`: Session affinity strategy (see the sketch after this list):
  - `consistent-hash`: Hash-based routing (default)
  - `cookie`: Cookie-based routing
  - `header`: Header-based routing
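
To make hash-based affinity concrete, here is a minimal TypeScript sketch. This is an illustration, not MCPHub's actual routing code: it uses plain modulo hashing rather than a full consistent-hash ring (so adding or removing nodes remaps more sessions than true consistent hashing would), and the node IDs are hypothetical:

```typescript
import { createHash } from "node:crypto";

// Hash the session ID onto the node list: the same session ID always
// resolves to the same node as long as the node list is unchanged.
function pickNode(sessionId: string, nodeIds: string[]): string {
  const digest = createHash("sha256").update(sessionId).digest();
  const bucket = digest.readUInt32BE(0) % nodeIds.length; // first 4 bytes as uint
  return nodeIds[bucket];
}

// Same input, same node -- both calls print the same node ID.
const nodes = ["node-1", "node-2", "node-3"];
console.log(pickNode("session-abc123", nodes));
console.log(pickNode("session-abc123", nodes));
```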

### Node Configuration

Create or update `mcp_settings.json` on each worker node:

```json
{
  "mcpServers": {
    "amap": {
      "command": "npx",
      "args": ["-y", "@amap/amap-maps-mcp-server"]
    },
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest", "--headless"]
    }
  },
  "systemConfig": {
    "cluster": {
      "enabled": true,
      "mode": "node",
      "node": {
        "id": "node-1",
        "name": "Worker Node 1",
        "coordinatorUrl": "http://coordinator:3000",
        "heartbeatInterval": 5000,
        "registerOnStartup": true
      }
    }
  }
}
```

**Configuration Options:**

- `node.id`: Unique node identifier (auto-generated if not provided)
- `node.name`: Human-readable node name (defaults to the hostname)
- `node.coordinatorUrl`: URL of the coordinator node (required)
- `node.heartbeatInterval`: Heartbeat interval (ms) (default: 5000; see the sketch after this list)
- `node.registerOnStartup`: Auto-register on startup (default: true)
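
Conceptually, each worker reports to the coordinator every `heartbeatInterval` milliseconds; if heartbeats stop for longer than the coordinator's `nodeTimeout`, the node is marked unhealthy. A TypeScript sketch of that loop follows; the endpoint path and payload here are assumptions for illustration, not a documented MCPHub API:

```typescript
// Node-side heartbeat loop implied by `heartbeatInterval`.
const coordinatorUrl = "http://coordinator:3000"; // node.coordinatorUrl
const heartbeatInterval = 5000; // node.heartbeatInterval (ms)

async function sendHeartbeat(): Promise<void> {
  try {
    // Hypothetical endpoint, shown only to illustrate the mechanism.
    await fetch(`${coordinatorUrl}/api/cluster/heartbeat`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ nodeId: "node-1", timestamp: Date.now() }),
    });
  } catch (err) {
    // If heartbeats keep failing past the coordinator's nodeTimeout
    // (15s by default), this node is marked unhealthy and dropped from routing.
    console.error("heartbeat failed:", err);
  }
}

setInterval(sendHeartbeat, heartbeatInterval);
```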

## Deployment Scenarios

### Scenario 1: Docker Compose

Create a `docker-compose.yml`:

```yaml
version: '3.8'

services:
  coordinator:
    image: samanhappy/mcphub:latest
    ports:
      - "3000:3000"
    volumes:
      - ./coordinator-config.json:/app/mcp_settings.json
      - coordinator-data:/app/data
    environment:
      - NODE_ENV=production

  node1:
    image: samanhappy/mcphub:latest
    volumes:
      - ./node1-config.json:/app/mcp_settings.json
      - node1-data:/app/data
    environment:
      - NODE_ENV=production
    depends_on:
      - coordinator

  node2:
    image: samanhappy/mcphub:latest
    volumes:
      - ./node2-config.json:/app/mcp_settings.json
      - node2-data:/app/data
    environment:
      - NODE_ENV=production
    depends_on:
      - coordinator

volumes:
  coordinator-data:
  node1-data:
  node2-data:
```

Start the cluster:

```bash
docker-compose up -d
```
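
The compose file mounts `coordinator-config.json`, `node1-config.json`, and `node2-config.json` from the host. As one plausible example, `node1-config.json` could combine the node settings shown earlier with the compose service name as the coordinator host:

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest", "--headless"]
    }
  },
  "systemConfig": {
    "cluster": {
      "enabled": true,
      "mode": "node",
      "node": {
        "name": "Node 1",
        "coordinatorUrl": "http://coordinator:3000"
      }
    }
  }
}
```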

### Scenario 2: Kubernetes

Create Kubernetes manifests:

**Coordinator Deployment:**

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcphub-coordinator
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mcphub-coordinator
  template:
    metadata:
      labels:
        app: mcphub-coordinator
    spec:
      containers:
        - name: mcphub
          image: samanhappy/mcphub:latest
          ports:
            - containerPort: 3000
          volumeMounts:
            - name: config
              mountPath: /app/mcp_settings.json
              subPath: mcp_settings.json
      volumes:
        - name: config
          configMap:
            name: mcphub-coordinator-config
---
apiVersion: v1
kind: Service
metadata:
  name: mcphub-coordinator
spec:
  selector:
    app: mcphub-coordinator
  ports:
    - port: 3000
      targetPort: 3000
  type: LoadBalancer
```

**Worker Node Deployment:**

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcphub-node
spec:
  replicas: 3
  selector:
    matchLabels:
      app: mcphub-node
  template:
    metadata:
      labels:
        app: mcphub-node
    spec:
      containers:
        - name: mcphub
          image: samanhappy/mcphub:latest
          volumeMounts:
            - name: config
              mountPath: /app/mcp_settings.json
              subPath: mcp_settings.json
      volumes:
        - name: config
          configMap:
            name: mcphub-node-config
```

Apply the manifests:

```bash
kubectl apply -f coordinator.yaml
kubectl apply -f nodes.yaml
```
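
Both Deployments mount their settings from ConfigMaps (`mcphub-coordinator-config` and `mcphub-node-config`), which must exist before the pods start. A minimal worker ConfigMap might look like the sketch below; note that with `replicas: 3` you should omit `node.id` so each pod gets an auto-generated ID, and use the coordinator Service name as the host:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: mcphub-node-config
data:
  mcp_settings.json: |
    {
      "mcpServers": {
        "playwright": {
          "command": "npx",
          "args": ["@playwright/mcp@latest", "--headless"]
        }
      },
      "systemConfig": {
        "cluster": {
          "enabled": true,
          "mode": "node",
          "node": {
            "coordinatorUrl": "http://mcphub-coordinator:3000"
          }
        }
      }
    }
```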

### Scenario 3: Manual Deployment

**On Coordinator (192.168.1.100):**

```bash
# Install MCPHub
npm install -g @samanhappy/mcphub

# Configure as coordinator
cat > mcp_settings.json <<EOF
{
  "systemConfig": {
    "cluster": {
      "enabled": true,
      "mode": "coordinator"
    }
  }
}
EOF

# Start coordinator
PORT=3000 mcphub
```

**On Node 1 (192.168.1.101):**

```bash
# Install MCPHub
npm install -g @samanhappy/mcphub

# Configure as node
cat > mcp_settings.json <<EOF
{
  "mcpServers": {
    "server1": { "command": "..." }
  },
  "systemConfig": {
    "cluster": {
      "enabled": true,
      "mode": "node",
      "node": {
        "coordinatorUrl": "http://192.168.1.100:3000"
      }
    }
  }
}
EOF

# Start node
PORT=3001 mcphub
```

**On Node 2 (192.168.1.102):**

```bash
# Similar to Node 1, but with PORT=3002
```

## Usage

### Accessing the Cluster

Once the cluster is running, connect AI clients to the coordinator's endpoints:

```
http://coordinator:3000/mcp
http://coordinator:3000/sse
```

The coordinator will:

1. Route requests to the appropriate nodes based on session affinity
2. Load balance across multiple replicas of the same server
3. Automatically fail over to healthy nodes
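
For example, an MCP client configured via JSON might point at the coordinator like this (the exact shape depends on your client; `mcphub` is just a label for the entry):

```json
{
  "mcpServers": {
    "mcphub": {
      "url": "http://coordinator:3000/mcp"
    }
  }
}
```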

### Sticky Sessions

Sticky sessions ensure that a client's requests are routed to the same node throughout their session. This is important for:

- Maintaining conversation context
- Preserving temporary state
- Consistent tool execution

The default strategy is **consistent-hash**, which uses the session ID to determine the target node. Alternative strategies:

- **Cookie-based**: Uses the `MCPHUB_NODE` cookie
- **Header-based**: Uses the `X-MCPHub-Node` header (see the example below)
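
With the header-based strategy, a client (or a proxy in front of the coordinator) can pin a session explicitly. An illustrative request, assuming a registered node with ID `node-1` (a real MCP exchange would carry a full protocol payload; only the header is the point here):

```bash
# Pin this request to node-1 via the configured affinity header
curl -H "X-MCPHub-Node: node-1" http://coordinator:3000/mcp
```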

### Multiple Replicas

You can deploy the same MCP server on multiple nodes for:

- **Load balancing**: Distribute requests across replicas
- **High availability**: Fail over if one node goes down

Example configuration:

**Node 1:**

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}
```

**Node 2:**

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}
```

The coordinator will automatically load balance requests to `playwright` across both nodes.

## Management API

The coordinator exposes cluster management endpoints:

### Get Cluster Status

```bash
curl http://coordinator:3000/api/cluster/status
```

Response:

```json
{
  "success": true,
  "data": {
    "enabled": true,
    "mode": "coordinator",
    "nodeId": "coordinator",
    "stats": {
      "nodes": 3,
      "activeNodes": 3,
      "servers": 5,
      "sessions": 10
    }
  }
}
```
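
Because the response is plain JSON, it scripts well; for example, extracting the active node count with `jq`:

```bash
curl -s http://coordinator:3000/api/cluster/status | jq '.data.stats.activeNodes'
```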

### Get All Nodes

```bash
curl http://coordinator:3000/api/cluster/nodes
```

### Get Server Replicas

```bash
curl http://coordinator:3000/api/cluster/servers/playwright/replicas
```

### Get Session Affinity

```bash
curl http://coordinator:3000/api/cluster/sessions/{sessionId}
```

## Monitoring and Troubleshooting

### Check Node Health

Monitor coordinator logs for registration and heartbeat messages:

```
Node registered: Worker Node 1 (node-1) with 2 servers
```

If a node becomes unhealthy:

```
Marking node node-1 as unhealthy (last heartbeat: 2024-01-01T10:00:00.000Z)
```

### Verify Registration

Check if nodes are registered:

```bash
curl http://coordinator:3000/api/cluster/nodes?active=true
```

### Session Affinity Issues

If sessions aren't sticking to the same node:

1. Verify sticky sessions are enabled in the coordinator config
2. Check that session IDs are being passed correctly
3. Review coordinator logs for session affinity errors

### Network Connectivity

Ensure worker nodes can reach the coordinator:

```bash
# From a worker node
curl http://coordinator:3000/health
```

## Performance Considerations

### Coordinator Load

The coordinator handles:

- Request routing
- Node heartbeats
- Session tracking

For very large clusters (>50 nodes), consider:

- Increasing coordinator resources
- Tuning heartbeat intervals (see the example below)
- Using header-based sticky sessions (lower overhead)
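
For example, to halve heartbeat traffic while keeping the default 3:1 ratio between `nodeTimeout` and the heartbeat interval (so a single delayed heartbeat does not mark a node unhealthy), you might raise both values. The numbers are illustrative, and each fragment is merged into the respective side's full `mcp_settings.json`:

**On each node:**

```json
{
  "systemConfig": {
    "cluster": {
      "node": { "heartbeatInterval": 10000 }
    }
  }
}
```

**On the coordinator:**

```json
{
  "systemConfig": {
    "cluster": {
      "coordinator": { "nodeTimeout": 30000 }
    }
  }
}
```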

### Network Latency

Minimize latency between the coordinator and nodes:

- Deploy in the same datacenter/region
- Use low-latency networking
- Consider coordinator placement near clients

### Session Timeout

Balance session timeout against resource usage:

- Shorter timeout: less memory, more re-routing
- Longer timeout: better stickiness, more memory

The default is 1 hour; adjust it based on your use case (see the example below).
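
For instance, a deployment with short-lived sessions could cut the affinity window to 15 minutes (15 × 60 × 1000 = 900000 ms) in the coordinator's config fragment:

```json
{
  "systemConfig": {
    "cluster": {
      "coordinator": { "stickySessionTimeout": 900000 }
    }
  }
}
```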

## Limitations

1. **Stateful Sessions**: Node-local state is lost if a node fails. Use external storage for persistent state.
2. **Single Coordinator**: Currently only one coordinator is supported. Consider load balancing at the infrastructure level.
3. **Network Partitions**: Nodes that lose their connection to the coordinator will be marked unhealthy.

## Best Practices

1. **Use Groups**: Organize MCP servers into groups for easier management
2. **Monitor Health**: Set up alerts for unhealthy nodes
3. **Version Consistency**: Run the same MCPHub version across all nodes
4. **Resource Planning**: Allocate appropriate resources based on MCP server requirements
5. **Backup Configuration**: Keep the coordinator config backed up
6. **Gradual Rollout**: Test cluster configuration with a small number of nodes first

## See Also

- [Docker Deployment](../deployment/docker.md)
- [Kubernetes Deployment](../deployment/kubernetes.md)
- [High Availability Setup](../deployment/high-availability.md)