Deploying an AI agent to production is different from running one on your laptop. Reliability, monitoring, scaling, and security all matter in ways they don't during development. ZeroClaw's architecture — single binary, zero dependencies, minimal resource usage — makes production deployment simpler than most agent frameworks, but "simpler" isn't "trivial."
This guide covers the full spectrum: single Docker container for small deployments, Docker Compose for multi-service setups, and Kubernetes for scale.
Docker: Single Container
The simplest production deployment. One container, one agent.
Create a Dockerfile:
```dockerfile
FROM alpine:3.19

COPY zeroclaw /usr/local/bin/zeroclaw
COPY config.toml /etc/zeroclaw/config.toml
RUN chmod +x /usr/local/bin/zeroclaw

EXPOSE 3000
VOLUME /data

CMD ["zeroclaw", "start", "--config", "/etc/zeroclaw/config.toml", "--data", "/data"]
```
The image is tiny. Alpine base (5MB) + ZeroClaw binary (3.4MB) = an 8.4MB container image. Compare that to a typical Python agent framework image at 800MB-1.2GB.
Build and run:
```bash
docker build -t zeroclaw-agent .
docker run -d \
  --name my-agent \
  -p 3000:3000 \
  -v zeroclaw-data:/data \
  --restart unless-stopped \
  zeroclaw-agent
```
The volume mount ensures conversation history and memory persist across container restarts.
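Since that volume holds all agent state, it is worth backing up. One common pattern is a throwaway container that archives the volume to the host; a sketch (the volume and archive names match the `docker run` example above, but the approach is generic):

```shell
# Archive the zeroclaw-data volume into the current directory.
# Mounts the volume read-only so a running agent is not disturbed.
docker run --rm \
  -v zeroclaw-data:/data:ro \
  -v "$(pwd)":/backup \
  alpine:3.19 tar czf /backup/zeroclaw-data.tar.gz -C /data .
```

Restoring is the same trick in reverse: mount an empty volume and untar into it.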
Docker Compose: Agent + Ollama
For a self-hosted stack with local model inference:
```yaml
version: '3.8'

services:
  ollama:
    image: ollama/ollama
    volumes:
      - ollama-models:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]

  zeroclaw:
    image: zeroclaw-agent
    depends_on:
      - ollama
    environment:
      - ZEROCLAW_PROVIDER_URL=http://ollama:11434/v1
    volumes:
      - zeroclaw-data:/data
    ports:
      - "3000:3000"
    restart: unless-stopped

volumes:
  ollama-models:
  zeroclaw-data:
```
This gives you a complete offline-capable AI stack in two containers. Ollama handles model serving; ZeroClaw handles agent logic, memory, and channels.
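Bringing the stack up is two commands; note that Ollama starts with no models, so you pull one into its container first (the model tag here is just an example):

```shell
# Start both services in the background.
docker compose up -d

# Pull a model into the ollama container; "llama3" is an example tag,
# substitute whichever model your agent is configured to use.
docker compose exec ollama ollama pull llama3
```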
Kubernetes: Production at Scale
For teams running multiple agents or serving many users, Kubernetes provides auto-scaling, health monitoring, and resource management.
Deployment manifest:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: zeroclaw-agent
spec:
  replicas: 3
  selector:
    matchLabels:
      app: zeroclaw
  template:
    metadata:
      labels:
        app: zeroclaw
    spec:
      containers:
        - name: zeroclaw
          image: zeroclaw-agent:latest
          resources:
            requests:
              memory: "16Mi"
              cpu: "50m"
            limits:
              memory: "64Mi"
              cpu: "500m"
          ports:
            - containerPort: 3000
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 1
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 3000
            initialDelaySeconds: 1
            periodSeconds: 5
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: zeroclaw-data
```
Note the resource requests: 16Mi memory, 50m CPU. That's not a typo. ZeroClaw genuinely runs in 16MB of RAM. A Kubernetes node with 8GB of RAM can run hundreds of ZeroClaw instances.
Horizontal Pod Autoscaler:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: zeroclaw-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: zeroclaw-agent
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
ZeroClaw's sub-10ms startup means new pods are ready to serve requests almost instantly — no warm-up period, no slow JVM startup, no dependency loading. The autoscaler can react to traffic spikes in seconds rather than minutes.
Monitoring
ZeroClaw exposes Prometheus metrics at `/metrics`:
- `zeroclaw_requests_total` — total requests processed
- `zeroclaw_request_duration_seconds` — request latency histogram
- `zeroclaw_memory_entries_total` — memory database size
- `zeroclaw_active_channels` — connected messaging channels
- `zeroclaw_tool_invocations_total` — tool usage by type
A Prometheus scrape config:
```yaml
scrape_configs:
  - job_name: 'zeroclaw'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        regex: zeroclaw
        action: keep
```
Pair with Grafana dashboards for visibility into agent performance, memory growth, and channel health.
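Alerting closes the loop. As one illustrative example, a Prometheus rule on the latency histogram listed above might look like this — assuming the histogram exposes standard `_bucket` series; the thresholds and severity label are placeholders to tune:

```yaml
groups:
  - name: zeroclaw
    rules:
      - alert: ZeroClawHighLatency
        # Fire when p99 request latency over the last 5 minutes
        # stays above 2 seconds for 10 minutes.
        expr: >
          histogram_quantile(0.99,
            sum(rate(zeroclaw_request_duration_seconds_bucket[5m])) by (le)
          ) > 2
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "ZeroClaw p99 request latency above 2s"
```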
Security Hardening for Production
Production deployments need additional security beyond development defaults:
Network policy: Restrict pod-to-pod communication. ZeroClaw only needs to reach the model inference service (Ollama or cloud API) and the messaging platform APIs.
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: zeroclaw-netpol
spec:
  podSelector:
    matchLabels:
      app: zeroclaw
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: ollama
      ports:
        - port: 11434
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
      ports:
        - port: 443
```
Secrets management: API keys and bot tokens should come from Kubernetes Secrets, not from hard-coded environment variables or config files baked into the image.
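A sketch of that pattern, with hypothetical secret and key names (the real variable name depends on which provider your config uses):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: zeroclaw-secrets
type: Opaque
stringData:
  provider-api-key: "<paste key here>"   # placeholder value
---
# Then, in the container spec of the Deployment, inject it at runtime:
#
#   env:
#     - name: PROVIDER_API_KEY            # hypothetical variable name
#       valueFrom:
#         secretKeyRef:
#           name: zeroclaw-secrets
#           key: provider-api-key
```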
Read-only filesystem: Mount the container filesystem as read-only, with a writable volume only for the data directory.
Non-root execution: ZeroClaw doesn't need root permissions. Run as a non-root user with `securityContext.runAsNonRoot: true`.
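Both hardening steps amount to a few lines of `securityContext` in the pod template; a sketch (the UID is arbitrary, and `/data` remains the only writable path):

```yaml
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001          # arbitrary unprivileged UID
  containers:
    - name: zeroclaw
      image: zeroclaw-agent:latest
      securityContext:
        readOnlyRootFilesystem: true
        allowPrivilegeEscalation: false
      volumeMounts:
        - name: data
          mountPath: /data    # the one writable mount
```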
The Deployment Decision Tree
- Single user, always-on: Docker container on a VPS or home server. Simplest setup, lowest cost.
- Small team (2-10 users): Docker Compose with Ollama for local inference or a cloud API for convenience.
- Organization (10+ users): Kubernetes with auto-scaling, monitoring, and security policies.
- Enterprise (regulated industry): Kubernetes + NVIDIA OpenShell for container-level isolation, or dedicated infrastructure with compliance controls.
ZeroClaw's resource efficiency means you can start with the simplest deployment and scale up without rearchitecting. A Docker container and a Kubernetes pod run the same binary with the same configuration. The upgrade path is adding infrastructure, not rewriting the agent.