Deploying an AI agent to production is different from running one on your laptop. Reliability, monitoring, scaling, and security all matter in ways they don't during development. ZeroClaw's architecture — single binary, zero dependencies, minimal resource usage — makes production deployment simpler than most agent frameworks, but "simpler" isn't "trivial."
This guide covers the full spectrum: single Docker container for small deployments, Docker Compose for multi-service setups, and Kubernetes for scale.
Docker: Single Container
The simplest production deployment. One container, one agent.
Create a Dockerfile:
```dockerfile
FROM alpine:3.19

COPY zeroclaw /usr/local/bin/zeroclaw
COPY config.toml /etc/zeroclaw/config.toml
RUN chmod +x /usr/local/bin/zeroclaw

EXPOSE 3000
VOLUME /data

CMD ["zeroclaw", "start", "--config", "/etc/zeroclaw/config.toml", "--data", "/data"]
```
The image is tiny. Alpine base (5MB) + ZeroClaw binary (3.4MB) = an 8.4MB container image. Compare that to a typical Python agent framework image at 800MB-1.2GB.
Build and run:
```bash
docker build -t zeroclaw-agent .
docker run -d \
  --name my-agent \
  -p 3000:3000 \
  -v zeroclaw-data:/data \
  --restart unless-stopped \
  zeroclaw-agent
```
The volume mount ensures conversation history and memory persist across container restarts.
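Since that volume holds all agent state, it is worth backing up. One common pattern is a throwaway container that archives the volume to the host; a sketch (the volume and archive names match the `docker run` example above, but the approach is generic):

```shell
# Archive the zeroclaw-data volume into the current directory.
# Mounts the volume read-only so a running agent is not disturbed.
docker run --rm \
  -v zeroclaw-data:/data:ro \
  -v "$(pwd)":/backup \
  alpine:3.19 tar czf /backup/zeroclaw-data.tar.gz -C /data .
```

Restoring is the same trick in reverse: mount an empty volume and untar into it.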
Docker Compose: Agent + Ollama
For a self-hosted stack with local model inference:
```yaml
version: '3.8'

services:
  ollama:
    image: ollama/ollama
    volumes:
      - ollama-models:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]

  zeroclaw:
    image: zeroclaw-agent
    depends_on:
      - ollama
    environment:
      - ZEROCLAW_PROVIDER_URL=http://ollama:11434/v1
    volumes:
      - zeroclaw-data:/data
    ports:
      - "3000:3000"
    restart: unless-stopped

volumes:
  ollama-models:
  zeroclaw-data:
```
This gives you a complete offline-capable AI stack in two containers. Ollama handles model serving; ZeroClaw handles agent logic, memory, and channels.
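Bringing the stack up is two commands; note that Ollama starts with no models, so you pull one into its container first (the model tag here is just an example):

```shell
# Start both services in the background.
docker compose up -d

# Pull a model into the ollama container; "llama3" is an example tag,
# substitute whichever model your agent is configured to use.
docker compose exec ollama ollama pull llama3
```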
Kubernetes: Production at Scale
For teams running multiple agents or serving many users, Kubernetes provides auto-scaling, health monitoring, and resource management.
Deployment manifest:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: zeroclaw-agent
spec:
  replicas: 3
  selector:
    matchLabels:
      app: zeroclaw
  template:
    metadata:
      labels:
        app: zeroclaw
    spec:
      containers:
        - name: zeroclaw
          image: zeroclaw-agent:latest
          resources:
            requests:
              memory: "16Mi"
              cpu: "50m"
            limits:
              memory: "64Mi"
              cpu: "500m"
          ports:
            - containerPort: 3000
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 1
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 3000
            initialDelaySeconds: 1
            periodSeconds: 5
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: zeroclaw-data
```
Note the resource requests: 16Mi memory, 50m CPU. That's not a typo. ZeroClaw genuinely runs in 16MB of RAM. A Kubernetes node with 8GB of RAM can run hundreds of ZeroClaw instances.
Horizontal Pod Autoscaler:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: zeroclaw-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: zeroclaw-agent
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
ZeroClaw's sub-10ms startup means new pods are ready to serve requests almost instantly — no warm-up period, no slow JVM startup, no dependency loading. The autoscaler can react to traffic spikes in seconds rather than minutes.
Monitoring
ZeroClaw exposes Prometheus metrics at `/metrics`:
- `zeroclaw_requests_total` — total requests processed
- `zeroclaw_request_duration_seconds` — request latency histogram
- `zeroclaw_memory_entries_total` — memory database size
- `zeroclaw_active_channels` — connected messaging channels
- `zeroclaw_tool_invocations_total` — tool usage by type
A Prometheus scrape config:
```yaml
scrape_configs:
  - job_name: 'zeroclaw'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        regex: zeroclaw
        action: keep
```
Pair with Grafana dashboards for visibility into agent performance, memory growth, and channel health.
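Alerting closes the loop. As one illustrative example, a Prometheus rule on the latency histogram listed above might look like this — assuming the histogram exposes standard `_bucket` series; the thresholds and severity label are placeholders to tune:

```yaml
groups:
  - name: zeroclaw
    rules:
      - alert: ZeroClawHighLatency
        # Fire when p99 request latency over the last 5 minutes
        # stays above 2 seconds for 10 minutes.
        expr: >
          histogram_quantile(0.99,
            sum(rate(zeroclaw_request_duration_seconds_bucket[5m])) by (le)
          ) > 2
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "ZeroClaw p99 request latency above 2s"
```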
Security Hardening for Production
Production deployments need additional security beyond development defaults:
Network policy: Restrict pod-to-pod communication. ZeroClaw only needs to reach the model inference service (Ollama or cloud API) and the messaging platform APIs.
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: zeroclaw-netpol
spec:
  podSelector:
    matchLabels:
      app: zeroclaw
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: ollama
      ports:
        - port: 11434
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
      ports:
        - port: 443
```
Secrets management: API keys and bot tokens should come from Kubernetes Secrets, not from hard-coded environment variables or config files baked into the image.
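A sketch of that pattern, with hypothetical secret and key names (the real variable name depends on which provider your config uses):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: zeroclaw-secrets
type: Opaque
stringData:
  provider-api-key: "<paste key here>"   # placeholder value
---
# Then, in the container spec of the Deployment, inject it at runtime:
#
#   env:
#     - name: PROVIDER_API_KEY            # hypothetical variable name
#       valueFrom:
#         secretKeyRef:
#           name: zeroclaw-secrets
#           key: provider-api-key
```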
Read-only filesystem: Mount the container filesystem as read-only, with a writable volume only for the data directory.
Non-root execution: ZeroClaw doesn't need root permissions. Run as a non-root user with `securityContext.runAsNonRoot: true`.
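Both hardening steps amount to a few lines of `securityContext` in the pod template; a sketch (the UID is arbitrary, and `/data` remains the only writable path):

```yaml
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001          # arbitrary unprivileged UID
  containers:
    - name: zeroclaw
      image: zeroclaw-agent:latest
      securityContext:
        readOnlyRootFilesystem: true
        allowPrivilegeEscalation: false
      volumeMounts:
        - name: data
          mountPath: /data    # the one writable mount
```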
The Deployment Decision Tree
- Single user, always-on: Docker container on a VPS or home server. Simplest setup, lowest cost.
- Small team (2-10 users): Docker Compose with Ollama for local inference or a cloud API for convenience.
- Organization (10+ users): Kubernetes with auto-scaling, monitoring, and security policies.
- Enterprise (regulated industry): Kubernetes + NVIDIA OpenShell for container-level isolation, or dedicated infrastructure with compliance controls.
ZeroClaw's resource efficiency means you can start with the simplest deployment and scale up without rearchitecting. A Docker container and a Kubernetes pod run the same binary with the same configuration. The upgrade path is adding infrastructure, not rewriting the agent.