At some point, most people who use cloud AI services have a moment of pause. You're typing a question about a medical symptom, or drafting a message about a sensitive business situation, or asking for help with code that contains proprietary logic — and you realize that everything you're typing is being sent to a server you don't control, processed by a company whose data retention policies you've never read, and potentially used to train future models.
For many use cases, that's an acceptable trade-off. For many others, it isn't. And for a growing number of developers, the question isn't whether to use AI — it's whether to use AI without handing over their data.
ZeroClaw and Ollama together answer that question. Here's how to set it up.
Why Local-First Is Worth the Effort
The obvious benefit of running AI locally is privacy: your prompts and responses never leave your machine. But the less obvious benefits are often just as compelling.
There are no API costs. Cloud AI services charge per token — typically a few dollars per million tokens for input, more for output. For light personal use, that's negligible. For a business processing thousands of documents, or a developer running an AI assistant all day, it adds up fast. A local model has zero marginal cost per query.
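To make "it adds up fast" concrete, here's a back-of-envelope calculation; the per-token prices and usage figures are illustrative assumptions, not any provider's actual rate card:

```python
# Assumed cloud pricing (illustrative, not a real provider's rates).
input_price_per_m = 3.00    # dollars per million input tokens
output_price_per_m = 15.00  # dollars per million output tokens

# An assistant handling 200 queries a day, each using
# ~1,500 input tokens and ~500 output tokens (assumed).
queries_per_day = 200
tokens_in = 1_500 * queries_per_day   # 300,000 input tokens/day
tokens_out = 500 * queries_per_day    # 100,000 output tokens/day

daily_cost = (tokens_in / 1e6) * input_price_per_m \
           + (tokens_out / 1e6) * output_price_per_m
monthly_cost = daily_cost * 30

print(f"~${monthly_cost:.2f}/month")  # prints "~$72.00/month"
```

The same workload on a local model has a marginal cost of zero.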
There are no rate limits. Cloud providers throttle requests to manage load. A local model runs as fast as your hardware allows, with no queuing, no 429 errors, and no degraded service during peak hours.
There's no internet dependency. A local AI assistant works on airplanes, in basements, behind corporate firewalls, and in air-gapped environments where cloud access is prohibited. Once the model is downloaded, it runs entirely offline.
And for regulated industries — healthcare, legal, finance — local AI isn't just a preference, it's often a compliance requirement. HIPAA, GDPR, and various financial regulations place strict limits on where sensitive data can be processed. Keeping the model local keeps that data on infrastructure you control, which removes an entire category of third-party-processing concerns.
Step 1: Install Ollama
Ollama is the easiest way to run large language models locally. It handles model downloads, quantization, and serving through a simple API that ZeroClaw knows how to talk to.
On macOS:
```bash
brew install ollama
```
On Linux:
```bash
curl -fsSL https://ollama.com/install.sh | sh
```
On Windows, download the installer from ollama.com.
Once installed, pull a model. For most use cases, `llama3.1:8b` is the right starting point — it's capable enough for real work and runs comfortably on machines with 8GB of RAM:
```bash
ollama pull llama3.1:8b
```
If you're on lower-end hardware, `qwen3:4b` or `phi3:mini` are lighter options that still handle most tasks well. If you have the memory to spare — a 4-bit quantized 70B model needs roughly 40GB of RAM — `llama3.1:70b` is a clear step up in quality.
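Before wiring up ZeroClaw, it's worth confirming that Ollama is actually serving. A minimal Python sketch against Ollama's `/api/generate` endpoint, assuming `ollama serve` is running on the default port and the model has been pulled:

```python
import json
import urllib.request

def ask_ollama(prompt, model="llama3.1:8b", host="http://localhost:11434"):
    """Send one non-streaming completion request to a local Ollama server."""
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,   # return a single JSON object, not a stream
    }).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires a running Ollama):
#   print(ask_ollama("In one sentence, what is quantization?"))
```

If this returns text, the local stack is working and everything that follows is just configuration.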
Step 2: Install ZeroClaw
```bash
brew install zeroclaw
```
Or on Linux:
```bash
curl -fsSL https://raw.githubusercontent.com/zeroclaw-labs/zeroclaw/main/scripts/bootstrap.sh | bash
```
ZeroClaw is a single binary. There's nothing else to install, no runtime to configure, no dependencies to manage.
Step 3: Point ZeroClaw at Ollama
Edit your `config.toml` to tell ZeroClaw to use Ollama as its AI provider:
```toml
[ai]
provider = "ollama"
model = "llama3.1:8b"
endpoint = "http://localhost:11434"
```
That's the entire configuration change. ZeroClaw's provider system is designed so that switching between Anthropic, OpenAI, Ollama, or any other supported provider is a one-line change. No code modifications, no recompile, no plugin to install.
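For instance, routing everything to Anthropic instead uses the same field names; the model name below is one example, and you'd substitute whichever model you actually use:

```toml
[ai]
provider = "anthropic"             # was "ollama"
model = "claude-sonnet-4-20250514"
```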
Step 4: Connect a Channel
Add Telegram as your interface — it works on every device, has a good mobile app, and ZeroClaw's Telegram integration is mature:
```toml
[channels.telegram]
token = "YOUR_BOT_TOKEN"
allowed_users = [123456789]
```
Start ZeroClaw:
```bash
zeroclaw start
```
Send a message to your Telegram bot. The response comes from Ollama running on your machine. Nothing touches the internet except the Telegram API call to deliver the message — the actual AI processing is entirely local.
Going Further: Hybrid Mode
Pure local AI has one real limitation: smaller models aren't as capable as frontier models like Claude or GPT-4 for complex reasoning tasks. ZeroClaw's hybrid mode lets you get the best of both worlds.
```toml
[ai]
provider = "ollama"
model = "llama3.1:8b"

[ai.fallback]
provider = "anthropic"
model = "claude-sonnet-4-20250514"
trigger = "complexity_threshold"
```
With this configuration, simple questions — "what's the capital of France?", "summarize this paragraph", "write a regex for email addresses" — are handled locally at zero cost. Complex reasoning tasks that the local model struggles with fall back to Claude automatically. You control where the boundary sits. For most users, this hybrid approach is the practical sweet spot: 80–90% of queries handled locally for free, with cloud fallback available for the cases that genuinely need it.
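ZeroClaw's actual `complexity_threshold` scoring isn't documented here, but the routing idea can be sketched with a toy heuristic — the keyword list, weights, and threshold below are all made up for illustration, not ZeroClaw's real logic:

```python
def estimate_complexity(prompt: str) -> float:
    """Crude score in [0, 1]: longer prompts with reasoning-heavy
    keywords score higher. (Illustrative heuristic only.)"""
    keywords = ("prove", "refactor", "step by step", "architecture", "debug")
    score = min(len(prompt) / 2000, 1.0)            # length contribution
    score += 0.3 * sum(kw in prompt.lower() for kw in keywords)
    return min(score, 1.0)

def route(prompt: str, threshold: float = 0.5) -> str:
    """Pick a provider: cheap local model below the threshold,
    cloud fallback above it."""
    return "anthropic" if estimate_complexity(prompt) >= threshold else "ollama"

print(route("What's the capital of France?"))                    # -> ollama
print(route("Refactor this module step by step, then debug it")) # -> anthropic
```

In practice the trigger could also weigh the local model's own confidence or expected output length; the point is that the routing boundary is a policy you control, not one the provider sets.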
What This Actually Runs On
Running ZeroClaw + Ollama with llama3.1:8b requires about 6GB of RAM for the model itself, plus a negligible 4MB for ZeroClaw. A machine with 8GB of RAM can run the full stack; 16GB is comfortable. On an Apple M1, a typical query takes 2–5 seconds; on a modern machine with a discrete GPU, it's faster.
A $200 used Mac mini, a $50 used ThinkPad, or a machine you already own — any of these can run a fully private AI assistant 24/7 with no ongoing costs beyond electricity. The hardware pays for itself in a few months compared to a cloud AI subscription.
The Bigger Picture
The narrative that "AI requires the cloud" made sense in 2023, when running a capable model locally required expensive hardware and significant technical expertise. That's no longer true. Ollama made local models accessible. ZeroClaw made connecting them to your daily workflow trivial.
The result is an AI assistant that knows nothing about you except what you tell it, stores nothing on anyone else's servers, and costs nothing to run beyond the electricity to keep your machine on. For anyone who's ever hesitated before typing something sensitive into a chat box, that's worth a lot.