The most common question from people setting up their first local AI agent isn't about software. It's about hardware. "What do I need to buy?" And the honest answer is: it depends on what you want to do with it.
A $15 Raspberry Pi Zero 2 W can run ZeroClaw with a tiny model. An $800 workstation can run 70B-parameter models at conversational speed. Between those extremes is a range of practical sweet spots that most people miss because the internet loves to recommend either the cheapest or the most expensive option.
Here's the actual landscape, tier by tier.
Tier 1: Under $50 — The Minimum Viable Agent
Hardware: Raspberry Pi 4 (4GB) or Pi Zero 2 W
What you get: ZeroClaw runs beautifully — 3.4MB binary, under 5MB RAM, instant startup. But model inference on CPU-only Pi hardware is slow. A quantized 1.5B model (like Qwen2.5:1.5b) runs at 2-4 tokens per second. A 4B model is borderline unusable at under 1 tok/s.
Best for: Running ZeroClaw as a lightweight agent that routes queries to a cloud API (OpenAI, Anthropic, etc.). The Pi handles the agent logic, channel management, and memory — the cloud handles inference. API costs are typically $5-15/month for personal use.
Model recommendation: Don't run models locally at this tier. Use a cloud provider and treat the Pi as your always-on agent runtime.
Real cost: Pi 4 4GB ($35) + case ($5) + power supply ($8) + SD card ($8) = ~$56 total, plus monthly API costs.
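The two entry tiers trade upfront cost against recurring API spend, so the break-even point is worth a quick sketch. The figures below come from this article's estimates; the $10/month API spend is an assumed midpoint of the $5-15 range:

```python
# Break-even: Tier 1 (Pi 4 + cloud API) vs Tier 2 (Pi 5 + AI HAT+ 2, fully local).
# Dollar figures are this article's estimates; the API spend is an assumed midpoint.
TIER1_HARDWARE = 56       # Pi 4 kit total from above
TIER1_API_MONTHLY = 10    # assumed midpoint of the $5-15/month range
TIER2_HARDWARE = 230      # Pi 5 + AI HAT+ 2 total, zero ongoing cost

months = 0
while TIER1_HARDWARE + TIER1_API_MONTHLY * months < TIER2_HARDWARE:
    months += 1

print(f"Tier 2 pays for itself after ~{months} months")  # ~18 months at $10/month
```

At the $5/month end of the range the payback stretches to roughly three years, so Tier 1 remains the right call for light use.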
Tier 2: $150-250 — The Edge AI Sweet Spot
Hardware: Raspberry Pi 5 (8GB) + AI HAT+ 2
What you get: 40 TOPS of dedicated AI inference. Quantized 8B models run at 12-15 tokens per second — fast enough for natural conversation. 4B models hit 22-28 tok/s. The HAT+ 2's 8GB of dedicated LPDDR4X memory means the model doesn't compete with the OS for RAM.
Best for: A fully offline, always-on AI assistant. No cloud dependency, no API costs, no data leaving your network. Perfect for home automation, family assistants, privacy-focused setups.
Model recommendations:
- General use: llama3.1:8b (Q4_K_M quantization)
- Fast responses: gemma3:4b
- Coding help: qwen2.5-coder:7b
Real cost: Pi 5 8GB ($80) + AI HAT+ 2 ($130) + case + power + storage = ~$230 total. Zero ongoing costs.
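A rough rule of thumb explains why an 8B model at Q4 fits the HAT+ 2's 8GB: weight memory is roughly parameter count times bits per weight, plus overhead for the KV cache and runtime buffers. The bits-per-weight figures below are approximations, not exact GGUF sizes:

```python
# Rough memory footprint of a quantized model:
#   weights ~= parameters * bits_per_weight / 8, plus ~15% for KV cache/buffers.
# Bits-per-weight values are approximations, not exact GGUF file sizes.
BITS_PER_WEIGHT = {"Q4_K_M": 4.5, "Q8_0": 8.5, "F16": 16.0}

def footprint_gb(params_billions: float, quant: str, overhead: float = 1.15) -> float:
    weights_gb = params_billions * BITS_PER_WEIGHT[quant] / 8
    return round(weights_gb * overhead, 1)

print(footprint_gb(8, "Q4_K_M"))   # ~5.2 GB: fits the HAT+ 2's 8GB with headroom
print(footprint_gb(4, "Q4_K_M"))   # ~2.6 GB
```

By the same estimate, a 14B model at Q4_K_M would already push past 8GB, which is why this tier tops out around 8B parameters.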
Tier 3: $300-500 — The Used GPU Play
Hardware: Any desktop or mini PC + a used NVIDIA RTX 3090
What you get: 24GB of VRAM opens up a completely different model tier. You can run quantized 30B models at 20+ tok/s or a quantized 70B model at 8-12 tok/s. The quality jump from 8B to 30B+ is significant — longer context understanding, better reasoning, fewer hallucinations.
The RTX 3090 is the most cost-effective AI card on the used market in 2026. Originally $1,500, they sell for $250-350 used. Nothing else in that price range matches 24GB of VRAM.
Best for: Power users who want frontier-adjacent model quality without cloud costs. Developers running AI assistants for coding. Small teams sharing a local inference server.
Model recommendations:
- General use: deepseek-v3.2:32b (Q4_K_M)
- Coding: qwen2.5-coder:32b
- Maximum quality: llama3.1:70b (Q3_K_M — fits in 24GB, slower but impressive)
Real cost: Used desktop ($100-150) + used RTX 3090 ($300) = ~$400-450. Power draw ~300W under load.
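That ~300W load figure matters for total cost of ownership. A back-of-envelope estimate, where the four hours a day at full load and $0.15/kWh are illustrative assumptions:

```python
# Back-of-envelope electricity cost at the ~300W load figure above.
# 4 hours/day at full load and $0.15/kWh are illustrative assumptions.
watts = 300
hours_per_day = 4
price_per_kwh = 0.15

kwh_per_year = watts / 1000 * hours_per_day * 365   # 438 kWh
cost_per_year = kwh_per_year * price_per_kwh
print(f"~{kwh_per_year:.0f} kWh/year, about ${cost_per_year:.0f}/year")
```

An always-on server pinned at full load would be closer to $400/year by the same math, so idle power management matters for 24/7 setups.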
Tier 4: $500-800 — The Current-Gen Sweet Spot
Hardware: Desktop or mini PC + NVIDIA RTX 4070 Ti Super (16GB VRAM) or RTX 4080 Super (16GB)
What you get: Modern Ada Lovelace architecture with better per-watt inference throughput than the Ampere-based 3090, plus native FP8 tensor-core support. 16GB of VRAM runs 30B models at tight 4-bit quantizations. Inference speed for a 32B Q4 model: 25-35 tok/s.
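To make those throughput numbers concrete, here is what 25-35 tok/s means for wall-clock latency on a typical answer (the 300-token answer length is an assumption):

```python
# Wall-clock feel of 25-35 tok/s: time to stream a typical answer.
# The 300-token answer length (a few paragraphs) is an assumption.
tokens_in_answer = 300
stream_seconds = {tps: tokens_in_answer / tps for tps in (25, 35)}
for tps, secs in stream_seconds.items():
    print(f"{tps} tok/s -> {secs:.0f}s for a {tokens_in_answer}-token answer")
```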
The trade-off versus the 3090: less VRAM (16GB vs 24GB) but faster per-token inference, lower power draw, and newer driver support.
Best for: Daily-driver AI workstation. Running multiple models simultaneously (a small model for quick queries, a large one for complex tasks). Software development with AI pair programming.
Model recommendations:
- General: deepseek-v3.2:32b (Q4_K_M) — the current quality champion for 16GB cards
- Code: qwen2.5-coder:32b (Q4_K_M)
- Fast: llama3.1:8b for quick queries (runs at 80+ tok/s on these cards)
Real cost: RTX 4070 Ti Super ($500-550 new) in an existing desktop, or ~$800 for a complete build.
Tier 5: $800+ — Maximum Local Performance
Hardware: A dual-GPU setup, an RTX 4090 (24GB), or an RTX 5090 (32GB)
What you get: The RTX 4090's 24GB runs quantized 70B models at 15-20 tok/s — genuinely comparable to cloud API response times. The RTX 5090's 32GB fits larger quantizations for better quality. Dual 3090s (48GB combined via tensor parallelism) can run 8-bit 30B models or heavily quantized 100B+ models.
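A quick sanity check on the "comparable to cloud" claim: converting the low end of 15-20 tok/s to words per minute, using the common rule of thumb of about 0.75 English words per token, shows it comfortably outpaces human reading speed:

```python
# Sanity check: convert 15 tok/s (the low end quoted above) to words per minute,
# using the rough rule of thumb of ~0.75 English words per token.
tok_per_s = 15
words_per_minute = tok_per_s * 0.75 * 60
print(words_per_minute)   # 675.0 -- roughly 2-3x typical human reading speed
```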
Best for: Research, business-critical AI workloads where cloud dependency is unacceptable, teams of 5-10 people sharing inference infrastructure.
Model recommendations:
- RTX 4090: llama3.1:70b (Q4_K_M) at full speed
- Dual 3090: llama3.1:70b (Q5_K_M) with better quality
- RTX 5090: the largest models at the best quantization you can fit
The Decision Framework
Don't overbuy. The right tier depends on three questions:
1. Does inference need to stay local (privacy, offline operation, no recurring API costs)?
- No → Tier 1 ($50 Pi + cloud API) is the best value
- Yes → Tier 2+ depending on quality requirements
2. What model quality do you actually need?
- Basic assistance (Q&A, simple tasks) → 4B-8B models, Tier 2
- Good all-around quality → 30B models, Tier 3-4
- Frontier-level reasoning → 70B+ models, Tier 5
3. How many people will use it?
- Just you → Tier 2-3
- Small team (2-5) → Tier 4
- Larger team or production → Tier 5
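The three questions above collapse into a simple lookup. This sketch is purely illustrative; `pick_tier` and its thresholds are not part of any ZeroClaw tooling:

```python
# The three-question framework as a lookup. Purely illustrative:
# pick_tier and its thresholds are not part of any ZeroClaw tooling.
def pick_tier(local_required: bool, quality: str, users: int) -> int:
    if not local_required:
        return 1                          # cheap Pi + cloud API
    if quality == "frontier" or users > 5:
        return 5                          # 70B+ models, or production scale
    if quality == "good":
        return 4 if users > 1 else 3      # 30B-class models
    return 2                              # 4B-8B basic assistance

print(pick_tier(False, "basic", 1))      # 1
print(pick_tier(True, "good", 1))        # 3
print(pick_tier(True, "good", 4))        # 4
print(pick_tier(True, "frontier", 1))    # 5
```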
The ZeroClaw Factor
One thing that's consistent across all tiers: ZeroClaw's overhead is negligible. At 3.4MB binary size and under 5MB RAM, it consumes a rounding error of resources at every tier. The entire hardware budget goes to model inference, not framework overhead.
On Tier 1 hardware where every megabyte matters, this is the difference between running a useful agent and running out of memory. On Tier 5 hardware, it means your $800 investment is almost entirely dedicated to AI performance, not wasted on runtime bloat.
Buy the hardware that matches your use case. Let the runtime stay out of the way.