When most AI agent frameworks implement memory, they reach for a vector database. Pinecone, Weaviate, ChromaDB, Qdrant — the options are well-known and well-documented. Store embeddings, query by similarity, get relevant context for the next prompt.
ZeroClaw uses none of them. Its memory system is a single SQLite file combining FTS5 full-text search with vector similarity search. No external database. No separate service. No network round-trips. The entire memory system ships inside the 3.4MB binary.
This isn't a limitation — it's a deliberate architectural choice, and the performance data backs it up.
## Why Vector-Only Search Fails for Agents
Vector search finds semantically similar content. Ask "how do I deploy to Kubernetes?" and a vector search retrieves past conversations about Kubernetes deployment, Docker containers, and cloud infrastructure. Useful.
But agent memory has a different access pattern than a knowledge base. Agents need to recall:
- **Exact terms.** "What was the API key for the staging server?" Vector search might return conversations about API keys in general. You need the exact conversation where the staging key was mentioned.
- **Recent context.** "What did we just discuss?" Recency matters more than semantic similarity for maintaining conversation flow.
- **Structured queries.** "All conversations from last week about database migrations." This is a filter, not a similarity search.
Vector-only memory handles the first category well and the others poorly. Full-text search handles exact terms and structured queries well but misses semantic connections. The optimal solution combines both.
## How ZeroClaw's Hybrid Search Works
ZeroClaw stores every memory entry in a SQLite table with three search mechanisms:
1. **FTS5 full-text search (BM25 scoring).** SQLite's FTS5 extension provides fast full-text search with BM25 relevance scoring — the same algorithm that powers traditional search engines. Queries like "staging API key" find exact mentions instantly, scored by term frequency and document length.
2. **Vector similarity search.** Each memory entry includes an embedding vector computed at write time. Similarity queries use cosine distance to find semantically related content, even when the exact words don't match.
3. **Metadata filtering.** Timestamps, conversation IDs, channel sources, and custom tags enable structured queries that narrow results before the search runs.
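The storage layout can be sketched with plain SQLite. This is an illustrative schema, not ZeroClaw's actual one — the table and column names are assumptions; the FTS5 external-content pattern and `bm25()` ranking are standard SQLite features:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per memory entry: metadata columns plus the embedding
# stored as a BLOB of packed floats. Names are illustrative.
conn.executescript("""
CREATE TABLE memories (
    id INTEGER PRIMARY KEY,
    conversation_id TEXT,
    channel TEXT,
    created_at INTEGER,   -- unix timestamp, enables recency filters
    content TEXT,
    embedding BLOB        -- e.g. 384 packed float32s
);

-- External-content FTS5 index over the content column: the index
-- mirrors the memories table, so BM25 search returns its rowids.
CREATE VIRTUAL TABLE memories_fts USING fts5(
    content, content='memories', content_rowid='id'
);
""")

# Insert a memory and mirror it into the FTS index.
conn.execute(
    "INSERT INTO memories (conversation_id, channel, created_at, content) "
    "VALUES ('c1', 'cli', 1700000000, 'staging API key rotated today')"
)
conn.execute(
    "INSERT INTO memories_fts (rowid, content) "
    "SELECT id, content FROM memories WHERE id = last_insert_rowid()"
)

# BM25-ranked full-text query: SQLite's bm25() returns values where
# lower (more negative) means a better match.
rows = conn.execute(
    "SELECT m.id, m.content, bm25(memories_fts) AS score "
    "FROM memories_fts JOIN memories m ON m.id = memories_fts.rowid "
    "WHERE memories_fts MATCH 'staging AND key' "
    "ORDER BY score LIMIT 5"
).fetchall()
print(rows[0][1])
```

The same connection serves the metadata filters: a `WHERE created_at > ?` clause on the join narrows results before ranking, with no second system involved.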
When ZeroClaw retrieves context for a prompt, it runs both searches in parallel and merges the results:
- FTS5 results are scored by BM25 relevance
- Vector results are scored by cosine similarity
- Scores are normalized and combined with configurable weights (default: 60% FTS5, 40% vector)
- Metadata filters (recency, source channel, conversation ID) are applied as pre-filters
- The top-K results become the agent's context window
This hybrid approach catches what either search alone would miss. The exact term "staging API key" surfaces through FTS5. The semantically related discussion about "deployment credentials" surfaces through vectors. The agent gets both.
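The merge step described above can be sketched in a few lines. The min-max normalization and the 60/40 default weights follow the description; the function names and score handling are illustrative assumptions:

```python
def normalize(scores):
    # Min-max normalize a {doc_id: raw_score} map into [0, 1].
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {doc: (s - lo) / span for doc, s in scores.items()}

def hybrid_merge(fts_scores, vec_scores, w_fts=0.6, w_vec=0.4, k=5):
    """Combine BM25 and cosine-similarity results into one ranking.

    fts_scores: {doc_id: BM25 relevance}, higher = better (negate
    SQLite's bm25() output before passing it in).
    vec_scores: {doc_id: cosine similarity}, higher = better.
    """
    fts_n, vec_n = normalize(fts_scores), normalize(vec_scores)
    # A doc found by only one search still scores via that search.
    combined = {
        doc: w_fts * fts_n.get(doc, 0.0) + w_vec * vec_n.get(doc, 0.0)
        for doc in set(fts_n) | set(vec_n)
    }
    return sorted(combined, key=combined.get, reverse=True)[:k]

# Doc 1 is an exact-term hit; doc 3 is only semantically close.
ranked = hybrid_merge({1: 9.0, 2: 3.0}, {2: 0.70, 3: 0.95})
print(ranked)  # doc 1 (exact match) first, doc 3 (semantic) second
```

Both the exact-term hit and the semantic-only hit survive into the top-K, which is the point of running the searches side by side.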
## Performance: SQLite vs Dedicated Vector DBs
Benchmarks on a dataset of 100,000 memory entries (typical for a heavily-used personal agent over 6 months):
| Operation | ZeroClaw (SQLite) | ChromaDB (local) | Weaviate (local) | Pinecone (cloud) |
|---|---|---|---|---|
| Similarity query | 0.3ms | 2.1ms | 4.8ms | 15-50ms (network dependent) |
| Hybrid query | 1.2ms | 8.5ms | 12ms | 30-80ms |
| Memory footprint | 45MB (file size on disk, memory-mapped) | 380MB RSS | 1.2GB RSS | — |
| Startup time | <1ms (file is opened on demand) | 2.3s | 8-12s | — |
ZeroClaw's SQLite approach is faster at every operation and uses a fraction of the memory. The performance advantage comes from three factors:
1. **No network overhead.** SQLite is in-process. There's no serialization, no TCP round-trip, no connection pooling. Read and write operations are direct memory-mapped file access.
2. **No server process.** ChromaDB and Weaviate run as separate services with their own memory allocator, garbage collector, and thread pool. SQLite shares the agent's process space.
3. **Optimized for the access pattern.** Agent memory is mostly writes (every conversation turn) and occasional reads (context retrieval). SQLite excels at this pattern. Dedicated vector databases are optimized for large-scale similarity search across millions of vectors — overkill for agent-scale data.
## The Embedding Strategy
ZeroClaw computes embeddings using a lightweight model bundled in the binary. The default is a quantized variant of a sentence transformer that produces 384-dimensional vectors. It's not the highest-quality embedding model available, but it's small enough to run in-process without a GPU.
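The similarity measure itself is simple. A minimal sketch of cosine similarity — the metric named earlier for the vector search — in plain Python (tiny vectors here stand in for the 384-dimensional embeddings):

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a · b) / (|a| * |b|)
    # 1.0 = same direction, 0.0 = orthogonal, -1.0 = opposite.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Identical directions score 1.0; unrelated (orthogonal) score 0.0.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

At agent scale (tens of thousands of 384-dim vectors), a brute-force scan with this formula is fast enough that no approximate-nearest-neighbor index is required.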
For users who want higher-quality embeddings, ZeroClaw supports delegating embedding computation to Ollama or any OpenAI-compatible embedding endpoint. The trade-off: slightly higher write latency (the embedding API call) for better semantic search quality.
In practice, the bundled model is sufficient for most agent use cases. Agent memory recall doesn't need the nuance of research-grade embeddings — it needs to find the right conversation about Kubernetes when you ask about deploying containers. The lightweight model handles this reliably.
## Why Not Just Use a Vector Database?
The argument for dedicated vector databases assumes a scale that agent memory doesn't reach.
Pinecone shines at querying across billions of vectors with complex filtering. A personal AI agent accumulates maybe 100,000 memory entries over a year of heavy use. A team-shared agent might reach a million. At these scales, SQLite is faster than any external database because the overhead of the external service exceeds the computational savings.
The break-even point — where a dedicated vector database outperforms SQLite — is roughly 10 million entries with complex multi-vector queries. No individual or small-team AI agent hits that threshold.
For users who do need scale, ZeroClaw's memory interface is a trait. Swap the SQLite implementation for a Postgres+pgvector backend or a dedicated vector database without changing anything else. But start with SQLite. You'll probably never need to switch.
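The actual interface is a Rust trait; its shape can be sketched in Python with a `Protocol`. The method names here are hypothetical, not ZeroClaw's real API — the point is that any backend satisfying the interface can be swapped in without touching the agent:

```python
from typing import Protocol

class MemoryBackend(Protocol):
    # Hypothetical interface shape; ZeroClaw's real trait may differ.
    def store(self, content: str, embedding: list[float]) -> int: ...
    def recall(self, query: str, embedding: list[float], k: int) -> list[str]: ...

class InMemoryBackend:
    """Trivial stand-in backend: a real one would wrap SQLite,
    Postgres+pgvector, or a dedicated vector database."""

    def __init__(self):
        self.entries: list[tuple[str, list[float]]] = []

    def store(self, content, embedding):
        self.entries.append((content, embedding))
        return len(self.entries) - 1

    def recall(self, query, embedding, k):
        # Naive substring match stands in for hybrid search.
        return [c for c, _ in self.entries if query in c][:k]

backend: MemoryBackend = InMemoryBackend()
backend.store("deployed to kubernetes staging", [0.1] * 4)
print(backend.recall("kubernetes", [0.1] * 4, k=3))
```

The agent code only ever talks to `MemoryBackend`, so swapping the storage engine is a configuration change, not a refactor.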
## The Single-File Advantage
There's a practical benefit that's easy to overlook: the entire agent's memory is a single file.
- Backup: `cp memory.db memory.db.backup`
- Migration: copy the file to a new machine
- Inspection: open it with any SQLite client and run SQL queries against your agent's memory
- Deletion: `rm memory.db` and your agent has a clean slate
- Encryption: wrap it with SQLCipher for at-rest encryption
No database server to manage. No connection strings. No schema migrations. No backup scripts that need a running service. The file is the database, and the database is the memory.
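Those operations need nothing beyond a SQLite client. A sketch using Python's stdlib `sqlite3` (in-memory databases stand in for `memory.db` and its backup so the example is self-contained); the online backup API is worth knowing because, unlike a raw `cp`, it is safe even while the agent holds the file open mid-write:

```python
import sqlite3

# Inspect: any SQLite client can query the agent's memory directly.
conn = sqlite3.connect(":memory:")  # stand-in for sqlite3.connect("memory.db")
conn.execute("CREATE TABLE memories (id INTEGER PRIMARY KEY, content TEXT)")
conn.execute("INSERT INTO memories (content) VALUES ('hello from the agent')")

# Online backup: copies the live database page by page, consistent
# even if the source is being written to concurrently.
backup = sqlite3.connect(":memory:")  # stand-in for "memory.db.backup"
conn.backup(backup)

rows = backup.execute("SELECT content FROM memories").fetchall()
print(rows)
```

The same pattern works against the real file: point the connections at `memory.db` and a backup path, and the snapshot is one method call.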
For edge deployments on a Raspberry Pi, this is the difference between "it works" and "I need to also run and manage a database service." On a 4GB Pi, the 380MB that ChromaDB needs is a dealbreaker. The 45MB that SQLite needs is nothing.
Simple things should be simple. Agent memory is a simple thing. SQLite keeps it that way.