Vector search

Optional semantic memory via sqlite-vec — a virtual table inside the same SQLite file used for long-term memory. No separate service, no extra process, no migration.

Source: crates/memory/src/vector.rs, crates/memory/src/embedding/.

Turning it on

vector:
  enabled: true
  backend: sqlite-vec
  default_recall_mode: hybrid
  embedding:
    provider: http
    base_url: https://api.openai.com/v1
    model: text-embedding-3-small
    api_key: ${OPENAI_API_KEY}
    dimensions: 1536
    timeout_secs: 30

Dimension must match the model output:

Model                          Dimensions
text-embedding-3-small         1536
text-embedding-3-large         3072
nomic-embed-text               768
text-embedding-004 (Gemini)    768

A mismatch aborts startup with an explicit error. If you already have vectors at a different dimension, you must delete the DB (or the vector table) and rebuild the index.
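
A minimal sketch of what that startup guard amounts to (hypothetical names; the real check lives in crates/memory/src/vector.rs):

use anyhow::bail;

// Hypothetical startup guard: compare the configured dimension against the
// dimension the existing vec_memories table was built with.
fn check_dimension(configured: usize, existing: Option<usize>) -> anyhow::Result<()> {
    match existing {
        Some(d) if d != configured => bail!(
            "embedding dimension mismatch: vec_memories stores {d}-dim vectors, \
             config requests {configured}; drop the vector table and reindex"
        ),
        _ => Ok(()),
    }
}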

Storage

CREATE VIRTUAL TABLE vec_memories USING vec0(
  memory_id TEXT PRIMARY KEY,
  embedding FLOAT[<dimensions>]
);

The virtual table lives in the same SQLite file as memories. A join on memory_id brings back the content and tags.
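
For illustration, wiring this up with rusqlite follows sqlite-vec's documented registration pattern for Rust (a sketch; the crate's actual setup may differ):

use rusqlite::{ffi::sqlite3_auto_extension, Connection};
use sqlite_vec::sqlite3_vec_init;

// Register sqlite-vec and create the vector table if missing. The
// sqlite3_auto_extension call is the once-per-process registration the
// Gotchas section below refers to.
fn open_with_vec(path: &str, dims: usize) -> rusqlite::Result<Connection> {
    unsafe {
        sqlite3_auto_extension(Some(std::mem::transmute(
            sqlite3_vec_init as *const (),
        )));
    }
    let conn = Connection::open(path)?;
    conn.execute_batch(&format!(
        "CREATE VIRTUAL TABLE IF NOT EXISTS vec_memories USING vec0(
            memory_id TEXT PRIMARY KEY,
            embedding FLOAT[{dims}]
        );"
    ))?;
    Ok(conn)
}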

Embedding provider

trait EmbeddingProvider {
    /// Output dimension of the vectors this provider produces.
    fn dimension(&self) -> usize;
    /// Embed a batch of texts, returning one vector per input, in order.
    async fn embed(&self, texts: &[String]) -> Result<Vec<Vec<f32>>>;
}

Phase 5.4 ships one provider: http, which targets any OpenAI-compatible /embeddings endpoint. That covers OpenAI, Gemini (via its OpenAI-compatible endpoint), Ollama, LM Studio, and self-hosted inference.

Local-only providers (fastembed, candle) are intentional follow-ups — the HTTP provider is enough to unblock everything downstream.
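
As a sketch of what the http provider has to do (assuming reqwest and the standard OpenAI response shape; not the crate's actual implementation):

use anyhow::Result;
use serde::Deserialize;
use serde_json::json;

#[derive(Deserialize)]
struct Item { embedding: Vec<f32> }

#[derive(Deserialize)]
struct Response { data: Vec<Item> }

// Hypothetical OpenAI-compatible embedding client; mirrors the
// EmbeddingProvider shape above.
struct HttpEmbedding {
    client: reqwest::Client,
    base_url: String,
    api_key: String,
    model: String,
    dimensions: usize,
}

impl HttpEmbedding {
    fn dimension(&self) -> usize {
        self.dimensions
    }

    async fn embed(&self, texts: &[String]) -> Result<Vec<Vec<f32>>> {
        // POST {base_url}/embeddings with {"model": ..., "input": [...]}.
        let resp: Response = self
            .client
            .post(format!("{}/embeddings", self.base_url))
            .bearer_auth(&self.api_key)
            .json(&json!({ "model": self.model, "input": texts }))
            .send()
            .await?
            .error_for_status()?
            .json()
            .await?;
        Ok(resp.data.into_iter().map(|i| i.embedding).collect())
    }
}

The timeout_secs setting maps naturally onto reqwest::Client::builder().timeout(Duration::from_secs(..)) when constructing the client.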

Recall modes

Set the default in memory.yaml and override per tool call with the mode argument.

keyword — FTS5 + concept expansion

flowchart LR
    Q[query] --> CT[derive 3 concept tags]
    Q --> M[FTS5 MATCH<br/>query OR tag1 OR tag2 OR tag3]
    CT --> M
    M --> R[rank by BM25]
    R --> RES[top N]
  • Fast, no embedding cost
  • Misses semantic neighbors that don't share vocabulary
  • The concept tags are auto-derived from the query; ORing them into the MATCH widens the net beyond the literal query terms
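
A sketch of the expansion step (function and table names are hypothetical; FTS5's OR operator and bm25() ranking are real):

// Hypothetical: build `query OR tag1 OR tag2 OR tag3` for FTS5. Each term
// is double-quoted so user input can't inject FTS5 operators; inner quotes
// are escaped by doubling, per FTS5 string syntax.
fn match_expr(query: &str, tags: &[String]) -> String {
    std::iter::once(query)
        .chain(tags.iter().map(String::as_str))
        .map(|t| format!("\"{}\"", t.replace('"', "\"\"")))
        .collect::<Vec<_>>()
        .join(" OR ")
}

// Used as: SELECT ... FROM memories_fts WHERE memories_fts MATCH ?
//          ORDER BY bm25(memories_fts) LIMIT ?
// bm25() returns lower scores for better matches, so ascending order
// ranks best-first. (memories_fts is an assumed FTS5 table name.)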

vector — nearest-neighbor

flowchart LR
    Q[query] --> EMB[embed]
    EMB --> VEC[vec_memories<br/>MATCH k=N*2]
    VEC --> JOIN[join memories<br/>filter by agent_id]
    JOIN --> RES[top N by distance]
  • Catches paraphrases and cross-vocabulary matches
  • Embedding request on every call — watch costs and latency
  • In hybrid mode, provider errors fall back to keyword search; in pure vector mode they surface to the caller
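
In rusqlite terms, the flow above is roughly the following (column names beyond memory_id and the JSON-array vector encoding are assumptions; the k constraint is sqlite-vec's KNN syntax):

use rusqlite::{params, Connection};

// Sketch of the vector recall path: over-fetch k = n * 2 nearest
// neighbors, join back to memories, filter by agent, keep the top n.
fn recall_vector(
    conn: &Connection,
    query_vec_json: &str, // query embedding serialized as a JSON array
    agent_id: &str,
    n: usize,
) -> rusqlite::Result<Vec<(String, f64)>> {
    let mut stmt = conn.prepare(
        "SELECT m.content, v.distance
         FROM vec_memories v
         JOIN memories m ON m.id = v.memory_id
         WHERE v.embedding MATCH ?1 AND v.k = ?2 AND m.agent_id = ?3
         ORDER BY v.distance
         LIMIT ?4",
    )?;
    stmt.query_map(
        params![query_vec_json, (n * 2) as i64, agent_id, n as i64],
        |row| Ok((row.get(0)?, row.get(1)?)),
    )?
    .collect()
}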

hybrid — Reciprocal Rank Fusion

The recommended default. Runs both keyword and vector search, then fuses the two rankings with the RRF formula 1 / (K + rank + 1), where K = 60:

flowchart LR
    Q[query] --> K[keyword search]
    Q --> V[vector search]
    K --> RRF[RRF fusion<br/>K=60]
    V --> RRF
    RRF --> RES[top N by fused score]

Vector errors degrade gracefully to keyword-only without raising.
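
The fusion step itself is small; a sketch assuming 0-based ranks (which is why the formula adds 1):

use std::collections::HashMap;

const RRF_K: f64 = 60.0;

// Reciprocal Rank Fusion: each list contributes 1 / (K + rank + 1) per
// memory id; ids present in both lists accumulate both contributions.
fn rrf_fuse(keyword: &[String], vector: &[String], n: usize) -> Vec<String> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for list in [keyword, vector] {
        for (rank, id) in list.iter().enumerate() {
            *scores.entry(id.as_str()).or_default() += 1.0 / (RRF_K + rank as f64 + 1.0);
        }
    }
    let mut fused: Vec<_> = scores.into_iter().collect();
    // Highest fused score first.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1));
    fused.into_iter().take(n).map(|(id, _)| id.to_owned()).collect()
}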

Tool interaction

The memory tool takes an optional mode param:

{
  "action": "recall",
  "query": "what's the client's address?",
  "limit": 5,
  "mode": "hybrid"
}

If omitted, default_recall_mode is used.
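
On the Rust side, mode resolution amounts to a small enum plus a fallback (a hypothetical shape, not the crate's actual types):

use serde::Deserialize;

// Hypothetical mirror of the three documented modes.
#[derive(Deserialize, Clone, Copy, Debug)]
#[serde(rename_all = "lowercase")]
enum RecallMode {
    Keyword,
    Vector,
    Hybrid,
}

// An omitted `mode` falls back to the configured default_recall_mode.
fn resolve(requested: Option<RecallMode>, default_mode: RecallMode) -> RecallMode {
    requested.unwrap_or(default_mode)
}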

Cost and latency profile

Mode      Per recall
keyword   1 SQL query, no LLM call
vector    1 embedding HTTP call + 1 SQL query
hybrid    1 embedding HTTP call + 2 SQL queries + fusion

For high-throughput agents that recall on every turn, start with keyword and upgrade to hybrid only where the miss rate demonstrably hurts.

Gotchas

  • Changing embedding model = full reindex. The dimension check catches the obvious case, but even same-dimension model swaps produce semantically different vectors; the old index becomes stale.
  • sqlite3_auto_extension registers once per process. Not a problem in production, but test suites that instantiate multiple SQLite connections across tests may hit edge cases.
  • Vector search returns distance, not similarity: lower is closer. RRF fuses ranks rather than raw scores, so callers never see this unless they bypass the tool.
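
If you do bypass the tool and read raw distances, flip the ordering yourself; one common conversion (an assumption, not what the crate does):

// Hypothetical convenience: map distance (lower = closer) onto a
// similarity in (0, 1] (higher = closer).
fn distance_to_similarity(d: f64) -> f64 {
    1.0 / (1.0 + d)
}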