Introduction
nexo-rs is a Rust framework for building multi-agent LLM systems that live on real messaging channels — WhatsApp, Telegram, email — instead of a chat webapp. Event-driven over NATS, per-agent tool sandboxes, drop-in configuration for private vs. public agents.
One process, many agents, many channels. Kate handles your personal Telegram; Ana works the WhatsApp sales line; a cron-style poller sweeps Gmail for leads — all sharing one broker, one tool registry, and one memory layer.
Single binary, ~34 MB. No Node, no npm, no Docker required. Stripped: 29 MB. Gzipped: 13 MB. Runs on a fresh VPS, on Termux without root, or as a systemd unit. The closest reference point is OpenClaw (TypeScript, Node): nexo-rs trades JS familiarity for a single static binary, a fault-tolerant NATS broker layer, per-agent capability sandboxes, durable workflows, secrets audit, and Termux-first portability — see vs OpenClaw for the full side-by-side.
flowchart LR
WA[WhatsApp] --> NATS[(NATS broker)]
TG[Telegram] --> NATS
MAIL[Email / Gmail poller] --> NATS
BROWSER[Browser CDP] --> NATS
NATS --> ANA[Agent: Ana]
NATS --> KATE[Agent: Kate]
NATS --> OPS[Agent: ops-bot]
ANA --> TOOLS[Tools & extensions]
KATE --> TOOLS
OPS --> TOOLS
TOOLS --> MEM[(Memory: SQLite + sqlite-vec)]
TOOLS --> LLM{{LLM providers}}
Why it exists
Most "agent frameworks" assume one LLM talking to one user through one UI. Real deployments are not shaped that way:
- Several agents with different personas, models, and skills
- Multiple channels (WA + Telegram + mail) feeding the same agents
- Business logic that is not LLM-driven (scheduled tasks, regex email triage, lead notifications) running next to the LLM loop
- Private prompts and pricing tables alongside an open-source core
nexo-rs is opinionated toward that shape.
What's in the box
| Area | What ships |
|---|---|
| Runtime | Multi-agent core, SessionManager, Heartbeat, CircuitBreaker |
| Broker | NATS (async-nats = 0.35) + disk queue + DLQ + backpressure |
| LLMs | MiniMax M2.5 (primary), Anthropic (OAuth + API), OpenAI-compat, Gemini |
| Plugins | WhatsApp, Telegram, Email, Browser (CDP), Google (Gmail/Calendar/Drive/Sheets) |
| Memory | Short-term in-memory, long-term SQLite, vector via sqlite-vec |
| Extensions | TOML manifest, stdio + NATS runtimes, CLI, 22 skills shipped |
| MCP | Client (stdio + HTTP), agent as MCP server, hot-reload |
| TaskFlow | Durable multi-step flow runtime with wait/resume |
| Soul | Identity, MEMORY.md, dreaming, workspace-git, transcripts |
Who it is for
- Developers who want to run real agents — not a ChatGPT demo with retrieval.
- Multi-tenant single-install — several agents, several channels, isolated by config.
- Fault-tolerance-first teams — disk queue, DLQ, circuit breakers, single-instance lock, no message drop on reconnect.
- Anyone extending with their own stack — stdio extensions in any language, MCP, drop-in private agents.
What it is not
- Not a chatbot, not a webapp. It has no UI of its own.
- Not a replacement for LangChain/LlamaIndex as a "primitives library". It is an operational runtime.
- Not a channel-abstraction layer. WhatsApp behaves like WhatsApp, Telegram like Telegram. The runtime surfaces each channel's native behavior rather than flattening them into a uniform interface.
Next
- Installation
- Quick start
- Architecture overview
- API reference (rustdoc) — every public type in the workspace
Install nexo-rs (LLM-friendly guide)
Share this URL with any AI assistant to teach it how to install nexo-rs end-to-end on Linux or Termux:
https://lordmacu.github.io/nexo-rs/install-for-ai.html
The page is intentionally linear: copy-paste each block in order. No menus to navigate, no marketing, every command is deterministic.
What you are installing
- nexo-rs — a Rust multi-agent framework. Single agent binary.
- a cloudflared tunnel for the web admin UI.
- Source: https://github.com/lordmacu/nexo-rs
- Docs (full): https://lordmacu.github.io/nexo-rs/
- License: dual MIT / Apache-2.0
Pick your platform
- Linux (Debian / Ubuntu / Arch / Fedora) → §A
- Termux (Android, no root) → §B
Skip the section that doesn't apply.
§A — Linux install
A.1. System packages
Debian / Ubuntu:
sudo apt update
sudo apt install -y build-essential pkg-config libsqlite3-dev git curl
Arch:
sudo pacman -Syu --needed base-devel pkgconf sqlite git curl
Fedora:
sudo dnf install -y @development-tools pkgconf-pkg-config sqlite-devel git curl
A.2. Rust toolchain
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
source "$HOME/.cargo/env"
rustup component add rustfmt clippy
A.3. Clone + build
git clone https://github.com/lordmacu/nexo-rs
cd nexo-rs
cargo build --release --bin agent
The compiled binary is at ./target/release/agent. Copy it into
PATH (optional):
sudo install -m 0755 target/release/agent /usr/local/bin/agent
A.4. First-run wizard
agent setup
Follow the interactive prompts. Defaults are sane. The wizard
writes config/agents.d/<your-agent>.yaml, IDENTITY.md,
SOUL.md, and any channel YAMLs you opt into.
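For orientation, a minimal agent file might look like the sketch below. The field names are the ones used elsewhere in these docs (model, language, plugins, inbound_bindings, skills, credentials, heartbeat); the exact schema the wizard writes may differ, so treat it as illustrative rather than canonical.
# config/agents.d/kate.yaml — illustrative sketch only; `agent setup` generates the real file
agents:
  kate:
    model:
      provider: anthropic
      model: claude-haiku-4-5
    language: es
    plugins: [telegram]
    inbound_bindings: [telegram:default]
    credentials:
      telegram: kate_bot        # instance label written by the pairing step
    skills: [fetch-url, weather, summarize]
    heartbeat:
      enabled: true
      interval: 30s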
A.5. Run
agent
Or, for the web admin (loopback HTTP + Cloudflare tunnel):
agent admin
The admin command prints a one-time URL + password to stdout. Open the URL, log in, and configure from the browser.
A.6. (Optional) systemd service
sudo useradd -r -s /bin/false -d /srv/nexo-rs nexo
sudo mkdir -p /srv/nexo-rs
sudo cp -r config target/release/agent /srv/nexo-rs/
sudo chown -R nexo:nexo /srv/nexo-rs
sudo tee /etc/systemd/system/nexo-rs.service > /dev/null <<'EOF'
[Unit]
Description=nexo-rs agent
After=network.target
[Service]
Type=simple
User=nexo
WorkingDirectory=/srv/nexo-rs
ExecStart=/srv/nexo-rs/agent --config /srv/nexo-rs/config
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now nexo-rs
Logs: journalctl -u nexo-rs -f.
A.7. (Optional) NATS broker
A single-process install does not need NATS — the runtime falls back to in-process channels. Add NATS only when scaling beyond one host:
curl -L -o /tmp/nats.tar.gz \
https://github.com/nats-io/nats-server/releases/download/v2.10.20/nats-server-v2.10.20-linux-amd64.tar.gz
tar -xzf /tmp/nats.tar.gz -C /tmp
sudo mv /tmp/nats-server-*/nats-server /usr/local/bin/
sudo systemctl enable --now nats-server # if you have a unit file
Then in config/broker.yaml set type: nats and url: nats://127.0.0.1:4222.
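As a rough sketch, mirroring the broker.yaml shape shown in the Termux section (keys beyond type and url are assumptions):
# config/broker.yaml — point the runtime at a local NATS instead of the in-process broker
broker:
  type: nats
  url: nats://127.0.0.1:4222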
§B — Termux install (Android, no root)
B.1. Termux from F-Droid
Install Termux from https://f-droid.org/en/packages/com.termux/. Do not install from the Google Play Store — that build is outdated.
Open Termux. Then:
pkg update
pkg upgrade -y
B.2. Build dependencies
pkg install -y rust git curl sqlite openssl clang pkg-config
Optional extras (only the ones you'll use):
# media transcoding + OCR + youtube downloads
pkg install -y ffmpeg tesseract yt-dlp
# tmux for long-running tunnels and ssh
pkg install -y tmux openssh
# headless Chromium for the browser plugin
pkg install -y tur-repo
pkg install -y chromium
# Termux:API for sensors / SMS / clipboard
pkg install -y termux-api
# (also install the Termux:API companion app from F-Droid)
B.3. Clone + build
cd ~
git clone https://github.com/lordmacu/nexo-rs
cd nexo-rs
cargo build --release --bin agent
B.4. First-run wizard
./target/release/agent setup
B.5. Run
./target/release/agent
Or with the admin UI (the cloudflared tunnel works on Termux):
./target/release/agent admin
B.6. Keep running with the screen off
Termux apps get killed on doze unless you disable battery optimizations and acquire a wake lock:
- Disable optimizations: Android Settings → Apps → Termux → Battery → Unrestricted.
- Wake lock: in Termux, run termux-wake-lock.
- (Optional) auto-restart on boot: install Termux:Boot from F-Droid, then create ~/.termux/boot/00-nexo-rs:
mkdir -p ~/.termux/boot
cat > ~/.termux/boot/00-nexo-rs <<'EOF'
#!/data/data/com.termux/files/usr/bin/sh
termux-wake-lock
cd ~/nexo-rs
./target/release/agent --config ./config >> ~/nexo-rs/agent.log 2>&1
EOF
chmod +x ~/.termux/boot/00-nexo-rs
B.7. Termux-specific tip — Chromium flags
The browser plugin (plugins: [browser]) needs the right Chromium
launch flags on Termux. The defaults already cover Android; nothing
extra to set. Just make sure chromium is on PATH (it is, after
pkg install chromium).
Config layout (both platforms)
After agent setup runs, the project tree looks like:
nexo-rs/
├── config/
│ ├── agents.yaml # opt-in dev defaults
│ ├── agents.d/ # your agents land here
│ │ └── <slug>.yaml
│ ├── broker.yaml # NATS or local
│ ├── llm.yaml # provider keys + model
│ └── plugins/ # one YAML per channel plugin
├── secrets/ # mode 0600 token files (gitignored)
├── data/ # SQLite databases (memory, taskflow, transcripts)
├── target/release/agent # the built binary
└── agent.log # if you redirected stdout
Edit YAML by hand or use the web admin (agent admin).
Troubleshooting
- cargo build fails with linker errors on Linux — install build-essential and pkg-config (§A.1).
- cargo build hits out of memory on Termux — close other apps, or build with one job: cargo build --release -j 1.
- agent exits immediately with failed to load config — run agent setup first; the wizard creates the missing files.
- WhatsApp QR pairing fails on Termux — make sure the device is on the same network as your phone, then open the QR pairing URL the daemon prints.
- Admin tunnel URL doesn't respond — Cloudflare's quick tunnel occasionally rotates; restart agent admin and copy the new URL.
Useful commands after install
agent --help # all subcommands
agent doctor capabilities --json # which env toggles are armed
agent setup doctor # audit configured secrets
agent ext doctor --json # extension health
agent flow list # taskflow admin
agent dlq list # dead-letter queue
Full reference: https://lordmacu.github.io/nexo-rs/cli/reference.html
When asking an AI for help
Paste this URL into your prompt:
Install nexo-rs from https://lordmacu.github.io/nexo-rs/install-for-ai.html
on this machine. The OS is <Linux distro / Termux>. Stop after each
section to confirm output looks right.
The page above is the canonical, copy-paste-friendly install path. The full mdBook (https://lordmacu.github.io/nexo-rs/) covers the same ground in more depth — link there once the agent is up.
Installation
Pick the channel that matches your environment. Every channel
produces the same nexo binary; the differences are in how it
gets onto your machine and which dependencies come bundled.
Channel matrix
| Channel | When to pick it | Time to first run | Bundled runtime tools |
|---|---|---|---|
| Docker (GHCR) | Production, CI, "just works" | ~30 s | Chrome, Chromium, cloudflared, ffmpeg, tesseract, yt-dlp |
| Nix flake | NixOS, reproducible dev shell | ~3-5 min cold | None (system-level) |
| Native (no Docker) | Bare-metal Linux / macOS, full control | ~10-15 min | None (apt / brew / pacman) |
| Termux | Phone-hosted personal agent | ~15-20 min | None (pkg install) |
| From source | Contributors | ~5 min after toolchain | None |
Quickest path — Docker
docker pull ghcr.io/lordmacu/nexo-rs:latest
docker run --rm \
-v $(pwd)/config:/app/config:ro \
-v $(pwd)/data:/app/data \
-p 8080:8080 -p 9090:9090 \
ghcr.io/lordmacu/nexo-rs:latest --help
The image is multi-arch (linux/amd64 + linux/arm64), built
fresh on every push to main and every v* tag, with SBOM and
SLSA provenance attestations. Full guide: Docker.
Build from source
For contributors and operators who want to track main directly:
git clone https://github.com/lordmacu/nexo-rs
cd nexo-rs
cargo build --release --bin nexo
./target/release/nexo --help
The workspace compiles 22 crates and produces the nexo binary
plus a few smoke-test bins (browser-test,
integration-browser-check, llm_smoke). Toolchain is pinned to
Rust 1.80 (MSRV) via rust-toolchain.toml — no manual channel
selection needed.
Prerequisites
- Rust 1.80+ (rustup recommended)
- NATS running locally or reachable over the network — for development: docker run -p 4222:4222 nats:2.10-alpine. Production setup: see broker.yaml.
- Git (the memory subsystem uses per-agent workspace-git)
- Chrome / Chromium (only if you plan to use the browser plugin)
Verification
./target/release/nexo --version
cargo test --workspace --lib
nexo --version prints the build provenance line (commit + build
timestamp) so a bug report carries enough context to reproduce.
Bootstrap script
For native or Termux installs, ./scripts/bootstrap.sh automates
the whole process — installs the system deps, downloads NATS if
not present, scaffolds config/, and runs the setup wizard.
./scripts/bootstrap.sh # interactive
./scripts/bootstrap.sh --yes # accept all defaults
The script auto-detects Termux ($PREFIX set) and switches to
pkg install + broker.type: local so you don't need root or
NATS on a phone.
Next steps
- Quick start — first agent running in five minutes
- Setup wizard — pair channels and wire secrets
- Docker — compose stack, secrets, GHCR pulls
- Nix flake — nix run, dev shell
- Native install — detailed no-Docker setup
- Termux install — phone-hosted personal agent
Native install (no Docker)
If you'd rather run nexo-rs directly on a Linux / macOS host — development loop, single-machine deploy, restricted container environment — this page walks through every step and names the bootstrap script that automates it.
Fast path
git clone git@github.com:lordmacu/nexo-rs.git
cd nexo-rs
./scripts/bootstrap.sh
scripts/bootstrap.sh verifies prerequisites, installs a local
NATS, creates the runtime directories, stages example configs, and
builds the agent binary. Re-runnable — each step is idempotent.
Keep reading for what it actually does (and what to do when a step needs manual intervention).
Prerequisites
| Tool | Required for | Notes |
|---|---|---|
| Rust (stable, edition 2021) | building the binaries | rust-toolchain.toml pins the channel |
| Git | cloning + per-agent workspace-git | default on most hosts |
| NATS ≥ 2.10 | the broker | binary or dev docker container is fine |
| SQLite ≥ 3.38 | memory + broker disk queue | ships with most distros |
| Chrome / Chromium | browser plugin (optional) | skip if you don't use the browser plugin |
| ffmpeg + ffprobe | media-related skills (optional) | skip if you don't ship those skills |
| yt-dlp / tesseract / tmux / ssh | individual skills (optional) | each skill declares its requires.bins |
On Ubuntu / Debian:
sudo apt update
sudo apt install -y build-essential pkg-config libsqlite3-dev git curl
On macOS:
xcode-select --install
brew install sqlite git
Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source "$HOME/.cargo/env"
rustup component add rustfmt clippy
The repo's rust-toolchain.toml pins the channel; no manual version
pick is needed.
Install NATS
Pick one path:
Option A — native NATS server
# Linux x86_64
curl -L -o /tmp/nats.tar.gz \
https://github.com/nats-io/nats-server/releases/download/v2.10.20/nats-server-v2.10.20-linux-amd64.tar.gz
tar -xzf /tmp/nats.tar.gz -C /tmp
sudo mv /tmp/nats-server-*/nats-server /usr/local/bin/
For macOS: brew install nats-server.
Start it:
nats-server -js # foreground
nats-server -js -D # foreground with debug
# or, as a systemd service: see below
Option B — dev throwaway via Docker
Even on a "no-Docker" box, a single short-lived container for the broker is often fine:
docker run -d --name nexo-nats --restart unless-stopped \
-p 4222:4222 -p 8222:8222 nats:2.10-alpine
This is the same broker the compose stack would use; only the broker itself runs in a container.
Systemd unit (Linux, production)
/etc/systemd/system/nats-server.service:
[Unit]
Description=NATS Server
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/bin/nats-server -js
Restart=on-failure
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl enable --now nats-server
Build nexo-rs
git clone git@github.com:lordmacu/nexo-rs.git
cd nexo-rs
cargo build --release
The output is ./target/release/agent. Symlink it into $PATH if
you want:
sudo ln -sf "$(pwd)/target/release/agent" /usr/local/bin/agent
Prepare runtime directories
mkdir -p ./data/{queue,workspace,media,transcripts}
mkdir -p ./secrets # gitignored; holds API keys, nkey files, etc.
chmod 700 ./secrets # restrictive — the credential gauntlet checks this
Stage config
The repo ships config/*.yaml with safe defaults. Override whatever
you need:
# Optional: copy the ana sales agent template into the gitignored dir
cp config/agents.d/ana.example.yaml config/agents.d/ana.yaml
# Add an API key:
export MINIMAX_API_KEY=...
export MINIMAX_GROUP_ID=...
# or write to secrets/ files referenced from config/llm.yaml via ${file:...}
See Configuration — layout for the full reference.
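A hypothetical llm.yaml fragment combining both reference styles — the provider and key names below are illustrative, not the repo's exact schema:
# config/llm.yaml — sketch only
llm:
  default_provider: minimax
  providers:
    minimax:
      api_key: ${MINIMAX_API_KEY}                        # resolved from the environment
      group_id: ${MINIMAX_GROUP_ID}
    anthropic:
      api_key: ${file:./secrets/anthropic_api_key.txt}   # resolved from a gitignored secrets file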
Pair channels and set secrets
./target/release/agent setup
The wizard pairs WhatsApp / Telegram / Google / LLM credentials interactively. See Setup wizard.
First run
./target/release/agent --config ./config
Watch the startup summary — it tells you exactly which plugins loaded, which extensions were skipped and why, and whether the broker is reachable. If anything's missing, the log line names the specific file or env var to fix.
Run as a systemd service
/etc/systemd/system/nexo-rs.service:
[Unit]
Description=nexo-rs agent
Requires=nats-server.service
After=nats-server.service
[Service]
Type=simple
User=nexo
Group=nexo
WorkingDirectory=/srv/nexo-rs
Environment=RUST_LOG=info
Environment=AGENT_ENV=production
ExecStart=/usr/local/bin/agent --config /srv/nexo-rs/config
Restart=on-failure
RestartSec=5
# Optional: restrict where the agent can write
ReadWritePaths=/srv/nexo-rs/data /srv/nexo-rs/secrets
[Install]
WantedBy=multi-user.target
sudo useradd -r -s /bin/false -d /srv/nexo-rs nexo
sudo chown -R nexo:nexo /srv/nexo-rs
sudo systemctl daemon-reload
sudo systemctl enable --now nexo-rs
Logs:
journalctl -u nexo-rs -f
macOS launchd
~/Library/LaunchAgents/dev.nexo-rs.agent.plist:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key> <string>dev.nexo-rs.agent</string>
<key>WorkingDirectory</key><string>/Users/you/nexo-rs</string>
<key>ProgramArguments</key>
<array>
<string>/Users/you/nexo-rs/target/release/agent</string>
<string>--config</string><string>/Users/you/nexo-rs/config</string>
</array>
<key>EnvironmentVariables</key>
<dict>
<key>RUST_LOG</key><string>info</string>
</dict>
<key>RunAtLoad</key> <true/>
<key>KeepAlive</key> <true/>
</dict>
</plist>
launchctl load -w ~/Library/LaunchAgents/dev.nexo-rs.agent.plist
launchctl start dev.nexo-rs.agent
Verify
agent status # lists running agents
curl localhost:8080/ready # readiness
curl localhost:9090/metrics # Prometheus metrics
See Metrics + health.
Upgrading
cd nexo-rs
git pull
cargo build --release
sudo systemctl restart nexo-rs # Linux
# or: launchctl kickstart -k gui/$UID/dev.nexo-rs.agent # macOS
The graceful shutdown sequence drains in-flight work and persists the disk queue before exit.
Uninstalling
sudo systemctl disable --now nexo-rs nats-server
sudo rm /etc/systemd/system/{nexo-rs,nats-server}.service
sudo rm /usr/local/bin/{agent,nats-server}
sudo userdel nexo
rm -rf /srv/nexo-rs
See also
- Quick start — the five-minute dev loop
- Docker — container path for comparison
- Setup wizard
- Configuration
Termux (Android) install
Run nexo-rs directly on an Android phone under Termux. No Docker, no server — a self-hosted agent in your pocket.
Use this path for a personal agent (one phone, one WhatsApp, one Telegram). For multi-tenant / multi-process deployments the regular Linux setup on a server is the right shape.
Quickest path — pre-built .deb
Once a v* release is published (recipe lives in
packaging/termux/build.sh), download the asset and install with
one command:
# Inside Termux on the phone:
curl -LO https://github.com/lordmacu/nexo-rs/releases/latest/download/nexo-rs_aarch64.deb
pkg install ./nexo-rs_aarch64.deb
The deb pulls the runtime deps Termux already ships (libsqlite,
openssl, ffmpeg, tesseract, python, yt-dlp). Its
postinst scaffolds ~/.nexo/{data,secret} and prints the next
steps. Skip the build-from-source section below if this works.
Root vs non-root
Everything in this guide runs without root. You do not need to root your phone to self-host nexo-rs on it.
Root only unlocks extras:
| Scenario | Needs root? |
|---|---|
| Build + run the agent daemon | ❌ no |
| Pair WhatsApp, Telegram, Google | ❌ no |
| Local broker (broker.type: local) | ❌ no |
| Native NATS Go binary | ❌ no (installs to $PREFIX/bin) |
| termux-wake-lock, Termux:Boot autostart | ❌ no |
| Install skills from pkg (ffmpeg, tesseract, yt-dlp) | ❌ no |
| MCP client / server mode | ❌ no |
| Browser plugin via cdp_url to a chromium you launched yourself | ❌ no |
| Docker compose stack (via proot-distro or Linux Deploy) | ✅ yes |
| SELinux permissive (if Chromium sandbox misbehaves) | ✅ yes |
| Running multiple proot-distro containers side by side | ✅ yes |
| Bypass Android's battery optimizer more aggressively | ✅ yes |
Short version: don't root just for nexo-rs. Root if you want the full compose stack in a Linux-Deploy chroot, otherwise skip it.
What works
| Area | Status |
|---|---|
| Core runtime, memory, TaskFlow, dreaming | ✅ full |
| Broker: type: local (in-process) or native NATS Go binary | ✅ full |
| LLM providers (MiniMax / Anthropic / OpenAI-compat / Gemini) | ✅ all rustls-based |
| WhatsApp plugin (pure Rust + Signal Protocol) | ✅ pairing via Unicode QR |
| Telegram plugin | ✅ Bot API over HTTP |
| Gmail / Google plugin + gmail-poller | ✅ OAuth over HTTP |
| Extensions (stdio + NATS) | ✅ spawn works |
| Skills: fetch-url, dns-tools, rss, weather, wikipedia, pdf-extract, brave-search, wolfram-alpha, summarize, translate | ✅ pure Rust |
| MCP client + server | ✅ stdio + HTTP |
| Health / metrics / admin HTTP servers (8080 / 9090 / 9091) | ✅ unprivileged ports |
What needs a tweak
| Thing | Workaround |
|---|---|
| Service manager (no systemd) | termux-services (runit) or tmux + nohup |
| Run at boot | install the Termux:Boot app + drop a script in ~/.termux/boot/ |
| Survives screen-off | termux-wake-lock (from the Termux:API add-on) before running the agent |
| Browser plugin (Chrome/Chromium) | use cdp_url: to a chromium you start manually with --no-sandbox --disable-dev-shm-usage; or disabled: [browser] if you don't need it |
| Secrets file permission gauntlet | export CHAT_AUTH_SKIP_PERM_CHECK=1 (Android filesystem perms model differs) |
| WhatsApp public tunnel (cloudflared) | skip the public tunnel; pair locally via Unicode QR rendered on the terminal |
| Docker / compose | use broker.type: local or native NATS binary — no containers involved |
Prerequisites
From a fresh Termux install:
pkg update
pkg install -y rust git curl sqlite openssl clang pkg-config
Optional (enables specific skills):
pkg install -y ffmpeg tesseract yt-dlp tmux openssh
Optional (browser plugin):
pkg install -y tur-repo
pkg install -y chromium
Optional (run in background without the terminal session alive):
pkg install -y termux-services termux-api
# install the companion app "Termux:API" from F-Droid
Fast path — bootstrap script
The repo's scripts/bootstrap.sh auto-detects Termux and picks the
right defaults:
git clone https://github.com/lordmacu/nexo-rs
cd nexo-rs
./scripts/bootstrap.sh --yes
What it does on Termux:
- Verifies rust, git, curl, sqlite from pkg
- Downloads the static nats-server Go binary (arm64), drops it in $PREFIX/bin/ — or skip with --nats=skip to use the local broker
- Creates ./data/** and ./secrets/ (with Termux-compatible perms)
- Stages config/agents.d/*.example.yaml → *.yaml if missing
- Runs cargo build --release (grab a coffee — ~20–40 min on phone hardware)
- Optionally launches agent setup to pair channels
Expect a ~60–100 MB final binary.
Manual install
1. Install Rust and deps
pkg install -y rust git curl sqlite openssl clang pkg-config
2. Clone and build
git clone https://github.com/lordmacu/nexo-rs
cd nexo-rs
cargo build --release --bin agent
3. Broker
Option A — local (simplest):
# config/broker.yaml
broker:
type: local
persistence:
enabled: true
path: ./data/queue
No NATS binary needed. All pub/sub stays in-process.
Option B — native NATS binary:
curl -L -o /tmp/nats.tar.gz \
https://github.com/nats-io/nats-server/releases/download/v2.10.20/nats-server-v2.10.20-linux-arm64.tar.gz
tar -xzf /tmp/nats.tar.gz -C /tmp
install -m 0755 "$(find /tmp -name nats-server -type f | head -1)" \
$PREFIX/bin/nats-server
nats-server -js &
Go binaries are static and work on Termux without libc surprises.
4. Runtime directories and secrets
mkdir -p ./data/{queue,workspace,media,transcripts} ./secrets
Termux stores files under /data/data/com.termux/files/home by
default. Avoid pointing config paths at /sdcard — Android's
scoped-storage model breaks directory permissions there.
5. Relax the credentials perm check
Android's filesystem doesn't support the same permission bits as Linux, so the credentials gauntlet would refuse to boot with false-positive warnings:
export CHAT_AUTH_SKIP_PERM_CHECK=1
Add it to ~/.termux/termux.properties or a wrapper shell script
so it's set every time.
6. Launch the wizard
./target/release/agent setup
For the WhatsApp pairing step, the wizard renders the QR as Unicode blocks directly in the terminal — scan from the phone's WhatsApp app (Settings → Linked Devices). No public tunnel needed.
7. Run the agent
termux-wake-lock # keep CPU awake even with screen off
./target/release/agent --config ./config
Staying alive in the background
Android's aggressive task killing is the biggest operational surprise. Pick one:
A — termux-wake-lock + foreground notification
termux-wake-lock
# agent in foreground:
./target/release/agent --config ./config
The wake-lock persists until you run termux-wake-unlock or kill
the session. Minimum friction, most reliable.
B — termux-services (runit)
pkg install -y termux-services
sv-enable termux-services
mkdir -p ~/.config/service/nexo-rs
cat > ~/.config/service/nexo-rs/run <<'EOF'
#!/data/data/com.termux/files/usr/bin/sh
cd /data/data/com.termux/files/home/nexo-rs
export CHAT_AUTH_SKIP_PERM_CHECK=1
exec ./target/release/agent --config ./config 2>&1
EOF
chmod +x ~/.config/service/nexo-rs/run
sv up nexo-rs
sv status nexo-rs
C — Termux:Boot (start on device boot)
Install the Termux:Boot app from F-Droid, then:
mkdir -p ~/.termux/boot
cat > ~/.termux/boot/start-agent <<'EOF'
#!/data/data/com.termux/files/usr/bin/sh
termux-wake-lock
cd /data/data/com.termux/files/home/nexo-rs
export CHAT_AUTH_SKIP_PERM_CHECK=1
exec ./target/release/agent --config ./config
EOF
chmod +x ~/.termux/boot/start-agent
Disabling the browser plugin
If you don't need headless browser control (most phone-hosted
agents don't), drop it from config/extensions.yaml:
extensions:
disabled: [browser]
Or, if you have tur-repo chromium installed and want nexo-rs to
spawn it, use the browser.args field to forward the flags Termux
needs:
# config/plugins/browser.yaml
browser:
headless: true
executable: /data/data/com.termux/files/usr/bin/chromium
args:
- --no-sandbox
- --disable-dev-shm-usage
- --disable-gpu
The built-in launch flags still apply; args is appended after
them so you can also override any of the built-ins (Chrome's CLI
parser uses last-wins).
Alternative: launch chromium yourself and attach via cdp_url:
# config/plugins/browser.yaml
browser:
# Start chromium yourself with:
# chromium --headless --no-sandbox --disable-dev-shm-usage \
# --disable-gpu --remote-debugging-port=9222 &
cdp_url: http://127.0.0.1:9222
When cdp_url is set, args is ignored — nexo-rs doesn't spawn
Chrome, only connects to yours.
Verify
curl localhost:8080/ready
curl localhost:9090/metrics
./target/release/agent status
Upgrading
cd ~/nexo-rs
git pull
cargo build --release
# restart under whichever method you picked (wake-lock / runit / Boot)
The graceful shutdown sequence still runs on SIGTERM — closing the Termux session or killing the process drains the disk queue cleanly.
See also
- Installation — shared prerequisites
- Native install (no Docker) — the Linux/macOS path the Termux recipe is a sibling of
- Plugins — Browser
- Config — broker.yaml
Install — Nix
Nexo ships a Nix flake that pins the toolchain (Rust 1.80, MSRV) and the native build deps so a contributor or operator can go from clean shell to working binary without touching the host system.
Run without installing
nix run github:lordmacu/nexo-rs -- --help
First invocation builds from source (~3-5 min on cold cache); subsequent runs hit the local Nix store.
Build a local binary
nix build github:lordmacu/nexo-rs
./result/bin/nexo --help
The binary is the same nexo produced by cargo build --release --bin nexo. The build leaves a result/ symlink the operator can link into /usr/local/bin/ or copy elsewhere.
Contributor dev shell
git clone https://github.com/lordmacu/nexo-rs
cd nexo-rs
nix develop
Drops you into a shell with:
- rustc 1.80 + cargo + clippy + rustfmt + rust-src
- cargo-edit, cargo-watch, cargo-nextest, cargo-deny
- mdbook + mdbook-mermaid (for mdbook build docs)
- sqlite, pkg-config, openssl, libgit2 (build deps)
RUST_LOG=info is exported by default. The toolchain version is
pinned in flake.nix — bump in lockstep with
[workspace.package].rust-version in Cargo.toml.
What the flake does NOT install
The nexo binary alone is not enough for full functionality.
Runtime tools the channel plugins shell out to live at the system
level, not in the flake:
- Chrome / Chromium — required by the browser plugin
- cloudflared — used by the tunnel plugin
- ffmpeg — media transcoding for WhatsApp voice notes
- tesseract-ocr — OCR skill
- yt-dlp — the yt-dlp extension
Operators install these via their distro's package manager. The native install guide lists the apt / pacman / brew commands. The Docker image bundles all of them — that's the path of least friction for a "just works" deploy.
Pinning a release
Once v* tags are published, pin to a specific release:
nix run github:lordmacu/nexo-rs/v0.1.1 -- --help
Or in a flake input:
{
inputs.nexo-rs.url = "github:lordmacu/nexo-rs/v0.1.1";
}
Verifying the build
nix flake check
Verifies the flake metadata and evaluates all outputs (packages, apps, devShells, formatter) without actually building anything. Useful in CI to catch flake regressions early.
Troubleshooting
- "experimental feature 'flakes' is disabled" — add to
~/.config/nix/nix.conf:experimental-features = nix-command flakes - First build is very slow — the build re-fetches and re-compiles
every cargo dependency in the sandbox. Subsequent builds are
cached. A future Phase 27.x will publish a
cachixcache sonix buildpulls the binary directly. - Build fails on macOS arm64 —
git2-rsoccasionally lags on Apple silicon. Workaround: build the binary inside the Docker image instead (see Docker).
Quick start
Minimum viable agent running in five minutes. Covers: NATS, one agent, one channel, one LLM key.
1. Start NATS
docker run -d --name nexo-nats -p 4222:4222 nats:2.10-alpine
2. Build the binary
git clone git@github.com:lordmacu/nexo-rs.git
cd nexo-rs
cargo build --release
3. Provide an LLM key
Pick one provider to get started. MiniMax M2.5 is the primary:
export MINIMAX_API_KEY=your-key
export MINIMAX_GROUP_ID=your-group-id
Or Anthropic:
export ANTHROPIC_API_KEY=sk-ant-...
The shipped config/llm.yaml reads both via ${ENV_VAR}.
4. Run the setup wizard
./target/release/agent setup
The wizard walks you through:
- Choosing a default LLM provider
- Pairing any channels you want (WhatsApp QR, Telegram bot token, Google OAuth)
- Writing secrets into ./secrets/ (gitignored)
See Setup wizard for the full step-by-step.
5. Run the agent
./target/release/agent --config ./config
First boot emits a startup summary listing:
- which plugins loaded
- which extensions were discovered / skipped (and why)
- which LLM providers are wired
- the NATS connection state
If anything is missing, the log line tells you exactly what to fix.
6. Talk to it
If you paired Telegram, send a message to the bot. If you paired WhatsApp, send a message to the paired number. The agent replies via the same channel.
What you just ran
sequenceDiagram
participant U as User
participant CH as Channel plugin
participant B as NATS
participant A as Agent runtime
participant L as LLM provider
U->>CH: Inbound message
CH->>B: publish plugin.inbound.<channel>
B->>A: deliver
A->>L: chat.completion(tools)
L-->>A: assistant turn
A->>B: publish plugin.outbound.<channel>
B->>CH: deliver
CH-->>U: Outbound reply
Next
- Setup wizard — every wizard step in detail
- Configuration layout
- Architecture overview
Setup wizard
The setup wizard is the recommended way to configure nexo-rs on a fresh install. It pairs channels, writes secrets, and patches the YAML config files so the runtime boots with everything it needs.
./target/release/agent setup
Run it from the repo root (or wherever your config/ directory lives).
What the wizard does
flowchart TD
START([agent setup]) --> MENU{Menu}
MENU --> LLM[LLM provider]
MENU --> WA[WhatsApp pairing]
MENU --> TG[Telegram bot]
MENU --> GOOG[Google OAuth]
MENU --> MEM[Memory DB location]
MENU --> INFRA[NATS + runtime]
MENU --> SKILLS[Enable / disable skills]
LLM --> WRITE1[Write secrets/<br/>patch llm.yaml]
WA --> QR[Scan QR<br/>write session dir]
TG --> TOKEN[Ask bot token<br/>write secret]
GOOG --> OAUTH[Open browser<br/>PKCE flow]
MEM --> WRITE2[Patch memory.yaml]
INFRA --> WRITE3[Patch broker.yaml]
SKILLS --> WRITE4[Patch extensions.yaml]
WRITE1 --> DONE([Done])
QR --> DONE
TOKEN --> DONE
OAUTH --> DONE
WRITE2 --> DONE
WRITE3 --> DONE
WRITE4 --> DONE
Every step is optional. You can run setup repeatedly — each section
is idempotent.
Steps in detail
LLM provider
Prompts for the default provider (MiniMax, Anthropic, OpenAI-compat,
Gemini). Writes the API key to ./secrets/<provider>_api_key.txt and
ensures config/llm.yaml references it via ${file:...} or the
corresponding env var.
WhatsApp pairing (multi-instance)
Per-agent. Asks which agent you are pairing and which instance label
to use (personal, work, …). Each instance gets its own session
dir under ./data/workspace/<agent>/whatsapp/<instance> and an
allow_agents list (defense-in-depth ACL). The wizard:
- Normalises config/plugins/whatsapp.yaml to sequence form (legacy single-mapping entries are auto-converted on first edit).
- Upserts the entry by instance label.
- Writes credentials.whatsapp: <instance> on the chosen agent's YAML — agents.yaml if the agent lives there, otherwise the matching agents.d/*.yaml.
- Launches the pairing loop and renders the QR as Unicode blocks. Scan with WhatsApp → Settings → Linked Devices.
- Runs the credential gauntlet so any drift surfaces immediately.
Re-run the wizard once per number you want to pair; instance labels are append-friendly.
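The resulting sequence-form entry looks roughly like the sketch below — paths and key names are illustrative; the wizard owns the real file:
# config/plugins/whatsapp.yaml — one entry per paired instance (sketch)
whatsapp:
  - instance: personal
    agent: kate
    session_dir: ./data/workspace/kate/whatsapp/personal
    allow_agents: [kate]     # defense-in-depth ACL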
Telegram bot (multi-instance)
Same shape as WhatsApp. Asks for instance label (default
<agent>_bot) and bot token from @BotFather. Token lands at
./secrets/<instance>_telegram_token.txt with mode 0o600; the
YAML references it via ${file:...} so secrets never live in
telegram.yaml directly. Adds credentials.telegram: <instance>
on the agent.
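Roughly, the pieces the wizard wires up look like this (key names are illustrative, not the repo's exact schema):
# config/plugins/telegram.yaml — sketch; the token never lives here in plaintext
telegram:
  - instance: kate_bot
    agent: kate
    token: ${file:./secrets/kate_bot_telegram_token.txt}
# and the chosen agent's YAML gains:
#   credentials:
#     telegram: kate_bot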
Google OAuth
The wizard writes one entry per agent in
config/plugins/google-auth.yaml:
google_auth:
accounts:
- id: ana@google
agent_id: ana
client_id_path: ./secrets/ana_google_client_id.txt
client_secret_path: ./secrets/ana_google_client_secret.txt
token_path: ./secrets/ana_google_token.json
scopes: [https://www.googleapis.com/auth/gmail.modify]
Two consent flows are offered after the YAML is written:
- Device-code (default — works headless / over SSH): the wizard prints verification_url + a 6-character user_code. Open the URL on any device, type the code, approve. The wizard polls oauth2.googleapis.com/token until approval and persists the refresh_token at token_path (mode 0o600).
- Skip and consent later via the google_auth_start LLM tool — uses the loopback PKCE flow, requires a local browser.
Scopes are comma-separated at the prompt; defaults to
gmail.modify. Re-running with a different id adds a second
account; re-running with the same id overwrites in place.
Memory DB location
Lets you pick where the SQLite long-term memory file lives. Default is
./data/memory.db. Per-agent isolation is on by default — each agent
gets its own DB file under its workspace.
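A sketch of the resulting memory.yaml — only the DB path, per-agent isolation, and short_term.session_ttl keys are documented in these pages; the rest of the layout is assumed:
# config/memory.yaml — illustrative
memory:
  long_term:
    path: ./data/memory.db
    per_agent: true          # assumption: the per-agent isolation toggle
  short_term:
    session_ttl: 30m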
Infrastructure (NATS + runtime)
Asks for the NATS URL, optional user/password, and timeouts. Patches
config/broker.yaml.
Skills on/off
Lets you selectively disable shipped extensions you don't plan to use (reduces tool surface exposed to the LLM).
Files the wizard touches
| Target | What it writes |
|---|---|
| config/llm.yaml | Provider entries, base_url, auth mode |
| config/plugins/whatsapp.yaml | session_dir, media_dir |
| config/plugins/telegram.yaml | token (via ${file:...}), allow-list |
| config/plugins/google.yaml | OAuth bundle path, scopes |
| config/memory.yaml | DB location |
| config/broker.yaml | NATS URL, creds |
| config/extensions.yaml | enabled/disabled list |
| ./secrets/* | Plaintext secret files (gitignored) |
Every YAML patch preserves existing keys and comments via the
yaml_patch module — your hand edits survive.
Re-running
Re-run agent setup as many times as you want. Paired channels are
detected and skipped unless you explicitly ask to re-pair. To wipe a
paired session:
./target/release/agent setup wipe whatsapp --agent ana
Troubleshooting
- WhatsApp QR expires too fast → the QR refreshes every ~20s; the wizard re-renders. Scan from the phone with a stable network.
- Google OAuth fails with redirect_uri_mismatch → the wizard binds to 127.0.0.1:<port>; make sure your OAuth client allows http://127.0.0.1 as a redirect URI.
- NATS unreachable → the wizard will warn but still write config. The runtime's disk queue will drain once NATS comes back.
Agent-centric setup wizard
The hub menu's Configurar agente (canal, modelo, idioma, skills)
entry drops the operator into a per-agent submenu. Where the rest of
the wizard groups actions by service (Telegram, OpenAI, the
browser plugin), this submenu groups them by agent: pick one agent
up front, then mutate its model, language, channels, and skills from
a single dashboard. Every action reuses the existing channel / LLM /
skill flows underneath, so behavior stays in lockstep with the rest
of the wizard.
./target/release/agent setup
# → Configurar agente (canal, modelo, idioma, skills)
Dashboard
Agente: kate
Modelo: anthropic / claude-haiku-4-5 [creds ✔]
Idioma: es
Canales: ✔ telegram:default (bound)
✗ whatsapp:default (unbound)
Skills: 8 / 24 attached
The dashboard is recomputed from disk on every loop iteration, so the screen always reflects the most recent YAML state.
Action menu
After the dashboard renders, the operator picks one of:
| Action | Effect |
|---|---|
| Modelo | Attach / detach / change the LLM provider + model name. Re-uses the LLM credential form when secrets are missing. |
| Idioma | Pick from es / en / pt / fr / it / de, or clear the directive. |
| Canales | Auth/Reauth, Bind, or Unbind a channel for this agent. Auth flows are the same services_imperative dispatchers the legacy menu uses. |
| Skills | Multi-select against the skill catalog. Newly added skills with required secrets prompt for creds. |
| ← volver | Exit the submenu, return to the hub. |
YAML mutations
| Action | YAML path | Operation |
|---|---|---|
| Attach model | agents[<id>].model.provider, …model.model | upsert_agent_field |
| Detach model | agents[<id>].model | remove_agent_field |
| Set language | agents[<id>].language | upsert_agent_field |
| Clear language | agents[<id>].language | remove_agent_field |
| Bind channel | agents[<id>].plugins[], agents[<id>].inbound_bindings[] | append_agent_list_item (idempotent) |
| Unbind channel | agents[<id>].plugins[], agents[<id>].inbound_bindings[] | remove_agent_list_item by predicate |
| Replace skills | agents[<id>].skills | upsert_agent_field (full sequence) |
All mutations land atomically (tempfile + rename) and are gated by the same process-wide YAML mutex the legacy upsert path uses, so concurrent wizard sessions don't corrupt the file.
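For concreteness, a hypothetical agent entry after a Bind + Attach-model pass — the field paths are the ones from the table above; the surrounding structure is assumed:
# agents.yaml / agents.d/<id>.yaml — shape the mutations operate on (sketch)
agents:
  kate:
    model:
      provider: anthropic            # upsert_agent_field
      model: claude-haiku-4-5
    language: es                     # upsert_agent_field; remove_agent_field clears it
    plugins: [telegram]              # append_agent_list_item (idempotent)
    inbound_bindings: [telegram:default]
    skills: [fetch-url, weather]     # replaced as a full sequence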
Hot-reload
After every successful mutation, the wizard fires a best-effort
nexo --config <dir> reload so a running daemon picks up the YAML
edit without a manual restart. The call is fire-and-forget: when
the binary isn't on PATH or the daemon isn't running, the wizard
keeps going silently.
Where the code lives
- crates/setup/src/agent_wizard.rs — submenu + dashboard.
- crates/setup/src/yaml_patch.rs — read_agent_field, upsert_agent_field, remove_agent_field, append_agent_list_item, remove_agent_list_item.
- crates/setup/tests/agent_wizard_yaml.rs — schema-roundtrip tests that re-parse the mutated YAML through nexo_config::AgentsConfig.
Verifying releases
Every Nexo release artifact is signed with Sigstore Cosign using keyless OIDC — no long-lived private key, no PGP key management, no out-of-band trust establishment. The signature is tied to the GitHub Actions workflow run that produced the artifact, and a public record lives in the Rekor transparency log.
Why keyless
Traditional signing requires a long-lived signing key. If it leaks, every past release becomes suspect. Keyless signing instead anchors each signature to:
- The GitHub Actions OIDC identity of the workflow run
(
https://token.actions.githubusercontent.com) - The specific repo + workflow file that ran
(
https://github.com/lordmacu/nexo-rs/.github/workflows/...) - The commit + ref the workflow built from
A short-lived certificate (10 min validity) is issued by Sigstore's
fulcio CA, the artifact is signed with it, and the whole bundle
is recorded in rekor (immutable). To forge a signature, an
attacker would need to compromise GitHub's OIDC infra and the
exact workflow path — and even then the forgery shows up in the
public log.
Install Cosign
# macOS:
brew install cosign
# Linux (Debian/Ubuntu):
curl -L "https://github.com/sigstore/cosign/releases/latest/download/cosign-linux-amd64" \
-o /usr/local/bin/cosign
chmod +x /usr/local/bin/cosign
# Linux (Fedora/RHEL):
sudo dnf install cosign
# Verify the install:
cosign version
Verify a Docker image
Every image at ghcr.io/lordmacu/nexo-rs is cosign-signed by the
docker.yml workflow. Verify any tag with:
cosign verify ghcr.io/lordmacu/nexo-rs:latest \
--certificate-identity-regexp 'https://github.com/lordmacu/nexo-rs/.*' \
--certificate-oidc-issuer https://token.actions.githubusercontent.com
A successful verification prints the full certificate + the Rekor entry URL. Anything else (signature missing, identity mismatch, broken cert chain) means don't trust this image — check the release notes, file an issue.
Verify a downloaded binary / .deb / .rpm / .tar.gz
The sign-artifacts.yml workflow attaches three files next to
every release asset:
- <asset>.sig — the raw signature
- <asset>.pem — the leaf certificate
- <asset>.bundle — combined Sigstore bundle (preferred; carries the inclusion proof)
Verify with the bundle (recommended, single command):
cosign verify-blob \
--bundle nexo-rs_0.1.1_amd64.deb.bundle \
--certificate-identity-regexp 'https://github.com/lordmacu/nexo-rs/.*' \
--certificate-oidc-issuer https://token.actions.githubusercontent.com \
nexo-rs_0.1.1_amd64.deb
Or with the standalone .sig + .pem if you prefer:
cosign verify-blob \
--signature nexo-rs_0.1.1_amd64.deb.sig \
--certificate nexo-rs_0.1.1_amd64.deb.pem \
--certificate-identity-regexp 'https://github.com/lordmacu/nexo-rs/.*' \
--certificate-oidc-issuer https://token.actions.githubusercontent.com \
nexo-rs_0.1.1_amd64.deb
Verify in CI / scripted contexts
Drop this in a deploy pipeline:
#!/usr/bin/env bash
set -euo pipefail
ASSET="${1:?usage: $0 <asset-path>}"
BUNDLE="${ASSET}.bundle"
if [ ! -f "$BUNDLE" ]; then
echo "ERROR: $BUNDLE missing — refusing to deploy unsigned artifact" >&2
exit 1
fi
cosign verify-blob \
--bundle "$BUNDLE" \
--certificate-identity-regexp 'https://github.com/lordmacu/nexo-rs/.*' \
--certificate-oidc-issuer https://token.actions.githubusercontent.com \
"$ASSET" \
|| { echo "ERROR: signature verification failed for $ASSET" >&2; exit 2; }
Inspecting the transparency log
Every signature is searchable on Rekor:
# Search by artifact sha256:
cosign tree ghcr.io/lordmacu/nexo-rs:latest
The output shows every cosign-related artifact attached to the image (signatures, attestations, SBOMs) plus the Rekor log index where each was recorded.
What if verification fails
- Identity regex doesn't match — the asset may have been built from a fork / unofficial workflow. Re-download from the GitHub release page directly.
- bundle file missing — older releases (pre-Phase 27.3) don't have signatures. Tag v0.1.1 is the first signed release.
- Cert chain expired / revoked — Sigstore's fulcio root CA has a long lifespan, but the leaf cert is short-lived. cosign automatically fetches the right TUF root; if you see chain errors run cosign initialize to refresh local trust roots.
- Network errors talking to Rekor / Fulcio — both have CDN in front. Retry, or use --insecure-ignore-tlog for local verification (drops the transparency log check — only safe in air-gapped trust contexts).
Out of scope (for now)
- Long-lived PGP keys for the apt / yum repos — needs Phase 27.4 signed-repo work to consume them on the user side. Until that ships, .deb / .rpm signatures live in the Cosign world only.
- A Homebrew bottle-signing path that lets
brewvalidate without the OIDC chain — Phase 27.6 follow-up.
Reproducible builds + SBOM
Every Nexo release ships with two artefacts that let an operator verify provenance and exact composition:
- CycloneDX SBOM (
sbom-cyclonedx.json) — every cargo dependency at the exact version + hash that was compiled into the binary. - SPDX SBOM (
sbom-spdx.json) — full filesystem scan viasyft, captures anything that wasn't a cargo dep (bundled binaries, generated assets, vendored data files).
Both SBOMs are Cosign-signed (*.bundle) using the same keyless
OIDC chain documented in Verifying releases.
Reading the SBOMs
# Pretty-print the CycloneDX dep tree:
jq '.components | map({name, version, purl})' sbom-cyclonedx.json | less
# Find a specific crate:
jq '.components[] | select(.name == "tokio")' sbom-cyclonedx.json
# Audit the cargo deps with `cargo-audit` (run against the SBOM,
# without rebuilding):
cargo audit --db ~/.cargo/advisory-db --json | \
jq -r '.vulnerabilities.list[].advisory.id'
Reproducible build claim
The release workflow targets a bit-identical binary between two
runs given the same git sha + rust-toolchain.toml + Cargo.lock.
The pipeline pins:
- Rust toolchain: rust-toolchain.toml fixes the channel + components.
- Dependency versions: Cargo.lock is committed and --locked is used by every release build.
- Build environment: GitHub Actions ubuntu-latest runner + cargo build --release with no RUSTFLAGS overrides.
- Build provenance: SLSA Level 2 attestation generated by actions/attest-build-provenance (Phase 27.2 wiring).
- Cosign signature: each binary + SBOM signed via OIDC (Phase 27.3).
Reproducing a release locally
# 1. Check out the exact tag.
git clone https://github.com/lordmacu/nexo-rs && cd nexo-rs
git checkout v0.1.1
# 2. Build with the locked deps.
rustup show # confirms the toolchain matches rust-toolchain.toml
cargo build --release --bin nexo --locked
# 3. Compare your binary's sha256 against the release asset:
sha256sum target/release/nexo
# Expected: same hash listed in `sha256sums.txt` on the GitHub release.
If the hashes don't match: the build is not reproducible on your host. Common reasons:
- Different glibc version → embedded __VERSIONED_SYMBOL strings drift. The release workflow runs on ubuntu-latest (currently Ubuntu 24.04, glibc 2.39); building on Debian 12 (glibc 2.36) produces different bytes.
- Local
~/.cargo/config.tomlinjectingRUSTFLAGS. - Build PROFILE-DEV vs PROFILE-RELEASE.
For a guaranteed bit-identical reproduction, build inside the same container the workflow uses:
docker run --rm -v $(pwd):/src -w /src \
rust:1.80-bookworm \
cargo build --release --bin nexo --locked
This reproduces what the GitHub Actions runner would do — same glibc, same toolchain version, same LLVM.
SLSA verification
The workflow attaches an attestation.intoto.jsonl (SLSA Level 2
provenance) per release. Verify with slsa-verifier:
go install github.com/slsa-framework/slsa-verifier/v2/cli/slsa-verifier@latest
slsa-verifier verify-artifact nexo \
--provenance-path attestation.intoto.jsonl \
--source-uri github.com/lordmacu/nexo-rs \
--source-tag v0.1.1
A green verification proves:
- The artefact came from the lordmacu/nexo-rs repo
- It was built by a GitHub-hosted runner (not a fork or local box)
- The build inputs match what's recorded in the provenance
Auditing for known CVEs
The SBOM lets cargo-audit work without rebuilding:
# Convert CycloneDX → cargo-audit's format:
cyclonedx-cli convert --input-format json \
--output-format json sbom-cyclonedx.json | \
jq '...' > deps.json
# Or just feed it to grype (broader scope, multi-format):
grype sbom:./sbom-cyclonedx.json
grype sbom:./sbom-spdx.json
Grype catches CVEs across both Rust crates and any system-level deps captured by syft.
Out of scope (deferred)
- apk/pkg SBOM for the Termux deb — Termux's package metadata doesn't speak SPDX yet. The release SBOMs cover the same artifact contents though.
- Reproducible Docker image layers — the current Dockerfile uses apt-get update && apt-get install, which pulls whatever's latest at build time. Pinning to specific Debian package versions is a follow-up (Phase 34 hardening).
Architecture overview
nexo-rs is a single-process multi-agent runtime. One binary (agent)
hosts every agent, every channel plugin, every extension, and the
persistence layer. Coordination between components happens over NATS
(with a local tokio-mpsc fallback when NATS is offline).
Why single-process: shared in-memory caches, zero IPC overhead between agent and tool invocations, simpler ops. The broker and disk queue give us the durability a multi-process layout would provide, without the coordination cost.
High-level layout
flowchart TB
subgraph PROC[agent process]
direction TB
subgraph PLUGINS[Channel plugins]
WA[WhatsApp]
TG[Telegram]
MAIL[Email / Gmail poller]
BR[Browser CDP]
GOOG[Google APIs]
end
subgraph BUS[Event bus]
NATS[(NATS)]
LOCAL[(Local mpsc fallback)]
DQ[(Disk queue + DLQ)]
end
subgraph AGENTS[Agent runtimes]
A1[Agent: ana]
A2[Agent: kate]
A3[Agent: ops]
end
subgraph STORE[Persistence]
STM[(Short-term sessions<br/>in-memory)]
LTM[(Long-term memory<br/>SQLite + sqlite-vec)]
WS[(Workspace-git<br/>per agent)]
end
subgraph TOOLS[Tools & integrations]
EXT[Extensions<br/>stdio / NATS]
MCP[MCP client / server]
LLM[LLM providers]
end
PLUGINS --> BUS
BUS --> AGENTS
AGENTS --> BUS
AGENTS --> STORE
AGENTS --> TOOLS
TOOLS --> LLM
end
USERS[End users] <--> PLUGINS
Workspace crates
The Cargo.toml workspace defines these member crates:
| Crate | Responsibility |
|---|---|
| crates/core | Agent runtime, trait, SessionManager, HookRegistry, heartbeat, tool registry |
| crates/broker | NATS client, local fallback, disk queue, DLQ, backpressure |
| crates/llm | LLM clients (MiniMax, Anthropic, OpenAI-compat, Gemini), retry, rate limiter |
| crates/memory | Short-term sessions, long-term SQLite, vector search via sqlite-vec |
| crates/config | YAML parsing, env-var resolution, secrets loading |
| crates/extensions | Manifest parser, discovery, stdio + NATS runtimes, watcher, CLI |
| crates/mcp | MCP client (stdio + HTTP), server mode, tool catalog, hot-reload |
| crates/taskflow | Durable flow state machine with wait/resume |
| crates/resilience | CircuitBreaker three-state machine |
| crates/setup | Interactive wizard, YAML patcher, pairing flows |
| crates/tunnel | Public HTTPS tunnel for pairing / webhooks |
| crates/plugins/browser | Chrome DevTools Protocol client |
| crates/plugins/whatsapp | Wrapper over whatsapp-rs (Signal Protocol) |
| crates/plugins/telegram | Bot API client |
| crates/plugins/email | IMAP / SMTP |
| crates/plugins/gmail-poller | Cron-style Gmail → broker bridge |
| crates/plugins/google | Gmail / Calendar / Drive / Sheets tools |
Binaries
Defined in Cargo.toml:
| Binary | Entry | Purpose |
|---|---|---|
| agent | src/main.rs | Main daemon; also exposes setup, dlq, ext, flow, status subcommands |
| browser-test | src/browser_test.rs | CDP integration smoke test |
| integration-browser-check | src/integration_browser_check.rs | End-to-end browser flow validation |
| llm_smoke | src/bin/llm_smoke.rs | LLM provider smoke test |
Runtime topology
agent runs a single tokio multi-thread runtime. Work is split into
independent tasks:
flowchart LR
MAIN[main tokio runtime]
MAIN --> PA[Per-agent runtime task]
MAIN --> PI[Plugin intake loops]
MAIN --> HB[Heartbeat scheduler]
MAIN --> MCP[MCP runtime manager]
MAIN --> EXT[Extension stdio runtimes]
MAIN --> MET[Metrics server :9090]
MAIN --> HEALTH[Health server :8080]
MAIN --> ADMIN[Admin console :9091]
MAIN --> LOCK[Single-instance lock watcher]
Each agent runtime owns its own subscription to inbound topics, its own
session manager view, its own LLM-loop state. Agents do not share
mutable in-memory state — coordination between agents happens over the
event bus (agent.route.<target_id>).
What lives where — quick mental model
- A message arrives → lands on plugin.inbound.<channel> (NATS)
- Agent runtime consumes it → SessionManager attaches or creates a session, HookRegistry fires before_message
- LLM loop runs → tools invoked through the registry, which calls into extensions / MCP / built-ins, each wrapped by CircuitBreaker
- Tool result flows back → after_tool_call hooks fire, LLM decides next turn
- Agent emits reply → publishes to plugin.outbound.<channel>
- Channel plugin delivers → physical message goes to the user
Details per subsystem:
Agent runtime
The agent runtime is the per-agent machinery that consumes inbound
events, drives the LLM loop, invokes tools, and emits outbound events.
One AgentRuntime is instantiated per configured agent at boot; each
runs as its own async task.
Source: crates/core/src/agent/ (behavior.rs, agent.rs,
runtime.rs, hook_registry.rs), boot in src/main.rs.
AgentBehavior trait
Every agent implements AgentBehavior (crates/core/src/agent/behavior.rs).
The trait is intentionally small — default no-ops let built-in types
(like LlmAgentBehavior) override only what they need.
| Method | Fires on | Default |
|---|---|---|
| on_message(ctx, msg) | Inbound message from a plugin | no-op |
| on_event(ctx, event) | Any event on a subscribed topic | no-op |
| on_heartbeat(ctx) | Periodic tick (if heartbeat enabled) | no-op |
| decide(ctx, msg) | LLM-reasoning hook (stub for custom flows) | empty string |
The shipped LlmAgentBehavior implements the full chat-completion
loop with tool calls, streaming, rate-limited retry, and hook fan-out.
Boot sequence
sequenceDiagram
participant Main as src/main.rs
participant Cfg as AppConfig
participant Disc as Extension discovery
participant SM as SessionManager
participant TR as ToolRegistry
participant LLM as LLM client
participant AR as AgentRuntime
participant Bus as Broker
Main->>Cfg: load(config_dir)
Main->>Disc: run_extension_discovery()
Main->>SM: with_cap(ttl, max_sessions)
Main->>TR: register built-ins + extensions + MCP
Main->>LLM: build per provider (w/ CircuitBreaker)
loop per agent in config
Main->>AR: new(agent_id, behavior, tools, sm, llm, broker)
AR->>Bus: subscribe plugin.inbound.<channel>+
AR->>Bus: subscribe agent.route.<agent_id>
AR-->>Main: ready
end
Main->>Main: install signal handlers
Main->>Main: serve forever
Request/response lifecycle
A single inbound message drives the following flow inside one agent runtime:
sequenceDiagram
participant Bus as NATS
participant AR as AgentRuntime
participant SM as SessionManager
participant HR as HookRegistry
participant LLM as LLM
participant TR as ToolRegistry
participant Ext as Extension / MCP / built-in
Bus->>AR: plugin.inbound.<ch>
AR->>SM: get_or_create(session_key)
AR->>HR: fire("before_message")
loop LLM turn loop
AR->>LLM: completion(messages, tools)
LLM-->>AR: assistant turn (text or tool_calls)
alt tool_calls present
AR->>HR: fire("before_tool_call", name, args)
AR->>TR: invoke(tool_name, args)
TR->>Ext: call
Ext-->>TR: result
TR-->>AR: result
AR->>HR: fire("after_tool_call", name, result)
else text only
AR->>Bus: publish plugin.outbound.<ch>
end
end
AR->>HR: fire("after_message")
SessionManager
Defined in crates/core/src/session/manager.rs. Tracks per-user
conversational state in memory.
- Key: SessionKey derived from (agent_id, channel, sender_id); group chats get one session per group
- Storage: DashMap<SessionKey, Session> — lock-free concurrent map
- TTL: configured via memory.short_term.session_ttl (default 30 min); each access updates last_access
- Cap: soft limit DEFAULT_MAX_SESSIONS = 10,000; on overflow the oldest-idle session is evicted before insert
- Sweeper: background task scans every 1 s, removes expired entries
- Callbacks: on_expire() fires via tokio::spawn when a session is dropped — used by the MCP runtime to tear down per-session children
stateDiagram-v2
[*] --> Active: first message
Active --> Active: on_message / on_event<br/>(last_access updated)
Active --> Expired: idle > TTL
Active --> Evicted: cap exceeded,<br/>oldest-idle chosen
Expired --> [*]: sweeper removes
Evicted --> [*]: on_expire() fires
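To make the TTL, cap, and eviction rules concrete, here is a std-only sketch of the policy (hypothetical names; the real store is the DashMap in manager.rs):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Illustrative only: the real store is a DashMap keyed by (agent_id, channel, sender_id).
// This sketch shows just the TTL + oldest-idle eviction policy described above.
struct Session { last_access: Instant }

struct Sessions {
    map: HashMap<String, Session>,
    ttl: Duration,        // memory.short_term.session_ttl
    max_sessions: usize,  // DEFAULT_MAX_SESSIONS
}

impl Sessions {
    fn get_or_create(&mut self, key: &str) -> &mut Session {
        // Evict the oldest-idle entry once the soft cap is reached.
        if !self.map.contains_key(key) && self.map.len() >= self.max_sessions {
            if let Some(oldest) = self
                .map
                .iter()
                .min_by_key(|(_, s)| s.last_access)
                .map(|(k, _)| k.clone())
            {
                self.map.remove(&oldest); // on_expire() would fire here
            }
        }
        let s = self
            .map
            .entry(key.to_string())
            .or_insert(Session { last_access: Instant::now() });
        s.last_access = Instant::now(); // every access refreshes the TTL
        s
    }

    // What the 1 s sweeper does: drop everything idle longer than the TTL.
    fn sweep(&mut self) {
        let ttl = self.ttl;
        self.map.retain(|_, s| s.last_access.elapsed() <= ttl);
    }
}
```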
HookRegistry
Defined in crates/core/src/agent/hook_registry.rs. Lets extensions
inject behavior at well-known points in the lifecycle without patching
the runtime.
- Hook names: arbitrary strings. In practice the runtime fires: `before_message`, `after_message`, `before_tool_call`, `after_tool_call`, `on_session_start`, `on_session_end`
- Fan-out: sequential by priority (lower first), insertion order breaks ties
- Cap: 128 handlers per hook name — a defensive guard against a buggy extension re-registering on every reload
- Errors: logged, treated as `Continue` — one misbehaving hook does not cascade into the rest
- Override: a hook may return `Override(new_args)` to mutate what the next hook (or the runtime itself) sees (see the sketch below)
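A minimal sketch of the fan-out contract — the handler and outcome types here are hypothetical stand-ins; the real signatures live in crates/core/src/agent/hook_registry.rs:

```rust
// Illustrative fan-out semantics only.
enum HookOutcome {
    Continue,
    Override(serde_json::Value), // the next handler (or the runtime) sees the new args
}

struct Handler {
    priority: i32, // lower fires first; insertion order breaks ties
    f: Box<dyn Fn(&serde_json::Value) -> anyhow::Result<HookOutcome> + Send + Sync>,
}

fn fire(handlers: &[Handler], mut args: serde_json::Value) -> serde_json::Value {
    // Handlers are assumed pre-sorted by (priority, insertion order), max 128 per hook name.
    for h in handlers {
        match (h.f)(&args) {
            Ok(HookOutcome::Continue) => {}
            Ok(HookOutcome::Override(new_args)) => args = new_args,
            Err(e) => {
                // Errors are logged and treated as Continue — one bad hook
                // never cascades into the rest of the chain.
                eprintln!("hook failed: {e}");
            }
        }
    }
    args // what the runtime ultimately acts on
}
```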
Heartbeat
# per-agent config
heartbeat:
enabled: true
interval: 30s
- Scheduled per agent if `heartbeat.enabled: true`
- Interval parsed via `humantime` — any `humantime` duration works
- Each tick:
  - Fires `AgentBehavior::on_heartbeat(ctx)`
  - Publishes `agent.events.<agent_id>.heartbeat`
- Typical uses: proactive messages ("good morning"), reminders, external state syncs (pull Gmail, scan calendar), liveness pings
Graceful shutdown
src/main.rs installs SIGTERM / Ctrl+C handlers. On signal, the process
tears down in a specific order so in-flight work finishes cleanly:
flowchart TD
SIG[SIGTERM / Ctrl+C] --> C1[Cancel dream-sweep loops<br/>5 s grace]
C1 --> C2[Mark /ready = false<br/>stop new traffic]
C2 --> C3[Stop plugin intake<br/>no new inbound]
C3 --> C4[Shutdown MCP runtime manager<br/>5 s clean close]
C4 --> C5[Shutdown extensions<br/>5 s grace then kill_on_drop]
C5 --> C6[Stop agent runtimes<br/>drain buffered messages]
C6 --> C7[Abort metrics + health tasks]
C7 --> EXIT([exit 0])
This order is enforced in src/main.rs around lines 1389–1458.
Extensions get the longest grace period because stdio children can be
mid-tool-call; the disk queue absorbs any events that the plugins
couldn't finish publishing.
Why this shape
- One tokio runtime, many tasks: lets you run 10 agents on one CPU core when idle, saturates cores under load. No thread-per-agent bloat.
- No shared mutable state across agents: each agent holds its own registry views, its own session map. Cross-agent communication goes over the bus → visible, replayable, testable.
- Hooks instead of inheritance: extensions customize behavior without recompiling the core. Every insertion point is named, sequenced, and capped.
Event bus (NATS)
Every piece of communication between plugins, agents, and the broker
layer itself flows over NATS (async-nats = 0.35). When NATS is
offline, a local tokio::mpsc bus takes over and a SQLite-backed disk
queue holds events until reconnection. No events are lost.
Source: crates/broker/ (nats.rs, local.rs, disk_queue.rs,
topic.rs).
Why NATS
- Subject-based routing fits the "N plugins × M agents" fan-out naturally (`plugin.inbound.*` wildcards)
- Low-latency pub/sub with no broker-side state to manage
- Cluster-ready without rewriting the data plane
- async-nats is mature and has `JetStream` if we ever need it
The design doc discusses the alternatives (RabbitMQ, Redis streams)
that were rejected; see proyecto/design-agent-framework.md.
Subject namespace
| Pattern | Direction | Example | Who publishes | Who subscribes |
|---|---|---|---|---|
plugin.inbound.<plugin> | plugin → agent | plugin.inbound.whatsapp | Channel plugins | Agent runtimes |
plugin.inbound.<plugin>.<instance> | plugin → agent | plugin.inbound.telegram.sales_bot | Multi-instance plugins (WA, TG) | Agent runtimes |
plugin.outbound.<plugin> | agent → plugin | plugin.outbound.whatsapp | Agent tools (send, reply…) | Channel plugins |
plugin.outbound.<plugin>.<instance> | agent → plugin | plugin.outbound.whatsapp.ana | Agent tools | Specific plugin instance |
plugin.health.<plugin> | plugin → runtime | plugin.health.browser | Plugins | Health server |
agent.events.<agent_id> | internal | agent.events.ana | Runtime internals | Dashboards, tests |
agent.events.<agent_id>.heartbeat | scheduler → agent | agent.events.kate.heartbeat | Heartbeat scheduler | That agent |
agent.route.<target_id> | agent → agent | agent.route.ops | Sending agent's delegate tool | Target agent runtime |
taskflow.resume | external → flow | taskflow.resume | Anything (other agents, services, ops) | TaskFlow resume bridge |
Multi-instance plugins append an .<instance> suffix so two WhatsApp
accounts (e.g. Ana's line and Kate's line) can run side by side without
subject collisions.
Agent-to-agent routing
sequenceDiagram
participant Ana
participant Bus as NATS
participant Ops
Ana->>Ana: LLM decides to delegate
Ana->>Bus: publish agent.route.ops<br/>(correlation_id=X)
Bus->>Ops: deliver
Ops->>Ops: on_message handler runs
Ops->>Bus: publish agent.route.ana<br/>(correlation_id=X)
Bus->>Ana: deliver
Ana->>Ana: correlate reply by ID
The sender always includes a correlation_id in the event envelope;
the receiver echoes it on the reply. That's how one agent can fan out
to several agents and reassemble results.
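To make the envelope contract concrete, here is a minimal serde sketch — the struct name and most fields are hypothetical; the only documented rule is that the receiver echoes `correlation_id` verbatim:

```rust
use serde::{Deserialize, Serialize};

// Hypothetical envelope shape for illustration — the real Event type lives in crates/broker.
#[derive(Serialize, Deserialize)]
struct DelegateEnvelope {
    correlation_id: String, // set by the sender, echoed back by the receiver
    from: String,           // "ana"
    to: String,             // "ops" → published on agent.route.ops
    payload: serde_json::Value,
}

fn is_reply_for(reply: &DelegateEnvelope, sent: &DelegateEnvelope) -> bool {
    // Fan-out: one agent can send N requests with N different ids and
    // reassemble the replies purely by matching this field.
    reply.correlation_id == sent.correlation_id
}
```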
Broker abstraction
crates/broker exposes a Broker trait implemented by two backends:
- `NatsBroker` — real NATS connection wrapped in a `CircuitBreaker`
- `LocalBroker` — in-process `tokio::mpsc` for tests and offline mode
Switching between them is driven by config. The local broker matches
NATS subject semantics (including . segments and > wildcards),
which keeps the test surface identical to production.
Disk queue
When a publish to NATS fails — circuit breaker open, connection lost, transient 5xx — the event is persisted to the disk queue instead of being dropped.
| Property | Value |
|---|---|
| Storage | SQLite |
| Default path | ./data/ (configurable via broker.persistence.path) |
| Tables | pending_events, dead_letters |
| Event format | JSON serialization of Event { id, topic, payload, enqueued_at, attempts } |
| Drain order | FIFO by enqueued_at |
| Batch size | up to 100 per drain() call |
| Max attempts before DLQ | 3 (DEFAULT_MAX_ATTEMPTS) |
flowchart LR
PUB[publish] --> OK{NATS up?}
OK -->|yes| NATS[(NATS)]
OK -->|no| ENQ[disk_queue.enqueue]
ENQ --> SQLITE[(pending_events)]
RECON[NATS reconnect] --> DRAIN[disk_queue.drain]
SQLITE --> DRAIN
DRAIN --> NATS
DRAIN -.->|3 attempts failed| DLQ[(dead_letters)]
DRAIN -.->|deserialization error| DLQ
Drain on reconnect
When NatsBroker detects reconnection, it calls disk_queue.drain():
- Read up to 100 oldest events from `pending_events`
- Republish each to NATS
- On success: delete the row
- On failure: increment `attempts`, leave the row in place
- Once `attempts >= 3`: move to `dead_letters` (the whole cycle is sketched below)
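The same cycle, compressed into a std-only sketch — in-memory rows stand in for the pending_events and dead_letters tables, and the names are illustrative:

```rust
struct PendingEvent {
    id: u64,
    attempts: u32,
    payload: Vec<u8>,
}

const MAX_ATTEMPTS: u32 = 3; // DEFAULT_MAX_ATTEMPTS

fn drain(
    pending: &mut Vec<PendingEvent>,        // FIFO by enqueued_at
    dead_letters: &mut Vec<PendingEvent>,
    mut publish: impl FnMut(&[u8]) -> bool, // true = NATS accepted the republish
) {
    let batch = pending.len().min(100); // up to 100 per drain() call
    let mut kept = Vec::new();
    for mut ev in pending.drain(..batch) {
        if publish(&ev.payload) {
            continue; // success: the row is deleted
        }
        ev.attempts += 1; // failure: bump and keep
        if ev.attempts >= MAX_ATTEMPTS {
            dead_letters.push(ev); // exhausted: move to the DLQ
        } else {
            kept.push(ev);
        }
    }
    // Rows that failed but still have attempts left stay at the front of the queue.
    kept.extend(pending.drain(..));
    *pending = kept;
}
```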
Dead-letter queue (DLQ)
Events that exhaust retries, or fail to deserialize at all, land in
dead_letters. They're not silently discarded — CLI lets you inspect
and replay them.
agent dlq list # show all dead events
agent dlq replay <event_id> # move one back to pending_events
agent dlq purge # drop the table (destructive!)
Replay moves the entry back to pending_events; the next drain cycle
retries it with attempts reset.
Backpressure
Two independent mechanisms:
- Local broker channels are 256-capacity `tokio::mpsc` per subscriber. If a subscriber is slow, dropped events log a `slow consumer` warning but the subscription stays alive.
- The disk queue applies a proportional sleep above 50% capacity (scaled from 0 ms up to `MAX_BACKPRESSURE_MS = 500 ms`). At the hard cap it additionally drops the oldest event and sleeps 500 ms — an intentional "shed load, don't block the producer forever" stance (sketched below).
The disk queue's backpressure only matters when NATS is down for a long time and the producer is faster than real time. In normal operation the disk queue stays near-empty.
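A sketch of the proportional-sleep side, assuming a linear ramp from 0 ms at 50% fill up to the cap; the exact curve is an implementation detail — only the 50% threshold and `MAX_BACKPRESSURE_MS` are documented above:

```rust
use std::time::Duration;

const MAX_BACKPRESSURE_MS: u64 = 500;

fn backpressure_delay(len: usize, cap: usize) -> Duration {
    let fill = len as f64 / cap as f64;
    if fill <= 0.5 {
        Duration::ZERO // below 50% the producer is never slowed down
    } else {
        // Scale linearly from 0 ms at 50% full to MAX_BACKPRESSURE_MS at 100% full.
        let scaled = ((fill - 0.5) / 0.5) * MAX_BACKPRESSURE_MS as f64;
        Duration::from_millis(scaled.min(MAX_BACKPRESSURE_MS as f64) as u64)
    }
}
```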
Local fallback
When NATS is unreachable or the circuit breaker on the publish path is Open, the runtime degrades gracefully:
- Inbound events from local plugins (e.g. a Telegram webhook fielded in-process) go through `LocalBroker` and reach agents immediately
- Outbound events that target a plugin hosted in the same process (which is every shipped plugin) also go through `LocalBroker`
- Anything that would have crossed a real NATS hop sits in the disk queue until reconnection
In practice, single-machine deployments keep working even with no NATS at all — the disk queue and the local broker together are sufficient for one process. NATS starts earning its keep the moment you scale to multiple processes, machines, or regions.
Fault tolerance
Every external call goes through a CircuitBreaker. Every retryable
error has a bounded retry policy with jittered exponential backoff.
Every event survives a NATS outage. A second process cannot race the
first onto the same bus.
This page collects all of those guardrails in one place.
CircuitBreaker
Source: crates/resilience/src/lib.rs.
A three-state machine wrapped around any fallible external call. Once a dependency is failing, the breaker fails fast instead of piling up calls against a dead endpoint; periodic probes let it recover without human intervention.
stateDiagram-v2
[*] --> Closed
Closed --> Open: 5 consecutive failures
Open --> HalfOpen: backoff elapsed
HalfOpen --> Closed: 2 consecutive successes
HalfOpen --> Open: any failure<br/>(backoff × 2, capped)
Defaults
| Field | Default | Meaning |
|---|---|---|
failure_threshold | 5 | consecutive failures before opening |
success_threshold | 2 | consecutive successes in HalfOpen before closing |
initial_backoff | 10 s | wait time on first open |
max_backoff | 120 s | cap on exponential backoff |
Where it wraps
- LLM calls — one circuit per provider (MiniMax, Anthropic, OpenAI-compat, Gemini). A provider outage doesn't cascade to others.
- NATS publish — one circuit over the broker. When it opens the disk queue absorbs writes.
- CDP commands — one circuit per browser session. A dead Chrome doesn't freeze the agent loop.
- Extension stdio — implicit via the `StdioRuntime` lifecycle (crashed child → respawn, bounded).
Signals
CircuitBreaker exposes the usual methods (allow(), on_success(),
on_failure()) plus two explicit overrides:
- `trip()` — force Open from outside (e.g. a health check decided the dependency is down before a call fails)
- `reset()` — force Closed (e.g. the operator just restored the dependency and doesn't want to wait for the probe window); the full state machine is sketched below
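For intuition, here is a compact, std-only sketch of the state machine with the documented defaults — illustrative only, not the crates/resilience implementation:

```rust
use std::time::{Duration, Instant};

#[derive(Clone, Copy)]
enum State {
    Closed { failures: u32 },
    Open { until: Instant, backoff: Duration },
    HalfOpen { successes: u32, last_backoff: Duration },
}

struct Breaker {
    state: State,
    failure_threshold: u32,    // 5
    success_threshold: u32,    // 2
    initial_backoff: Duration, // 10 s
    max_backoff: Duration,     // 120 s
}

impl Breaker {
    /// Called before every guarded call: false = fail fast.
    fn allow(&mut self) -> bool {
        if let State::Open { until, backoff } = self.state {
            if Instant::now() < until {
                return false;
            }
            // Backoff elapsed → let one probe through.
            self.state = State::HalfOpen { successes: 0, last_backoff: backoff };
        }
        true
    }

    fn on_success(&mut self) {
        if let State::HalfOpen { successes, .. } = &mut self.state {
            *successes += 1;
            if *successes >= self.success_threshold {
                self.state = State::Closed { failures: 0 };
            }
        }
    }

    fn on_failure(&mut self) {
        let next = match self.state {
            // Still under the threshold: just count.
            State::Closed { failures } if failures + 1 < self.failure_threshold => {
                self.state = State::Closed { failures: failures + 1 };
                return;
            }
            // Any HalfOpen failure re-opens with the previous backoff doubled, capped.
            State::HalfOpen { last_backoff, .. } => (last_backoff * 2).min(self.max_backoff),
            // Closed crossing the threshold (edge cases while Open are simplified here).
            _ => self.initial_backoff,
        };
        self.state = State::Open { until: Instant::now() + next, backoff: next };
    }
}
```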
Retry policies
Retries live at a layer above the circuit breaker — they handle transient failures (429, 5xx, network blips) that don't warrant flipping the breaker. Every retry policy uses jittered exponential backoff to avoid thundering-herd reconnection storms.
| Component | Max attempts | Backoff range |
|---|---|---|
| LLM 429 (rate limit) | 5 | 1 s → 60 s, jittered exponential |
| LLM 5xx (server error) | 3 | 1 s → 30 s, jittered exponential |
| NATS publish drain | 3 per event | disk queue drain cycle |
| CDP | via circuit only | backoff = circuit's open window |
These live in crates/llm/src/retry.rs (LLM) and
crates/broker/src/disk_queue.rs (NATS drain).
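A sketch of what "jittered exponential backoff" means for the 429 row above (5 attempts, 1 s → 60 s). It uses full jitter and the fastrand crate purely for brevity; the real policy and its error classification live in crates/llm/src/retry.rs:

```rust
use std::time::Duration;

fn backoff_for_attempt(attempt: u32) -> Duration {
    let base_ms: u64 = 1_000;  // 1 s
    let cap_ms: u64 = 60_000;  // 60 s
    let exp = base_ms.saturating_mul(1u64 << attempt.min(16)).min(cap_ms);
    // Full jitter: sleep a uniform random duration in [0, exp) so many clients
    // retrying at once don't reconnect in lockstep.
    Duration::from_millis(fastrand::u64(0..exp.max(1)))
}

async fn with_retry<T, E, F, Fut>(max_attempts: u32, mut call: F) -> Result<T, E>
where
    F: FnMut() -> Fut,
    Fut: std::future::Future<Output = Result<T, E>>,
{
    let mut attempt = 0;
    loop {
        match call().await {
            Ok(v) => return Ok(v),
            Err(e) if attempt + 1 >= max_attempts => return Err(e),
            // In the real policy, only retryable errors (429, 5xx, network) reach this arm.
            Err(_) => {
                tokio::time::sleep(backoff_for_attempt(attempt)).await;
                attempt += 1;
            }
        }
    }
}
```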
Error classification
Retries only trigger on retryable errors. A 4xx other than 429 — missing key, invalid model, malformed request — fails fast. The rationale: retrying a misconfigured call wastes budget and still fails. Fail loudly, fix the config.
No message drop
The broker layer guarantees at-least-once delivery for publishes that reach the runtime:
flowchart LR
P[publisher] --> TRY{NATS healthy?}
TRY -->|yes| NATS[(NATS)]
TRY -->|no| DQ[(disk queue)]
DQ --> WAIT{reconnect?}
WAIT -->|yes| DRAIN[drain FIFO]
DRAIN --> NATS
DQ -->|3 failed attempts| DLQ[(dead letters)]
DLQ --> CLI[agent dlq replay]
In the absolute worst case — NATS down forever, disk full — the disk queue starts shedding oldest events at its hard cap, but the producer never crashes and never silently drops.
Single-instance lockfile
A second agent process pointed at the same data directory would
double-subscribe every topic, delivering every message twice. To
prevent that, boot acquires a lockfile and kicks out any stale or
racing instance.
Source: src/main.rs::acquire_single_instance_lock.
flowchart TD
START[agent boot] --> READ[read data/agent.lock]
READ --> EXIST{file exists?}
EXIST -->|no| WRITE[write our PID]
EXIST -->|yes| PID[parse PID]
PID --> ALIVE{/proc/PID/ exists?}
ALIVE -->|no| WRITE
ALIVE -->|yes| SIGTERM[send SIGTERM]
SIGTERM --> WAIT[wait up to 5 s<br/>50 × 100 ms polls]
WAIT --> DEAD{process gone?}
DEAD -->|yes| WRITE
DEAD -->|no| SIGKILL[send SIGKILL]
SIGKILL --> WRITE
WRITE --> LOCK[RAII handle alive]
The SingleInstanceLock RAII struct stores our own PID. On drop it
only removes the lockfile if the stored PID still matches the current
one — so a takeover by a third process doesn't let the original
owner wipe the lock on its way out.
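A std-only sketch of the two parts that matter most — the stale-PID probe via /proc and the PID-checked Drop — with the SIGTERM/SIGKILL escalation against a live owner elided:

```rust
use std::{fs, path::PathBuf, process};

struct SingleInstanceLock {
    path: PathBuf,
    pid: u32,
}

impl SingleInstanceLock {
    fn acquire(path: PathBuf) -> std::io::Result<Self> {
        if let Ok(contents) = fs::read_to_string(&path) {
            if let Ok(old_pid) = contents.trim().parse::<u32>() {
                // /proc/<pid>/ existing means the previous owner is still alive;
                // the real implementation SIGTERMs it, waits up to 5 s, then SIGKILLs.
                if fs::metadata(format!("/proc/{old_pid}")).is_ok() {
                    eprintln!("live owner pid={old_pid} — escalation elided in this sketch");
                }
            }
        }
        let pid = process::id();
        fs::write(&path, pid.to_string())?; // take (or steal) the lock
        Ok(Self { path, pid })
    }
}

impl Drop for SingleInstanceLock {
    fn drop(&mut self) {
        // A third process may have taken over since boot: only delete the file
        // if it still carries our PID, so we never wipe someone else's lock.
        if let Ok(contents) = fs::read_to_string(&self.path) {
            if contents.trim() == self.pid.to_string() {
                let _ = fs::remove_file(&self.path);
            }
        }
    }
}
```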
Graceful shutdown
See Agent runtime — Graceful shutdown for the ordered teardown sequence. Key points from a fault-tolerance angle:
- Dream-sweep loops and MCP sessions get explicit grace windows so in-flight work doesn't produce partial state
- Plugin intake is stopped before agent runtimes — the runtimes drain anything already in their mailboxes before exiting
- If the disk queue has unflushed events on SIGTERM, they survive to the next boot
Operator guardrails
Beyond the automatic mechanisms:
- Skill gating — an extension declaring `requires.env = ["FOO"]` is skipped at discovery when `FOO` is unset, instead of being registered and failing on every invocation. See Extensions — manifest.
- Inbound filter — events with neither text nor media (receipts, typing indicators, reactions-only) are dropped before they reach the LLM, saving cost and avoiding noisy turns.
- Health endpoints — `:8080/ready` and `:8080/live` expose lifecycle state for k8s liveness / readiness probes.
- Metrics — `:9090/metrics` (Prometheus) exposes everything from inbound event counts to circuit breaker state; see Metrics.
Transcripts (FTS + redaction)
Per-session JSONL transcripts under agents.<id>.transcripts_dir are
the canonical record of every turn. Two optional layers wrap that
record:
- FTS5 index — a SQLite virtual table that mirrors transcript content for `MATCH` queries. Backs the `session_logs` tool's `search` action when present.
- Redaction — a regex pre-processor that rewrites entry content before it ever reaches disk. Patterns target common credentials and home-directory paths.
Source: crates/core/src/agent/transcripts_index.rs,
crates/core/src/agent/redaction.rs,
crates/core/src/agent/transcripts.rs.
Configuration
config/transcripts.yaml (optional; absent → defaults below):
fts:
enabled: true # default
db_path: ./data/transcripts.db # default
redaction:
enabled: false # default — opt in
use_builtins: true # only relevant if enabled
extra_patterns:
- { regex: "TENANT-[0-9]+", label: "tenant_id" }
JSONL is the source of truth. The FTS index is derivable; if the DB
is corrupted or deleted, agent transcripts reindex (planned) can
rebuild it from disk.
FTS schema
CREATE VIRTUAL TABLE transcripts_fts USING fts5(
content,
agent_id UNINDEXED,
session_id UNINDEXED,
timestamp_unix UNINDEXED,
role UNINDEXED,
source_plugin UNINDEXED,
tokenize = 'unicode61 remove_diacritics 2'
);
The DB is shared across agents; isolation is enforced at query time
by WHERE agent_id = ?. User queries are escaped as a single FTS5
phrase so operators (OR, NOT, :) in the user input never reach
the engine as syntax.
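A sketch of that phrase-escaping, plus the shape of the per-agent MATCH query it feeds (illustrative; the real code is in transcripts_index.rs):

```rust
// Inside a double-quoted FTS5 phrase the only special character is the
// double quote itself, which is escaped by doubling it — so OR / NOT / :
// in user input are matched literally rather than parsed as syntax.
fn escape_fts5_phrase(user_query: &str) -> String {
    format!("\"{}\"", user_query.replace('"', "\"\""))
}

// The MATCH query then runs with per-agent isolation enforced in SQL, e.g.:
//   SELECT session_id, timestamp_unix, role, source_plugin
//   FROM transcripts_fts
//   WHERE agent_id = ?1 AND transcripts_fts MATCH ?2
//   ORDER BY timestamp_unix DESC LIMIT ?3;
fn main() {
    assert_eq!(escape_fts5_phrase(r#"refund OR "free""#), r#""refund OR ""free""""#);
}
```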
session_logs integration
When the index is available, the search action returns:
{
"ok": true,
"query": "reembolso",
"backend": "fts5",
"count": 3,
"hits": [
{
"session_id": "…",
"timestamp": "2026-04-25T18:00:00Z",
"role": "user",
"source_plugin": "wa",
"preview": "...quería un [reembolso] del pedido..."
}
]
}
If the index is None (FTS disabled or init failed), the action
falls back to the legacy substring scan over JSONL. The shape is the
same minus backend: "fts5".
Redaction patterns
| Label | Detects | Example match |
|---|---|---|
bearer_jwt | Bearer eyJ… JWT triplets | Bearer eyJhbGc.eyJzdWI.dGVzdA |
anthropic_key | Anthropic API keys | sk-ant-abcdef… |
openai_key | sk- prefix API keys (OpenAI etc.) | sk-abc123… |
aws_access_key | AWS access key id | AKIAIOSFODNN7EXAMPLE |
hex_token_32 | Long hex strings | 5d41402abc4b2a76b9719d911017c592 |
home_path | Linux/macOS home dirs | /home/familia, /Users/alice |
Each match is replaced with [REDACTED:<label>]. Patterns run in the
order above, so more specific shapes (Bearer JWT, Anthropic) win over
generic catch-alls below.
A 40-char base64 pattern targeting AWS secret keys was deliberately
omitted — it produces too many false positives on legitimate hashes
and opaque ids. Operators who need it can add it scoped via
extra_patterns.
Custom patterns
redaction:
enabled: true
extra_patterns:
- { regex: "TENANT-[0-9]+", label: "tenant_id" }
- { regex: "internal\\.acme", label: "internal_host" }
Custom patterns run after built-ins. Invalid regex aborts boot with a message naming the offending index and label.
What redaction does not do
- It does not maintain a reverse map. Once content is redacted on disk the original is gone — by design. A reversible mapping would recreate the leak surface this feature is meant to close.
- It does not rewrite previously-written JSONL files. New entries redact going forward; historical content stays as-is.
- It does not redact `tracing` logs — that's a separate concern.
- The FTS index stores the redacted text, so `search` results never surface the original secrets either.
Operational notes
- The FTS index uses WAL journaling and a connection pool capped at 4 — the same idiom as the long-term memory DB.
- Inserts are best-effort. If an FTS write fails (disk full, lock contention) the tool logs at `warn` and the JSONL append still succeeds. The source of truth is never compromised.
- Boot logs include `transcripts FTS index ready` (or the warn that it fell back) and `transcripts redaction active` when the redactor has any rule loaded.
nexo-rs vs OpenClaw
OpenClaw is the closest reference point in the multi-channel-agent-gateway space. nexo-rs mined OpenClaw's plugin SDK, channel boundaries, and skills layout for ideas, then rebuilt the runtime in Rust with stricter operational guarantees. This page lays out the differences honestly — including where OpenClaw still has the edge.
Substrate
| Dimension | OpenClaw | nexo-rs |
|---|---|---|
| Language | TypeScript | Rust |
| Runtime | Node 22+ | none — single statically-linked binary |
| Install footprint | pnpm install over ~42 runtime deps + 24 dev deps | one binary, 34 MB built (29 MB stripped, 13 MB gzipped) |
| Cold-start | node boot + module resolution | direct exec — sub-100ms to agent serve |
| Mobile target | feasible with Termux + Node | first-class on Termux, no root, no Docker |
| Memory safety | runtime errors | Rust ownership: data races, use-after-free, null derefs refused at compile time |
The single-binary shape is the reason nexo-rs runs comfortably on a
phone (Termux) and on a fresh VPS without a Node ecosystem
underneath. cargo build --release and ship target/release/agent
— that is the whole deliverable.
Process & messaging
| Dimension | OpenClaw | nexo-rs |
|---|---|---|
| Process model | single Node process | multi-process via NATS, in-process LocalBroker fallback when NATS is offline |
| Subject namespace | n/a (in-process buses) | plugin.inbound.<plugin>[.instance] / plugin.outbound.… / agent.route.<id> / taskflow.resume |
| Fault tolerance | best-effort | NatsBroker wraps every publish in a CircuitBreaker; failures spill to a SQLite-backed disk queue and drain on reconnect |
| At-least-once delivery | n/a | drain path documented as at-least-once; consumers dedupe by event.id |
| DLQ | n/a | failed events land in dead_letters after 3 attempts; agent dlq list/replay/purge from the CLI |
| Subscription survival | restart | NATS subscriptions auto-resubscribe on reconnect with backoff (250 ms → 10 s) |
Hot reload
| Dimension | OpenClaw | nexo-rs |
|---|---|---|
| Config change | restart | agent reload (or file-watcher trigger) swaps a RuntimeSnapshot via ArcSwap — in-flight turns finish on the old snapshot, the next event picks up the new one |
| Watched files | — | agents.yaml, agents.d/*.yaml, llm.yaml (extra paths via runtime.yaml) |
| Per-agent reload channel | — | mpsc to each AgentRuntime, the coordinator drains acks to confirm |
Per-agent capability sandbox
OpenClaw's plugin allowlist is global to the gateway. nexo-rs pushes the allowlist down to the agent and the binding (the inbound channel surface):
agents:
- id: kate
plugins: [whatsapp, telegram, browser, taskflow]
allowed_tools: ["whatsapp_*", "browser_navigate", "memory_*"]
outbound_allowlist:
whatsapp: ["+57…"]
telegram: [123456789]
skill_overrides:
ffmpeg-tools: warn
accept_delegates_from: ["ana"]
inbound_bindings:
- plugin: whatsapp
instance: kate_wa
# per-binding overrides for the same agent
allowed_tools: ["whatsapp_*"]
outbound_allowlist:
whatsapp: ["+57…"]
What that buys:
- An LLM running under `kate` cannot send messages to a number not in `outbound_allowlist`, even if a prompt injection asks it to.
- Two channels exposed to the same agent (sales WA, private TG) carry different capability surfaces — the sales binding doesn't get the private one's tool set.
- Skill modes (`strict` / `warn` / `disable`) are decided per agent, with explicit `requires.bin_versions` semver constraints (probed at boot, process-cached).
Secrets
| Dimension | OpenClaw | nexo-rs |
|---|---|---|
| Credential resolution | env vars | agents.<id>.credentials block per channel; resolver maps to per-channel stores (gauntlet validates at boot) |
| 1Password | n/a | op CLI extension + inject_template tool: render {{ op://Vault/Item/field }} and pipe to allowlisted commands without exposing the secret |
| Audit log | n/a | append-only JSONL at OP_AUDIT_LOG_PATH: every read_secret and inject_template records agent_id, session_id, fingerprint, reveal_allowed — never the value |
| Capability inventory | n/a | agent doctor capabilities [--json] enumerates every write/reveal env toggle (OP_ALLOW_REVEAL, CLOUDFLARE_*, DOCKER_API_*, PROXMOX_*, SSH_EXEC_*) with state + risk |
Transcripts
OpenClaw stores transcripts as JSONL and greps them. nexo-rs
keeps the JSONL (source of truth) and adds:
- SQLite FTS5 index (`data/transcripts.db`) — write-through from `TranscriptWriter::append_entry`. The `session_logs search` agent tool uses `MATCH` queries with phrase-escaped user input so operator strings can't inject FTS operators.
- Pre-persistence redactor (opt-in) — regex pass over content before write. 6 built-in patterns (Bearer JWT, `sk-…`, `sk-ant-…`, AWS access keys, 64+ hex tokens, home paths) plus operator-defined `extra_patterns`. JSONL and FTS receive the same redacted text.
- Atomic header writes — `OpenOptions::create_new(true)` so 16 concurrent first-appends to the same session result in exactly one header line.
Durable workflows
OpenClaw doesn't ship a durable-flow primitive. nexo-rs has TaskFlow:
- `taskflow` LLM tool with actions `start | status | advance | wait | finish | fail | cancel | list_mine`.
- Three wait conditions: `Timer { at }`, `ExternalEvent { topic, correlation_id }`, `Manual`.
- A single global `WaitEngine` ticks every 5 s (configurable), resuming flows whose deadlines have passed.
- `taskflow.resume` NATS subject lets external services wake `external_event` flows: publish `{flow_id, topic, correlation_id, payload}` and the bridge calls `try_resume_external` (a publish sketch follows below).
- `agent flow list/show/cancel/resume` from the CLI.
- Guardrails: `timer_max_horizon` (default 30 days) blocks unbounded waits; a non-empty topic + correlation_id is required for `external_event`.
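A sketch of an external service waking a waiting flow over NATS, using async-nats and the payload fields listed above; the flow id and topic values are hypothetical, and the exact envelope may differ:

```rust
use serde::Serialize;

#[derive(Serialize)]
struct ResumePayload<'a> {
    flow_id: &'a str,
    topic: &'a str,
    correlation_id: &'a str,
    payload: serde_json::Value,
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let client = async_nats::connect("nats://127.0.0.1:4222").await?;
    let msg = ResumePayload {
        flow_id: "example-flow-id",        // hypothetical
        topic: "payments.confirmed",       // hypothetical
        correlation_id: "order-8841",      // hypothetical
        payload: serde_json::json!({ "amount": 125_000 }),
    };
    client
        .publish("taskflow.resume".to_string(), serde_json::to_vec(&msg)?.into())
        .await?;
    client.flush().await?; // make sure the bridge sees it before we exit
    Ok(())
}
```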
LLM auth
| Dimension | OpenClaw | nexo-rs |
|---|---|---|
| Anthropic | API key | API key and claude_subscription OAuth PKCE flow — uses the operator's Claude Code subscription quota instead of API billing |
| MiniMax | API key | API key and Token Plan / Coding Plan OAuth bundle (api_flavor: anthropic_messages) |
| OpenAI-compat | API key | API key + DeepSeek wired out of the box (OpenAI-compat reuse) |
| Gemini | not in core | first-class client |
MCP
OpenClaw supports MCP as a client. nexo-rs is both:
- Client — stdio and HTTP transports, full tool / resource / prompt catalog, `tools/list_changed` hot-reload.
- Server — `agent mcp-server` exposes the agent's own tools (filtered by allowlist) over stdio for Claude Desktop / Cursor / any MCP-aware host. Proxy tools (`ext_*`, `mcp_*`) are unconditionally hidden so the agent doesn't become an open relay.
Build size
target/release/agent 34 MB
target/release/agent (stripped) 29 MB
target/release/agent (.gz -9) 13 MB
For comparison, an OpenClaw install (Node + node_modules after
pnpm install) sits in the hundreds of megabytes — most of it
needed at runtime, not just build-time.
Where OpenClaw is still ahead
Honest list:
- Installer & onboarding flow — OpenClaw's `openclaw doctor` family and the bundled installer give a smoother first-run UX than nexo-rs's `agent setup` wizard, especially for non-Rust developers.
- TS familiarity — the JS / TS audience for plugin authors is larger than the Rust audience; if your team writes mostly TypeScript, contributing back to OpenClaw is faster.
- Track record — OpenClaw has a longer release history, more maintainers, and more shipped extensions in the wild.
- Apps surface — OpenClaw ships iOS / Android / macOS companion apps; nexo-rs only ships the daemon and the loopback web admin (admin-ui Phase A0–A11 still in progress).
Summary
If you want operational guarantees (single binary, fault-tolerant broker, per-agent sandbox, durable workflows, secrets audit) and you're OK with Rust, nexo-rs.
If you want fast onboarding, a TS plugin ecosystem, and the OpenClaw apps, OpenClaw.
The two projects share enough vocabulary that moving an extension
between them is mostly a port, not a rewrite. The plugin SDK
shape (stdio-spoken JSON-RPC + a plugin.toml manifest) is
deliberately compatible.
Driver subsystem (Phase 67)
The driver subsystem turns the nexo-rs agent runtime into the
"human in the loop" for another agent — typically the Claude Code
CLI. It runs a goal-bound experiment: spawn the external CLI, watch
its tool-use stream, decide allow/deny on every action, feed back
acceptance failures, and stop only when the CLI claims "done" AND
objective verification passes.
This page describes the architectural shape; concrete impl details live with each sub-phase.
Why
Claude Code (or any other local CLI agent) is excellent at writing code, but it sometimes:
- over-claims completion — says "done" when tests are red;
- proposes destructive shell commands when stuck;
- forgets which approaches it already tried and failed.
A second agent — driven by nexo-rs, backed by a different LLM
(MiniMax M2.5), with persistent memory — closes those gaps.
Architecture
nexo-rs daemon
│
├─ "claude-driver" agent
│ ├─ LLM: MiniMax M2.5
│ ├─ memory: short_term + long_term + vector + transcripts
│ └─ skills: claude_cli, git_checkpoint, test_runner,
│ acceptance_eval, escalate
│
└─ MCP server (in-process)
└─ tool: permission_prompt(tool_name, input) → {allow|deny, message}
claude (subprocess, one per turn)
└─ claude --resume <id>
--output-format stream-json
--permission-prompt-tool mcp__nexo-driver__permission_prompt
--add-dir <worktree>
--allowedTools "Read,Grep,Glob,LS,WebFetch"
-p "<turn prompt>"
Termination model
Claude says "done" — driver does NOT trust it. Driver runs the goal's
acceptance criteria (cargo build, cargo test, cargo clippy,
PHASES marker, custom verifiers). Only when all pass is the goal
declared Done. Otherwise the failures are folded into the next
turn's prompt: "you said done, but here's what still fails — fix it".
The driver also stops on budget exhaustion: max turns, wall-time, tokens, or consecutive denies. On exhaustion the driver escalates to the operator (WhatsApp / Telegram via existing channel plugins) with a state dump.
Foundational types — nexo-driver-types
The contract — AgentHarness trait + Goal / Attempt / Decision
/ AcceptanceCriterion / BudgetGuards types — lives in the leaf
crate nexo-driver-types. Every value is serde-serializable so the
contract can travel through NATS, get re-imported by extensions, and
power admin-ui dashboards without dragging in the daemon.
How a turn flows (Phase 67.1)
use std::time::Duration;

use nexo_driver_claude::{ClaudeCommand, spawn_turn};
use nexo_driver_types::CancellationToken;

async fn doc(session_id: String) -> anyhow::Result<()> {
    let cmd = ClaudeCommand::discover("Implement Phase 26.z")?
        .resume(session_id)
        .allowed_tools(["Read", "Grep", "Glob", "LS"])
        .permission_prompt_tool("mcp__nexo-driver__permission_prompt")
        .cwd("/tmp/claude-runs/26-z");
    let cancel = CancellationToken::new();
    let mut turn = spawn_turn(cmd, &cancel, Duration::from_secs(600), Duration::from_secs(1)).await?;
    while let Some(ev) = turn.next_event().await? {
        // dispatch on ev (Assistant tool_use → permission_prompt; Result → done check)
        let _ = ev;
    }
    let _exit = turn.shutdown().await?;
    Ok(())
}
next_event cooperatively races three signals via tokio::select!:
the cancel token, the per-turn deadline, and the JSONL stream. Errors
land as Cancelled, Timeout, ParseLine, etc. Cleanup is always
shutdown() — ChildHandle::Drop is the panic safety net.
Persistence (Phase 67.2)
SqliteBindingStore keeps (goal_id → claude session_id) plus
timestamps in a single claude_session_bindings table. Two filters
are applied on get:
- idle TTL — `last_active_at` must be within `idle_ttl` of now;
- max age — `created_at + max_age` must be in the future.
Either filter can be None (no filter) or Duration::ZERO (alias).
Three soft-delete-friendly operations live alongside clear:
- `mark_invalid(goal_id)` flips `last_session_invalid = 1` instead of deleting the row. Phase 67.8 (replay-policy) calls this when Claude rejects a session id mid-turn; the row stays for forensics.
- `touch(goal_id)` bumps `last_active_at` only. The driver loop calls it per observed event so the idle filter doesn't need a structural upsert per turn.
- `purge_older_than(cutoff)` reaps rows the operator no longer cares about. Phase 67.6 (worktree janitor) calls it nightly.
Schema migrations: PRAGMA user_version = 1 is the sentinel; every
open() runs CREATE TABLE/INDEX IF NOT EXISTS. Future v2 will
extend that helper.
Permission flow (Phase 67.3)
Every Claude tool call that isn't on the static allowlist
(Read,Grep,Glob,LS,WebFetch) goes through the MCP server before
execution:
Claude Code ─── tools/call mcp__nexo-driver__permission_prompt ───▶
│
stdio JSON-RPC
│
▼
nexo-driver-permission-mcp (child)
│
calls PermissionDecider
│
▼
{behavior: allow|deny, ...}
PermissionMcpServer exposes one tool, permission_prompt. The
in-process AllowSession cache keyed on (tool_name, hash(input))
short-circuits repeat calls (a Claude turn that re-reads the same
file pays the decider once).
Outcomes Claude receives are always one of two shapes:
{ "behavior": "allow" } // optional updatedInput
{ "behavior": "deny", "message": "..." }
Internally the driver tracks five outcomes — AllowOnce,
AllowSession{scope}, Deny, Unavailable, Cancelled — collapsing
the last three to deny on the wire. Unavailable (timeout) is
fail-closed by design.
Phase 67.3 ships the bin in placeholder modes (--allow-all for dev,
--deny-all <reason> for shadow). Phase 67.4 will swap those flags
for --socket <path> so the bin asks the daemon's LlmDecider
(MiniMax + memory) for each decision.
Goal lifecycle (Phase 67.4)
nexo-driver run goal.yaml
│
▼
DriverOrchestrator::run_goal
│
├─ workspace_manager.ensure(&goal) ─┐
│ │
├─ write_mcp_config(workspace, ├─ side-effects in
│ bin_path, socket_path) │ <workspace>/
│ │
├─ DriverSocketServer (already running) ──┘
│ spawned by builder, owned via JoinHandle
│
└─ for each turn:
├─ budget.is_exhausted? → BudgetExhausted{axis}
├─ AttemptStarted event
├─ run_attempt(ctx, params)
│ spawn `claude --resume <id> ... --mcp-config ...`
│ event-loop on stream-json
│ binding_store.upsert(session_id)
│ acceptance.evaluate(criteria, workspace)
│ return AttemptResult { outcome }
├─ AttemptCompleted event
└─ match outcome:
Done → break, GoalCompleted{Done}
NeedsRetry{f} → next turn with prior_failures
Continue{...} → next turn (e.g. session-invalid retry)
Cancelled → break
BudgetExhausted → break
Escalate{r} → emit Escalate event, break
AttemptOutcome::Continue covers two cases the loop treats the same:
the stream ended without Result::Success (Claude crashed early),
and a session not found reply that triggered
binding_store.mark_invalid so the next turn starts fresh.
NATS subjects emitted (when feature = "nats" and
emit_nats_events: true):
- `agent.driver.goal.{started,completed}`
- `agent.driver.attempt.{started,completed}`
- `agent.driver.decision` (Phase 67.7 will populate when `LlmDecider` records its rationale)
- `agent.driver.acceptance`
- `agent.driver.budget.exhausted`
- `agent.driver.escalate`
- `agent.driver.replay` (Phase 67.8 — replay-policy verdict)
- `agent.driver.compact` (Phase 67.9 — compact-policy scheduled a `/compact <focus>` turn)
Compact policy (Phase 67.9)
Long agentic runs let Claude's context grow without bound. The
orchestrator runs a CompactPolicy after every successful work turn:
when running tokens cross threshold * context_window, the next
iteration is rewritten as a /compact <focus> slash command turn so
Claude Code shrinks its own context before the next work turn.
Compact turns absorb token usage but do not bump the goal's turn
counter, so they don't burn the budget. min_turns_between_compacts
prevents back-to-back compacts. Set context_window: 0 (or
enabled: false) in compact_policy: to disable.
Sub-phases
| Phase | What | Status |
|---|---|---|
| 67.0 | AgentHarness trait + types | ✅ |
| 67.1 | claude_cli skill (spawn + stream-json + resume) | ✅ |
| 67.2 | Session-binding store (SQLite) | ✅ |
| 67.3 | MCP permission_prompt in-process | ✅ |
| 67.4 | Driver agent loop + budget guards | ✅ |
| 67.5 | Acceptance evaluator | ✅ |
| 67.6 | Git worktree sandboxing + per-turn checkpoint | ✅ |
| 67.7 | Semantic decision memory | ✅ |
| 67.8 | Replay-policy (resume after a mid-turn crash) | ✅ |
| 67.9 | Opportunistic compact | ✅ |
| 67.10 | Escalation to WhatsApp/Telegram | ⬜ |
| 67.11 | Shadow mode (calibration) | ⬜ |
| 67.12 | Parallel multi-goal | ⬜ |
| 67.13 | Cost dashboard + admin-ui A4 tile | ⬜ |
See also
- `crates/driver-types/README.md` — contract surface and layering
- `proyecto/PHASES.md` — Phase 67 sub-phase status of record
- OpenClaw reference: `research/src/agents/harness/types.ts`
- OpenClaw subprocess pattern: `research/extensions/codex/src/app-server/transport-stdio.ts`
Project tracker + multi-agent dispatch (Phase 67.A–H)
The project-tracker subsystem lets a nexo-rs agent answer "what phase is development on" through Telegram / WhatsApp / a shell, and lets it dispatch async programmer agents that ship phases on its behalf.
The implementation is layered:
| Layer | Crate | Responsibility |
|---|---|---|
| Project files | nexo-project-tracker | Parse PHASES.md + FOLLOWUPS.md, watch for changes, expose read tools. |
| Multi-agent state | nexo-agent-registry | DashMap + SQLite store of every in-flight goal, cap + queue + reattach. |
| Goal control | nexo-driver-loop | spawn_goal / pause_goal / resume_goal / cancel_goal per-goal. |
| Tool surface | nexo-dispatch-tools | program_phase, dispatch_followup, hook system, agent control + query, admin. |
| Capability gate | nexo-config + nexo-core | DispatchPolicy per agent / binding, ToolRegistry filter. |
Project tracker (Phase 67.A)
FsProjectTracker reads <root>/PHASES.md (required) and
<root>/FOLLOWUPS.md (optional) at startup, caches parsed state
behind a parking-lot RwLock with a 60 s TTL, and starts a notify
watcher on the parent directory that invalidates the cache on
Modify | Create | Remove events.
Read tools register through nexo_dispatch_tools::READ_TOOL_NAMES
(project_status, project_phases_list, followup_detail,
git_log_for_phase).
Set ${NEXO_PROJECT_ROOT} to point at a workspace other than the
daemon's cwd.
Multi-agent registry (Phase 67.B)
AgentRegistry is the single source of truth for every goal the
driver has admitted. Each entry holds an ArcSwap<AgentSnapshot>
(turn N/M, last acceptance, last decision summary, diff_stat) so
list_agents / agent_status readers never block writers.
- `admit(handle, enqueue)` enforces the global cap. Beyond the cap, `enqueue=true` parks the goal as `Queued`; `enqueue=false` rejects.
- `release(goal_id, terminal)` returns the next-up queued goal so the orchestrator can promote it via `promote_queued` once the worktree / binding is ready.
- `apply_attempt(AttemptResult)` refreshes the live snapshot. Idempotent against out-of-order replay (a lower turn_index is ignored).
- Reattach (Phase 67.B.4) walks the SQLite store at boot and rehydrates `Running` rows. With `resume_running=false` they flip to `LostOnRestart` and surface to the operator.
LogBuffer keeps a per-goal ring of recent driver events for the
agent_logs_tail tool — bounded so a chatty goal cannot OOM the
process.
Persistence wiring (Phase 71)
The bin reads agent_registry.store from
config/project-tracker/project_tracker.yaml and opens
SqliteAgentRegistryStore when the resolved path is non-empty.
Env placeholders (${NEXO_AGENT_REGISTRY_DB:-./data/agents.db})
are expanded before the open. Path open failures fall back to
MemoryAgentRegistryStore with a warn so a corrupt sqlite file
never bricks boot.
When the registry is sqlite-backed and reattach_on_boot: true,
the bin runs the reattach sweep with resume_running=false. Every
prior-run Running row flips to LostOnRestart, and any
notify_origin / notify_channel hook attached to that goal fires
once with an [abandoned] summary so the originating chat learns
the goal could not be resumed. Subprocess respawn is intentionally
not attempted — restoring a Claude Code worktree the daemon no
longer owns is unsafe to do silently and lives under Phase 67.C.1.
Shutdown drain (Phase 71.3)
On SIGTERM the bin runs nexo_dispatch_tools::drain_running_goals
before plugin teardown so notify_origin reaches WhatsApp /
Telegram while their adapters are still alive. Each Running goal's
Cancelled hooks fire with a [shutdown] summary; per-hook
dispatch is bounded by a 2 s timeout so a stuck publish cannot
hold shutdown hostage. The row then flips to LostOnRestart so
the next boot's reattach sweep does not re-fire the same
notification.
[shutdown] daemon stopping — goal `<id>` was running and has
been marked abandoned. Re-dispatch with `program_phase
phase_id=<phase>` if you still need it.
SIGKILL still bypasses this — the boot-time reattach sweep is the safety net for that case.
Turn-level audit log (Phase 72)
Live state (AgentSnapshot) only carries the latest decision /
diff / acceptance per goal. Once a turn rolls forward the previous
turn's data is gone. To answer "what did the agent actually do
across its 40 turns?" the runtime now writes a durable row per
turn into a goal_turns table on the same agents.db:
goal_turns(
goal_id TEXT,
turn_index INTEGER,
recorded_at INTEGER,
outcome TEXT, -- done | continue | needs_retry | …
decision TEXT, -- last Decision rendered as
-- "<tool> (allow|deny:msg|observe:note) — rationale"
summary TEXT, -- mirror of AgentSnapshot.last_progress_text
diff_stat TEXT,
error TEXT, -- pre-rendered for needs_retry / escalate / budget
raw_json TEXT, -- full AttemptResult payload
PRIMARY KEY (goal_id, turn_index)
);
EventForwarder writes a row on every AttemptResult event,
upsert-on-conflict so a replay can't dup history. The new chat tool
agent_turns_tail goal_id=<uuid> [n=20] returns a markdown table
of the last N rows (default 20, capped at 1000):
showing 20 of 40 turn(s) for `…`
| turn | outcome | decision | summary | error |
|---|---|---|---|---|
| 21 | continue | Edit (allow) — patch crate slack | wired Plugin trait | - |
| 22 | needs_retry | Bash (allow) — cargo build | … | E0432 in slack/src/lib.rs |
…
Best-effort writes: an append failure logs a warn but never blocks
the driver loop. When the registry isn't sqlite-backed (memory
fallback), the tool reports "set agent_registry.store in
project_tracker.yaml" rather than silently returning empty.
Async dispatch (Phase 67.C + 67.E)
DriverOrchestrator::spawn_goal(self: Arc<Self>, goal) returns a
tokio::task::JoinHandle so the calling tool returns the goal id
instantly without waiting for the run to finish. Per-goal pause /
cancel signals (watch<bool> and CancellationToken::child_token)
let pause_agent / cancel_agent target one goal without taking
down the rest of the orchestrator.
program_phase_dispatch is the heart of the dispatch surface: it
reads the sub-phase out of PHASES.md, runs DispatchGate::check,
constructs a Goal with the dispatcher / origin metadata, asks the
registry for a slot, and either spawns the goal or returns
Queued / Forbidden / NotFound. dispatch_followup is the
mirror that pulls the description from a FOLLOWUPS.md item.
Capability gate (Phase 67.D)
DispatchPolicy { mode, max_concurrent_per_dispatcher, allowed_phase_ids, forbidden_phase_ids } lives on AgentConfig
and (as Option<DispatchPolicy>) on InboundBinding. The
per-binding override fully replaces the agent-level value so an
operator can be precise per channel ("asistente is none
everywhere except this Telegram chat where it is full").
DispatchGate::check short-circuits in this order:
- capability `None` → `CapabilityNone` (every kind).
- `ReadOnly` capability + write kind → `CapabilityReadOnly`.
- write + `require_trusted` + `!sender_trusted` → `SenderNotTrusted`. Read tools bypass the trust gate so `list_agents` stays open for unpaired senders.
- `forbidden_phase_ids` match → `PhaseForbidden`.
- non-empty `allowed_phase_ids` + no match → `PhaseNotAllowed`.
- dispatcher / sender / global caps. A global-cap hit with `queue_when_full=true` is admitted and queued by the orchestrator; without queueing → `GlobalCapReached`.
ToolRegistry::apply_dispatch_capability(policy, is_admin) prunes
the registry of dispatch tool names not allowed by the resolved
policy. ToolRegistryCache::get_or_build_with_dispatch builds the
per-binding filtered registry that respects both allowed_tools
and dispatch_policy. Hot reload (Phase 18) constructs a fresh
ToolRegistryCache per snapshot, so a new dispatch_policy lands
on the next intake without restart; in-flight goals keep their
pre-reload tool surface so a hot reload never preempts.
Completion hooks (Phase 67.F)
Each hook is (on: HookTrigger, action: HookAction, id). Triggers
fire on Done | Failed | Cancelled | Progress { every_turns }.
Actions:
- `notify_origin` — publish a markdown summary to the chat that triggered the goal. No-op when `origin.plugin == "console"`.
- `notify_channel { plugin, instance, recipient }` — publish to an explicit channel different from the origin (escalate to ops).
- `dispatch_phase { phase_id, only_if }` — chain another goal when `only_if` matches the firing transition. Implemented via a pluggable `DispatchPhaseChainer` so the runtime owns the `program_phase_dispatch` plumbing.
- `nats_publish { subject }` — JSON payload to a custom subject.
- `shell { cmd, timeout }` — opt-in via `allow_shell_hooks`. The capability `PROGRAM_PHASE_ALLOW_SHELL_HOOKS` is registered with the setup inventory so `agent doctor capabilities` flags it the moment the operator exports the env var. Receives `NEXO_HOOK_GOAL_ID/PHASE_ID/TRANSITION/PAYLOAD_JSON` env vars.
HookIdempotencyStore (SQLite) keeps (goal_id, transition, action_kind, action_id) UNIQUE so at-least-once NATS replay or a
mid-hook restart cannot fire a hook twice.
HookRegistry (in-memory DashMap<GoalId, Vec<CompletionHook>>)
backs add_hook / remove_hook / agent_hooks_list.
NATS subjects (Phase 67.H.2)
| Subject | Producer |
|---|---|
agent.dispatch.spawned | program_phase_dispatch admitted |
agent.dispatch.denied | DispatchGate::check denied |
agent.tool.hook.dispatched | hook fired ok |
agent.tool.hook.failed | hook attempt errored |
agent.registry.snapshot.<goal_id> | per-goal periodic beacon |
agent.driver.progress | every Nth completed work-turn |
Plus the existing Phase 67.0–67.9 subjects:
agent.driver.{goal,attempt}.{started,completed},
agent.driver.{decision,acceptance,budget.exhausted,escalate,replay,compact}.
CLI (Phase 67.H.1)
nexo-driver-tools mirrors the chat tool surface for shell use:
nexo-driver-tools status [--phase <id> | --followups]
nexo-driver-tools dispatch <phase_id>
nexo-driver-tools agents list [--filter running|queued|...]
nexo-driver-tools agents show <goal_id>
nexo-driver-tools agents cancel <goal_id> [--reason "…"]
origin.plugin = "console" so notify_origin is a no-op (the
operator sees stdout, not a chat reply).
Built-in registration (nexo daemon)
The default nexo agent binary registers every dispatch
tool definition at boot via
nexo_core::agent::dispatch_handlers::register_dispatch_tools_into.
The LLM sees program_phase, list_agents, agent_status,
etc. in its toolset; per-binding dispatch_capability
(config/agents.yaml) prunes the write tools for bindings that
opted out.
What's NOT bundled by default is the runtime
DispatchToolContext — the orchestrator + registry + tracker
references the handlers consult. Without it, a tool call returns a clean "dispatch tools require AgentContext.dispatch to be set at boot" error instead of pretending success. Two integration paths from there:
- In-process orchestrator — boot a `DriverOrchestrator` alongside the agents, share one `AgentRegistry`. See the next section for the wiring sample.
- NATS-based dispatch — the agent bin publishes a message to `agent.driver.dispatch.request` that a separate `nexo-driver` daemon consumes. This is the topology to use when the Claude subprocess needs hardware (a GPU box) the agent daemon doesn't have. The dispatch tool surface only changes in the registry it consults; operators can swap the in-process `AgentRegistry` for one that mirrors a NATS-backed registry without touching the handlers.
Boot wiring (B8)
The integrator's main.rs ties everything together. Minimal
shape:
use std::sync::Arc;
use nexo_agent_registry::{AgentRegistry, MemoryAgentRegistryStore, LogBuffer};
use nexo_core::agent::{
dispatch_handlers::{register_dispatch_tools_into, DispatchToolContext},
tool_registry::ToolRegistry,
};
use nexo_dispatch_tools::{
event_forwarder::EventForwarder,
hooks::{DefaultHookDispatcher, HookRegistry, NoopNatsHookPublisher},
policy_gate::CapSnapshot,
NoopTelemetry,
};
use nexo_pairing::PairingAdapterRegistry;
use nexo_project_tracker::FsProjectTracker;
// 1. Project tracker.
let tracker: Arc<dyn nexo_project_tracker::ProjectTracker> =
Arc::new(FsProjectTracker::open(std::env::current_dir().unwrap())?);
// 2. Agent registry + log buffer.
let registry = Arc::new(AgentRegistry::new(
Arc::new(MemoryAgentRegistryStore::default()),
4,
));
let log_buffer = Arc::new(LogBuffer::new(200));
let hook_registry = Arc::new(HookRegistry::new());
// 3. Hook dispatcher with the channel adapters that Phase 26
// registered (whatsapp / telegram).
let pairing = PairingAdapterRegistry::new();
// pairing.register(WhatsappPairingAdapter::new(...));
// pairing.register(TelegramPairingAdapter::new(...));
let hook_dispatcher = Arc::new(DefaultHookDispatcher::new(
pairing,
Arc::new(NoopNatsHookPublisher),
));
// 4. Orchestrator with EventForwarder so registry / log_buffer /
// hooks see every driver event.
let inner_sink: Arc<dyn nexo_driver_loop::DriverEventSink> =
Arc::new(nexo_driver_loop::NoopEventSink);
let event_sink: Arc<dyn nexo_driver_loop::DriverEventSink> =
Arc::new(EventForwarder::new(
registry.clone(),
log_buffer.clone(),
hook_registry.clone(),
hook_dispatcher.clone(),
inner_sink,
));
// (orchestrator builder consumes event_sink)
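// NOTE: `orch` used in step 5 below is the orchestrator value (assumed Arc-wrapped)
// produced by that builder; its construction is elided in this sketch.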
// 5. Bundle for AgentContext.dispatch.
let dispatch_ctx = Arc::new(DispatchToolContext {
tracker,
orchestrator: orch.clone(),
registry,
hooks: hook_registry,
log_buffer,
default_caps: CapSnapshot {
queue_when_full: true,
..Default::default()
},
require_trusted: true,
telemetry: Arc::new(NoopTelemetry),
});
// 6. Register the handlers into the base ToolRegistry. The
// per-binding cache prunes write tools when capability=None
// or read_only.
let base = ToolRegistry::new();
register_dispatch_tools_into(&base);
// 7. Per-session AgentContext.with_dispatch(dispatch_ctx)
// + .with_sender_trusted(true) + .with_inbound_origin(plugin,
// instance, sender).
Without step 6 the handlers exist but aren't reachable by the LLM. Without step 4 the registry / log_buffer / hooks stay inert. Without step 5 the handlers return MissingDispatchCtx.
See also
- `proyecto/PHASES.md` — Phase 67.A–H sub-phase status of record.
- `architecture/driver-subsystem.md` — Phase 67.0–67.9 driver loop + replay + compact policies.
Configuration layout
nexo-rs loads configuration from a single directory (passed via
--config <path>, default ./config). The runtime reads a small set
of required YAML files and a handful of optional ones.
Source: crates/config/src/lib.rs::AppConfig::load.
Directory tree
config/
├── agents.yaml # required — base agent catalog
├── agents.d/ # optional — drop-in agents, merged in alpha order
│ ├── ana.example.yaml # template (committed)
│ └── *.yaml # real definitions (gitignored)
├── broker.yaml # required — NATS / local broker + disk queue
├── llm.yaml # required — LLM providers
├── memory.yaml # required — short-term + long-term + vector
├── extensions.yaml # optional — extension search paths, toggles
├── mcp.yaml # optional — MCP servers the agent consumes
├── mcp_server.yaml # optional — expose this agent as an MCP server
├── tool_policy.yaml # optional — per-tool / per-agent policy
├── runtime.yaml # optional — hot-reload watcher settings
├── plugins/
│ ├── whatsapp.yaml
│ ├── telegram.yaml
│ ├── email.yaml
│ ├── browser.yaml
│ ├── google.yaml
│ └── gmail-poller.yaml
└── docker/ # optional — overrides for containerized runs
├── agents.yaml
├── llm.yaml
└── …
Required vs optional
The loader fails startup if any required file is missing or malformed.
Optional files return None when absent and unlock related features
only if present.
| File | Kind |
|---|---|
agents.yaml | required |
broker.yaml | required |
llm.yaml | required |
memory.yaml | required |
extensions.yaml | optional |
mcp.yaml | optional |
mcp_server.yaml | optional |
tool_policy.yaml | optional |
runtime.yaml | optional — hot-reload knobs; defaults enable reload at 500 ms debounce. See Config hot-reload. |
plugins/*.yaml | optional (only needed for plugins you enable) |
Drop-in agents
Files under config/agents.d/*.yaml are merged into the base
agents.yaml in lexicographic filename order. Each file has the
same top-level shape (agents: [...]); entries append to the base
list.
Common patterns:
- `00-dev.yaml` / `10-prod.yaml` — control override order by numeric prefix
- Keep `agents.yaml` public-safe and drop sensitive business content (sales prompts, pricing, phone numbers) into a gitignored `config/agents.d/ana.yaml`
- Ship `config/agents.d/<name>.example.yaml` as a template so the shape stays discoverable
Details in Drop-in agents.
Docker layout
config/docker/ mirrors the main layout and is consumed when the
compose file mounts it at /app/config/docker:
# docker-compose.yml
command: ["agent", "--config", "/app/config/docker"]
Secrets inside Docker containers live at /run/secrets/<name> — the
compose definitions use ${file:/run/secrets/...} references. See
LLM config — auth for the full secret
resolution rules.
Env vars and secrets in YAML
YAML values can reference env vars and files:
| Syntax | Meaning |
|---|---|
${VAR} | read env var, fail if unset or empty |
${VAR:-fallback} | env var if set and non-empty, else fallback |
${VAR-fallback} | env var if set (even empty), else fallback |
${file:./secrets/x} | read file contents, trimmed of whitespace |
Path-traversal rules for ${file:...}:
- Relative paths are rooted at the current working directory
- `..` segments are rejected outright
- Absolute paths must sit under one of these whitelisted roots:
  - `/run/secrets/` (Docker secrets)
  - `/var/run/secrets/` (Kubernetes projected volumes)
  - `./secrets/` (project-local)
  - the directory pointed at by `$CONFIG_SECRETS_DIR` (operator-defined)
Everything else is refused at parse time with an explicit error naming the invalid path and the allowed roots.
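A sketch of how those rules compose, covering the three placeholder forms and the whitelist check for absolute `${file:...}` paths; illustrative only — the `./secrets/` and `$CONFIG_SECRETS_DIR` roots are omitted for brevity, and the real resolver in crates/config parses more strictly:

```rust
use std::{env, fs, path::Path};

/// Resolve the text inside `${...}`, e.g. "VAR", "VAR:-fallback", or "file:./secrets/x".
fn resolve_placeholder(inner: &str) -> Result<String, String> {
    if let Some(path) = inner.strip_prefix("file:") {
        if path.contains("..") {
            return Err(format!("{path}: `..` not allowed in file reference"));
        }
        let allowed_abs = ["/run/secrets/", "/var/run/secrets/"];
        let p = Path::new(path);
        if p.is_absolute() && !allowed_abs.iter().any(|root| path.starts_with(*root)) {
            return Err(format!("{path}: not under an allowed secrets root"));
        }
        // File contents are trimmed of surrounding whitespace.
        return fs::read_to_string(p)
            .map(|s| s.trim().to_string())
            .map_err(|e| e.to_string());
    }
    // ${VAR:-fallback} — fallback when unset OR empty.
    if let Some((var, fallback)) = inner.split_once(":-") {
        return Ok(env::var(var)
            .ok()
            .filter(|v| !v.is_empty())
            .unwrap_or_else(|| fallback.to_string()));
    }
    // ${VAR-fallback} — fallback only when unset (an empty value passes through).
    if let Some((var, fallback)) = inner.split_once('-') {
        return Ok(env::var(var).unwrap_or_else(|_| fallback.to_string()));
    }
    // Bare ${VAR} — hard error when unset or empty.
    match env::var(inner) {
        Ok(v) if !v.is_empty() => Ok(v),
        _ => Err(format!("env var {inner} not set")),
    }
}
```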
Validation
All config structs deserialize with #[serde(deny_unknown_fields)], so
typos fail fast:
unknown field `modl`, expected `model`
at line 4, column 5 in config/agents.yaml
Missing required fields produce the same kind of message:
missing field `model`
at line 5, column 3 in config/agents.yaml
Env / file resolution errors identify the placeholder and the file:
env var MINIMAX_API_KEY not set (referenced in llm.yaml)
${file:../etc/passwd}: `..` not allowed in file reference (in broker.yaml)
Boot sequence
flowchart TD
START([agent --config path]) --> LOAD[AppConfig::load]
LOAD --> REQ{required files<br/>present & parseable?}
REQ -->|no| FAIL([fail fast, exit 1])
REQ -->|yes| OPT[read optional files]
OPT --> DROP[merge config/agents.d/]
DROP --> RESOLVE[resolve env / file placeholders]
RESOLVE --> VAL[struct-level validation<br/>deny_unknown_fields]
VAL --> SEM[semantic validation<br/>validate_agents, MCP headers]
SEM --> READY([AppConfig ready])
Next
- agents.yaml — full agent schema
- llm.yaml — LLM provider schema + auth modes
- broker.yaml — NATS + disk queue
- memory.yaml — short/long/vector
- Drop-in agents — merge order and patterns
agents.yaml
The agent catalog. One entry per agent; each entry declares the model, channels, tools, sandboxing, and behavioral knobs for that agent.
Source: crates/config/src/types/agents.rs.
Top-level shape
agents:
- id: ana
model:
provider: minimax
model: MiniMax-M2.5
plugins: [whatsapp]
inbound_bindings:
- plugin: whatsapp
allowed_tools:
- whatsapp_send_message
outbound_allowlist:
whatsapp:
- "573000000000"
system_prompt: |
You are Ana, …
Full field reference
All fields use #[serde(deny_unknown_fields)] — typos fail fast.
Identity & model
| Field | Type | Required | Default | Purpose |
|---|---|---|---|---|
id | string | ✅ | — | Unique agent id. Used as session key, subject suffix, workspace dir name. |
model.provider | string | ✅ | — | Provider key in llm.yaml (e.g. minimax, anthropic). |
model.model | string | ✅ | — | Model id understood by that provider. |
description | string | — | "" | Human-readable role. Injected into # PEERS for delegation discovery. |
Channels
| Field | Type | Default | Purpose |
|---|---|---|---|
plugins | [string] | [] | Plugin ids this agent wants to expose tools for (whatsapp, telegram, browser, …). |
inbound_bindings | array | [] | Per-plugin binding list. Empty = legacy wildcard (receive everything). |
Each inbound_bindings[] entry can override the agent-level
defaults for that channel: allowed_tools, outbound_allowlist,
skills, model, system_prompt_extra, sender_rate_limit,
allowed_delegates. Useful for running the same agent on two channels
with different rules. See Per-binding capability override
below for the full override surface and merge rules.
Tool sandboxing
| Field | Type | Default | Purpose |
|---|---|---|---|
allowed_tools | [string] | [] | Build-time pruning of the tool registry. Glob suffix * allowed. Empty = all tools registered. |
tool_rate_limits | object | null | Per-tool rate limit patterns. Glob-matched. |
tool_args_validation.enabled | bool | true | Toggle JSON-schema validation of tool arguments. |
outbound_allowlist | object | {} | Per-plugin recipient allowlist (e.g. phone numbers, chat ids). Defense-in-depth for send tools. |
allowed_tools semantics:
- For legacy agents (no `inbound_bindings`) the allowlist is applied at registry-build time — tools not matching the patterns are removed from the registry before the LLM sees them.
- For agents with `inbound_bindings` the base registry keeps every tool and enforcement happens per-binding at turn time (see Per-binding capability override) so a binding's override can both narrow and expand within the registry. Defense-in-depth: the LLM only receives tools allowed by the matched binding, and the tool-call execution path rejects any hallucinated name outside the same allowlist.
In both modes the LLM never receives disallowed tool definitions; the difference is where the filter is applied.
System prompt & workspace
| Field | Type | Default | Purpose |
|---|---|---|---|
system_prompt | string | "" | Prepended to every LLM turn. Defines persona, rules, examples. |
workspace | path | "" | Directory with IDENTITY.md, SOUL.md, USER.md, AGENTS.md, MEMORY.md. Loaded at turn start. See Soul, identity & learning. |
extra_docs | [path] | [] | Workspace-relative markdown files appended as # RULES — <filename>. |
transcripts_dir | path | "" | Directory for per-session JSONL transcripts. Empty = disabled. |
skills_dir | path | "./skills" | Base directory for local skill files. |
skills | [string] | [] | Local skill ids to inject into the system prompt. Resolved from skills_dir. |
language | string | null | Output language for the LLM's reply. ISO code ("es", "en", "en-US") or human name ("Spanish", "español"). When set, the runtime renders a # OUTPUT LANGUAGE system block telling the model to keep workspace docs in English (single source of truth, plays nicely with recall + dreaming) but reply to the user in the configured language. Per-binding language overrides this for the matched channel. See Output language. |
Heartbeat
heartbeat:
enabled: true
interval: 30s
| Field | Type | Default | Purpose |
|---|---|---|---|
heartbeat.enabled | bool | false | Turn heartbeat on for this agent. |
heartbeat.interval | humantime | "5m" | Interval between on_heartbeat() fires. |
See Agent runtime — Heartbeat.
Runtime knobs
config:
debounce_ms: 2000
queue_cap: 32
| Field | Type | Default | Purpose |
|---|---|---|---|
config.debounce_ms | u64 | 2000 | Debounce window for burst-of-messages coalescing. |
config.queue_cap | usize | 32 | Per-agent mailbox capacity. |
sender_rate_limit.rps | f64 | — | Per-sender token-bucket refill rate. |
sender_rate_limit.burst | u64 | — | Bucket size. |
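For intuition about the rps/burst pair, here is a minimal token-bucket sketch: rps controls the refill rate, burst caps the bucket. Type and method names are illustrative only; the real per-sender limiter lives inside the runtime.

```rust
use std::time::Instant;

/// Sketch of a per-sender token bucket: `rps` refills, `burst` caps the bucket.
struct TokenBucket {
    rps: f64,
    burst: f64,
    tokens: f64,
    last_refill: Instant,
}

impl TokenBucket {
    fn new(rps: f64, burst: u64) -> Self {
        Self { rps, burst: burst as f64, tokens: burst as f64, last_refill: Instant::now() }
    }

    /// Returns true if the message may pass; false if the sender is rate limited.
    fn try_acquire(&mut self) -> bool {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last_refill).as_secs_f64();
        self.last_refill = now;
        // Refill proportionally to elapsed time, capped at `burst`.
        self.tokens = (self.tokens + elapsed * self.rps).min(self.burst);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    // rps: 0.5, burst: 3 → three messages pass immediately, the fourth has to wait ~2 s.
    let mut bucket = TokenBucket::new(0.5, 3);
    let allowed = (0..4).filter(|_| bucket.try_acquire()).count();
    println!("{allowed} of 4 burst messages accepted"); // typically prints 3
}
```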
Agent-to-agent delegation
| Field | Type | Default | Purpose |
|---|---|---|---|
allowed_delegates | [glob] | [] | Peers this agent may delegate to. Empty = no restriction. |
accept_delegates_from | [glob] | [] | Inverse gate: peers allowed to delegate to this agent. |
Routing uses agent.route.<target_id> over NATS with a
correlation_id. See Event bus — Agent-to-agent routing.
Dreaming (memory consolidation)
dreaming:
enabled: false
interval_secs: 86400
min_score: 0.35
min_recall_count: 3
min_unique_queries: 2
max_promotions_per_sweep: 20
weights:
frequency: 0.24
relevance: 0.30
recency: 0.15
diversity: 0.15
consolidation: 0.10
Defaults shown. See Soul — Dreaming.
Workspace-git
workspace_git:
enabled: false
author_name: "agent"
author_email: "agent@localhost"
When enabled, the agent's workspace directory is a git repo that the
runtime commits to after dream sweeps, forge_memory_checkpoint, and
session close. Good for forensic replay.
Google auth (per-agent OAuth)
google_auth:
client_id: ${GOOGLE_CLIENT_ID}
client_secret: ${file:./secrets/google_secret.txt}
scopes:
- https://www.googleapis.com/auth/gmail.readonly
token_file: ./data/workspace/ana/google_token.json
redirect_port: 17653
Used by crates/plugins/google to run OAuth PKCE per agent.
Deprecated in Phase 17 — prefer declaring Google accounts in a
dedicated config/plugins/google-auth.yaml and binding them from
credentials.google (see next section). Inline google_auth still
boots with a warn so existing deployments keep working; it is
auto-migrated into the credential store at startup.
Credentials (per-agent WhatsApp / Telegram / Google)
Pins each agent to the plugin instance / Google account it may use for outbound traffic. The runtime resolves the target at publish time from the agent id — the LLM cannot pick the instance via tool args, closing the prompt-injection vector.
credentials:
whatsapp: personal # must match whatsapp.yaml instance label
telegram: ana_bot # must match telegram.yaml instance label
google: ana@gmail.com # must match google-auth.yaml accounts[].id
# Silence the "inbound ≠ outbound" warning when intentional:
# telegram_asymmetric: true
Validated at boot by the gauntlet (agent --check-config runs the same
checks without starting the daemon). Omitting credentials: keeps the
legacy single-account behavior for back-compat.
Full schema + migration guide:
config/credentials.md.
Relationship diagram
flowchart LR
AG[agent entry] --> MOD[model provider]
AG --> PL[plugins list]
AG --> IB[inbound_bindings]
AG --> AT[allowed_tools]
AG --> OA[outbound_allowlist]
AG --> WS[workspace]
AG --> HB[heartbeat]
AG --> DEL[delegation gates]
IB -->|per-binding override| AT
IB -->|per-binding override| OA
MOD -->|resolved from| LLM[llm.yaml]
PL -->|tools from| PLUG[plugins/*.yaml]
WS -->|files| SOUL[SOUL.md /<br/>IDENTITY.md /<br/>MEMORY.md]
Per-binding capability override
A single agent can expose distinct capability surfaces per
InboundBinding without running two agent processes. Typical use:
the same Ana agent answers WhatsApp with a narrow sales-only surface
and Telegram with the full catalogue.
Schema
Every inbound_bindings[] entry accepts the following optional
overrides. Unset fields inherit the agent-level value.
| Field | Type | Strategy | Notes |
|---|---|---|---|
allowed_tools | [string] | replace | ["*"] = every registered tool |
outbound_allowlist | object | replace (whole) | WhatsApp / Telegram recipient lists |
skills | [string] | replace | Resolved from agent-level skills_dir |
model | object | replace | Must keep the same provider |
system_prompt_extra | string | append | Rendered as # CHANNEL ADDENDUM block |
sender_rate_limit | inherit \| disable \| {rps, burst} | 3-way | Untagged enum |
allowed_delegates | [string] | replace | Peer allowlist for the delegate tool |
language | string | replace | Output language for replies on this channel. Falls through to the agent-level language field when omitted. See Output language. |
Anything else (workspace, transcripts_dir, heartbeat, memory,
workspace_git, google_auth) stays at the agent level — identity
and persistent state do not change per channel.
Example
agents:
- id: ana
model: { provider: anthropic, model: claude-haiku-4-5 }
plugins: [whatsapp, telegram]
workspace: ./data/workspace/ana
skills_dir: ./skills
system_prompt: |
You are Ana.
allowed_tools: [] # agent-level = permissive; bindings narrow
outbound_allowlist: {}
inbound_bindings:
- plugin: whatsapp
allowed_tools: [whatsapp_send_message]
outbound_allowlist:
whatsapp: ["573115728852"]
skills: []
sender_rate_limit: { rps: 0.5, burst: 3 }
system_prompt_extra: |
Channel: WhatsApp sales. Follow the ETB/Claro lead flow.
- plugin: telegram
instance: ana_tg
allowed_tools: ["*"]
outbound_allowlist:
telegram: [1194292426]
skills: [browser, github, openstreetmap]
model: { provider: anthropic, model: claude-sonnet-4-5 }
allowed_delegates: ["*"]
sender_rate_limit: disable
system_prompt_extra: |
Channel: private Telegram. Full tool access allowed.
Boot-time validation
The runtime rejects configs with:
- Duplicate (plugin, instance) tuples in the same agent.
- A Telegram instance referenced by a binding but not declared in config/plugins/telegram.yaml.
- A binding model.provider different from the agent-level provider (the LLM client is wired once per agent).
- Skills listed in a binding whose directory does not exist under skills_dir.
A binding that sets no overrides is allowed but logs a warn.
Matching order
Bindings are evaluated top-to-bottom; the first match wins. If
you have both {plugin: telegram, instance: None} (wildcard) and
{plugin: telegram, instance: "admin"}, declare the specific entry
first — otherwise the wildcard consumes every Telegram event.
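A toy sketch of the first-match rule — the types and field names here are illustrative, not the runtime's actual structs:

```rust
/// Illustrative binding key: plugin name plus optional instance label.
#[derive(Debug)]
struct Binding {
    plugin: String,
    instance: Option<String>, // None = wildcard for that plugin
}

/// First match wins, scanning bindings top-to-bottom as declared in YAML.
fn match_binding<'a>(bindings: &'a [Binding], plugin: &str, instance: Option<&str>) -> Option<&'a Binding> {
    bindings.iter().find(|b| {
        b.plugin == plugin
            && match (&b.instance, instance) {
                (None, _) => true,                               // wildcard matches any instance
                (Some(want), Some(got)) => want.as_str() == got, // specific instance must match exactly
                (Some(_), None) => false,
            }
    })
}

fn main() {
    // Declaring the specific entry first keeps it from being shadowed by the wildcard.
    let bindings = vec![
        Binding { plugin: "telegram".into(), instance: Some("admin".into()) },
        Binding { plugin: "telegram".into(), instance: None },
    ];
    let hit = match_binding(&bindings, "telegram", Some("admin")).unwrap();
    assert_eq!(hit.instance.as_deref(), Some("admin"));
}
```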
Runtime isolation
- The tool list shown to the LLM is filtered through the binding's allowed_tools; tools hidden on WhatsApp remain invisible even if the LLM hallucinates the name.
- Tool-call execution re-checks the allowlist and returns not_allowed for anything outside it — stops hallucination loops without executing the forbidden tool.
- Outbound tools (whatsapp_send_message, telegram_send_message) read outbound_allowlist from the matched binding, so WhatsApp sends on the sales channel cannot reach numbers that only the private channel allows.
- Sender rate limit buckets are keyed per binding; a flood on one channel cannot drain the quota on another.
Back-compat
Agents without inbound_bindings keep the pre-feature behavior byte-
for-byte: the agent-level allowed_tools is pruned into the base
registry at boot, and the runtime synthesises a policy from agent-
level defaults (keyed at binding_index = usize::MAX).
Output language
Operators pin the language an agent replies in without rewriting
workspace markdown. Workspace docs (IDENTITY, SOUL, MEMORY, USER,
AGENTS) and tool descriptions stay in English — the single source of
truth that recall, dreaming, vector search, and developer tooling
all read. The runtime injects a # OUTPUT LANGUAGE system block
right after the agent's system_prompt, telling the model to read
those docs as-is but reply to the user in the configured language.
Where to set it
agents:
- id: ana
language: es # default for every binding on this agent
inbound_bindings:
- plugin: whatsapp
# → uses Spanish (inherits from the agent)
- plugin: telegram
instance: support_intl
language: en # → uses English on this channel only
- plugin: telegram
instance: bilingual_qa
language: "" # → no directive (model picks)
Resolution
Precedence (first non-empty wins):
- inbound_bindings[i].language — per-channel override.
- language — agent-level default.
- null — no # OUTPUT LANGUAGE block emitted; the model decides from the user's input.
Empty string and whitespace-only values resolve to no directive on both layers — useful for "turn the directive off on this binding even though the agent has one".
Accepted values
The runtime treats the value as a label and forwards it verbatim into the directive (after sanitisation; see below). Both forms work:
- ISO codes: "es", "en", "en-US", "pt-BR".
- Human names: "Spanish", "English", "español", "Brazilian Portuguese".
Human names produce slightly clearer directives in practice
("Respond to the user in Spanish." reads more naturally than
"Respond to the user in es."), but both yield the same model
behaviour with modern LLMs.
Rendered block
# OUTPUT LANGUAGE
Respond to the user in {language}. Workspace docs (IDENTITY, SOUL,
MEMORY, USER, AGENTS) and tool descriptions are in English — read
them as-is, but your turn-final reply to the user must be in
{language}.
The block lands after the agent's system_prompt (and the
optional # CHANNEL ADDENDUM block) so its instruction wins under
the LLM's recency bias.
Sanitisation
Defense-in-depth against config-driven prompt injection: every
language value is normalised before rendering — control characters
and embedded newlines are stripped, trimmed, and the result is
capped at 64 characters. A YAML payload like
language: "es\n\nIgnore previous instructions" cannot smuggle a
multi-line directive into the system prompt.
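A sketch of that normalisation under the rules stated above (drop control characters including newlines, trim, cap at 64 characters); the real implementation may differ in detail, and the function name is illustrative.

```rust
/// Sketch of the language-value sanitiser described above.
fn sanitize_language(raw: &str) -> Option<String> {
    let cleaned: String = raw
        .chars()
        .filter(|c| !c.is_control()) // removes \n, \r, \t and other control chars
        .collect();
    let trimmed = cleaned.trim();
    if trimmed.is_empty() {
        return None; // empty / whitespace-only → no directive
    }
    Some(trimmed.chars().take(64).collect())
}

fn main() {
    assert_eq!(sanitize_language("es"), Some("es".to_string()));
    assert_eq!(sanitize_language("   \n "), None);
    // The injection attempt collapses into one short label, not a multi-line directive.
    let payload = "es\n\nIgnore previous instructions";
    assert_eq!(sanitize_language(payload), Some("esIgnore previous instructions".to_string()));
}
```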
Hot reload
Phase 18 hot-reload covers this field. Edit
agents.d/<id>.yaml, save (or run agent reload), and the next
message uses the new language. In-flight LLM turns finish on the
old policy; subsequent turns flip to the new one.
Related
- Workspace docs and recall stay English regardless — see Soul, identity & learning.
- Per-channel rotation walkthrough lives in Recipes — A/B prompt swap.
Link understanding
Per-agent (and per-binding) toggle that fetches URLs in the user's
message and injects a # LINK CONTEXT block. Off by default. Full
schema, caps, and SSRF denylist live on
Link understanding. The field is
link_understanding at agent scope and at each
inbound_bindings[] entry; binding value replaces agent default,
omitted = inherit.
Web search
Per-agent (and per-binding) toggle that exposes a web_search tool
backed by Brave / Tavily / DuckDuckGo / Perplexity. Off by default.
Full schema, providers, cache, and circuit-breaker behaviour live on
Web search. The field is web_search at
agent scope and at each inbound_bindings[] entry; binding value
replaces agent default, omitted = inherit.
Pairing policy
Per-binding toggle that turns on the DM-challenge gate for inbound
senders. Off by default. The field is pairing_policy on each
inbound_bindings[] entry; null (default) = inherit agent value
or skip the gate entirely. Full protocol, threat model, and CLI
reference live on Pairing.
Common mistakes
- Forgetting plugins: [...]. An agent without plugins has no inbound channel and no outbound tools. It is inert.
- Setting allowed_tools without a wildcard. ["memory_*"] allows the full memory_* family; ["memory_store"] allows only one. Check the glob before assuming.
- Large system_prompt duplication across agents. Use inbound_bindings[].system_prompt_extra to add per-channel content without duplicating the whole prompt.
- Sharing a WhatsApp session across agents. Each agent's workspace should contain its own whatsapp/default session; the wizard does this automatically, but pointing two agents at the same session dir will cause message cross-delivery.
- Translating the workspace markdown to match language. Don't. Workspace docs are the single source of truth read by recall, dreaming, and developer tooling — keep them in English. The # OUTPUT LANGUAGE block tells the model to translate the reply on its way out.
Next
- Drop-in agents — merging multiple agent files
- llm.yaml — where model.provider is resolved
- Skills catalog — names that go in allowed_tools
llm.yaml
LLM provider registry. Each agent's model.provider must resolve to a
key in this file.
Source: crates/config/src/types/llm.rs.
Shape
providers:
minimax:
api_key: ${MINIMAX_API_KEY:-}
group_id: ${MINIMAX_GROUP_ID:-}
base_url: https://api.minimax.io
rate_limit:
requests_per_second: 2.0
quota_alert_threshold: 100000
anthropic:
api_key: ${ANTHROPIC_API_KEY:-}
base_url: https://api.anthropic.com
rate_limit:
requests_per_second: 2.0
auth:
mode: oauth_bundle
bundle: ./secrets/anthropic_oauth.json
retry:
max_attempts: 5
initial_backoff_ms: 1000
max_backoff_ms: 60000
backoff_multiplier: 2.0
Per-provider fields
| Field | Type | Required | Default | Purpose |
|---|---|---|---|---|
api_key | string | ✅ | — | API key. Supports ${ENV_VAR} and ${file:…}. |
base_url | url | ✅ | — | API endpoint. Override to use a proxy or a local server. |
group_id | string | — | — | MiniMax-only. Group identifier. |
rate_limit.requests_per_second | f64 | — | 2.0 | Outbound throttle. |
rate_limit.quota_alert_threshold | u64 | — | — | Optional soft-alarm tokens-per-day threshold. |
api_flavor | enum | — | openai_compat | openai_compat or anthropic_messages. Lets MiniMax expose the Anthropic wire. |
embedding_model | string | — | — | Override model used for embeddings (e.g. Gemini's text-embedding-004). |
safety_settings | JSON | — | — | Gemini-only; attached verbatim to requests. |
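The ${ENV_VAR} and ${file:…} forms mentioned for api_key can be pictured roughly like this. This is a sketch only — the function name is hypothetical and the real loader may handle more forms — but it covers the ${VAR:-default} style used in the examples above.

```rust
use std::{env, fs};

/// Rough sketch of secret-placeholder resolution for values like
/// `${MINIMAX_API_KEY}` or `${file:./secrets/key.txt}`. Illustrative only.
fn resolve_secret(value: &str) -> Result<String, String> {
    let Some(inner) = value.strip_prefix("${").and_then(|s| s.strip_suffix('}')) else {
        return Ok(value.to_string()); // plain literal, used verbatim
    };
    if let Some(path) = inner.strip_prefix("file:") {
        return fs::read_to_string(path)
            .map(|s| s.trim_end().to_string())
            .map_err(|e| format!("cannot read {path}: {e}"));
    }
    // `${VAR:-}` style: empty default when the variable is unset.
    let (var, default) = match inner.split_once(":-") {
        Some((v, d)) => (v, Some(d)),
        None => (inner, None),
    };
    match env::var(var) {
        Ok(v) => Ok(v),
        Err(_) => default.map(str::to_string).ok_or_else(|| format!("env var {var} not set")),
    }
}

fn main() {
    assert_eq!(resolve_secret("${MISSING:-}").unwrap(), "");
    assert_eq!(resolve_secret("plain-value").unwrap(), "plain-value");
    // With MINIMAX_API_KEY exported in the shell, this returns the key itself:
    let _ = resolve_secret("${MINIMAX_API_KEY:-}");
}
```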
Top-level retry block
Applies to every provider that doesn't define its own:
| Field | Default | Purpose |
|---|---|---|
max_attempts | 5 | Total attempts including the first try. |
initial_backoff_ms | 1000 | First backoff. |
max_backoff_ms | 60000 | Cap. |
backoff_multiplier | 2.0 | Exponential factor. |
Retries are jittered to avoid thundering-herd reconnects. See Fault tolerance — Retry policies.
Auth modes
auth:
mode: auto | static | token_plan | oauth_bundle
bundle: ./secrets/anthropic_oauth.json
setup_token_file: ./secrets/anthropic_setup.json
refresh_endpoint: https://auth.example.com/refresh
client_id: your-oauth-client
mode | When |
|---|---|
auto | Let the provider client decide from available credentials. |
static | Use api_key verbatim. |
token_plan | MiniMax "Token Plan" OAuth bundle. |
oauth_bundle | Anthropic PKCE OAuth bundle written by agent setup. |
Supported providers
| Key | Notes |
|---|---|
minimax | Primary provider. MiniMax M2.5. OpenAI-compat or Anthropic-flavour wire. |
anthropic | Claude models. API key or OAuth subscription. |
openai | OpenAI API and anything speaking its wire (Ollama, Groq, local proxies). |
gemini | Google Gemini, including embedding support. |
Provider-specific docs
Common mistakes
- api_key: sk-… committed to git. Use ${ENV_VAR} or ${file:./secrets/…}; the secrets/ directory is gitignored.
- Mismatched embedding_model dimensions. The vector store asserts embedding.dimensions matches the model output. A mismatch aborts startup with an explicit message.
- Setting both api_key and auth.mode: oauth_bundle. The auth mode wins. The api_key is kept as a fallback for tools that bypass the OAuth path.
Input-token reduction (context_optimization)
Four independent kill switches for prompt caching, online history compaction, pre-flight token counting, and the workspace bundle cache. Full schema, defaults, and rollout guidance in Operations → Context optimization.
broker.yaml
Broker topology, disk persistence, and fallback behavior.
Source: crates/config/src/types/broker.rs.
Shape
broker:
type: nats # nats | local
url: nats://localhost:4222
auth:
enabled: false
nkey_file: ./secrets/nats.nkey
persistence:
enabled: true
path: ./data/queue
limits:
max_payload: 4MB
max_pending: 10000
fallback:
mode: local_queue
drain_on_reconnect: true
Fields
| Field | Type | Default | Purpose |
|---|---|---|---|
type | nats \| local | local | local keeps the whole bus in-process; nats uses a real NATS server.
url | url | — | NATS connection URL (ignored when type: local). |
auth.enabled | bool | false | Turn on NKey mTLS. |
auth.nkey_file | path | — | Path to the NKey file when auth.enabled. |
persistence.enabled | bool | true | Turn on the SQLite disk queue. |
persistence.path | path | ./data/queue | Directory for the disk queue SQLite DB. |
limits.max_payload | size | 4MB | Reject events larger than this. |
limits.max_pending | u64 | 10000 | Hard cap on the disk queue; past this, oldest events are shed. |
fallback.mode | local_queue \| drop | local_queue | What to do when NATS is unreachable.
fallback.drain_on_reconnect | bool | true | Replay the disk queue when NATS returns. |
Operational notes
- type: local for single-machine dev. You don't need NATS running just to try the agent. The local broker matches NATS subject semantics, so everything works the same.
- Disk queue always on in production. Even on a single machine. It's the guarantee against losing events on a NATS blip.
- drain_on_reconnect: true is FIFO. See Event bus — Disk queue.
See also:
memory.yaml
Short-term sessions, long-term SQLite storage, and optional vector search.
Source: crates/config/src/types/memory.rs.
Shape
short_term:
max_history_turns: 50
session_ttl: 24h
max_sessions: 10000
long_term:
backend: sqlite
sqlite:
path: ./data/memory.db
vector:
enabled: false
backend: sqlite-vec
default_recall_mode: hybrid
embedding:
provider: http
base_url: https://api.openai.com/v1
model: text-embedding-3-small
api_key: ${OPENAI_API_KEY}
dimensions: 1536
timeout_secs: 30
Short-term
Per-session conversation buffer held in memory by
SessionManager.
| Field | Default | Purpose |
|---|---|---|
max_history_turns | 50 | Turns kept before oldest are pruned into long-term memory. |
session_ttl | 24h | How long a session lives idle before eviction. humantime syntax. |
max_sessions | 10000 | Soft cap. On overflow the oldest-idle session is evicted (fires on_expire). 0 = unbounded. |
Long-term
Persisted memory, durable across restarts.
| Field | Options | Default | Purpose |
|---|---|---|---|
backend | sqlite \| redis | sqlite | Storage engine.
sqlite.path | path | ./data/memory.db | SQLite file (with sqlite-vec extension loaded when vector enabled). |
redis.url | url | — | Redis connection string (when backend: redis). |
Vector
Opt-in semantic memory.
| Field | Default | Purpose |
|---|---|---|
enabled | false | Opt-in. |
backend | sqlite-vec | Zero-extra-infrastructure vector index. |
default_recall_mode | hybrid | Used when the memory tool call omits mode. Options: keyword, vector, hybrid. |
embedding.provider | http | Where to fetch embeddings. http = any OpenAI-compatible embeddings server. |
embedding.base_url | — | Embeddings endpoint. |
embedding.model | — | Model id, e.g. text-embedding-3-small, nomic-embed-text. |
embedding.api_key | — | Key for the embeddings server. Supports ${ENV_VAR} / ${file:…}. |
embedding.dimensions | — | Must match the model output (1536 for OpenAI 3-small; 768 for nomic). Mismatch aborts startup. |
embedding.timeout_secs | 30 | Embeddings request timeout. |
Memory layers
flowchart LR
MSG[incoming message] --> STM[short-term<br/>in-memory buffer]
STM -->|turns exceed max| PRUNE[prune]
PRUNE --> LTM[(long-term<br/>SQLite)]
LTM --> EMB{vector<br/>enabled?}
EMB -->|yes| VEC[(sqlite-vec index)]
TOOL[memory tool] --> RECALL{recall mode}
RECALL -->|keyword| LTM
RECALL -->|vector| VEC
RECALL -->|hybrid| LTM
RECALL -->|hybrid| VEC
Per-agent isolation
Each agent's memory DB lives under its workspace when
workspace_git is enabled — keeps memories forensically reviewable and
prevents one agent from reading another's history.
See also:
Drop-in agents
config/agents.d/*.yaml is a merge-directory for agent definitions
that should not live in agents.yaml — typically anything with
business content (sales prompts, pricing tables, internal phone
numbers, customer-facing identities).
Source: crates/config/src/lib.rs (merge logic).
Why it exists
- Keep agents.yaml public-safe and checked into git
- Keep sensitive content gitignored and loaded at runtime
- Compose layered configs (00-dev.yaml, 10-prod.yaml) without editing a single monolithic file
- Ship .example.yaml templates so the shape stays discoverable
.gitignore rules include:
config/agents.d/*.yaml
!config/agents.d/*.example.yaml
The .example.yaml files are committed and serve as templates; the
real .yaml files are not.
Merge order
Files are loaded in lexicographic filename order and their agents
arrays are concatenated to the base agents.yaml:
flowchart TD
BASE[agents.yaml] --> MERGE[merged catalog]
D1[agents.d/00-shared.yaml] --> MERGE
D2[agents.d/10-ana.yaml] --> MERGE
D3[agents.d/20-kate.yaml] --> MERGE
EX[agents.d/ana.example.yaml] -.->|committed template<br/>usually not loaded| MERGE
Every file must have the top-level agents: [...] shape:
# config/agents.d/10-ana.yaml
agents:
- id: ana
model:
provider: minimax
model: MiniMax-M2.5
plugins: [whatsapp]
inbound_bindings:
- plugin: whatsapp
system_prompt: |
…private content…
Agent id collisions
Two files cannot define the same agent.id. On collision the loader
fails fast with a clear message. If you want to override an agent,
either:
- Replace the entry (rename or remove the original)
- Use inbound_bindings[] per-binding overrides inside a single entry
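As a sketch of the merge rules described here (lexicographic file order, concatenation onto the base, fail-fast on duplicate ids) — this is illustrative, not the actual loader in crates/config:

```rust
use std::collections::HashSet;

/// Illustrative stand-in for one parsed agents file: (filename, agent ids it defines).
type AgentsFile = (String, Vec<String>);

/// Sketch: sort drop-in files lexicographically, concatenate their agents,
/// and fail fast on the first duplicate id, naming the offending file.
fn merge_agents(base: Vec<String>, mut dropins: Vec<AgentsFile>) -> Result<Vec<String>, String> {
    dropins.sort_by(|a, b| a.0.cmp(&b.0)); // 00-shared.yaml before 10-ana.yaml …
    let mut seen: HashSet<String> = base.iter().cloned().collect();
    let mut merged = base;
    for (file, ids) in dropins {
        for id in ids {
            if !seen.insert(id.clone()) {
                return Err(format!("duplicate agent id '{id}' (second definition in {file})"));
            }
            merged.push(id);
        }
    }
    Ok(merged)
}

fn main() {
    let merged = merge_agents(
        vec!["ops".into()],
        vec![
            ("10-ana.yaml".into(), vec!["ana".into()]),
            ("00-shared.yaml".into(), vec!["kate".into()]),
        ],
    )
    .unwrap();
    assert_eq!(merged, vec!["ops", "kate", "ana"]); // lexicographic file order
}
```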
Common patterns
Public vs. private split
config/agents.yaml # committed, only support/ops agents
config/agents.d/ana.yaml # gitignored, full sales prompt
config/agents.d/kate.yaml # gitignored, personal assistant
config/agents.d/ana.example.yaml # committed, empty template
Environment layering
config/agents.d/00-common.yaml # shared defaults
config/agents.d/10-dev.yaml # dev-only overrides (loaded only on dev box)
Swap the 10-*.yaml file per environment. Docker compose can mount
the right one from a secret volume.
Validation
- #[serde(deny_unknown_fields)] still applies to every file
- validate_agents() runs after the merge — checks duplicate ids, missing plugin references, invalid skill directories
- Errors name the file and the offending agent id
Per-agent credentials
Bind each agent to specific WhatsApp / Telegram / Google accounts so outbound traffic originates from the right number, bot, or mailbox — never from a shared pool.
Mental model
Three layers:
- Plugin instance — a labelled WhatsApp session or Telegram bot in config/plugins/{whatsapp,telegram}.yaml. Each instance owns its own token / session_dir and an optional allow_agents list.
- Google account — an entry in the optional config/plugins/google-auth.yaml. Each account is 1:1 with an agent_id.
- Agent binding — in config/agents.d/<agent>.yaml, the credentials: block pins the agent to the instance / account it may use for outbound tool calls.
The runtime runs a boot-time gauntlet that cross-checks all three layers before any plugin boots. Every invariant violation surfaces in a single report so you can fix the full YAML in one edit.
Config schemas
config/agents.d/ana.yaml
agents:
- id: ana
credentials:
whatsapp: personal # must match whatsapp.yaml instance
telegram: ana_bot # must match telegram.yaml instance
google: ana@gmail.com # must match google-auth.yaml accounts[].id
# Opt-out for the symmetric-binding warning when inbound bot and
# outbound bot are intentionally different:
# telegram_asymmetric: true
inbound_bindings:
- { plugin: whatsapp, instance: personal }
- { plugin: telegram, instance: ana_bot }
config/plugins/whatsapp.yaml
whatsapp:
- instance: personal
session_dir: ./data/workspace/ana/whatsapp/personal
media_dir: ./data/media/whatsapp/personal
allow_agents: [ana] # defense-in-depth ACL
- instance: work
session_dir: ./data/workspace/kate/whatsapp/work
media_dir: ./data/media/whatsapp/work
allow_agents: [kate]
config/plugins/telegram.yaml
telegram:
- instance: ana_bot
token: ${file:./secrets/telegram/ana_token.txt}
allow_agents: [ana]
allowlist:
chat_ids: [1194292426]
- instance: kate_bot
token: ${file:./secrets/telegram/kate_token.txt}
allow_agents: [kate]
config/plugins/google-auth.yaml
google_auth:
accounts:
- id: ana@gmail.com
agent_id: ana # 1:1 — the gauntlet enforces it
client_id_path: ./secrets/google/ana_client_id.txt
client_secret_path: ./secrets/google/ana_client_secret.txt
token_path: ./secrets/google/ana_token.json
scopes:
- https://www.googleapis.com/auth/gmail.modify
Agents that still declare the legacy inline google_auth block are
auto-migrated into this store on boot (a warning tells you to migrate).
What the gauntlet validates
| Check | Lenient | Strict |
|---|---|---|
Duplicate session_dir across instances | error | error |
session_dir that is a parent of another | error | error |
| Credential file with lax permissions (linux 0o077) | error | error |
credentials.<ch> points to an instance that does not exist | error | error |
Agent listens on >1 instance without declaring credentials.<ch> | error | error |
Instance allow_agents excludes a binding agent | error | error |
Inbound instance ≠ outbound instance (no <ch>_asymmetric) | warn | error |
Inline agents.<id>.google_auth without matching google-auth.yaml | warn | warn |
Linux permission check is skipped for /run/secrets/* (Docker secrets)
and can be disabled entirely with CHAT_AUTH_SKIP_PERM_CHECK=1.
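On Unix, the 0o077 check amounts to something like the following sketch — the path and helper name are illustrative, and the real gauntlet also honors the /run/secrets/* and CHAT_AUTH_SKIP_PERM_CHECK escapes described above.

```rust
use std::{fs, path::Path};

/// Sketch of the lax-permission check: a credential file whose mode grants any
/// group/other bit (mask 0o077) is flagged. Unix-only.
#[cfg(unix)]
fn is_lax(path: &Path) -> std::io::Result<bool> {
    use std::os::unix::fs::PermissionsExt;
    let mode = fs::metadata(path)?.permissions().mode();
    Ok(mode & 0o077 != 0) // e.g. 0o644 is lax, 0o600 is fine
}

#[cfg(unix)]
fn main() -> std::io::Result<()> {
    for candidate in ["./secrets/telegram/ana_token.txt"] {
        let p = Path::new(candidate);
        if p.exists() && is_lax(p)? {
            eprintln!("lax permissions on {candidate}: chmod 600 it");
        }
    }
    Ok(())
}

#[cfg(not(unix))]
fn main() {}
```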
Topics
Outbound tool calls land on instance-suffixed topics when the resolver has a binding:
plugin.outbound.whatsapp.<instance>
plugin.outbound.telegram.<instance>
Unlabelled (instance: None) plugin entries keep publishing to the
legacy bare topic plugin.outbound.whatsapp / plugin.outbound.telegram
for full back-compat.
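A sketch of the subject selection — the function name is illustrative; the real resolver lives in the runtime:

```rust
/// Sketch: outbound subject selection for a channel, as described above.
/// A resolved instance label gets its own suffixed topic; `None` keeps the
/// legacy bare topic.
fn outbound_subject(channel: &str, instance: Option<&str>) -> String {
    match instance {
        Some(label) => format!("plugin.outbound.{channel}.{label}"),
        None => format!("plugin.outbound.{channel}"),
    }
}

fn main() {
    assert_eq!(outbound_subject("whatsapp", Some("personal")), "plugin.outbound.whatsapp.personal");
    assert_eq!(outbound_subject("telegram", None), "plugin.outbound.telegram");
}
```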
CLI gate
# Run the full gauntlet without booting the daemon. Exits 0 clean,
# 1 on errors, 2 on warnings-only.
agent --config ./config --check-config
# Promote warnings to errors (CI lane).
agent --config ./config --check-config --strict
The gate scans agents.yaml, every agents.d/*.yaml,
whatsapp.yaml, telegram.yaml, and google-auth.yaml. Sample
failure:
credentials: FAILED with 1 error(s):
1. agent 'ana_per_binding_example' binds credentials.telegram='ana_tg' but no such telegram instance exists (available: [])
Secrets in logs
The credential layer never logs a raw account id. Every reference is
via an 8-byte sha256(account_id) fingerprint rendered as hex:
2025-04-24T16:03:42Z INFO credentials.audit agent="ana" channel="whatsapp" fp=a3f2…7c direction=outbound
The fingerprint is pinned — switching the algorithm is an explicit
breaking change tracked by crates/auth/tests/fingerprint_stability.rs.
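For intuition, the fingerprint can be sketched with the sha2 crate as below. The exact byte count and formatting are pinned by the stability test, so treat this as an illustration rather than the canonical implementation.

```rust
// Sketch only — assumes the `sha2` crate; the runtime's exact formatting is
// whatever crates/auth/tests/fingerprint_stability.rs pins.
use sha2::{Digest, Sha256};

/// First 8 bytes of sha256(account_id), rendered as lowercase hex.
fn account_fingerprint(account_id: &str) -> String {
    let digest = Sha256::digest(account_id.as_bytes());
    digest[..8].iter().map(|b| format!("{b:02x}")).collect()
}

fn main() {
    let fp = account_fingerprint("ana@gmail.com");
    println!("credentials.audit fp={fp}"); // never the raw account id
}
```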
Observability
Nine Prometheus series land at /metrics:
| Series | Type | Labels |
|---|---|---|
credentials_accounts_total | gauge | channel |
credentials_bindings_total | gauge | agent, channel |
channel_account_usage_total | counter | agent, channel, direction, instance |
channel_acl_denied_total | counter | agent, channel, instance |
credentials_resolve_errors_total | counter | channel, reason |
credentials_breaker_state | gauge | channel, instance |
credentials_boot_validation_errors_total | counter | kind |
credentials_insecure_paths_total | gauge | — |
credentials_google_token_refresh_total | counter | account_fp, outcome |
Back-compat
- Configs without a credentials: block keep working — the resolver infers outbound from the single inbound_bindings entry when it is unambiguous; otherwise outbound tools are marked unbound and fall back to the legacy bare topic.
- Plugin entries with instance: None stay on the legacy bare topic.
- agents.<id>.google_auth still registers google_* tools for that agent; google-auth.yaml is preferred going forward.
Hot-reload (no daemon restart)
Edit agents.d/*.yaml, plugins/whatsapp.yaml, plugins/telegram.yaml,
or plugins/google-auth.yaml, then trigger a reload via the loopback
admin endpoint:
curl -fsSX POST http://127.0.0.1:9091/admin/credentials/reload | jq
{
"accounts_wa": 2,
"accounts_tg": 2,
"accounts_google": 1,
"warnings": [],
"version": 4
}
The resolver runs the gauntlet against the fresh files, then atomically
swaps bindings in place. Plugin tools holding Arc<…> references see
the new state on their next call. Failure mode: gauntlet errors
return HTTP 400 with the error list; the previous bindings stay
active so a typo in YAML does not knock out the runtime.
CredentialHandles already issued to in-flight tool calls keep
working — handles are by-value clones; the resolver only mediates
lookup of future calls.
What the reload does NOT cover
- Adding a brand-new WhatsApp / Telegram instance still requires a restart for the plugin (each instance owns its own session_dir and websocket). The resolver picks up the new account but the plugin side stays as-was until next boot.
- Removing an account leaks its breaker entry in BreakerRegistry until restart. No correctness impact.
Google client_id / client_secret rotation
Rewriting the secret files (./secrets/<agent>_google_client_id.txt,
..._client_secret.txt) is picked up automatically on the next
google_* tool call — GoogleAuthClient checks file mtime before
each network hop and re-reads when it advanced. No reload call
required for that case. Audit log line:
INFO credentials.audit event="google_secrets_refreshed" \
google_*: re-read client_id/client_secret after on-disk rotation
Strict mode
agent --check-config --strict promotes warnings to errors. Two
checks behave differently under strict:
| Condition | Lenient | Strict |
|---|---|---|
Inline agents.<id>.google_auth block (legacy) | warn + auto-migrate | BuildError::LegacyInlineGoogleAuth, fail boot |
Asymmetric inbound ≠ outbound (no <ch>_asymmetric: true) | warn | error |
Run --strict in CI to gate PRs that touch credential YAML.
Migrating
- Add instance: + allow_agents: to each entry in whatsapp.yaml / telegram.yaml.
- Create config/plugins/google-auth.yaml with one accounts[] entry per agent that needs Gmail.
- Add credentials: to each agents.d/*.yaml.
- Run agent --check-config --strict. Fix every listed error.
- Commit.
pollers.yaml
The Phase 19 generic poller subsystem. One runner orchestrates N
modules — each module is an impl Poller (gmail, rss, calendar,
webhook_poll, or anything you write yourself) — and every module
shares the same scheduler, lease, breaker, cursor persistence, and
outbound dispatch via Phase 17 credentials.
Source: crates/poller/, crates/config/src/types/pollers.rs.
Top-level shape
pollers:
enabled: true
state_db: ./data/poller.db
default_jitter_ms: 5000
lease_ttl_factor: 2.0
failure_alert_cooldown_secs: 3600
breaker_threshold: 5
jobs:
- id: ana_leads
kind: gmail
agent: ana
schedule: { every_secs: 60 }
config:
query: "is:unread subject:lead"
deliver: { channel: whatsapp, to: "57300...@s.whatsapp.net" }
message_template: |
New lead 🚨
{snippet}
Absent file → subsystem off (no jobs spawn, no admin endpoint).
Top-level fields
| Field | Default | Purpose |
|---|---|---|
enabled | true | Master switch. false skips everything below. |
state_db | ./data/poller.db | SQLite path for poll_state + poll_lease. Created if missing. |
default_jitter_ms | 5000 | Random offset added to next_run_at when a job's schedule does not declare its own. Avoids thundering herd. |
lease_ttl_factor | 2.0 | Lease TTL = factor × interval (min 30s). A daemon that crashes mid-tick releases the lease via expiry; another worker takes over without rerunning side effects unless your module is non-idempotent. |
failure_alert_cooldown_secs | 3600 | Per-job cooldown for failure_to alerts. Persisted in poll_state.last_failure_alert_at so it survives restarts. |
breaker_threshold | 5 | Consecutive Transient errors before the per-job circuit breaker opens. |
jobs | [] | Per-job entries (see below). |
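The lease arithmetic from the table, sketched below; this is illustrative only — the real runner works off each job's resolved schedule interval.

```rust
use std::time::Duration;

/// Sketch of the lease TTL rule above: TTL = lease_ttl_factor × interval,
/// floored at 30 s so very fast jobs still hold a meaningful lease.
fn lease_ttl(interval: Duration, lease_ttl_factor: f64) -> Duration {
    let ttl = interval.mul_f64(lease_ttl_factor);
    ttl.max(Duration::from_secs(30))
}

fn main() {
    // every_secs: 60 with the default factor 2.0 → 120 s lease
    assert_eq!(lease_ttl(Duration::from_secs(60), 2.0), Duration::from_secs(120));
    // every_secs: 5 → 10 s raw, clamped up to the 30 s floor
    assert_eq!(lease_ttl(Duration::from_secs(5), 2.0), Duration::from_secs(30));
}
```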
Per-job fields
| Field | Required | Purpose |
|---|---|---|
id | ✅ | Unique. Used as session key for state, metrics, admin endpoints, lease. |
kind | ✅ | Discriminator. Must match a registered Poller::kind() (see Built-ins and Build a poller). |
agent | ✅ | Agent whose Phase 17 credentials this job uses. The runner looks up the binding for whatever channel the module needs (Google for fetch, WhatsApp/Telegram for outbound, etc). |
schedule | ✅ | One of every, cron, at (see Schedules). |
config | — | Module-specific options. Validated by Poller::validate at boot. Bad config rejects this job only — siblings keep loading. |
failure_to | — | { channel, to } for an alert when consecutive_errors crosses breaker_threshold. Optional — omit to log only. |
paused_on_boot | false | Persist paused = 1 in state at startup. Useful for staged rollouts. |
Schedules
# Repeat every N seconds. Most common.
schedule: { every_secs: 60 }
# 6-field cron: sec min hour dom mon dow.
schedule:
cron: "0 */5 * * * *" # every 5 minutes on the boundary
tz: "America/Bogota" # accepted; evaluated in UTC unless cron-tz feature on
stagger_jitter_ms: 2000 # local override for this job
# One-shot at an RFC3339 instant. After it fires the job stays paused.
schedule: { at: "2026-04-26T15:00:00Z" }
Built-ins
kind | Purpose | Cursor | Auth |
|---|---|---|---|
gmail | Search Gmail, regex extract, dispatch | Reserved (Gmail UNREAD + mark_read does dedup) | Phase 17 Google |
rss | RSS / Atom feeds | ETag + bounded seen-id ring | None |
webhook_poll | Generic JSON GET / POST | Bounded seen-id ring | None / custom headers |
google_calendar | Calendar v3 events incremental sync | nextSyncToken | Phase 17 Google |
gmail
- id: ana_leads
kind: gmail
agent: ana
schedule: { every_secs: 60 }
config:
query: "is:unread subject:(lead OR interesado)"
newer_than: "1d" # avoids back-filling years on first deploy
max_per_tick: 20
dispatch_delay_ms: 1000 # throttle between dispatches in same tick
sender_allowlist: ["@mycompany.com"]
extract:
name: "Nombre:\\s*(.+)"
phone: "Tel:\\s*(\\+?\\d+)"
require_fields: [name, phone]
message_template: |
New lead 🚨 {name} — {phone}
{snippet}
mark_read_on_dispatch: true
deliver: { channel: whatsapp, to: "57300...@s.whatsapp.net" }
Multiple gmail jobs for the same agent share a cached
GoogleAuthClient — token refreshes happen once across all jobs.
google_* errors are classified: 401 / invalid_grant / revoked
→ Permanent (auto-pause), 5xx / network → Transient (backoff).
rss
- id: ana_blog_watch
kind: rss
agent: ana
schedule: { every_secs: 600 }
config:
feed_url: https://example.com/feed.xml
max_per_tick: 5
message_template: "{title}\n{link}"
deliver: { channel: telegram, to: "1194292426" }
ETag from the previous response is sent as If-None-Match. 304 Not Modified produces a zero-cost tick.
webhook_poll
- id: ana_jira_assigned
kind: webhook_poll
agent: ana
schedule: { every_secs: 300 }
config:
url: https://company.atlassian.net/rest/api/3/search
method: GET
headers:
Authorization: "Bearer ${JIRA_TOKEN}"
Accept: "application/json"
items_path: "issues" # dotted path to the array; "" for root
id_field: "id" # field used for dedup
max_per_tick: 10
message_template: "[{key}] {fields}"
deliver: { channel: telegram, to: "1194292426" }
# SSRF guard — must opt in to hit private / loopback hosts:
# allow_private_networks: true
401 / 403 → Permanent. Any other 4xx → Permanent. 5xx →
Transient.
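The Transient / Permanent split feeding the breaker can be sketched as a plain status-code mapping. This is an illustration of the webhook_poll rule above; the built-in modules add their own nuances (for example Gmail's invalid_grant handling), and the enum here is a stand-in, not the crate's type.

```rust
/// Illustrative error classes as used by the per-job breaker logic described above.
#[derive(Debug, PartialEq)]
enum PollError {
    Transient, // backoff + counts toward breaker_threshold
    Permanent, // auto-pause the job; operator intervention needed
}

/// Sketch of the webhook_poll classification: 401/403 and any other 4xx are
/// Permanent, 5xx (and network errors, not shown) are Transient.
fn classify_status(status: u16) -> Option<PollError> {
    match status {
        200..=299 => None,
        400..=499 => Some(PollError::Permanent),
        _ => Some(PollError::Transient),
    }
}

fn main() {
    assert_eq!(classify_status(200), None);
    assert_eq!(classify_status(403), Some(PollError::Permanent));
    assert_eq!(classify_status(502), Some(PollError::Transient));
}
```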
google_calendar
- id: ana_calendar_sync
kind: google_calendar
agent: ana
schedule: { every_secs: 300 }
config:
calendar_id: primary
skip_cancelled: true
message_template: "📅 {summary} — {start}\n{html_link}"
deliver: { channel: telegram, to: "1194292426" }
First tick captures nextSyncToken and dispatches nothing (baseline).
Subsequent ticks use syncToken=... and dispatch the diff. 410 Gone
(token expired) is classified Permanent — operator runs
agent pollers reset <id> to re-baseline.
Multi-job per built-in
Same agent + same kind, multiple jobs — completely independent. The
runner gives each its own cursor, breaker, schedule, metrics, and
pause/resume controls. The GoogleAuthClient is the only thing
shared (intentional, so quota and refresh costs aren't multiplied).
# Three Gmail polls for Ana, all independent
- id: ana_leads
kind: gmail
agent: ana
schedule: { every_secs: 60 }
config:
query: "is:unread label:lead"
deliver: { channel: whatsapp, to: "57300...@s.whatsapp.net" }
# …
- id: ana_invoices
kind: gmail
agent: ana
schedule: { every_secs: 600 }
config:
query: "is:unread label:invoice"
deliver: { channel: telegram, to: "1194292426" }
# …
- id: ana_alerts
kind: gmail
agent: ana
schedule: { cron: "0 */15 * * * *" }
config:
query: "is:unread from:monitor@infra.com"
deliver: { channel: telegram, to: "9876543210" }
# …
Pause ana_invoices independently with
agent pollers pause ana_invoices.
CLI
agent pollers list # plain table; --json for machine output
agent pollers show ana_leads # detail of one job
agent pollers run ana_leads # manual tick (bypasses schedule + lease)
agent pollers pause ana_invoices # paused = 1
agent pollers resume ana_invoices
agent pollers reset ana_calendar_sync --yes # destructive; clears cursor
agent pollers reload # re-read pollers.yaml + diff
The daemon must be running (CLI hits the loopback admin server at
127.0.0.1:9091).
Admin endpoints
GET /admin/pollers
GET /admin/pollers/<id>
POST /admin/pollers/<id>/run
POST /admin/pollers/<id>/pause
POST /admin/pollers/<id>/resume
POST /admin/pollers/<id>/reset
POST /admin/pollers/reload
reload returns a ReloadPlan JSON: { add, replace, remove, keep }.
Validation runs across every job in the new file before any task is
touched — a typo never knocks healthy siblings offline.
Agent tools
When the poller subsystem is up, every agent gets six LLM-callable
tools registered on its ToolRegistry:
| Tool | Effect |
|---|---|
pollers_list | List every job + status |
pollers_show | Inspect one job |
pollers_run | Trigger a tick out-of-band |
pollers_pause | Set paused = 1 |
pollers_resume | Set paused = 0 |
pollers_reset | Wipe cursor + errors (destructive) |
Each registered Poller impl can also expose per-kind custom tools
via Poller::custom_tools() — gmail ships gmail_count_unread out
of the box. See Build a poller.
Create / delete are intentionally not exposed: prompt-injection
could plant a webhook_poll aimed at internal infra. Operators
own pollers.yaml + agent pollers reload.
Failure-destination
- id: ana_leads
kind: gmail
# …
failure_to:
channel: telegram
to: "1194292426" # alerts on the operator's chat
When the per-job circuit breaker trips
(consecutive_errors >= breaker_threshold), the runner publishes a
text message to the configured channel (resolved via Phase 17 just
like the happy path) and records the timestamp for cooldown
gating. Cooldown is failure_alert_cooldown_secs global default,
overridable per job in a future revision.
Observability
Seven Prometheus series exposed under /metrics:
| Series | Type | Labels |
|---|---|---|
poller_ticks_total | counter | kind, agent, job_id, status={ok,transient,permanent,skipped} |
poller_latency_ms | histogram | kind, agent, job_id |
poller_items_seen_total | counter | kind, agent, job_id |
poller_items_dispatched_total | counter | kind, agent, job_id |
poller_consecutive_errors | gauge | job_id |
poller_breaker_state | gauge | job_id (0=closed, 1=half-open, 2=open) |
poller_lease_takeovers_total | counter | job_id |
Migrating from gmail-poller.yaml
The legacy crate nexo-plugin-gmail-poller keeps its YAML schema
but no longer drives its own loop. On boot the wizard
auto-translates every legacy job into a kind: gmail entry, folds
it into cfg.pollers.jobs, and logs a deprecation warn. Explicit
entries in pollers.yaml win on id collision so a manual migration
is never clobbered.
To migrate cleanly:
- Run agent --check-config to print every translated id.
- Copy each into config/pollers.yaml under pollers.jobs, adjusting the agent: field if the legacy agent_id was inferred.
- Delete config/plugins/gmail-poller.yaml.
MiniMax M2.5
MiniMax M2.5 is the primary LLM provider for nexo-rs. It's the first provider implemented and the recommended default for new agents.
Source: crates/llm/src/minimax.rs, crates/llm/src/minimax_auth.rs.
Why it's primary
- Strong tool-calling support on both the OpenAI-compat wire and the Anthropic Messages wire
- Token Plan auth lets you run agents on a subscription without per-request billing headaches
- Aggressive price/performance for multi-agent deployments
If you don't have a specific reason to pick another provider, start with MiniMax.
Configuration
# config/llm.yaml
providers:
minimax:
api_key: ${MINIMAX_API_KEY:-}
group_id: ${MINIMAX_GROUP_ID:-}
base_url: https://api.minimax.io
rate_limit:
requests_per_second: 2.0
quota_alert_threshold: 100000
Per-agent selection:
# config/agents.d/ana.yaml
agents:
- id: ana
model:
provider: minimax
model: MiniMax-M2.5
Wire formats (api_flavor)
MiniMax exposes two HTTP shapes. The client auto-detects from
base_url but can be overridden via api_flavor.
api_flavor | Endpoint | Shape | When |
|---|---|---|---|
openai_compat (default) | {base_url}/text/chatcompletion_v2 | OpenAI chat completions | Regular API keys, most use cases |
anthropic_messages | {base_url}/v1/messages | Anthropic Messages | Token Plan / Coding keys served at api.minimax.io/anthropic |
Auto-detection: if base_url ends in /anthropic, the client picks
anthropic_messages automatically.
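Roughly, the detection amounts to the following sketch — an explicit api_flavor wins, otherwise the /anthropic suffix decides. The enum and function names are illustrative, not the client's actual code.

```rust
/// Illustrative api_flavor values from the table above.
#[derive(Debug, PartialEq)]
enum ApiFlavor {
    OpenAiCompat,
    AnthropicMessages,
}

/// Sketch of the auto-detection rule: an explicit override wins, otherwise a
/// base_url ending in `/anthropic` selects the Anthropic Messages wire.
fn detect_flavor(base_url: &str, explicit: Option<ApiFlavor>) -> ApiFlavor {
    if let Some(flavor) = explicit {
        return flavor;
    }
    if base_url.trim_end_matches('/').ends_with("/anthropic") {
        ApiFlavor::AnthropicMessages
    } else {
        ApiFlavor::OpenAiCompat
    }
}

fn main() {
    assert_eq!(detect_flavor("https://api.minimax.io", None), ApiFlavor::OpenAiCompat);
    assert_eq!(detect_flavor("https://api.minimax.io/anthropic", None), ApiFlavor::AnthropicMessages);
}
```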
Authentication
Static API key
Simple path: put the key in env or a secrets file.
Env var precedence (first wins):
- MINIMAX_CODE_PLAN_KEY
- MINIMAX_CODING_API_KEY
- ./secrets/minimax_code_plan_key.txt
- api_key field in llm.yaml
Token Plan OAuth bundle
For subscription-based access. The wizard writes a bundle to
./secrets/minimax_token_plan.json:
{
"access_token": "...",
"refresh_token": "...",
"expires_at": "2026-05-01T12:00:00Z",
"region": "https://api.minimax.io"
}
Auto-refresh: 60 seconds before expires_at, a background task
POSTs to {region}/oauth/token with grant_type=refresh_token and
rewrites the bundle atomically. Concurrent refreshes are serialized
behind a mutex — you never get two refresh calls in flight.
Mid-flight 401: if an API call returns 401 while holding what we thought was a valid token (clock skew, revocation), the client force-refreshes once and retries the request. A second 401 is surfaced as a credential error.
Shared OAuth client id for the MiniMax Portal flow:
78257093-7e40-4613-99e0-527b14b39113.
Request / response flow
sequenceDiagram
participant A as Agent loop
participant RL as RateLimiter
participant C as MiniMaxClient
participant AU as AuthSource
participant MX as MiniMax API
A->>C: chat(ChatRequest)
C->>RL: acquire()
C->>AU: fresh_bearer()
AU->>AU: refresh if <60s to expiry
AU-->>C: access_token
C->>MX: POST chatcompletion_v2 / v1/messages
alt 200
MX-->>C: ChatResponse
else 401
C->>AU: force_refresh()
C->>MX: retry once
else 429
MX-->>C: Retry-After
C-->>A: LlmError::RateLimit
else 5xx
MX-->>C: error body
C-->>A: LlmError::ServerError
end
Supported features
| Feature | OpenAI-compat | Anthropic-messages |
|---|---|---|
| Chat completions | ✅ | ✅ |
| Tool calling | ✅ | ✅ |
| Streaming (SSE) | ✅ | ✅ |
| Token usage in stream | ✅ (stream_options.include_usage) | ✅ native |
| Multimodal (images) | ✅ | ✅ |
| JSON mode | ✅ | limited |
Rate limiting
Per-provider token bucket. requests_per_second: 2.0 refills one slot
every 500 ms. Acquired before every request.
An optional quota_alert_threshold emits a structured warn log when
the remaining quota (if the provider reports it) crosses the threshold.
Useful for Prometheus alerting.
Error classification
| Response | Mapping | Behavior |
|---|---|---|
| 429 | LlmError::RateLimit { retry_after_ms } | Retried by the LLM retry layer (up to 5 attempts) |
| 5xx | LlmError::ServerError { status, body } | Retried (up to 3 attempts) |
| 401 | Internal auth refresh + single retry, then LlmError::CredentialInvalid | Fail-fast after refresh attempt |
| Other 4xx | LlmError::Other | Fail fast |
Common mistakes
- Forgetting group_id. MiniMax requires a group id alongside the key for most endpoints. The wizard sets this; manual configs often miss it.
- Pointing base_url at /anthropic with a regular API key. That endpoint is for Token Plan / Coding keys only — regular keys will 401. Leave base_url at https://api.minimax.io.
- Refreshing the bundle manually mid-flight. The client already serializes refreshes. Editing the file while the agent runs can lead to an atomic write race — stop the agent, edit, restart.
Anthropic / Claude
Native Anthropic client with multiple authentication paths: static API key, setup tokens, full OAuth PKCE subscription flow, or automatic import from the local Claude Code CLI.
Source: crates/llm/src/anthropic.rs, crates/llm/src/anthropic_auth.rs.
Phase 15 added the subscription flow end-to-end.
Configuration
# config/llm.yaml
providers:
anthropic:
api_key: ${ANTHROPIC_API_KEY:-}
base_url: https://api.anthropic.com
rate_limit:
requests_per_second: 2.0
auth:
mode: oauth_bundle
bundle: ./secrets/anthropic_oauth.json
Per-agent selection:
model:
provider: anthropic
model: claude-haiku-4-5
Authentication modes
auth.mode | Credential | Header |
|---|---|---|
static | api_key (sk-ant-…) | x-api-key: <key> |
setup_token | sk-ant-oat01-… (min 80 chars) | Authorization: Bearer <key> + anthropic-beta: oauth-2025-04-20 |
oauth_bundle | {access, refresh, expires_at} JSON | Authorization: Bearer <access> |
auto | tries all of the above in order | — |
auto resolution order
Used when auth.mode: auto or omitted:
flowchart TD
START[anthropic client build] --> B1{oauth_bundle<br/>file exists?}
B1 -->|yes| USE1[use OAuth bundle]
B1 -->|no| B2{Claude Code CLI<br/>credentials found?}
B2 -->|yes| USE2[import from<br/>~/.claude/.credentials.json]
B2 -->|no| B3{setup_token<br/>file exists?}
B3 -->|yes| USE3[use setup token]
B3 -->|no| B4{api_key<br/>set?}
B4 -->|yes| USE4[use static key]
B4 -->|no| FAIL([fail: no credentials])
OAuth bundle
The wizard runs a PKCE flow in the browser and writes the bundle to
./secrets/anthropic_oauth.json:
{
"access_token": "...",
"refresh_token": "...",
"expires_at": "2026-05-01T12:00:00Z"
}
- Refresh endpoint: https://console.anthropic.com/v1/oauth/token
- Refresh cadence: 60 seconds before expires_at, a background task POSTs grant_type=refresh_token
- Concurrency: all refreshes serialize behind a mutex
- Shared OAuth client id: 9d1c250a-e61b-44d9-88ed-5944d1962f5e
- Stale-token handling: a 401 mid-flight marks the token stale so the next refresh fires immediately instead of waiting for the expiry window
CLI credentials import
If you're already running Claude Code CLI on the same host, the client
auto-detects and imports ~/.claude/.credentials.json. Zero config —
if it exists and is valid, it's used.
Tool calling
Native Anthropic shape:
- Tool definitions: {name, description, input_schema}
- Tool invocation: tool_use blocks with id, name, input
- Tool result: tool_result blocks correlated via tool_use_id
Streaming uses native SSE; a dedicated parser in
crates/llm/src/stream.rs handles message_start, content_block_*,
and message_delta events.
Error classification
| Response | Mapping | Behavior |
|---|---|---|
| 429 | LlmError::RateLimit { retry_after_ms } (fallback 60s) | Retried |
| 401 / 403 | LlmError::CredentialInvalid with context (API vs OAuth) | Marks OAuth token stale; fails fast so the operator sees it |
| 5xx | LlmError::ServerError | Retried |
| Other 4xx | LlmError::Other | Fail fast |
Supported features
- Chat completions ✅
- Tool calling ✅
- Streaming (SSE) ✅
- Multimodal (images) ✅
- Prompt caching ✅ (via Anthropic beta headers)
- Extended thinking ✅ (model-dependent)
Common mistakes
- Setup-token string under 80 chars. The setup-token validator refuses it at parse time. Make sure you pasted the full string.
- api_key + oauth_bundle both set. The auth mode wins. The static key is kept only as a fallback the auto-resolver may pick up if the bundle is missing.
- Claude Code CLI credentials being used unintentionally. If auto mode is on and you installed the CLI on the host, that path wins before api_key. Set auth.mode: static to pin the static key.
OpenAI-compatible
Client for OpenAI itself and for any upstream that speaks the same wire: Ollama, Groq, OpenRouter, LM Studio, vLLM, Azure OpenAI, or your own proxy.
Source: crates/llm/src/openai_compat.rs.
Configuration
# config/llm.yaml
providers:
openai:
api_key: ${OPENAI_API_KEY:-}
base_url: https://api.openai.com/v1
rate_limit:
requests_per_second: 2.0
Per-agent:
model:
provider: openai
model: gpt-4o
Known-working upstreams
Point base_url at any of these and it works out of the box:
| Upstream | base_url |
|---|---|
| OpenAI | https://api.openai.com/v1 |
| Ollama | http://localhost:11434/v1 |
| Groq | https://api.groq.com/openai/v1 |
| OpenRouter | https://openrouter.ai/api/v1 |
| LM Studio | http://localhost:1234/v1 |
| vLLM | http://<host>:<port>/v1 |
| Azure OpenAI | Azure resource URL (path differs — see Common mistakes below) |
| MiniMax (compat mode) | https://api.minimax.io |
Authentication
Single mode: static API key sent as Authorization: Bearer <key>.
Some upstreams ignore the key entirely (Ollama, local vLLM) — supply
any non-empty string to satisfy the config validator.
Features & gaps
| Feature | Status |
|---|---|
| Chat completions | ✅ |
| Tool calling | ✅ (OpenAI function-calling shape) |
| Streaming | ✅ |
| tool_choice: auto \| required \| none \| {type:function} | ✅ |
| JSON mode / structured outputs | upstream-dependent |
| Multimodal | upstream-dependent |
| Embeddings | supported for OpenAI proper; other upstreams may vary |
Feature gating when the upstream lacks support: we do not pre-probe
features — a call that requires a feature the upstream doesn't speak
will fail with the upstream's own error (typically a 400). The error
bubbles up as LlmError::Other and does not retry, so you notice
quickly.
Error classification
| Response | Mapping | Behavior |
|---|---|---|
| 429 | LlmError::RateLimit (fallback 30s) | Retried |
| 5xx | LlmError::ServerError | Retried |
| Other 4xx | LlmError::Other | Fail fast |
Common mistakes
- Trailing slash in base_url. Some upstreams are lenient, some are not. Stick to the form shown in the table.
- Using Azure OpenAI without the deployment path. Azure requires an extra segment (/openai/deployments/<name>/chat/completions) that the vanilla OpenAI path doesn't. Currently not supported out of the box; use a proxy or a custom provider if you need Azure.
- Relying on JSON mode everywhere. Many local servers don't enforce schemas. Validate the response yourself when using Ollama / LM Studio for critical tool args.
DeepSeek
Connector for DeepSeek's hosted models. The API is OpenAI-compatible
end to end (same /v1/chat/completions shape, same SSE streaming,
same Bearer auth) so the connector is a thin factory that wraps
OpenAiClient with DeepSeek's default endpoint.
Source: crates/llm/src/deepseek.rs.
Configuration
# config/llm.yaml
providers:
deepseek:
api_key: ${DEEPSEEK_API_KEY}
# base_url defaults to https://api.deepseek.com/v1 when blank.
# Override only for self-hosted gateways or testing fixtures.
base_url: ""
rate_limit:
requests_per_second: 2.0
quota_alert_threshold: 100000
Pin the agent to it:
agents:
- id: ana
model:
provider: deepseek
model: deepseek-chat
Models
| Model id | Use case |
|---|---|
deepseek-chat | General-purpose. Supports tool calling. |
deepseek-reasoner | Long-form reasoning. No tool calling in current API revision. |
deepseek-reasoner agents must therefore leave allowed_tools empty
(or list only tools the agent never plans to invoke). Tool calls fired
against the reasoner endpoint return an error from upstream.
Streaming
Identical to OpenAI's SSE format, so OpenAiClient::chat_stream parses
it without per-provider code. nexo_llm_stream_ttft_seconds and
nexo_llm_stream_chunks_total Prometheus series labelled with
provider="deepseek" show up automatically.
Tool calling
deepseek-chat follows OpenAI's tool-calling spec verbatim. JSON
arguments deserialise the same way; parallel_tool_calls is honoured.
Rate limits
DeepSeek returns standard 429 with a retry-after header. The
existing retry plumbing (crates/llm/src/retry.rs) consumes that
header so 429s back off cleanly without touching the connector.
Quota / cost
DeepSeek's pricing is per-1M-tokens; the TokenUsage returned by
each ChatResponse is forwarded to the standard
agent_llm_tokens_total counter (labels: provider="deepseek",
model, usage_kind).
Known limitations
- No native embeddings client — DeepSeek does not currently publish an embeddings endpoint. Use a different provider for embedding_model if your agent needs vector search.
- Reasoner tool-call gap — see Models. Validate at boot by leaving allowed_tools: [] on agents pinned to deepseek-reasoner.
- Cache awareness — DeepSeek's KV-cache hit information is surfaced through the same cache_usage field as the OpenAI client reports it.
See also
- OpenAI-compatible — same wire format, full notes on the underlying client.
- Rate limiting & retry — backoff policy.
Rate limiting & retry
Every LLM provider client sits behind a token bucket and a bounded retry policy with decorrelated jittered exponential backoff. This page is the definitive reference for those two mechanisms.
Source: crates/llm/src/retry.rs, crates/llm/src/rate_limiter.rs,
crates/llm/src/quota_tracker.rs.
Rate limiter
Token bucket, acquired before every outbound request.
- interval = 1 / requests_per_second
- One token per request
- Bucket refills one slot per interval
- Per-provider, per-agent — each client has its own bucket, so one noisy agent can't starve another even when they share a provider
rate_limit:
requests_per_second: 2.0
quota_alert_threshold: 100000 # optional
At 2.0 rps, the bucket tops up a slot every 500 ms. A burst of 3
requests will wait briefly on the third.
Quota tracker
Optional. When a provider returns remaining-quota info (header,
response body), quota_tracker records it via record_usage() on the
token response. If the remaining crosses quota_alert_threshold, a
structured warn log is emitted:
WARN quota threshold crossed provider=minimax remaining=99500 threshold=100000
Pair with a Prometheus log-scraping rule for an alert.
Retry policy
Retries live above the circuit breaker. They handle transient failures that don't warrant flipping the breaker.
| Error class | Max attempts | Backoff curve |
|---|---|---|
| 429 (rate limit) | 5 | max(retry-after, jittered_backoff) |
| 5xx (server) | 3 | jittered_backoff |
| 401 (auth) | 1 refresh + 1 retry | (internal to the client) |
| Other 4xx | 0 (fail fast) | — |
Decorrelated jittered backoff
Not simple exponential — the next backoff is a uniform random draw in a growing range:
next = uniform(base, max(base, last × multiplier))
Defaults from llm.yaml retry block:
| Field | Default |
|---|---|
initial_backoff_ms | 1000 |
max_backoff_ms | 60000 |
backoff_multiplier | 2.0 |
Why decorrelated jitter: multiple clients hitting the same 429 don't re-fire in lockstep. Desynchronization is built-in.
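A sketch of that draw, assuming the rand crate for the uniform sample; the actual retry loop lives in crates/llm/src/retry.rs and may differ in detail.

```rust
// Sketch only — assumes the `rand` crate for the uniform draw.
use rand::Rng;

/// One step of decorrelated jittered backoff:
/// next = uniform(base, max(base, last × multiplier)), capped at `max_ms`.
fn next_backoff_ms(base_ms: u64, last_ms: u64, multiplier: f64, max_ms: u64) -> u64 {
    let upper = ((last_ms as f64 * multiplier) as u64).max(base_ms);
    let drawn = rand::thread_rng().gen_range(base_ms..=upper);
    drawn.min(max_ms)
}

fn main() {
    // Defaults from llm.yaml: 1000 ms base, ×2.0 multiplier, capped at 60000 ms.
    let mut last = 1000;
    for attempt in 1..=5 {
        last = next_backoff_ms(1000, last, 2.0, 60_000);
        println!("attempt {attempt}: wait {last} ms");
    }
}
```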
flowchart LR
REQ[request] --> API{API response}
API -->|200| OK[return ChatResponse]
API -->|429| RL[RateLimit]
API -->|5xx| SE[ServerError]
API -->|401| AU[CredentialInvalid]
API -->|4xx| F[Other fail fast]
RL --> D1{attempts<br/>< 5?}
SE --> D2{attempts<br/>< 3?}
AU --> REF[auth refresh<br/>+ single retry]
D1 -->|yes| BO1[wait max(retry_after,<br/>jittered_backoff)]
D1 -->|no| F
D2 -->|yes| BO2[wait jittered_backoff]
D2 -->|no| F
BO1 --> REQ
BO2 --> REQ
REF --> REQ
Error classification per provider
The providers classify HTTP responses into a shared LlmError so the
retry layer can be common code:
| HTTP | LlmError variant | Retried? |
|---|---|---|
| 200 | Ok(ChatResponse) | — |
| 429 | RateLimit { retry_after_ms } | ✅ up to 5 |
| 5xx | ServerError { status, body } | ✅ up to 3 |
| 401 / 403 | CredentialInvalid | ❌ (client handles refresh internally) |
| Other 4xx | Other | ❌ |
Tuning
- Bursty workloads: bump requests_per_second cautiously; the upstream's own rate limits won't move, so you'll just pay more 429s to find the ceiling.
- Flaky networks: raise max_attempts for 5xx; keep max_backoff_ms bounded so slow agents don't spiral.
- Subscription plans: lower requests_per_second to keep daily usage under caps; pair with quota_alert_threshold.
See also
WhatsApp
End-to-end WhatsApp channel: Signal Protocol pairing, inbound message bridge, outbound send/reply/reaction/media tools, optional voice transcription.
Source: crates/plugins/whatsapp/ (thin wrapper over the
whatsapp-rs crate).
Topics
| Direction | Subject | Notes |
|---|---|---|
| Inbound | plugin.inbound.whatsapp | Legacy single-account |
| Inbound | plugin.inbound.whatsapp.<instance> | Multi-account routing |
| Outbound | plugin.outbound.whatsapp | Legacy single-account |
| Outbound | plugin.outbound.whatsapp.<instance> | Multi-account routing |
During pairing the plugin also publishes qr lifecycle events on the
inbound topic so the wizard can render the QR.
Config
# config/plugins/whatsapp.yaml
whatsapp:
enabled: true
session_dir: "" # empty → per-agent default
media_dir: ./data/media/whatsapp
instance: default
acl:
allow_list: [] # empty + empty env = open ACL
from_env: WA_AGENT_ALLOW
behavior:
ignore_chat_meta: true
ignore_from_me: true
ignore_groups: false
bridge:
response_timeout_ms: 30000
on_timeout: noop # noop | apology_text
transcriber:
enabled: false
skill: whisper
public_tunnel:
enabled: false
only_until_paired: true
Key fields:
| Field | Default | Purpose |
|---|---|---|
session_dir | per-agent | Signal Protocol state. Each account needs its own dir. |
instance | None | Label for multi-account routing. Unlabelled keeps the legacy bare topic. |
allow_agents | [] | Agents permitted to publish from this instance. Empty = accept any agent holding a resolver handle. Defense-in-depth for the per-agent credentials binding. |
acl.allow_list | [] | Bare JIDs allowed to reach the agent. Empty + empty env = open. |
behavior.ignore_chat_meta | true | Skip muted / archived / locked chats on the phone. |
behavior.ignore_from_me | true | Drop the agent's own replies to prevent loops. |
behavior.ignore_groups | false | Skip group chats entirely when true. |
bridge.response_timeout_ms | 30000 | Per-message handler deadline. |
bridge.on_timeout | noop | noop (no reply) or apology_text. |
transcriber.enabled | false | Voice → text via skill. |
public_tunnel.enabled | false | Expose /whatsapp/pair through a Cloudflare tunnel. |
public_tunnel.only_until_paired | true | Tear down the tunnel after Connected. |
Pairing
Pairing is setup-time only. The runtime refuses to start without paired credentials.
sequenceDiagram
participant U as Operator
participant W as agent setup
participant WA as whatsapp-rs Client
participant P as Phone
U->>W: setup pair whatsapp --agent ana
W->>WA: new_in_dir(session_dir)
WA-->>W: QR image
W-->>U: render QR (Unicode blocks)
U->>P: Settings → Linked Devices → scan
P->>WA: pair
WA-->>W: Connected
W->>W: persist creds to session_dir/.whatsapp-rs/creds.json
- Credentials at <session_dir>/.whatsapp-rs/creds.json
- Daemon-collision check at <session_dir>/.whatsapp-rs/daemon.json blocks a second process on the same account
- Multi-account via Client::new_in_dir() — no XDG_DATA_HOME mutation
- Credential expiry mid-run (401 loop) → operator must re-pair; no runtime QR fallback
Tools exposed to the LLM
| Tool | Signature | Notes |
|---|---|---|
whatsapp_send_message | (to, text) | Send to arbitrary JID. |
whatsapp_send_reply | (chat, reply_to_msg_id, text) | Quote a specific inbound message. |
whatsapp_send_reaction | (chat, msg_id, emoji) | Emoji tap-back. |
whatsapp_send_media | (to, file_path, caption?, mime?) | File attachment. |
All tools honor the per-binding outbound_allowlist.whatsapp —
empty list = unrestricted, populated = hard allowlist.
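The allowlist semantics reduce to a one-liner; a hedged sketch (the function name is illustrative):

// Empty allowlist = unrestricted; populated = hard allowlist. Sketch only.
fn outbound_allowed(allowlist: &[String], recipient_jid: &str) -> bool {
    allowlist.is_empty() || allowlist.iter().any(|jid| jid == recipient_jid)
}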
Event shapes
Inbound payloads (on plugin.inbound.whatsapp[.<instance>]):
// message
{
"kind": "message",
"from": "573000000000@s.whatsapp.net",
"chat": "573000000000@s.whatsapp.net",
"text": "hi",
"reply_to": null,
"is_group": false,
"timestamp": 1714000000,
"msg_id": "3EB0..."
}
// media_received
{
"kind": "media_received",
"from": "...",
"chat": "...",
"msg_id": "...",
"local_path": "./data/media/whatsapp/abc.jpg",
"mime": "image/jpeg",
"caption": null
}
// qr (pairing only)
{"kind": "qr", "ascii": "...", "png_base64": "...", "expires_at": ...}
// lifecycle
{"kind": "connected" | "disconnected" | "reconnecting" | "credentials_expired"}
// observability
{"kind": "bridge_timeout", "msg_id": "...", "waited_ms": 30000}
Gotchas
- Shared session_dir across agents = cross-delivery. Each agent should point at its own <workspace>/whatsapp/default. The wizard does this automatically; manual configs need care.
- ignore_chat_meta: true silently skips muted/archived chats. If a user archives a chat on the phone, the agent never sees it again until they unarchive.
- Credential expiry is irreversible without re-pair. whatsapp-rs will loop on 401. Watch for credentials_expired lifecycle events and alert.
See Setup wizard — WhatsApp pairing.
Telegram
Bot API channel with long-polling intake, multi-bot routing, full send/reply/reaction/edit/location/media tool surface, and optional voice auto-transcription.
Source: crates/plugins/telegram/.
Topics
| Direction | Subject | Notes |
|---|---|---|
| Inbound | plugin.inbound.telegram | Legacy single-bot |
| Inbound | plugin.inbound.telegram.<instance> | Per-bot routing |
| Outbound | plugin.outbound.telegram | Legacy single-bot |
| Outbound | plugin.outbound.telegram.<instance> | Per-bot routing |
Each instance subscribes only to its own outbound topic, so two bots in the same process don't cross-wire.
Config
# config/plugins/telegram.yaml
telegram:
token: ${file:./secrets/telegram_token.txt}
instance: sales_bot
polling:
enabled: true
interval_ms: 25000
offset_path: ./data/media/telegram/sales_bot.offset
allowlist:
chat_ids: [] # empty = accept all
auto_transcribe:
enabled: false
command: ./extensions/openai-whisper/target/release/openai-whisper
language: es
bridge_timeout_ms: 120000
Key fields:
| Field | Default | Purpose |
|---|---|---|
token | — (required) | Bot API token from @BotFather. |
instance | None | Label for multi-bot routing. Unlabelled keeps the legacy bare topic. |
allow_agents | [] | Agents permitted to publish from this bot. Empty = accept any agent holding a resolver handle. Defense-in-depth for the per-agent credentials binding. |
polling.enabled | true | Long-polling intake. Webhook not yet supported. |
polling.interval_ms | 25000 | Long-poll timeout hint. Telegram clamps to [1 s, 50 s]. |
polling.offset_path | ./data/media/telegram/offset | File to persist update offset across restarts. |
allowlist.chat_ids | [] | Numeric chat ids allowed. Empty = accept all. |
auto_transcribe.enabled | false | Voice → text. |
auto_transcribe.command | ./extensions/openai-whisper/.../openai-whisper | Path to whisper binary. |
bridge_timeout_ms | 120000 | Handler deadline before a bridge_timeout event fires. |
Auth
Single mode: static bot token. No OAuth. Store it under
./secrets/ and reference via ${file:...}.
flowchart LR
SETUP[agent setup] --> ASK[ask for bot token]
ASK --> F[./secrets/telegram_token.txt]
F -.->|${file:...}| CFG[config/plugins/telegram.yaml]
CFG --> RUN[runtime: HTTP Bot API with long-poll]
Tools exposed to the LLM
| Tool | Notes |
|---|---|
telegram_send_message | Send text to chat id (negative for groups/channels). |
telegram_send_reply | Quote a specific prior message. |
telegram_send_reaction | Emoji on a message. |
telegram_edit_message | Modify a prior message's text. |
telegram_send_location | GPS coordinates. |
telegram_send_media | File upload with caption and mime hint. |
All tools enforce outbound_allowlist.telegram per binding.
Event shapes
// message
{
"kind": "message",
"from": "12345",
"chat": "12345",
"chat_type": "private",
"text": "hi",
"reply_to": null,
"is_group": false,
"timestamp": 1714000000,
"msg_id": "42",
"username": "jdoe",
"media": [],
"latitude": null,
"longitude": null,
"forward": null
}
// media item (inside `media`)
{
"kind": "voice" | "photo" | "video" | "document" | "audio",
"local_path": "./data/media/telegram/....ogg",
"file_id": "AgACAgEA...",
"mime_type": "audio/ogg",
"duration_s": 4,
"width": null,
"height": null,
"file_name": null
}
// callback_query (inline-keyboard button press, auto-ACKed)
{"kind": "callback_query", "from": "...", "chat": "...", "data": "buy"}
// chat_membership
{"kind": "chat_membership", "chat": "...", "status": "added" | "kicked" | ...}
// lifecycle
{"kind": "connected" | "disconnected"}
{"kind": "bridge_timeout", "msg_id": "...", "waited_ms": ...}
Forwarded messages include a forward object:
"forward": {
"source": "user" | "channel" | "chat",
"from_user_id": 12345,
"from_chat_id": null,
"date": 1714000000
}
Gotchas
- Webhook mode is not supported yet. Long-polling only.
- polling.interval_ms is clamped by Telegram. Values outside [1000, 50000] get capped server-side; the default 25000 is a good middle ground.
- Negative chat ids are groups/channels. Telegram uses negative ids for group chats; positive for private. Don't strip the sign.
- Auto-transcribe requires the whisper skill extension. The command path must point at a working binary, otherwise inbound voice messages arrive without text.
Email
Generic SMTP/IMAP plugin. Scaffolded but not yet wired — the config shape is defined, but no tool surface or inbound bridge ships yet. For a working Gmail → agent pipeline today, use gmail-poller.
Source: crates/plugins/email/ (empty lib.rs),
config in crates/config/src/types/plugins.rs.
Config
# config/plugins/email.yaml
email:
smtp:
host: smtp.example.com
port: 587
username: agent@example.com
password: ${file:./secrets/email_password.txt}
imap:
host: imap.example.com
port: 993
| Field | Default | Purpose |
|---|---|---|
smtp.host | — (required) | SMTP server. |
smtp.port | 587 | SMTP port. |
smtp.username | — (required) | SMTP auth user. |
smtp.password | — (required) | SMTP auth password. |
imap.host | — | IMAP server (inbound). |
imap.port | 993 | IMAP port. |
Status
- No NATS topics active
- No tools exposed to the LLM
- No inbound bridge
- Config schema reserved so future phases can land incrementally
What to use instead
For inbound triage:
- gmail-poller — cron-style Gmail polling with regex capture groups and template-based dispatch to any plugin.outbound.* topic. Production-ready.
For outbound notifications:
- Delegate to a send agent wired to a transactional-email provider via a custom extension, until this plugin lands.
Track progress under the future Phase 17 in ../PHASES.md.
Browser (Chrome DevTools Protocol)
Drives a real Chrome/Chromium instance via CDP. Agents can navigate, click, fill, screenshot, and run JS — with stable element refs that work across DOM mutations within a single turn.
Source: crates/plugins/browser/.
Topics
| Direction | Subject | Notes |
|---|---|---|
| Outbound | plugin.outbound.browser | Tool invocations |
| Events | plugin.events.browser.<method_suffix> | Mirrored CDP notifications |
Browser is an outbound-only plugin — there is no unsolicited inbound event from a web page to the agent.
Config
# config/plugins/browser.yaml
browser:
headless: false
executable: "" # empty → search PATH
cdp_url: "" # empty → launch new Chrome
user_data_dir: ./data/browser/profile
window_width: 1280
window_height: 800
connect_timeout_ms: 10000
command_timeout_ms: 15000
args: [] # extra CLI flags for Chrome
| Field | Default | Purpose |
|---|---|---|
headless | false | Launch Chrome without a UI. |
executable | "" | Chrome binary path. Empty = search PATH. |
cdp_url | "" | Connect to an existing Chrome DevTools server (e.g. http://127.0.0.1:9222). Empty = launch a new instance. |
user_data_dir | ./data/browser/profile | Chrome profile cache. Keeps cookies, logins. |
window_width / window_height | 1280 / 800 | Viewport. |
connect_timeout_ms | 10000 | How long to wait for Chrome startup / remote connect. |
command_timeout_ms | 15000 | Per-CDP-command execution timeout. |
args | [] | Extra CLI flags forwarded verbatim to the spawned Chrome. Ignored when cdp_url is set. Later args win — use this to override built-in flags when a restricted environment needs it (e.g. --no-sandbox on Termux). |
Auth
None. CDP is an unauthenticated protocol — use cdp_url only with a
loopback / firewalled Chrome.
Tools exposed to the LLM
| Tool | Purpose |
|---|---|
browser_navigate | Load URL and wait for load event. |
browser_click | Click by element ref (@e12) or CSS selector. |
browser_fill | Type into input / textarea / contenteditable. Replaces content. |
browser_screenshot | Base64 PNG of the viewport. |
browser_evaluate | Run JS, return value as JSON. |
browser_snapshot | Text DOM tree with stable element refs. |
browser_scroll_to | Scroll a target element into view. |
browser_current_url | Current page URL. |
browser_wait_for | Poll for an element to appear. |
browser_go_back / browser_go_forward | Navigation history. |
browser_press_key | Keyboard events. |
All tools are prefixed browser_* for glob filtering in
allowed_tools.
Element refs
browser_snapshot emits a text tree where every actionable element
has a ref like @e12. Those refs are stable within the snapshot
turn but invalidated by any subsequent DOM mutation:
sequenceDiagram
participant A as Agent
participant B as Browser plugin
participant C as Chrome
A->>B: browser_snapshot
B->>C: DOM.describeNode(..)
C-->>B: tree
B-->>A: "Login @e12\nEmail @e13\n..."
A->>B: browser_fill(@e13, "user@…")
B->>C: DOM.focus + Input.dispatch
A->>B: browser_click(@e12)
Note over A,B: refs still valid<br/>(same snapshot turn)
A->>B: browser_snapshot
Note over B: refs from prior snapshot<br/>now INVALID
Rule: take a snapshot, act on refs from that snapshot, take a new snapshot before acting again.
Gotchas
- browser_fill replaces content. No append mode. To add text to existing content, read the current value first (via evaluate), then send the merged string.
- Connecting to an existing Chrome (cdp_url) skips the profile setup. Any login state is whatever that Chrome already has.
- Element refs expire on DOM mutation. The plugin does not auto-refresh — refs from a stale snapshot will error or misfire.
- Headless sites break. Some sites detect headless Chrome and behave differently. Use headless: false for those.
Google (OAuth, Gmail, Calendar, Drive) + gmail-poller
Two related subsystems:
- google plugin — per-agent OAuth client plus a generic google_call tool that lets an agent hit any Google API the granted scopes allow
- gmail-poller plugin — cron-style scheduler that polls Gmail, matches subjects/bodies with regex, and dispatches results to any outbound topic (WhatsApp, Telegram, another agent)
Sources: crates/plugins/google/ and crates/plugins/gmail-poller/.
google — per-agent OAuth
Config
Two shapes supported:
Preferred (Phase 17) — declare accounts in a dedicated store and
bind them from the agent via credentials.google:
# config/plugins/google-auth.yaml
google_auth:
accounts:
- id: ana@gmail.com
agent_id: ana # 1:1; gauntlet enforces the binding
client_id_path: ./secrets/google/ana_client_id.txt
client_secret_path: ./secrets/google/ana_client_secret.txt
token_path: ./secrets/google/ana_token.json
scopes:
- https://www.googleapis.com/auth/gmail.modify
Gmail-poller picks these up automatically; agents see google_* tools
when the store has an entry matching their agent_id.
Legacy inline (still works, logs a migration warn):
# agents.yaml
google_auth:
client_id: ${GOOGLE_CLIENT_ID}
client_secret: ${file:./secrets/google_secret.txt}
scopes:
- gmail.readonly
- gmail.send
- calendar
- drive.file
token_file: ./data/workspace/ana/google_token.json
redirect_port: 17653
| Field | Default | Purpose |
|---|---|---|
client_id / client_secret | — | OAuth app creds from Google Cloud Console. |
scopes | — | OAuth scopes. Short-form (gmail.readonly) auto-expanded to full URL. |
token_file | google_tokens.json | Persistent refresh-token JSON. Relative paths resolve from workspace. |
redirect_port | 8765 | Loopback callback port. Must match the "Authorized redirect URI" in the OAuth client. |
Pairing flow
sequenceDiagram
participant A as Agent LLM
participant T as google_auth_start
participant B as Browser
participant L as Loopback listener<br/>127.0.0.1:<port>/callback
participant G as Google OAuth
A->>T: invoke
T->>L: start listener
T-->>A: return consent URL
A->>B: ask user to open URL
B->>G: consent flow
G->>L: redirect w/ code
L->>G: exchange code → tokens
L->>L: persist refresh_token<br/>(mode 0o600)
L-->>A: success
The wizard wraps this as a one-shot step, but runtime tools expose the same primitives for re-auth.
Device-code flow (headless setup)
agent setup google offers a second consent path that does not
require a local browser — useful for servers, CI, and SSH-only
environments. The wizard:
- POSTs to oauth2.googleapis.com/device/code with the account's client_id and scopes.
- Prints a 6-character user_code + a verification_url to the terminal.
- Polls oauth2.googleapis.com/token (default every 5 s) until the operator approves on any device.
- Persists the resulting refresh_token at token_path with mode 0o600.
╭─ Device-code OAuth ───────────────────────────────────────
│ Open in any browser: https://www.google.com/device
│ Code to enter: HBQM-WLNF
│ (valid for 1800s)
╰───────────────────────────────────────────────────────────
Waiting for approval...
✔ Tokens persisted at ./secrets/ana_google_token.json.
The Google Cloud Console OAuth client must be type "TVs and
Limited Input devices" for this flow — Desktop/Web clients reject
device-code with client_type_disabled.
Lazy-refresh of client_id / client_secret
GoogleAuthClient.config is ArcSwap<GoogleAuthConfig>. Every
network call (exchange_code, request_device_code,
poll_device_token, refresh_token) first invokes
refresh_secrets_if_changed, which compares mtime on
client_id_path and client_secret_path and re-reads them when
they advance. Rotating the secret files (e.g. quarterly key
rotation in Google Cloud Console) takes effect on the next
tool call without a daemon restart.
Steady-state cost: one fs::metadata call per outbound request.
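A sketch of the mtime comparison with simplified state; the real code swaps an ArcSwap'd config rather than mutating in place:

use std::{fs, path::Path, time::SystemTime};

// Sketch only: re-read a secret file when its mtime advances past the last-seen value.
fn reread_if_rotated(path: &Path, last_seen: &mut SystemTime, value: &mut String) -> std::io::Result<()> {
    let mtime = fs::metadata(path)?.modified()?;
    if mtime > *last_seen {
        *value = fs::read_to_string(path)?.trim().to_string();
        *last_seen = mtime;
    }
    Ok(())
}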
Audit trail (target credentials.audit):
INFO event="google_secrets_refreshed" \
google_*: re-read client_id/client_secret after on-disk rotation
Tools exposed
| Tool | Purpose |
|---|---|
google_auth_start | Start OAuth, return the consent URL. |
google_auth_status | Report {authenticated, expires_in_secs, has_refresh, scopes}. Safe to poll. |
google_call | Generic {method, url, body?} against any *.googleapis.com endpoint. Auto-refreshes access token. |
google_auth_revoke | Revoke the refresh token; forces full re-auth. |
Supported APIs
Anything under *.googleapis.com that the granted scopes permit.
Common call shapes:
- Gmail v1 — https://gmail.googleapis.com/gmail/v1/users/me/messages?q=is:unread
- Calendar v3 — https://www.googleapis.com/calendar/v3/calendars/primary/events
- Drive v3 — https://www.googleapis.com/drive/v3/files?q=mimeType='application/pdf'
- Sheets v4 — https://sheets.googleapis.com/v4/spreadsheets/<id>/values/A1:D10
Gotchas
- 401 means the refresh token was revoked. Re-auth via google_auth_start.
- 403 means a scope wasn't granted. Add the scope, revoke, re-auth.
- Token file leaks → revoke immediately. The file holds a refresh token with the granted scopes.
gmail-poller — cron-style Gmail bridge
Poll Gmail, extract fields via regex, render a template, dispatch to any outbound topic. Multi-account, allowlisted by sender substring, rate-limited per dispatch.
Config
# config/plugins/gmail-poller.yaml
gmail_poller:
enabled: true
interval_secs: 60
accounts:
- id: default
agent_id: ana # Phase 17 — binds the account to an agent; defaults to `id` when omitted
token_path: ./data/workspace/ana/google_token.json
client_id_path: ./secrets/google_client_id.txt
client_secret_path: ./secrets/google_client_secret.txt
jobs:
- name: lead_forward
account: default
query: "is:unread subject:(lead OR interesado)"
newer_than: 1d
interval_secs: 120
forward_to_subject: plugin.outbound.whatsapp.default
forward_to: "573000000000@s.whatsapp.net"
extract:
name: "Nombre:\\s*(.+)"
phone: "Tel:\\s*(\\+?\\d+)"
require_fields: [name, phone]
message_template: |
New lead 🚨
{name} — {phone}
Subject: {subject}
{snippet}
mark_read_on_dispatch: true
max_per_tick: 20
dispatch_delay_ms: 1000
sender_allowlist: ["@mycompany.com", "partners@"]
Per-job fields
| Field | Default | Purpose |
|---|---|---|
name | — (required) | Job id. |
account | "default" | Which OAuth account to use. |
query | — (required) | Gmail search (is:unread, etc.). |
newer_than | — | Gmail newer_than: suffix (1d, 2h) — avoids back-filling. |
interval_secs | root interval | Override per-job poll cadence. |
forward_to_subject | — | Broker topic to publish dispatched message. |
forward_to | — | Recipient passed through (JID, chat id, phone). |
extract | {} | Named regex groups applied to the email body. First group wins. |
require_fields | [] | Skip dispatch if any listed extracted field is empty. |
message_template | — (required) | Template with {field}, {subject}, {snippet} placeholders. |
mark_read_on_dispatch | true | Mark the thread as read after successful dispatch. |
dispatch_delay_ms | 1000 | Sleep between multi-match dispatches. |
max_per_tick | 20 | Hard cap per poll cycle. |
sender_allowlist | [] | Substring/domain filter on From: header. Empty = accept all. |
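A sketch of the extract-then-render pipeline a job runs per matching email. Function names are illustrative; only the first-capture-group rule and the {field} placeholder semantics come from this page:

use regex::Regex;
use std::collections::HashMap;

// Apply each named extract pattern to the email body; the first capture group wins.
fn extract_fields(patterns: &HashMap<String, String>, body: &str) -> HashMap<String, String> {
    let mut out = HashMap::new();
    for (name, pattern) in patterns {
        let Ok(re) = Regex::new(pattern) else { continue }; // invalid regexes are rejected at boot
        if let Some(m) = re.captures(body).and_then(|c| c.get(1)) {
            out.insert(name.clone(), m.as_str().to_string());
        }
    }
    out
}

// Substitute {field} placeholders in message_template with the extracted values.
fn render(template: &str, fields: &HashMap<String, String>) -> String {
    let mut rendered = template.to_string();
    for (k, v) in fields {
        rendered = rendered.replace(&format!("{{{k}}}"), v);
    }
    rendered
}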
Event shape
{
"to": "<forward_to>",
"kind": "text",
"text": "<rendered message_template>",
"subject": "<email subject>",
"<extract key>": "<captured group>"
}
Published to <forward_to_subject>.
Error backoff
Sustained errors are backed off: [0, 0, 0, 30, 60, 120, 300] seconds
(caps at 300). Transient failures don't stop the poll loop.
Gotchas
- Gmail API only — no IMAP. This plugin is Google-specific. For generic IMAP triage, use a custom extension.
- sender_allowlist is substring, not regex. Simpler to read, simpler to get wrong. Quote boundary characters explicitly.
- extract regex must compile. Invalid regex fails the whole job at boot with an error naming the field.
See also
Short-term memory
Per-session conversational buffer held entirely in memory. Tracks the turns of the ongoing conversation so the LLM has context on every completion request.
Source: crates/core/src/session/ (types.rs, manager.rs) — the
Session struct owns the short-term buffer.
What lives in a session
Each Session stores:
| Field | Type | Purpose |
|---|---|---|
history | Vec<Interaction> | FIFO of turns (role + content + timestamp) |
context | serde_json::Value | Free-form JSON blob for per-session state |
last_access | timestamp | Used by TTL sweeper and cap eviction |
An Interaction is {role: User | Assistant | Tool, content, timestamp}.
Sliding window — max_history_turns
short_term:
max_history_turns: 50
Hard cap, sliding FIFO. When history.len() > max_history_turns, the
oldest entry is removed on the next push:
flowchart LR
MSG[new turn] --> PUSH[history.push]
PUSH --> CHECK{len > max?}
CHECK -->|no| DONE[done]
CHECK -->|yes| DROP[history.remove(0)]
DROP --> DONE
Old content is lost, not promoted. If you need long-term
persistence, the agent must explicitly call the memory tool with
action remember. See Long-term memory.
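A sketch of the cap-on-push behavior, with a simplified Interaction (the real struct uses a role enum):

// Sketch only: oldest turn falls off once the window is full; it is not promoted anywhere.
struct Interaction { role: String, content: String, timestamp: u64 }

fn push_turn(history: &mut Vec<Interaction>, turn: Interaction, max_history_turns: usize) {
    history.push(turn);
    if history.len() > max_history_turns {
        history.remove(0);
    }
}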
Session cap and eviction
short_term:
max_sessions: 10000
Soft cap across the whole process. On overflow, the oldest-idle
session (lowest last_access) is evicted to make room. Eviction
fires the on_expire callbacks — used by workspace-git to
checkpoint before tearing down the session.
max_sessions: 0 disables the cap (unbounded). Leave it at the default
unless you have a specific reason — the cap is DoS protection against
a spammer rotating chat_ids.
TTL sweeper
short_term:
session_ttl: 24h
Sessions expire after session_ttl of inactivity. The sweeper runs
every ttl / 4 (so every 6 h with the default 24 h TTL) and drops
expired sessions.
stateDiagram-v2
[*] --> Active: first message
Active --> Active: message / event<br/>(last_access updated)
Active --> Expired: idle > session_ttl
Active --> Evicted: cap exceeded
Expired --> [*]: sweeper
Evicted --> [*]: on_expire callbacks fire
Expiry also fires on_expire — good place to hook session-close
commits to a workspace-git repo.
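A sketch of one sweep pass under those rules, with simplified stand-ins for the SessionManager internals:

use std::collections::HashMap;
use std::time::{Duration, Instant};

struct Session { last_access: Instant }

// Drop every session idle longer than session_ttl, firing on_expire first.
fn sweep(sessions: &mut HashMap<String, Session>, ttl: Duration, on_expire: impl Fn(&str)) {
    let now = Instant::now();
    let expired: Vec<String> = sessions
        .iter()
        .filter(|(_, s)| now.duration_since(s.last_access) > ttl)
        .map(|(id, _)| id.clone())
        .collect();
    for id in expired {
        on_expire(&id); // e.g. workspace-git checkpoint
        sessions.remove(&id);
    }
}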
Relationship to other memory layers
flowchart LR
STM[short-term<br/>in-memory Vec] -.->|tool call:<br/>memory.remember| LTM[(long-term<br/>SQLite)]
LTM -.->|vector enabled| VEC[(sqlite-vec)]
STM -.->|transcripts_dir| TR[(JSONL transcripts)]
STM -.->|session close| WSG[(workspace-git)]
STM does not auto-promote to LTM. Promotion happens via:
- Explicit memory.remember tool call from the agent
- Dream sweeps (Phase 10.6) that scan recall-event signals and promote hot memories
- Session-close commits to workspace-git if enabled
Gotchas
- Lost turns are gone. Once a turn falls off the sliding window it is not recoverable. If it mattered, save it to LTM before the next turn.
- max_sessions: 0 has no DoS guard. Only do this in single-tenant setups where you control the sender id space.
- last_access updates on any access. That includes heartbeat ticks if they read the session — effectively keeping a session alive past its TTL as long as the agent is alive.
Long-term memory (SQLite)
Durable memory shared by every agent in the process. One SQLite file,
multi-tenant via an agent_id column on every row. Survives restarts.
Source: crates/memory/src/long_term.rs.
Storage location
long_term:
backend: sqlite
sqlite:
path: ./data/memory.db
One file for all agents. Per-agent isolation is enforced by
WHERE agent_id = ? on every query — not by separate DB files. An
idx_memories_agent(agent_id, created_at DESC) index keeps those
queries fast.
If you want per-agent file separation, override sqlite.path per
agent via an inbound_bindings[] override or a per-agent config
directory.
Schema
The runtime creates these tables at boot if they don't exist.
memories — atomic facts
CREATE TABLE memories (
id TEXT PRIMARY KEY, -- UUID
agent_id TEXT NOT NULL,
content TEXT NOT NULL,
tags TEXT DEFAULT '[]', -- JSON array
concept_tags TEXT DEFAULT '[]', -- auto-derived (phase 10.7)
created_at INTEGER NOT NULL -- ms since epoch
);
CREATE INDEX idx_memories_agent ON memories(agent_id, created_at DESC);
memories_fts — full-text search (FTS5)
CREATE VIRTUAL TABLE memories_fts USING fts5(
content,
id UNINDEXED,
agent_id UNINDEXED
);
Powers the keyword recall mode with BM25 ranking.
interactions — conversation archive
CREATE TABLE interactions (
id TEXT PRIMARY KEY,
session_id TEXT NOT NULL,
agent_id TEXT NOT NULL,
role TEXT,
content TEXT,
created_at INTEGER
);
CREATE INDEX idx_interactions_session ON interactions(session_id, created_at DESC);
reminders — phase 7 heartbeat reminders
CREATE TABLE reminders (
id TEXT PRIMARY KEY,
agent_id TEXT NOT NULL,
session_id TEXT NOT NULL,
plugin TEXT,
recipient TEXT,
message TEXT,
due_at INTEGER,
claimed_at INTEGER,
delivered_at INTEGER,
created_at INTEGER
);
CREATE INDEX idx_reminders_due
ON reminders(agent_id, delivered_at, due_at ASC);
recall_events — signal tracking (phase 10.5)
CREATE TABLE recall_events (
id INTEGER PRIMARY KEY AUTOINCREMENT,
agent_id TEXT,
memory_id TEXT,
query TEXT,
score REAL,
ts_ms INTEGER
);
Every recall() hit records a row. Dream sweeps read this to decide
what to promote.
memory_promotions — dreaming ledger (phase 10.6)
CREATE TABLE memory_promotions (
memory_id TEXT PRIMARY KEY,
agent_id TEXT,
promoted_at INTEGER,
score REAL,
phase TEXT
);
Prevents double-promotion across sweeps.
vec_memories — vector index (phase 5.4, optional)
Created on demand when vector.enabled: true. See
Vector search.
What gets written when
| Action | Writes to |
|---|---|
Agent calls memory.remember(content, tags) | memories, memories_fts, vec_memories (if enabled) |
| Every turn | interactions (used for transcripts, not promoted into memories) |
Agent calls forge_reminder(...) | reminders |
Every recall() hit | recall_events (one row per result returned) |
| Dream sweep promotes hot memory | memory_promotions |
Memory tool
Single unified tool with three actions, visible to the LLM as memory:
| Action | Required | Optional | Returns |
|---|---|---|---|
remember | content | tags[], context | {ok, id} |
recall | query | limit (default 5), mode (keyword | vector | hybrid) | {ok, results: [{id, content, tags}]} |
forget | id | — | {ok} |
Results do not include similarity scores — only content and tags. Scores are used internally for dreaming signal tracking but aren't surfaced to the LLM to avoid encouraging score-gaming prompts.
Other memory-related tools:
- forge_memory_checkpoint — snapshot the workspace-git repo (phase 10.9)
- memory_history — git log + optional unified diff (phase 10.9)
Per-agent isolation
flowchart TB
subgraph PROC[agent process]
DB[(./data/memory.db<br/>single SQLite file)]
end
A1[agent: ana] -->|WHERE agent_id = 'ana'| DB
A2[agent: kate] -->|WHERE agent_id = 'kate'| DB
A3[agent: ops] -->|WHERE agent_id = 'ops'| DB
One LongTermMemory instance per process, shared across agents via
Arc. The MemoryTool attached to each agent passes
ctx.agent_id to every query.
Workspace-git (phase 10.9)
A separate per-agent git repo lives in the agent's workspace
directory (not inside the memory DB). When workspace_git.enabled: true, the runtime commits after:
- Dream sweeps (Phase 10.6)
- forge_memory_checkpoint tool calls
- Session close (on_expire)
Good for forensic replay — you can git log to see the memory state
at any point. See Soul — MEMORY.md.
Gotchas
- One DB, multi-tenant. A query missing its agent_id filter would leak across agents. All runtime code goes through the LongTermMemory API, which injects it automatically.
- Vacuum is manual. SQLite does not auto-compact after deletes. Run VACUUM; periodically (or set PRAGMA auto_vacuum=incremental from day one).
- recall_events grows unboundedly. Dream sweeps periodically prune it, but a dreaming-disabled agent's table will grow forever. Add a retention job if you run without dreaming.
Vector search
Optional semantic memory via sqlite-vec — a virtual table inside the same SQLite file used for long-term memory. No separate service, no extra process, no migration.
Source: crates/memory/src/vector.rs,
crates/memory/src/embedding/.
Turning it on
vector:
enabled: true
backend: sqlite-vec
default_recall_mode: hybrid
embedding:
provider: http
base_url: https://api.openai.com/v1
model: text-embedding-3-small
api_key: ${OPENAI_API_KEY}
dimensions: 1536
timeout_secs: 30
Dimension must match the model output:
| Model | Dimensions |
|---|---|
text-embedding-3-small | 1536 |
text-embedding-3-large | 3072 |
nomic-embed-text | 768 |
Gemini text-embedding-004 | 768 |
A mismatch aborts startup with an explicit error. If you already have vectors at a different dimension, you must delete the DB (or the vector table) and rebuild the index.
Storage
CREATE VIRTUAL TABLE vec_memories USING vec0(
memory_id TEXT PRIMARY KEY,
embedding FLOAT[<dimensions>]
);
The virtual table lives in the same SQLite file as memories. A join
on memory_id brings you back the content and tags.
Embedding provider
trait EmbeddingProvider {
    fn dimension(&self) -> usize;
    async fn embed(&self, texts: &[String]) -> Result<Vec<Vec<f32>>>;
}
Phase 5.4 ships one provider: http — any OpenAI-compatible
/embeddings endpoint. That covers OpenAI, Gemini (via its API),
Ollama, LM Studio, and self-hosted inference.
Local-only providers (fastembed, candle) are intentional follow-ups — the HTTP provider is enough to unblock everything downstream.
Recall modes
Set the default in memory.yaml and override per tool call with the
mode argument.
keyword — FTS5 + concept expansion
flowchart LR
Q[query] --> CT[derive 3 concept tags]
Q --> M[FTS5 MATCH<br/>query OR tag1 OR tag2 OR tag3]
CT --> M
M --> R[rank by BM25]
R --> RES[top N]
- Fast, no embedding cost
- Misses semantic neighbors that don't share vocabulary
- The extra concept tags are auto-derived from the query and help narrow down concept matches
vector — nearest-neighbor
flowchart LR
Q[query] --> EMB[embed]
EMB --> VEC[vec_memories<br/>MATCH k=N*2]
VEC --> JOIN[join memories<br/>filter by agent_id]
JOIN --> RES[top N by distance]
- Catches paraphrases and cross-vocabulary matches
- Embedding request on every call — watch costs and latency
- Falls back to keyword on provider error (via hybrid) — not in pure vector mode, where errors surface
hybrid — Reciprocal Rank Fusion
The default recommendation. Runs both keyword and vector, then fuses
ranks with the RRF formula 1 / (K + rank + 1) where K = 60:
flowchart LR
Q[query] --> K[keyword search]
Q --> V[vector search]
K --> RRF[RRF fusion<br/>K=60]
V --> RRF
RRF --> RES[top N by fused score]
Vector errors degrade gracefully to keyword-only without raising.
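A sketch of the fusion step, assuming both searches return memory ids ordered best-first:

use std::collections::HashMap;

// Reciprocal Rank Fusion as described above: score = 1 / (K + rank + 1), K = 60,
// summed across the keyword and vector result lists. Sketch only.
fn rrf_fuse(keyword: &[String], vector: &[String], top_n: usize) -> Vec<String> {
    const K: f64 = 60.0;
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in [keyword, vector] {
        for (rank, id) in list.iter().enumerate() {
            *scores.entry(id.clone()).or_insert(0.0) += 1.0 / (K + rank as f64 + 1.0);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused.into_iter().take(top_n).map(|(id, _)| id).collect()
}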
Tool interaction
The memory tool takes an optional mode param:
{
"action": "recall",
"query": "what's the client's address?",
"limit": 5,
"mode": "hybrid"
}
If omitted, default_recall_mode is used.
Cost and latency profile
| Mode | Per recall |
|---|---|
keyword | 1 SQL query, no LLM call |
vector | 1 embedding HTTP call + 1 SQL query |
hybrid | 1 embedding HTTP call + 2 SQL queries + fusion |
For high-throughput agents that recall on every turn, start with
keyword and upgrade to hybrid only where the miss rate actually
hurts.
Gotchas
- Changing embedding model = full reindex. The dimension check catches the obvious case, but even same-dimension model swaps produce semantically different vectors; the old index becomes stale.
- sqlite3_auto_extension registers once per process. Not a problem in production, but test suites that instantiate multiple SQLite connections across tests may hit edge cases.
- Vector returns distance, not similarity. Lower is closer. Hybrid fusion normalizes across both, so callers don't see this directly unless they bypass the tool.
Manifest (plugin.toml)
Every extension ships a plugin.toml at its root. It declares
identity, transport, capabilities, runtime requirements, and any
bundled MCP servers. The runtime parses and validates the manifest
before spawning anything.
Source: crates/extensions/src/manifest.rs.
Minimal example
[plugin]
id = "weather"
version = "0.1.0"
name = "Weather"
description = "Fetch weather by city name."
min_agent_version = "0.1.0"
priority = 0
[capabilities]
tools = ["get_weather"]
hooks = []
[transport]
type = "stdio"
command = "./weather"
args = []
[requires]
bins = ["curl"]
env = ["WEATHER_API_KEY"]
[context]
passthrough = false
[meta]
author = "you"
license = "MIT OR Apache-2.0"
Sections
[plugin]
| Field | Required | Purpose |
|---|---|---|
id | ✅ | Unique id. Regex ^[a-z][a-z0-9_-]*$, ≤ 64 chars. Must not be a reserved id (see below). |
version | ✅ | Semver. |
name | — | Human-readable label. |
description | — | ≤ 512 UTF-8 chars. |
min_agent_version | — | Semver. Checked against the running agent version at load time. |
priority | — | i32, default 0. Lower fires first in hook chains. |
Reserved ids: agent, browser, core, email, heartbeat,
memory, telegram, whatsapp. The host may register more via
register_reserved_ids().
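A sketch of the id checks listed above; the real validation lives in crates/extensions/src/manifest.rs and reports a Diagnostic rather than a String error:

use regex::Regex;

// Sketch only: regex, length, and reserved-id checks from this page.
fn validate_id(id: &str, reserved: &[&str]) -> Result<(), String> {
    let re = Regex::new(r"^[a-z][a-z0-9_-]*$").unwrap();
    if id.len() > 64 {
        return Err(format!("id '{id}' exceeds 64 chars"));
    }
    if !re.is_match(id) {
        return Err(format!("id '{id}' does not match ^[a-z][a-z0-9_-]*$"));
    }
    if reserved.contains(&id) {
        return Err(format!("id '{id}' is reserved"));
    }
    Ok(())
}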
[capabilities]
[capabilities]
tools = ["get_weather", "get_forecast"]
hooks = ["before_message", "after_tool_call"]
channels = []
providers = []
At least one capability list must be non-empty. Names match
^[a-z][a-z0-9_]*$, ≤ 64 chars, no duplicates.
[transport]
One of three forms:
# stdio — spawn a child process
[transport]
type = "stdio"
command = "./my-extension"
args = ["--verbose"]
# nats — talk over a NATS subject prefix
[transport]
type = "nats"
subject_prefix = "ext.myext"
# http — call over HTTP
[transport]
type = "http"
url = "https://localhost:8080"
Validation: command, subject_prefix, url non-empty; url must
be http(s)://.
[requires]
[requires]
bins = ["ffmpeg", "imagemagick"]
env = ["OPENAI_API_KEY"]
Declarative preconditions used for gating: when the runtime
discovers the extension, it calls Requires::missing(). If any
listed bin is missing from $PATH or any listed env var is unset,
the extension is skipped (warn, not fail) and its tools are not
registered.
[context]
[context]
passthrough = true
When true, every tool call sent to this extension has
_meta = { agent_id, session_id } injected into the JSON args. Lets
the extension tell calls apart per-agent without the runtime having
to encode the split into every tool signature.
[mcp_servers] (phase 12.7)
Inline MCP server declarations bundled with the extension:
[mcp_servers.gmail]
type = "stdio"
command = "./gmail-mcp"
args = []
[mcp_servers.calendar]
type = "streamable_http"
url = "https://mcp.example.com/calendar"
Each server name must match ^[a-z][a-z0-9_-]*$, ≤ 32 chars. Alternatively, drop a sidecar .mcp.json next to plugin.toml if the
manifest has no [mcp_servers] section.
Validation at a glance
flowchart TD
READ[read plugin.toml] --> PARSE[parse TOML]
PARSE --> ID{id valid?<br/>regex + length<br/>+ not reserved}
ID --> VER{version<br/>valid semver?}
VER --> MIN{min_agent_version<br/>satisfied?}
MIN --> CAPS{at least one<br/>capability declared?}
CAPS --> NAMES{capability names<br/>valid + unique?}
NAMES --> TRANS{transport<br/>non-empty +<br/>http scheme valid?}
TRANS --> MCP{mcp_server names<br/>valid?}
MCP --> OK([Manifest accepted])
ID --> FAIL([Diagnostic: Error])
VER --> FAIL
MIN --> FAIL
CAPS --> FAIL
NAMES --> FAIL
TRANS --> FAIL
MCP --> FAIL
Any failure produces a DiagnosticLevel::Error in the discovery
report — the candidate is dropped but scanning continues so an
operator sees every broken manifest at once.
Agent-version gating
[plugin]
min_agent_version = "0.2.0"
On load the runtime compares against the agent build version. A
mismatch logs a diagnostic and drops the candidate. Useful for
shipping a manifest that relies on a newer host API without
crash-looping older deployments. The host can override the reported
version for tests via set_agent_version().
Next
- Discovery and NATS runtime — how the manifest drives spawn
- CLI — agent ext validate <path> checks a manifest without touching the registry
- Templates — prebuilt skeletons to copy
Stdio runtime + Discovery
The stdio runtime is the default way extensions run: a child process speaking line-delimited JSON-RPC over stdin/stdout. This page covers how the runtime discovers, spawns, supervises, and registers tools from a stdio extension.
Source: crates/extensions/src/discovery.rs,
crates/extensions/src/runtime/stdio.rs.
Discovery
# config/extensions.yaml
extensions:
enabled: true
search_paths: [./extensions]
ignore_dirs: [node_modules, .git, target]
disabled: []
allowlist: [] # empty = all allowed
max_depth: 4
follow_links: false
watch:
enabled: false
debounce_ms: 500
ExtensionDiscovery walks each search path, looking for
plugin.toml files:
flowchart TD
ROOT[search_paths root] --> WALK[walkdir max_depth]
WALK --> IGNORE{dir in<br/>ignore_dirs?}
IGNORE -->|yes| SKIP[skip]
IGNORE -->|no| FIND[find plugin.toml]
FIND --> PARSE[parse + validate manifest]
PARSE --> SIDE[sidecar .mcp.json if manifest<br/>has no mcp_servers]
SIDE --> PRUNE[prune nested candidates]
PRUNE --> DEDUP[dedupe by id]
DEDUP --> DIS[apply disabled filter]
DIS --> ALLOW[apply allowlist filter]
ALLOW --> SORT[sort by root_index, id]
SORT --> CANDS[DiscoveryReport<br/>candidates + diagnostics]
Prune-nested removes any candidate whose root_dir is a strict
descendant of another — avoids registering an extension twice if it
happens to live inside another extension's tree. Algorithm is
O(N × depth).
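A sketch of that prune pass using Path::starts_with (names are illustrative):

use std::path::PathBuf;

// Drop any candidate whose root_dir sits strictly inside another candidate's root_dir.
fn prune_nested(roots: Vec<PathBuf>) -> Vec<PathBuf> {
    let all = roots.clone();
    roots
        .into_iter()
        .filter(|candidate| {
            !all.iter()
                .any(|other| other != candidate && candidate.starts_with(other))
        })
        .collect()
}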
follow_links = false is the default (monorepo-safe). When enabled,
symlink escapes out of the root raise DiagnosticLevel::Error.
Gating
Before spawn, Requires::missing() runs:
flowchart LR
CAND[candidate] --> REQ[requires.bins<br/>+ requires.env]
REQ --> BINS{all on $PATH?}
BINS -->|no| SKIP1[warn + skip]
BINS -->|yes| ENV{all env set?}
ENV -->|no| SKIP2[warn + skip]
ENV -->|yes| SPAWN[spawn runtime]
A skipped extension does not register any tools. The warn log names exactly which bin or env var was missing.
Spawn model
sequenceDiagram
participant H as Host (agent)
participant S as StdioRuntime
participant C as Child process
H->>S: spawn(manifest, cwd)
S->>C: tokio::process::Command
S->>C: {"jsonrpc":"2.0","method":"initialize",<br/>"params":{"agent_version","extension_id"},"id":0}
C-->>S: {"result":{"server_version","tools":[...],"hooks":[...]}}
S-->>H: HandshakeInfo
H->>H: register each tool as ExtensionTool
H->>H: register each hook as ExtensionHook
- Child is spawned with the extension's directory as cwd
- stdin + stdout is the RPC channel (line-delimited JSON)
- stderr is routed to the agent's tracing output
- Handshake timeout: default 10 s
Tool descriptors
{
"name": "get_weather",
"description": "Look up weather by city.",
"input_schema": { "type": "object", "properties": { "city": { "type": "string" } }, "required": ["city"] }
}
The host wraps each descriptor in an ExtensionTool:
- Registered name: ext_{plugin_id}_{tool_name} (truncated with a hash suffix if it exceeds 64 chars)
- Description prefixed with [ext:{id}] so the LLM knows the origin
- input_schema copied to the registered tool
Context passthrough
If the manifest sets context.passthrough = true, every call()
injects:
{ "_meta": { "agent_id": "...", "session_id": "..." }, ...user_args }
The extension can decide how to split state per agent or session.
Env injection
The host passes through most env vars to the child, but blocks secret-like names via substring/suffix rules:
- Suffixes: _TOKEN, _KEY, _SECRET, _PASSWORD, _CREDENTIAL, _PAT, _AUTH, _APIKEY, _BEARER, _SESSION
- Substrings: PASSWORD, SECRET, CREDENTIAL, PRIVATE_KEY
Extensions that need a secret should read it from a file path the
host passes by argument, or have the secret baked into their own
requires.env entry (which the operator whitelists consciously).
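A sketch of the name-based filter, using the suffix and substring lists above; declaring a name in requires.env wins over the block rules (the helper name is illustrative):

// Sketch only: returns false for secret-looking names not explicitly whitelisted.
fn env_allowed(name: &str, whitelisted: &[String]) -> bool {
    const SUFFIXES: &[&str] = &["_TOKEN", "_KEY", "_SECRET", "_PASSWORD", "_CREDENTIAL",
                                "_PAT", "_AUTH", "_APIKEY", "_BEARER", "_SESSION"];
    const SUBSTRINGS: &[&str] = &["PASSWORD", "SECRET", "CREDENTIAL", "PRIVATE_KEY"];

    if whitelisted.iter().any(|w| w == name) {
        return true; // declared in requires.env, passed through consciously
    }
    let upper = name.to_ascii_uppercase();
    !(SUFFIXES.iter().any(|s| upper.ends_with(s))
        || SUBSTRINGS.iter().any(|s| upper.contains(s)))
}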
Supervision
stateDiagram-v2
[*] --> Spawning
Spawning --> Ready: handshake ok
Ready --> Restarting: child crash
Restarting --> Ready: handshake ok again
Restarting --> Failed: max attempts<br/>in restart_window
Ready --> Shutdown: graceful signal
Failed --> Shutdown
Shutdown --> [*]
Supervisor policy:
- Max restart attempts within a sliding restart_window
- Exponential backoff from base_backoff to max_backoff
- Each transport is wrapped in a CircuitBreaker named ext:stdio:{id} so hung children don't freeze the agent loop
Graceful shutdown sends an empty message, waits shutdown_grace
(default 3 s), then kills the child.
Watcher (phase 11.2 follow-up)
With extensions.watch.enabled: true the runtime watches
search_paths for changes to any plugin.toml. Change-set is
debounced (debounce_ms) and compared by SHA-256 of the file to
squash spurious writes.
On change the runtime logs — it does not auto-reload. The operator restarts the agent to pick up the new manifest. Hot reload is a future phase.
Gotchas
- Blocked env vars surprise extensions. If an extension expected OPENAI_API_KEY to come through and it wasn't declared in requires.env, the name-based block may silently strip it. Declare the env you need — that whitelists it.
- follow_links: true + symlinked monorepo layouts can cause discovery to traverse out of the search root. Keep follow_links: false unless you know the layout is bounded.
- Children crashing during handshake. You get a single DiagnosticLevel::Error per candidate, not a retry loop. Fix the binary, restart the host.
NATS runtime
For extensions that run out-of-process and manage their own lifecycle — a long-lived service on another machine, a container in an orchestrator, an operator-maintained daemon. The agent talks to them over NATS RPC instead of stdin/stdout.
Source: crates/extensions/src/runtime/nats.rs.
When to pick NATS over stdio
| Use stdio | Use NATS |
|---|---|
| Extension is a binary you ship with the agent | Extension is a separate service you operate |
| Lifecycle is tied to the agent | Lifecycle is independent (k8s, systemd) |
| Fast local startup; co-resident on same host | Might be remote or shared between hosts |
| Dev-loop: install once and forget | Sensitive deployment — deploy independently of the agent |
Stdio is the default. Reach for NATS when the extension's failure domain must be separated from the agent's.
Manifest
[plugin]
id = "heavy-compute"
version = "0.3.0"
[capabilities]
tools = ["long_running_job"]
[transport]
type = "nats"
subject_prefix = "ext.heavy-compute"
Wire shape
Single request/reply subject:
{subject_prefix}.{extension_id}.rpc
sequenceDiagram
participant A as Agent
participant N as NATS
participant E as Extension service
A->>N: publish ext.heavy-compute.rpc<br/>{method:"initialize", ...}
N->>E: deliver
E->>N: reply HandshakeInfo
N-->>A: tools + hooks
A->>A: register ExtensionTool per tool
Note over A,E: steady state
loop tool call
A->>N: {method:"tools/long_running_job", params, id}
N->>E: deliver
E-->>N: result
N-->>A: reply
end
The JSON-RPC shape is identical to stdio — only the transport changes. Extensions don't need to know which form the host chose.
Liveness
Instead of supervising a child process, the NATS runtime uses heartbeats:
| Field | Default | Purpose |
|---|---|---|
heartbeat_interval | 15 s | Expected beacon cadence from the extension. |
heartbeat_grace_factor | 3 | Mark failed after grace_factor × interval silence. |
A failed extension logs a warn and is marked unavailable. Tools stay registered in the registry but calls error out immediately. When the extension starts beaconing again, it's automatically marked available.
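The liveness rule in sketch form (names are illustrative):

use std::time::{Duration, Instant};

// An extension counts as available until heartbeat_grace_factor × heartbeat_interval
// of silence has passed since its last beacon. Sketch only.
fn is_available(last_beacon: Instant, interval: Duration, grace_factor: u32) -> bool {
    Instant::now().duration_since(last_beacon) <= interval * grace_factor
}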
Circuit breaker
Same pattern as stdio: one CircuitBreaker per extension,
ext:nats:{id}, wrapping every RPC. Prevents a flapping extension
from piling up outstanding calls against it.
Deployment recipes
Docker compose side service
services:
agent:
image: nexo-rs:latest
depends_on: [nats, heavy-compute]
nats:
image: nats:2.10-alpine
heavy-compute:
image: my-ext:0.3.0
command: ["--nats-url", "nats://nats:4222",
"--subject-prefix", "ext.heavy-compute"]
Kubernetes
Run the extension as its own Deployment with its own resource
limits, rollouts, and observability. Share the NATS cluster via a
Service. Scale extensions independently of agents.
Gotchas
- subject_prefix collisions. Two extensions with the same prefix will step on each other. Enforce uniqueness in your ops convention.
- Latency. NATS over LAN is sub-millisecond, but any network hop is orders of magnitude slower than stdio's pipe. Don't pick NATS for a 1 kHz tool-call pattern.
- Auth on the broker. NATS auth applies to extensions too — if you turn on NKey mTLS, every extension service must be enrolled.
CLI (agent ext)
Operator-facing commands for discovering, installing, validating, and
toggling extensions. Every subcommand accepts --json for scripting.
Source: crates/extensions/src/cli/.
Subcommands
agent ext list [--json]
agent ext info <id> [--json]
agent ext enable <id>
agent ext disable <id>
agent ext validate <path>
agent ext doctor [--runtime] [--json]
agent ext install <path> [--update] [--enable] [--dry-run] [--link] [--json]
agent ext uninstall <id> --yes [--json]
list — discovered extensions
Walks the configured search_paths, prints each candidate, its
transport, and its enabled/disabled state.
info <id> — manifest + status
Prints the full parsed manifest, the runtime state if the agent is currently running, and any diagnostics attached to the candidate.
enable / disable — toggle in extensions.yaml
Rewrites the disabled list in config/extensions.yaml:
extensions:
disabled: [weather]
No runtime side effect; operator must restart the agent to apply.
validate <path> — manifest check without registering
Parses and validates a plugin.toml at <path>. Good for CI checks
on an extension's manifest before shipping.
doctor — preflight checks
Runs the same Requires::missing() logic as discovery, plus
transport-specific checks:
flowchart TB
START([agent ext doctor]) --> DISC[discover candidates]
DISC --> REQ[check requires.bins + requires.env]
REQ --> RUNT{--runtime?}
RUNT -->|yes| SPAWN[spawn each stdio extension<br/>and handshake]
RUNT -->|no| DONE([report table])
SPAWN --> DONE
--runtime actually spawns each stdio extension and runs the
handshake — useful to catch a broken binary before production
boot.
install <path> — copy or symlink
Adds an extension to the active search_paths:
agent ext install ./extensions/weather
agent ext install /abs/path/to/my-ext --link --enable
- --update replaces an existing extension with the same id
- --enable marks it enabled in extensions.yaml (default: disabled until you run enable)
- --dry-run prints what would happen without writing
- --link creates a symlink instead of copying — requires an absolute source path. Good for dev loops.
uninstall <id> --yes
Removes the extension's directory from the active search path (or the
symlink, in --link installs). --yes is mandatory — no accidental
destruction.
Exit codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Extension not found / --update target missing |
| 2 | Invalid manifest / invalid source / --link needs absolute path |
| 3 | Config write failed |
| 4 | Invalid id (reserved or empty) |
| 5 | Target exists (use --update) |
| 6 | Id collision across roots |
| 7 | uninstall missing --yes confirmation |
| 8 | Copy / atomic swap failed |
| 9 | Runtime check(s) failed (doctor --runtime) |
Non-zero codes are stable for scripting.
JSON mode
Every subcommand that produces human output also supports --json
for machine consumption. Fields are stable per code-phase; schema is
not officially frozen yet — pin to a specific agent version in CI.
Common ops flows
Ship an extension to staging
agent ext validate ./my-ext/plugin.toml
agent ext install ./my-ext --link --enable
agent ext doctor --runtime
Disable a flapping extension without redeploying
agent ext disable weather # writes to extensions.yaml
systemctl reload agent # or restart, depending on deployment
CI gate
# .github/workflows/extension.yml
- run: cargo build --release
- run: agent ext validate ./plugin.toml
Templates
The repo ships two extension templates as starting points. Copy one, rename it, fill in the tools, done.
Location: extensions/template-rust/ and extensions/template-python/.
What's shared
Both templates follow the same wire protocol and directory shape:
<your-ext>/
├── plugin.toml # manifest (see ./manifest.md)
├── README.md # what the extension does
├── <binary or script> # stdio-RPC entry point
└── ... # build files specific to the language
The agent talks to both in the same JSON-RPC 2.0 shape:
- initialize — handshake; returns {server_version, tools, hooks}
- tools/<name> — tool invocation; returns the tool's result
- hooks/<name> — hook invocation (when any hook is declared)
Line-delimited JSON over stdin/stdout. stderr is forwarded to the agent's tracing output — that's your debug log.
Rust template (extensions/template-rust/)
Standalone Cargo project outside the agent workspace — its own
Cargo.toml, own Cargo.lock, own target/. Keeps your extension's
deps independent of the agent's.
template-rust/
├── Cargo.toml
├── Cargo.lock
├── plugin.toml
├── README.md
├── src/
│ └── main.rs # JSON-RPC loop
└── target/ # (gitignore)
src/main.rs implements:
// pseudocode
loop {
    let line = read_line_from_stdin();
    let req: JsonRpcRequest = parse(line);
    let result = match req.method.as_str() {
        "initialize" => handshake_info(),
        "tools/ping" => ping(req.params),
        "tools/add" => add(req.params),
        "hooks/before_message" => pass(),
        _ => method_not_found(),
    };
    write_line_to_stdout(json!({ "jsonrpc": "2.0", "id": req.id, "result": result }));
}
Build with cargo build --release; the release binary at
./target/release/template-rust is what plugin.toml::transport.command
points at.
Python template (extensions/template-python/)
template-python/
├── plugin.toml
├── main.py # #!/usr/bin/env python3
└── README.md
stdlib only (no pip install). Same JSON-RPC loop over stdin/stdout.
Logs to stderr via print(..., file=sys.stderr).
Good for quick extensions where starting a Python interpreter per tool call is acceptable (batch workloads, cron-ish tasks, one-off scripting).
Promoting a template to your own extension
flowchart LR
COPY[copy template-rust<br/>to my-extension] --> EDIT[edit plugin.toml<br/>id, version, tools]
EDIT --> CODE[implement tools/...]
CODE --> BUILD[cargo build --release]
BUILD --> VAL[agent ext validate<br/>./my-extension/plugin.toml]
VAL --> INSTALL[agent ext install<br/>./my-extension --link --enable]
INSTALL --> DOCTOR[agent ext doctor<br/>--runtime]
Conventions in the shipped templates
- plugin.toml declares the minimum required capabilities — no phantom hooks or tools
- requires.bins / requires.env left empty; add your own
- [context] passthrough = false — opt in explicitly when you need per-agent / per-session state
- License left blank — pick one and add it to [meta]
Gotchas
- Rust template builds in its own workspace. Don't cargo add from the repo root — that edits the agent workspace, not the extension.
- Python template spawns a new interpreter per extension, not per tool call. Stdin/stdout stay open for the life of the process. Don't exit after one tool call.
- JSON-RPC ids must echo back. If your handler drops the id field, the agent can't correlate the reply.
1Password extension
A bundled stdio extension that wraps the op CLI
with a service-account token. Read-only: it never creates or edits
secrets. Two main use cases:
- Look up a secret you don't already have in env (read_secret).
- Use a secret in a command without ever exposing it to the agent (inject_template).
Source: extensions/onepassword/. Skill prompt: skills/onepassword/SKILL.md.
Tools
| Tool | Reveals secret? | Audited |
|---|---|---|
status | no | no |
whoami | no | no |
list_vaults | no | no |
list_items | no — strips field values | no |
read_secret | only if OP_ALLOW_REVEAL=true | yes |
inject_template | template-only mode reveals only with OP_ALLOW_REVEAL=true; exec mode never reveals to the LLM | yes |
read_secret
{ "action": "read_secret", "reference": "op://Prod/Stripe/api_key" }
Default response (reveal off):
{
"ok": true,
"reference": "op://Prod/Stripe/api_key",
"vault": "Prod", "item": "Stripe", "field": "api_key",
"length": 26,
"fingerprint_sha256_prefix": "3f9a7c2e1b48d5a0",
"reveal": false
}
With OP_ALLOW_REVEAL=true|1|yes set on the agent process, the
response also contains { "value": "...", "reveal": true }.
inject_template
Resolves {{ op://Vault/Item/field }} placeholders via op inject.
Two execution paths:
Template-only
{ "action": "inject_template",
"template": "Authorization: Bearer {{ op://Prod/API/token }}\n" }
- Reveal off → { length, fingerprint_sha256_prefix, reveal: false }
- Reveal on → { rendered: "Authorization: Bearer abc…", reveal: true }
Exec (piped to a command)
{ "action": "inject_template",
"template": "Bearer {{ op://Prod/API/token }}",
"command": "curl",
"args": ["-H", "@-", "https://api.example.com/me"] }
- command must be in OP_INJECT_COMMAND_ALLOWLIST (comma-separated). Default empty → exec mode disabled.
- Rendered template is never returned to the LLM. Only the downstream command's exit_code, stdout (capped at max_stdout_bytes, default 4096, max 16384), and stderr.
- Both stdout and stderr are redacted before being returned — Bearer JWTs, sk-…, sk-ant-…, AKIA…, and 32+ char hex tokens are replaced with [REDACTED:<label>].
Dry run
{ "action": "inject_template",
"template": "{{ op://A/B/c }} {{ op://X/Y/z }}",
"dry_run": true }
Validates each op:// reference's shape without resolving values.
Returns references_validated.
Configuration
Environment variables consumed by the extension:
| Var | Purpose | Default |
|---|---|---|
OP_SERVICE_ACCOUNT_TOKEN | required | — |
OP_ALLOW_REVEAL | true/1/yes to allow value reveal | off |
OP_AUDIT_LOG_PATH | JSONL audit log path | ./data/secrets-audit.jsonl |
OP_INJECT_COMMAND_ALLOWLIST | comma-separated allowed exec commands | empty (exec disabled) |
OP_INJECT_TIMEOUT_SECS | per-call timeout (capped at MAX_TIMEOUT_SECS) | 30 |
OP_TIMEOUT_SECS | per-call timeout for non-inject commands | 15 |
AGENT_ID | injected by the host on spawn — appears in audit | — |
AGENT_SESSION_ID | injected by the host on spawn | — |
Audit log
read_secret and inject_template append one JSON line per call to
OP_AUDIT_LOG_PATH. The log is append-only and contains only
metadata — never the secret value.
{"ts":"2026-04-25T18:00:00Z","action":"read_secret","agent_id":"kate","session_id":"f1...","op_reference":"op://Prod/Stripe/token","fingerprint_sha256_prefix":"a1b2c3d4e5f6789a","reveal_allowed":false,"ok":true}
{"ts":"2026-04-25T18:00:05Z","action":"inject_template","agent_id":"kate","session_id":"f1...","references":["op://Prod/Stripe/token"],"command":"curl","args_count":4,"dry_run":false,"ok":true,"exit_code":0,"stdout_total_bytes":124,"stdout_returned_bytes":124,"stdout_truncated":false}
{"ts":"2026-04-25T18:00:10Z","action":"inject_template","agent_id":"kate","session_id":null,"references":["op://Bad/Ref"],"command":"rm","args_count":0,"dry_run":false,"ok":false,"error":"command_not_in_allowlist"}
Failures writing the log are reported to stderr and never block the tool — the secret has already been read or piped; refusing to log would be worst-of-both-worlds.
Rotate with logrotate or any append-aware rotator. Keeping the log
on a partition with limited write access (separate user, AppArmor,
or dedicated tmpfs) reduces forensic tampering surface.
Threat model
- The agent process is trusted. Reveal is gated by an env var the operator controls; once on, the value is just a string in memory that flows through the LLM, transcripts, and any tool that touches it.
- Exec mode is the recommended path for any operation that does not require the agent to see the secret. The LLM only knows that the operation succeeded, not what the credential looked like.
- Redaction is best-effort. Stdout from a poorly-behaved command could still leak a secret in a shape we don't recognize. Cap max_stdout_bytes aggressively when in doubt.
Model Context Protocol (MCP)
nexo-rs is both an MCP client (consumes tools from external MCP servers) and an MCP server (exposes its own tools so editors like Claude Desktop, Cursor, Zed can use them). Same wire, different directions.
Source: crates/mcp/, bridges in crates/core/src/agent/mcp_*.
The two directions
flowchart LR
subgraph IDE[MCP clients]
CD[Claude Desktop]
CUR[Cursor]
ZED[Zed]
end
subgraph AGENT[agent process]
AS[Agent-as-server<br/>stdio bridge]
AC[Agent-as-client<br/>session runtime]
end
subgraph EXT[External MCP servers]
GS[Gmail MCP]
DB[DB MCP]
WF[Workflow MCP]
end
IDE --> AS
AS --> AR[Agent tools registry]
AC --> EXT
AR --> AC
- Server side — an MCP client (e.g. Claude Desktop) runs `agent mcp serve`. The agent's internal tools appear as MCP tools in that client.
- Client side — the agent spawns external MCP servers (stdio or HTTP) and registers their tools into its own `ToolRegistry`, so agents can call them exactly like built-ins or extensions.
Phase map
| Phase | What it adds |
|---|---|
| 12.1 | MCP client over stdio |
| 12.2 | MCP client over HTTP (streamable + SSE fallback) |
| 12.3 | Tool catalog — merge MCP tools with extensions and built-ins |
| 12.4 | Session runtime — per-session child spawn, sentinel-shared default |
| 12.5 | Resources — resources/list + resources/read with optional LRU cache |
| 12.6 | Agent as MCP server (stdio) |
| 12.7 | MCP servers declared by extensions |
| 12.8 | tools/list_changed debounced hot-reload |
All eight landed. See PHASES.md.
Why both sides
Being a client lets agents tap any MCP ecosystem without needing a custom extension per service — if the thing you want speaks MCP, you can reach it today.
Being a server lets the carefully-sandboxed tool surface of
nexo-rs (allowed_tools, outbound_allowlist, etc.) be reused from
any MCP-speaking client. Your LLM-driven IDE gets access to WhatsApp
send, Gmail poll, browser CDP, and everything else — without you
wiring each one into the IDE's config.
Wire shape (both directions)
JSON-RPC 2.0. For transports:
- stdio — child process, line-delimited JSON on stdin/stdout
- streamable HTTP — modern MCP 2024-11-05 shape
- SSE — legacy; used as automatic fallback
sequenceDiagram
participant H as Host (agent or IDE)
participant S as MCP server
H->>S: initialize (id=0)
S-->>H: InitializeResult (capabilities, serverInfo)
H->>S: notifications/initialized (fire-and-forget)
loop steady state
H->>S: tools/list
S-->>H: tools[]
H->>S: tools/call {name, args}
S-->>H: content blocks
end
alt tool list changes
S-->>H: notifications/tools/list_changed
H->>S: tools/list (debounced refresh)
end
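As a concrete picture of the stdio leg, here is a minimal sketch of the host side of that handshake, assuming a line-delimited MCP server at the placeholder path `./my-mcp-server`; the real client lives in `crates/mcp/src/client.rs`.

```rust
use std::io::{BufRead, BufReader, Write};
use std::process::{Command, Stdio};

use serde_json::{json, Value};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Spawn the server as a child process (placeholder path).
    let mut child = Command::new("./my-mcp-server")
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;
    let mut stdin = child.stdin.take().expect("piped stdin");
    let mut stdout = BufReader::new(child.stdout.take().expect("piped stdout"));

    // initialize (id=0) — one JSON-RPC object per line.
    let init = json!({
        "jsonrpc": "2.0", "id": 0, "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",
            "capabilities": {},
            "clientInfo": { "name": "nexo-rs", "version": "0.1" }
        }
    });
    writeln!(stdin, "{init}")?;

    // Read the InitializeResult (capabilities, serverInfo).
    let mut line = String::new();
    stdout.read_line(&mut line)?;
    let reply: Value = serde_json::from_str(&line)?;
    println!("server: {}", reply["result"]["serverInfo"]["name"]);

    // notifications/initialized — fire-and-forget, no id, no reply expected.
    writeln!(stdin, "{}", json!({ "jsonrpc": "2.0", "method": "notifications/initialized" }))?;
    Ok(())
}
```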
Where to go next
- Client (stdio + HTTP) — consuming external MCP servers from agents
- Agent as MCP server — exposing the agent's tools over MCP
MCP client (stdio + HTTP)
How nexo-rs consumes tools from external MCP servers. Every MCP tool
ends up in the same ToolRegistry that hosts built-ins and
extensions — the LLM calls them identically.
Source: crates/mcp/src/client.rs, crates/mcp/src/http/client.rs,
crates/mcp/src/manager.rs, crates/mcp/src/session.rs,
crates/core/src/agent/mcp_catalog.rs.
Config
# config/mcp.yaml
mcp:
enabled: true
session_ttl: 30m
idle_reap_interval: 60s
connect_timeout_ms: 10000
call_timeout_ms: 30000
shutdown_grace_ms: 3000
servers:
gmail:
transport:
type: stdio
command: ./mcp-gmail
args: []
env:
GMAIL_TOKEN: ${file:./secrets/gmail_token.json}
workflow:
transport:
type: http
url: https://mcp.example.com/workflow
mode: auto # streamable_http | sse | auto
headers:
Authorization: Bearer ${WORKFLOW_TOKEN}
resource_cache:
enabled: true
ttl: 30s
max_entries: 256
resource_uri_allowlist: [] # empty = permissive
strict_root_paths: false
context:
passthrough: true
sampling:
enabled: false
watch:
enabled: false
debounce_ms: 200
Transports
stdio
Child process per server. Line-delimited JSON-RPC 2.0 over
stdin/stdout. stderr is routed to the agent's tracing output.
sequenceDiagram
participant M as McpRuntimeManager
participant S as Server (child process)
M->>S: spawn Command(cmd, args, env)
M->>S: {"method":"initialize","id":0, ...}
S-->>M: capabilities + serverInfo
M->>S: notifications/initialized (no-reply)
Note over M,S: steady state — tools/list, tools/call, resources/*
M->>S: notifications/cancelled (per in-flight id)<br/>then shutdown_grace
HTTP — streamable vs SSE
Three modes selectable per server:
mode | Behavior |
|---|---|
streamable_http | MCP 2024-11-05 spec — modern |
sse | Legacy Server-Sent Events fallback |
auto (default) | Try streamable_http; on 404/405/415, fall back to SSE |
Each connection gets an mcp-session-id header. Additional headers
(auth, routing) pass through a HeaderMap; values are env-resolved
at config load.
Session runtime
A single McpRuntimeManager lives per process. Inside, a
SessionMcpRuntime per conversation session keeps its own map of
live MCP clients:
flowchart TB
MGR[McpRuntimeManager<br/>one per process]
MGR --> SENT[Sentinel session<br/>UUID = nil<br/>shared by all agents]
MGR --> S1[session A runtime]
MGR --> S2[session B runtime]
SENT --> C1[mcp client: gmail]
SENT --> C2[mcp client: workflow]
S1 --> CX[session-scoped clients<br/>for stateful servers]
- Sentinel session (UUID = `nil`) is the default shared namespace — all agents see the same clients, avoiding duplicate child processes for servers that don't need per-session isolation
- Per-session runtimes are spawned when a server genuinely needs independent state (example: a workflow engine that tracks its own context per user)
- Idle reap — every `idle_reap_interval`, the manager disposes sessions unused for longer than `session_ttl`, shutting their clients down gracefully
- Config fingerprinting — changes to the `servers` set produce a new fingerprint; runtimes are rebuilt on request; concurrent requests de-dupe so only one rebuild happens
Tool catalog
McpToolCatalog::build() calls tools/list on every configured
server in parallel and merges the results:
flowchart LR
LIST[tools/list per server<br/>parallel] --> PREFIX[prefix names:<br/>server_toolname]
PREFIX --> MERGE[merge into ToolRegistry]
MERGE --> LLM[tools visible to LLM]
LIST -.->|single-server error| ERR[non-fatal:<br/>server visible with error=...]
- Names are always prefixed `{server_name}_{tool_name}` so collisions across servers can't happen
- Duplicates within the same server → first wins, warn log
- `input_schema` is passed through verbatim
- Server capability `resources` unlocks two meta-tools for reading resources
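A small sketch of those merge rules (prefixing plus first-wins dedupe); names and types here are illustrative, not the actual `McpToolCatalog` code.

```rust
use std::collections::BTreeMap;

use serde_json::Value;

#[derive(Debug, Clone)]
struct ToolDef {
    name: String,        // already prefixed: {server}_{tool}
    input_schema: Value, // passed through verbatim from tools/list
}

fn merge_server_tools(
    registry: &mut BTreeMap<String, ToolDef>,
    server: &str,
    listed: Vec<(String, Value)>, // (tool_name, input_schema) from one server's tools/list
) {
    for (tool, schema) in listed {
        // Prefixing makes cross-server collisions impossible by construction.
        let prefixed = format!("{server}_{tool}");
        if registry.contains_key(&prefixed) {
            // Duplicate within the same server: first definition wins.
            eprintln!("warn: duplicate tool {prefixed} ignored");
            continue;
        }
        registry.insert(prefixed.clone(), ToolDef { name: prefixed, input_schema: schema });
    }
}

fn main() {
    let mut registry = BTreeMap::new();
    let listed = vec![("list_messages".to_string(), serde_json::json!({"type": "object"}))];
    merge_server_tools(&mut registry, "gmail", listed);
    assert!(registry.contains_key("gmail_list_messages"));
}
```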
Tool call flow
sequenceDiagram
participant A as Agent
participant C as McpCatalog tool
participant R as SessionMcpRuntime
participant S as MCP server
participant CB as CircuitBreaker
A->>C: invoke gmail_list_messages(...)
C->>R: call(server=gmail, tool=list_messages, args)
R->>CB: allow?
CB-->>R: yes
R->>S: tools/call {name, args, _meta}
S-->>R: content blocks
R-->>C: content
C-->>A: result
Every RPC goes through a per-server CircuitBreaker. If the breaker
is open, the call fails fast instead of hanging on a dead server.
Context passthrough
When mcp.context.passthrough: true, tools/call injects:
{ "_meta": { "agent_id": "ana", "session_id": "..." }, ...args }
Server-side code can use this to scope state per agent without the schema leaking that concern.
Resources
Servers advertising the `resources` capability unlock:
- `resources/list` (paginated via `cursor`, max 64 pages)
- `resources/read` (optionally cached via LRU)
- `resources/templates/list` (URI templates)
Cache config:
resource_cache:
enabled: true
ttl: 30s
max_entries: 256
Cache invalidates on
notifications/resources/list_changed. Optional per-scheme allowlist
(resource_uri_allowlist: ["file", "db"]) rejects unknown URI
schemes before dispatch.
Hot reload (phase 12.8)
flowchart LR
S[server notifies<br/>tools/list_changed] --> DBC[200 ms debounce]
DBC --> REL[catalog rebuild]
REL --> REG[ToolRegistry re-populated<br/>with new schema]
Same flow for resources. Agents in flight at the moment of the rebuild keep their references to the old tool definitions — next turn uses the refreshed registry.
Gotchas
- One MCP child per server by default. Turn on per-session isolation only for servers that genuinely need it; spawning a child per session multiplies resource cost.
- `notifications/initialized` is fire-and-forget. If the server insists on acknowledging it, you have a broken server.
- SSE is a last resort. It's in `auto` for compatibility; new server deployments should speak streamable HTTP.
- Circuit breakers are per-server. One bad server doesn't freeze the catalog; but a flapping one still slows the agent loop via backoff waits.
Agent as MCP server
Expose the agent's tools over MCP so Claude Desktop, Cursor, Zed, or any other MCP-speaking client can use them. Stdio transport; the agent runs as a child process of the consuming client.
Source: crates/mcp/src/server/, crates/core/src/agent/mcp_server_bridge.rs.
Config
# config/mcp_server.yaml
enabled: true
name: agent
allowlist: [] # empty = every native tool; populated = strict allowlist
expose_proxies: false # set true to also expose ext_* and mcp_* proxy tools
auth_token_env: "" # optional env var holding a shared bearer token
| Field | Default | Purpose |
|---|---|---|
enabled | false | Must be true for the server subcommand to start. |
name | "agent" | Reported as serverInfo.name in handshake. |
allowlist | [] | Empty = all native tools. Populated = only these names reach the MCP client. Globs (memory_*) supported. |
expose_proxies | false | Whether ext_* (extension) and mcp_* (upstream MCP) proxy tools are surfaced. |
auth_token_env | "" | If set, the initialize request must present this token; unauthenticated clients get rejected. |
Running it
agent mcp serve --config ./config
The process reads JSON-RPC from stdin and writes responses to stdout — exactly the shape Claude Desktop, Cursor, etc. expect.
Claude Desktop example
~/Library/Application Support/Claude/claude_desktop_config.json:
{
"mcpServers": {
"nexo": {
"command": "/usr/local/bin/agent",
"args": ["mcp", "serve", "--config", "/srv/nexo-rs/config"],
"env": {
"ANTHROPIC_API_KEY": "sk-ant-..."
}
}
}
}
The Anthropic client spawns the agent, handshakes, and then every agent tool shows up in the conversation's tool list.
Wire flow
sequenceDiagram
participant IDE as MCP client (Claude Desktop)
participant A as agent mcp serve
participant TR as ToolRegistry
participant AG as Agent tools
IDE->>A: initialize (auth_token if configured)
A-->>IDE: capabilities + serverInfo (name, version)
IDE->>A: notifications/initialized
loop every turn
IDE->>A: tools/list
A->>TR: filtered by allowlist + expose_proxies
A-->>IDE: tool defs
IDE->>A: tools/call {name, args}
A->>AG: invoke tool
AG-->>A: result
A-->>IDE: content blocks
end
Tool exposure rules
flowchart TD
ALL[every tool registered in ToolRegistry]
ALL --> FILT1{allowlist<br/>empty?}
FILT1 -->|yes| NATIVE[keep native tools only]
FILT1 -->|no| GLOB[keep tools matching allowlist]
NATIVE --> FILT2{expose_proxies?}
GLOB --> FILT2
FILT2 -->|yes| OUT[include ext_* and mcp_* too]
FILT2 -->|no| SKIP[drop ext_* and mcp_*]
OUT --> EMIT[tools/list response]
SKIP --> EMIT
- Native tools — `memory_*`, `whatsapp_*`, `telegram_*`, `browser_*`, `forge_*`, etc.
- Proxy tools — `ext_<id>_<tool>` for extensions, `<server>_<tool>` for upstream MCP. Hidden by default to avoid proxying an external server through to another external client.
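The filter in the flowchart reduces to two predicates. A simplified sketch follows — the trailing-`*` glob and the proxy-name test are assumptions of the sketch (see the Gotchas below for the real matcher's behavior), not the server's exact code.

```rust
// Proxy tools: ext_* extension proxies and mcp_* upstream-MCP proxies.
// Treating these two prefixes as "proxy" is an assumption of this sketch.
fn is_proxy(name: &str) -> bool {
    name.starts_with("ext_") || name.starts_with("mcp_")
}

// Simplified glob: exact match, or prefix match when the pattern ends in `*`.
fn glob_match(pattern: &str, name: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => name.starts_with(prefix),
        None => pattern == name,
    }
}

fn exposed<'a>(all_tools: &[&'a str], allowlist: &[&str], expose_proxies: bool) -> Vec<&'a str> {
    all_tools
        .iter()
        .copied()
        .filter(|name| {
            if allowlist.is_empty() {
                !is_proxy(name) // empty allowlist = every native tool
            } else {
                allowlist.iter().any(|p| glob_match(p, name))
            }
        })
        .filter(|name| expose_proxies || !is_proxy(name))
        .collect()
}

fn main() {
    let tools = ["memory_recall", "memory_store", "whatsapp_send_message", "ext_weather_forecast"];
    // Read-only memory exposure, proxies hidden.
    assert_eq!(exposed(&tools, &["memory_recall"], false), vec!["memory_recall"]);
}
```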
Capabilities advertised
- `tools` — always
- `resources` — advertised only if the agent exposes any via the server handler (phase 12.5 puts the groundwork in, consumer features follow)
- `prompts` — reserved, not advertised yet
- `logging` — conditional on handler implementation
Auth
When `auth_token_env` is set, the initialize request must present the token (via a server-specific header convention or as a `_meta` field). Clients that don't know the token get rejected before anything else happens. Useful when the agent is launched through a shared-host proxy rather than a local `command:` spawn.
Security model
- Read-only by default? No — the server exposes whatever the allowlist permits. Model it explicitly:
  allowlist:
    - memory_recall   # read memory
    - memory_store    # write memory (remove for read-only)
- Outbound channels (`whatsapp_send_message`, `telegram_send_message`) will send real messages from the agent's configured accounts. Include them in the allowlist only if the IDE user should be able to do that.
- `expose_proxies: true` is transitive power. It gives the IDE the full tool set of every extension and upstream MCP server too.
Gotchas
- Allowlist globs match tool names, not prefixes. `memory_*` matches `memory_recall` and `memory_store` but not `memory_history` (phase 10.9 tool). Write the pattern to match the real set.
- No per-IDE-user identity. The server has one identity = the agent's configured credentials. If multiple humans share the IDE, they share the agent's blast radius.
- Proxies forward the agent's rate limits. Calling `whatsapp_send_message` through the MCP server is the same as an agent calling it — it counts against the same WhatsApp rate bucket.
Skills catalog
nexo-rs uses "skill" to mean two different things. Both are covered on this page; gating semantics for each live in Gating by env / bins.
- Extension skills — shipped under `extensions/` in the repo, discovered and spawned like any other stdio extension. 22 of them landed in Phase 13.
- Local skills — markdown files under an agent's `skills_dir/` that get injected into the system prompt at turn start.
The two overlap in name but not in mechanism:
| Extension skill | Local skill | |
|---|---|---|
| Where it lives | extensions/<id>/ with plugin.toml | skills/<name>/SKILL.md |
| How it's loaded | Extension discovery → stdio spawn | SkillLoader at turn time |
| What it produces | Tools in ToolRegistry | Text injected into the prompt |
| Gating | Warn + continue, tools still registered | Warn + skip entirely |
Extension skills (Phase 13)
All shipped as stdio extensions written in Rust. _common is a shared
Rust library (circuit-breaker primitives), not an extension itself.
Core utilities
| Id | Purpose | Requires |
|---|---|---|
weather | Current + forecast via Open-Meteo (no auth). | — |
openstreetmap | Forward / reverse geocoding via Nominatim. | — |
wikipedia | Article search + summaries. | — |
fetch-url | HTTP GET / POST with SSRF guard, retries, circuit breaker. | — |
rss | Fetch & parse RSS / Atom / JSON feeds. | — |
dns-tools | A/AAAA/MX/TXT/NS/SOA/SRV + reverse + whois. | — |
endpoint-check | HTTP probe (status + latency) + TLS cert inspection. | — |
pdf-extract | Extract text from PDFs. | — |
translate | LibreTranslate self-hosted or DeepL API. | — |
summarize | Chat-based text/file summary via OpenAI-compat endpoint. | — |
openai-whisper | Audio transcription via OpenAI-compat /audio/transcriptions. | — |
Search & knowledge
| Id | Purpose | Requires |
|---|---|---|
brave-search | Web search. | env BRAVE_SEARCH_API_KEY |
goplaces | Google Places text search + details. | — |
wolfram-alpha | Computational queries (short + full pods). | env WOLFRAM_APP_ID |
Infra & ops
| Id | Purpose | Requires | Write-gate |
|---|---|---|---|
github | REST API: PRs, checks, issues. | env GITHUB_TOKEN | — |
cloudflare | DNS, zones, cache purge. | env CLOUDFLARE_API_TOKEN | — |
docker-api | ps, inspect, logs, stats, start, stop, restart. | bin docker | env DOCKER_API_ALLOW_WRITE |
proxmox | Proxmox VE: nodes, VMs, containers, lifecycle. | env PROXMOX_TOKEN | env PROXMOX_ALLOW_WRITE, env PROXMOX_INSECURE_TLS for self-signed certs |
onepassword | 1Password secrets metadata; reveal gated. | bin op, env OP_SERVICE_ACCOUNT_TOKEN | env OP_ALLOW_REVEAL |
ssh-exec | Remote command execution with host allowlist. | bin ssh, scp | host allowlist in config |
tmux-remote | Drive tmux sessions (create, send keys, capture, kill). | bin tmux | — |
Media & content
| Id | Purpose | Requires |
|---|---|---|
msedge-tts | Text-to-speech via Edge Read Aloud. | — |
rtsp-snapshot | Frames / clips from RTSP or HTTP camera streams. | bin ffmpeg |
video-frames | Extract frames + audio from videos. | bin ffmpeg, ffprobe |
tesseract-ocr | OCR with language packs + PSM modes. | bin tesseract |
yt-dlp | Download video / audio / metadata. | bin yt-dlp |
spotify | Now-playing, search, play, pause, skip. | env SPOTIFY_ACCESS_TOKEN |
Google (phase 13.18)
Single google extension covering 32 tools across Gmail,
Calendar, Tasks, Drive, People, and Photos. Uses OAuth refresh-token
flow. Writes gated by five independent env flags:
- `GOOGLE_ALLOW_SEND` — Gmail send
- `GOOGLE_ALLOW_CALENDAR_WRITE`
- `GOOGLE_ALLOW_DRIVE_WRITE`
- `GOOGLE_ALLOW_TASKS_WRITE`
- `GOOGLE_ALLOW_PEOPLE_WRITE`
See Plugins — Google for the OAuth setup and
the generic google_call tool that fronts the extension.
LLM providers (phase 13.19)
anthropic and gemini are native LLM clients living under
crates/llm/, not extensions. See
LLM providers and children.
Templates
| Id | Purpose | Language |
|---|---|---|
template-rust | Copy-and-edit skeleton (ping, add). | Rust |
template-python | stdlib-only skeleton. | Python |
Local skills
Local skills are markdown files loaded by SkillLoader and injected
into the system prompt at turn time. Defined in the agent config:
# agents.yaml
agents:
- id: kate
skills_dir: ./skills
skills:
- weather
- github
- summarize
- google-auth
Each entry resolves to <skills_dir>/<name>/SKILL.md:
---
name: "Weather"
description: "Current conditions and forecasts"
requires:
bins: ["curl"]
env: ["WEATHER_API_KEY"]
max_chars: 5000
---
# Weather skill
Call `weather_forecast(city)` to get a 3-day forecast.
Use metric units. Default to the user's locale when unspecified.
Loading flow
flowchart TD
CFG[agents.yaml skills: list] --> LOOP[for each name]
LOOP --> READ[read skills_dir/name/SKILL.md]
READ --> FM[parse YAML frontmatter]
FM --> GATE{bins on PATH<br/>AND env set?}
GATE -->|no| SKIP[warn + skip<br/>not injected]
GATE -->|yes| RENDER[render into prompt:<br/>heading + blockquote + body]
RENDER --> TRUNC[truncate to max_chars]
TRUNC --> INJECT[inject into system prompt]
Why local skills skip-on-miss (vs extensions warn-and-continue)
A local skill is a text instruction to the LLM describing a capability. If the backing bin/env isn't available the tool will fail — but worse, the LLM was told the capability exists and will repeatedly try to use it. Skipping the skill prevents lying to the model.
An extension is a registered tool. If the LLM invokes it and the backing bin is missing, the tool returns an error — the LLM observes and adapts. Warn-and-continue is fine.
See Gating for the full semantics.
How to pick
- Need the LLM to know how to do something (usage pattern, style rules, examples)? → local skill.
- Need the LLM to do something (make a call, return data)? → extension skill.
- Both? → ship the extension and write a local skill next to it that explains when to use it.
Gating by env / bins
Both kinds of skills (extension skills under extensions/ and local
skills under skills_dir) declare what they need to work. The
runtime checks those preconditions at load time and reacts
differently depending on skill kind.
The declaration
Both kinds use the same shape. For an extension, it lives in
plugin.toml:
[requires]
bins = ["ffmpeg", "ffprobe"]
env = ["OPENAI_API_KEY"]
For a local skill it lives in the YAML frontmatter of SKILL.md:
---
name: "Whisper transcription"
requires:
bins: ["ffmpeg"]
env: ["OPENAI_API_KEY"]
---
Check semantics (source: `crates/extensions/src/manifest.rs` `Requires::missing()`, `crates/core/src/agent/skills.rs`):
- bins — each name is looked up on `$PATH`. On Windows, `<bin>.exe` is also accepted.
- env — each name must be set and non-empty.
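A minimal sketch of that check — roughly what `Requires::missing()` has to do, though the real implementation may differ in detail.

```rust
use std::env;

fn bin_on_path(bin: &str) -> bool {
    let Some(path) = env::var_os("PATH") else { return false };
    env::split_paths(&path).any(|dir| {
        // On Windows the same name with an .exe suffix also counts.
        dir.join(bin).is_file() || dir.join(format!("{bin}.exe")).is_file()
    })
}

fn env_set_non_empty(name: &str) -> bool {
    env::var(name).map(|v| !v.is_empty()).unwrap_or(false)
}

/// Returns (missing_bins, missing_env); two empty vectors mean "proceed".
fn missing<'a>(bins: &[&'a str], envs: &[&'a str]) -> (Vec<&'a str>, Vec<&'a str>) {
    let missing_bins = bins.iter().copied().filter(|b| !bin_on_path(b)).collect();
    let missing_env = envs.iter().copied().filter(|e| !env_set_non_empty(e)).collect();
    (missing_bins, missing_env)
}

fn main() {
    let (bins, envs) = missing(&["ffmpeg"], &["OPENAI_API_KEY"]);
    println!("missing_bins={bins:?} missing_env={envs:?}");
}
```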
Two reactions, one mechanism
flowchart TD
CHECK[Requires::missing] --> ANY{missing bin<br/>or env?}
ANY -->|no| OK[proceed]
ANY -->|yes| KIND{skill kind}
KIND -->|extension| WARN[warn<br/>continue<br/>tools still registered]
KIND -->|local skill| SKIP[warn<br/>skip<br/>not injected into prompt]
| Skill kind | On missing preconditions |
|---|---|
| Extension | Warn log, still spawn + register tools. A subsequent tool call will fail visibly when the bin/env is absent. |
| Local skill | Warn log, do not inject into the system prompt. The LLM never hears the skill existed. |
Why the difference
A local skill is a description the LLM reads and internalizes —
"you have a transcription skill, call whisper_transcribe." If the
backing binary is missing, the tool call will fail. But the LLM was
told the capability exists, so it will keep trying. Not injecting
the skill prevents promising capabilities that can't be delivered.
An extension tool is observable: the LLM calls it, gets a
concrete error back ("command tesseract not found on PATH"), and
can adapt in the same turn. Warn-and-continue is the friendlier
behavior — the operator sees the warning and can fix the config
without the agent crash-looping.
Where this is logged
Both kinds emit the same structured warn log fields:
WARN skill=weather missing_bins=[] missing_env=[WEATHER_API_KEY]
"skill disabled: required env vars unset or empty"
WARN extension=docker-api missing_bins=[docker] missing_env=[]
"extension preflight: declared requires not satisfied (continuing anyway)"
Filter on missing_env or missing_bins to alert proactively.
Pre-deploy verification
Use the CLI:
agent ext doctor --runtime
This runs Requires::missing() for every discovered extension,
and with --runtime actually spawns each stdio extension to run
the handshake. Nothing is left to chance.
For local skills, a failing agent turn logs all skipped skills — a dry run against the smallest scripted input gives you the same signal without needing a separate command.
Reserved env for secrets
Extensions receive a filtered copy of the host's env. Names matching
the secret-like patterns below are stripped before spawn
(crates/extensions/src/runtime/stdio.rs):
- Suffixes: `_TOKEN`, `_KEY`, `_SECRET`, `_PASSWORD`, `_PASSWD`, `_PWD`, `_CREDENTIAL`, `_CREDENTIALS`, `_PAT`, `_AUTH`, `_APIKEY`, `_BEARER`, `_SESSION`
- Substrings: `PASSWORD`, `SECRET`, `CREDENTIAL`, `PRIVATE_KEY`
Declaring an env in requires.env whitelists it past the
blocklist. That's the only supported way for an extension to
receive a secret env var. Gating and whitelisting come from the same
field — preconditions you declare travel alongside the value you
want.
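A sketch of that filter, using the suffix and substring lists documented above; the declared `requires.env` names punch through the blocklist. The real code is in `crates/extensions/src/runtime/stdio.rs` and may differ in detail.

```rust
const SECRET_SUFFIXES: &[&str] = &[
    "_TOKEN", "_KEY", "_SECRET", "_PASSWORD", "_PASSWD", "_PWD", "_CREDENTIAL",
    "_CREDENTIALS", "_PAT", "_AUTH", "_APIKEY", "_BEARER", "_SESSION",
];
const SECRET_SUBSTRINGS: &[&str] = &["PASSWORD", "SECRET", "CREDENTIAL", "PRIVATE_KEY"];

fn looks_secret(name: &str) -> bool {
    let upper = name.to_ascii_uppercase();
    SECRET_SUFFIXES.iter().any(|s| upper.ends_with(s))
        || SECRET_SUBSTRINGS.iter().any(|s| upper.contains(s))
}

/// Keep a var if it is not secret-shaped, or if the extension declared it in requires.env.
fn passes_filter(name: &str, declared_env: &[&str]) -> bool {
    declared_env.contains(&name) || !looks_secret(name)
}

fn main() {
    let declared = ["OPENAI_API_KEY"];
    assert!(passes_filter("OPENAI_API_KEY", &declared)); // whitelisted via requires.env
    assert!(!passes_filter("GITHUB_TOKEN", &declared));  // stripped before spawn
    assert!(passes_filter("HOME", &declared));           // not secret-shaped, passes
}
```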
Write-gating in practice
Some shipped extensions gate destructive operations behind dedicated
flags — separate from requires.env:
| Extension | Write gate env var |
|---|---|
docker-api | DOCKER_API_ALLOW_WRITE |
proxmox | PROXMOX_ALLOW_WRITE |
onepassword | OP_ALLOW_REVEAL (reveal vs metadata-only) |
google | GOOGLE_ALLOW_SEND, GOOGLE_ALLOW_CALENDAR_WRITE, GOOGLE_ALLOW_DRIVE_WRITE, GOOGLE_ALLOW_TASKS_WRITE, GOOGLE_ALLOW_PEOPLE_WRITE |
These are not handled by the generic gating layer — the extension reads them itself and refuses destructive methods when unset. Good pattern to adopt when your own extension wraps an API with destructive endpoints.
Gotchas
- Empty env counts as missing. `EXAMPLE_KEY=` is treated the same as `EXAMPLE_KEY` unset. This is intentional — empty strings rarely mean "use the default" for a secret.
- `requires.bins` checks `$PATH` at discovery. A binary installed after the agent starts won't be picked up until restart — or until you run `agent ext doctor --runtime` as a secondary gate.
- Local-skill skip is silent to the LLM. If you expected a skill to be present and you don't see it in the system prompt, check the warn logs for the skip reason before debugging agent behavior.
Dependencies — modes and bin versions
A skill that depends on a CLI tool or an environment variable can
declare those needs in requires. The runtime resolves the
declarations at load time and decides whether to expose the skill,
hide it, or expose it with a visible warning the LLM can see.
---
name: ffmpeg-tools
requires:
bins: [ffmpeg]
env: [TRANSCODE_OUTPUT_DIR]
bin_versions:
ffmpeg: ">=4.0"
mode: strict # default
---
Modes
| Mode | When deps are missing | LLM sees the skill? |
|---|---|---|
strict (default) | Skill is dropped | No |
warn | Skill loads with a > ⚠️ MISSING DEPS … banner prepended to its body | Yes — with the warning inline |
disable | Skill is always dropped, even when deps are satisfied | No |
Per-agent override
Operators override a skill's declared mode without editing the skill file:
agents:
- id: kate
skills: [ffmpeg-tools]
skill_overrides:
ffmpeg-tools: warn
Resolution order:
1. `agents.<id>.skill_overrides[<name>]` (operator wins)
2. Skill frontmatter `requires.mode`
3. `strict` (built-in default)
Bin versions
requires.bin_versions adds a semver constraint on top of mere bin
presence. Failing the constraint is treated like a missing dep —
the active mode decides whether to skip or warn.
Constraint syntax
semver request strings:
| Want | Constraint |
|---|---|
| At least 4.0 | ">=4.0" |
| Any 4.x compatible release | "^4.0" |
| 4.x but no 5 | ">=4.0, <5.0" |
| Exact 4.2.1 | "=4.2.1" |
| Patch-compatible to 5.1.3 | "~5.1.3" |
Versions like 4.2 are normalized to 4.2.0 before comparison so
constraint matching works against partial outputs.
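A sketch of that normalization plus constraint check, assuming the `semver` crate; the real probe also handles the regex extraction and timeout described below.

```rust
use semver::{Version, VersionReq};

// Pad "4" or "4.2" out to three components so partial probe output parses.
fn normalize(raw: &str) -> String {
    match raw.matches('.').count() {
        0 => format!("{raw}.0.0"),
        1 => format!("{raw}.0"),
        _ => raw.to_string(),
    }
}

fn satisfies(found: &str, constraint: &str) -> Result<bool, semver::Error> {
    let version = Version::parse(&normalize(found))?;
    let req = VersionReq::parse(constraint)?;
    Ok(req.matches(&version))
}

fn main() {
    // ffmpeg printing "4.2" satisfies ">=4.0" once normalized to 4.2.0.
    assert!(satisfies("4.2", ">=4.0").unwrap());
    assert!(!satisfies("3.4.2", ">=4.0").unwrap());
}
```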
Custom probe
Defaults: <bin> --version, regex \d+\.\d+(?:\.\d+)?. Override
when a tool emits something idiosyncratic:
requires:
bin_versions:
curl:
constraint: ">=8.0"
command: "--help"
regex: 'curl (\d+\.\d+(?:\.\d+)?)'
The shorthand form bin: ">=4.0" and the long form
bin: { constraint: …, command: …, regex: … } are both accepted.
Probe fail modes
| Reason | When |
|---|---|
bin_not_found | Binary not on PATH |
probe_failed | Spawn errored or timed out (5 s cap) |
parse_failed | The default regex (or override) didn't match |
constraint_unsatisfied | Found version doesn't match the constraint |
invalid_constraint | Constraint string couldn't be parsed as semver |
Invalid constraints log at error level; the skill is treated as
having a missing dep — boot continues so a typo in one skill doesn't
take the whole agent down. Probes are cached process-wide by absolute
path so a bin shared across skills only spawns once.
Banner format
When mode: warn and any dep is missing, the skill body is rendered
to the LLM with this prefix:
> ⚠️ MISSING DEPS for skill `ffmpeg-tools`:
> - bin not found: ffmpeg
> - env unset: TRANSCODE_OUTPUT_DIR
> - version mismatch: ffmpeg requires >=4.0 (found 3.4.2)
> Calls into this skill may fail.
The LLM treats this like any other markdown context, so it has the information it needs to either avoid the skill or report a useful error to the user when a tool call fails.
Backwards compatibility
Skills without requires.mode, requires.bin_versions, or
agents.<id>.skill_overrides keep the prior behavior (strict, no
version checks). The defaults are chosen so an unmodified skill
catalog and existing agents.yaml continue to work unchanged.
TaskFlow model
TaskFlow is a durable, multi-step flow runtime that survives process restarts and external waits. It's designed for work that spans more LLM turns than a single conversation buffer can hold — approvals, data pipelines, delegated subtasks, scheduled actions.
Source: crates/taskflow/ (types.rs, store.rs, engine.rs).
When to use it
Use TaskFlow when any of the following apply:
- A task needs to pause and resume later (hours, days)
- Multiple agents collaborate on one outcome
- You need a full audit trail of what happened and when
- You need recovery from a crash mid-task
If it's a one-shot turn, don't reach for TaskFlow — the runtime's normal session buffer is enough.
Flow shape
A flow is an opaque state_json (free-form JSON) plus metadata:
| Field | Purpose |
|---|---|
id | UUID generated on creation. |
controller_id | String label identifying the flow definition (e.g. kate/inbox-triage). |
goal | Human-readable statement of intent. |
owner_session_key | agent:<id>:session:<session_id> — hard tenancy gate. |
requester_origin | Who asked (user id, external system id). |
current_step | String label for the current phase ("classify", "await_approval", …). |
state_json | Free-form JSON owned by the flow — the LLM mutates this over time. |
wait_json | Current wait condition while status = Waiting. |
status | See state machine below. |
cancel_requested | Sticky flag that forces the next valid transition to Cancelled. |
revision | Monotonic integer; increments on every update. Used for optimistic concurrency. |
created_at / updated_at | Timestamps. |
state_json is shallow-merged on updates: a patch { "foo": 1 }
replaces only the foo key, everything else is preserved.
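A sketch of what that shallow merge means in practice, using `serde_json`; top-level keys from the patch overwrite, everything else survives.

```rust
use serde_json::{json, Value};

// Shallow merge: only the top level of the patch is applied.
fn shallow_merge(state: &mut Value, patch: Value) {
    if let (Value::Object(state_map), Value::Object(patch_map)) = (state, patch) {
        for (key, value) in patch_map {
            state_map.insert(key, value);
        }
    }
}

fn main() {
    let mut state = json!({ "messages": 10, "processed": 0 });
    shallow_merge(&mut state, json!({ "processed": 4 }));
    assert_eq!(state, json!({ "messages": 10, "processed": 4 }));
}
```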
State machine
stateDiagram-v2
[*] --> Created
Created --> Running: start_running
Running --> Waiting: set_waiting(condition)
Waiting --> Running: resume
Running --> Finished: finish
Running --> Failed: fail
Waiting --> Failed: fail
Created --> Cancelled: cancel
Running --> Cancelled: cancel
Waiting --> Cancelled: cancel
Finished --> [*]
Failed --> [*]
Cancelled --> [*]
- Terminal states: `Finished`, `Failed`, `Cancelled`. No further transitions allowed.
- Sticky cancel: `cancel_requested = true` forces the next allowed transition to land on `Cancelled`. The flag survives restart and is idempotent — multiple cancel requests converge on the same outcome.
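The diagram boils down to a transition table plus the sticky-cancel rule. A sketch (names approximate the real types in `crates/taskflow/src/types.rs`):

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum FlowStatus { Created, Running, Waiting, Finished, Failed, Cancelled }

fn can_transition(from: FlowStatus, to: FlowStatus) -> bool {
    use FlowStatus::*;
    matches!(
        (from, to),
        (Created, Running)
            | (Running, Waiting) | (Waiting, Running)
            | (Running, Finished)
            | (Running, Failed) | (Waiting, Failed)
            | (Created, Cancelled) | (Running, Cancelled) | (Waiting, Cancelled)
    )
}

// Sticky cancel: once cancel_requested is set, the next valid transition
// is forced to Cancelled instead of the requested target.
fn next_status(from: FlowStatus, requested: FlowStatus, cancel_requested: bool) -> Option<FlowStatus> {
    let target = if cancel_requested { FlowStatus::Cancelled } else { requested };
    can_transition(from, target).then_some(target)
}

fn main() {
    use FlowStatus::*;
    assert_eq!(next_status(Waiting, Running, false), Some(Running));
    assert_eq!(next_status(Waiting, Running, true), Some(Cancelled)); // sticky cancel wins
    assert_eq!(next_status(Finished, Running, false), None);          // terminal state
}
```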
Persistence
SQLite-backed via sqlx, pool size 5. Default path
./data/taskflow.db, override with TASKFLOW_DB_PATH.
Tables
CREATE TABLE flows (
id TEXT PRIMARY KEY,
controller_id TEXT,
goal TEXT,
owner_session_key TEXT,
requester_origin TEXT,
current_step TEXT,
state_json TEXT,
wait_json TEXT,
status TEXT,
cancel_requested BOOLEAN,
revision INTEGER,
created_at INTEGER,
updated_at INTEGER
);
CREATE TABLE flow_steps (
id TEXT PRIMARY KEY,
flow_id TEXT NOT NULL,
runtime TEXT, -- Managed | Mirrored
child_session_key TEXT,
run_id TEXT,
task TEXT,
status TEXT,
result_json TEXT,
created_at INTEGER,
updated_at INTEGER,
UNIQUE (flow_id, run_id)
);
CREATE TABLE flow_events (
id INTEGER PRIMARY KEY AUTOINCREMENT,
flow_id TEXT NOT NULL,
kind TEXT,
payload_json TEXT,
at INTEGER
);
- `flows.revision` drives optimistic concurrency (see FlowManager).
- `flow_events` is append-only — every transition leaves a trail.
- The `flow_steps` `(flow_id, run_id)` UNIQUE constraint catches duplicate observations at the DB layer, not in a race-prone managerial check.
Wait conditions
Persisted in wait_json while status = Waiting.
enum WaitCondition {
    Timer { at: DateTime<Utc> },                             // auto-resume at time
    ExternalEvent { topic: String, correlation_id: String }, // resume when matching event arrives
    Manual,                                                  // resume only via explicit call
}
| Condition | Resumed by |
|---|---|
Timer | WaitEngine::tick() when now >= at |
ExternalEvent | try_resume_external(flow_id, topic, correlation_id, payload) |
Manual | FlowManager::resume(id, patch) — typically via CLI or a deliberate LLM turn |
There is no timeout built into the wait itself — you implement a timeout by pairing any wait with a `Timer` fallback (e.g. fan out "wait for approval OR 24 h elapsed") via orchestration in the flow's step logic.
Audit trail
Every transition writes a flow_events row with:
- `kind`: `created`, `started`, `waiting`, `resumed`, `finished`, `failed`, `cancelled`, `state_updated`, `step_observed`, ...
- `payload_json`: contextual data (wait condition, result, reason, step info)
- `at`: timestamp
The audit append happens inside the same SQLite transaction as the state update — you can never see a flow state that doesn't have a matching audit event, even after a crash mid-operation.
Mirrored flows
Beyond Managed flows (owned by FlowManager), you can create Mirrored flows that just observe externally-driven work:
- `create_mirrored(input)` inserts a flow already in `Running` state
- `record_step_observation(StepObservation)` upserts into `flow_steps` by `(flow_id, run_id)` — new observations merge with existing rows
- Emits `step_observed` audit events
Useful for tracking tasks executed elsewhere — a delegation to another agent, a subprocess spawned out-of-band — while keeping one unified audit surface.
Next
- FlowManager — the mutation API, revision retry, and agent-facing tools
FlowManager, tools, and CLI
FlowManager owns the mutation API for flows. It wraps the
FlowStore with revision-checked atomic updates, the agent-facing
taskflow tool, the WaitEngine, and the agent flow CLI.
Source: crates/taskflow/src/manager.rs,
crates/taskflow/src/engine.rs,
crates/core/src/agent/taskflow_tool.rs.
Responsibilities
flowchart LR
subgraph FM[FlowManager]
CREATE[create_managed<br/>create_mirrored]
RUN[start_running<br/>set_waiting<br/>resume<br/>finish<br/>fail<br/>cancel]
PATCH[update_state<br/>request_cancel]
QUERY[get / list_by_owner / list_by_status / list_steps]
OBS[record_step_observation]
end
FM --> STORE[FlowStore<br/>SQLite]
FM --> ENG[WaitEngine]
TOOL[taskflow tool<br/>agent-facing] --> FM
CLI[agent flow CLI] --> FM
ENG --> STORE
One manager per store — typically one per process. Same database file can be opened by multiple managers safely as long as each goes through the revision protocol.
Optimistic concurrency
Every mutation follows this loop:
flowchart TD
START[mutation requested] --> FETCH[fetch current flow]
FETCH --> APPLY[apply closure:<br/>transition, patch, etc.]
APPLY --> SAVE[store.update_and_append<br/>WHERE id=? AND revision=?]
SAVE --> RES{result}
RES -->|ok| DONE([return updated flow])
RES -->|RevisionMismatch| REFETCH[refetch + retry]
REFETCH --> LIMIT{attempts >= 2?}
LIMIT -->|no| APPLY
LIMIT -->|yes| ERR([surface RevisionMismatch])
- `revision` is a monotonic integer on every flow
- Update runs `UPDATE ... WHERE id=? AND revision=?` — only one writer wins per revision
- Retry budget is 2 attempts (1 fetch + 1 refetch); persistent conflict bubbles up to the caller
- Update and audit-event append happen inside a single SQLite transaction — crash mid-operation cannot produce a desync between state and audit trail
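An in-memory sketch of that loop — the store here is a `HashMap` standing in for SQLite, and the names are illustrative, but the revision check, the 2-attempt budget, and the bubbled `RevisionMismatch` follow the description above.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug)]
struct Flow { id: u32, revision: i64, state: String }

#[derive(Debug)]
enum UpdateError { NotFound, RevisionMismatch }

struct Store { flows: HashMap<u32, Flow> }

impl Store {
    fn get(&self, id: u32) -> Option<Flow> { self.flows.get(&id).cloned() }

    /// Succeeds only if the caller saw the current revision (the in-memory
    /// analogue of `UPDATE ... WHERE id=? AND revision=?`).
    fn update(&mut self, expected_revision: i64, updated: Flow) -> Result<(), UpdateError> {
        let current = self.flows.get_mut(&updated.id).ok_or(UpdateError::NotFound)?;
        if current.revision != expected_revision {
            return Err(UpdateError::RevisionMismatch);
        }
        *current = Flow { revision: expected_revision + 1, ..updated };
        Ok(())
    }
}

fn mutate(store: &mut Store, id: u32, apply: impl Fn(&mut Flow)) -> Result<Flow, UpdateError> {
    // Retry budget of 2 attempts: one fetch, one refetch on conflict.
    for _ in 0..2 {
        let mut flow = store.get(id).ok_or(UpdateError::NotFound)?;
        let seen_revision = flow.revision;
        apply(&mut flow);
        match store.update(seen_revision, flow.clone()) {
            Ok(()) => return Ok(store.get(id).unwrap()),
            Err(UpdateError::RevisionMismatch) => continue, // refetch + retry once
            Err(e) => return Err(e),
        }
    }
    Err(UpdateError::RevisionMismatch) // persistent conflict surfaces to the caller
}

fn main() {
    let mut store = Store {
        flows: HashMap::from([(1, Flow { id: 1, revision: 0, state: "created".into() })]),
    };
    let flow = mutate(&mut store, 1, |f| f.state = "running".into()).unwrap();
    assert_eq!((flow.revision, flow.state.as_str()), (1, "running"));
}
```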
WaitEngine
Broker-agnostic scheduler. Pull-based tick() advances any flow
whose wait condition has fired.
flowchart LR
TICK[WaitEngine::tick_at] --> SCAN[scan all Waiting flows]
SCAN --> EVAL{evaluate wait}
EVAL -->|Timer expired| RESUME1[resume]
EVAL -->|still future| STAY1[stay waiting]
EVAL -->|ExternalEvent / Manual| STAY2[stay waiting]
EVAL -->|cancel_requested| CAN[transition to Cancelled]
EXT[try_resume_external<br/>topic + correlation_id] --> MATCH{wait condition<br/>matches?}
MATCH -->|yes| RESUME2[resume + merge payload into<br/>state.resume_event]
MATCH -->|no| NOOP[no-op]
- `tick_at(now)` — a single scan. Returns a `TickReport` with counters: scanned, resumed, cancelled, still waiting, errors.
- `run(interval, shutdown_token)` — long-running loop; drive from heartbeat or a dedicated tokio task.
- `try_resume_external(flow_id, topic, correlation_id, payload)` — called by a NATS subscriber or the CLI when an external event arrives; matches against the flow's persisted `wait_json` and resumes if it fits.
Correlation ids are caller-chosen strings. Typical pattern: when a
flow delegates to another agent via agent.route.<target_id>,
include the flow's id or a fresh UUID as the correlation id, and
have the receiver echo it on reply.
Agent-facing tool
Single taskflow tool with dispatch by action:
| Action | Params | Result |
|---|---|---|
start | controller_id, goal, optional current_step (default "init"), optional state | {ok, flow} — auto-transitions Created → Running |
status | flow_id | {ok, flow} or {ok:false, error:"not_found"} |
advance | flow_id, optional patch, optional current_step | {ok, flow} with merged state |
cancel | flow_id | {ok, flow} |
list_mine | — | {ok, count, flows: [...]} |
Session tenancy
Every call derives owner_session_key = "agent:<id>:session:<session_id>".
The manager rejects any mutation whose owner does not match the
flow's — "belongs to a different session" error. Cross-session
access from the LLM is not possible.
Revision hidden from the LLM
The tool fetches the flow before every mutation and uses the live revision internally. The LLM never sees or reasons about revision numbers — fewer tokens, fewer mistakes.
CLI
agent flow list [--json]
agent flow show <id> [--json]
agent flow cancel <id>
agent flow resume <id>
- `list` prints a table sorted by `updated_at DESC`
- `show` prints the flow plus every recorded step
- `cancel` calls `manager.cancel(id)`
- `resume` is a manual unblock for `Manual` or `ExternalEvent` waits — useful in ops / testing when an expected event never arrived
All commands honor TASKFLOW_DB_PATH (default ./data/taskflow.db).
End-to-end example
From crates/taskflow/tests/e2e_test.rs:
// 1. Create + run + park.
let f = manager.create_managed(input).await?;
let f = manager.start_running(f.id).await?;
let f = manager.set_waiting(f.id, json!({"kind": "manual"})).await?;

// 2. Process exits. Reopen the SAME database file from a fresh manager.
let reloaded = manager.get(f.id).await?.unwrap();
assert_eq!(reloaded.status, FlowStatus::Waiting);
assert_eq!(reloaded.state_json["verses_done"], 10); // partial work survived

// 3. Resume picks up where we left off.
let resumed = manager.resume(reloaded.id, None).await?;
assert_eq!(resumed.status, FlowStatus::Running);
Shipped shape of CreateManagedInput:
{
"controller_id": "kate/inbox-triage",
"goal": "triage inbox",
"owner_session_key": "agent:kate:session:abc",
"requester_origin": "user-1",
"current_step": "classify",
"state_json": { "messages": 10, "processed": 0 }
}
There is no YAML flow-definition format — flows are built in code
(or driven by the taskflow tool's start action).
Garbage collection
store.prune_terminal_flows(retain_days) deletes flows whose
terminal state is older than the retention window. Wire this into a
scheduled job when your flows pile up — audit trails accumulate
forever otherwise.
Gotchas
- `state_json` is shallow-merged. Nested updates require the caller to build the full replacement object for the key being changed.
- `revision` conflicts retry only twice. If two callers are fighting over a flow continuously, the second persistently surfaces `RevisionMismatch` — treat that as a signal that you should either serialize at a higher level, or have the loser retry at the app layer.
- No flow-level mutex. The DB-level `UNIQUE (flow_id, run_id)` on steps keeps step-observation races safe; revision checks keep mutation races safe. But two observers can read a flow simultaneously — don't rely on read-time consistency for decisions.
- `wait_json` is cleared on resume. If you need to remember the wait condition for audit purposes, the `flow_events` table has it.
Wait / resume
Durable flows can park themselves between steps. The runtime drives
parked flows back to Running either on a wall-clock deadline (timer),
when an external signal arrives (NATS), or when an operator resumes
them by hand (manual).
Two pieces wire this together:
- `WaitEngine` — single global tokio task. Every `tick_interval` it scans `Waiting` flows and resumes any whose timer has fired or whose cancel intent has been set.
- `taskflow.resume` bridge — single broker subscriber that translates incoming events into `WaitEngine::try_resume_external` calls.
Source: crates/taskflow/src/engine.rs, src/main.rs::spawn_taskflow_resume_bridge.
Wait conditions
The wait_json column on a flow stores one of:
| Kind | Shape | Resumed by |
|---|---|---|
timer | {kind:"timer", at:"<RFC3339>"} | WaitEngine.tick() once now >= at |
external_event | {kind:"external_event", topic:"…", correlation_id:"…"} | taskflow.resume bridge with matching (topic, correlation_id) |
manual | {kind:"manual"} | Explicit manager.resume(...) (CLI / ops) |
`Timer.at` is validated by the tool against `taskflow.timer_max_horizon` (default 30 days). Deadlines in the past, and empty topics or correlation_ids, are rejected before the flow ever enters `Waiting`.
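A sketch of that guardrail with `chrono`; the horizon value and the error strings are placeholders, not the tool's actual messages.

```rust
use chrono::{DateTime, Duration, Utc};

// Deadline must be in the future and within the configured horizon.
fn validate_timer(at: DateTime<Utc>, now: DateTime<Utc>, max_horizon_days: i64) -> Result<(), String> {
    if at <= now {
        return Err("timer deadline is in the past".into());
    }
    if at > now + Duration::days(max_horizon_days) {
        return Err(format!("timer deadline exceeds the {max_horizon_days}-day horizon"));
    }
    Ok(())
}

fn main() {
    let now = Utc::now();
    assert!(validate_timer(now + Duration::hours(2), now, 30).is_ok());
    assert!(validate_timer(now - Duration::hours(1), now, 30).is_err());
}
```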
Tool actions
The taskflow tool exposes the LLM-facing surface. Beyond the existing
start | status | advance | cancel | list_mine, three actions drive
the wait/resume lifecycle:
wait
{
"action": "wait",
"flow_id": "…uuid…",
"wait_condition": {"kind": "timer", "at": "2026-04-26T09:00:00Z"}
}
Move flow Running → Waiting. Validates wait_condition shape and
guardrails before persisting.
finish
{
"action": "finish",
"flow_id": "…uuid…",
"final_state": {"result": "ok"}
}
Move flow → Finished. final_state (optional) is shallow-merged
into state_json before transition.
fail
{
"action": "fail",
"flow_id": "…uuid…",
"reason": "downstream-error"
}
Move flow → Failed. reason is required. The reason is stamped
under state_json.failure.reason and recorded in the audit event.
NATS resume bridge
A single subscriber lives at taskflow.resume. Anything that wants to
wake a parked flow publishes a JSON message there:
{
"flow_id": "f5e0…",
"topic": "agent.delegate.reply",
"correlation_id": "corr-42",
"payload": {"answer": 42}
}
The bridge calls WaitEngine::try_resume_external(flow_id, topic, correlation_id, payload). If the flow is Waiting with a matching
external_event condition, it resumes; the payload (if any) is
merged into state_json.resume_event. Mismatches and unknown flow
ids are silent debug logs.
Example with the nats CLI:
nats pub taskflow.resume '{
"flow_id": "f5e0…",
"topic": "agent.delegate.reply",
"correlation_id": "corr-42",
"payload": {"answer": 42}
}'
Single subject (no flow_id in suffix) is intentional — it keeps the subject namespace flat and avoids per-flow subscription churn. Volume is expected to be low (<10/s); if that ever changes, the bridge can shard internally without protocol changes.
Configuration
config/taskflow.yaml (optional; absent → defaults):
tick_interval: 5s # WaitEngine cadence
timer_max_horizon: 30d # max future Timer.at allowed by tool
db_path: ./data/taskflow.db # also honored via TASKFLOW_DB_PATH
agents.yaml enables the tool per agent:
agents:
- id: kate
plugins: [taskflow, memory]
Without taskflow in plugins, the agent does not see the tool —
the engine and bridge still run process-wide.
Tick interval guidance
- `5s` (default) is plenty for human-scale timers.
- Bring it down to `1s` only if you have sub-minute timers and care about the worst-case lag.
- The tick is idempotent and pull-based; missing a tick is harmless.
Telemetry
Each tick logs at debug level when scanned > 0:
DEBUG wait engine tick scanned=3 resumed=1 cancelled=0 still_waiting=2 errors=0
The bridge logs at info on each successful resume:
INFO taskflow resumed via NATS flow_id=… topic=…
Identity & workspace
Every agent has a workspace directory — a small set of markdown files that describe who it is, what it knows, and how it's meant to behave. The runtime loads those files at session start and injects them into the system prompt. The agent reads them; some of them, the agent also writes back to.
Source: crates/core/src/agent/workspace.rs,
crates/core/src/agent/self_report.rs.
Workspace files
<workspace>/
├── IDENTITY.md # 10.1 — persona facts (name, vibe, emoji)
├── SOUL.md # 10.2 — prompt-like character document
├── USER.md # who the human is (if single-user)
├── AGENTS.md # peers this agent knows about
├── MEMORY.md # 10.3 — self-curated facts index
├── DREAMS.md # dreaming diary (10.6)
├── notes/ # per-day notes
└── .git/ # 10.9 — per-agent repo for forensics
Configured per agent:
agents:
- id: kate
workspace: ./data/workspace/kate
workspace_git:
enabled: true
IDENTITY.md (phase 10.1)
Short, structured. Five optional fields parsed from a markdown bullet list:
- **Name:** Kate
- **Creature:** octopus
- **Vibe:** warm but sharp
- **Emoji:** 🐙
- **Avatar:** https://.../kate.png
The parser:
- Silently skips template placeholders in parens (e.g. `_(pick something)_`) so the bootstrap template never leaks into the persona
- Produces an `AgentIdentity { name, creature, vibe, emoji, avatar }` struct, all fields `Option<String>`
Rendered into the system prompt as a single line:
# IDENTITY
name=Kate, emoji=🐙, vibe=warm but sharp
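A sketch of the kind of parser this implies — bullet lines mapped onto optional fields, placeholders skipped; the real one lives in `crates/core/src/agent/workspace.rs`.

```rust
#[derive(Debug, Default)]
struct AgentIdentity {
    name: Option<String>,
    creature: Option<String>,
    vibe: Option<String>,
    emoji: Option<String>,
    avatar: Option<String>,
}

fn parse_identity(markdown: &str) -> AgentIdentity {
    let mut id = AgentIdentity::default();
    for line in markdown.lines() {
        let Some(rest) = line.trim().strip_prefix("- **") else { continue };
        let Some((field, value)) = rest.split_once(":**") else { continue };
        let value = value.trim();
        // Template placeholders like "_(pick something)_" are skipped so the
        // bootstrap template never leaks into the persona.
        if value.is_empty() || value.starts_with("_(") {
            continue;
        }
        let slot = match field.to_ascii_lowercase().as_str() {
            "name" => &mut id.name,
            "creature" => &mut id.creature,
            "vibe" => &mut id.vibe,
            "emoji" => &mut id.emoji,
            "avatar" => &mut id.avatar,
            _ => continue,
        };
        *slot = Some(value.to_string());
    }
    id
}

fn main() {
    let doc = "- **Name:** Kate\n- **Creature:** _(pick something)_\n- **Emoji:** 🐙";
    let id = parse_identity(doc);
    assert_eq!(id.name.as_deref(), Some("Kate"));
    assert_eq!(id.creature, None); // placeholder skipped
}
```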
SOUL.md (phase 10.2)
Free-form markdown. No parsing. Injected verbatim after the IDENTITY block. This is where long-form character, operating principles, tone, and hard rules live.
Loaded on every session start. Main and shared sessions both see SOUL.md — the privacy boundary is MEMORY.md, not SOUL.md (shared groups should never leak private memories, but the persona is fine to surface).
MEMORY.md (phase 10.3)
The agent's self-curated index of things it remembers. Markdown sections with bullet lists — no special schema:
## People
- Luis prefers Spanish but is fine switching to English.
- Ana uses a Samsung, not an iPhone.
## Dreamed 2026-04-23 03:00 UTC
- User's timezone is America/Bogota _(score=0.42, hits=5, days=3)_
- Prefers short replies on WhatsApp _(score=0.38, hits=4, days=2)_
## Open questions
- What phone carrier does Luis use?
Scope rules:
- Loaded only in main (DM-style) sessions. Group and broadcast sessions never see MEMORY.md — per-user facts must not leak into multi-user chats.
- Appended automatically by dreaming sweeps (Phase 10.6)
- Truncation: 12 000 chars per file cap (whole workspace total budget: 60 000 chars). Exceeding files get a `[truncated]` marker.
USER.md and AGENTS.md
- USER.md — who this agent is talking to. Loaded in main sessions only.
- AGENTS.md — which peers this agent can delegate to. Pairs with `allowed_delegates` in agents.yaml.
Both are free-form markdown read into the prompt.
Transcripts (phase 10.4)
Per-session, append-only JSONL files in transcripts_dir:
{"type":"session","version":1,"id":"<uuid>","timestamp":"2026-04-24T...","agent_id":"kate","source_plugin":"telegram"}
{"type":"entry","timestamp":"...","role":"user","content":"hello","message_id":"...","source_plugin":"telegram","sender_id":"user123"}
{"type":"entry","timestamp":"...","role":"assistant","content":"hello Luis","source_plugin":""}
- One file per session at `<transcripts_dir>/<session_id>.jsonl`
- No time-based rotation (session close = file close)
- First line is a session header with metadata, every subsequent line is a turn
Transcripts are write-only from the runtime's point of view — they're for replay, audit, and human review, not read-back into the prompt.
Self-report tools (phase 10.8)
Four tools let the agent inspect its own state:
| Tool | Returns | Use |
|---|---|---|
who_am_i | {agent_id, model, workspace_dir, identity{…}, soul_excerpt} | When asked "who are you?" |
what_do_i_know | {sections: [{heading, bullets}], truncated} with optional filter | Search MEMORY.md by section name |
my_stats | {sessions_total, memories_stored, memories_promoted, last_dream_ts, recall_events_7d, top_concept_tags_7d, workspace_files_present} | Meta-awareness |
session_logs | {ok, sessions/entries/hits, …} — actions: list_sessions, read_session, search, recent | Inspect own JSONL transcripts for self-reflection, debugging, cross-session search |
The first three return concise JSON designed for the LLM to consume in
one turn. Soul excerpt in who_am_i is truncated to 2 048 chars;
what_do_i_know caps at 6 144 bytes serialized with at most 10
bullets per section.
session_logs is registered automatically when the agent has a non-empty
transcripts_dir. It is scoped to that directory — agents cannot read
each other's transcripts. Default limits: 50 entries per call (max 500),
200 chars per content preview (max 4 000). When recent is invoked
without session_id, it defaults to the current session. If the agent's
allowed_tools patterns exclude session_logs, it is filtered after
registration like every other tool.
Load flow
flowchart TD
SESSION[new session] --> LOADER[WorkspaceLoader.load scope]
LOADER --> SCOPE{scope}
SCOPE -->|Main| FULL[load IDENTITY + SOUL + USER +<br/>AGENTS + daily notes + MEMORY]
SCOPE -->|Shared| SHARED[load IDENTITY + SOUL +<br/>AGENTS only]
FULL --> TRUNC[enforce 12k/file, 60k total]
SHARED --> TRUNC
TRUNC --> RENDER[render_system_blocks<br/>into prompt]
RENDER --> PROMPT[# IDENTITY<br/># SOUL<br/># USER<br/># AGENTS<br/># MEMORY]
Next
- MEMORY.md — write cadence and promotion rules
- Dreaming — how sleeps turn recall signals into MEMORY.md entries
MEMORY.md + recall signals + workspace-git
This page covers everything about how what the agent knows evolves over time: the MEMORY.md index, the recall signals that drive dreaming, how concept tags are derived, and how the workspace-git repo captures a full audit history.
For the underlying storage mechanics (tables, queries, vector index), see Memory — long-term.
What goes where
flowchart LR
subgraph DB[SQLite data/memory.db]
MEM[memories]
FTS[memories_fts]
REC[recall_events]
PROM[memory_promotions]
end
subgraph WS[workspace dir]
MD[MEMORY.md]
DRM[DREAMS.md]
GIT[.git]
end
TOOL[memory.remember] --> MEM
TOOL --> FTS
MEM -. recall hits .-> REC
REC --> DRM2[dream sweep]
DRM2 --> PROM
DRM2 --> MD
DRM2 --> DRM
CHK[forge_memory_checkpoint] --> GIT
DRM2 --> GIT
Three layers, each with a different update cadence:
| Layer | Write trigger | Consumer |
|---|---|---|
memories table | Agent calls memory.remember | Next turn's memory.recall |
recall_events table | Every memory.recall hit | Dream sweep (10.6) |
memory_promotions table | Promotion during dream | Prevents double-promote across sweeps |
MEMORY.md | Dream sweep (10.6) | Next session's system prompt (main scope only) |
DREAMS.md | Dream sweep (10.6) | Historical diary for humans + my_stats |
.git | Dream finish, session close, forge_memory_checkpoint | memory_history tool, post-mortem via git log |
Recall signals (phase 10.5)
The recall_events table captures every hit of memory.recall:
CREATE TABLE recall_events (
id INTEGER PRIMARY KEY AUTOINCREMENT,
agent_id TEXT,
memory_id TEXT,
query TEXT, -- the search string that surfaced this memory
score REAL, -- relevance score from the recall call
ts_ms INTEGER
);
Aggregation over a per-memory window produces the signals struct consumed by dreaming:
| Signal | Meaning |
|---|---|
frequency | Log-normalized count of hits |
relevance | Mean score across hits |
recency | Exponential decay from last-hit timestamp |
diversity | Distinct query strings, normalized (saturates at 5+) |
recall_count | Raw hit count — used by gates |
unique_days | Distinct UTC days the memory was surfaced |
Each signal is weighted and summed into the score that drives promotion (see Dreaming).
Concept tags (phase 10.7)
Every memory row has a concept_tags JSON column populated at insert
time — not via TF-IDF but via a deterministic pipeline:
- Glossary match. Hard-coded list of protected tech terms (multilingual) — `backup`, `openai`, `migration`, etc.
- Compound tokens. Regex preserves file paths and identifiers (`src/main.rs`, `camelCaseNames`).
- Unicode word segmentation. `UAX #29` word boundaries split the rest.
- Per-token rules:
  - NFKC normalization + lowercase
  - 32-char max; 3-char min for Latin, 2-char min for CJK
  - Reject pure digits, ISO dates, and 100+ shared stop-words across English, Spanish, and path noise
  - Underscores converted to dashes
Output capped at 8 tags per memory. Stored as JSON array on the
memories row; expanded into keyword recall searches as part of the
FTS5 MATCH query.
Dream sweeps backfill tags for older memories that were created before the tagging pipeline existed.
MEMORY.md write cadence
Dreaming sweeps append blocks:
## Dreamed 2026-04-24 03:00 UTC
- Luis lives in Bogota and prefers Spanish _(score=0.42, hits=5, days=3)_
- Kate should default to short WhatsApp replies _(score=0.38, hits=4, days=2)_
- One block per sweep
- Promoted memories shown as bullets with score, hit count, unique days
- Existing sections preserved; the file is only ever appended to (manual editing by humans is fine — the dream sweep appends a new block rather than rewriting anything)
Privacy rules:
- MEMORY.md is injected into main-scope sessions only. Groups / broadcasts never see it.
- `transcripts_dir` is separate from workspace and is not committed to workspace-git by default.
Workspace-git (phase 10.9)
When workspace_git.enabled: true, the agent's workspace
directory is a git repo. Commits happen automatically at three
moments:
flowchart LR
T1[dream sweep finishes] --> C[commit_all promote]
T2[session close<br/>on_expire callback] --> C2[commit_all session-close]
T3[forge_memory_checkpoint<br/>tool call] --> C3[commit_all checkpoint:note]
C --> LOG[.git history]
C2 --> LOG
C3 --> LOG
Mechanics (crates/core/src/agent/workspace_git.rs):
- Staged: every non-ignored file (respects the auto-generated `.gitignore`)
- Skipped: files larger than 1 MiB (`MAX_COMMIT_FILE_BYTES`)
- Idempotent: no-op commit when the tree is clean
- Author: `{agent_id} <agent@localhost>` (configurable via `workspace_git.author_name` / `author_email`)
- Auto `.gitignore` excludes `transcripts/`, `media/`, `*.tmp`, `*.swp`, `.DS_Store`
- No remote configured by default; operators add one if forensic archival matters
Tools that touch git
| Tool | Purpose | Returns |
|---|---|---|
forge_memory_checkpoint(note) | Commit right now with checkpoint: <note> subject | {ok, oid(short), subject, skipped} |
memory_history(limit?, include_diff?) | git log of the last limit commits (max 100); optional unified diff oldest→HEAD | {commits: [...], diff?} |
Good uses of explicit checkpoints:
- Before a risky update sequence the agent is about to perform
- After receiving a non-obvious instruction from the user
- As bookends around a `taskflow` step boundary
Gotchas
- MEMORY.md can grow unbounded over years. Workspace-git keeps the history, but the in-prompt view is truncated at 12 KB. Keep an eye on size and prune old `## Dreamed` blocks if they stop being useful.
- Concept-tag derivation is deterministic per content. Editing a memory's content in-place does not re-derive tags — the tags that were computed at insert stick. Re-insert to refresh.
- `git log` replays tell the truth. If you're debugging a surprising agent behavior, `memory_history --include-diff` is the fastest way to see what the agent wrote to itself and when.
Dreaming
"Dreaming" is a scheduled offline sweep that consolidates an agent's memory. It reads recall signals, scores each memory that was recently surfaced, promotes the strongest ones into MEMORY.md, and commits the workspace-git repo.
Source: crates/core/src/agent/dreaming.rs.
When it runs
# agents.yaml
agents:
- id: kate
heartbeat:
enabled: true
interval: 30s
dreaming:
enabled: false
interval_secs: 86400 # 24 h
min_score: 0.35
min_recall_count: 3
min_unique_queries: 2
max_promotions_per_sweep: 20
weights:
frequency: 0.24
relevance: 0.30
recency: 0.15
diversity: 0.15
consolidation: 0.10
Dreaming is heartbeat-driven: it ticks inside the heartbeat loop
and actually sweeps when interval_secs has elapsed since the last
sweep. Disable the heartbeat and dreaming stops firing.
Default interval_secs: 86400 (24 hours). Run nightly or tune down
for high-throughput agents.
Three phases (Light / REM / Deep)
Conceptually borrowed from the OpenClaw design, nexo-rs ships Light → Deep:
flowchart LR
START[sweep tick] --> LIGHT[Light:<br/>gather memories with<br/>>=1 recall event]
LIGHT --> DEEP[Deep:<br/>score + gate + promote]
DEEP --> WRITE[append MEMORY.md block]
WRITE --> DIARY[append DREAMS.md entry]
DIARY --> GIT[commit workspace]
(REM — thematic summarization with an LLM — is intentionally deferred.)
Scoring
For each candidate memory:
score = w.frequency × frequency
+ w.relevance × relevance
+ w.recency × recency
+ w.diversity × diversity
+ w.consolidation × consolidation
Where the signals come from recall_events.
Consolidation is a modest bias toward memories that recurred in diverse queries over multiple days — taking the memory from "hit once" to "actually load-bearing."
Gates
A candidate is promoted only if all of these hold:
| Gate | Default | Meaning |
|---|---|---|
recall_count >= min_recall_count | 3 | Surfaced at least 3 times |
unique_days >= 1 | 1 | Not all hits on the same day |
distinct_queries >= min_unique_queries | 2 | More than one query style hit it |
score >= min_score | 0.35 | Weighted composite over the threshold |
!is_promoted(memory_id) | — | Not already promoted in a prior sweep |
Up to max_promotions_per_sweep (default 20) promoted per run;
ordered by descending score.
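Putting the scoring formula and the gates together, a sketch of the promotion step with the default weights and thresholds; the `Signals` struct and field names are illustrative, not the runtime's exact types.

```rust
struct Signals {
    memory_id: String,
    frequency: f64,
    relevance: f64,
    recency: f64,
    diversity: f64,
    consolidation: f64,
    recall_count: u32,
    unique_days: u32,
    distinct_queries: u32,
    already_promoted: bool,
}

// Default weights from the dreaming config above.
fn score(s: &Signals) -> f64 {
    0.24 * s.frequency + 0.30 * s.relevance + 0.15 * s.recency
        + 0.15 * s.diversity + 0.10 * s.consolidation
}

// Apply the gates, then keep at most 20 promotions ordered by descending score.
fn promote(candidates: Vec<Signals>) -> Vec<(String, f64)> {
    let mut promoted: Vec<(String, f64)> = candidates
        .into_iter()
        .filter(|s| {
            s.recall_count >= 3          // min_recall_count
                && s.unique_days >= 1
                && s.distinct_queries >= 2 // min_unique_queries
                && !s.already_promoted
                && score(s) >= 0.35        // min_score
        })
        .map(|s| {
            let sc = score(&s);
            (s.memory_id, sc)
        })
        .collect();
    promoted.sort_by(|a, b| b.1.total_cmp(&a.1));
    promoted.truncate(20); // max_promotions_per_sweep
    promoted
}

fn main() {
    let candidates = vec![Signals {
        memory_id: "mem-1".into(),
        frequency: 0.6, relevance: 0.5, recency: 0.4, diversity: 0.4,
        consolidation: 0.3, recall_count: 5, unique_days: 3,
        distinct_queries: 3, already_promoted: false,
    }];
    println!("{:?}", promote(candidates)); // [("mem-1", 0.444)]
}
```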
Outputs
MEMORY.md append
## Dreamed 2026-04-24 03:00 UTC
- Luis lives in Bogota and prefers Spanish _(score=0.42, hits=5, days=3)_
- Kate should default to short WhatsApp replies _(score=0.38, hits=4, days=2)_
Only memories promoted this sweep appear in the block.
DREAMS.md diary
A longer-form diary entry the agent can read back in
my_stats().last_dream_ts context. One per sweep.
Side effects
- `memory_promotions` row per promoted memory (prevents double-promote across sweeps)
- `concept_tags` backfilled on older memories that were created before the tagging pipeline landed
- `workspace_git.commit_all("promote", <body with delta>)` captures the full change
Idempotency
Re-running a sweep during the same interval is a no-op:
- Promotions consult `memory_promotions` before writing
- MEMORY.md is appended to, not rewritten
- Git commit returns cleanly with `skipped: true` when the tree is unchanged
You can safely call a manual "dream now" during a stuck session
(currently via restart with a lowered interval_secs) without
corrupting state.
Safety rails
- Shutdown cancellation. Dream sweeps run under a cancellation token tied to the shutdown sequence. Partial sweeps don't leave inconsistent state — the atomic trio (DB row + MEMORY.md append + git commit) runs after all candidates are scored and gated.
- Heartbeat-only. Dreaming never fires from a user message turn, so a long sweep cannot block a user response.
- Read-mostly. The sweep reads from `recall_events`; the only writes are `memory_promotions`, the MEMORY.md append, the DREAMS.md append, and the git commit. Existing memory rows are untouched except for tag backfill.
What dreaming is not
- Not a summarizer. It does not rewrite content.
- Not a deduplicator. Two similar memories remain two memories; the recall layer will simply surface both and let the LLM pick.
- Not an LLM call. The whole sweep is deterministic — no model inference, no per-sweep cost.
Tuning
| Situation | Change |
|---|---|
| Memories stay too cold to promote | Lower min_score (e.g. 0.25) |
| Too many noise promotions | Raise min_recall_count to 5 |
| MEMORY.md grows too fast | Lower max_promotions_per_sweep |
| Very chatty agent | Increase interval_secs — 24 h is already safe |
Observability
Every sweep emits a summary log line with:
- candidates scanned
- candidates promoted
- skipped (already promoted)
- score range of the promoted set
- workspace-git commit OID (or "clean tree")
Wire it into Prometheus via log scraping if you want time-series counters — no dedicated metric is exposed yet.
Gotchas
- Turning dreaming on with the default `min_score` produces a long first sweep. If the agent has been running for weeks without dreaming, there are a lot of candidates. Expect the first sweep to promote near the cap and subsequent sweeps to tail off.
- Concept-tag backfill is O(candidates). Large backlogs will show first-sweep latency proportional to the candidate count. Not a bug — run the first sweep in a maintenance window if the backlog is large.
- `interval_secs` is measured from the last completed sweep. A failed sweep does not reset the clock — a retry will fire on the next heartbeat tick regardless.
CLI reference
Single source of truth for every agent subcommand, flag, exit code,
and env var. agent is the one binary you'll ever run in production
— this is everything it can do.
Source: src/main.rs (Mode enum + parse_args),
crates/extensions/src/cli/, crates/setup/src/.
Invocation
agent [--config <dir>] [<subcommand> ...]
- Arg parser: hand-rolled, not `clap`. `--help` / `-h` work; `-c` is not an alias for `--config` (case-sensitive exact match).
- No subcommand → run the daemon (default).
- Global flag: `--config <dir>` (default `./config`).
Global environment variables
| Variable | Values | Purpose |
|---|---|---|
RUST_LOG | tracing-subscriber filter | Log level (e.g. info,agent=debug). Default info. |
AGENT_LOG_FORMAT | pretty \| compact \| json | Log format. Default pretty. |
AGENT_ENV | production (or prod) | Triggers JSON logs unless AGENT_LOG_FORMAT overrides. |
TASKFLOW_DB_PATH | file path | Flow CLI DB (default ./data/taskflow.db). |
CONFIG_SECRETS_DIR | dir path | Whitelists an extra root for ${file:...} YAML refs. |
Exit codes (generic)
| Code | Meaning |
|---|---|
0 | Success |
1 | General failure (not found, config invalid, connection refused) |
2 | Warnings-only outcome (currently only --check-config non-strict) |
Ext subcommand has its own richer code table — see below.
Subcommand index
| Subcommand | Purpose |
|---|---|
| (default) | Run the agent daemon |
setup | Interactive credential wizard |
status | Query running agent instances |
dlq | Dead-letter queue inspection |
ext | Extension management |
flow | TaskFlow operations |
mcp-server | Run as MCP stdio server |
admin | Run the web admin UI behind a Cloudflare quick tunnel |
reload | Trigger config hot-reload on a running daemon |
--check-config | Pre-flight config validation |
--dry-run | Load config and print the plan |
Daemon (default)
agent [--config ./config]
Boots every configured agent runtime, connects to the broker (NATS or
local fallback), starts metrics (:9090), health (:8080), and admin
(:9091 loopback) servers.
Exit codes:
- `0` — clean shutdown via SIGTERM / Ctrl+C
- `1` — config load failed, broker unreachable at startup, plugin failed to initialize
Logs to: stderr. See Logging.
setup
Interactive credential wizard. Launches a prompt-driven flow for every service you want to enable — LLM keys, WhatsApp QR, Telegram bot token, Google OAuth, etc.
agent setup # full interactive wizard
agent setup list # list installable service ids
agent setup <service> # configure one service (e.g. minimax, whatsapp)
agent setup doctor # validate every credential / token (also runs the Phase 70.6 pairing-store audit)
agent setup telegram-link # print Telegram bot link-to-chat URL
Exit codes: 0 on completion; 1 on error.
See Setup wizard for the step-by-step.
status
Query the running daemon via the loopback admin console.
agent status # every agent, table
agent status ana # one agent, table
agent status --json # raw JSON
agent status --endpoint http://remote:9091 # override endpoint
Table output columns: ID | MODEL | BINDINGS | DELEGATES | DESCRIPTION
Exit codes:
- `0` — query succeeded
- `1` — endpoint unreachable or agent id not found
dlq
Dead-letter queue inspection. See DLQ operations for the full picture.
agent dlq list # plain-text table, up to 1000 entries
agent dlq replay <id> # move back to pending_events for retry
agent dlq purge # drop every entry (destructive)
Exit codes: 0 success; 1 failure (entry not found, DB error).
list columns: id | topic | failed_at | reason.
ext
Extension management. See Extensions — CLI for details and workflows.
agent ext list [--json]
agent ext info <id> [--json]
agent ext enable <id>
agent ext disable <id>
agent ext validate <path>
agent ext doctor [--runtime] [--json]
agent ext install <path> [--update] [--enable] [--dry-run] [--link] [--json]
agent ext uninstall <id> --yes [--json]
Flags:
| Flag | Where | Purpose |
|---|---|---|
--json | list / info / doctor / install / uninstall | Machine-readable output |
--runtime | doctor | Also spawn stdio extensions to verify handshake |
--update | install | Overwrite if already installed |
--enable | install | Flip to enabled: true in extensions.yaml |
--link | install | Symlink source (absolute path required) instead of copy |
--dry-run | install | Validate without writing |
--yes | uninstall | Required confirmation |
Exit codes (extension-specific):
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Extension not found / --update target missing |
| 2 | Invalid manifest / invalid source / --link needs absolute path |
| 3 | Config write failed |
| 4 | Invalid id (reserved or empty) |
| 5 | Target exists (use --update) |
| 6 | Id collision across roots |
| 7 | uninstall missing --yes confirmation |
| 8 | Copy / atomic swap failed |
| 9 | Runtime check(s) failed (doctor --runtime) |
flow
TaskFlow operations. See TaskFlow — FlowManager.
agent flow list [--json]
agent flow show <id> [--json]
agent flow cancel <id>
agent flow resume <id>
Env var: TASKFLOW_DB_PATH (default ./data/taskflow.db).
Exit codes: 0 success; 1 on error (flow not found, wrong
state, DB inaccessible).
list sorts by updated_at DESC; show includes every recorded
step; resume only works on Manual or ExternalEvent waits.
mcp-server
Run the agent as an MCP stdio server so MCP clients (Claude Desktop, Cursor, Zed) can consume its tools.
agent mcp-server
- Reads JSON-RPC from stdin, writes responses to stdout
- Does not boot a daemon or broker
- Requires `config/mcp_server.yaml` with `enabled: true`
Exit codes: 0 on clean exit; 1 if mcp_server.yaml disabled.
See MCP — Agent as MCP server for deployment recipes (Claude Desktop config, allowlist, auth token).
admin
Run the web admin UI behind a fresh Cloudflare quick tunnel. A new ephemeral trycloudflare.com URL is minted on every launch — no account, no DNS, no TLS setup.
agent admin # listen on 127.0.0.1:9099 (default)
agent admin --port 9199 # pick a different loopback port
agent admin --port=9199 # same thing, equals form
What happens on launch:
- Install cloudflared if missing. The tunnel crate detects the host OS/arch and downloads the matching cloudflared binary into the platform data dir. Subsequent launches reuse the cached copy.
- Mint a fresh random password. 24 URL-safe characters from the OS RNG. Printed once to stdout — copy it now; there is no recovery short of relaunching `agent admin`.
- Start a loopback HTTP server. Listens on `127.0.0.1:<port>` and serves the React bundle embedded at Rust compile time (see `admin-ui/`) behind HTTP Basic Auth. A bundle-missing fallback page is served if `admin-ui/dist/` was empty when `cargo build` ran.
- Open a quick tunnel. `cloudflared tunnel --url http://127.0.0.1:<port>` returns an ephemeral `https://…trycloudflare.com` URL, which the command prints to stdout alongside the username (`admin`) and the freshly-minted password.
- Wait for Ctrl+C / SIGTERM. Graceful shutdown kills the cloudflared child and stops the HTTP listener.
Exit codes:
- `0` — clean shutdown
- `1` — cloudflared install failed, port already bound, or tunnel negotiation failed
Notes:
- URL is re-generated every launch. If you need a stable URL, switch to a named Cloudflare tunnel (requires an account and wrangler config — out of scope for this command).
- Auth is HTTP Basic for now; the browser prompts for `admin` / `<password>` on first load. Username is fixed; password is fresh every launch. Keep the shell scrollback if you need to re-paste it.
- The password is never persisted — losing it means stopping `agent admin` and starting again (which also rotates the tunnel URL).
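The password minting can be pictured with this minimal sketch, assuming the rand crate and its OS-backed RNG; it is illustrative, not the actual code behind `agent admin`:

```rust
// Sketch: 24 URL-safe characters drawn from the OS RNG.
// Assumes the `rand` crate (0.8-style API); not the real implementation.
use rand::rngs::OsRng;
use rand::Rng;

const URL_SAFE: &[u8] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

fn mint_password() -> String {
    let mut rng = OsRng;
    (0..24)
        .map(|_| URL_SAFE[rng.gen_range(0..URL_SAFE.len())] as char)
        .collect()
}

fn main() {
    // Printed once; the daemon would keep only an in-memory copy.
    println!("admin password: {}", mint_password());
}
```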
reload
Triggers a config hot-reload on a running daemon. Publishes
control.reload on the broker the daemon is listening to (resolved
from broker.yaml), subscribes-before-publish to
control.reload.ack, waits up to 5 s, and prints the outcome.
agent reload # human-readable summary
agent reload --json # serialized ReloadOutcome
Example output:
$ agent reload
reload v7: applied=2 rejected=0 elapsed=18ms
✓ ana
✓ bob
Exit codes:
- `0` — at least one agent reloaded
- `1` — no ack within 5 s (daemon not running)
- `2` — every agent rejected
Full semantics — what's reloaded, apply-on-next-message, failure modes — in Config hot-reload.
--check-config
Pre-flight validation. Loads every YAML file, resolves env vars, checks schema, validates credentials. No broker, no daemon. Meant for CI.
agent --check-config # warnings-only mode
agent --check-config --strict # warnings become errors
Exit codes:
- `0` — all clear
- `1` — hard errors (missing required creds, invalid schema)
- `2` — warnings only (non-strict mode)
--dry-run
Load the config and print a plan. Doesn't connect to the broker or start any runtime task.
agent --dry-run
agent --dry-run --json
Output (plain text):
- Config directory
- Broker kind (nats | local)
- Plugin list
- Agent directory table (id, model, bindings, delegates, description)
Exit codes: 0 valid; 1 on error.
Daemon admin endpoints
Reference for status --endpoint and anyone wiring a custom
dashboard:
| Endpoint | Method | Bind | Purpose |
|---|---|---|---|
/admin/agents | GET | 127.0.0.1:9091 | List every agent (JSON) |
/admin/agents/<id> | GET | 127.0.0.1:9091 | Single agent (JSON) |
/admin/tool-policy | GET | 127.0.0.1:9091 | Tool policy queries |
/admin/credentials/reload | POST | 127.0.0.1:9091 | Phase 17 — re-read agents/plugins YAML and atomically swap the credential resolver. Returns ReloadOutcome JSON. See config/credentials.md. |
/health | GET | 0.0.0.0:8080 | Liveness probe |
/ready | GET | 0.0.0.0:8080 | Readiness probe |
/metrics | GET | 0.0.0.0:9090 | Prometheus |
/whatsapp/pair* | GET | 0.0.0.0:8080 | WhatsApp pairing QR (first instance) |
/whatsapp/<instance>/pair* | GET | 0.0.0.0:8080 | Multi-instance WhatsApp pairing |
Cross-links
Gotchas
- Hand-rolled parser. Unexpected flag ordering can produce "unknown argument" errors that are less forgiving than clap-based CLIs. Stick to the form shown in each subcommand.
- Global `--config` must come before the subcommand. `agent --config ./x ext list` works; `agent ext list --config ./x` does not.
- Admin console is loopback-only. `status --endpoint` against a remote host requires a tunnel; it won't listen publicly.
Docker
Production deployment as a compose stack: nats broker + nexo
runtime, Docker secrets for credentials, persistent volumes for SQLite
data and the disk queue.
Source: docker-compose.yml, Dockerfile, config/docker/.
Pre-built image at GHCR
Every push to main and every v* tag publishes a multi-arch image
(linux/amd64 + linux/arm64) at:
ghcr.io/lordmacu/nexo-rs:latest # latest tagged release
ghcr.io/lordmacu/nexo-rs:v0.1.1 # exact version
ghcr.io/lordmacu/nexo-rs:edge # latest main commit
ghcr.io/lordmacu/nexo-rs:main-<sha> # pinned to a specific commit
Pull and run:
docker pull ghcr.io/lordmacu/nexo-rs:latest
docker run --rm \
-v $(pwd)/config:/app/config:ro \
-v $(pwd)/data:/app/data \
-p 8080:8080 -p 9090:9090 \
ghcr.io/lordmacu/nexo-rs:latest
Build pipeline: .github/workflows/docker.yml. Tags + labels follow
OCI image spec and are
generated by docker/metadata-action. Image carries SBOM and SLSA
provenance attestations (verify with docker buildx imagetools inspect).
Compose layout
flowchart LR
subgraph STACK[docker-compose]
NATS[nats:2.10<br/>:4222 client<br/>:8222 monitoring]
AG[nexo<br/>:8080 health<br/>:9090 metrics]
end
AG --> NATS
VOL1[(./config RO)] --> AG
VOL2[(./data RW)] --> AG
VOL3[(./extensions RO)] --> AG
SEC[/run/secrets/...] --> AG
IDE[MCP clients] -.->|port 8080| AG
PROM[Prometheus] -.->|port 9090| AG
docker-compose.yml
Two services, healthchecks on both, shared volumes:
- `nats` — `nats:2.10-alpine`, exposes `:4222` for agent clients and `:8222` for monitoring (healthcheck hits `:8222/healthz`)
- `nexo` — the main runtime
  - Ports: `:8080` (health), `:9090` (metrics)
  - Environment: `RUST_LOG=info`, `AGENT_ENV=production`
  - `shm_size: 1gb` — required for Chrome processes (browser plugin)
  - Bind mounts: `./config:/app/config:ro`, `./data:/app/data:rw`, `./extensions:/app/extensions:ro`
  - `depends_on: { nats: { condition: service_healthy } }`
Dockerfile
Multi-stage:
- Builder — Rust `cargo build --release --locked`
- Runtime — `debian:bookworm-slim` with operational tools baked in:
  - `ca-certificates`, `libsqlite3-0`
  - Python + ffmpeg + tmux + yt-dlp + tesseract (for skills that need them)
  - Google Chrome on amd64 (OAuth + Widevine work); falls back to Chromium on arm64
  - `cloudflared` (downloaded per `TARGETARCH` at build time)
  - `dumb-init` as PID 1
Entry point: /usr/local/bin/nexo --config /app/config.
Exposed ports: 8080, 9090.
Config overrides — config/docker/
Mirrors the main config layout. The compose service mounts the production overrides path:
command: ["nexo", "--config", "/app/config/docker"]
Key differences in the docker overrides:
- `broker.yaml` — NATS URL points at the Docker service name (`nats://nats:4222`); persistence at `/app/data/queue/broker.db`
- `llm.yaml` — reads API keys from `/run/secrets/<name>`
- Other files (`agents.yaml`, `memory.yaml`, `extensions.yaml`) override defaults for container paths
Secrets
The compose file declares Docker secrets and the config overrides reference them:
services:
nexo:
secrets:
- minimax_api_key
- minimax_group_id
- google_client_id
- google_client_secret
secrets:
minimax_api_key:
file: ./secrets/minimax_api_key.txt
minimax_group_id:
file: ./secrets/minimax_group_id.txt
...
Config reads them via the ${file:/run/secrets/...} syntax. Secrets
appear as mode-0400 files inside the container — nothing ever touches
env vars.
Operating the stack
docker compose up -d # start
docker compose logs -f nexo # follow logs
docker compose exec nexo nexo ext list
docker compose exec nexo nexo dlq list
docker compose restart nexo # rolling reload (SIGTERM → 5 s grace)
docker compose down # stop (preserves volumes)
Scaling
- Horizontal scaling needs an external NATS cluster. Running the compose with two `agent` replicas pointed at a single NATS server works for isolated workloads, but duplicate delivery across agents on the same topic is not prevented by the compose itself — the single-instance lockfile (see Fault tolerance) assumes one agent process per data directory.
- For real scale: one NATS cluster + N agent processes, each with its own `./data/` volume.
Health checks for orchestration
services:
nexo:
healthcheck:
test: ["CMD", "curl", "-f", "http://127.0.0.1:8080/ready"]
interval: 10s
timeout: 3s
retries: 3
start_period: 30s
Readiness gate is /ready (covered in metrics + health).
start_period needs to cover first-boot extension discovery + all
agent runtimes attaching to their topics.
Gotchas
- Volume ownership. Don't mount `./data` as root-owned if your container runs as non-root. The runtime will fail to write the SQLite files and you'll only see cryptic `readonly database` errors.
- Chrome needs `/dev/shm` space. The `shm_size: 1gb` is not optional when the browser plugin is active — Chrome processes silently corrupt their state if starved.
- `config/docker/` is committed, secrets are not. `./secrets/` is gitignored. Populate it before the first `compose up`.
Metrics & health
Prometheus metrics on :9090/metrics, health/readiness on :8080,
admin console on 127.0.0.1:9091. Everything an operator or
orchestrator needs to decide "is the agent healthy?" without reading
logs.
Source: crates/core/src/telemetry.rs, src/main.rs.
Ports at a glance
| Port | Binding | Purpose |
|---|---|---|
:9090 | 0.0.0.0 | Prometheus /metrics scrape |
:8080 | 0.0.0.0 | Health /health, readiness /ready, WhatsApp pairing pages |
:9091 | 127.0.0.1 | Admin console (loopback only) |
Ports are not configurable yet — if you need to remap, port-forward outside the agent (Docker, k8s service).
/metrics (Prometheus)
Exposed metrics:
| Name | Type | Labels | What |
|---|---|---|---|
llm_requests_total | counter | agent, provider, model | Every LLM completion request |
llm_latency_ms | histogram | agent, provider, model | Buckets 50, 100, 250, 500, 1000, 2500, 5000, 10000 ms |
messages_processed_total | counter | agent | Inbound messages that reached an agent |
nexo_extensions_discovered | counter | status={ok,disabled,invalid} | Emitted on every discovery sweep |
nexo_tool_calls_total | counter | agent, outcome={ok,error,blocked,unknown}, tool | Tool invocations |
nexo_tool_cache_events_total | counter | agent, event={hit,miss,put,evict}, tool | Tool-level memoization |
nexo_tool_latency_ms | histogram | agent, tool | Per-tool latency |
circuit_breaker_state | gauge | breaker | 0 = Closed, 1 = Open; always includes nats |
credentials_accounts_total | gauge | channel | Per-channel labelled instance count (Phase 17) |
credentials_bindings_total | gauge | agent, channel | 1 when the agent has a credential bound, 0 otherwise |
channel_account_usage_total | counter | agent, channel, direction={inbound,outbound}, instance | Every credential use |
channel_acl_denied_total | counter | agent, channel, instance | Outbound calls rejected by allow_agents |
credentials_resolve_errors_total | counter | channel, reason | Resolver failures (unbound, not_found, not_permitted) |
credentials_breaker_state | gauge | channel, instance | 0=closed, 1=half-open, 2=open. Per-(channel, instance) circuit breaker — a 429 from one number cannot trip the breaker for a sibling account. |
credentials_boot_validation_errors_total | counter | kind | Gauntlet errors by kind at boot |
credentials_insecure_paths_total | gauge | — | Credential files with lax permissions at boot |
credentials_google_token_refresh_total | counter | account_fp, outcome={ok,err} | Google OAuth refresh attempts (fp = sha256[..8], not raw email) |
pairing_inbound_challenged_total | counter | channel, result={delivered_via_adapter,delivered_via_broker,publish_failed,no_adapter_no_broker_topic} | DM-challenge dispatch attempts (Phase 26.x) |
pairing_approvals_total | counter | channel, result={ok,expired,not_found} | nexo pair approve outcomes (Phase 26.y) |
pairing_codes_expired_total | counter | — | Setup codes pruned past TTL or rejected as expired on approve |
pairing_bootstrap_tokens_issued_total | counter | profile | Bootstrap tokens minted by BootstrapTokenIssuer::issue |
pairing_requests_pending | gauge | channel | Pending pairing requests (push-tracked; PairingStore::refresh_pending_gauge exposed for drift recovery after a daemon restart) |
Circuit-breaker state for the nats breaker is sampled at scrape
time from broker readiness, so a stalled publish path shows up in
the next scrape without needing an eager push.
The credentials_* and channel_* series are documented with full
schema examples in config/credentials.md.
account_fp is always an 8-byte sha256 fingerprint of the account id,
never the raw JID or email, so scraped metrics stay safe to share.
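As a sketch, that fingerprint label can be derived like this (assuming the sha2 and hex crates; the real helper in the codebase may differ):

```rust
// Sketch: label an account as the first 8 bytes of sha256(account_id), hex-encoded.
// Assumes the `sha2` and `hex` crates; illustrative only.
use sha2::{Digest, Sha256};

fn account_fp(account_id: &str) -> String {
    let digest = Sha256::digest(account_id.as_bytes());
    hex::encode(&digest[..8]) // 16 hex chars; never the raw JID or email
}

fn main() {
    println!("{}", account_fp("someone@example.com"));
}
```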
Useful alerts
LLM provider flapping
- alert: LlmError5xxHigh
expr: sum(rate(llm_requests_total{outcome="error"}[5m])) by (provider) > 0.1
for: 5m
NATS circuit open
- alert: NatsBreakerOpen
expr: circuit_breaker_state{breaker="nats"} == 1
for: 1m
Tool call failures
- alert: ToolErrorSpike
expr: |
sum(rate(nexo_tool_calls_total{outcome="error"}[5m])) by (tool) > 0.5
for: 10m
Health endpoints
flowchart LR
GET1[GET /health] --> OK[200 OK<br/>always<br/>{status:ok}]
GET2[GET /ready] --> CHK{broker ready<br/>AND agents > 0?}
CHK -->|yes| RDY[200 OK<br/>{status:ready,<br/>agents_running:N}]
CHK -->|no| NOT[503 Service Unavailable<br/>{status:not_ready,<br/>broker_ready,<br/>agents_running}]
- `GET /health` — liveness probe. Returns 200 as long as the process is accepting connections. Don't use this as a traffic gate.
- `GET /ready` — readiness probe. Returns 200 only when the broker is ready and at least one agent runtime is attached to inbound topics. Returns 503 during boot, shutdown, or broker outage.
- `GET /whatsapp/*` — QR pairing pages and the `/whatsapp/pair` tunnel endpoint; see WhatsApp plugin.
Kubernetes probes
livenessProbe:
httpGet: { path: /health, port: 8080 }
initialDelaySeconds: 10
periodSeconds: 10
readinessProbe:
httpGet: { path: /ready, port: 8080 }
initialDelaySeconds: 30
periodSeconds: 5
initialDelaySeconds: 30 for readiness covers extension discovery
and every agent runtime attaching its subscriptions.
Admin console (:9091)
Loopback-only. Exposes:
| Path | Purpose |
|---|---|
/admin/agents | Agent directory with live status, session counts |
/admin/tool-policy | Query the tool-policy registry |
The agent status [--endpoint URL] [--agent-id ID] [--json] CLI
subcommand hits this endpoint and prints a table or JSON; good for
scripting ops without grepping logs.
Remote access requires an explicit tunnel — the port is never exposed publicly by default.
Scrape config sample
# prometheus.yml
scrape_configs:
- job_name: nexo-rs
scrape_interval: 15s
static_configs:
- targets: ['agent:9090']
For Docker compose, point the target at the runtime service name from your compose file (the bundled docker-compose.yml names it nexo). For k8s: use the service DNS.
Gotchas
- `circuit_breaker_state` only labels per breaker, not per provider. Multiple LLM providers each have their own breaker instance, but they surface as distinct `breaker` label values. If you expected `{provider="anthropic"}` you'll need a label rename in your Prometheus relabel config.
- Histograms are non-configurable. Buckets are compiled in. If your SLO requires fine-grained buckets below 50 ms, it is worth opening an issue.
- A `/ready` 503 during shutdown is expected. Don't alert on a few seconds of 503 bursts — alert only when they persist (> 30 s).
Logging
tracing under the hood. Human-readable in dev, JSON in production,
always to stderr (stdout is reserved for wire protocols like MCP
JSON-RPC).
Source: src/main.rs::init_tracing.
Quick reference
| Env var | Default | Meaning |
|---|---|---|
RUST_LOG | info | EnvFilter syntax (nexo_core=debug,async_nats=warn,*=info) |
AGENT_LOG_FORMAT | pretty (json in AGENT_ENV=production) | pretty \| compact \| json |
AGENT_ENV | unset | Set to production to default to JSON logs |
Levels
Pick the lowest verbosity that still surfaces the signal you care about:
| Level | Use |
|---|---|
error | Unrecoverable — operator action needed |
warn | Degraded but running (circuit open, retry budget burning) |
info | Lifecycle (startup, shutdown, reconnects) |
debug | Per-turn detail (tool invoked, session created) |
trace | Per-event firehose — only when chasing a bug |
Log formats
pretty (dev default)
Coloured, multi-line. Good at the terminal, bad in log pipelines.
2026-04-24T17:22:13Z INFO agent::runtime: agent runtime ready
at src/main.rs:1243
in agent_boot with agent="ana"
compact
One line per event. Middle ground.
2026-04-24T17:22:13Z INFO agent="ana" agent runtime ready
json
Structured. One JSON object per line. Default when AGENT_ENV=production.
{"ts_unix_ms":1714000000000,"level":"INFO","target":"agent::runtime","thread_id":"ThreadId(3)","file":"src/main.rs","line":1243,"spans":[{"name":"agent_boot","agent":"ana"}],"message":"agent runtime ready"}
Every entry carries:
- `ts_unix_ms` — milliseconds since epoch (stable for ingestion)
- `level`, `target`
- `thread_id`, `file`, `line` — for pinpointing
- `spans` — span hierarchy with attached fields
- Any structured fields passed via `tracing::info!(agent = %id, ...)`
Correlating across agents
Cross-agent work lands on agent.route.<target_id> with a
correlation_id. In logs, the correlation id shows up as a field on
every event that happened inside a delegation span.
flowchart LR
A[agent A<br/>info: tool_call agent.route.ops] --> MSG[NATS message<br/>correlation_id=req-123]
MSG --> B[agent B<br/>info: handling agent.route with correlation_id=req-123]
B --> REPLY[reply on agent.route.A<br/>correlation_id=req-123]
REPLY --> A2[agent A<br/>info: delegation returned correlation_id=req-123]
Grep logs by correlation_id to see the whole fan-out+in as a single
thread.
Structured-field conventions
Convention for fields that show up across the codebase:
| Field | Where |
|---|---|
agent | Any log tied to a specific agent runtime |
session | Any log inside a session context (usually UUID) |
extension (or ext) | Any log from extension runtimes |
tool | Any tool invocation log |
provider, model | LLM client logs |
correlation_id | Delegation-related logs |
topic | Broker publish/subscribe logs |
When adding new code, reuse these names — log pipelines can count on them.
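For instance, a tool-invocation log that follows the convention could look like this (illustrative field values, not a line lifted from the codebase):

```rust
// Illustrative only: reuse the conventional field names so log pipelines can rely on them.
use tracing::info;

fn log_tool_call(agent_id: &str, session_id: &str, tool_name: &str, correlation_id: &str) {
    info!(
        agent = %agent_id,
        session = %session_id,
        tool = %tool_name,
        correlation_id = %correlation_id,
        "tool invocation started"
    );
}
```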
Where stdout goes
stdout is reserved for:
- MCP server mode (`agent mcp-server`) — JSON-RPC traffic
- CLI subcommands that return data (`agent ext list --json`, `agent flow show --json`, `agent dlq list`)
Everything else, including normal log output, goes to stderr.
Don't pipe agent … 2>&1 | jq unless you know the subcommand never
writes non-JSON to stdout.
Practical setups
Local dev
export RUST_LOG=agent=debug,nexo_core=debug,info
cargo run --bin agent -- --config ./config
Production (Docker)
services:
agent:
environment:
AGENT_ENV: production
RUST_LOG: info,async_nats=warn
Everything lands on stderr → container runtime picks it up → your log pipeline ingests JSON directly.
Chasing a specific agent
export RUST_LOG=agent=info
# then grep by field
docker compose logs agent | jq 'select(.spans[].agent == "ana")'
Gotchas
- `tracing` is compile-time filtered. If you grep logs for a debug-level event and see nothing, verify `RUST_LOG` covers the module.
- JSON mode drops ANSI colors. Rightly so — but don't pipe it through a TTY colorizer and then be confused by escape sequences.
- `stderr` ordering isn't guaranteed against `stdout`. Never assume a log line printed right after a `println!` happens in log order — pipes buffer independently.
Dead-letter queue operations
The DLQ is where events end up when they exhaust their retry budget or fail to deserialize at all. The runtime never silently drops an event — if it can't be delivered, it lands here for an operator to inspect or replay.
Source: crates/broker/src/disk_queue.rs, src/main.rs
(agent dlq ... subcommands).
When items land there
flowchart LR
PUB[publish event] --> NATS{NATS up?}
NATS -->|yes| OK[delivered]
NATS -->|no| DQ[pending_events]
DQ --> DRAIN[disk queue drain]
DRAIN -->|attempts < 3| DQ
DRAIN -->|attempts >= 3| DLQ[dead_letters]
DQ -.->|deserialization error| DLQ
- 3 attempts (`DEFAULT_MAX_ATTEMPTS`) without success → the row moves to `dead_letters`
- Unparseable payload → moves immediately (a poison pill is not worth retrying)
- Circuit-breaker-open on publish counts as an attempt — if the breaker stays open, the queue will eventually flush into the DLQ
See Fault tolerance for the full retry flow.
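The retry-budget decision can be sketched like this (illustrative names; the real logic lives in crates/broker/src/disk_queue.rs):

```rust
// Sketch of the drain decision. Illustrative, not the real disk_queue.rs types.
const DEFAULT_MAX_ATTEMPTS: u32 = 3;

enum DrainOutcome {
    Delivered,            // publish succeeded, row removed from pending_events
    Requeued(u32),        // publish failed, attempts incremented, row stays pending
    DeadLettered(String), // attempts exhausted or poison payload, row moved to dead_letters
}

fn drain_one(attempts: u32, parse_ok: bool, publish_ok: bool, reason: &str) -> DrainOutcome {
    if !parse_ok {
        // Poison pill: retrying cannot help, move it immediately.
        return DrainOutcome::DeadLettered(format!("deserialization error: {reason}"));
    }
    if publish_ok {
        return DrainOutcome::Delivered;
    }
    let attempts = attempts + 1; // a breaker-open publish counts as an attempt too
    if attempts >= DEFAULT_MAX_ATTEMPTS {
        DrainOutcome::DeadLettered(reason.to_string())
    } else {
        DrainOutcome::Requeued(attempts)
    }
}
```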
The DeadLetter row
struct DeadLetter {
    id: String,        // UUID
    topic: String,     // NATS subject
    payload: String,   // JSON event body
    failed_at: i64,    // unix timestamp (ms)
    reason: String,    // error text
}
Storage: SQLite table dead_letters in the broker DB (typically
./data/queue/broker.db).
CLI
agent dlq list # list up to 1000 entries
agent dlq replay <id> # move one entry back to pending_events
agent dlq purge # delete every entry
list output
Columns: id | topic | failed_at | reason. Plain text, one entry per
line, suitable for grep / awk piping.
2f9c2e4a-... plugin.inbound.whatsapp 2026-04-24T17:22:13Z circuit breaker open
b1a3a9f5-... plugin.outbound.telegram 2026-04-24T17:23:01Z deserialization error: unexpected field `...`
replay
Moves the row back to pending_events with attempts = 0:
$ agent dlq replay 2f9c2e4a-...
replayed 2f9c2e4a-... → pending_events (next daemon drain will retry it)
The retry happens on the next drain() cycle of the running agent —
replay itself does not attempt delivery. That way a running agent
in a different shell picks it up; a stopped agent leaves the event
safely in pending_events for its next startup.
purge
Destructive. Drops every row in dead_letters:
$ agent dlq purge
purged 42 dead-letter entries
Use with care — there is no per-topic filter. If you need a scoped
purge, inspect with list, selectively replay what you want to
keep, then purge the rest.
Exit codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Failure (event not found for replay, DB access error, etc.) |
Common workflows
Post-outage triage
# See what piled up during the NATS outage
agent dlq list | wc -l
# Spot-check
agent dlq list | head
agent dlq list | awk '{print $2}' | sort | uniq -c
# If reasons look transient (circuit open, timeouts):
agent dlq list | awk '{print $1}' | while read id; do
agent dlq replay "$id"
done
Poison-pill cleanup
If reason mentions deserialization errors, the payload is malformed
— no amount of retry will help. Collect the offenders, fix the
producer side, then:
agent dlq list | grep deserialization | awk '{print $1}' > /tmp/poison.txt
# ... verify they're truly poison ...
agent dlq purge
Preview without modifying
The CLI has no --dry-run flag today. Use agent dlq list to preview
first; the DB rows are stable until you explicitly replay or
purge.
Monitoring
There is no dedicated DLQ metric yet. Approximations:
- A spike in time spent at `circuit_breaker_state{breaker="nats"} == 1` strongly predicts DLQ growth — alert on it.
- Consider wrapping `agent dlq list | wc -l` in a cron job that pushes the count to Prometheus via the textfile collector if you want a direct gauge.
Gotchas
- `replay` doesn't wake a stopped agent. If no agent is running against the same data directory, the row just moves back to `pending_events` and waits for the next startup drain.
- No replay deduplication. Replaying an event that was later delivered successfully will deliver it again. If your consumer isn't idempotent, spot-check downstream state before replaying.
- `purge` is global. Scope it with `list` plus selective `replay` if you need to preserve a subset.
Config hot-reload
Operators rotate per-agent knobs (allowlists, model strings, prompts,
rate limits, delegation gates) without restarting the daemon. Sessions
currently handling a message finish their turn on the old snapshot;
the next event picks up the new one (apply-on-next-message). Plugin
configs (whatsapp.yaml, telegram.yaml, …) are not hot-reloadable
yet — see limitations.
What triggers a reload
| Trigger | Source |
|---|---|
File save under config/ | notify-based watcher, debounced 500 ms |
agent reload CLI | Publishes control.reload on the broker |
| Direct broker publish | Any integration can emit control.reload |
What's reloaded
Files watched by default (paths relative to the config dir):
- `agents.yaml`
- `agents.d/` (recursive)
- `llm.yaml`
- `runtime.yaml`
Extra paths listed under runtime.reload.extra_watch_paths are
appended to the list.
The fields that apply live without a restart:
| Field | Location | Effect |
|---|---|---|
allowed_tools (agent + binding) | agents.d/*.yaml | Tool list visible to the LLM + per-call guard |
outbound_allowlist | same | Defense-in-depth in whatsapp_send_* / telegram_send_* |
skills | same | Skill blocks rendered into the system prompt |
model.model (binding-level) | same | LLM model string on next turn |
system_prompt + system_prompt_extra | same | System block composition |
sender_rate_limit | same | Per-binding token bucket |
allowed_delegates | same | Delegation ACL |
providers.<name>.api_key | llm.yaml | Rotated via a fresh LlmClient on next turn |
Fields that require a restart (logged as warn during reload):
- `id`, `plugins`, `workspace`, `skills_dir`, `transcripts_dir`
- `heartbeat.enabled`, `heartbeat.interval`
- `config.debounce_ms`, `config.queue_cap`
- `model.provider` (binding-level provider must match the agent provider — the `LlmClient` is wired once per agent)
- `broker.yaml`, `memory.yaml`, `mcp.yaml`, `extensions.yaml`
Adding or removing an agent also requires a restart in this release; see limitations.
Configuration
config/runtime.yaml is optional. Defaults:
reload:
enabled: true # master switch
debounce_ms: 500 # notify-debouncer-full window
extra_watch_paths: [] # appended to the built-in list
Set enabled: false to turn off the file watcher + the
control.reload subscriber. The CLI agent reload still works — the
daemon never opens a privileged socket, it just listens on the shared
broker.
The reload pipeline
file save / CLI / broker
│
▼
debouncer (500 ms)
│
▼
AppConfig::load (YAML + env resolution)
│
▼
validate_agents_with_providers ──fail──▶ log warn, bump
│ config_reload_rejected_total,
▼ keep old snapshot
RuntimeSnapshot::build (per agent)
│
▼
ArcSwap::store (atomic per agent)
│
▼
events.runtime.config.reloaded
Validation failure never swaps. The daemon always serves a snapshot that passed its boot gauntlet.
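A minimal sketch of the swap-and-read pattern with the arc-swap crate (`RuntimeSnapshot` and `AgentHandle` are stand-ins here, not the actual coordinator types):

```rust
// Sketch of the atomic snapshot swap. Assumes the `arc-swap` crate.
use std::sync::Arc;
use arc_swap::ArcSwap;

struct RuntimeSnapshot { version: u64 /* allowed_tools, prompts, ... */ }

struct AgentHandle { snapshot: ArcSwap<RuntimeSnapshot> }

impl AgentHandle {
    // Reload path: only called after the new snapshot passed validation.
    fn apply(&self, next: RuntimeSnapshot) {
        self.snapshot.store(Arc::new(next));
    }

    // Intake hot path: each new event picks up whatever is current.
    fn on_event(&self) -> Arc<RuntimeSnapshot> {
        self.snapshot.load_full() // the turn keeps this Arc for its whole lifetime
    }
}
```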
CLI
# Human-readable output
$ agent reload
reload v7: applied=2 rejected=0 elapsed=18ms
✓ ana
✓ bob
# Machine-readable
$ agent reload --json
{
"version": 7,
"applied": ["ana", "bob"],
"rejected": [],
"elapsed_ms": 18
}
Exit codes:
- `0` — at least one agent reloaded.
- `1` — no `control.reload.ack` within 5 s (daemon not running).
- `2` — every agent rejected (partial-fail signal for CI).
Broker contract
| Topic | Direction | Payload |
|---|---|---|
control.reload | → daemon | {requested_by: string} |
control.reload.ack | ← daemon | serialized ReloadOutcome |
ReloadOutcome JSON shape:
{
"version": 7,
"applied": ["ana", "bob"],
"rejected": [
{"agent_id": "ana", "reason": "snapshot build: ..."}
],
"elapsed_ms": 18
}
Telemetry
| Metric | Type | Labels |
|---|---|---|
config_reload_applied_total | counter | — |
config_reload_rejected_total | counter | — |
config_reload_latency_ms | histogram | — |
runtime_config_version | gauge | agent_id |
Scrape via the metrics endpoint (ops/metrics).
Apply-on-next-message semantics
A reload does not interrupt sessions that are currently handling a message. Specifically:
- The LLM turn in flight keeps its captured `Arc<RuntimeSnapshot>` for the life of the turn — tool calls inside that turn all see the same policy, even if several reloads land during the turn.
- The next event delivered to the agent reads the latest snapshot via `snapshot.load()` on the intake hot path.
If you need a "force-apply now" semantic (terminate in-flight sessions,
respawn), use agent reload --kick-sessions — not implemented yet,
tracked in Phase 19.
Security model
- `control.reload` topic has no application-level auth. Anyone with broker publish rights can trigger a reload. In production with NATS, restrict the `control.>` subject pattern via NATS account permissions; see NATS with TLS + auth. The local-broker fallback is in-process only — no remote attack surface.
- File-watcher trust = filesystem write. Whoever can edit `config/agents.d/*.yaml` can change the capability surface. Treat the config dir as a privileged resource: 0600 on YAML files, 0700 on the directory.
- The `events.runtime.config.reloaded` payload includes agent ids and rejection reasons. Subscribers see them. Single-process deployments are fine; in multi-tenant setups, gate the `events.runtime.>` pattern in NATS auth.
- Outbound allowlist scope. The Phase 16 outbound allowlist governs WhatsApp + Telegram tools only. Google tools are gated by the OAuth scopes granted at credential creation (see Per-agent credentials) — there is no per-recipient list for Google.
- Apply-on-next-message and tightening reloads. A reload that narrows an allowlist for security reasons does not affect in-flight sessions until they next receive an event. If you need the change to take effect immediately, restart the daemon (or wait for the upcoming `agent reload --kick-sessions` flag in Phase 19).
Failure modes
- Bad YAML: `AppConfig::load` fails. The old snapshot keeps serving. `config_reload_rejected_total` bumps. The warn log names the file + line.
- Validation errors: aggregate — every problem across every agent shows in one warn block. Fix them in one edit instead of restart-and-repeat.
- Unknown provider: rejected at boot + at reload by the `KnownProviders` check. Boot validation lists what's registered.
- Missing tool in a binding's `allowed_tools`: caught by the post-registry validation pass during reload.
- Agent added / removed: Phase 18 rejects these with a clear message; restart the daemon to reshape the fleet.
Limitations
Intentional scope gaps for Phase 18, tracked for Phase 19:
- Add / remove agent at runtime. The coordinator rejects new ids and left-over registered handles with an actionable message. Restart needed.
- Plugin config hot-reload (`whatsapp.yaml`, `telegram.yaml`, `browser.yaml`, `email.yaml`). Plugin daemons own I/O (QR pairing, long-polling). Reshaping them live requires a dedicated lifecycle refactor.
- A `config_reloaded` hook for extensions to react. Pending.
- SIGHUP trigger as an extra UX path. Deferred — use the broker topic or the CLI.
See also
- Layout — where these files live
- agents.yaml — the per-agent surface
- llm.yaml — provider credentials
- Metrics (Prometheus)
Capability toggles
Several bundled extensions ship with dangerous capabilities off by default — write paths, secret reveal, cache purges. Each capability is gated by a single environment variable. The operator flips it on by exporting the var in the agent process's environment.
agent doctor capabilities enumerates every known toggle, its
current state, and a hint for enabling it.
$ agent doctor capabilities
Capability toggles
──────────────────────────────────────────────────────────────────
EXT ENV VAR STATE RISK EFFECT
onepassword OP_ALLOW_REVEAL disabled HIGH Reveal raw secret values…
onepassword OP_INJECT_COMMAND_ALLOWLIST disabled HIGH Allow `inject_template` to pipe…
cloudflare CLOUDFLARE_ALLOW_WRITES disabled HIGH Create / update / delete DNS…
cloudflare CLOUDFLARE_ALLOW_PURGE disabled CRITICAL Purge zone cache…
docker-api DOCKER_API_ALLOW_WRITE disabled HIGH Start / stop / restart…
proxmox PROXMOX_ALLOW_WRITE disabled CRITICAL VM / container lifecycle…
ssh-exec SSH_EXEC_ALLOWED_HOSTS disabled HIGH Allow `ssh_run` against…
ssh-exec SSH_EXEC_ALLOW_WRITES disabled CRITICAL Allow `scp_upload`…
Pass --json for machine-readable output (admin UI, dashboards):
agent doctor capabilities --json
Toggle reference
| Env var | Extension | Kind | Risk | Effect |
|---|---|---|---|---|
OP_ALLOW_REVEAL | onepassword | bool | high | Returns secret values verbatim instead of fingerprints |
OP_INJECT_COMMAND_ALLOWLIST | onepassword | allowlist | high | Enables inject_template exec mode for the listed commands |
CLOUDFLARE_ALLOW_WRITES | cloudflare | bool | high | Authorizes create_dns_record, update_dns_record, delete_dns_record |
CLOUDFLARE_ALLOW_PURGE | cloudflare | bool | critical | Authorizes purge_cache |
DOCKER_API_ALLOW_WRITE | docker-api | bool | high | Authorizes start_container, stop_container, restart_container |
PROXMOX_ALLOW_WRITE | proxmox | bool | critical | Authorizes VM/container lifecycle actions |
SSH_EXEC_ALLOWED_HOSTS | ssh-exec | allowlist | high | Hosts the agent may target with ssh_run |
SSH_EXEC_ALLOW_WRITES | ssh-exec | bool | critical | Authorizes scp_upload |
Boolean kinds accept true, 1, or yes (case-insensitive).
Anything else — including unset — counts as disabled.
Allowlist kinds are comma-separated. Empty / whitespace-only inputs count as disabled. The agent never falls back to "anything goes" when the variable is unset.
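Under those rules, evaluating a toggle amounts to something like this sketch (illustrative; the real inventory and parsing live in crates/setup/src/capabilities.rs):

```rust
// Sketch: how the documented parsing rules map to code. Illustrative only.
fn bool_toggle(var: &str) -> bool {
    match std::env::var(var) {
        Ok(v) => matches!(v.trim().to_ascii_lowercase().as_str(), "true" | "1" | "yes"),
        Err(_) => false, // unset counts as disabled
    }
}

fn allowlist_toggle(var: &str) -> Vec<String> {
    std::env::var(var)
        .unwrap_or_default()
        .split(',')
        .map(|s| s.trim().to_string())
        .filter(|s| !s.is_empty()) // empty / whitespace-only counts as disabled
        .collect()
}

fn main() {
    let writes_allowed = bool_toggle("CLOUDFLARE_ALLOW_WRITES");
    let ssh_hosts = allowlist_toggle("SSH_EXEC_ALLOWED_HOSTS");
    println!("writes={writes_allowed} hosts={ssh_hosts:?}");
}
```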
When to enable
The default is off because every toggle moves the agent from "informational" to "consequential" — failures are no longer just a bad reply, they can mutate real systems or leak secrets.
Enable a toggle only when:
- The agent will provably need that capability for the next session.
- The operator (you) is present and the session is observed.
- There is a way to revert quickly — a wrapper script, a per-shell
.envrc, or a systemd unit drop-in you can comment out.
Avoid enabling toggles globally in ~/.profile. Scope them to the
specific shell or systemd unit that runs the agent.
How to revoke
- Boolean: `unset CLOUDFLARE_ALLOW_WRITES` (or restart the shell / service).
- Allowlist: `unset OP_INJECT_COMMAND_ALLOWLIST` to disable, or `export OP_INJECT_COMMAND_ALLOWLIST=` (empty string) to keep the intent visible while still treating the feature as disabled.
The agent reads these at call time (no caching), so revocation takes effect immediately, without a restart. That includes `OP_INJECT_COMMAND_ALLOWLIST`: it is read at tool-call time, not extension-spawn time, so it also picks up changes live.
Adding a new toggle
When a future extension introduces a new write/reveal env var, add a
matching CapabilityToggle to
crates/setup/src/capabilities.rs::INVENTORY. Without that entry,
agent doctor capabilities is silently incomplete — the inventory
is the operator-facing source of truth.
Context optimization
Four independent mechanisms reduce the number of tokens sent to the LLM
on every request, without changing the agent's behavior. They live
under llm.context_optimization in llm.yaml and can be flipped per
agent under agents.<id>.context_optimization.
# config/llm.yaml
context_optimization:
prompt_cache:
enabled: true # default
long_ttl_providers: [anthropic, vertex]
compaction:
enabled: false # default off — opt in per agent
compact_at_pct: 0.75
tail_keep_tokens: 20000
tool_result_max_pct: 0.30
summarizer_model: "" # empty = reuse the agent's main model
lock_ttl_seconds: 300
token_counter:
enabled: true # default
backend: auto # auto | anthropic_api | tiktoken
cache_capacity: 1024
workspace_cache:
enabled: true # default
watch_debounce_ms: 500
max_age_seconds: 0 # 0 = never force refresh (notify is authoritative)
1. Prompt caching
Materializes the system prompt as a list of cache_control blocks on
the Anthropic wire so the stable prefix (workspace + skills + tool
catalog + binding glue) is billed at 0.1× input cost on every cache
hit. OpenAI / DeepSeek paths surface their automatic
prompt_tokens_details.cached_tokens field through the same
CacheUsage struct. Gemini and MiniMax flatten the blocks into the
legacy system slot today (warned once per process).
Block layout (4 cache breakpoints, the Anthropic max):
- `workspace` — IDENTITY / SOUL / USER / AGENTS / MEMORY (Ephemeral1h)
- `skills` — per-binding skill catalog (Ephemeral1h)
- `binding_glue` — peer directory + per-binding system prompt + language directive (Ephemeral1h)
- `channel_meta` — sender id + per-turn context (Ephemeral5m)
Tools array is sorted alphabetically by name (the registry iterates a
non-deterministic DashMap) and the last tool gets a 1h
cache_control marker when cache_tools=true.
What to watch
- `llm_cache_read_tokens_total{agent, provider, model}` — should dominate `llm_cache_creation_tokens_total` after the first turn of a warm session.
- `llm_cache_hit_ratio{agent}` — target >0.7 on multi-turn agents; <0.3 means you're paying the write premium without the discount.
When to flip off
- Provider rejects the request with a 400 mentioning `cache_control` (very old model). Mitigation: the framework already strips markers for `claude-2.x`; if Anthropic adds another exception, override `ANTHROPIC_CACHE_BETA="..."` to disable the beta header.
- A custom-built LLM gateway in front of Anthropic doesn't pass the `cache_control` field through.
2. Compaction (online history folding)
When the pre-flight token estimate crosses compact_at_pct * effective_window, the agent runs a secondary LLM call to fold
history[..tail_start] into a single summary string. The summary
replaces the head; the last tail_keep_tokens worth of turns ride
forward verbatim. Subsequent turns prepend the summary as a synthetic
user/assistant pair so Anthropic's role-alternation rule stays valid.
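The trigger and split decision can be pictured with this sketch (assumed names and a simplified split heuristic, not the production compactor):

```rust
// Sketch of the compaction trigger + split point. Names are illustrative.
struct CompactionCfg { compact_at_pct: f64, tail_keep_tokens: usize }

/// Returns Some(split_index): everything before it gets folded into a summary,
/// everything from it onward rides forward verbatim.
fn plan_compaction(turn_tokens: &[usize], effective_window: usize, cfg: &CompactionCfg) -> Option<usize> {
    let estimate: usize = turn_tokens.iter().sum();
    if (estimate as f64) < cfg.compact_at_pct * effective_window as f64 {
        return None; // under the threshold, send history unchanged
    }
    // Walk backwards until the kept tail reaches tail_keep_tokens.
    let mut kept = 0usize;
    for (i, t) in turn_tokens.iter().enumerate().rev() {
        kept += t;
        if kept >= cfg.tail_keep_tokens {
            return Some(i); // history[..i] -> summary, history[i..] kept verbatim
        }
    }
    None // the whole history fits inside the tail budget; nothing to fold
}
```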
Defaults are intentionally conservative: off by default. Roll out
per agent via agents.<id>.context_optimization.compaction: true.
agents:
- id: ana
context_optimization:
compaction: true # ana opts in early, others stay off
What to watch
- `llm_compaction_triggered_total{agent, outcome}` — outcomes are `ok`, `failed`, `lock_held`, `no_boundary`, `tool_result_truncated`.
- `llm_compaction_duration_seconds{agent, outcome="ok"|"failed"}` — a rising p99 means the summarizer model is overloaded; lower `compact_at_pct` so triggers are smaller (cheaper) and more frequent.
When to flip off
- Quality regression in long sessions — the summary may be losing active-task state. Inspect `compactions_v1` rows in the SQLite store to see what was folded; bump `tail_keep_tokens` so more verbatim context survives.
- Lock contention spikes — multiple processes (NATS multi-node) racing on the same session. The lock is per-session, so this only happens with sticky-session misrouting; fix it at the broker level rather than disabling compaction.
Safety nets
- `compaction_locks_v1` carries a TTL (`lock_ttl_seconds`) — a crashed compactor doesn't deadlock the session; the next acquire after the TTL wins automatically.
- Audit log: every successful compaction inserts a row in `compactions_v1` with the summary text + token cost. Inspect with `sqlite3 memory.db "SELECT * FROM compactions_v1 WHERE session_id = ? ORDER BY compacted_at DESC"`.
- Failure path: 3 retries with backoff; on total failure the original history goes to the LLM unchanged (graceful degradation, never silent data loss).
3. Token counting (pre-flight sizing)
TokenCounter trait with two backends:
- AnthropicTokenCounter — calls `POST /v1/messages/count_tokens`. Exact (matches billing). LRU-cached on `blake3(payload)`: the stable tools+identity prefix hashes the same on every turn, so the network round-trip happens ~once per process lifetime.
- TiktokenCounter — offline `cl100k_base` approximation. Drift vs Anthropic billing measured at 5–15%. Fine for budget gating, not for hard limits.
The cascade wraps the primary in a CircuitBreaker
(failure_threshold=3, 30s→300s backoff): on count_tokens outage the
agent loop falls back to tiktoken so the request still goes through.
Once the breaker has opened at least once, is_exact() flips to false
for the rest of the process so dashboards don't conflate sample
populations.
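The fallback shape of that cascade, sketched with stand-in trait and struct names (not the actual implementation):

```rust
// Sketch of the exact-then-approximate token-counting cascade. Illustrative only.
trait TokenCounter {
    fn count(&self, payload: &str) -> Result<usize, String>;
    fn is_exact(&self) -> bool;
}

struct Cascade<P: TokenCounter, F: TokenCounter> {
    primary: P,         // e.g. the count_tokens API backend
    fallback: F,        // e.g. a tiktoken approximation
    breaker_open: bool, // stands in for the real CircuitBreaker state
    degraded: bool,     // sticky: once true, is_exact() stays false
}

impl<P: TokenCounter, F: TokenCounter> Cascade<P, F> {
    fn count(&mut self, payload: &str) -> usize {
        if !self.breaker_open {
            if let Ok(n) = self.primary.count(payload) {
                return n;
            }
        }
        self.degraded = true; // record that estimates are now approximate
        self.fallback.count(payload).unwrap_or(payload.len() / 4)
    }

    fn is_exact(&self) -> bool {
        !self.degraded && self.primary.is_exact()
    }
}
```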
What to watch
- `llm_prompt_tokens_estimated{agent, provider, model}` — compare against `llm_prompt_tokens_drift{...}` (histogram in percent).
- A drift p99 climbing past 20% means the active backend is wrong for your model — switch from `tiktoken` to `anthropic_api` (or vice versa for non-Anthropic providers).
When to flip off
- The agent runs against a self-hosted gateway that doesn't honor `count_tokens`. Set `backend: tiktoken` to skip the round-trip.
4. Workspace bundle cache
Reads of IDENTITY / SOUL / USER / AGENTS / MEMORY MDs go through an
in-memory Arc<WorkspaceBundle> cache keyed by (root, scope, sorted extras). A notify-debouncer-full watcher (default 500ms) drops
every entry under a workspace root when any *.md changes. Non-MD
file changes are ignored.
What to watch
- `workspace_cache_hits_total{path}` should dominate `workspace_cache_misses_total{path}` once the cache is warm.
- `workspace_cache_invalidations_total{path}` rising without operator edits points to a tool that writes to the workspace too aggressively.
When to flip off
- NFS / FUSE filesystems where `notify(7)` drops events. Set `workspace_cache.max_age_seconds: 60` (or similar) to force a refresh after the absolute TTL even without a watch event.
Per-agent overrides
The four enables — and only the enables — can be flipped per agent in
agents.yaml. The numeric knobs (compact_at_pct, tail_keep_tokens,
watch_debounce_ms, …) stay global to keep the surface narrow.
agents:
- id: ana
context_optimization:
prompt_cache: true
compaction: true
token_counter: true
workspace_cache: true
- id: bob
context_optimization:
prompt_cache: false # bob runs against a gateway that strips cache_control
Hot-reload behavior
Changing global knobs (llm.yaml) takes effect on the next request
once the reload coordinator picks up the file change (Phase 18). For
per-agent enables, the override rides on Arc<AgentConfig> inside
RuntimeSnapshot and is observed on the next
policy_for(...) lookup. The LlmAgentBehavior struct itself still
caches its compactor / prompt_cache_enabled fields at construction —
toggling those without a process restart requires the future
ArcSwap<CompactionRuntime> refactor noted in proyecto/FOLLOWUPS.md.
Rollout playbook
- Deploy with everything at defaults — `prompt_cache=true`, `compaction=false`, `token_counter=true`, `workspace_cache=true`.
- Watch `llm_cache_hit_ratio` for 24 h. Expect it to climb to >0.7 on chatty agents; if it stays low, check that the workspace bundle is stable across turns (no MD writes mid-session).
- Pick one agent, opt it into compaction (`agents.<id>.context_optimization.compaction: true`), reload config, watch for a week.
- If `llm_compaction_triggered_total{outcome="ok"}` > 0 and quality feedback is positive, roll compaction out to the rest of the fleet.
- If drift on `llm_prompt_tokens_drift` is consistently <10%, leave `token_counter.backend: auto`. If higher, consider `backend: tiktoken` for non-Anthropic providers — it saves the round-trip without losing accuracy you didn't have anyway.
Link understanding
When a user message contains URLs, the runtime can fetch them, extract
the main text, and inject a # LINK CONTEXT block into the system
prompt for that turn. The agent stops saying "I can't see what's at
that link" and starts answering against the actual page content.
The feature is off by default. Opt in per agent (and optionally override per binding).
Per-agent config
# config/agents.yaml
agents:
- id: ana
link_understanding:
enabled: true # default: false
max_links_per_turn: 3 # cap URLs fetched per message
max_bytes: 262144 # 256 KiB per response, streamed
timeout_ms: 8000 # per-fetch HTTP timeout
cache_ttl_secs: 600 # 0 disables cache
deny_hosts: # appended to built-in denylist
- internal.corp
Built-in denylist (always applied, cannot be removed):
localhost, 127.0.0.1, ::1, metadata.google.internal,
169.254.169.254. Defense against SSRF to internal endpoints.
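The host gate amounts to roughly this check (a sketch using the url crate; the real extractor also folds in the operator-supplied deny_hosts and redirect handling):

```rust
// Sketch of the deny-host gate, assuming the `url` crate. Illustrative only.
use url::Url;

const BUILTIN_DENY: &[&str] = &[
    "localhost", "127.0.0.1", "::1",
    "metadata.google.internal", "169.254.169.254",
];

fn host_allowed(raw: &str, extra_deny: &[String]) -> bool {
    let Ok(url) = Url::parse(raw) else { return false };
    let Some(host) = url.host_str() else { return false };
    let host = host.to_ascii_lowercase();
    !BUILTIN_DENY.iter().any(|d| host == *d)
        && !extra_deny.iter().any(|d| host == d.to_ascii_lowercase())
}
```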
Per-binding override
Per-binding link_understanding overrides the agent default. Useful
to disable on a noisy channel:
agents:
- id: ana
link_understanding: { enabled: true }
bindings:
- inbound: plugin.inbound.whatsapp.*
link_understanding: { enabled: false } # narrow on WA
- inbound: plugin.inbound.telegram.*
# inherits agent default (enabled: true)
null / omitted = inherit. Any object = full replace.
What gets injected
For each fetched URL, one bullet:
# LINK CONTEXT
- https://example.com/post — Title of the page
First paragraphs of main text, collapsed to ~max_bytes characters,
HTML stripped, scripts and styles dropped.
The block lands inside the system prompt for that turn only. Cache hits skip the fetch but still render the block.
Hard caps (cannot be raised by config)
| Cap | Value |
|---|---|
| URL length | 2048 chars |
| Redirect chain | 5 hops |
| User-Agent | nexo-link-understanding/0.1 |
| Response stream cutoff | max_bytes (drops the rest) |
| Newlines / control chars in extracted text | sanitised (prompt-injection guard) |
Operations
- A single shared `LinkExtractor` (HTTP client + LRU cache, capacity 256) is built at boot and reused by every agent runtime in the process.
- Cache is in-process only. Restarts cold.
- Telemetry exported on `/metrics`:
  - `nexo_link_understanding_fetch_total{result="ok|blocked|timeout|non_html|too_big|error"}` — counter, one increment per fetch attempt.
  - `nexo_link_understanding_cache_total{hit="true|false"}` — counter, incremented on every TTL-cached lookup so dashboards can compute hit rate without instrumenting the agent loop.
  - `nexo_link_understanding_fetch_duration_ms` — histogram (single series, no labels). Only observed for attempts that actually issued an HTTP request — cache hits and host-blocked URLs skip it so latency percentiles reflect real network work.
When to leave it off
- Agents talking to untrusted senders where the agent must not be pivoted into fetching attacker-controlled URLs.
- Channels with strict latency budgets — a fetch can add up to `timeout_ms` to the turn.
- Privacy-sensitive deployments where outbound HTTP from the agent host is not allowed.
Web search
The web_search built-in tool lets an agent query the web through one
of four providers: Brave, Tavily, DuckDuckGo, Perplexity.
The runtime owns provider selection, caching, sanitisation, and circuit
breaking — agents only see results.
The feature is off by default. Operators opt in per agent (and optionally override per binding).
Per-agent config
# config/agents.yaml
agents:
- id: ana
web_search:
enabled: true # default false
provider: auto # "auto" | "brave" | "tavily" | "duckduckgo" | "perplexity"
default_count: 5 # 1..=10
cache_ttl_secs: 600 # 0 disables cache
expand_default: false # default value of `expand` arg
provider: auto
Picks the first credentialed provider in this order:
- `brave` (env `BRAVE_SEARCH_API_KEY`)
- `tavily` (env `TAVILY_API_KEY`)
- `perplexity` (env `PERPLEXITY_API_KEY`, requires the `perplexity` feature)
- `duckduckgo` (no key — bundled by default; the always-available fallback)
DuckDuckGo scrapes html.duckduckgo.com and is rate-limited / captcha-prone;
the runtime detects bot challenges and trips the breaker so the next call
rotates to a different provider.
Per-binding override
Same shape as link_understanding: null (default) inherits the agent
value, any object replaces it.
agents:
- id: ana
web_search: { enabled: true }
bindings:
- inbound: plugin.inbound.whatsapp.*
web_search: { enabled: false } # silent on WA
- inbound: plugin.inbound.telegram.*
# inherits agent default
Tool surface
The LLM sees this signature:
{
"name": "web_search",
"parameters": {
"query": "string (required)",
"count": "integer (1-10, optional)",
"provider": "string (optional override)",
"freshness": "day | week | month | year (optional)",
"country": "ISO-3166 alpha-2 (optional)",
"language": "ISO-639-1 (optional)",
"expand": "boolean (optional)"
}
}
Return shape:
{
"provider": "brave",
"query": "rust async runtimes",
"from_cache": false,
"results": [
{
"url": "https://example.com/post",
"title": "Title",
"snippet": "First 4 KiB of the description, sanitised.",
"site_name": "example.com",
"published_at": "2026-04-20T00:00:00Z"
}
]
}
When expand: true and Phase 21 link understanding is enabled, the
top three hits also get a body field populated by the shared
LinkExtractor. Bodies obey the same denylist + size caps that
Link understanding describes.
Cache
In-process SQLite cache shared across every agent. Key format:
sha256(SCHEMA_VERSION || provider || query || canonical_params)
canonical_params excludes provider (router decides) and expand
(post-processing). cache_ttl_secs: 0 disables caching entirely.
Operators that want a separate cache file or schema migration set
web_search.cache.path in web_search.yaml (planned — see
FOLLOWUPS).
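A sketch of that key derivation (assuming the sha2 and hex crates; SCHEMA_VERSION and the canonical-params encoding are simplified stand-ins for the real implementation):

```rust
// Sketch of the cache-key derivation. Illustrative only.
use sha2::{Digest, Sha256};

const SCHEMA_VERSION: &str = "v1"; // stand-in for the real schema tag

fn cache_key(provider: &str, query: &str, canonical_params: &str) -> String {
    let mut h = Sha256::new();
    for part in [SCHEMA_VERSION, provider, query, canonical_params] {
        h.update(part.as_bytes());
        h.update(b"\x1f"); // field separator so "ab"+"c" != "a"+"bc"
    }
    hex::encode(h.finalize())
}

fn main() {
    println!("{}", cache_key("brave", "rust async runtimes", "count=5"));
}
```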
Circuit breaker
Every provider call goes through nexo_resilience::CircuitBreaker
keyed web_search:<provider>. Default config: 5 consecutive failures
trip the breaker, exponential backoff up to 120 s. Open-state calls
return ProviderUnavailable(provider) immediately and the router
rotates to the next candidate (when called via auto-detect).
Sanitisation
Every title, url, and snippet returned by a provider passes
through sanitise_for_prompt:
- control chars stripped,
- CR / LF / tab collapsed to single spaces,
- runs of whitespace collapsed,
- byte-capped at 4 KiB (snippet) / 512 B (title) / 2 KiB (URL),
- truncation respects UTF-8 char boundaries.
This is the same defence-in-depth Phase 19 (language directive) and
Phase 21 (# LINK CONTEXT) apply: SERPs are attacker-controlled input.
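Those steps amount to roughly the following (a simplified stand-in for sanitise_for_prompt, not the exact implementation):

```rust
// Simplified stand-in: strip control chars, collapse whitespace,
// and cap at a byte budget on a UTF-8 char boundary.
fn sanitise_for_prompt(input: &str, max_bytes: usize) -> String {
    let cleaned: String = input
        .chars()
        .map(|c| if c == '\r' || c == '\n' || c == '\t' { ' ' } else { c })
        .filter(|c| !c.is_control())
        .collect();
    // Collapse runs of whitespace into single spaces.
    let mut out = cleaned.split_whitespace().collect::<Vec<_>>().join(" ");
    if out.len() > max_bytes {
        let mut cut = max_bytes;
        while !out.is_char_boundary(cut) {
            cut -= 1; // back off to a valid UTF-8 boundary before truncating
        }
        out.truncate(cut);
    }
    out
}
```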
Telemetry
Exported on /metrics:
- `nexo_web_search_calls_total{provider,result}` — counter, one increment per provider attempt. `result` is `ok` (provider returned hits), `error` (network / HTTP / parse failure), or `unavailable` (the breaker short-circuited the call before it left the process).
- `nexo_web_search_cache_total{provider,hit}` — counter, every TTL-cached lookup. `provider` is the first candidate (the one the cache key is built from). Compute hit rate as `cache_total{hit="true"} / sum(cache_total)`.
- `nexo_web_search_breaker_open_total{provider}` — counter; one increment per request the breaker rejected. Pair with `circuit_breaker_state{breaker="web_search:<provider>"}` to alert on sustained open state vs a flap.
- `nexo_web_search_latency_ms{provider}` — histogram. Only observed for attempts that issued an HTTP request, so the percentile reflects real provider latency (cache hits and breaker short-circuits would pull p50 down to 0 and hide regressions).
When to leave it off
- Privacy-sensitive deployments where outbound HTTP from the agent host is not allowed.
- Channels where the cost of a noisy SERP in the prompt outweighs the agent's value (use per-binding `enabled: false`).
- Agents that already have `link_understanding` for the URLs the user shares — no need for SERP duplication.
Web fetch
The web_fetch built-in tool lets an agent retrieve the cleaned
body text + title for one or more URLs the agent already knows.
Companion to Web search: web_search finds
URLs, web_fetch retrieves them.
Distinct from web_search.expand=true because the agent often
knows the URL up-front (skill output, RSS poll, calendar
attachment, user message) and would otherwise have to either
hallucinate a search query or shell out to a fetch-url
extension.
When to use which
| Scenario | Tool |
|---|---|
| Agent needs to find content matching a query | web_search |
| Agent has a URL from a web_search hit and wants the body | web_search(expand=true) |
| Agent has a URL from a poller / skill / user message | web_fetch |
| Agent has a list of URLs to triage | web_fetch(urls=[...]) |
Tool signature
{
"name": "web_fetch",
"parameters": {
"urls": ["https://example.com/article", "https://other.com/page"],
"max_bytes": 65536 // optional; clamped to deployment cap
}
}
Response shape:
{
"results": [
{
"url": "https://example.com/article",
"title": "Example article",
"body": "First paragraph...",
"ok": true
},
{
"url": "https://internal.intranet.local/private",
"ok": false,
"reason": "fetch failed (host blocked, timeout, non-HTML, oversized, or transport error). Check `nexo_link_understanding_fetch_total{result}` for the bucket."
}
],
"count": 2
}
A bad URL returns a {ok: false, reason} row instead of bailing
the whole call, so the agent can still consume the successful
ones. Per-call cap of 5 URLs; longer lists get trimmed with a
warn log.
Configuration
web_fetch has no dedicated config. It rides on
Link understanding:
- `link_understanding.enabled` — gates the tool entirely. With it `false`, every fetch returns `{ok: false, reason: "disabled by policy"}`.
- `link_understanding.max_bytes` — deployment-wide ceiling. The tool's `max_bytes` arg can shrink but never grow past this.
- `link_understanding.deny_hosts` — host blocklist (loopback, private subnets, internal cloud metadata endpoints, plus whatever the operator added).
- `link_understanding.timeout_ms` — per-fetch HTTP timeout.
- `link_understanding.cache_ttl_secs` — cache TTL. Successful fetches are cached so a second `web_fetch` of the same URL inside the TTL is free.
Per-binding overrides via EffectiveBindingPolicy::link_understanding
(see Per-binding capability override).
Telemetry
web_fetch reuses every counter the auto-link pipeline emits.
There's no separate dashboard:
- `nexo_link_understanding_fetch_total{result}` — `ok` / `blocked` / `timeout` / `non_html` / `too_big` / `error`.
- `nexo_link_understanding_cache_total{hit}` — `true` / `false`.
- `nexo_link_understanding_fetch_duration_ms` — histogram, only populated when an HTTP request actually went out (cache hits and host-blocked URLs skip it so percentiles reflect real fetch work).
The bundled Grafana dashboard
(ops/grafana/nexo-llm.json)
already plots all three.
Why a per-call cap of 5 URLs
A runaway agent given the prompt "fetch every link in this 10k
RSS dump" would otherwise queue thousands of HTTP requests
synchronously, blowing the prompt budget and hammering the
target hosts. 5 covers every realistic agentic workflow
(read 3 candidates, pick the best two, summarise) while leaving
a clear ceiling. Operators who want batch behaviour should
spawn a TaskFlow that calls web_fetch
in chunks with cursor persistence.
Comparison to extensions
The fetch-url Python extension does roughly the same thing.
web_fetch differs in three ways:
- In-process — no subprocess spawn, no Python interpreter, no extension wire protocol. Sub-100ms cold path on the happy case.
- Shared cache + telemetry — links the user shares (auto-expanded by Phase 21 link understanding) AND links the agent fetches via `web_fetch` populate the same LRU. The second access is always free.
- Same security defaults — same deny-host list, same size cap, same timeout. Operators tune one knob, two surfaces honour it.
Use the extension when the runtime path is wrong shape (custom
auth, post-only endpoints, non-HTML responses you want raw).
Use web_fetch for the standard "give me the article" case,
which is most of them.
Implementation
The tool lives at
crates/core/src/agent/web_fetch_tool.rs::WebFetchTool and is
registered for every agent unconditionally in src/main.rs.
The per-binding link_understanding.enabled policy gates
whether the underlying fetch happens; the tool itself is always
visible in the agent's tool list so operators can write
"call web_fetch on URL X" prompts without needing a per-agent
web_fetch.enabled flag.
Source of truth for FOLLOWUPS W-2 closure.
Pairing protocol
Two coexisting protocols ship in nexo-pairing:
- DM-challenge inbound gate — opt-in per binding. Unknown senders on WhatsApp / Telegram receive a one-time human-friendly code; the operator approves them via CLI. Existing senders pass through unchanged.
- Setup-code QR — operator-initiated.
`nexo pair start` issues a short-lived HMAC-signed bearer token + a gateway URL, packs them into a base64url payload, and renders a QR. A companion app scans, presents the token to the daemon, and gets a session token in return.
The feature is off by default. Existing setups see no behaviour
change until the operator flips pairing_policy.auto_challenge on a
binding.
DM-challenge gate
Per-binding config
# config/agents.yaml
agents:
- id: ana
inbound_bindings:
- plugin: whatsapp
instance: personal
pairing_policy:
auto_challenge: true # default false
The gate runs before the plugin publishes to the broker. Three outcomes per inbound message:
| Outcome | When | Plugin action |
|---|---|---|
| Admit | sender in `pairing_allow_from` (or policy off) | publish as normal |
| Challenge { code } | unknown sender, `auto_challenge: true`, slot free | reply with code, drop message |
| Drop | max-pending exhausted (3 per channel/account) | silent drop |
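The per-message decision can be pictured roughly like this (illustrative types, not the crate's real ones):

enum PairingDecision {
    Admit,                      // known sender, or policy off: publish as normal
    Challenge { code: String }, // unknown sender, auto_challenge on, pending slot free
    Drop,                       // max-pending exhausted: silent drop
}

fn gate(policy_on: bool, sender_allowed: bool, pending_for_account: usize, code: String) -> PairingDecision {
    if !policy_on || sender_allowed {
        PairingDecision::Admit
    } else if pending_for_account < 3 {
        PairingDecision::Challenge { code }
    } else {
        PairingDecision::Drop
    }
}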
Operator workflow
$ nexo pair list
CODE CHANNEL ACCOUNT CREATED SENDER
K7M9PQ2X whatsapp personal 2026-04-25T13:21:00Z +57311...
$ nexo pair approve K7M9PQ2X
Approved whatsapp:personal:+57311... (added to allow_from)
The next message from +57311... admits through the gate.
pair list only shows pending challenges by default. Use
--all to also dump every active row in pairing_allow_from
(approved + seeded), and --include-revoked to keep soft-deleted
entries in the listing for audit:
$ nexo pair list --all
No pending pairing requests.
CHANNEL ACCOUNT SENDER VIA APPROVED REVOKED
telegram cody_nexo_bot 1194292426 seed 2026-04-26 17:52:10 UTC -
whatsapp personal +57311... cli 2026-04-25 13:21:00 UTC -
$ nexo pair list --all --include-revoked --json | jq '.allow[0]'
{
"channel": "whatsapp",
"account_id": "personal",
"sender_id": "+57311...",
"approved_via": "cli",
"approved_at": "2026-04-25T13:21:00Z"
}
--json always returns { "pending": [...], "allow": [...] } so
consumers get a stable shape regardless of --all.
Cache + revoke
The gate caches decisions for 30 s to keep SQLite off the hot path. Revokes (and freshly-seeded admits) are eventually consistent within that window:
$ nexo pair revoke whatsapp:+57311...
Revoked whatsapp:+57311...
For an immediate effect, trigger a hot-reload — the coordinator
runs PairingGate::flush_cache as a post-reload hook (Phase 70.7),
so nexo reload (or any file-watched config edit) drops the cache
and the next inbound message re-queries the store:
$ nexo reload
A daemon restart still works as a hammer when reload is disabled.
Migrating an existing bot
If you already have known senders, seed them so the gate doesn't
challenge mid-conversation when you flip auto_challenge: true:
$ nexo pair seed whatsapp personal +57311... +57222... +57333...
Seeded 3 sender(s) into whatsapp:personal allow_from
seed is idempotent; running it twice is safe and re-activates any
sender that was previously revoked.
Setup-code QR
Issuing
$ nexo pair start --public-url wss://nexo.example.com --qr-png /tmp/p.png --json
{
"url": "wss://nexo.example.com",
"url_source": "pairing.public_url",
"bootstrap_token": "eyJwcm9maWxlIjoi...",
"expires_at": "2026-04-25T13:32:00Z",
"payload": "eyJ1cmwi..."
}
payload is what goes in the QR. The companion decodes it to recover
{url, bootstrap_token, expires_at}, opens the WebSocket, and
presents the token as Authorization: Bearer <bootstrap_token>.
URL resolution
Priority chain (first non-empty wins):
1. `--public-url` (CLI flag)
2. `tunnel.url` (Phase tunnel — TODO: wire when accessor lands)
3. `gateway.remote.url`
4. LAN bind address (when `gateway.bind=lan`)
5. Fail-closed: the daemon refuses to issue a code on a loopback-only gateway.

As of Phase 70.5 the CLI also prints a ready-to-run `nexo pair seed <channel> <account> <SENDER>` for every plugin instance configured under `config/plugins/`, so a dev-machine operator can skip the QR flow entirely:

$ nexo pair start --ttl-secs 300
Pairing-start needs a non-loopback gateway URL.

For local testing you usually don't need the QR flow at all — seed the
operator's chat into the allowlist directly:

    nexo pair seed telegram cody_nexo_bot <YOUR_TELEGRAM_USER_ID>
    nexo pair seed whatsapp default <YOUR_WHATSAPP_NUMBER>

Or, to keep using the QR flow, set one of:
  - `pairing.public_url` in config/pairing.yaml
  - `--public-url <wss://…>` flag
  - run `nexo` with the tunnel enabled (writes tunnel.url)
ws/wss security policy
Cleartext ws:// is allowed only on hosts the operator can
reasonably trust to be private:
- `127.0.0.1` / `::1` (loopback)
- RFC1918 (10/8, 172.16/12, 192.168/16)
- link-local (169.254/16)
- `*.local` mDNS hostnames
- `10.0.2.2` (Android emulator)
- Any host listed in `pairing.ws_cleartext_allow_extra`
Everything else requires wss://. This matches OpenClaw's posture in
research/src/pairing/setup-code.ts.
Token format
b64u(claims_json) + "." + b64u(hmac_sha256(secret, claims_json))
- `claims_json` = `{"profile":"companion-v1","expires_at":"...","nonce":"<32 hex>","device_label":"..."}`
- `secret` = 32 bytes in `~/.nexo/secret/pairing.key` (auto-generated on first boot with 0600 perms; rotate by deleting + restarting).
Verification is constant-time (subtle crate) so timing leaks don't
discriminate between "wrong sig" and "wrong claims".
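A sketch of the verification side, assuming the `hmac`, `sha2`, and `base64` crates (URL-safe unpadded encoding is an assumption; `hmac`'s `verify_slice` gives the same constant-time property the runtime gets from `subtle`):

use base64::{engine::general_purpose::URL_SAFE_NO_PAD, Engine};
use hmac::{Hmac, Mac};
use sha2::Sha256;

fn verify_bootstrap_token(token: &str, secret: &[u8; 32]) -> bool {
    let Some((claims_b64, sig_b64)) = token.split_once('.') else { return false };
    let (Ok(claims), Ok(sig)) = (URL_SAFE_NO_PAD.decode(claims_b64), URL_SAFE_NO_PAD.decode(sig_b64)) else {
        return false;
    };
    let mut mac = Hmac::<Sha256>::new_from_slice(secret).expect("32-byte key");
    mac.update(&claims);
    // Constant-time comparison; "wrong sig" and "wrong claims" take the same time.
    // Expiry of `expires_at` inside claims_json is checked separately by the caller.
    mac.verify_slice(&sig).is_ok()
}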
Threat model
| Concern | Mitigation |
|---|---|
| Brute-force pairing code | 32^8 ≈ 10^12 keyspace; 60 min TTL; max 3 pending per (channel, account) |
| Token replay after expiry | TTL on expires_at (default 10 min); HMAC verify fails closed |
| Token forgery | HMAC-SHA256 with 32-byte secret; constant-time compare |
| Secret leak | Rotate via rm ~/.nexo/secret/pairing.key && restart; all in-flight tokens invalidate |
| TOCTOU on approve | Single SQL transaction (approve reads + insert + delete in one tx) |
| ws cleartext on hostile network | Refuse to issue cleartext URL outside private-host allowlist |
| DoS via flood of pending requests | Max 3 per (channel, account); TTL 60 min auto-prunes |
Storage layout
Two SQLite tables in <memory_dir>/pairing.db:
pairing_pending (channel, account_id, sender_id PRIMARY KEY,
code, created_at, meta_json)
pairing_allow_from (channel, account_id, sender_id PRIMARY KEY,
approved_at, approved_via, revoked_at)
Soft-delete (revoked_at) keeps historical context: an operator can
later see "+57311 was approved on X, revoked on Y" for audit.
When to leave it off
- Single-user setups where the operator is the only sender — the gate adds a SQL hit per message for no security gain.
- Bots that take public input by design (e.g. a self-service support bot) — the gate would block every customer.
- Until an `agent setup`-style wizard exists for pairing (like the web-search one), manual `pair seed` is the only friendly migration path.
Adapter registry
Each channel that participates in pairing implements
PairingChannelAdapter in its plugin crate. The adapter owns three
channel-specific decisions the runtime cannot make on its own:
- `normalize_sender(raw)` — canonicalise inbound sender ids before the gate hits the store. WhatsApp strips `@c.us` / `@s.whatsapp.net` and prepends `+`; Telegram lower-cases `@username` and passes numeric chat ids through.
- `format_challenge_text(code)` — render the operator-facing pairing message. The default is plain UTF-8; the Telegram adapter overrides it to escape MarkdownV2 reserved characters and wrap the code in backticks so the user can long-press to copy.
- `send_reply(account, to, text)` — publish the challenge through the channel's outbound topic (`plugin.outbound.{whatsapp,telegram}[.<account>]`) using the payload shape that channel's dispatcher expects.
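Roughly, the contract looks like this (an illustrative sketch — the real trait in nexo-pairing may be async and carry richer error types):

trait PairingChannelAdapter {
    /// Canonicalise a raw inbound sender id, e.g. strip "@s.whatsapp.net" and prepend "+".
    fn normalize_sender(&self, raw: &str) -> String;
    /// Render the operator-facing challenge text for this channel's formatting rules.
    fn format_challenge_text(&self, code: &str) -> String;
    /// Publish the challenge on the channel's outbound topic for this account / recipient.
    fn send_reply(&self, account: &str, to: &str, text: &str) -> anyhow::Result<()>;
}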
The bin (src/main.rs) constructs a PairingAdapterRegistry at boot
and registers the WhatsApp + Telegram adapters. The runtime consults
the registry on every inbound event whose binding has
pairing.auto_challenge: true. Channels with no registered
adapter fall back to a hardcoded broker publish that mirrors the
legacy text on plugin.outbound.{channel} — operators still see the
challenge in their channel, but without per-channel formatting.
Telemetry lives under
pairing_inbound_challenged_total{channel,result} with result one of
delivered_via_adapter, delivered_via_broker, publish_failed,
no_adapter_no_broker_topic, so dashboards can split adapter vs.
fallback delivery rates per channel.
CLI reference
nexo pair start [--for-device <name>] [--public-url <url>]
[--qr-png <path>] [--ttl-secs <n>] [--json]
nexo pair list [--channel <id>] [--all] [--include-revoked] [--json]
nexo pair approve <CODE> [--json]
nexo pair revoke <channel>:<sender_id>
nexo pair seed <channel> <account_id> <sender_id> [<sender_id>...]
nexo pair help
Anonymous telemetry (opt-in)
Nexo can emit a weekly heartbeat with anonymous, aggregated deployment shape so the project knows what configurations are actually in production. The heartbeat is disabled by default — nothing leaves your host until you explicitly opt in.
This page documents exactly what's sent, what isn't, and how to inspect the payload before enabling it.
What is sent
Every 7 days (drift-resistant — 7d ± 1h jitter), if telemetry is
enabled, Nexo POSTs a single JSON document to
https://telemetry.lordmacu.dev/nexo over HTTPS:
{
"schema_version": 1,
"instance_id": "0fa3...",
"version": "0.1.1",
"rust_version": "1.80.1",
"os": "linux",
"arch": "aarch64",
"uptime_days": 14,
"agents": {
"total": 3,
"active_24h": 2
},
"channels": {
"whatsapp": 1,
"telegram": 1,
"email": 0,
"browser": 1
},
"llm_providers": [
"minimax",
"anthropic"
],
"memory_backend": "sqlite-vec",
"sessions": {
"average_per_agent_24h": 12,
"p95_per_agent_24h": 28
},
"extensions_loaded": 4,
"broker_kind": "nats"
}
What is not sent
- ❌ Message content. Not a single byte of any conversation, prompt, response, or tool call ever leaves the host.
- ❌ Identifiers. No phone numbers, email addresses, contact names, agent names, channel handles. The `instance_id` is a random UUID generated on first opt-in and stored in `~/.nexo/telemetry-id`; it can't be tied to anything except a rerun of the same install.
- ❌ API keys / tokens / secrets. None. The provider list is the literal string `"minimax"`, never the key.
- ❌ IP addresses. The receiving server (`telemetry.lordmacu.dev`) drops the source IP at ingress before the payload hits any database. The HTTP access log retains only the country code derived from a one-way hash of the IP, used solely to plot the geographic distribution gauge on the public dashboard.
- ❌ Hostname. Not in the payload. Not derived from anything in the payload.
- ❌ Time of day. The heartbeat is jittered so the timestamp doesn't reveal a pattern.
Why opt in
It's the only honest signal the project has about what's actually deployed. Without it, every roadmap discussion is guessing. With it, prioritization improves: if 80% of opt-in deployments use Anthropic + WhatsApp, then a regression on that combo gets a hot-fix; a niche feature goes to maintenance mode.
The aggregate dashboard at
https://lordmacu.github.io/nexo-rs/usage/ (published once
Phase 41 fully ships) shows everyone what everyone else is doing
in aggregate — same data the maintainers see.
Enable / disable
# Show current state + what would be sent right now
nexo telemetry status
# Enable (writes to /etc/nexo-rs/telemetry.yaml or ~/.nexo/telemetry.yaml)
nexo telemetry enable
# Inspect exactly what tomorrow's heartbeat will contain
nexo telemetry preview
# Disable + remove the instance_id file
nexo telemetry disable
Hot-reload aware (Phase 18) — toggling doesn't require a daemon restart. The runtime watches the telemetry config; the next heartbeat tick respects whatever is currently on disk.
First-launch banner
On first nexo boot in a fresh install, the daemon prints once
to the journal:
========================================================================
nexo telemetry is DISABLED.
Enabling it sends an anonymous, aggregated weekly heartbeat
describing your deployment shape (channel mix, LLM provider mix,
agent count). No message content, no identifiers, no API keys.
Inspect the payload: nexo telemetry preview
Enable: nexo telemetry enable
Read the full spec: https://lordmacu.github.io/nexo-rs/ops/telemetry.html
========================================================================
Subsequent boots stay silent. Toggling on or off prints a one-line confirmation.
Server-side guarantees
The receiving endpoint at telemetry.lordmacu.dev:
- Drops the source IP at the load balancer, before the request reaches any application code or log aggregator.
- Stores the JSON document verbatim with no enrichment.
- Aggregates documents per `instance_id` only to compute the `active_install_count` cardinality on the public dashboard.
- Retains raw documents for 90 days, then aggregates and deletes the originals.
- Does not correlate documents across `instance_id` rotations — if you `nexo telemetry disable && nexo telemetry enable`, you become a fresh install in the dataset.
The server source code lives at
https://github.com/lordmacu/nexo-telemetry-server (deferred —
opens once Phase 41 finishes server side). Reproducible build,
verifiable signatures.
Inspecting in transit
The HTTP request is plain HTTPS POST with the JSON payload above as the body. Easy to mitm in a corp environment:
mitmproxy -p 8888 -s drop_telemetry.py &
NEXO_TELEMETRY_PROXY=http://127.0.0.1:8888 nexo telemetry preview
The runtime respects HTTPS_PROXY / HTTP_PROXY / standard
proxy env vars for the heartbeat HTTP client (it goes through
the same reqwest client every other Nexo egress uses).
Disabling at the firewall
If you just want to make sure no telemetry can leave even if it gets accidentally enabled:
sudo iptables -A OUTPUT -d telemetry.lordmacu.dev -j REJECT
The runtime will see a network error in its logs every 7 days (rate-limited to once-per-week to not flood). It does not retry-forever — one attempt per scheduled tick.
Compliance notes
- GDPR: anonymous aggregate data with no identifiers and no PII falls outside Article 4(1) "personal data". The `instance_id` is technical metadata, not a pseudonym — it can't be re-tied to a natural person via any data the project holds.
- HIPAA: no PHI is collected; the field set is infrastructure metadata only.
- Corporate sec teams: the receiving endpoint speaks only HTTPS, no fallback to HTTP. The server cert is publicly pinnable. The payload schema is documented + versioned; new fields require bumping `schema_version` and a documented changelog entry below.
Schema changelog
| Version | Released | What changed |
|---|---|---|
| 1 | TBD when Phase 41 ships | Initial schema as documented above |
Future schema changes append a row here. Old clients are not
forced to upgrade — the server accepts every advertised
schema_version indefinitely (rolled-up dashboard panels
include only the fields a given schema carries).
Out of scope
- Per-agent / per-binding metrics — that's the Prometheus `/metrics` endpoint, scraped locally by your own Prometheus (see Grafana dashboards). The telemetry heartbeat is deployment-shape only.
- Crash reports — Nexo emits anyhow backtraces to the local journal but never sends them off-host.
- Real-time analytics — heartbeat is once weekly. There's no call-home for live metrics, ever.
Benchmarks
The workspace ships criterion benchmark suites for every hot path
that runs on the data plane. CI executes them on every PR + weekly
on main so regressions are visible before merge.
Quick run
# Single crate:
cargo bench -p nexo-resilience
# Single bench within a crate:
cargo bench -p nexo-broker --bench topic_matches
# Single group within a bench:
cargo bench -p nexo-broker --bench topic_matches -- 'topic_matches/wildcard'
Output goes to target/criterion/. Open index.html under that
directory in a browser for the full HTML report.
Coverage matrix
| Crate | Bench | What it measures | Run target |
|---|---|---|---|
| nexo-resilience | circuit_breaker | CircuitBreaker::allow (closed + open), on_success, on_failure, 8-task concurrent allow contention | sub-100ns per call |
| nexo-broker | topic_matches | NATS-style pattern matching (exact, single-wildcard *, multi-wildcard >, 50-pattern storm) | sub-100ns per match |
| nexo-broker | local_publish | End-to-end LocalBroker::publish with 0 / 1 / 10 / 50 subscribers (DashMap scan + try_send + slow-consumer drop counter) | sub-10µs at 50 subs |
| nexo-llm | sse_parsers | OpenAI / Anthropic / Gemini SSE parsers, 50-chunk fixtures (typical short answer) | chunks/sec scales linearly |
| nexo-taskflow | tick | WaitEngine::tick at 10 / 100 / 1 000 active waiting flows | sub-millisecond at single-host scale |
What's NOT benched yet
These are tracked under Phase 35.5 follow-up:
- `nexo-core` transcripts FTS search — needs a SQLite fixture seed before the bench is meaningful.
- `nexo-core` redaction pipeline — wait for the local-LLM redaction backend (Phase 68.7) so we measure the real path operators ship.
- `nexo-mcp` `encode_request` / `parse_notification_method` — cheap to add; will land alongside an MCP-stdio round-trip bench.
- `nexo-memory` vector-search recall — needs a public dataset baseline.
Add a bench by following the patterns in crates/<x>/benches/:
- `[dev-dependencies]` adds `criterion = "0.5"` (with `async_tokio` if you need a runtime).
- `[[bench]]` registers `name = "<bench>"` and `harness = false`.
- The bench file uses `Throughput::Elements(N)` so output is ops/sec, not raw `ns/iter`.
- Each `criterion_group!` covers a distinct conceptual path — don't bundle unrelated paths.
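A minimal bench following those conventions (the matcher here is a stand-in, not nexo-broker's real `topic_matches`):

// crates/<x>/benches/my_bench.rs  (registered with `harness = false` in Cargo.toml)
use criterion::{criterion_group, criterion_main, Criterion, Throughput};
use std::hint::black_box;

fn exact_match(pattern: &str, topic: &str) -> bool {
    pattern == topic
}

fn bench_exact(c: &mut Criterion) {
    let mut group = c.benchmark_group("topic_matches");
    group.throughput(Throughput::Elements(1)); // report ops/sec, not raw ns/iter
    group.bench_function("exact", |b| {
        b.iter(|| {
            exact_match(
                black_box("plugin.inbound.whatsapp.default"),
                black_box("plugin.inbound.whatsapp.default"),
            )
        })
    });
    group.finish();
}

criterion_group!(benches, bench_exact);
criterion_main!(benches);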
CI integration
.github/workflows/bench.yml runs the matrix on:
- every PR that touches `crates/**`, `Cargo.lock`, or `Cargo.toml`
- weekly on Sunday 04:00 UTC against `main`
- manual `workflow_dispatch`
Each run uploads target/criterion/ as an artifact retained 30
days. PR runs save with --save-baseline pr-<number>; main runs
save as main. Compare locally with:
# Pull the artifact for PR #42
gh run download <run-id> --name bench-nexo-broker-<run-id>
# Compare against the local main baseline
cargo bench -p nexo-broker -- --baseline main
Today the CI job is informational — a regression doesn't
fail the PR. Once we have ~10 main runs of baseline data per
crate, the workflow gates on >10% regression per group. That's
Phase 35.6 done-criteria.
Known limitations
- GitHub Actions runners are noisy. The `ubuntu-latest` shared runner tier shows ±5-10% variance on microbenchmarks. This is why we don't gate on small regressions yet — the baseline noise floor is itself ~5%.
- Benches don't measure cold cache. `cargo bench`'s warm-up phase reaches steady-state CPU caches; first-call latency on a cold runtime is not captured. Add a separate `bench_cold_*` group when this matters (it usually doesn't — hot path is what matters at scale).
- No cross-crate end-to-end benchmark yet. Phase 35.3 (load test rig) covers that; today's suites are per-crate microbenchmarks.
Reading criterion output
A typical run prints:
publish/mixed_50_subs time: [12.347 µs 12.451 µs 12.567 µs]
thrpt: [3.9786 Melem/s 4.0153 Melem/s 4.0494 Melem/s]
change: time: [-0.4% +0.3% +1.1%] (p = 0.62 > 0.05)
thrpt: [-1.1% -0.3% +0.4%]
No change in performance detected.
- `time` is the per-iteration latency (lower is better).
- `thrpt` is throughput (higher is better) — only present when the bench declared `Throughput::Elements(N)`.
- `change` compares against the previous run on the same hardware. `p > 0.05` means the difference is within noise.
Look for change reporting "Performance has regressed" with a
red bar — that's the signal a PR introduced a regression.
Backup + restore
Nexo state lives under NEXO_HOME (default ~/.nexo/ for native
installs, /var/lib/nexo-rs/ for the systemd package, /app/data/
in the Docker image). Backing it up + restoring it is the operator's
responsibility today; a proper nexo backup / nexo restore
subcommand is tracked under Phase 36.
Quickest path — scripts/nexo-backup.sh
The repo ships a shell script that does the right thing without stopping the daemon:
# Single-shot, output to ./
NEXO_HOME=/var/lib/nexo-rs sudo -E scripts/nexo-backup.sh
# Custom output dir, exclude secrets (default)
scripts/nexo-backup.sh --out /backups/
# Include secrets/ for full recovery (encrypt the archive yourself)
scripts/nexo-backup.sh --include-secrets
What it does:
- Hot snapshot every SQLite DB via `sqlite3 .backup` — the official online-backup mechanism. Captures a consistent point-in-time image even with concurrent writers; no daemon stop required.
- rsync non-DB state — JSONL transcripts, the agent workspace-git dir if Phase 10.9 is enabled, any operator files dropped under `NEXO_HOME`. Skips `*.tmp`, `*.lock`, and the `queue/` disk-queue dir (replays on next boot from NATS, no need to back up).
- `secret/` excluded by default. Re-run with `--include-secrets` to include it; encrypt the resulting tarball before transit (use `age`, `gpg`, or push to an encrypted bucket).
- sha256 manifest at `MANIFEST.sha256` inside the archive so restore can verify integrity.
- zstd-19 compression — typical 10× ratio over raw SQLite.
- Sidecar `<archive>.sha256` with the archive's outer hash so backup pipelines can detect transit corruption.
Restore
# Pull the archive locally first
scp ops@host:/backups/nexo-backup-20260426T121500Z.tar.zst .
# Extract
zstd -dc nexo-backup-20260426T121500Z.tar.zst | tar -xf -
# Verify the manifest
cd nexo-backup-20260426T121500Z
sha256sum -c MANIFEST.sha256
# Stop the daemon (state must not be mid-write)
sudo systemctl stop nexo-rs
# Replace state
sudo rsync -a --delete --chown=nexo:nexo \
./ /var/lib/nexo-rs/
# Start
sudo systemctl start nexo-rs
sudo journalctl -u nexo-rs -f
The daemon must be stopped during the rsync — SQLite WAL files do not survive a parallel-write replacement.
Cron schedule
Drop in /etc/cron.daily/nexo-backup:
#!/bin/sh
set -eu
ARCHIVE_DIR=/backups/nexo
mkdir -p "$ARCHIVE_DIR"
# Snapshot, retain locally
NEXO_HOME=/var/lib/nexo-rs \
/opt/nexo-rs/scripts/nexo-backup.sh --out "$ARCHIVE_DIR"
# Push to remote (Backblaze, S3, Wasabi, etc.)
rclone copy --include '*.tar.zst*' "$ARCHIVE_DIR" remote:nexo-backups/
# Retain 30 days locally + 90 days remote
find "$ARCHIVE_DIR" -name 'nexo-backup-*.tar.zst*' -mtime +30 -delete
rclone delete --min-age 90d remote:nexo-backups/
chmod +x /etc/cron.daily/nexo-backup. Single-host operators get
a tested daily backup pipeline in 6 lines.
What survives a backup
| Component | In backup | Notes |
|---|---|---|
| Long-term memory (vector + relational) | ✅ | memory.db |
| Transcripts | ✅ | transcripts/ JSONL + transcripts.db FTS |
| TaskFlow state | ✅ | taskflow.db |
| Pairing store + setup-code key | ⚠️ | DB included; key only with --include-secrets |
| LLM credentials | ⚠️ | secret/ only with --include-secrets |
| Per-agent SOUL.md + MEMORY.md | ✅ | rsync from workspace |
| Agent workspace git | ✅ | full .git dir included if Phase 10.9 is on |
| Disk-queue (NATS replay buffer) | ❌ | regenerates from NATS on boot |
| Process logs | ❌ | journalctl handles those separately |
Migrations
Schema migrations across Nexo versions are still ad-hoc — ALTER TABLE … .ok() patterns inside the runtime. Phase 36 adds:
- `nexo migrate status` — show the applied vs available migration set
- `nexo migrate up [target]` — apply pending migrations forward
- `nexo migrate down [target]` — roll back if a release ships reversible migrations
- A `migrations/` dir with versioned, checksummed SQL files
Until then, pin to a specific Nexo version per deployment and test upgrades on a copy of the backup before applying to production.
Status
Tracked as Phase 36 — Backup, restore, migrations.
| Sub-phase | Status |
|---|---|
| scripts/nexo-backup.sh shell bridge | ✅ shipped |
| Operator doc (this page) | ✅ shipped |
| nexo backup --out <dir> subcommand | ⬜ deferred |
| nexo restore --from <archive> subcommand | ⬜ deferred |
| nexo migrate up/down/status versioned migrations | ⬜ deferred |
| Encrypted archive output (age / gpg) | ⬜ deferred |
| CI test that backup → restore round-trips on a fixture | ⬜ deferred |
The shell script + this doc are the bridge. Once the runtime subcommands ship, this page rewrites to point at them and the script gets retired.
Privacy toolkit
GDPR-style operator workflows for handling user data requests until
the proper nexo forget / nexo export-user subcommands ship
(tracked under Phase 50).
Right to be forgotten
scripts/nexo-forget-user.sh does cascading delete across every
SQLite DB and JSONL transcript under NEXO_HOME, then VACUUMs
the databases so the deleted rows don't survive in free pages.
# Stop the daemon first — SQLite WAL doesn't survive parallel writes
sudo systemctl stop nexo-rs
# DRY RUN — shows what would be deleted, doesn't change anything
NEXO_HOME=/var/lib/nexo-rs sudo -E scripts/nexo-forget-user.sh \
--id "+5491155556666"
# When the dry-run looks right, re-run with --apply
NEXO_HOME=/var/lib/nexo-rs sudo -E scripts/nexo-forget-user.sh \
--id "+5491155556666" \
--apply
# Restart
sudo systemctl start nexo-rs
What gets deleted (cascading across all DBs):
| Table column | Match | Source DB |
|---|---|---|
| user_id | exact | every DB |
| sender_id | exact | every DB (used in pairing, transcripts) |
| account_id | exact | every DB (used in WA / TG plugins) |
| contact_id | exact | memory + transcripts |
| peer_id | exact | agent-to-agent routing |
Plus JSONL transcript lines where any of those keys equals the target id.
The script emits forget-user-<id>-<timestamp>.json with the
exact deletion counts — this is the operator's GDPR audit
trail, ship it back to the requester as proof of compliance.
--keep-audit flag
Strict GDPR says even the admin-audit row recording the deletion
should be removed (the user has the right to no trace). But that
breaks operator audit chains. Use --keep-audit to opt out of
that single specific erasure:
nexo-forget-user.sh --id "<id>" --apply --keep-audit
The script keeps the admin_audit table row showing that the
deletion happened (without the user-id field, which is hashed).
Other tables fully wiped either way.
Right to data export
Until nexo export-user --id <id> ships, manual SQL works:
USER_ID="+5491155556666"
OUT_DIR="export-${USER_ID}-$(date -u +%Y%m%dT%H%M%SZ)"
mkdir -p "$OUT_DIR"
# Stop the daemon for a consistent point-in-time export
sudo systemctl stop nexo-rs
# Per-DB extraction
for db in /var/lib/nexo-rs/*.db; do
name=$(basename "$db" .db)
sqlite3 "$db" \
".headers on" \
".mode json" \
".output $OUT_DIR/${name}.json" \
"SELECT * FROM ($(sqlite3 "$db" '
SELECT GROUP_CONCAT(
\"SELECT '\" || name || \"' AS table_name, * FROM \" || name ||
\" WHERE user_id = '\" || ? || \"' OR sender_id = '\" || ? || \"' OR account_id = '\" || ? || \"'\",
\" UNION ALL \"
)
FROM sqlite_master m
WHERE m.type='table'
AND EXISTS (
SELECT 1 FROM pragma_table_info(m.name) p
WHERE p.name IN ('user_id','sender_id','account_id')
)
'))" -- "$USER_ID" "$USER_ID" "$USER_ID"
done
# Per-JSONL extraction
for f in /var/lib/nexo-rs/transcripts/*.jsonl; do
name=$(basename "$f")
jq -c \
--arg id "$USER_ID" \
'select((.user_id // .sender_id // .account_id // "") == $id)' \
"$f" > "$OUT_DIR/$name"
done
# Restart
sudo systemctl start nexo-rs
# Tar + zstd, optionally encrypt
tar -C "$(dirname "$OUT_DIR")" -cf - "$(basename "$OUT_DIR")" | \
zstd -19 -T0 > "${OUT_DIR}.tar.zst"
# (Recommended) age-encrypt before transit
age -r age1xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx \
-o "${OUT_DIR}.tar.zst.age" \
"${OUT_DIR}.tar.zst"
shred -u "${OUT_DIR}.tar.zst"
The result is a tarball the operator hands to the requester —
JSON files per DB + filtered transcript JSONLs — encrypted with
the requester's age public key.
When nexo export-user --id <id> ships, this whole shell pipeline
collapses into one command with built-in encryption.
Retention policy
Operator-defined per deployment. Recommended defaults:
| Surface | Retention | Why |
|---|---|---|
| Transcripts | 90 days | Enough for ops debugging + agent recall |
| Memory (long-term) | indefinite | Agent's working memory; pruned by recall signals |
| TaskFlow finished flows | 30 days | Audit trail for completed work |
| TaskFlow failed flows | 365 days | Forensics |
| Admin audit log | 365 days | Compliance |
| Disk-queue (NATS replay) | 7 days | Disaster recovery |
| Pairing pending requests | 60 min | TTL-enforced by the store |
Apply via cron (until nexo retention apply ships):
# /etc/cron.daily/nexo-retention
#!/bin/sh
set -eu
DB=/var/lib/nexo-rs/transcripts.db
# 90-day rolling window on transcripts
sqlite3 "$DB" "DELETE FROM transcripts
WHERE timestamp < strftime('%s', 'now', '-90 days');"
sqlite3 "$DB" 'VACUUM;'
# Same for taskflow finished + failed
DB=/var/lib/nexo-rs/taskflow.db
sqlite3 "$DB" "DELETE FROM flows
WHERE status='Finished'
AND finished_at < datetime('now', '-30 days');"
sqlite3 "$DB" "DELETE FROM flows
WHERE status='Failed'
AND finished_at < datetime('now', '-365 days');"
PII detection (deferred)
Phase 50 plans inbound PII flagging — separate from the existing outbound redactor. The rough shape:
- Regex pre-screen for SSN-shape, credit-card-shape (Luhn-checked), phone-number-shape per locale.
- Optional LLM-backed second-pass via the future Phase 68 local tier (gemma3-270m).
- Hits land in
data/pii-flags.jsonlfor operator review; agent dialog continues unimpeded.
Today: nothing automated. The outbound redactor in
crates/core/src/redaction.rs (regex-based) catches the obvious
shapes before they reach long-term memory or the LLM, but doesn't
emit a queue for operator review.
Encryption at rest
Two roads, both deferred to Phase 50.x:
- Application-level — `sqlcipher` build of `libsqlite3-sys` with a key fed from `secrets/`. Every page encrypted; backups need the same key to restore.
- Filesystem-level — `dm-crypt` / LUKS on the volume hosting `NEXO_HOME`. Operator does it once at provision, no Nexo changes required.
The native install + Hetzner / Fly recipes assume filesystem-level
crypto handled by the host (LUKS on Hetzner, encrypted EBS on AWS,
Fly volumes are encrypted at rest by default). When sqlcipher is
ready we'll document switching tiers.
Status
| Capability | Status |
|---|---|
| scripts/nexo-forget-user.sh cascading delete | ✅ shipped |
| Operator data-export shell pipeline (above) | ✅ documented |
| Retention policy + cron template | ✅ documented |
| nexo forget --user <id> subcommand | ⬜ deferred |
| nexo export-user --id <id> subcommand | ⬜ deferred |
| Inbound PII detection + review queue | ⬜ deferred |
| sqlcipher encryption at rest | ⬜ deferred |
| Admin-action audit log (separate from this script's manifest) | ⬜ deferred |
Tracked as Phase 50 — Privacy toolkit.
Health checks
Three layers of health probes for a Nexo deployment, each tuned for a different consumer:
- `/health` — liveness. Cheap (atomic flag check). HTTP 200 means the process is up; doesn't guarantee it can serve work.
- `/ready` — readiness. Expensive (verifies broker connection, agents loaded, snapshot warm). HTTP 200 means the runtime can accept inbound traffic. Use this for load-balancer health checks.
- `scripts/nexo-health.sh` — operator + monitoring. JSON summary with counter snapshots. Bridge until `nexo doctor health` (Phase 44) ships.
Liveness — /health
Returns HTTP 200 + ok body when the agent process is alive.
The runtime sets a RUNNING flag at startup and clears it on
graceful shutdown. Does not verify any subsystem — useful
for "is the daemon there at all" probes.
curl -fsSL http://127.0.0.1:8080/health
# ok
Kubernetes liveness probe:
livenessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 3
failureThreshold: 3
A failing liveness probe should restart the container. Be
generous on initialDelaySeconds — first-boot extension
discovery + memory open + agent runtime spin-up can take 15-25s.
Readiness — /ready
Returns 200 only when all of:
- Broker (NATS or local) is reachable
- Every configured agent has loaded its tool registry
- The hot-reload snapshot has been warmed (Phase 18)
- Pairing store is open (if `pairing_policy.auto_challenge` is on)
Returns 503 with a JSON body listing the failing subsystem otherwise:
{
"ready": false,
"reasons": [
{"subsystem": "broker", "detail": "nats://localhost:4222: connection refused"}
]
}
Use this for load-balancer / service-mesh routing decisions.
A node that's live but not ready should not receive
traffic.
readinessProbe:
httpGet:
path: /ready
port: 8080
periodSeconds: 5
timeoutSeconds: 2
failureThreshold: 1
Operator one-shot — scripts/nexo-health.sh
Single-shot JSON summary intended for watch -n 5 nexo-health.sh during ops, cron health-mailers, and uptime
monitors that want one structured payload covering everything.
# Default — pretty human output
scripts/nexo-health.sh
# JSON only (cron, monitoring scrapers)
scripts/nexo-health.sh --json
# Custom hosts (e.g., probing through a service mesh)
scripts/nexo-health.sh --host nexo.internal:8080 \
--metrics-host nexo.internal:9090
# Strict mode — open circuit breaker counts as unhealthy.
# Default mode tolerates breaker-open (degraded-but-up).
scripts/nexo-health.sh --strict
Pretty output:
============================================================
nexo-rs health · 2026-04-26T15:30:00Z
============================================================
overall: ok
admin: 127.0.0.1:8080
metrics: 127.0.0.1:9090
probes:
✓ live ok
✓ ready ok
✓ metrics ok
counters:
tool_calls_total 4711
llm_stream_chunks_total 28391
web_search_breaker_open_total 0
JSON shape (for monitoring scrapers):
{
"overall": "ok",
"timestamp": "2026-04-26T15:30:00Z",
"endpoints": { "admin": "127.0.0.1:8080", "metrics": "127.0.0.1:9090" },
"probes": [
{"name": "live", "status": "ok", "detail": "ok"},
{"name": "ready", "status": "ok", "detail": "{...}"},
{"name": "metrics", "status": "ok", "detail": "# HELP nexo_..."}
],
"counters": {
"tool_calls_total": 4711,
"llm_stream_chunks_total": 28391,
"web_search_breaker_open_total": 0
}
}
Exit codes:
- `0` — overall healthy
- `1` — at least one probe failed (or `--strict` and a breaker is open)
Cron health mailer
# /etc/cron.d/nexo-health
*/5 * * * * nexo /opt/nexo-rs/scripts/nexo-health.sh --json --strict \
>> /var/log/nexo-rs/health.jsonl 2>&1 \
|| (tail -1 /var/log/nexo-rs/health.jsonl | mail -s "nexo unhealthy" ops@yourorg)
Five-minute resolution, one line of JSONL per check, mail on failure.
Uptime monitor integration
UptimeRobot / BetterStack / Pingdom:
URL: https://nexo.example.com/ready
Interval: 60s
Timeout: 5s
Expected: HTTP 200
That's all most monitors need. The JSON body of /ready
explains the failure when the alert fires.
What nexo-health.sh adds beyond /ready
| Signal | /ready | nexo-health.sh |
|---|---|---|
| Process up + accepting traffic | ✅ | ✅ |
| Counter snapshot (tool calls, LLM chunks) | ❌ | ✅ |
| Web-search breaker state | ❌ | ✅ |
| Single JSON payload | ❌ (HTTP 200/503) | ✅ |
| Suitable for HTTP probe | ✅ | ❌ (shells out) |
Use /ready for the orchestrator. Use nexo-health.sh for the
operator's eyeballs and the alerting pipeline.
Status
Tracked as Phase 44 — Auxiliary observability surfaces.
| Capability | Status |
|---|---|
| /health liveness endpoint | ✅ shipped (Phase 9) |
| /ready readiness endpoint | ✅ shipped (Phase 9) |
| scripts/nexo-health.sh operator one-shot | ✅ shipped |
| Operator runbook (this page) | ✅ shipped |
| nexo doctor health aggregating subcommand | ⬜ deferred |
| nexo inspect <session_id> state-transition pretty-print | ⬜ deferred |
| Per-session structured event log under data/events/ | ⬜ deferred |
Cost & quota controls
Operator runbook for tracking + capping LLM spend. Today the
runtime emits enough Prometheus metrics for an operator to build
their own picture; the proper nexo costs subcommand + budget
caps land in Phase 45.
Estimating spend — scripts/nexo-cost-report.sh
Aggregates nexo_llm_stream_chunks_total by provider, multiplies
by a price table, prints (or emits JSON) per-provider rolling
totals.
# Human-readable report against the local /metrics endpoint
scripts/nexo-cost-report.sh
# JSON for monitoring / dashboards
scripts/nexo-cost-report.sh --json
# Custom price table (your negotiated enterprise rates)
scripts/nexo-cost-report.sh --prices ~/our-enterprise-rates.tsv
# Probe a remote daemon
scripts/nexo-cost-report.sh --metrics-host nexo.internal:9090
Pretty output:
============================================================
nexo-rs cost report · 2026-04-26T15:30:00Z
============================================================
PROVIDER CHUNKS EST_TOKENS EST_USD
anthropic 28391 85173 $0.7666
minimax 4711 14133 $0.0042
ollama 1208 3624 $0.0000
total estimated: $0.7708
disclaimer: heuristic estimate. Calibrate
NEXO_TOKENS_PER_CHUNK once you have a measured baseline.
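For orientation: EST_TOKENS is literally CHUNKS × tokens-per-chunk — the anthropic row above is 28391 × 3 = 85173 — and EST_USD applies the price table to those estimated tokens (how the script splits input vs output is an estimator detail). Both numbers inherit whatever error the per-chunk multiplier carries, which is why the calibration step below matters.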
Calibration
The default tokens-per-chunk = 3 is a heuristic. To get an
accurate number for your deployment:
1. Find a typical conversation in the transcripts (`session_logs` tool output).
2. Sum the `usage.total_tokens` from the `chat.completion` end event(s).
3. Divide by the total chunk count emitted during that conversation (visible in `nexo_llm_stream_chunks_total{provider="...",kind="text_delta"}`).
4. Set the `NEXO_TOKENS_PER_CHUNK` env var to the result.
Example:
# Anthropic typical: 4-token granularity per delta
NEXO_TOKENS_PER_CHUNK=4 scripts/nexo-cost-report.sh
# OpenAI typical: 1 token per delta on streaming
NEXO_TOKENS_PER_CHUNK=1 scripts/nexo-cost-report.sh
When the runtime ships nexo_llm_tokens_total{provider,model,direction}
(Phase 45 deliverable), the heuristic is replaced by direct token
counts and the calibration step disappears.
Built-in price table
| Provider | Model | $/1M in | $/1M out |
|---|---|---|---|
| anthropic | claude-opus-4 | 15.00 | 75.00 |
| anthropic | claude-sonnet-4 | 3.00 | 15.00 |
| anthropic | claude-haiku-4 | 0.80 | 4.00 |
| openai | gpt-4o | 2.50 | 10.00 |
| openai | gpt-4o-mini | 0.15 | 0.60 |
| minimax | abab6.5s | 0.20 | 0.60 |
| minimax | M2.5 | 0.30 | 1.50 |
| gemini | gemini-1.5-pro | 1.25 | 5.00 |
| gemini | gemini-1.5-flash | 0.075 | 0.30 |
| deepseek | deepseek-chat | 0.14 | 0.28 |
| ollama | * | 0.00 | 0.00 |
These are public list prices as of 2026-04. Operators with
enterprise contracts override via --prices:
provider model in_per_1m out_per_1m
anthropic claude-sonnet-4 2.40 12.00
openai gpt-4o 2.00 8.00
(One row per provider×model. * model = applies to any model
from that provider.)
Daily budget alerts via cron
Snapshot every 24h, mail the operator if estimated spend > cap:
# /etc/cron.daily/nexo-cost-alert
#!/bin/sh
set -eu
CAP=10.00 # $/day soft cap
REPORT=$(/opt/nexo-rs/scripts/nexo-cost-report.sh --json)
TOTAL=$(echo "$REPORT" | jq -r '.total_estimated_usd')
if awk -v t="$TOTAL" -v c="$CAP" 'BEGIN { exit !(t > c) }'; then
echo "$REPORT" | mail -s "nexo daily spend over \$$CAP: \$$TOTAL" \
ops@yourorg.com
fi
This is alerting only, not enforcement — the runtime keeps serving traffic. For hard caps, wait for Phase 45.
Hard quota caps (deferred)
Phase 45 ships per-agent monthly budget caps:
# config/agents.yaml — once 45.x lands
agents:
- id: kate
cost_cap_usd:
monthly: 50.00
daily: 5.00
action: refuse_new_turns # or: warn_only, throttle
warn_topic: alerts.kate.budget
When hit:
- `refuse_new_turns` — agent returns a fixed response ("I've reached my budget for the period; please ask the operator to extend.") to every new inbound. Existing in-flight turns finish.
- `warn_only` — log + telemetry but keep serving.
- `throttle` — switch to a cheaper model variant (`claude-haiku-4` instead of `claude-opus-4`) for the rest of the period.
Per-binding token rate limits (e.g. "WhatsApp sales binding
capped at 5k tokens/hour") layer on top of the existing
sender_rate_limit. Phase 45.x.
Inspecting the metrics directly
If the script is too coarse:
# Top providers by total chunks (last 5m rate)
curl -sS http://127.0.0.1:9090/metrics | \
awk '/^nexo_llm_stream_chunks_total/{gsub(/.*provider="/, "", $1); gsub(/".*/, "", $1); n[$1]+=$2} END{for (p in n) print n[p], p}' | \
sort -rn
# TTFT p95 by provider (curl + jq if you have promtool):
promtool query instant http://127.0.0.1:9090 \
'histogram_quantile(0.95, sum by (provider, le) (rate(nexo_llm_stream_ttft_seconds_bucket[5m])))'
The full metric inventory lives in
Grafana dashboards → metric coverage
(in repo as ops/grafana/README.md).
Status
Tracked as Phase 45 — Cost & quota controls.
| Capability | Status |
|---|---|
| scripts/nexo-cost-report.sh heuristic estimator | ✅ shipped |
| Operator runbook (this page) | ✅ shipped |
| nexo_llm_tokens_total{provider,model,direction} metric | ⬜ deferred |
| Per-agent monthly budget cap (config + enforcement) | ⬜ deferred |
| agents.<id>.cost_cap_usd schema | ⬜ deferred |
| Per-binding token rate limit | ⬜ deferred |
| Pre-flight token-count predictor in agent prompt | ⬜ deferred |
| nexo costs CLI rolling 24h/7d/30d aggregator | ⬜ deferred |
| /api/costs admin endpoint | ⬜ deferred |
Recipes
End-to-end walkthroughs that wire multiple subsystems together. Each recipe runs against a clean checkout of nexo-rs — prerequisites are at the top.
| Recipe | What you build |
|---|---|
| WhatsApp sales agent | A drop-in agent that greets WhatsApp leads, asks qualifying questions, and notifies a human on hot leads. |
| Agent-to-agent delegation | Route work from one agent to another using agent.route.* with correlation ids. |
| Python extension | Write a stdlib-only extension that adds a custom tool to any agent. |
| MCP server from Claude Desktop | Expose the agent's tools to the Anthropic desktop client. |
| NATS with TLS + auth | Harden the broker for a multi-node deployment. |
| Rotating config without downtime | Three Phase 18 hot-reload scenarios: API key rotation, A/B prompt swap, narrowing an outbound allowlist mid-incident. |
If a recipe drifts from reality, open an issue — it means the docs didn't get updated alongside a code change.
WhatsApp sales agent
Build a drop-in agent that handles a sales line on WhatsApp:
- Greets the lead with the right operator (ETB / Claro / generic)
- Qualifies via a short scripted flow (address, package, budget)
- Notifies a human on hot leads, narrows the tool surface so the LLM only ever sees the lead-notification tool
This is the production shape of the shipped ana agent.
Prerequisites
- `agent` built (`cargo build --release`)
- NATS running (`docker run -p 4222:4222 nats:2.10-alpine`)
- A MiniMax M2.5 key
- A phone with WhatsApp ready to scan a QR
1. Provide the LLM key
export MINIMAX_API_KEY=...
export MINIMAX_GROUP_ID=...
2. Create a gitignored agent file
config/agents.d/ana.yaml is gitignored; put the business-sensitive
content there.
agents:
- id: ana
model:
provider: minimax
model: MiniMax-M2.5
plugins: [whatsapp]
inbound_bindings:
- plugin: whatsapp
allowed_tools:
- notify_lead # only this tool is visible
outbound_allowlist:
whatsapp:
- "573000000000@s.whatsapp.net" # human advisor's WA
workspace: ./data/workspace/ana
workspace_git:
enabled: true
heartbeat:
enabled: false
system_prompt: |
You are Ana, a sales advisor for ETB and Claro. Help customers
choose the best internet, TV, and phone package.
On the first incoming message:
- If it contains "etb" -> route directly to the ETB flow.
- If it contains "claro" -> route directly to the Claro flow.
- Otherwise, ask which operator they prefer.
Capture: name, address, socioeconomic stratum, preferred package
(internet only / internet+TV / triple play).
When the lead is ready, invoke `notify_lead` with JSON containing:
{name, phone, address, operator, package, notes}. Do not call any
other tool — this is your only tool.
3. Pair WhatsApp for this agent
./target/release/agent setup whatsapp
The wizard creates ./data/workspace/ana/whatsapp/default/, flips
config/plugins/whatsapp.yaml::whatsapp.session_dir to point at it,
and renders a QR. Scan from the WhatsApp app.
4. Ship the notify_lead tool as an extension
Copy the Rust template and rename:
cp -r extensions/template-rust extensions/notify-lead
cd extensions/notify-lead
Edit plugin.toml:
[plugin]
id = "notify-lead"
version = "0.1.0"
[capabilities]
tools = ["notify_lead"]
[transport]
type = "stdio"
command = "./target/release/notify-lead"
Implement tools/notify_lead in src/main.rs — it should publish
to plugin.outbound.whatsapp.default with a recipient = the human
advisor number you listed in outbound_allowlist.
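The core of that handler is a NATS publish. A rough sketch with `async_nats` — the outbound payload fields here (`to`, `text`) are assumptions, so match them to whatever shape the WhatsApp dispatcher in your checkout expects:

use serde_json::json;

async fn notify_lead(nats_url: &str, advisor: &str, lead: serde_json::Value) -> anyhow::Result<()> {
    let client = async_nats::connect(nats_url).await?;
    let text = format!(
        "🚨 New lead — {} / {} / {}",
        lead["name"].as_str().unwrap_or("?"),
        lead["phone"].as_str().unwrap_or("?"),
        lead["package"].as_str().unwrap_or("?"),
    );
    // The recipient must be one of the numbers in the agent's outbound_allowlist.
    let payload = json!({ "to": advisor, "text": text });
    client
        .publish("plugin.outbound.whatsapp.default", payload.to_string().into())
        .await?;
    client.flush().await?;
    Ok(())
}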
Build and install:
cargo build --release
cd ../..
./target/release/agent ext install ./extensions/notify-lead --link --enable
./target/release/agent ext doctor --runtime
5. Run
./target/release/agent --config ./config
Flow diagram
sequenceDiagram
participant U as Lead
participant WA as WhatsApp
participant N as NATS
participant A as Ana
participant H as Human advisor
U->>WA: "Hi, I want internet service"
WA->>N: plugin.inbound.whatsapp
N->>A: deliver
A->>A: qualify (address, package)
A->>A: invoke notify_lead(json)
A->>N: plugin.outbound.whatsapp (advisor number)
N->>WA: deliver
WA->>H: "🚨 New lead — Luis, 573111111111, triple play"
Why this shape works
- `allowed_tools: [notify_lead]` prevents the LLM from hallucinating other actions — the model literally cannot see other tools.
- `outbound_allowlist.whatsapp` is defense-in-depth: even if the LLM crafts a send to an unexpected number, the runtime rejects it.
- `workspace_git.enabled: true` lets you audit what Ana remembered over time via `memory_history` — useful for reviewing tough calls.
- Gitignored `agents.d/ana.yaml` keeps pricing tables and business content out of the public repo.
Testing
- Open WhatsApp on a second phone and send "hi, ETB"
- Watch
agent status anafor session activity - Watch
docker compose logs agent | jq 'select(.agent == "ana")'for turn-by-turn reasoning
Cross-links
Agent-to-agent delegation
Route work from one agent to another using agent.route.<target_id>
with a correlation id. Typical shapes:
- Kate delegates research to `ops` and waits for the reply
- Ana fans out lead data to `crm-bot`, `ticket-bot`, and `logger`
- A supervisor agent orchestrates specialist subagents
Prerequisites
- Two agents configured in `config/agents.yaml` (and/or `agents.d/`)
- NATS running
- Either agent can be the caller or callee; the topology is symmetric
Agent config
agents:
- id: kate
model: { provider: minimax, model: MiniMax-M2.5 }
plugins: [telegram]
inbound_bindings: [{ plugin: telegram }]
allowed_delegates: [ops, crm-bot]
description: "Personal assistant; delegates research to ops."
- id: ops
model: { provider: minimax, model: MiniMax-M2.5 }
accept_delegates_from: [kate]
description: "Operations agent; answers factual questions about systems."
Key fields:
- `allowed_delegates` (on the caller) — globs of peer ids this agent may route to. Empty = no restriction.
- `accept_delegates_from` (on the callee) — inverse gate. Empty = no restriction.
- `description` — injected into both sides' `# PEERS` block so the LLM knows who can do what.
Both gates are glob lists and can be set on either side or both.
Wire shape
sequenceDiagram
participant K as Kate
participant B as NATS
participant O as Ops
Note over K: LLM decides to delegate
K->>B: publish agent.route.ops<br/>{correlation_id: "req-abc", body: "what's the latest DB migration status?"}
B->>O: deliver
O->>O: on_message + LLM turn
O->>B: publish agent.route.kate<br/>{correlation_id: "req-abc", body: "migration 0042 is running..."}
B->>K: deliver
K->>K: correlate reply by req-abc
Correlation ids are caller-chosen strings. The callee echoes the id back on the reply; the caller uses it to match replies to requests (especially for fan-out + reassemble patterns).
Using the delegate tool
The runtime exposes a delegate tool whenever allowed_delegates is
non-empty. LLM call shape:
{
"name": "delegate",
"args": {
"to": "ops",
"body": "what's the latest DB migration status?"
}
}
The runtime:
- Generates a fresh `correlation_id`
- Publishes to `agent.route.ops` with that id
- Waits (bounded) for the reply on `agent.route.kate`
- Returns the body as the tool result
Timeouts and retry policy match the broker defaults — the circuit breaker on the target topic protects against an unreachable callee.
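Conceptually the round-trip looks like this — an illustrative sketch with `async_nats` + `uuid` (v4 feature), not the runtime's internal code; the payload fields mirror the sequence diagram above:

use futures::StreamExt;
use serde_json::json;
use std::time::Duration;

async fn delegate_once(nats: &async_nats::Client, caller: &str, target: &str, body: &str) -> anyhow::Result<String> {
    let correlation_id = uuid::Uuid::new_v4().to_string();
    // Subscribe to our own route topic before dispatching, so the reply can't race us.
    let mut replies = nats.subscribe(format!("agent.route.{caller}")).await?;
    let request = json!({ "correlation_id": correlation_id, "body": body });
    nats.publish(format!("agent.route.{target}"), request.to_string().into()).await?;
    // Bounded wait: scan the route topic for the message that echoes our id back.
    let reply = tokio::time::timeout(Duration::from_secs(30), async {
        while let Some(msg) = replies.next().await {
            if let Ok(v) = serde_json::from_slice::<serde_json::Value>(&msg.payload) {
                if v["correlation_id"] == correlation_id.as_str() {
                    return Some(v["body"].as_str().unwrap_or_default().to_string());
                }
            }
        }
        None
    })
    .await?;
    reply.ok_or_else(|| anyhow::anyhow!("reply stream closed before a correlated reply arrived"))
}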
Fan-out
To fan out to multiple peers, the LLM can issue several delegate
calls in one turn. The runtime issues each with a unique
correlation_id and gathers the replies in parallel.
Guardrails
- Self-delegation is rejected at the manager level.
- Unknown target id → tool returns an error result, no broker traffic.
- `allowed_delegates` left empty means no constraint — the agent can delegate to any peer. Prefer an explicit list in production.
Observability
Every delegation emits two log lines (dispatch + reply) with structured fields:
{"agent": "kate", "target": "ops", "correlation_id": "...", "event": "delegate_dispatch"}
{"agent": "kate", "target": "ops", "correlation_id": "...", "event": "delegate_reply", "latency_ms": 1342}
Filter on correlation_id to trace a single delegation end to end.
Cross-links
Python extension
Ship a custom tool written in Python — no dependencies beyond stdlib. The agent spawns your script, handshakes with it over stdin/stdout, and exposes your tool to the LLM.
Prerequisites
- `python3` on the host `$PATH`
- A running nexo-rs install with `extensions.enabled: true`
1. Copy the template
cp -r extensions/template-python extensions/word-count
cd extensions/word-count
2. Edit plugin.toml
[plugin]
id = "word-count"
version = "0.1.0"
description = "Count words in a piece of text."
priority = 0
[capabilities]
tools = ["count_words"]
[transport]
type = "stdio"
command = "python3"
args = ["./main.py"]
[requires]
bins = ["python3"]
[meta]
license = "MIT OR Apache-2.0"
[requires] bins = ["python3"] gates the extension: if Python
isn't on $PATH, the runtime skips the extension with a warn log
instead of crash-looping.
3. Write main.py
#!/usr/bin/env python3
import sys, json
def reply(id, result=None, error=None):
msg = {"jsonrpc": "2.0", "id": id}
if error is None:
msg["result"] = result
else:
msg["error"] = error
sys.stdout.write(json.dumps(msg) + "\n")
sys.stdout.flush()
def log(*args):
print(*args, file=sys.stderr, flush=True)
HANDSHAKE = {
"server_version": "0.1.0",
"tools": [{
"name": "count_words",
"description": "Count whitespace-separated words in a string.",
"input_schema": {
"type": "object",
"properties": {"text": {"type": "string"}},
"required": ["text"]
}
}],
"hooks": []
}
def main():
log("word-count starting")
for line in sys.stdin:
try:
req = json.loads(line)
except json.JSONDecodeError:
continue
method = req.get("method", "")
rid = req.get("id")
if method == "initialize":
reply(rid, HANDSHAKE)
elif method == "tools/count_words":
params = req.get("params", {}) or {}
text = params.get("text", "")
count = len(text.split())
reply(rid, {"count": count})
else:
reply(rid, error={"code": -32601, "message": f"unknown method: {method}"})
if __name__ == "__main__":
main()
Make it executable:
chmod +x main.py
4. Validate and install
cd ../..
./target/release/agent ext validate ./extensions/word-count/plugin.toml
./target/release/agent ext install ./extensions/word-count --link --enable
./target/release/agent ext doctor --runtime
--link creates a symlink instead of a copy — good for the
edit-test loop. doctor --runtime actually spawns the extension
and runs the handshake, so a Python error that kills the interpreter
during init surfaces here rather than in production logs.
5. Allow the tool per agent
The registered tool name is ext_word-count_count_words. Add it to
the right agent's allowed_tools (or use a glob):
agents:
- id: kate
allowed_tools:
- ext_word-count_*
# ...
6. Run
./target/release/agent --config ./config
Send a message that would prompt the LLM to use the tool; watch
the logs for tools/count_words on stderr.
Debugging
- stderr of the Python process is forwarded to the agent's log pipeline. `print(..., file=sys.stderr)` lines show up in the agent's tracing output with the `extension=word-count` field.
- Handshake failures are visible in `ext doctor --runtime` and prevent the tool from being registered at all.
- Per-tool latency shows up in the `nexo_tool_latency_ms{tool="ext_word-count_count_words"}` Prometheus histogram.
Productionizing
- Pin `command` to an absolute path or a virtualenv-local interpreter; `python3` on `$PATH` may vary across hosts.
- Pick your dependency strategy carefully — the template is stdlib only. If you need `requests` or similar, ship a `requirements.txt` plus a bootstrap script, or switch to the Rust template.
- If the extension holds a connection to a remote service, add a heartbeat loop so you can detect liveness.
- For long-running tool calls, `print` status events to stderr — they become structured log entries and help debug hung tools.
Cross-links
MCP server from Claude Desktop
Expose nexo-rs tools (memory, Gmail, WhatsApp send, browser, etc.) to the Anthropic desktop app so your agent-sandboxed capabilities show up inside Claude conversations.
Same technique works for Cursor, Zed, and anything else that speaks MCP — the config shape is identical.
Prerequisites
- Built `agent` binary at a known path (e.g. `/usr/local/bin/agent`)
- A working `config/` directory (reuse the one your daemon normally uses, or point at a dedicated one)
- Anthropic API key (or OAuth bundle) configured for the agent
1. Enable the MCP server
config/mcp_server.yaml:
enabled: true
name: nexo
allowlist:
- memory_* # recall + store + history
- forge_memory_checkpoint
- google_* # if you paired Google OAuth
- browser_* # if you want Claude to drive Chrome
expose_proxies: false # hide ext_* and mcp_* from the IDE
auth_token_env: "" # leave empty for local spawn; set if tunneling
Pick the smallest allowlist that covers what you want the IDE to do. Each glob is power you're handing the IDE user.
2. Wire Claude Desktop
Edit ~/Library/Application Support/Claude/claude_desktop_config.json
(macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):
{
"mcpServers": {
"nexo": {
"command": "/usr/local/bin/agent",
"args": ["mcp-server", "--config", "/srv/nexo-rs/config"],
"env": {
"RUST_LOG": "info",
"AGENT_LOG_FORMAT": "json"
}
}
}
}
Restart Claude Desktop. The nexo block should appear in the tool
picker; pick tools from it the same way you pick built-ins.
3. Verify
Ask Claude: "use the nexo tool my_stats and show me the output."
If it works, Claude calls agent mcp-server as a subprocess, which
emits JSON-RPC over stdin/stdout. Logs hit Claude's app-level log
file plus stderr of the spawned agent (configurable via
AGENT_LOG_FORMAT=json).
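You can also exercise the server by hand, without Claude in the loop. A hedged sketch: the request body follows the generic MCP initialize shape, so adjust the protocol version and paths to your setup:
printf '%s\n' \
  '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"smoke","version":"0.0.0"}}}' \
  | /usr/local/bin/agent mcp-server --config /srv/nexo-rs/config
# A JSON-RPC result advertising a tools capability means the server side is healthy.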
Wire shape
sequenceDiagram
participant CD as Claude Desktop
participant A as agent mcp-server
participant TR as ToolRegistry
participant MEM as Memory tool
participant LTM as SQLite
CD->>A: spawn subprocess
CD->>A: initialize
A-->>CD: {capabilities: {tools}}
CD->>A: notifications/initialized
CD->>A: tools/list
A->>TR: enumerate (allowlist-filtered)
TR-->>A: tool defs
A-->>CD: [memory_recall, memory_store, …]
CD->>A: tools/call {name: memory_recall, args: {query: "..."}}
A->>MEM: invoke
MEM->>LTM: SELECT ...
LTM-->>MEM: rows
MEM-->>A: result
A-->>CD: content
Recipes within the recipe
Recall my cross-session memory from Claude
Allowlist:
allowlist:
- memory_recall
- memory_history
Now inside a Claude conversation: "recall what I told you about
Luis's address last week." Claude calls memory_recall on your
agent's SQLite — Claude itself has no persistent memory; this is how
you give it one.
Post to WhatsApp from Claude
Allowlist:
allowlist:
- whatsapp_send_message
⚠ Be careful. This gives whoever sits at the IDE the ability to send WhatsApp messages from your paired account. Only enable if you trust the IDE user as much as you'd trust the agent.
Read-only Gmail from Claude
Allowlist:
allowlist:
- google_auth_status
- google_call
Pair with GOOGLE_ALLOW_SEND= (unset) to keep the google_call
tool read-only.
Auth token
If you expose the MCP server over a tunnel (not a local spawn), set
auth_token_env to guard the initialize call:
auth_token_env: NEXO_MCP_TOKEN
Then set NEXO_MCP_TOKEN in the agent's env and have the client
send it on initialize. Clients that don't present the token are
rejected.
Gotchas
- `expose_proxies: true` transitively exposes every upstream MCP server. If the agent already consumes a Gmail MCP server, turning this on lets Claude reach through — usually not what you want.
- Allowlist globs match whole tool names. `memory_*` is OK; `mem*` is not — enumerate with `agent ext list` and real tool names before wiring globs.
- Rate limits still apply. `whatsapp_send_message` through this path counts against the same WhatsApp rate bucket as the agent's own uses.
Cross-links
NATS with TLS + auth
Harden the broker for a multi-node deployment: mTLS on the client connection, NKey-based authentication, and a separate NATS server process (not the throwaway Docker-compose one).
Prerequisites
- A NATS server ≥ 2.10
- `nsc` CLI for generating NKeys
- The agent binary deployed where it will run
1. Generate NKeys
nsc add operator --generate-signing-key nexo-ops
nsc add account --name nexo-prod
nsc add user --name agent-kate --account nexo-prod
nsc generate creds --account nexo-prod --name agent-kate > secrets/agent-kate.nkey
secrets/agent-kate.nkey is a single-file credential that contains
both the NKey seed and the signed JWT. Treat it like any other
secret — gitignored, Docker-secret, k8s-secret.
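For a plain-VPS deploy, that usually means at minimum:
chmod 600 secrets/agent-kate.nkey
echo 'secrets/*.nkey' >> .gitignore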
2. Configure the NATS server
nats-server.conf:
listen: 0.0.0.0:4222
http: 0.0.0.0:8222
tls {
cert_file: "/etc/nats/tls/server.crt"
key_file: "/etc/nats/tls/server.key"
ca_file: "/etc/nats/tls/ca.crt"
verify: true # require client certs too (mTLS)
}
authorization {
operator = "/etc/nats/nsc/operator.jwt"
resolver = MEMORY
accounts = [
{ name: nexo-prod, jwt: "/etc/nats/nsc/nexo-prod.jwt" }
]
}
Start the server:
nats-server -c nats-server.conf
3. Configure the agent
config/broker.yaml:
broker:
type: nats
url: tls://nats.example.com:4222
auth:
enabled: true
nkey_file: ./secrets/agent-kate.nkey
persistence:
enabled: true
path: ./data/queue
fallback:
mode: local_queue
drain_on_reconnect: true
The agent reads nkey_file at startup and presents it on every
connection.
4. Verify the client
Before starting the full agent, smoke-test the credentials with the
nats CLI:
nats --creds ./secrets/agent-kate.nkey \
--tlsca /etc/nats/tls/ca.crt \
-s tls://nats.example.com:4222 \
pub test.topic "hello"
If this works, the agent will too.
5. Deploy
Start the agent as usual:
agent --config ./config
On boot the agent:
- Opens a TLS connection to the broker
- Presents its NKey + JWT
- Server validates against the operator/account JWT
- Subscribes only to subjects its account is allowed to access
6. Multi-agent isolation
Give each agent its own NKey and an export/import declaration in the NSC account so agents can talk to each other on specific subjects only. Example policy:
# allow kate to publish agent.route.ops
# deny kate from publishing plugin.outbound.* (only the WA plugin should)
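A hedged sketch of expressing that policy with `nsc` user permission flags (check the flag names against your nsc version; the subjects are the ones used above):
nsc edit user --account nexo-prod --name agent-kate \
  --allow-pub "agent.route.ops" \
  --deny-pub  "plugin.outbound.*"
# Re-generate and redeploy the creds file so the updated JWT ships with it:
nsc generate creds --account nexo-prod --name agent-kate > secrets/agent-kate.nkey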
The agent does not enforce NATS auth itself — it just presents credentials. The broker enforces. That's the point: you can revoke a compromised agent without touching the agent's code or config.
Observability
- `circuit_breaker_state{breaker="nats"}` flips to `1` if the broker rejects the credentials on startup or after a refresh
- The disk queue buffers every publish while the circuit is open — see Event bus — disk queue
- `nats --trace` on the server side logs every auth failure with the rejected subject
Gotchas
- `verify: true` (mTLS) requires client certs and NKey auth. Picking one or the other is a policy choice — don't half-configure.
- JWT expiry. Account JWTs expire; NSC's `push` command renews them against the resolver.
- Disk queue on client side. Even with auth misconfigured, the agent keeps running on the local fallback; operators may miss the outage without alerting on `circuit_breaker_state`.
Cross-links
Rotating config without downtime
Three practical hot-reload scenarios. Each shows the YAML edit, how to trigger the swap, and what the operator should see in the logs and on the metrics endpoint. Reference: Config hot-reload.
Prerequisites
- A running daemon (`agent` in another terminal or under systemd).
- Broker reachable from the same host (`broker.yaml`).
- Phase 16 + Phase 18 features enabled (default since `0.x` of nexo-rs).
A quick sanity check:
$ agent reload
reload v1: applied=1 rejected=0 elapsed=14ms
✓ ana
If you get exit 1 with "no control.reload.ack received within 5s",
the daemon isn't running or runtime.reload.enabled is false —
fix that first.
1. Rotate an LLM API key
The Anthropic key on production rotates every 90 days; the old key stays valid for an hour after the rotation.
Edit
config/llm.yaml:
providers:
anthropic:
- api_key: ${file:./secrets/anthropic_old.txt}
+ api_key: ${file:./secrets/anthropic_new.txt}
base_url: https://api.anthropic.com
Apply
# Drop the new key first, THEN trigger the reload — the file watcher
# would also do it 500 ms after the save, the CLI is just explicit.
$ printf '%s' "sk-ant-..." > secrets/anthropic_new.txt
$ chmod 600 secrets/anthropic_new.txt
$ agent reload
reload v2: applied=2 rejected=0 elapsed=22ms
✓ ana
✓ bob
Verify
# The aggregate counter bumped:
$ curl -s localhost:9090/metrics | grep config_reload_applied_total
config_reload_applied_total 2
# Per-agent versions advanced:
$ curl -s localhost:9090/metrics | grep runtime_config_version
runtime_config_version{agent_id="ana"} 2
runtime_config_version{agent_id="bob"} 2
# Watch one agent's next turn — the new key is used by the LlmClient
# rebuilt inside RuntimeSnapshot::build:
$ tail -f agent.log | grep "llm request"
In-flight LLM calls keep using the old client (the in-flight Arc<dyn LlmClient> is captured per-turn). They land in <30 s; the old key is
still valid for the hour the auth team gave you.
2. A/B test a system prompt
You want to roll out a friendlier sales pitch on Ana's WhatsApp binding without touching the Telegram one (which has a longer support persona).
Edit
config/agents.d/ana.yaml:
inbound_bindings:
- plugin: whatsapp
allowed_tools: [whatsapp_send_message]
outbound_allowlist:
whatsapp: ["573115728852"]
- system_prompt_extra: |
- Channel: WhatsApp sales. Follow the ETB/Claro lead-capture flow.
+ system_prompt_extra: |
+ Channel: WhatsApp sales (variant B — warmer tone).
+ Follow the ETB/Claro lead-capture flow but lead with a personal
+ greeting and use first names.
- plugin: telegram
instance: ana_tg
allowed_tools: ["*"]
...
Apply
The file watcher picks the save up automatically:
$ tail -f agent.log
INFO config reload applied version=3 applied=["ana"] rejected_count=0 elapsed_ms=18
Or trigger manually:
$ agent reload
reload v3: applied=1 rejected=0 elapsed=18ms
✓ ana
Verify
Send one message on each channel and tail the LLM request log to see which prompt block went to the model.
$ grep "snapshot_version=3" agent.log
INFO inbound matched binding agent_id=ana plugin=whatsapp \
binding_index=0 snapshot_version=3
Telegram binding's system_prompt_extra is unchanged; only the WA binding picks up variant B.
Roll back
If variant B underperforms, git revert the YAML and agent reload.
Sessions in flight finish their turn on B; the next inbound is back
on A.
3. Tighten an outbound allowlist after an incident
A jailbroken prompt almost made Ana send WhatsApp messages to arbitrary numbers (Phase 16's defense-in-depth caught it). Until you investigate, narrow the allowlist to the on-call advisor only.
Edit
config/agents.d/ana.yaml:
inbound_bindings:
- plugin: whatsapp
allowed_tools: [whatsapp_send_message]
outbound_allowlist:
whatsapp:
- - "573115728852"
- - "573215555555"
- - "573009999999"
+ - "573115728852" # incident-only: on-call advisor
Apply
$ agent reload
reload v4: applied=1 rejected=0 elapsed=15ms
✓ ana
Verify
Send a test message that asks for one of the previously allowed, now-blocked numbers. The LLM will try; the tool will reject:
ERROR tool_call rejected reason="recipient 573215555555 is not in \
this agent's whatsapp outbound allowlist"
The session's Arc<RuntimeSnapshot> is captured at the start of each
turn, so even mid-conversation the next user reply re-loads from the
new snapshot and the allowlist update takes effect immediately.
What you cannot reload (yet)
- Adding or removing agents — restart the daemon. Phase 19.
- Plugin instances (`whatsapp.yaml`, `telegram.yaml` instance blocks) — restart the daemon. Plugin sessions own QR pairing / long-polling state that needs lifecycle plumbing. Phase 19.
- `broker.yaml`, `memory.yaml` — restart the daemon. Long-lived connections + storage handles aren't safe to swap mid-flight.
- `workspace`, `skills_dir`, `transcripts_dir` on an agent — restart that agent.
The daemon logs every restart-required field that changed during a
reload as warn so you don't have to remember which knob lives where.
See also
- Config hot-reload — full behaviour reference
- agents.yaml — per-binding override surface
- Per-agent credentials — credential rotation has its own `POST /admin/credentials/reload` endpoint
- Metrics — `config_reload_*` series
Build a poller module
Three steps. No main.rs edit, no scheduler, no breaker, no SQLite
work. The runner gives you all of that — your code only describes
what to fetch, what to dispatch, and (optionally) what kind-specific
LLM tools to expose.
Reference: crates/poller/src/builtins/ for in-tree examples (gmail.rs,
rss.rs, webhook_poll.rs, google_calendar.rs).
Step 1 — implement the trait
// crates/poller/src/builtins/jira.rs
use std::sync::Arc;
use nexo_poller::{OutboundDelivery, PollContext, Poller, PollerError, TickOutcome};
use async_trait::async_trait;
use serde::Deserialize;
use serde_json::{json, Value};

#[derive(Debug, Deserialize, Clone)]
#[serde(deny_unknown_fields)]
struct JiraConfig {
    base_url: String,
    project_key: String,
    deliver: nexo_poller::builtins::gmail::DeliverCfg,
}

pub struct JiraPoller;

#[async_trait]
impl Poller for JiraPoller {
    fn kind(&self) -> &'static str { "jira" }

    fn description(&self) -> &'static str {
        "Polls Jira for newly assigned issues in a project."
    }

    fn validate(&self, config: &Value) -> Result<(), PollerError> {
        serde_json::from_value::<JiraConfig>(config.clone())
            .map(drop)
            .map_err(|e| PollerError::Config {
                job: "<jira>".into(),
                reason: e.to_string(),
            })
    }

    async fn tick(&self, ctx: &PollContext) -> Result<TickOutcome, PollerError> {
        let cfg: JiraConfig = serde_json::from_value(ctx.config.clone())
            .map_err(|e| PollerError::Config {
                job: ctx.job_id.clone(),
                reason: e.to_string(),
            })?;

        // 1. Pull data. Use ctx.cursor for incremental fetches.
        // 2. Decide what to dispatch.
        // 3. Build OutboundDelivery items — the runner publishes them
        //    via Phase 17 credentials so you never touch the broker.
        let payload = json!({ "text": "(jira tick — replace with real fetch)" });

        Ok(TickOutcome {
            items_seen: 0,
            items_dispatched: 1,
            deliver: vec![OutboundDelivery {
                channel: nexo_auth::handle::TELEGRAM,
                recipient: cfg.deliver.to.clone(),
                payload,
            }],
            next_cursor: None,
            next_interval_hint: None,
        })
    }
}
Anything Poller::validate returns Err(PollerError::Config { … })
fails this job at boot — siblings keep going.
Poller::tick returns:
- `Ok(TickOutcome)` — the runner persists `next_cursor`, increments counters, dispatches every `OutboundDelivery` via the agent's Phase 17 binding, and sleeps until the next slot.
- `Err(PollerError::Transient(…))` — counts toward the breaker; the next tick retries with backoff.
- `Err(PollerError::Permanent(…))` — auto-pauses the job and fires the `failure_to` alert.
PollContext.stores exposes the credential stores when your module
needs paths (e.g., Gmail / Calendar built-ins read
client_id_path from there). Plain ctx.credentials.resolve(…) is
enough when you only need a CredentialHandle.
Step 2 — register
// crates/poller/src/builtins/mod.rs
pub mod gmail;
pub mod google_calendar;
pub mod jira; // ← new
pub mod rss;
pub mod webhook_poll;

pub fn register_all(runner: &PollerRunner) {
    runner.register(Arc::new(gmail::GmailPoller::new()));
    runner.register(Arc::new(rss::RssPoller::new()));
    runner.register(Arc::new(webhook_poll::WebhookPoller::new()));
    runner.register(Arc::new(google_calendar::GoogleCalendarPoller::new()));
    runner.register(Arc::new(jira::JiraPoller)); // ← new
}
That is the only place wiring is touched. main.rs already calls
register_all.
Step 3 — declare a job
# config/pollers.yaml
pollers:
jobs:
- id: ana_jira_assigned
kind: jira
agent: ana
schedule: { every_secs: 300 }
config:
base_url: https://company.atlassian.net
project_key: ENG
deliver:
channel: telegram
to: "1194292426"
Run the daemon. Verify with:
agent pollers list # ana_jira_assigned shows up
agent pollers run ana_jira_assigned # tick on demand
Add per-kind LLM tools
Your module can ship its own tools alongside the generic
pollers_* ones. Override Poller::custom_tools:
fn custom_tools(&self) -> Vec<nexo_poller::CustomToolSpec> {
    use nexo_llm::ToolDef;
    use nexo_poller::{CustomToolHandler, CustomToolSpec, PollerRunner};
    use async_trait::async_trait;

    struct JiraSearch;

    #[async_trait]
    impl CustomToolHandler for JiraSearch {
        async fn call(
            &self,
            runner: Arc<PollerRunner>,
            args: Value,
        ) -> anyhow::Result<Value> {
            // Use `runner` to inspect / mutate jobs the same way
            // built-in `pollers_*` tools do — list_jobs, run_once,
            // set_paused, reset_cursor are all available.
            let id = args["id"]
                .as_str()
                .ok_or_else(|| anyhow::anyhow!("`id` required"))?;
            let outcome = runner.run_once(id).await?;
            Ok(json!({ "matching": outcome.items_seen }))
        }
    }

    vec![CustomToolSpec {
        def: ToolDef {
            name: "jira_search".into(),
            description: "Run the Jira poll job once without persisting state.".into(),
            parameters: json!({
                "type": "object",
                "properties": { "id": { "type": "string" } },
                "required": ["id"]
            }),
        },
        handler: Arc::new(JiraSearch),
    }]
}
The agent then sees jira_search automatically — no extra
registration step. The adapter in
nexo-poller-tools::register_all walks every registered Poller's
custom_tools() and wires each spec into the per-agent
ToolRegistry.
What the runner gives you for free
- Per-job `tokio` task with `every | cron | at` schedule + jitter.
- Cross-process atomic lease in SQLite (lease takeover after TTL expiry — daemon crash mid-tick is recoverable).
- Cursor persistence — your `next_cursor` is the next tick's `ctx.cursor`. Survives restarts. `agent pollers reset <id>` clears it.
- Exponential backoff on `Transient`, auto-pause on `Permanent`.
- Per-job circuit breaker keyed on `("poller", job_id)`.
- Outbound dispatch via Phase 17 — `OutboundDelivery` lands at `plugin.outbound.<channel>.<instance>` resolved from the agent's binding. You never touch the broker.
- 7 Prometheus series labelled by `kind`, `agent`, `job_id`, `status`. Audit log under `target=credentials.audit`.
- Admin endpoints + CLI subcommands (`agent pollers …`).
- Six generic LLM tools (`pollers_list`, `pollers_show`, `pollers_run`, `pollers_pause`, `pollers_resume`, `pollers_reset`).
- Hot-reload via `POST /admin/pollers/reload` — `add | replace | remove | keep` plan applied atomically (see the sketch after this list).
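A minimal sketch of driving that last item from the shell, assuming the admin API listens on the :8080 surface used elsewhere in these recipes and that an empty POST re-reads `pollers.yaml`:
curl -X POST http://127.0.0.1:8080/admin/pollers/reload
agent pollers list   # confirm which jobs were added / replaced / removed / kept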
Tests pattern
#[tokio::test]
async fn validate_accepts_minimal() {
    let p = JiraPoller;
    let cfg = json!({
        "base_url": "https://x.atlassian.net",
        "project_key": "ENG",
        "deliver": { "channel": "telegram", "to": "1" },
    });
    p.validate(&cfg).unwrap();
}

#[tokio::test]
async fn validate_rejects_unknown_field() {
    let p = JiraPoller;
    let cfg = json!({ "wat": true, "deliver": { "channel": "x", "to": "1" }});
    assert!(p.validate(&cfg).is_err());
}
Cursor / dispatch tests follow the same pattern as the in-tree
built-ins (gmail.rs, rss.rs, webhook_poll.rs).
Anti-patterns
- Don't publish to the broker directly from `tick`. Return `OutboundDelivery` so the runner uses Phase 17 + the audit log.
- Don't share global state across modules. Use cursors for per-job state; use `DashMap` inside your struct for per-account caches (gmail does this for `GoogleAuthClient`).
- Don't sleep inside `tick` for backoff. Return `PollerError::Transient` and let the runner own the backoff schedule — that way `agent pollers reset` and hot-reload still cancel cleanly.
- Don't auto-create jobs from inside an LLM tool. The runner intentionally exposes only read + control on existing jobs. Operators own `pollers.yaml`.
Deploy on Hetzner Cloud (CX22)
A concrete recipe for a single-VPS production deploy. CX22 is the Hetzner sweet spot — €3.79/mo, 2 vCPU, 4 GB RAM, 40 GB SSD, ARM64, 20 TB transfer included. Runs the Nexo daemon + an internal NATS broker comfortably with headroom for the browser plugin (Chrome).
This recipe targets a single-tenant personal-agent deploy. For multi-tenant or multi-process see Phase 32.
What you end up with
- Nexo daemon under systemd, auto-start on boot
- NATS broker on the same host (`nats-server` from the official Debian package), auto-start
- Cloudflare Tunnel for inbound HTTPS without opening ports
- UFW firewall: only outbound + cloudflared
- Unattended security upgrades
- TLS handled by Cloudflare; no Let's Encrypt cert renewal to babysit
Estimated cost: ~€4/month (CX22 only; Cloudflare Tunnel is free).
0. Prerequisites
- Hetzner Cloud account with API token
- Cloudflare account with a domain pointed at it
- SSH key uploaded to Hetzner (
hcloud ssh-key create --name ops --public-key-from-file ~/.ssh/id_ed25519.pub)
1. Provision the VPS
Via Hetzner Cloud console: New Server → Location: any close to
your users → Image: Debian 12 → Type: CX22 (ARM64, shared
vCPU). Add your SSH key. Name it nexo-1.
CLI alternative:
hcloud server create \
--name nexo-1 \
--type cx22 \
--image debian-12 \
--ssh-key ops \
--location nbg1
Wait ~30s, grab the IPv4 from the dashboard.
2. Initial hardening (one-time)
SSH in as root, then drop privileges to a sudo user:
ssh root@<ip>
adduser ops
usermod -aG sudo ops
rsync --archive --chown=ops:ops ~/.ssh /home/ops
exit
ssh ops@<ip>
sudo apt update && sudo apt full-upgrade -y
sudo apt install -y unattended-upgrades ufw fail2ban
sudo dpkg-reconfigure -p low unattended-upgrades
# Firewall: deny inbound, allow outbound + ssh from your IP only
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow from <your-home-ip> to any port 22 proto tcp
sudo ufw enable
# Disable root SSH + password auth
sudo sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin no/' /etc/ssh/sshd_config
sudo sed -i 's/^#\?PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
sudo systemctl restart ssh
3. Install Nexo from the .deb
Once Phase 27.4 ships and a release exists with an arm64 .deb:
curl -LO https://github.com/lordmacu/nexo-rs/releases/latest/download/nexo-rs_arm64.deb
# Verify the signature first (Phase 27.3):
curl -LO https://github.com/lordmacu/nexo-rs/releases/latest/download/nexo-rs_arm64.deb.bundle
cosign verify-blob \
--bundle nexo-rs_arm64.deb.bundle \
--certificate-identity-regexp 'https://github.com/lordmacu/nexo-rs/.*' \
--certificate-oidc-issuer https://token.actions.githubusercontent.com \
nexo-rs_arm64.deb \
|| { echo "REFUSING TO INSTALL UNSIGNED PACKAGE"; exit 1; }
sudo apt install ./nexo-rs_arm64.deb
The post-install scaffolds the nexo user, owns
/var/lib/nexo-rs/, and prints next steps. Does not auto-start
the service — that comes after we wire config.
4. Install + enable NATS
# Hetzner Debian repo doesn't ship nats-server; use the upstream .deb
NATS_VERSION=2.10.20
curl -LO "https://github.com/nats-io/nats-server/releases/download/v${NATS_VERSION}/nats-server-v${NATS_VERSION}-linux-arm64.deb"
sudo apt install ./nats-server-v${NATS_VERSION}-linux-arm64.deb
sudo systemctl enable --now nats-server
NATS should now be listening on 127.0.0.1:4222 (loopback only) — exactly
what we want; only Nexo running on the same host should reach it. If the
upstream package bound 0.0.0.0 instead, see Troubleshooting below.
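Quick check that the bind really is loopback-only:
ss -ltn | grep 4222
# Expect 127.0.0.1:4222 here; 0.0.0.0:4222 means the listen address
# still needs fixing (see Troubleshooting).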
5. Wire Nexo config
sudo -u nexo nexo setup
The wizard asks for:
- LLM provider keys (Anthropic / MiniMax / etc.) — paste them; they land in `/var/lib/nexo-rs/secret/`, mode 0600, owned by `nexo:nexo`
- WhatsApp / Telegram pairing — defer if not needed yet
- Memory backend — pick `sqlite-vec` (default for single-host)
The wizard writes /etc/nexo-rs/{agents,broker,llm,memory}.yaml.
Verify broker.yaml points at nats://127.0.0.1:4222.
6. Cloudflare Tunnel for HTTPS
The Nexo admin port (8080) shouldn't be exposed directly. Use a tunnel:
# Install cloudflared
curl -LO https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-arm64.deb
sudo apt install ./cloudflared-linux-arm64.deb
# Authenticate (opens a browser link — visit it on your laptop)
cloudflared tunnel login
# Create tunnel
cloudflared tunnel create nexo-1
# Route a hostname
cloudflared tunnel route dns nexo-1 nexo.yourdomain.com
# Config
sudo mkdir -p /etc/cloudflared
sudo tee /etc/cloudflared/config.yml >/dev/null <<EOF
tunnel: nexo-1
credentials-file: /home/ops/.cloudflared/<UUID>.json
ingress:
- hostname: nexo.yourdomain.com
service: http://127.0.0.1:8080
- service: http_status:404
EOF
# Run as a service
sudo cloudflared service install
sudo systemctl enable --now cloudflared
Now https://nexo.yourdomain.com reaches the Nexo admin via
Cloudflare's edge — TLS terminated at Cloudflare, no cert renewal,
DDoS protection bundled.
7. Start Nexo
sudo systemctl enable --now nexo-rs
sudo journalctl -u nexo-rs -f
You should see the boot sequence: config validated → broker connected → agents loaded → ready.
8. Verify
# Local health check (over the loopback)
curl -fsSL http://127.0.0.1:8080/health
# External via the tunnel
curl -fsSL https://nexo.yourdomain.com/health
# Metrics endpoint
curl -fsSL http://127.0.0.1:9090/metrics | head -20
9. Backups
The state lives in /var/lib/nexo-rs/. Daily snapshot to S3 /
Backblaze:
# /etc/cron.daily/nexo-backup
#!/bin/sh
set -eu
TIMESTAMP=$(date -u +%Y%m%dT%H%M%SZ)
BACKUP="/tmp/nexo-${TIMESTAMP}.tar.zst"
# Pause the runtime briefly so SQLite isn't mid-write.
systemctl stop nexo-rs
tar -I 'zstd -19 -T0' \
-cf "$BACKUP" \
-C /var/lib/nexo-rs \
--exclude='./queue/*.tmp' \
.
systemctl start nexo-rs
# Upload — adjust to your storage backend
rclone copy "$BACKUP" remote:nexo-backups/
rm "$BACKUP"
# Retain last 30
rclone delete --min-age 30d remote:nexo-backups/
chmod +x /etc/cron.daily/nexo-backup.
For a sub-second pause-free backup, use SQLite's
VACUUM INTO-based hot backup — track Phase 36 (backup, restore,
migrations) for the upcoming nexo backup subcommand.
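Until that ships, a rough sketch of the same idea with plain `sqlite3`; the database filename below is a placeholder, so point it at the actual SQLite file(s) under `/var/lib/nexo-rs/`:
# Consistent copy while the daemon keeps running (SQLite >= 3.27).
sqlite3 /var/lib/nexo-rs/memory.db "VACUUM INTO '/tmp/nexo-memory-backup.db'"
rclone copy /tmp/nexo-memory-backup.db remote:nexo-backups/
rm /tmp/nexo-memory-backup.db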
10. Updates
# Pull the latest .deb
curl -LO https://github.com/lordmacu/nexo-rs/releases/latest/download/nexo-rs_arm64.deb
# Verify (always)
cosign verify-blob ...
# Install (apt restarts the service automatically)
sudo apt install ./nexo-rs_arm64.deb
Or wire the apt repo (Phase 27.4 follow-up) and run
apt upgrade nexo-rs like any other system package.
Limits + escape hatches
- Browser plugin uses ~300 MB RAM per Chrome process. CX22 has 4 GB; budget 2 instances tops. Bump to CX32 (€7/mo, 4 vCPU, 8 GB) when you start hitting OOM.
- NATS on the same host is fine for single-tenant; for multi-host, run NATS on its own VM (CX12, €3.29/mo).
- TLS at Cloudflare only means traffic between Cloudflare's edge and your VPS is plain HTTP over the tunnel. The tunnel is encrypted at the transport layer (QUIC + mTLS to Cloudflare), so this is fine — but if you want defense-in-depth, terminate TLS again locally with caddy or nginx.
Troubleshooting
- Tunnel disconnects after reboot — `systemctl status cloudflared`. The credentials file moved if you reinstalled cloudflared with a different `service install`. Re-run `cloudflared service install` after `cloudflared tunnel login`.
- NATS refuses connections — the upstream .deb binds `0.0.0.0:4222` by default. Edit `/etc/nats-server/nats-server.conf` to set `host: 127.0.0.1` and `systemctl restart nats-server`.
- Nexo can't write to `/var/lib/nexo-rs/` — `sudo chown -R nexo:nexo /var/lib/nexo-rs && sudo chmod 0750 /var/lib/nexo-rs`.
Related
- Docker compose — single-machine but containerized (vs systemd-native here)
- Native install — the underlying mechanics of step 3 if you skip the .deb
- Phase 27.4 (Debian / RPM packages) — source of the
.debthis recipe consumes
Deploy on Fly.io
Recipe for a single-region Fly.io deploy. Fly's strengths fit Nexo well: persistent volumes (for the SQLite state), health checks, free TLS, easy multi-region scale-out, and a generous free tier (up to 3 shared-1x VMs free) that covers a personal agent.
What you end up with
- Nexo daemon + bundled local NATS broker on a single Fly machine
- Persistent volume mounted at `/app/data` (`NEXO_HOME`) for SQLite, transcripts, and secrets
- Free TLS via the `fly.io` subdomain (custom domain optional)
- Auto-redeploy on every git push to `main` (via a Fly GitHub Action)
- Fly's built-in metrics + log streaming
Estimated cost: $0–$5/mo (free tier covers shared-1x VM + small volume; bigger Chrome workloads = $5-15/mo on a performance-1x).
0. Prerequisites
# Install flyctl
curl -L https://fly.io/install.sh | sh
fly auth login
fly auth signup # if first time
# Confirm:
fly version
1. Initialize the app
From the repo root:
fly launch \
--name nexo-yourname \
--region <closest-region> \
--vm-cpu-kind shared \
--vm-cpus 1 \
--vm-memory 1024 \
--no-deploy
--no-deploy lets us tweak the generated fly.toml before the
first build.
2. fly.toml
Replace the auto-generated fly.toml with this:
app = "nexo-yourname"
primary_region = "ams" # or whichever closest
# Use the published GHCR image instead of building per-deploy.
[build]
image = "ghcr.io/lordmacu/nexo-rs:latest"
# Persistent state — Fly volumes survive restarts and are
# mounted into the VM. SQLite + transcripts + secret/ live here.
[mounts]
source = "nexo_data"
destination = "/app/data"
# Override the container CMD so config + state align with the
# fly volume layout. NEXO_HOME defaults to /app/data so
# everything writable lands on the volume.
[env]
RUST_LOG = "info"
NEXO_HOME = "/app/data"
# `services` block tells Fly which container ports to expose.
[[services]]
internal_port = 8080
protocol = "tcp"
auto_stop_machines = false # keep the agent running 24/7
auto_start_machines = true
min_machines_running = 1
[[services.ports]]
port = 80
handlers = ["http"]
force_https = true
[[services.ports]]
port = 443
handlers = ["tls", "http"]
[services.concurrency]
type = "connections"
soft_limit = 200
hard_limit = 250
[[services.tcp_checks]]
interval = "15s"
timeout = "2s"
grace_period = "30s"
# Metrics endpoint — Fly scrapes Prometheus-style automatically.
[metrics]
port = 9090
path = "/metrics"
# VM sizing — bump to performance-1x when the browser plugin is on.
[[vm]]
cpu_kind = "shared"
cpus = 1
memory_mb = 1024
3. Create the volume
fly volumes create nexo_data --region ams --size 3
3 GB covers SQLite + a few months of transcripts. Bump as needed.
4. Set secrets
Fly's secret store injects them as env vars at runtime. Reference
them from config/llm.yaml via ${ENV_VAR} placeholders:
fly secrets set ANTHROPIC_API_KEY=sk-ant-...
fly secrets set MINIMAX_API_KEY=...
fly secrets set MINIMAX_GROUP_ID=...
# Anything else your llm.yaml references via ${...}
The Nexo config loader resolves ${ANTHROPIC_API_KEY} placeholders
from the process env — works the same whether the env vars come
from /run/secrets/, ~/.bashrc, or Fly secrets.
5. Pre-bake the config
Fly mounts /app/data from the volume but /app/config lives
inside the image. Two options:
Option A — bake config into a custom image (recommended). Wrap the GHCR image in a tiny Dockerfile:
# Dockerfile.fly
FROM ghcr.io/lordmacu/nexo-rs:latest
# Copy your operator config tree into the image. Adjust to
# whatever your setup needs — just don't ship secrets here, use
# fly secrets for those.
COPY ./config/fly /app/config
# fly.toml's CMD already passes `--config /app/config`.
Then change fly.toml:
[build]
dockerfile = "Dockerfile.fly"
Option B — write config to the volume on first boot. Use a
Fly machine init script that runs nexo setup --non-interactive --from-env once, then exits.
6. Deploy
fly deploy
First deploy spins up the volume + machine. Subsequent deploys hot-swap the image with zero-downtime rolling restart.
7. Verify
# Health
fly status
curl https://nexo-yourname.fly.dev/health
# Metrics (over the Fly internal network)
fly proxy 9090:9090 -a nexo-yourname &
curl http://127.0.0.1:9090/metrics | head -20
# Logs
fly logs
# SSH in if something looks off
fly ssh console
8. Custom domain
fly certs add nexo.yourdomain.com
# Add the CNAME to your DNS as instructed
fly certs check nexo.yourdomain.com
9. Continuous deploy on push
Drop this into .github/workflows/fly-deploy.yml:
name: fly-deploy
on:
push:
branches: [main]
permissions:
contents: read
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: superfly/flyctl-actions/setup-flyctl@master
- run: flyctl deploy --remote-only
env:
FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}
Get a token: fly tokens create deploy -x 999999h. Drop in repo
secrets as FLY_API_TOKEN.
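If you use the GitHub CLI, that's a one-liner (assumes `gh` is authenticated against this repo; otherwise paste the token into the repository settings UI):
gh secret set FLY_API_TOKEN --body "$(fly tokens create deploy -x 999999h)"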
10. Backups
# Manual snapshot
fly volumes snapshots create nexo_data
fly volumes snapshots list nexo_data
# Restore (creates a new volume from the snapshot)
fly volumes create nexo_data_restored \
--snapshot-id vs_xxxxxxxxxxxx \
--region ams
For automated backups, set up a daily Fly cron machine that runs
fly volumes snapshots create against the data volume.
Limits + escape hatches
- Free tier shared-1x has 1 vCPU + 256 MB RAM — too small for the browser plugin. Disable Chrome (`plugins.browser.enabled: false`) on shared-1x, or bump to performance-1x ($15/mo, 1 vCPU + 2 GB).
- Single-region by default — Fly has a multi-region story, but the broker (NATS) doesn't speak Fly's distributed primitives. For multi-region, run NATS on a dedicated VM with `NatsBroker` cluster mode and pin Nexo machines to the same region as their broker.
- Volume snapshots cost $0.15/GB/month — small but adds up if you keep many. Auto-prune via the snapshot cron.
Troubleshooting
- Volume mount fails on machine start — `fly volumes list` must show the volume in the same region as the machine. Mismatch = create the volume in the right region or move the machine.
- Out of memory + machine cycles — most likely the browser plugin loaded Chrome on a shared-1x. Check `fly logs` for OOM-killer messages; bump the VM size or disable the browser plugin.
- Secrets not picked up after deploy — Fly redacts them in logs but they're in the env. SSH in (`fly ssh console`), run `printenv | grep ANTHROPIC` to verify.
Related
- Docker GHCR — same image Fly pulls
- Hetzner deploy — bare-VM alternative if you outgrow Fly's free tier or want full control
- Phase 27.5 (Docker GHCR) — source of the image this recipe pulls
Deploy on AWS (EC2)
Recipe for a single-AZ AWS deploy on t4g.small (ARM Graviton).
Fits a personal-agent or small team; production multi-AZ scale-out
needs Phase 32 multi-host orchestration.
What you end up with
- Nexo daemon under systemd on EC2 + EBS gp3 for state
- Nginx + ACM cert for TLS termination (free)
- Route53 hostname pointing at the instance
- IAM role granting only SES send + S3 backup-bucket access (no console / no read of other AWS resources)
- Daily snapshot of the EBS volume + lifecycle policy retaining 30
- CloudWatch agent shipping `/var/log/nexo-rs/*.log` + metrics
Estimated cost (us-east-1, on-demand):
- `t4g.small` instance: ~$13.43/mo
- `gp3` 16 GB EBS: ~$1.28/mo
- Route53 hosted zone: $0.50/mo
- ACM cert: free
- SES outbound (5k emails/mo on free tier first 12 months): free then $0.10/1k
- Total: ~$15-20/mo
Cheaper alternative for personal-agent budgets: use Hetzner's CX22 at €4/mo if you don't need AWS-specific integrations.
0. Prerequisites
- AWS account with billing alarms set
- Route53 hosted zone for your domain
- AWS CLI installed and `aws configure`'d locally
- Terraform 1.5+ if you want infra-as-code (recommended)
1. Provision via Terraform (recommended)
The repo will eventually ship deploy/terraform/aws/ (Phase 40
follow-up). Until then, here's a minimal main.tf:
terraform {
required_providers {
aws = { source = "hashicorp/aws", version = "~> 5.0" }
}
}
provider "aws" {
region = "us-east-1"
}
# --- VPC + subnet -----------------------------------------------------
resource "aws_vpc" "nexo" {
cidr_block = "10.0.0.0/16"
enable_dns_support = true
enable_dns_hostnames = true
tags = { Name = "nexo" }
}
resource "aws_subnet" "nexo_public" {
vpc_id = aws_vpc.nexo.id
cidr_block = "10.0.1.0/24"
availability_zone = "us-east-1a"
map_public_ip_on_launch = true
}
resource "aws_internet_gateway" "nexo" {
vpc_id = aws_vpc.nexo.id
}
resource "aws_route_table" "nexo_public" {
vpc_id = aws_vpc.nexo.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.nexo.id
}
}
resource "aws_route_table_association" "nexo_public" {
subnet_id = aws_subnet.nexo_public.id
route_table_id = aws_route_table.nexo_public.id
}
# --- security group ----------------------------------------------------
resource "aws_security_group" "nexo" {
name = "nexo"
vpc_id = aws_vpc.nexo.id
# SSH only from your home IP — replace 1.2.3.4/32 with yours.
ingress {
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["1.2.3.4/32"]
}
# 443 open to the world, terminated at nginx on the instance.
ingress {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
# 80 only to redirect to https.
ingress {
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
# --- IAM role: SES + S3 backups, nothing else --------------------------
resource "aws_iam_role" "nexo" {
name = "nexo-instance"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = { Service = "ec2.amazonaws.com" }
}]
})
}
resource "aws_iam_role_policy" "nexo" {
name = "nexo-instance-policy"
role = aws_iam_role.nexo.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{ Effect = "Allow", Action = ["ses:SendEmail","ses:SendRawEmail"], Resource = "*" },
{ Effect = "Allow", Action = ["s3:PutObject","s3:GetObject","s3:DeleteObject","s3:ListBucket"], Resource = ["arn:aws:s3:::your-nexo-backups","arn:aws:s3:::your-nexo-backups/*"] }
]
})
}
resource "aws_iam_instance_profile" "nexo" {
name = "nexo-instance"
role = aws_iam_role.nexo.name
}
# --- AMI lookup: latest Debian 12 arm64 -------------------------------
data "aws_ami" "debian" {
most_recent = true
owners = ["136693071363"] # Debian official
filter {
name = "name"
values = ["debian-12-arm64-*"]
}
}
# --- instance ----------------------------------------------------------
resource "aws_instance" "nexo" {
ami = data.aws_ami.debian.id
instance_type = "t4g.small"
subnet_id = aws_subnet.nexo_public.id
vpc_security_group_ids = [aws_security_group.nexo.id]
iam_instance_profile = aws_iam_instance_profile.nexo.name
key_name = "your-existing-aws-keypair-name"
root_block_device {
volume_size = 16
volume_type = "gp3"
encrypted = true
}
tags = {
Name = "nexo-1"
}
}
# --- Route53 DNS -------------------------------------------------------
data "aws_route53_zone" "main" {
name = "yourdomain.com."
}
resource "aws_route53_record" "nexo" {
zone_id = data.aws_route53_zone.main.zone_id
name = "nexo.yourdomain.com"
type = "A"
ttl = 300
records = [aws_instance.nexo.public_ip]
}
output "nexo_ip" {
value = aws_instance.nexo.public_ip
}
Then:
terraform init
terraform apply
# review the plan; type 'yes'
2. Hardening + install (post-provision)
SSH in:
ssh admin@nexo.yourdomain.com
sudo apt update && sudo apt full-upgrade -y
sudo apt install -y unattended-upgrades ufw fail2ban nginx certbot python3-certbot-nginx
sudo dpkg-reconfigure -p low unattended-upgrades
# UFW — defense in depth on top of the security group
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow 22/tcp
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw enable
# Disable root SSH + password auth
sudo sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin no/' /etc/ssh/sshd_config
sudo sed -i 's/^#\?PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
sudo systemctl restart ssh
Install Nexo (when 27.4 .deb is available):
curl -LO https://github.com/lordmacu/nexo-rs/releases/latest/download/nexo-rs_arm64.deb
# Verify Cosign signature first (Phase 27.3) — see verify.md
sudo apt install ./nexo-rs_arm64.deb
NATS:
NATS_VERSION=2.10.20
curl -LO "https://github.com/nats-io/nats-server/releases/download/v${NATS_VERSION}/nats-server-v${NATS_VERSION}-linux-arm64.deb"
sudo apt install ./nats-server-v${NATS_VERSION}-linux-arm64.deb
sudo systemctl enable --now nats-server
3. nginx + ACM-via-certbot
sudo tee /etc/nginx/sites-available/nexo >/dev/null <<'EOF'
server {
listen 80;
server_name nexo.yourdomain.com;
return 301 https://$server_name$request_uri;
}
server {
listen 443 ssl http2;
server_name nexo.yourdomain.com;
# Cert paths populated after `certbot --nginx`
ssl_certificate /etc/letsencrypt/live/nexo.yourdomain.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/nexo.yourdomain.com/privkey.pem;
ssl_protocols TLSv1.2 TLSv1.3;
# Health check — proxied through to the daemon
location /health { proxy_pass http://127.0.0.1:8080; access_log off; }
location /ready { proxy_pass http://127.0.0.1:8080; access_log off; }
# Admin surface (auth via the daemon's session token)
location /api/ { proxy_pass http://127.0.0.1:8080; }
location /admin/ { proxy_pass http://127.0.0.1:8080; }
# Block /metrics from public — scrape internally only
location /metrics { return 403; }
}
EOF
sudo ln -s /etc/nginx/sites-available/nexo /etc/nginx/sites-enabled/nexo
sudo nginx -t
# Issue cert (ACME via Let's Encrypt — same chain ACM uses)
sudo certbot --nginx -d nexo.yourdomain.com --non-interactive --agree-tos -m ops@yourdomain.com
sudo systemctl reload nginx
If you want AWS ACM specifically (instead of Let's Encrypt), front the EC2 with an ALB and attach an ACM cert there — adds ~$18/mo for the ALB. Most personal deploys don't need it.
4. Wire SES for outbound email
The IAM role grants ses:SendEmail. Configure in config/llm.yaml:
plugins:
email:
provider: ses
aws_region: us-east-1
# Credentials come from the EC2 instance profile — no keys
# in the YAML.
sender: "agent@nexo.yourdomain.com"
Verify the sender domain in SES first:
aws ses verify-domain-identity --domain yourdomain.com
# Add the printed TXT record to Route53
aws ses set-identity-mail-from-domain --identity yourdomain.com \
--mail-from-domain mail.yourdomain.com
If your SES account is still in sandbox, request production access via the SES console — required to send to non-verified recipients.
5. EBS snapshots + lifecycle
# Daily snapshot via DLM (Data Lifecycle Manager) — set up once
# in Terraform or via the console:
aws dlm create-lifecycle-policy \
--description "nexo daily snapshots, retain 30" \
--state ENABLED \
--execution-role-arn arn:aws:iam::ACCT:role/AWSDataLifecycleManagerDefaultRole \
--policy-details '{...}' # see DLM docs
Or the cheap way: cron + aws ec2 create-snapshot on the
instance itself, retaining 30 days locally.
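For reference, a hedged sketch of what a minimal `--policy-details` document can look like (structure per the DLM API; tag the data volume so `TargetTags` actually matches something):
cat > dlm-policy.json <<'EOF'
{
  "ResourceTypes": ["VOLUME"],
  "TargetTags": [{ "Key": "Name", "Value": "nexo-1" }],
  "Schedules": [{
    "Name": "daily",
    "CreateRule": { "Interval": 24, "IntervalUnit": "HOURS", "Times": ["03:00"] },
    "RetainRule": { "Count": 30 },
    "CopyTags": true
  }]
}
EOF
# Pass it with: --policy-details file://dlm-policy.json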
6. CloudWatch logs + metrics
sudo apt install -y amazon-cloudwatch-agent
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard
# Point at /var/log/nexo-rs/*.log + 9090/metrics scrape
The Prometheus metrics endpoint can be pulled by CloudWatch Container Insights via the EMF agent if you go in that direction. For most personal deploys, journalctl + a Grafana Cloud free-tier scrape is cheaper.
Limits + escape hatches
- t4g.small RAM (2 GB) is tight if the browser plugin is on. Bump to `t4g.medium` (4 GB, ~$26/mo) before turning on Chrome.
- Single AZ. AZ outage = full downtime. Multi-AZ needs Phase 32 + an external NATS cluster. Acceptable for personal agents; not for SLAs.
- SES sandbox limit (200 emails/day) until you request production. Plan for this if email channel is primary.
- EIP not allocated. Stop/start the instance and the public IP changes. Allocate an Elastic IP (free when attached) if the Route53 record can't auto-update.
Troubleshooting
- Nexo can't send email — `aws sts get-caller-identity` from the instance must show the `nexo-instance` role. If empty, the instance profile is missing.
- `certbot --nginx` fails — DNS hasn't propagated yet. Wait 5-10 min after the Route53 record creation.
- `/health` returns 503 — broker not ready. `systemctl status nats-server`; if good, check `journalctl -u nexo-rs` for credential errors (the instance profile didn't propagate, or `config/llm.yaml` references a key the instance can't reach).
Related
- Hetzner Cloud — bare-VM, cheaper
- Fly.io — easier scaling, less AWS lock-in
- Phase 27.4 (Debian package) — source of the .deb this recipe consumes
- Phase 27.3 (Cosign) — signature verification before install
Architecture Decision Records
Short documents capturing why the architecture is the way it is. Each ADR names an alternative that was considered and rejected, and the forces that drove the choice. Read these when you're tempted to change something load-bearing.
Format loosely follows Michael Nygard's ADR template: context, decision, consequences.
Index
| # | Title | Status |
|---|---|---|
| 0001 | Single-process runtime over microservices | Accepted |
| 0002 | NATS as the broker | Accepted |
| 0003 | sqlite-vec for vector search | Accepted |
| 0004 | Per-agent tool sandboxing at registry build time | Accepted |
| 0005 | Drop-in agents.d/ directory for private configs | Accepted |
| 0006 | Per-agent git repo for memory forensics | Accepted |
| 0007 | WhatsApp via whatsapp-rs (Signal Protocol) | Accepted |
| 0008 | MCP dual role — client and server | Accepted |
| 0009 | Dual MIT / Apache-2.0 licensing | Accepted |
Writing a new ADR
- Copy the template (next ADR below, or use `0001` as a reference)
- Number sequentially: `NNNN-short-slug.md`
- Set `status: Proposed` while in review, flip to `Accepted` or `Rejected` after the discussion settles
- Link from this index
- Do not edit accepted ADRs in place. Create a new ADR that supersedes it and mark the old one `Superseded by NNNN`.
ADRs are load-bearing documentation — they're how future you (and future contributors) learn that "NATS over RabbitMQ was not an accident."
ADR 0001 — Single-process runtime over microservices
Status: Accepted Date: 2026-01
Context
nexo-rs hosts N agents, each with its own LLM client, channel plugins, memory views, and extensions. The natural first instinct for Rust systems targeting real uptime is to split this into microservices: an agent service, a plugin service per channel, a memory service, etc., wired over the broker.
Every microservice adds:
- A serialization boundary (more CPU, more latency)
- A deployment artifact (more Dockerfiles, more CI)
- A failure mode (service down vs process down)
- An ops surface (metrics, health, logs per service)
The alternative — one binary hosting every subsystem as tokio tasks — gives up none of the durability (the disk queue + DLQ survive a process restart anyway) and keeps all in-memory caches naturally shared.
Decision
Ship one binary (agent) that hosts:
- Every agent runtime (one tokio task per agent)
- Every channel plugin (WhatsApp, Telegram, browser, …)
- Broker client + disk queue + DLQ
- Memory (short-term in-mem, long-term SQLite, vector sqlite-vec)
- Extension runtimes (stdio / NATS)
- MCP client and server
- TaskFlow runtime
- Metrics + health + admin HTTP servers
Coordination between tasks happens over the broker (NATS or the local mpsc fallback) exactly as if they were separate processes. Swapping to microservices later requires zero code changes on either side of the bus.
Consequences
Positive
- One Dockerfile, one health probe, one metrics endpoint
- No IPC overhead on hot paths (LLM tool calls go
ToolRegistry → Extensionthrough a tokio channel, not a network hop) - Memory caches (session, tool registry) are naturally shared
- Simpler ops: one log stream, one trace span hierarchy
Negative
- A bug that panics the process takes down every agent at once (the single-instance lockfile mitigates the blast radius by preventing silent double-boot)
- Scaling out means running more agent processes pointed at the same NATS — isolation between them requires deliberate NATS subject partitioning
Escape hatch
If a subsystem needs its own lifecycle (example: a GPU-heavy inference service), ship it as a NATS extension — it's automatically out-of-process and auto-discovered by the agent. Microservices by the back door, without splitting the monolith first.
ADR 0002 — NATS as the broker
Status: Accepted Date: 2026-01
Context
The event bus sits under every inter-plugin and inter-agent communication. Requirements:
- Subject-based routing with wildcards (`plugin.inbound.*`, `agent.route.<id>`)
- Low-latency pub/sub (sub-millisecond on LAN)
- No broker-side state to manage unless we opt in
- Clustered production deployments
- Mature async Rust client
Alternatives considered:
- RabbitMQ — heavier, queue-per-binding mental model fits less well for fan-out across plugin instances, ops overhead higher
- Redis streams / pub-sub — streams are great for durable event logs, but the stream-per-subject model clashes with free-form `plugin.outbound.<channel>.<instance>` naming; pub-sub has no durability
- Kafka — overkill for sub-millisecond request/reply loops, heavy ops, partition count becomes a thing you think about
- Custom over TCP — too much invented complexity
Additional implementation note: a crate literally called natsio
came up in early design research; it does not exist on crates.io.
The real Rust client is async-nats (from the NATS org itself),
matching the NATS 2.10 server line.
Decision
Use NATS as the broker. Specifically:
- Client: `async-nats = "0.35"` (pinned in `Cargo.toml`)
- Subject namespace: `plugin.inbound.*`, `plugin.outbound.*`, `plugin.health.*`, `agent.events.*`, `agent.route.*`
- Fallback: a local `tokio::mpsc` bus implementing the same `Broker` trait for offline / single-machine runs
- Durability: SQLite disk queue in front of every publish; drains FIFO on reconnect; 3 attempts before DLQ
Consequences
Positive
- Standard ops path (monitor on `:8222/healthz`, Prometheus exporter, clustering via well-known recipes)
- Pub/sub semantics are trivial to reason about
- Swapping in JetStream later for persistent streams is additive
- Zero broker state in the happy path — restart NATS without catastrophe thanks to the disk queue
Negative
- NATS auth (NKey / JWT) has its own learning curve — see the NATS TLS + auth recipe
- No built-in message ordering guarantee across subjects (only per-subscriber). Callers that need ordering (e.g. delegation with correlation id) must enforce it themselves
Forbidden anti-pattern
- Do not use `natsio` or any other non-`async-nats` client. The crate doesn't exist on crates.io; copy-paste from older design docs will mislead.
ADR 0003 — sqlite-vec for vector search
Status: Accepted Date: 2026-02
Context
Agents benefit from semantic recall — surface a memory whose text doesn't share keywords with the query but shares meaning. The usual playbook: run a dedicated vector database.
Requirements:
- Zero extra infrastructure for single-machine deployments
- Same durability and transactional model as the rest of memory
- Embedding-dimension sanity checks at startup
- Hybrid retrieval (keyword ⊔ vector) without a separate query plane
Alternatives considered:
- Qdrant / Weaviate / Milvus — all excellent; all require an extra service, network hop, and ops surface
- pgvector — would force Postgres everywhere, abandoning SQLite for long-term memory
- Simple numpy file + linear scan — works for small datasets, falls over past ~10k memories per agent
Decision
Use sqlite-vec: a SQLite extension that adds a vec0 virtual
table in the same DB file as long-term memory.
- One SQLite file holds `memories`, `memories_fts`, and `vec_memories` — a single `JOIN` returns content + tags alongside similarity
- Dimension is checked at schema init; a mismatch between config and existing rows aborts startup with an explicit message
- `sqlite3_auto_extension` registers once per process
- Hybrid retrieval uses Reciprocal Rank Fusion (K=60) over the keyword FTS5 hits and the vector neighbors
Consequences
Positive
- Zero-infra single-machine deploys keep working — no extra service to run
- Backups, replication, export are all just "copy the `.db` file"
- Transactional writes: `INSERT` into `memories` + `vec_memories` in one statement; no dual-write races
- Hybrid retrieval is easy (see vector docs)
Negative
- sqlite-vec is newer than Qdrant; its indexing algorithm improves over time. Large indexes may need re-sorting periodically
- Changing embedding models (even same-dimension ones) produces a stale index — the ADR doesn't solve this, users must reindex
- The `sqlite3_auto_extension` registration happens once per process and has caught test suites that spawn many short-lived connections off-guard
Swap-out path
EmbeddingProvider is a trait and the recall_mode = vector branch
is a single code path. Replacing sqlite-vec with Qdrant is a
day's work, not a rewrite.
ADR 0004 — Per-agent tool sandboxing at registry build time
Status: Accepted Date: 2026-02
Context
The same process hosts agents with very different blast radii.
Ana runs on WhatsApp against leads; Kate manages a personal Telegram;
ops has Proxmox credentials. The LLM in one agent must never see —
let alone invoke — tools registered for another agent.
Three enforcement points are possible:
- Prompt-level sandboxing — "don't use these tools." Relies on model compliance. Fails under adversarial prompts.
- Runtime filter — every `tools/call` checks a policy before dispatch. Robust, but the LLM still sees the tools in `tools/list` and can hallucinate calls.
- Registry build-time pruning — the agent's `ToolRegistry` is built with only the allowed tools. The LLM literally cannot see the others.
Decision
Default to registry build-time pruning.
- `allowed_tools: []` (empty) = every registered tool visible
- `allowed_tools: [glob, …]` = strict allowlist; tools not matching are removed from the registry before the LLM's `tools/list` call is answered
- For agents with `inbound_bindings[]`, the base registry keeps every tool and per-binding overrides apply build-time filtering at turn time — a single agent can narrow its surface differently per channel
Additional layers stack on top:
- `outbound_allowlist.<channel>: [recipients]` — even with `whatsapp_send_message` in the registry, the runtime rejects sends to unlisted recipients (defense in depth)
- `tool_rate_limits` — per-tool rate limiting for side-effectful tools
- Per-agent `workspace` and long-term memory (`WHERE agent_id = ?`) — data-level isolation
Consequences
Positive
- Adversarial prompts can't invoke missing tools — the model has no token string for them
- Easy mental model: grep `allowed_tools` to see what an agent can do
- Prompt tokens stay small (the tool list scales with the allowlist, not the registry)
Negative
- A misconfigured `allowed_tools` silently hides tools the LLM expected to use — the agent returns "I can't do that," puzzling both user and developer. Mitigation: `agent status` shows the effective tool set per agent
- Dynamic granting mid-session is not supported (it would require a re-handshake with the MCP clients)
Related
- Config — agents.yaml (allowed_tools semantics)
- Per-agent credentials — the gauntlet validates that the binding's channel instance is actually allowed
ADR 0005 — Drop-in agents.d/ directory for private configs
Status: Accepted Date: 2026-02
Context
Two kinds of agent content coexist in the same project:
- Public — the framework demo agents, ops helpers, templates
- Private — sales prompts, tarifarios, internal phone numbers, compliance-flagged customer scripts
The obvious "one agents.yaml" approach forces everything to be
either committed (leaking business content) or gitignored (losing
the template reference). Neither is acceptable.
Decision
Split by path convention:
- `config/agents.yaml` — committed, public-safe defaults
- `config/agents.d/*.yaml` — gitignored drop-in directory
- `config/agents.d/*.example.yaml` — committed templates
- Merge happens at load time: every `.yaml` in `agents.d/` gets its `agents:` array concatenated to the base list
- Files load in lexicographic filename order, so `00-common.yaml` + `10-prod.yaml` composes predictably
- `.gitignore` includes `config/agents.d/*.yaml` and the exception `!config/agents.d/*.example.yaml`
Consequences
Positive
- Safe to open-source the repo; real business content stays private
- Templates stay in git (`ana.example.yaml`) so newcomers can copy and fill
- Per-environment layering falls out for free (`00-dev.yaml` vs `10-prod.yaml` per deploy)
Negative
- Agent-id collisions across files are possible — the loader rejects them at startup with an explicit error. Operators must coordinate file naming
- Not every config is split this way — some operators expected `plugins.d/`, `llm.d/`, etc. We decided against the generalization until a concrete need appeared
Related
- Config — drop-in agents — full mechanics
- Recipes — WhatsApp sales agent — shows the pattern in practice
ADR 0006 — Per-agent git repo for memory forensics
Status: Accepted Date: 2026-03
Context
An agent's memory evolves over time — dream sweeps promote memories, the agent writes USER.md / AGENTS.md / SOUL.md revisions, session closes append to MEMORY.md. When an agent misbehaves, "what did it know and when?" is a real debugging question.
Options considered:
- Append-only audit log per write — possible, but rolls out a custom scheme for every file
- DB-level revision history — works for LTM rows but not for workspace markdown files
- Git — battle-tested, standard tooling; `git log` and `git blame` ship with every developer's laptop
Decision
When workspace_git.enabled: true, the agent's workspace
directory is a per-agent git repository. The runtime commits at
three specific moments:
- Dream sweep finishes — commit subject `promote`, body lists promoted memories with scores
- Session close — commit subject `session-close`, body includes session id and agent id
- Explicit `forge_memory_checkpoint(note)` tool call — commit subject `checkpoint: {note}`
Commit mechanics:
- Staged: every non-ignored file (respects the auto-generated `.gitignore` that excludes `transcripts/`, `media/`, `*.tmp`)
- Skipped: files larger than 1 MiB (`MAX_COMMIT_FILE_BYTES`)
- Idempotent: no-op commit if tree clean
- Author: `{agent_id} <agent@localhost>` (configurable)
- No remote by default — operators add one if archival matters
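As an illustration of those mechanics, shelling out to the git CLI, a checkpoint commit might look like the sketch below. The runtime's actual implementation may differ, and the large-file skip is omitted.

```rust
use std::process::Command;

/// Sketch: commit the agent workspace with a given subject, as the agent
/// identity, and no-op when the tree is clean.
fn checkpoint_commit(workspace: &str, agent_id: &str, subject: &str) -> std::io::Result<()> {
    // Idempotent: skip the commit entirely if nothing changed.
    let status = Command::new("git")
        .args(["-C", workspace, "status", "--porcelain"])
        .output()?;
    if status.stdout.is_empty() {
        return Ok(());
    }

    // Stage every non-ignored file; the workspace .gitignore already
    // excludes transcripts/, media/ and *.tmp.
    Command::new("git").args(["-C", workspace, "add", "-A"]).status()?;

    // Author is the agent identity; subject e.g. "checkpoint: {note}".
    let author_name = format!("user.name={agent_id}");
    Command::new("git")
        .args([
            "-C", workspace,
            "-c", author_name.as_str(),
            "-c", "user.email=agent@localhost",
            "commit", "-m", subject,
        ])
        .status()?;
    Ok(())
}
```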
Consequences
Positive
- `git log` gives you a timestamped history of every memory evolution, for free
- `memory_history` tool lets the LLM reason about its own past state — e.g. "what did I believe about this user last week?"
- `git diff <oldest>..HEAD` is one command away when debugging
- Familiar tooling for humans (`git bisect` a misbehaving agent)
Negative
- Repositories grow over time; operators should add a remote with periodic push-and-repack
- Commits are process-scoped — an agent process crash between "write MEMORY.md" and "commit" leaves an uncommitted diff. The next commit picks it up, but at that point the audit event is merged
- Transcripts are intentionally excluded from commits — they can be enormous and aren't the forensic artifact the ADR is aimed at
Related
- Soul — MEMORY.md + workspace-git
- Agent runtime — Graceful shutdown (session-close commit runs here)
ADR 0007 — WhatsApp via whatsapp-rs (Signal Protocol)
Status: Accepted Date: 2026-02
Context
"Add WhatsApp support" has three common paths:
- Official WhatsApp Business API — rate-limited, costs per message, requires business verification, limits proactive outreach to approved templates. Fine for some deployments, a bad fit for "run an agent on your personal number for a small business."
- Unofficial web-scraping libraries (e.g. `whatsapp-web.js`) — pretend to be a browser, fragile against UI changes, frequently banned
- Signal Protocol reimplementation — speak the native protocol that the WhatsApp mobile app speaks. Stable, fast, no scraping, permits all message types (voice, media, reactions, edits, etc.)
Decision
Use `whatsapp-rs` (Cristian's crate), which implements the Signal Protocol handshake + pairing + message layer in Rust. nexo-rs wraps it in `crates/plugins/whatsapp`:
- Pairing: setup-time QR scan via `Client::new_in_dir()` — the wizard creates a per-agent session dir and renders the QR as Unicode blocks
- Runtime: the plugin subscribes to inbound messages, forwards them to `plugin.inbound.whatsapp[.<instance>]` (sketched below), and handles the outbound side via the tool family (`whatsapp_send_message`, `whatsapp_send_reply`, `whatsapp_send_reaction`, `whatsapp_send_media`)
- Credentials expiry: the plugin does not fall back to a runtime QR on 401 — the operator must re-pair via the wizard. The runtime refuses to boot without valid creds. This is a deliberate safety net against silent re-pair loops that would cross-deliver to the wrong account
- Multi-account: each agent points at its own session dir. No `XDG_DATA_HOME` mutation
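To make the inbound path concrete, here is a toy forwarding sketch over async-nats. The subject matches the layout above, but the payload shape and wiring are assumptions, not the plugin's actual code.

```rust
use async_nats::Client;

/// Illustrative inbound payload; the real plugin forwards richer metadata.
#[derive(serde::Serialize)]
struct InboundMessage<'a> {
    from: &'a str,
    body: &'a str,
}

/// Publish an inbound WhatsApp message onto the per-instance subject.
async fn forward_inbound(
    nats: &Client,
    instance: &str,
    from: &str,
    body: &str,
) -> Result<(), Box<dyn std::error::Error>> {
    let subject = format!("plugin.inbound.whatsapp.{instance}");
    let payload = serde_json::to_vec(&InboundMessage { from, body })?;
    nats.publish(subject, payload.into()).await?;
    Ok(())
}
```

An agent bound to that instance consumes the subject through the broker like any other inbound event.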
Consequences
Positive
- Full feature coverage (voice, media, reactions, edits, groups)
- No per-message cost beyond the bandwidth
- No business-verification paperwork
- Works on a personal number, a secondary SIM, anything you can pair to WhatsApp's Linked Devices
Negative
- Signal Protocol parity is non-trivial; keeping up with WhatsApp protocol evolution is an ongoing commitment of `whatsapp-rs`
- Running an agent on a personal number is a policy choice. WhatsApp's Terms of Service don't love automated accounts; use `whatsapp-rs` on numbers you own and are ready to re-pair if they get banned
- Multi-account needs careful session-dir management — see Plugins — WhatsApp gotchas
Forbidden alternatives
- Puppeteer / whatsapp-web.js / selenium — pulls the entire Chromium runtime into the process, breaks constantly, and is detected and banned faster than the Signal Protocol path
- Business API — only if the deployment pays for it and the agent flow survives template constraints; ship a separate plugin if this comes up
Related
- `../whatsapp-rs/` sibling crate (Signal Protocol + pairing + `Client`)
- Plugins — WhatsApp
- Recipes — WhatsApp sales agent
ADR 0008 — MCP dual role: client and server
Status: Accepted Date: 2026-03
Context
Model Context Protocol is becoming the de facto integration surface for LLM-driven tools. Two questions arose during the Phase 12 design:
- Should the agent be an MCP client (consume external MCP servers as tools)?
- Should the agent be an MCP server (expose its own tools to external MCP clients like Claude Desktop, Cursor, Zed)?
These are independent decisions. Picking one does not force the other.
Decision
Do both. Same process, same ToolRegistry, different transports.
- Client — `McpRuntimeManager` spawns stdio or HTTP MCP servers per session (with a shared "sentinel session" for servers that don't need per-session isolation). Their tools register into the per-session `ToolRegistry` with names like `{server_name}_{tool_name}` and are callable by the agent like any built-in
- Server — the `agent mcp-server` subcommand reads JSON-RPC from stdin and writes responses to stdout. An `mcp_server.yaml` allowlist controls which tools are exposed. A configurable `auth_token_env` guards the `initialize` call when the server is exposed through a tunnel
Both sides speak MCP 2024-11-05 (streamable HTTP) with SSE fallback for legacy servers.
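For orientation, the stdio transport is just line-delimited JSON-RPC over stdin/stdout. The sketch below shows the shape only; it is not the `agent mcp-server` implementation, and it skips the real initialize handshake, the allowlist, and the auth-token check.

```rust
use serde_json::{json, Value};
use std::io::{self, BufRead, Write};

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let mut stdout = io::stdout();

    // One JSON-RPC message per line on stdin; responses go to stdout.
    for line in stdin.lock().lines() {
        let request: Value = match serde_json::from_str(&line?) {
            Ok(v) => v,
            Err(_) => continue, // ignore malformed frames in this sketch
        };
        let id = request.get("id").cloned().unwrap_or(Value::Null);
        let method = request.get("method").and_then(Value::as_str).unwrap_or("");

        // A real server would dispatch initialize / tools/list / tools/call
        // against the allowlisted ToolRegistry here.
        let response = json!({
            "jsonrpc": "2.0",
            "id": id,
            "result": { "echoed_method": method },
        });
        writeln!(stdout, "{response}")?;
        stdout.flush()?;
    }
    Ok(())
}
```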
Consequences
Positive
- Being a client: any MCP-speaking tool ecosystem is reachable without writing a custom extension
- Being a server: the agent's tools + memory become available inside Claude Desktop / Cursor / Zed — cross-session memory, remote actions, etc.
- Interop with the broader MCP catalog is a configuration change, not a code change
Negative
- Two independent code paths to keep current as the MCP spec evolves
- `expose_proxies` configuration gotcha: enabling it on the server side makes every upstream MCP server transitively visible to the consuming client. Default is `false` and the docs call this out explicitly
- MCP spec churn (2024-11-05 vs future versions) needs staying power
Related
ADR 0009 — Dual MIT / Apache-2.0 licensing
Status: Accepted Date: 2026-04
Context
Open-sourcing nexo-rs required picking a license. Constraints:
- The Rust ecosystem convention (rustc, tokio, serde, clap, axum…) is dual MIT / Apache-2.0
- Downstream projects should be able to pick whichever license fits their own project's obligations
- Attribution to the original author must be legally enforceable — the author explicitly asked that users "use it, just name me"
- The author doesn't want to ship a custom / restrictive license that confuses or scares off contributors
Alternatives considered:
- MIT alone — fine, but missing the explicit patent grant that Apache-2 gives (relevant to corporate downstream users)
- Apache-2 alone — fine, but incompatible with GPLv2 downstream (MIT is compatible)
- AGPL-3 — forces source-release on SaaS; nexo-rs isn't trying to prevent cloud forks
- BSL (Business Source License) — source-available with time-delayed open-source conversion; inappropriate for a framework whose value is in wide adoption
- Custom "use it, name me" — would need a lawyer for every edge case; a solved problem doesn't need a new solution
Decision
Dual-license under MIT OR Apache-2.0:
- `LICENSE-MIT` — full text of the MIT License, 2026 Cristian García
- `LICENSE-APACHE` — full text of the Apache-2.0 License
- `Cargo.toml`: `license = "MIT OR Apache-2.0"` (SPDX)
- `NOTICE` file at repo root (required to be preserved by Apache-2.0 §4(d)) carries the attribution — author, contact, original repo URL
- README links all three + explains the SPDX choice
Downstream users pick whichever they prefer. Attribution is mandatory under both.
Consequences
Positive
- Fits existing Rust ecosystem tooling (crates.io, rustdoc headers, CI scanners)
- Maximum compatibility: GPLv2 projects pick MIT, patent-sensitive corporate projects pick Apache-2
- The `NOTICE` file gives the author the strongest attribution lever available in permissive OSS: removing it is a license violation
Negative
- Contributors who want to submit PRs agree (per Apache-2 §5) that their contributions are dual-licensed under the same terms. Some contributors may require a CLA discussion; none so far
- Trademark on the name "nexo-rs" is not covered — this ADR is about the code, not the brand. If the brand becomes load-bearing, register a trademark separately
Related
Contributing
PRs welcome. A few ground rules keep the codebase coherent.
Workflow
All feature work follows the /forge pipeline:
/forge brainstorm <topic> → /forge spec <topic> → /forge plan <topic> → /forge ejecutar <topic>
Per-sub-phase done criteria live in `PHASES.md`.
Rules of the road
- All code, code comments, and Markdown docs in English.
- No hardcoded secrets. Use `${ENV_VAR}` or `${file:...}` in YAML.
- Every external call goes through `CircuitBreaker`. No exceptions. (Conceptual sketch after this list.)
- Don't commit anything under `secrets/`.
- Don't skip hooks (`--no-verify`). Fix the underlying lint / test issue instead.
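What "goes through `CircuitBreaker`" means in practice: failures are counted and, past a threshold, calls fail fast instead of hammering a broken dependency. The sketch below is conceptual only and is not the `nexo-resilience` API.

```rust
/// Conceptual circuit breaker: counts consecutive failures and opens
/// past a threshold. Not the nexo-resilience API, just the idea.
struct Breaker {
    consecutive_failures: u32,
    threshold: u32,
}

impl Breaker {
    fn new(threshold: u32) -> Self {
        Self { consecutive_failures: 0, threshold }
    }

    fn call<T, E>(&mut self, op: impl FnOnce() -> Result<T, E>) -> Result<T, String>
    where
        E: std::fmt::Display,
    {
        if self.consecutive_failures >= self.threshold {
            // Open: fail fast instead of retrying a broken dependency.
            return Err("circuit open".into());
        }
        match op() {
            Ok(value) => {
                self.consecutive_failures = 0;
                Ok(value)
            }
            Err(e) => {
                self.consecutive_failures += 1;
                Err(format!("call failed: {e}"))
            }
        }
    }
}
```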
Docs must follow
Any change that touches user-visible behavior — features, config
fields, CLI flags, tool surfaces, retry policies — must update the
mdBook under docs/ in the same commit. Docs phase plan:
docs/PHASES.md.
All mdBook pages must be written in English.
Pure-internal changes (private renames, refactors, test-only) are exempt — mention that explicitly in the commit body.
Local checks
cargo fmt --all
cargo clippy --workspace --all-targets -- -D warnings
cargo test --workspace
./scripts/check_mdbook_english.sh
./scripts/check_markdown_english.sh
mdbook build docs
CI runs all of the above on every push and every PR.
Git pre-commit hook
The repo ships a pre-commit hook at .githooks/pre-commit that:
- Docs-sync gate — rejects the commit if production files under `crates/`, `src/`, `config/`, `extensions/`, `scripts/`, `.github/`, or `Cargo.{toml,lock}` are staged without anything under `docs/` (predicate sketched below)
- `cargo fmt --all -- --check`
- `cargo clippy --workspace -- -D warnings`
- `cargo test --workspace --quiet`
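The docs-sync predicate itself is simple. Expressed as a small Rust sketch (the shipped hook may implement it differently, and the `[no-docs]` bypass is omitted):

```rust
use std::process::Command;

/// Sketch of the docs-sync predicate: production paths staged, docs/ untouched.
fn docs_sync_violation() -> std::io::Result<bool> {
    let out = Command::new("git")
        .args(["diff", "--cached", "--name-only"])
        .output()?;
    let staged = String::from_utf8_lossy(&out.stdout);

    let production_prefixes = [
        "crates/", "src/", "config/", "extensions/", "scripts/", ".github/",
    ];
    let touches_production = staged.lines().any(|path| {
        production_prefixes.iter().any(|p| path.starts_with(p))
            || path == "Cargo.toml"
            || path == "Cargo.lock"
    });
    let touches_docs = staged.lines().any(|path| path.starts_with("docs/"));

    Ok(touches_production && !touches_docs)
}
```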
Enable it once per clone:
git config core.hooksPath .githooks
(./scripts/bootstrap.sh does this for you.)
Bypass tags
The docs-sync gate honors a single opt-out tag. Include it in the commit message when the change is genuinely internal and doesn't need docs:
refactor: rename private fn [no-docs]
Acceptable reasons:
- Private refactor, no change to any public API
- Test-only changes
- Dependency bumps with no behavior change
- CI-config fiddling that doesn't alter ops
Do not use [no-docs] for anything a user would notice. If in
doubt, update the docs — it's the lower-regret path.
Full escape hatch
git commit --no-verify disables all hooks (fmt, clippy, tests,
docs-sync). Last resort, not a habit.
Reporting issues
Open a GitHub issue with:
- nexo-rs version / commit hash
- Rust version (`rustc -V`)
- OS / arch
- Relevant log lines (redact secrets)
- Minimal reproduction
License of contributions
Contributions are dual-licensed MIT OR Apache-2.0 as described in License.
Releases
Two complementary tools own the release pipeline:
| Tool | Owns |
|---|---|
| release-plz | version bumps, git tags, crates.io publish, per-crate CHANGELOG.md |
| cargo-dist | cross-target binary tarballs, `curl \| sh` / PowerShell installers, sha256 sidecars |
They run on the same tag (nexo-rs-v<version>) and stay independent
— no overlapping config. Phase 27 brings both online; Phase 27.2
wires the GitHub Actions workflow that combines them on tag push.
What ships
The nexo binary is the only artifact in release tarballs. Every
other binary in the workspace (driver subsystem, dispatch tools,
companion-tui, mock MCP server) carries
[package.metadata.dist] dist = false so cargo-dist excludes it.
Dev / smoke programs (browser-test, integration-browser-check,
llm_smoke) live as [[example]] entries under examples/ for
the same reason.
Build provenance — nexo version
build.rs injects four stamps captured at compile time:
- `NEXO_BUILD_GIT_SHA` — short git SHA of the build commit (or `unknown` outside a git checkout)
- `NEXO_BUILD_TARGET_TRIPLE` — full Rust target triple
- `NEXO_BUILD_CHANNEL` — opaque channel marker; defaults to `source`. The release workflow overrides via `NEXO_BUILD_CHANNEL=apt-musl` (etc.) so support tickets carry install-channel provenance.
- `NEXO_BUILD_TIMESTAMP` — UTC ISO8601 timestamp of the build
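A minimal `build.rs` sketch shows how stamps like these can be injected with `cargo:rustc-env`. The project's actual build script may gather them differently (for example, emitting a proper ISO8601 timestamp rather than epoch seconds).

```rust
// build.rs (sketch): inject build provenance as compile-time env vars.
use std::process::Command;

fn main() {
    // Short git SHA, or "unknown" outside a git checkout.
    let sha = Command::new("git")
        .args(["rev-parse", "--short", "HEAD"])
        .output()
        .ok()
        .filter(|o| o.status.success())
        .map(|o| String::from_utf8_lossy(&o.stdout).trim().to_string())
        .unwrap_or_else(|| "unknown".to_string());
    println!("cargo:rustc-env=NEXO_BUILD_GIT_SHA={sha}");

    // Target triple is handed to build scripts by cargo.
    let target = std::env::var("TARGET").unwrap_or_default();
    println!("cargo:rustc-env=NEXO_BUILD_TARGET_TRIPLE={target}");

    // Channel marker: "source" unless the release workflow overrides it.
    let channel = std::env::var("NEXO_BUILD_CHANNEL").unwrap_or_else(|_| "source".to_string());
    println!("cargo:rustc-env=NEXO_BUILD_CHANNEL={channel}");

    // Build timestamp (epoch seconds here; a real script might format ISO8601).
    let secs = std::time::SystemTime::now()
        .duration_since(std::time::UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0);
    println!("cargo:rustc-env=NEXO_BUILD_TIMESTAMP={secs}");
}
```

The binary then embeds them at compile time with `env!("NEXO_BUILD_GIT_SHA")` and friends.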
Operators see them with:
nexo version
# nexo 0.1.1
# git-sha: abc1234
# target: x86_64-unknown-linux-musl
# channel: apt-musl
# built-at: 2026-04-27T12:34:56Z
nexo --version (without --verbose or the subcommand) prints the
short form nexo <version>.
Local validation
make dist-check
Builds the host-target tarball via dist build --target $(rustc -vV | sed -n 's|host: ||p') and runs
scripts/release-check.sh.
The smoke gate verifies every present tarball contains the bin +
LICENSE-* + README.md and that the host-native --version
output matches the workspace version. Targets the local toolchain
can't satisfy emit [release-check] WARN lines instead of failing.
Full setup notes (cargo-dist, cargo-zigbuild, zig, rustup targets):
packaging/README.md.
What's automatic vs manual
| Step | Owner |
|---|---|
| Bump version + open release PR | release-plz (CI on push to main) |
| Tag commit + crates.io publish | release-plz (on PR merge) |
| Build 2 musl tarballs (x86_64 + aarch64) | release.yml (Phase 27.2 ✅) — cargo-dist |
| Build Termux .deb (aarch64-linux-android) | release.yml (Phase 27.2 ✅) — packaging/termux/build.sh |
| Upload tarballs + Termux deb + sha256 sidecars | release.yml (Phase 27.2 ✅) |
| Smoke-test nexo --version + provenance stamps | release.yml (Phase 27.2 ✅) |
| Sign tarballs + Termux deb (cosign keyless) | sign-artifacts.yml (Phase 27.3 ✅) |
| Generate CycloneDX + SPDX SBOMs | sbom.yml (Phase 27.9 🔄) |
| Apt repo publish + signed Release file | Phase 27.4 deferred |
| Yum / dnf repo publish | Phase 27.4 deferred |
| Termux pkg index | Phase 27.8 deferred |
| Homebrew bottle auto-PR | Phase 27.6 PARKED (Apple targets dropped) |
| nexo self-update | Phase 27.10 deferred |
Adding a new bin to the release
- Declare the `[[bin]]` in the appropriate crate's `Cargo.toml`.
- If the crate hosting the bin carries `[package.metadata.dist] dist = false`, either remove that opt-out or move the bin to a new crate that doesn't carry it.
- Re-run `make dist-check` and confirm the new bin shows up under `[bin]` in the dist plan output.
- Update `scripts/release-check.sh`'s per-archive content check if the new bin should be required.
Adding a new target
- Append the target triple to `targets = […]` in `dist-workspace.toml`.
- Append the matching tarball name to `EXPECTED_TARBALLS` in the smoke gate.
- Land the toolchain story in the GH Actions release workflow (Phase 27.2) — without that, the target builds locally only.
License
nexo-rs is dual-licensed under either:
- MIT — see `LICENSE-MIT`
- Apache License, Version 2.0 — see `LICENSE-APACHE`
at your option. SPDX: MIT OR Apache-2.0.
Attribution is required
Redistributions — source, binary, modified, or unmodified — must
preserve the NOTICE
file and the copyright attribution, as required by Section 4(d) of the
Apache License.
Nexo-rs
Copyright 2026 Cristian García <informacion@cristiangarcia.co>
This product includes software developed by Cristian García.
Original project: https://github.com/lordmacu/nexo-rs
Why dual-licensed
Dual MIT / Apache-2.0 is the Rust ecosystem convention (rustc,
tokio, serde, clap, etc.). It maximizes downstream compatibility:
- MIT is compatible with GPLv2 (Apache-2.0 is not)
- Apache-2.0 grants explicit patent rights (MIT does not)
Users pick whichever fits their project.
Contributions
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in nexo-rs by you shall be dual-licensed as above, without any additional terms or conditions — per Section 5 of the Apache License.
API reference (rustdoc)
Every public type, trait, function, and module in the nexo-rs
workspace is documented via cargo doc. The CI workflow runs
cargo doc --workspace --no-deps and publishes the output under
/api/ on the same GitHub Pages deployment as this book.
Open the rustdoc
- Published site: https://lordmacu.github.io/nexo-rs/api/
- Local build:
cargo doc --workspace --no-deps --open
What's there
One rustdoc page per workspace crate:
| Crate | Contents |
|---|---|
| agent | Top-level binary — mostly wiring; see src/main.rs. |
| nexo-core | Agent trait, AgentRuntime, SessionManager, ToolRegistry, HookRegistry, agent-facing tools (memory, taskflow, self_report, delegate, workspace_git). |
| nexo-broker | Broker trait (NatsBroker, LocalBroker), disk queue, DLQ. |
| nexo-llm | LlmClient trait, MiniMax / Anthropic / OpenAI-compat / Gemini clients, retry + rate limiter. |
| nexo-memory | Short-term / long-term / vector types, LongTermMemory API. |
| nexo-config | YAML struct types, env/file placeholder resolution. |
| nexo-extensions | ExtensionManifest, ExtensionDiscovery, StdioRuntime, CLI. |
| nexo-mcp | MCP client + server primitives. |
| nexo-taskflow | Flow, FlowStore, FlowManager, WaitEngine. |
| nexo-resilience | CircuitBreaker. |
| nexo-setup | Wizard field registry, YAML patcher. |
| nexo-tunnel | Cloudflared tunnel helper. |
| nexo-auth | Per-agent credential gauntlet, resolver, audit. |
| nexo-plugin-* | Channel plugins (browser, whatsapp, telegram, email, google, gmail-poller). |
When to read rustdoc vs the book
| Goal | Start here |
|---|---|
| Understand a subsystem's purpose | this book |
| Read a specific trait's methods / signatures | rustdoc |
| Wire two subsystems together | book → rustdoc |
| Embed a crate in your own binary | rustdoc |
| Audit what's public API | rustdoc (anything not in rustdoc is internal) |
Building locally
# All crates, no dependencies:
cargo doc --workspace --no-deps
# Open the nexo-core rustdoc in a browser:
cargo doc -p nexo-core --no-deps --open
Warnings are rejected in CI (RUSTDOCFLAGS=-D warnings). Run the
same locally before pushing if you edited doc comments:
RUSTDOCFLAGS="-D warnings" cargo doc --workspace --no-deps
Public-API stability
The workspace has not committed to semver-level stability yet.
Public signatures change between code phases; follow PHASES.md and
commit history when upgrading.
Cross-links
- Contributing — how to add `///` docs when you touch public surface
- Architecture overview — the mental model that rustdoc fills in the fine detail for