Introduction

nexo-rs is a Rust framework for building multi-agent LLM systems that live on real messaging channels — WhatsApp, Telegram, email — instead of a chat webapp. Event-driven over NATS, per-agent tool sandboxes, drop-in configuration for private vs. public agents.

One process, many agents, many channels. Kate handles your personal Telegram; Ana works the WhatsApp sales line; a cron-style poller sweeps Gmail for leads — all sharing one broker, one tool registry, and one memory layer.

Single binary, ~34 MB. No Node, no npm, no Docker required. Stripped: 29 MB. Gzipped: 13 MB. Runs on a fresh VPS, on Termux without root, or as a systemd unit. The closest reference point is OpenClaw (TypeScript, Node): nexo-rs trades JS familiarity for a single static binary, a fault-tolerant NATS broker layer, per-agent capability sandboxes, durable workflows, secrets audit, and Termux-first portability — see vs OpenClaw for the full side-by-side.

flowchart LR
    WA[WhatsApp] --> NATS[(NATS broker)]
    TG[Telegram] --> NATS
    MAIL[Email / Gmail poller] --> NATS
    BROWSER[Browser CDP] --> NATS
    NATS --> ANA[Agent: Ana]
    NATS --> KATE[Agent: Kate]
    NATS --> OPS[Agent: ops-bot]
    ANA --> TOOLS[Tools & extensions]
    KATE --> TOOLS
    OPS --> TOOLS
    TOOLS --> MEM[(Memory: SQLite + sqlite-vec)]
    TOOLS --> LLM{{LLM providers}}

Why it exists

Most "agent frameworks" assume one LLM talking to one user through one UI. Real deployments are not shaped that way:

  • Several agents with different personas, models, and skills
  • Multiple channels (WA + Telegram + mail) feeding the same agents
  • Business logic that is not LLM-driven (scheduled tasks, regex email triage, lead notifications) running next to the LLM loop
  • Private prompts and pricing tables alongside an open-source core

nexo-rs is opinionated toward that shape.

What's in the box

| Area | What ships |
| --- | --- |
| Runtime | Multi-agent core, SessionManager, Heartbeat, CircuitBreaker |
| Broker | NATS (async-nats = 0.35) + disk queue + DLQ + backpressure |
| LLMs | MiniMax M2.5 (primary), Anthropic (OAuth + API), OpenAI-compat, Gemini |
| Plugins | WhatsApp, Telegram, Email, Browser (CDP), Google (Gmail/Calendar/Drive/Sheets) |
| Memory | Short-term in-memory, long-term SQLite, vector via sqlite-vec |
| Extensions | TOML manifest, stdio + NATS runtimes, CLI, 22 skills shipped |
| MCP | Client (stdio + HTTP), agent as MCP server, hot-reload |
| TaskFlow | Durable multi-step flow runtime with wait/resume |
| Soul | Identity, MEMORY.md, dreaming, workspace-git, transcripts |

Who it is for

  • Developers who want to run real agents — not a ChatGPT demo with retrieval.
  • Multi-tenant single-install — several agents, several channels, isolated by config.
  • Fault-tolerance-first teams — disk queue, DLQ, circuit breakers, single-instance lock, no message drop on reconnect.
  • Anyone extending with their own stack — stdio extensions in any language, MCP, drop-in private agents.

What it is not

  • Not a chatbot, not a webapp. It has no UI of its own.
  • Not a replacement for LangChain/LlamaIndex as a "primitives library". It is an operational runtime.
  • Not a channel-abstraction layer. WhatsApp behaves like WhatsApp, Telegram like Telegram. The runtime surfaces each channel as itself instead of flattening them behind a uniform interface.

Next

Install nexo-rs (LLM-friendly guide)

Share this URL with any AI assistant to teach it how to install nexo-rs end-to-end on Linux or Termux: https://lordmacu.github.io/nexo-rs/install-for-ai.html

The page is intentionally linear: copy-paste each block in order. No menus to navigate, no marketing, every command is deterministic.



Pick your platform

  • Linux (Debian / Ubuntu / Arch / Fedora) → §A
  • Termux (Android, no root) → §B

Skip the section that doesn't apply.


§A — Linux install

A.1. System packages

Debian / Ubuntu:

sudo apt update
sudo apt install -y build-essential pkg-config libsqlite3-dev git curl

Arch:

sudo pacman -Syu --needed base-devel pkgconf sqlite git curl

Fedora:

sudo dnf install -y @development-tools pkgconf-pkg-config sqlite-devel git curl

A.2. Rust toolchain

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
source "$HOME/.cargo/env"
rustup component add rustfmt clippy

A.3. Clone + build

git clone https://github.com/lordmacu/nexo-rs
cd nexo-rs
cargo build --release --bin agent

The compiled binary is at ./target/release/agent. Copy it into PATH (optional):

sudo install -m 0755 target/release/agent /usr/local/bin/agent

A.4. First-run wizard

agent setup

Follow the interactive prompts. Defaults are sane. The wizard writes config/agents.d/<your-agent>.yaml, IDENTITY.md, SOUL.md, and any channel YAMLs you opt into.
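
For orientation, a generated agent file looks something like the sketch below. The field names are the ones the agent-wizard section later in these docs manipulates (model.provider, language, plugins, inbound_bindings, skills); treat it as illustrative, not the authoritative schema:

# config/agents.d/kate.yaml (illustrative sketch)
agents:
  kate:
    model:
      provider: anthropic
      model: claude-haiku-4-5
    language: es
    plugins: [telegram]
    inbound_bindings: [telegram:default]
    skills: [fetch-url, weather, summarize]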

A.5. Run

agent

Or, for the web admin (loopback HTTP + Cloudflare tunnel):

agent admin

The admin command prints a one-time URL + password to stdout. Open the URL, log in, and configure from the browser.

A.6. (Optional) systemd service

sudo useradd -r -s /bin/false -d /srv/nexo-rs nexo
sudo mkdir -p /srv/nexo-rs
sudo cp -r config target/release/agent /srv/nexo-rs/
sudo chown -R nexo:nexo /srv/nexo-rs

sudo tee /etc/systemd/system/nexo-rs.service > /dev/null <<'EOF'
[Unit]
Description=nexo-rs agent
After=network.target

[Service]
Type=simple
User=nexo
WorkingDirectory=/srv/nexo-rs
ExecStart=/srv/nexo-rs/agent --config /srv/nexo-rs/config
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now nexo-rs

Logs: journalctl -u nexo-rs -f.

A.7. (Optional) NATS broker

A single-process install does not need NATS — the runtime falls back to in-process channels. Add NATS only when scaling beyond one host:

curl -L -o /tmp/nats.tar.gz \
  https://github.com/nats-io/nats-server/releases/download/v2.10.20/nats-server-v2.10.20-linux-amd64.tar.gz
tar -xzf /tmp/nats.tar.gz -C /tmp
sudo mv /tmp/nats-server-*/nats-server /usr/local/bin/
sudo systemctl enable --now nats-server  # if you have a unit file

Then in config/broker.yaml set type: nats and url: nats://127.0.0.1:4222.
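
Spelled out as a config snippet (shape mirrored from the local-broker example later in these docs; treat the exact nesting as illustrative):

# config/broker.yaml
broker:
  type: nats
  url: nats://127.0.0.1:4222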


§B — Termux install (Android, no root)

B.1. Termux from F-Droid

Install Termux from https://f-droid.org/en/packages/com.termux/. Do not install from the Google Play Store — that build is outdated.

Open Termux. Then:

pkg update
pkg upgrade -y

B.2. Build dependencies

pkg install -y rust git curl sqlite openssl clang pkg-config

Optional extras (only the ones you'll use):

# media transcoding + OCR + youtube downloads
pkg install -y ffmpeg tesseract yt-dlp
# tmux for long-running tunnels and ssh
pkg install -y tmux openssh
# headless Chromium for the browser plugin
pkg install -y tur-repo
pkg install -y chromium
# Termux:API for sensors / SMS / clipboard
pkg install -y termux-api
# (also install the Termux:API companion app from F-Droid)

B.3. Clone + build

cd ~
git clone https://github.com/lordmacu/nexo-rs
cd nexo-rs
cargo build --release --bin agent

B.4. First-run wizard

./target/release/agent setup

B.5. Run

./target/release/agent

Or with the admin UI (the cloudflared tunnel works on Termux):

./target/release/agent admin

B.6. Keep running with the screen off

Termux apps get killed on doze unless you disable battery optimizations and acquire a wake lock:

  1. Disable optimizations: Android Settings → Apps → Termux → Battery → Unrestricted.

  2. Wake lock: in Termux, type:

    termux-wake-lock
    
  3. (Optional) auto-restart on boot: install Termux:Boot from F-Droid, then create ~/.termux/boot/00-nexo-rs:

    mkdir -p ~/.termux/boot
    cat > ~/.termux/boot/00-nexo-rs <<'EOF'
    #!/data/data/com.termux/files/usr/bin/sh
    termux-wake-lock
    cd ~/nexo-rs
    ./target/release/agent --config ./config >> ~/nexo-rs/agent.log 2>&1
    EOF
    chmod +x ~/.termux/boot/00-nexo-rs
    

B.7. Termux-specific tip — Chromium flags

The browser plugin (plugins: [browser]) needs the right Chromium launch flags on Termux. The defaults already cover Android; nothing extra to set. Just make sure chromium is on PATH (it is, after pkg install chromium).


Config layout (both platforms)

After agent setup runs, the project tree looks like:

nexo-rs/
├── config/
│   ├── agents.yaml          # opt-in dev defaults
│   ├── agents.d/            # your agents land here
│   │   └── <slug>.yaml
│   ├── broker.yaml          # NATS or local
│   ├── llm.yaml             # provider keys + model
│   └── plugins/             # one YAML per channel plugin
├── secrets/                 # mode 0600 token files (gitignored)
├── data/                    # SQLite databases (memory, taskflow, transcripts)
├── target/release/agent     # the built binary
└── agent.log                # if you redirected stdout

Edit YAML by hand or use the web admin (agent admin).


Troubleshooting

  • cargo build fails with linker errors on Linux — install build-essential and pkg-config (§A.1).
  • cargo build hits out of memory on Termux — close other apps, or build with one job: cargo build --release -j 1.
  • agent exits immediately with failed to load config — run agent setup first; the wizard creates the missing files.
  • WhatsApp QR pairing fails on Termux — make sure the device is on the same network as your phone, then open the QR pairing URL the daemon prints.
  • Admin tunnel URL doesn't respond — Cloudflare's quick tunnel occasionally rotates; restart agent admin and copy the new URL.

Useful commands after install

agent --help                                  # all subcommands
agent doctor capabilities --json              # which env toggles are armed
agent setup doctor                            # audit configured secrets
agent ext doctor --json                       # extension health
agent flow list                               # taskflow admin
agent dlq list                                # dead-letter queue

Full reference: https://lordmacu.github.io/nexo-rs/cli/reference.html


When asking an AI for help

Paste this URL into your prompt:

Install nexo-rs from https://lordmacu.github.io/nexo-rs/install-for-ai.html
on this machine. The OS is <Linux distro / Termux>. Stop after each
section to confirm output looks right.

The page above is the canonical, copy-paste-friendly install path. The full mdBook (https://lordmacu.github.io/nexo-rs/) covers the same ground in more depth — link there once the agent is up.

Installation

Pick the channel that matches your environment. Every channel produces the same nexo binary; the differences are in how it gets onto your machine and which dependencies come bundled.

Channel matrix

| Channel | When to pick it | Time to first run | Bundled runtime tools |
| --- | --- | --- | --- |
| Docker (GHCR) | Production, CI, "just works" | ~30 s | Chrome, Chromium, cloudflared, ffmpeg, tesseract, yt-dlp |
| Nix flake | NixOS, reproducible dev shell | ~3-5 min cold | None (system-level) |
| Native (no Docker) | Bare-metal Linux / macOS, full control | ~10-15 min | None (apt / brew / pacman) |
| Termux | Phone-hosted personal agent | ~15-20 min | None (pkg install) |
| From source | Contributors | ~5 min after toolchain | None |

Quickest path — Docker

docker pull ghcr.io/lordmacu/nexo-rs:latest
docker run --rm \
  -v $(pwd)/config:/app/config:ro \
  -v $(pwd)/data:/app/data \
  -p 8080:8080 -p 9090:9090 \
  ghcr.io/lordmacu/nexo-rs:latest --help

The image is multi-arch (linux/amd64 + linux/arm64), built fresh on every push to main and every v* tag, with SBOM and SLSA provenance attestations. Full guide: Docker.

Build from source

For contributors and operators who want to track main directly:

git clone https://github.com/lordmacu/nexo-rs
cd nexo-rs
cargo build --release --bin nexo
./target/release/nexo --help

The workspace compiles 22 crates and produces the nexo binary plus a few smoke-test bins (browser-test, integration-browser-check, llm_smoke). Toolchain is pinned to Rust 1.80 (MSRV) via rust-toolchain.toml — no manual channel selection needed.

Prerequisites

  • Rust 1.80+ (rustup recommended)
  • NATS running locally or reachable over the network — for development:
    docker run -p 4222:4222 nats:2.10-alpine
    
    Production setup: see broker.yaml.
  • Git (the memory subsystem uses per-agent workspace-git)
  • Chrome / Chromium (only if you plan to use the browser plugin)

Verification

./target/release/nexo --version
cargo test --workspace --lib

nexo --version prints the build provenance line (commit + build timestamp) so a bug report carries enough context to reproduce.

Bootstrap script

For native or Termux installs, ./scripts/bootstrap.sh automates the whole process — installs the system deps, downloads NATS if not present, scaffolds config/, and runs the setup wizard.

./scripts/bootstrap.sh           # interactive
./scripts/bootstrap.sh --yes     # accept all defaults

The script auto-detects Termux ($PREFIX set) and switches to pkg install + broker.type: local so you don't need root or NATS on a phone.

Next steps

Native install (no Docker)

If you'd rather run nexo-rs directly on a Linux / macOS host — development loop, single-machine deploy, restricted container environment — this page walks through every step and names the bootstrap script that automates it.

Fast path

git clone git@github.com:lordmacu/nexo-rs.git
cd nexo-rs
./scripts/bootstrap.sh

scripts/bootstrap.sh verifies prerequisites, installs a local NATS, creates the runtime directories, stages example configs, and builds the agent binary. Re-runnable — each step is idempotent.

Keep reading for what it actually does (and what to do when a step needs manual intervention).

Prerequisites

| Tool | Required for | Notes |
| --- | --- | --- |
| Rust (stable, edition 2021) | building the binaries | rust-toolchain.toml pins the channel |
| Git | cloning + per-agent workspace-git | default on most hosts |
| NATS ≥ 2.10 | the broker | binary or dev docker container is fine |
| SQLite ≥ 3.38 | memory + broker disk queue | ships with most distros |
| Chrome / Chromium | browser plugin (optional) | skip if you don't use the browser plugin |
| ffmpeg + ffprobe | media-related skills (optional) | skip if you don't ship those skills |
| yt-dlp / tesseract / tmux / ssh | individual skills (optional) | each skill declares its requires.bins |

On Ubuntu / Debian:

sudo apt update
sudo apt install -y build-essential pkg-config libsqlite3-dev git curl

On macOS:

xcode-select --install
brew install sqlite git

Install Rust

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source "$HOME/.cargo/env"
rustup component add rustfmt clippy

The repo's rust-toolchain.toml pins the channel; no manual version pick is needed.

Install NATS

Pick one path:

Option A — native NATS server

# Linux x86_64
curl -L -o /tmp/nats.tar.gz \
  https://github.com/nats-io/nats-server/releases/download/v2.10.20/nats-server-v2.10.20-linux-amd64.tar.gz
tar -xzf /tmp/nats.tar.gz -C /tmp
sudo mv /tmp/nats-server-*/nats-server /usr/local/bin/

For macOS: brew install nats-server.

Start it:

nats-server -js                      # foreground
nats-server -js -D                   # foreground with debug
# or, as a systemd service: see below

Option B — dev throwaway via Docker

Even on a "no-Docker" box, a single short-lived container for the broker is often fine:

docker run -d --name nexo-nats --restart unless-stopped \
  -p 4222:4222 -p 8222:8222 nats:2.10-alpine

This is the same broker the compose stack would use; only the broker itself runs in a container.

Systemd unit (Linux, production)

/etc/systemd/system/nats-server.service:

[Unit]
Description=NATS Server
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/nats-server -js
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target

sudo systemctl daemon-reload
sudo systemctl enable --now nats-server

Build nexo-rs

git clone git@github.com:lordmacu/nexo-rs.git
cd nexo-rs
cargo build --release

The output is ./target/release/agent. Symlink it into $PATH if you want:

sudo ln -sf "$(pwd)/target/release/agent" /usr/local/bin/agent

Prepare runtime directories

mkdir -p ./data/{queue,workspace,media,transcripts}
mkdir -p ./secrets          # gitignored; holds API keys, nkey files, etc.
chmod 700 ./secrets         # restrictive — the credential gauntlet checks this

Stage config

The repo ships config/*.yaml with safe defaults. Override whatever you need:

# Optional: copy the ana sales agent template into the gitignored dir
cp config/agents.d/ana.example.yaml config/agents.d/ana.yaml

# Add an API key:
export MINIMAX_API_KEY=...
export MINIMAX_GROUP_ID=...
# or write to secrets/ files referenced from config/llm.yaml via ${file:...}
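
The ${file:...} indirection keeps keys out of the environment entirely. A provider entry using it might look like this (key layout illustrative; see the configuration reference for the shipped schema):

# config/llm.yaml (sketch)
minimax:
  api_key: ${file:./secrets/minimax_api_key.txt}
  group_id: ${file:./secrets/minimax_group_id.txt}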

See Configuration — layout for the full reference.

Pair channels and set secrets

./target/release/agent setup

The wizard pairs WhatsApp / Telegram / Google / LLM credentials interactively. See Setup wizard.

First run

./target/release/agent --config ./config

Watch the startup summary — it tells you exactly which plugins loaded, which extensions were skipped and why, and whether the broker is reachable. If anything's missing, the log line names the specific file or env var to fix.

Run as a systemd service

/etc/systemd/system/nexo-rs.service:

[Unit]
Description=nexo-rs agent
Requires=nats-server.service
After=nats-server.service

[Service]
Type=simple
User=nexo
Group=nexo
WorkingDirectory=/srv/nexo-rs
Environment=RUST_LOG=info
Environment=AGENT_ENV=production
ExecStart=/usr/local/bin/agent --config /srv/nexo-rs/config
Restart=on-failure
RestartSec=5
# Optional: restrict where the agent can write
ReadWritePaths=/srv/nexo-rs/data /srv/nexo-rs/secrets

[Install]
WantedBy=multi-user.target

sudo useradd -r -s /bin/false -d /srv/nexo-rs nexo
sudo chown -R nexo:nexo /srv/nexo-rs
sudo systemctl daemon-reload
sudo systemctl enable --now nexo-rs

Logs:

journalctl -u nexo-rs -f

macOS launchd

~/Library/LaunchAgents/dev.nexo-rs.agent.plist:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>          <string>dev.nexo-rs.agent</string>
  <key>WorkingDirectory</key><string>/Users/you/nexo-rs</string>
  <key>ProgramArguments</key>
  <array>
    <string>/Users/you/nexo-rs/target/release/agent</string>
    <string>--config</string><string>/Users/you/nexo-rs/config</string>
  </array>
  <key>EnvironmentVariables</key>
  <dict>
    <key>RUST_LOG</key><string>info</string>
  </dict>
  <key>RunAtLoad</key>      <true/>
  <key>KeepAlive</key>      <true/>
</dict>
</plist>

launchctl load -w ~/Library/LaunchAgents/dev.nexo-rs.agent.plist
launchctl start dev.nexo-rs.agent

Verify

agent status                    # lists running agents
curl localhost:8080/ready       # readiness
curl localhost:9090/metrics     # Prometheus metrics

See Metrics + health.

Upgrading

cd nexo-rs
git pull
cargo build --release
sudo systemctl restart nexo-rs      # Linux
# or: launchctl kickstart -k gui/$UID/dev.nexo-rs.agent   # macOS

The graceful shutdown sequence drains in-flight work and persists the disk queue before exit.

Uninstalling

sudo systemctl disable --now nexo-rs nats-server
sudo rm /etc/systemd/system/{nexo-rs,nats-server}.service
sudo rm /usr/local/bin/{agent,nats-server}
sudo userdel nexo
rm -rf /srv/nexo-rs

See also

Termux (Android) install

Run nexo-rs directly on an Android phone under Termux. No Docker, no server — a self-hosted agent in your pocket.

Use this path for a personal agent (one phone, one WhatsApp, one Telegram). For multi-tenant / multi-process deployments the regular Linux setup on a server is the right shape.

Quickest path — pre-built .deb

Once a v* release is published (recipe lives in packaging/termux/build.sh), download the asset and install with one command:

# Inside Termux on the phone:
curl -LO https://github.com/lordmacu/nexo-rs/releases/latest/download/nexo-rs_aarch64.deb
pkg install ./nexo-rs_aarch64.deb

The deb pulls the runtime deps Termux already ships (libsqlite, openssl, ffmpeg, tesseract, python, yt-dlp). Its postinst scaffolds ~/.nexo/{data,secret} and prints the next steps. Skip the build-from-source section below if this works.

Root vs non-root

Everything in this guide runs without root. You do not need to root your phone to self-host nexo-rs on it.

Root only unlocks extras:

| Scenario | Needs root? |
| --- | --- |
| Build + run the agent daemon | ❌ no |
| Pair WhatsApp, Telegram, Google | ❌ no |
| Local broker (broker.type: local) | ❌ no |
| Native NATS Go binary | ❌ no (installs to $PREFIX/bin) |
| termux-wake-lock, Termux:Boot autostart | ❌ no |
| Install skills from pkg (ffmpeg, tesseract, yt-dlp) | ❌ no |
| MCP client / server mode | ❌ no |
| Browser plugin via cdp_url to a chromium you launched yourself | ❌ no |
| Docker compose stack (via proot-distro or Linux Deploy) | ✅ yes |
| SELinux permissive (if Chromium sandbox misbehaves) | ✅ yes |
| Running multiple proot-distro containers side by side | ✅ yes |
| Bypass Android's battery optimizer more aggressively | ✅ yes |

Short version: don't root just for nexo-rs. Root if you want the full compose stack in a Linux-Deploy chroot, otherwise skip it.

What works

| Area | Status |
| --- | --- |
| Core runtime, memory, TaskFlow, dreaming | ✅ full |
| Broker: type: local (in-process) or native NATS Go binary | ✅ full |
| LLM providers (MiniMax / Anthropic / OpenAI-compat / Gemini) | ✅ all rustls-based |
| WhatsApp plugin (pure Rust + Signal Protocol) | ✅ pairing via Unicode QR |
| Telegram plugin | ✅ Bot API over HTTP |
| Gmail / Google plugin + gmail-poller | ✅ OAuth over HTTP |
| Extensions (stdio + NATS) | ✅ spawn works |
| Skills: fetch-url, dns-tools, rss, weather, wikipedia, pdf-extract, brave-search, wolfram-alpha, summarize, translate | ✅ pure Rust |
| MCP client + server | ✅ stdio + HTTP |
| Health / metrics / admin HTTP servers (8080 / 9090 / 9091) | ✅ unprivileged ports |
What needs a tweak

| Thing | Workaround |
| --- | --- |
| Service manager (no systemd) | termux-services (runit) or tmux + nohup |
| Run at boot | install the Termux:Boot app + drop a script in ~/.termux/boot/ |
| Survives screen-off | termux-wake-lock (from the Termux:API add-on) before running the agent |
| Browser plugin (Chrome/Chromium) | use cdp_url: to a chromium you start manually with --no-sandbox --disable-dev-shm-usage; or disabled: [browser] if you don't need it |
| Secrets file permission gauntlet | export CHAT_AUTH_SKIP_PERM_CHECK=1 (Android filesystem perms model differs) |
| WhatsApp public tunnel (cloudflared) | skip the public tunnel; pair locally via Unicode QR rendered on the terminal |
| Docker / compose | use broker.type: local or native NATS binary — no containers involved |

Prerequisites

From a fresh Termux install:

pkg update
pkg install -y rust git curl sqlite openssl clang pkg-config

Optional (enables specific skills):

pkg install -y ffmpeg tesseract yt-dlp tmux openssh

Optional (browser plugin):

pkg install -y tur-repo
pkg install -y chromium

Optional (run in background without the terminal session alive):

pkg install -y termux-services termux-api
# install the companion app "Termux:API" from F-Droid

Fast path — bootstrap script

The repo's scripts/bootstrap.sh auto-detects Termux and picks the right defaults:

git clone https://github.com/lordmacu/nexo-rs
cd nexo-rs
./scripts/bootstrap.sh --yes

What it does on Termux:

  1. Verifies rust, git, curl, sqlite from pkg
  2. Downloads the static nats-server Go binary (arm64) and drops it in $PREFIX/bin/ (or skip with --nats=skip to use the local broker)
  3. Creates ./data/** and ./secrets/ (with Termux-compatible perms)
  4. Stages config/agents.d/*.example.yaml → *.yaml if missing
  5. Runs cargo build --release (grab a coffee — ~20–40 min on phone hardware)
  6. Optionally launches agent setup to pair channels

Expect a ~60–100 MB final binary.

Manual install

1. Install Rust and deps

pkg install -y rust git curl sqlite openssl clang pkg-config

2. Clone and build

git clone https://github.com/lordmacu/nexo-rs
cd nexo-rs
cargo build --release --bin agent

3. Broker

Option A — local (simplest):

# config/broker.yaml
broker:
  type: local
  persistence:
    enabled: true
    path: ./data/queue

No NATS binary needed. All pub/sub stays in-process.

Option B — native NATS binary:

curl -L -o /tmp/nats.tar.gz \
  https://github.com/nats-io/nats-server/releases/download/v2.10.20/nats-server-v2.10.20-linux-arm64.tar.gz
tar -xzf /tmp/nats.tar.gz -C /tmp
install -m 0755 "$(find /tmp -name nats-server -type f | head -1)" \
  $PREFIX/bin/nats-server
nats-server -js &

Go binaries are static and work on Termux without libc surprises.

4. Runtime directories and secrets

mkdir -p ./data/{queue,workspace,media,transcripts} ./secrets

Termux stores files under /data/data/com.termux/files/home by default. Avoid pointing config paths at /sdcard — Android's scoped-storage model breaks directory permissions there.

5. Relax the credentials perm check

Android's filesystem doesn't honor Unix permission bits the way a desktop Linux filesystem does. The credentials gauntlet would otherwise refuse to boot with false-positive warnings:

export CHAT_AUTH_SKIP_PERM_CHECK=1

Add it to your shell profile (e.g. ~/.bashrc) or a wrapper shell script so it's set every time.

6. Launch the wizard

./target/release/agent setup

For the WhatsApp pairing step, the wizard renders the QR as Unicode blocks directly in the terminal — scan from the phone's WhatsApp app (Settings → Linked Devices). No public tunnel needed.

7. Run the agent

termux-wake-lock                # keep CPU awake even with screen off
./target/release/agent --config ./config

Staying alive in the background

Android's aggressive task killing is the biggest operational surprise. Pick one:

A — termux-wake-lock + foreground notification

termux-wake-lock
# agent in foreground:
./target/release/agent --config ./config

The wake-lock persists until you run termux-wake-unlock or kill the session. Minimum friction, most reliable.

B — termux-services (runit)

pkg install -y termux-services
sv-enable termux-services
mkdir -p ~/.config/service/nexo-rs
cat > ~/.config/service/nexo-rs/run <<'EOF'
#!/data/data/com.termux/files/usr/bin/sh
cd /data/data/com.termux/files/home/nexo-rs
export CHAT_AUTH_SKIP_PERM_CHECK=1
exec ./target/release/agent --config ./config 2>&1
EOF
chmod +x ~/.config/service/nexo-rs/run
sv up nexo-rs
sv status nexo-rs

C — Termux:Boot (start on device boot)

Install the Termux:Boot app from F-Droid, then:

mkdir -p ~/.termux/boot
cat > ~/.termux/boot/start-agent <<'EOF'
#!/data/data/com.termux/files/usr/bin/sh
termux-wake-lock
cd /data/data/com.termux/files/home/nexo-rs
export CHAT_AUTH_SKIP_PERM_CHECK=1
exec ./target/release/agent --config ./config
EOF
chmod +x ~/.termux/boot/start-agent

Disabling the browser plugin

If you don't need headless browser control (most phone-hosted agents don't), drop it from config/extensions.yaml:

extensions:
  disabled: [browser]

Or, if you have tur-repo chromium installed and want nexo-rs to spawn it, use the browser.args field to forward the flags Termux needs:

# config/plugins/browser.yaml
browser:
  headless: true
  executable: /data/data/com.termux/files/usr/bin/chromium
  args:
    - --no-sandbox
    - --disable-dev-shm-usage
    - --disable-gpu

The built-in launch flags still apply; args is appended after them so you can also override any of the built-ins (Chrome's CLI parser uses last-wins).

Alternative: launch chromium yourself and attach via cdp_url:

# config/plugins/browser.yaml
browser:
  # Start chromium yourself with:
  #   chromium --headless --no-sandbox --disable-dev-shm-usage \
  #            --disable-gpu --remote-debugging-port=9222 &
  cdp_url: http://127.0.0.1:9222

When cdp_url is set, args is ignored — nexo-rs doesn't spawn Chrome, only connects to yours.

Verify

curl localhost:8080/ready
curl localhost:9090/metrics
./target/release/agent status

Upgrading

cd ~/nexo-rs
git pull
cargo build --release
# restart under whichever method you picked (wake-lock / runit / Boot)

The agent's graceful shutdown sequence still runs on SIGTERM — closing the Termux session or killing the process drains the disk queue cleanly.

See also

Install — Nix

Nexo ships a Nix flake that pins the toolchain (Rust 1.80, MSRV) and the native build deps so a contributor or operator can go from clean shell to working binary without touching the host system.

Run without installing

nix run github:lordmacu/nexo-rs -- --help

First invocation builds from source (~3-5 min on cold cache); subsequent runs hit the local Nix store.

Build a local binary

nix build github:lordmacu/nexo-rs
./result/bin/nexo --help

The binary is the same nexo produced by cargo build --release --bin nexo. Outputs a result/ symlink the operator can link into /usr/local/bin/ or copy elsewhere.

Contributor dev shell

git clone https://github.com/lordmacu/nexo-rs
cd nexo-rs
nix develop

Drops you into a shell with:

  • rustc 1.80 + cargo + clippy + rustfmt + rust-src
  • cargo-edit, cargo-watch, cargo-nextest, cargo-deny
  • mdbook + mdbook-mermaid (for mdbook build docs)
  • sqlite, pkg-config, openssl, libgit2 (build deps)

RUST_LOG=info is exported by default. The toolchain version is pinned in flake.nix — bump in lockstep with [workspace.package].rust-version in Cargo.toml.

What the flake does NOT install

The nexo binary alone is not enough for full functionality. Runtime tools the channel plugins shell out to live at the system level, not in the flake:

  • Chrome / Chromium — required by the browser plugin
  • cloudflared — used by the tunnel plugin
  • ffmpeg — media transcoding for WhatsApp voice notes
  • tesseract-ocr — OCR skill
  • yt-dlp — the yt-dlp extension

Operators install these via their distro's package manager. The native install guide lists the apt / pacman / brew commands. The Docker image bundles all of them — that's the path of least friction for a "just works" deploy.

Pinning a release

Once v* tags are published, pin to a specific release:

nix run github:lordmacu/nexo-rs/v0.1.1 -- --help

Or in a flake input:

{
  inputs.nexo-rs.url = "github:lordmacu/nexo-rs/v0.1.1";
}

Verifying the build

nix flake check

Verifies the flake metadata and evaluates all outputs (packages, apps, devShells, formatter) without building anything. Useful in CI to catch flake regressions early.

Troubleshooting

  • "experimental feature 'flakes' is disabled" — add to ~/.config/nix/nix.conf:
    experimental-features = nix-command flakes
    
  • First build is very slow — the build re-fetches and re-compiles every cargo dependency in the sandbox. Subsequent builds are cached. A future Phase 27.x will publish a cachix cache so nix build pulls the binary directly.
  • Build fails on macOS arm64 — git2-rs occasionally lags on Apple silicon. Workaround: build the binary inside the Docker image instead (see Docker).

Quick start

Minimum viable agent running in five minutes. Covers: NATS, one agent, one channel, one LLM key.

1. Start NATS

docker run -d --name nexo-nats -p 4222:4222 nats:2.10-alpine

2. Build the binary

git clone git@github.com:lordmacu/nexo-rs.git
cd nexo-rs
cargo build --release

3. Provide an LLM key

Pick one provider to get started. MiniMax M2.5 is the primary:

export MINIMAX_API_KEY=your-key
export MINIMAX_GROUP_ID=your-group-id

Or Anthropic:

export ANTHROPIC_API_KEY=sk-ant-...

The shipped config/llm.yaml reads both via ${ENV_VAR}.
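
As a sketch of what that looks like in the file (provider layout illustrative, not the shipped config):

# config/llm.yaml (sketch)
minimax:
  api_key: ${MINIMAX_API_KEY}
  group_id: ${MINIMAX_GROUP_ID}
anthropic:
  api_key: ${ANTHROPIC_API_KEY}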

4. Run the setup wizard

./target/release/agent setup

The wizard walks you through:

  • Choosing a default LLM provider
  • Pairing any channels you want (WhatsApp QR, Telegram bot token, Google OAuth)
  • Writing secrets into ./secrets/ (gitignored)

See Setup wizard for the full step-by-step.

5. Run the agent

./target/release/agent --config ./config

First boot emits a startup summary listing:

  • which plugins loaded
  • which extensions were discovered / skipped (and why)
  • which LLM providers are wired
  • the NATS connection state

If anything is missing, the log line tells you exactly what to fix.

6. Talk to it

If you paired Telegram, send a message to the bot. If you paired WhatsApp, send a message to the paired number. The agent replies via the same channel.

What you just ran

sequenceDiagram
    participant U as User
    participant CH as Channel plugin
    participant B as NATS
    participant A as Agent runtime
    participant L as LLM provider

    U->>CH: Inbound message
    CH->>B: publish plugin.inbound.<channel>
    B->>A: deliver
    A->>L: chat.completion(tools)
    L-->>A: assistant turn
    A->>B: publish plugin.outbound.<channel>
    B->>CH: deliver
    CH-->>U: Outbound reply

Next

Setup wizard

The setup wizard is the recommended way to configure nexo-rs on a fresh install. It pairs channels, writes secrets, and patches the YAML config files so the runtime boots with everything it needs.

./target/release/agent setup

Run it from the repo root (or wherever your config/ directory lives).

What the wizard does

flowchart TD
    START([agent setup]) --> MENU{Menu}
    MENU --> LLM[LLM provider]
    MENU --> WA[WhatsApp pairing]
    MENU --> TG[Telegram bot]
    MENU --> GOOG[Google OAuth]
    MENU --> MEM[Memory DB location]
    MENU --> INFRA[NATS + runtime]
    MENU --> SKILLS[Enable / disable skills]

    LLM --> WRITE1[Write secrets/<br/>patch llm.yaml]
    WA --> QR[Scan QR<br/>write session dir]
    TG --> TOKEN[Ask bot token<br/>write secret]
    GOOG --> OAUTH[Open browser<br/>PKCE flow]
    MEM --> WRITE2[Patch memory.yaml]
    INFRA --> WRITE3[Patch broker.yaml]
    SKILLS --> WRITE4[Patch extensions.yaml]

    WRITE1 --> DONE([Done])
    QR --> DONE
    TOKEN --> DONE
    OAUTH --> DONE
    WRITE2 --> DONE
    WRITE3 --> DONE
    WRITE4 --> DONE

Every step is optional. You can run setup repeatedly — each section is idempotent.

Steps in detail

LLM provider

Prompts for the default provider (MiniMax, Anthropic, OpenAI-compat, Gemini). Writes the API key to ./secrets/<provider>_api_key.txt and ensures config/llm.yaml references it via ${file:...} or the corresponding env var.

WhatsApp pairing (multi-instance)

Per-agent. Asks which agent you are pairing and which instance label to use (personal, work, …). Each instance gets its own session dir under ./data/workspace/<agent>/whatsapp/<instance> and an allow_agents list (defense-in-depth ACL). The wizard:

  1. Normalises config/plugins/whatsapp.yaml to sequence form (legacy single-mapping entries are auto-converted on first edit).
  2. Upserts the entry by instance label.
  3. Writes credentials.whatsapp: <instance> on the chosen agent's YAML — agents.yaml if the agent lives there, otherwise the matching agents.d/*.yaml.
  4. Launches the pairing loop and renders the QR as Unicode blocks. Scan with WhatsApp → Settings → Linked Devices.
  5. Runs the credential gauntlet so any drift surfaces immediately.

Re-run the wizard once per number you want to pair; instance labels are append-friendly.
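
A sequence-form whatsapp.yaml entry then looks roughly like this (keys assembled from the fields this page names: instance label, session dir, allow_agents; treat it as a sketch, not the exact schema):

# config/plugins/whatsapp.yaml (sketch)
whatsapp:
  - instance: personal
    session_dir: ./data/workspace/kate/whatsapp/personal
    allow_agents: [kate]
  - instance: work
    session_dir: ./data/workspace/ana/whatsapp/work
    allow_agents: [ana]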

Telegram bot (multi-instance)

Same shape as WhatsApp. Asks for instance label (default <agent>_bot) and bot token from @BotFather. Token lands at ./secrets/<instance>_telegram_token.txt with mode 0o600; the YAML references it via ${file:...} so secrets never live in telegram.yaml directly. Adds credentials.telegram: <instance> on the agent.

Google OAuth

The wizard writes one entry per agent in config/plugins/google-auth.yaml:

google_auth:
  accounts:
    - id: ana@google
      agent_id: ana
      client_id_path:     ./secrets/ana_google_client_id.txt
      client_secret_path: ./secrets/ana_google_client_secret.txt
      token_path:         ./secrets/ana_google_token.json
      scopes: [https://www.googleapis.com/auth/gmail.modify]

Two consent flows are offered after the YAML is written:

  • Device-code (default — works headless / over SSH): the wizard prints verification_url + a 6-character user_code. Open the URL on any device, type the code, approve. The wizard polls oauth2.googleapis.com/token until approval and persists the refresh_token at token_path (mode 0o600).
  • Skip and consent later via the google_auth_start LLM tool — uses the loopback PKCE flow, requires a local browser.

Scopes are comma-separated at the prompt; defaults to gmail.modify. Re-running with a different id adds a second account; re-running with the same id overwrites in place.

Memory DB location

Lets you pick where the SQLite long-term memory file lives. Default is ./data/memory.db. Per-agent isolation is on by default — each agent gets its own DB file under its workspace.

Infrastructure (NATS + runtime)

Asks for the NATS URL, optional user/password, and timeouts. Patches config/broker.yaml.

Skills on/off

Lets you selectively disable shipped extensions you don't plan to use (reduces tool surface exposed to the LLM).

Files the wizard touches

| Target | What it writes |
| --- | --- |
| config/llm.yaml | Provider entries, base_url, auth mode |
| config/plugins/whatsapp.yaml | session_dir, media_dir |
| config/plugins/telegram.yaml | token (via ${file:...}), allow-list |
| config/plugins/google.yaml | OAuth bundle path, scopes |
| config/memory.yaml | DB location |
| config/broker.yaml | NATS URL, creds |
| config/extensions.yaml | enabled/disabled list |
| ./secrets/* | Plaintext secret files (gitignored) |

Every YAML patch preserves existing keys and comments via the yaml_patch module — your hand edits survive.

Re-running

Re-run agent setup as many times as you want. Paired channels are detected and skipped unless you explicitly ask to re-pair. To wipe a paired session:

./target/release/agent setup wipe whatsapp --agent ana

Troubleshooting

  • WhatsApp QR expires too fast → the QR refreshes every ~20s; the wizard re-renders. Scan from the phone with a stable network.
  • Google OAuth fails with redirect_uri_mismatch → the wizard binds to 127.0.0.1:<port>; make sure your OAuth client allows http://127.0.0.1 as a redirect URI.
  • NATS unreachable → the wizard will warn but still write config. The runtime's disk queue will drain once NATS comes back.

Agent-centric setup wizard

The hub menu's Configurar agente (canal, modelo, idioma, skills) entry drops the operator into a per-agent submenu. Where the rest of the wizard groups actions by service (Telegram, OpenAI, the browser plugin), this submenu groups them by agent: pick one agent up front, then mutate its model, language, channels, and skills from a single dashboard. Every action reuses the existing channel / LLM / skill flows underneath, so behavior stays in lockstep with the rest of the wizard.

./target/release/agent setup
# → Configurar agente (canal, modelo, idioma, skills)

Dashboard

Agente: kate
  Modelo:   anthropic / claude-haiku-4-5  [creds ✔]
  Idioma:   es
  Canales:  ✔ telegram:default  (bound)
            ✗ whatsapp:default  (unbound)
  Skills:   8 / 24 attached

The dashboard is recomputed from disk on every loop iteration, so the screen always reflects the most recent YAML state.

Action menu

After the dashboard renders, the operator picks one of:

| Action | Effect |
| --- | --- |
| Modelo | Attach / detach / change the LLM provider + model name. Re-uses the LLM credential form when secrets are missing. |
| Idioma | Pick from es / en / pt / fr / it / de, or clear the directive. |
| Canales | Auth/Reauth, Bind, or Unbind a channel for this agent. Auth flows are the same services_imperative dispatchers the legacy menu uses. |
| Skills | Multi-select against the skill catalog. Newly added skills with required secrets prompt for creds. |
| ← volver | Exit the submenu, return to the hub. |

YAML mutations

| Action | YAML path | Operation |
| --- | --- | --- |
| Attach model | agents[<id>].model.provider, …model.model | upsert_agent_field |
| Detach model | agents[<id>].model | remove_agent_field |
| Set language | agents[<id>].language | upsert_agent_field |
| Clear language | agents[<id>].language | remove_agent_field |
| Bind channel | agents[<id>].plugins[], agents[<id>].inbound_bindings[] | append_agent_list_item (idempotent) |
| Unbind channel | agents[<id>].plugins[], agents[<id>].inbound_bindings[] | remove_agent_list_item by predicate |
| Replace skills | agents[<id>].skills | upsert_agent_field (full sequence) |

All mutations land atomically (tempfile + rename) and are gated by the same process-wide YAML mutex the legacy upsert path uses, so concurrent wizard sessions don't corrupt the file.
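
The pattern is worth spelling out, since it is what makes concurrent wizard sessions safe. A minimal Rust sketch of tempfile + rename under a process-wide lock (illustrative, not the yaml_patch internals):

use std::{fs, io::Write, path::Path, sync::Mutex};

// Process-wide writer lock, standing in for the YAML mutex described above.
static YAML_LOCK: Mutex<()> = Mutex::new(());

fn write_yaml_atomically(target: &Path, contents: &str) -> std::io::Result<()> {
    let _guard = YAML_LOCK.lock().unwrap();
    let tmp = target.with_extension("yaml.tmp");
    let mut f = fs::File::create(&tmp)?;
    f.write_all(contents.as_bytes())?;
    f.sync_all()?;             // flush to disk before the swap
    fs::rename(&tmp, target)   // atomic replace on POSIX, same filesystem
}

Readers never observe a half-written file: they see either the old YAML or the new one.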

Hot-reload

After every successful mutation, the wizard fires a best-effort nexo --config <dir> reload so a running daemon picks up the YAML edit without a manual restart. The call is fire-and-forget: when the binary isn't on PATH or the daemon isn't running, the wizard keeps going silently.
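
The fire-and-forget call is trivial to sketch (illustrative, not the wizard's actual code):

use std::process::Command;

fn try_hot_reload(config_dir: &str) {
    // Best-effort: ignore every failure mode (binary not on PATH, daemon down).
    let _ = Command::new("nexo")
        .args(["--config", config_dir, "reload"])
        .status();
}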

Where the code lives

  • crates/setup/src/agent_wizard.rs — submenu + dashboard.
  • crates/setup/src/yaml_patch.rs — read_agent_field, upsert_agent_field, remove_agent_field, append_agent_list_item, remove_agent_list_item.
  • crates/setup/tests/agent_wizard_yaml.rs — schema-roundtrip tests that re-parse the mutated YAML through nexo_config::AgentsConfig.

Verifying releases

Every Nexo release artifact is signed with Sigstore Cosign using keyless OIDC — no long-lived private key, no PGP key management, no out-of-band trust establishment. The signature is tied to the GitHub Actions workflow run that produced the artifact, and a public record lives in the Rekor transparency log.

Why keyless

Traditional signing requires a long-lived signing key. If it leaks, every past release becomes suspect. Keyless signing instead anchors each signature to:

  1. The GitHub Actions OIDC identity of the workflow run (https://token.actions.githubusercontent.com)
  2. The specific repo + workflow file that ran (https://github.com/lordmacu/nexo-rs/.github/workflows/...)
  3. The commit + ref the workflow built from

A short-lived certificate (10 min validity) is issued by Sigstore's fulcio CA, the artifact is signed with it, and the whole bundle is recorded in rekor (immutable). To forge a signature, an attacker would need to compromise GitHub's OIDC infra and the exact workflow path — and even then the forgery shows up in the public log.

Install Cosign

# macOS:
brew install cosign

# Linux (Debian/Ubuntu):
curl -L "https://github.com/sigstore/cosign/releases/latest/download/cosign-linux-amd64" \
  -o /usr/local/bin/cosign
chmod +x /usr/local/bin/cosign

# Linux (Fedora/RHEL):
sudo dnf install cosign

# Verify the install:
cosign version

Verify a Docker image

Every image at ghcr.io/lordmacu/nexo-rs is cosign-signed by the docker.yml workflow. Verify any tag with:

cosign verify ghcr.io/lordmacu/nexo-rs:latest \
  --certificate-identity-regexp 'https://github.com/lordmacu/nexo-rs/.*' \
  --certificate-oidc-issuer https://token.actions.githubusercontent.com

A successful verification prints the full certificate + the Rekor entry URL. Anything else (signature missing, identity mismatch, broken cert chain) means don't trust this image — check the release notes, file an issue.

Verify a downloaded binary / .deb / .rpm / .tar.gz

The sign-artifacts.yml workflow attaches three files next to every release asset:

  • <asset>.sig — the raw signature
  • <asset>.pem — the leaf certificate
  • <asset>.bundle — combined Sigstore bundle (preferred; carries the inclusion proof)

Verify with the bundle (recommended, single command):

cosign verify-blob \
  --bundle nexo-rs_0.1.1_amd64.deb.bundle \
  --certificate-identity-regexp 'https://github.com/lordmacu/nexo-rs/.*' \
  --certificate-oidc-issuer https://token.actions.githubusercontent.com \
  nexo-rs_0.1.1_amd64.deb

Or with the standalone .sig + .pem if you prefer:

cosign verify-blob \
  --signature nexo-rs_0.1.1_amd64.deb.sig \
  --certificate nexo-rs_0.1.1_amd64.deb.pem \
  --certificate-identity-regexp 'https://github.com/lordmacu/nexo-rs/.*' \
  --certificate-oidc-issuer https://token.actions.githubusercontent.com \
  nexo-rs_0.1.1_amd64.deb

Verify in CI / scripted contexts

Drop this in a deploy pipeline:

#!/usr/bin/env bash
set -euo pipefail

ASSET="${1:?usage: $0 <asset-path>}"
BUNDLE="${ASSET}.bundle"

if [ ! -f "$BUNDLE" ]; then
    echo "ERROR: $BUNDLE missing — refusing to deploy unsigned artifact" >&2
    exit 1
fi

cosign verify-blob \
  --bundle "$BUNDLE" \
  --certificate-identity-regexp 'https://github.com/lordmacu/nexo-rs/.*' \
  --certificate-oidc-issuer https://token.actions.githubusercontent.com \
  "$ASSET" \
  || { echo "ERROR: signature verification failed for $ASSET" >&2; exit 2; }

Inspecting the transparency log

Every signature is searchable on Rekor:

# List the signatures, attestations, and SBOMs attached to the image:
cosign tree ghcr.io/lordmacu/nexo-rs:latest

The output shows every cosign-related artifact attached to the image (signatures, attestations, SBOMs) plus the Rekor log index where each was recorded.

What if verification fails

  1. Identity regex doesn't match — the asset may have been built from a fork / unofficial workflow. Re-download from the GitHub release page directly.
  2. bundle file missing — older releases (pre-Phase 27.3) don't have signatures. Tag v0.1.1 is the first signed release.
  3. Cert chain expired / revoked — Sigstore's fulcio root CA has a long lifespan, but the leaf cert is short-lived. cosign automatically fetches the right TUF root; if you see chain errors run cosign initialize to refresh local trust roots.
  4. Network errors talking to Rekor / Fulcio — both have CDN in front. Retry, or use --insecure-ignore-tlog for local verification (drops the transparency log check — only safe in air-gapped trust contexts).

Out of scope (for now)

  • Long-lived PGP keys for the apt / yum repos — needs Phase 27.4 signed-repo work to consume them on the user side. Until that ships, .deb / .rpm signatures live in the Cosign world only.
  • A Homebrew bottle-signing path that lets brew validate without the OIDC chain — Phase 27.6 follow-up.

Reproducible builds + SBOM

Every Nexo release ships with two artifacts that let an operator verify provenance and exact composition:

  1. CycloneDX SBOM (sbom-cyclonedx.json) — every cargo dependency at the exact version + hash that was compiled into the binary.
  2. SPDX SBOM (sbom-spdx.json) — full filesystem scan via syft, captures anything that wasn't a cargo dep (bundled binaries, generated assets, vendored data files).

Both SBOMs are Cosign-signed (*.bundle) using the same keyless OIDC chain documented in Verifying releases.

Reading the SBOMs

# Pretty-print the CycloneDX dep tree:
jq '.components | map({name, version, purl})' sbom-cyclonedx.json | less

# Find a specific crate:
jq '.components[] | select(.name == "tokio")' sbom-cyclonedx.json

# Audit the cargo deps with `cargo-audit` (run against the SBOM,
# without rebuilding):
cargo audit --db ~/.cargo/advisory-db --json | \
  jq -r '.vulnerabilities.list[].advisory.id'

Reproducible build claim

The release workflow targets a bit-identical binary across two runs given the same git sha + rust-toolchain.toml + Cargo.lock. The pipeline pins:

  • Rust toolchain: rust-toolchain.toml fixes the channel + components.
  • Dependency versions: Cargo.lock is committed and --locked is used by every release build.
  • Build environment: GitHub Actions ubuntu-latest runner + cargo build --release with no RUSTFLAGS overrides.
  • Build provenance: SLSA Level 2 attestation generated by actions/attest-build-provenance (Phase 27.2 wiring).
  • Cosign signature: each binary + SBOM signed via OIDC (Phase 27.3).

Reproducing a release locally

# 1. Check out the exact tag.
git clone https://github.com/lordmacu/nexo-rs && cd nexo-rs
git checkout v0.1.1

# 2. Build with the locked deps.
rustup show       # confirms the toolchain matches rust-toolchain.toml
cargo build --release --bin nexo --locked

# 3. Compare your binary's sha256 against the release asset:
sha256sum target/release/nexo
# Expected: same hash listed in `sha256sums.txt` on the GitHub release.

If the hashes don't match: the build is not reproducible on your host. Common reasons:

  • Different glibc version → embedded __VERSIONED_SYMBOL strings drift. The release workflow runs on ubuntu-latest (currently Ubuntu 24.04, glibc 2.39); building on Debian 12 (glibc 2.36) produces different bytes.
  • Different LLVM in your local rustc build (rare, mostly affects Mac users compiling with stable + nightly side-by-side).
  • Local ~/.cargo/config.toml injecting RUSTFLAGS.
  • Building with the dev profile instead of the release profile.

For a guaranteed bit-identical reproduction, build inside the same container the workflow uses:

docker run --rm -v $(pwd):/src -w /src \
  rust:1.80-bookworm \
  cargo build --release --bin nexo --locked

This reproduces what the GitHub Actions runner would do — same glibc, same toolchain version, same LLVM.

SLSA verification

The workflow attaches an attestation.intoto.jsonl (SLSA Level 2 provenance) per release. Verify with slsa-verifier:

go install github.com/slsa-framework/slsa-verifier/v2/cli/slsa-verifier@latest

slsa-verifier verify-artifact nexo \
  --provenance-path attestation.intoto.jsonl \
  --source-uri github.com/lordmacu/nexo-rs \
  --source-tag v0.1.1

A green verification proves:

  • The artifact came from the lordmacu/nexo-rs repo
  • It was built by a GitHub-hosted runner (not a fork or local box)
  • The build inputs match what's recorded in the provenance

Auditing for known CVEs

The SBOM lets cargo-audit work without rebuilding:

# Convert CycloneDX → cargo-audit's format:
cyclonedx-cli convert --input-format json \
  --output-format json sbom-cyclonedx.json | \
  jq '...' > deps.json

# Or just feed it to grype (broader scope, multi-format):
grype sbom:./sbom-cyclonedx.json
grype sbom:./sbom-spdx.json

Grype catches CVEs across both Rust crates and any system-level deps captured by syft.

Out of scope (deferred)

  • apk / pkg SBOM for the Termux deb — Termux's package metadata doesn't speak SPDX yet. The release SBOMs cover the same artifact contents though.
  • Reproducible Docker image layers — the current Dockerfile uses apt-get update && apt-get install which pulls whatever's latest at build time. Pinning to specific Debian package versions is a follow-up (Phase 34 hardening).

Architecture overview

nexo-rs is a single-process multi-agent runtime. One binary (agent) hosts every agent, every channel plugin, every extension, and the persistence layer. Coordination between components happens over NATS (with a local tokio-mpsc fallback when NATS is offline).

Why single-process: shared in-memory caches, zero IPC overhead between agent and tool invocations, simpler ops. The broker and disk queue give us the durability a multi-process layout would provide, without the coordination cost.

High-level layout

flowchart TB
    subgraph PROC[agent process]
        direction TB

        subgraph PLUGINS[Channel plugins]
            WA[WhatsApp]
            TG[Telegram]
            MAIL[Email / Gmail poller]
            BR[Browser CDP]
            GOOG[Google APIs]
        end

        subgraph BUS[Event bus]
            NATS[(NATS)]
            LOCAL[(Local mpsc fallback)]
            DQ[(Disk queue + DLQ)]
        end

        subgraph AGENTS[Agent runtimes]
            A1[Agent: ana]
            A2[Agent: kate]
            A3[Agent: ops]
        end

        subgraph STORE[Persistence]
            STM[(Short-term sessions<br/>in-memory)]
            LTM[(Long-term memory<br/>SQLite + sqlite-vec)]
            WS[(Workspace-git<br/>per agent)]
        end

        subgraph TOOLS[Tools & integrations]
            EXT[Extensions<br/>stdio / NATS]
            MCP[MCP client / server]
            LLM[LLM providers]
        end

        PLUGINS --> BUS
        BUS --> AGENTS
        AGENTS --> BUS
        AGENTS --> STORE
        AGENTS --> TOOLS
        TOOLS --> LLM
    end

    USERS[End users] <--> PLUGINS

Workspace crates

The Cargo.toml workspace defines these member crates:

| Crate | Responsibility |
| --- | --- |
| crates/core | Agent runtime, trait, SessionManager, HookRegistry, heartbeat, tool registry |
| crates/broker | NATS client, local fallback, disk queue, DLQ, backpressure |
| crates/llm | LLM clients (MiniMax, Anthropic, OpenAI-compat, Gemini), retry, rate limiter |
| crates/memory | Short-term sessions, long-term SQLite, vector search via sqlite-vec |
| crates/config | YAML parsing, env-var resolution, secrets loading |
| crates/extensions | Manifest parser, discovery, stdio + NATS runtimes, watcher, CLI |
| crates/mcp | MCP client (stdio + HTTP), server mode, tool catalog, hot-reload |
| crates/taskflow | Durable flow state machine with wait/resume |
| crates/resilience | CircuitBreaker three-state machine |
| crates/setup | Interactive wizard, YAML patcher, pairing flows |
| crates/tunnel | Public HTTPS tunnel for pairing / webhooks |
| crates/plugins/browser | Chrome DevTools Protocol client |
| crates/plugins/whatsapp | Wrapper over whatsapp-rs (Signal Protocol) |
| crates/plugins/telegram | Bot API client |
| crates/plugins/email | IMAP / SMTP |
| crates/plugins/gmail-poller | Cron-style Gmail → broker bridge |
| crates/plugins/google | Gmail / Calendar / Drive / Sheets tools |

Binaries

Defined in Cargo.toml:

| Binary | Entry | Purpose |
| --- | --- | --- |
| agent | src/main.rs | Main daemon; also exposes setup, dlq, ext, flow, status subcommands |
| browser-test | src/browser_test.rs | CDP integration smoke test |
| integration-browser-check | src/integration_browser_check.rs | End-to-end browser flow validation |
| llm_smoke | src/bin/llm_smoke.rs | LLM provider smoke test |

Runtime topology

agent runs a single tokio multi-thread runtime. Work is split into independent tasks:

flowchart LR
    MAIN[main tokio runtime]
    MAIN --> PA[Per-agent runtime task]
    MAIN --> PI[Plugin intake loops]
    MAIN --> HB[Heartbeat scheduler]
    MAIN --> MCP[MCP runtime manager]
    MAIN --> EXT[Extension stdio runtimes]
    MAIN --> MET[Metrics server :9090]
    MAIN --> HEALTH[Health server :8080]
    MAIN --> ADMIN[Admin console :9091]
    MAIN --> LOCK[Single-instance lock watcher]

Each agent runtime owns its own subscription to inbound topics, its own session manager view, its own LLM-loop state. Agents do not share mutable in-memory state — coordination between agents happens over the event bus (agent.route.<target_id>).

What lives where — quick mental model

  • A message arrives → lands on plugin.inbound.<channel> (NATS)
  • Agent runtime consumes it → SessionManager attaches or creates a session, HookRegistry fires before_message
  • LLM loop runs → tools invoked through the registry, which calls into extensions / MCP / built-ins, each wrapped by CircuitBreaker
  • Tool result flows back → after_tool_call hooks fire, LLM decides next turn
  • Agent emits reply → publishes to plugin.outbound.<channel>
  • Channel plugin delivers → physical message goes to the user

Details per subsystem:

Agent runtime

The agent runtime is the per-agent machinery that consumes inbound events, drives the LLM loop, invokes tools, and emits outbound events. One AgentRuntime is instantiated per configured agent at boot; each runs as its own async task.

Source: crates/core/src/agent/ (behavior.rs, agent.rs, runtime.rs, hook_registry.rs), boot in src/main.rs.

AgentBehavior trait

Every agent implements AgentBehavior (crates/core/src/agent/behavior.rs). The trait is intentionally small — default no-ops let built-in types (like LlmAgentBehavior) override only what they need.

| Method | Fires on | Default |
| --- | --- | --- |
| on_message(ctx, msg) | Inbound message from a plugin | no-op |
| on_event(ctx, event) | Any event on a subscribed topic | no-op |
| on_heartbeat(ctx) | Periodic tick (if heartbeat enabled) | no-op |
| decide(ctx, msg) | LLM-reasoning hook (stub for custom flows) | empty string |

The shipped LlmAgentBehavior implements the full chat-completion loop with tool calls, streaming, rate-limited retry, and hook fan-out.
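
A minimal sketch of the trait shape implied by the table (the real definition in crates/core/src/agent/behavior.rs will differ; Ctx, InboundMessage, and Event below are stand-ins for the runtime types, and the async_trait crate is assumed):

struct Ctx;
struct InboundMessage;
struct Event;

#[async_trait::async_trait]
trait AgentBehavior: Send + Sync {
    async fn on_message(&self, _ctx: &Ctx, _msg: &InboundMessage) {}   // default no-op
    async fn on_event(&self, _ctx: &Ctx, _event: &Event) {}            // default no-op
    async fn on_heartbeat(&self, _ctx: &Ctx) {}                        // default no-op
    async fn decide(&self, _ctx: &Ctx, _msg: &InboundMessage) -> String {
        String::new()                                                  // default: empty string
    }
}

A custom behavior overrides only the methods it cares about; everything else stays a no-op.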

Boot sequence

sequenceDiagram
    participant Main as src/main.rs
    participant Cfg as AppConfig
    participant Disc as Extension discovery
    participant SM as SessionManager
    participant TR as ToolRegistry
    participant LLM as LLM client
    participant AR as AgentRuntime
    participant Bus as Broker

    Main->>Cfg: load(config_dir)
    Main->>Disc: run_extension_discovery()
    Main->>SM: with_cap(ttl, max_sessions)
    Main->>TR: register built-ins + extensions + MCP
    Main->>LLM: build per provider (w/ CircuitBreaker)
    loop per agent in config
        Main->>AR: new(agent_id, behavior, tools, sm, llm, broker)
        AR->>Bus: subscribe plugin.inbound.<channel>+
        AR->>Bus: subscribe agent.route.<agent_id>
        AR-->>Main: ready
    end
    Main->>Main: install signal handlers
    Main->>Main: serve forever

Request/response lifecycle

A single inbound message drives the following flow inside one agent runtime:

sequenceDiagram
    participant Bus as NATS
    participant AR as AgentRuntime
    participant SM as SessionManager
    participant HR as HookRegistry
    participant LLM as LLM
    participant TR as ToolRegistry
    participant Ext as Extension / MCP / built-in

    Bus->>AR: plugin.inbound.<ch>
    AR->>SM: get_or_create(session_key)
    AR->>HR: fire("before_message")
    loop LLM turn loop
        AR->>LLM: completion(messages, tools)
        LLM-->>AR: assistant turn (text or tool_calls)
        alt tool_calls present
            AR->>HR: fire("before_tool_call", name, args)
            AR->>TR: invoke(tool_name, args)
            TR->>Ext: call
            Ext-->>TR: result
            TR-->>AR: result
            AR->>HR: fire("after_tool_call", name, result)
        else text only
            AR->>Bus: publish plugin.outbound.<ch>
        end
    end
    AR->>HR: fire("after_message")

SessionManager

Defined in crates/core/src/session/manager.rs. Tracks per-user conversational state in memory.

  • Key: SessionKey derived from (agent_id, channel, sender_id); group chats get one session per group
  • Storage: DashMap<SessionKey, Session> — lock-free concurrent map
  • TTL: configured via memory.short_term.session_ttl (default 30 min); each access updates last_access
  • Cap: soft limit DEFAULT_MAX_SESSIONS = 10,000; on overflow the oldest-idle session is evicted before insert
  • Sweeper: background task scans every 1 s, removes expired entries
  • Callbacks: on_expire() fires via tokio::spawn when a session is dropped — used by the MCP runtime to tear down per-session children

stateDiagram-v2
    [*] --> Active: first message
    Active --> Active: on_message / on_event<br/>(last_access updated)
    Active --> Expired: idle > TTL
    Active --> Evicted: cap exceeded,<br/>oldest-idle chosen
    Expired --> [*]: sweeper removes
    Evicted --> [*]: on_expire() fires
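
A condensed sketch of those semantics — a plain HashMap stands in for the real DashMap, and the shape is single-threaded where the real manager is concurrent:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Illustrative only — the real SessionManager keys on
// (agent_id, channel, sender_id) and stores sessions in a DashMap.
struct Session { last_access: Instant }

struct SessionManager {
    ttl: Duration,       // memory.short_term.session_ttl (default 30 min)
    max_sessions: usize, // DEFAULT_MAX_SESSIONS = 10_000
    sessions: HashMap<String, Session>,
}

impl SessionManager {
    fn get_or_create(&mut self, key: &str) -> &mut Session {
        // On overflow, evict the oldest-idle session before inserting.
        if !self.sessions.contains_key(key) && self.sessions.len() >= self.max_sessions {
            if let Some(oldest) = self
                .sessions
                .iter()
                .min_by_key(|(_, s)| s.last_access)
                .map(|(k, _)| k.clone())
            {
                self.sessions.remove(&oldest); // on_expire() would fire here
            }
        }
        let s = self
            .sessions
            .entry(key.to_string())
            .or_insert(Session { last_access: Instant::now() });
        s.last_access = Instant::now(); // every access refreshes the TTL
        s
    }

    // The background sweeper runs this every second.
    fn sweep(&mut self) {
        let ttl = self.ttl;
        self.sessions.retain(|_, s| s.last_access.elapsed() < ttl);
    }
}
```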

HookRegistry

Defined in crates/core/src/agent/hook_registry.rs. Lets extensions inject behavior at well-known points in the lifecycle without patching the runtime.

  • Hook names: arbitrary strings. In practice the runtime fires: before_message, after_message, before_tool_call, after_tool_call, on_session_start, on_session_end
  • Fan-out: sequential by priority (lower first), insertion order breaks ties
  • Cap: 128 handlers per hook name — defensive guard against a buggy extension re-registering on every reload
  • Errors: logged, treated as Continue — one misbehaving hook does not cascade into the rest
  • Override: a hook may return Override(new_args) to mutate what the next hook (or the runtime itself) sees
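
A sketch of that fan-out contract, with illustrative names rather than the real hook_registry.rs API:

```rust
use std::collections::HashMap;

enum HookOutcome {
    Continue,                    // pass args through unchanged
    Override(serde_json::Value), // next hook (and the runtime) sees new args
}

type Handler = Box<dyn Fn(&serde_json::Value) -> HookOutcome + Send + Sync>;

struct HookRegistry {
    hooks: HashMap<String, Vec<(i32, Handler)>>, // (priority, handler)
}

impl HookRegistry {
    fn register(&mut self, name: &str, priority: i32, h: Handler) -> Result<(), String> {
        let list = self.hooks.entry(name.to_string()).or_default();
        if list.len() >= 128 {
            return Err("hook cap reached (128 handlers)".into());
        }
        list.push((priority, h));
        // Lower priority first; the stable sort keeps insertion order on ties.
        list.sort_by_key(|(p, _)| *p);
        Ok(())
    }

    fn fire(&self, name: &str, mut args: serde_json::Value) -> serde_json::Value {
        for (_, h) in self.hooks.get(name).into_iter().flatten() {
            // Errors would be logged and treated as Continue here.
            if let HookOutcome::Override(new_args) = h(&args) {
                args = new_args;
            }
        }
        args
    }
}
```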

Heartbeat

# per-agent config
heartbeat:
  enabled: true
  interval: 30s

  • Scheduled per agent if heartbeat.enabled: true
  • Interval parsed via humantime — any humantime duration works
  • Each tick:
    1. Fires AgentBehavior::on_heartbeat(ctx)
    2. Publishes agent.events.<agent_id>.heartbeat
  • Typical uses: proactive messages ("good morning"), reminders, external state syncs (pull Gmail, scan calendar), liveness pings

Graceful shutdown

src/main.rs installs SIGTERM / Ctrl+C handlers. On signal, the process tears down in a specific order so in-flight work finishes cleanly:

flowchart TD
    SIG[SIGTERM / Ctrl+C] --> C1[Cancel dream-sweep loops<br/>5 s grace]
    C1 --> C2[Mark /ready = false<br/>stop new traffic]
    C2 --> C3[Stop plugin intake<br/>no new inbound]
    C3 --> C4[Shutdown MCP runtime manager<br/>5 s clean close]
    C4 --> C5[Shutdown extensions<br/>5 s grace then kill_on_drop]
    C5 --> C6[Stop agent runtimes<br/>drain buffered messages]
    C6 --> C7[Abort metrics + health tasks]
    C7 --> EXIT([exit 0])

This order is enforced in src/main.rs around lines 1389–1458. Extensions get the longest grace period because stdio children can be mid-tool-call; the disk queue absorbs any events that the plugins couldn't finish publishing.

Why this shape

  • One tokio runtime, many tasks: lets you run 10 agents on one CPU core when idle, saturates cores under load. No thread-per-agent bloat.
  • No shared mutable state across agents: each agent holds its own registry views, its own session map. Cross-agent communication goes over the bus → visible, replayable, testable.
  • Hooks instead of inheritance: extensions customize behavior without recompiling the core. Every insertion point is named, sequenced, and capped.

Event bus (NATS)

Every piece of communication between plugins, agents, and the broker layer itself flows over NATS (async-nats = 0.35). When NATS is offline, a local tokio::mpsc bus takes over and a SQLite-backed disk queue holds events until reconnection. No events are lost.

Source: crates/broker/ (nats.rs, local.rs, disk_queue.rs, topic.rs).

Why NATS

  • Subject-based routing fits the "N plugins × M agents" fan-out naturally (plugin.inbound.* wildcards)
  • Low-latency pub/sub with no broker-side state to manage
  • Cluster-ready without rewriting the data plane
  • Async-nats is mature, has JetStream if we ever need it

The design doc discusses the alternatives (RabbitMQ, Redis streams) that were rejected; see proyecto/design-agent-framework.md.

Subject namespace

| Pattern | Direction | Example | Who publishes | Who subscribes |
|---|---|---|---|---|
| plugin.inbound.<plugin> | plugin → agent | plugin.inbound.whatsapp | Channel plugins | Agent runtimes |
| plugin.inbound.<plugin>.<instance> | plugin → agent | plugin.inbound.telegram.sales_bot | Multi-instance plugins (WA, TG) | Agent runtimes |
| plugin.outbound.<plugin> | agent → plugin | plugin.outbound.whatsapp | Agent tools (send, reply…) | Channel plugins |
| plugin.outbound.<plugin>.<instance> | agent → plugin | plugin.outbound.whatsapp.ana | Agent tools | Specific plugin instance |
| plugin.health.<plugin> | plugin → runtime | plugin.health.browser | Plugins | Health server |
| agent.events.<agent_id> | internal | agent.events.ana | Runtime internals | Dashboards, tests |
| agent.events.<agent_id>.heartbeat | scheduler → agent | agent.events.kate.heartbeat | Heartbeat scheduler | That agent |
| agent.route.<target_id> | agent → agent | agent.route.ops | Sending agent's delegate tool | Target agent runtime |
| taskflow.resume | external → flow | taskflow.resume | Anything (other agents, services, ops) | TaskFlow resume bridge |

Multi-instance plugins append an .<instance> suffix so two WhatsApp accounts (e.g. Ana's line and Kate's line) can run side by side without subject collisions.

Agent-to-agent routing

sequenceDiagram
    participant Ana
    participant Bus as NATS
    participant Ops

    Ana->>Ana: LLM decides to delegate
    Ana->>Bus: publish agent.route.ops<br/>(correlation_id=X)
    Bus->>Ops: deliver
    Ops->>Ops: on_message handler runs
    Ops->>Bus: publish agent.route.ana<br/>(correlation_id=X)
    Bus->>Ana: deliver
    Ana->>Ana: correlate reply by ID

The sender always includes a correlation_id in the event envelope; the receiver echoes it on the reply. That's how one agent can fan out to several agents and reassemble results.
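
A sketch of the correlation bookkeeping under stated assumptions — the real runtime does this inside the delegate tool, and the envelope schema and helper names here are illustrative:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use tokio::sync::oneshot;
use uuid::Uuid;

// Pending waiters, keyed by correlation_id.
type Pending = Arc<Mutex<HashMap<String, oneshot::Sender<String>>>>;

fn delegate(
    pending: &Pending,
    publish: impl Fn(&str, String), // e.g. a NATS publish(subject, payload)
    target: &str,
    payload: &str,
) -> oneshot::Receiver<String> {
    let correlation_id = Uuid::new_v4().to_string();
    let (tx, rx) = oneshot::channel();
    pending.lock().unwrap().insert(correlation_id.clone(), tx);
    // Subject: agent.route.<target_id>; the envelope carries the correlation_id.
    publish(
        &format!("agent.route.{target}"),
        format!(r#"{{"correlation_id":"{correlation_id}","payload":{payload:?}}}"#),
    );
    rx // await this to get the correlated reply
}

fn on_reply(pending: &Pending, correlation_id: &str, reply: String) {
    // The receiver echoed the correlation_id; complete the matching waiter.
    if let Some(tx) = pending.lock().unwrap().remove(correlation_id) {
        let _ = tx.send(reply);
    }
}
```

Fan-out to several agents is the same mechanism with one pending entry per target; replies reassemble as each correlation_id comes back.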

Broker abstraction

crates/broker exposes a Broker trait implemented by two backends:

  • NatsBroker — real NATS connection wrapped in a CircuitBreaker
  • LocalBroker — in-process tokio::mpsc for tests and offline mode

Switching between them is driven by config. The local broker matches NATS subject semantics (including . segments and > wildcards), which keeps the test surface identical to production.
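
For illustration, NATS-style subject matching fits in a few lines; whether LocalBroker implements it exactly this way is an assumption:

```rust
// NATS subject semantics: '.'-separated tokens, '*' matches exactly one
// token, '>' matches one or more trailing tokens.
fn subject_matches(pattern: &str, subject: &str) -> bool {
    let mut pat = pattern.split('.');
    let mut sub = subject.split('.');
    loop {
        match (pat.next(), sub.next()) {
            (None, None) => return true,         // both exhausted: match
            (Some(">"), Some(_)) => return true, // '>' consumes the rest
            (Some("*"), Some(_)) => continue,    // '*' matches one token
            (Some(p), Some(s)) if p == s => continue,
            _ => return false,
        }
    }
}

fn main() {
    assert!(subject_matches("plugin.inbound.*", "plugin.inbound.whatsapp"));
    assert!(subject_matches("plugin.inbound.>", "plugin.inbound.telegram.sales_bot"));
    assert!(!subject_matches("plugin.inbound.*", "plugin.outbound.whatsapp"));
}
```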

Disk queue

When a publish to NATS fails — circuit breaker open, connection lost, transient 5xx — the event is persisted to the disk queue instead of being dropped.

| Property | Value |
|---|---|
| Storage | SQLite |
| Default path | ./data/ (configurable via broker.persistence.path) |
| Tables | pending_events, dead_letters |
| Event format | JSON serialization of Event { id, topic, payload, enqueued_at, attempts } |
| Drain order | FIFO by enqueued_at |
| Batch size | up to 100 per drain() call |
| Max attempts before DLQ | 3 (DEFAULT_MAX_ATTEMPTS) |

flowchart LR
    PUB[publish] --> OK{NATS up?}
    OK -->|yes| NATS[(NATS)]
    OK -->|no| ENQ[disk_queue.enqueue]
    ENQ --> SQLITE[(pending_events)]
    RECON[NATS reconnect] --> DRAIN[disk_queue.drain]
    SQLITE --> DRAIN
    DRAIN --> NATS
    DRAIN -.->|3 attempts failed| DLQ[(dead_letters)]
    DRAIN -.->|deserialization error| DLQ

Drain on reconnect

When NatsBroker detects reconnection, it calls disk_queue.drain():

  1. Read up to 100 oldest events from pending_events
  2. Republish each to NATS
  3. On success: delete row
  4. On failure: increment attempts, leave row in place
  5. Once attempts >= 3: move to dead_letters
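
The same cycle as a hedged sketch — storage accessors are passed in as closures because the exact disk_queue.rs internals aren't reproduced here:

```rust
const MAX_ATTEMPTS: u32 = 3; // DEFAULT_MAX_ATTEMPTS
const BATCH: usize = 100;

struct PendingEvent { id: i64, topic: String, payload: Vec<u8>, attempts: u32 }

fn drain(
    read_batch: impl Fn(usize) -> Vec<PendingEvent>, // oldest-first (FIFO)
    publish: impl Fn(&str, &[u8]) -> Result<(), ()>, // NATS publish
    delete: impl Fn(i64),                            // drop the pending row
    bump_attempts: impl Fn(i64),                     // leave row, count the try
    dead_letter: impl Fn(PendingEvent),              // move to dead_letters
) {
    for ev in read_batch(BATCH) {
        match publish(&ev.topic, &ev.payload) {
            Ok(()) => delete(ev.id),
            Err(()) if ev.attempts + 1 >= MAX_ATTEMPTS => dead_letter(ev),
            Err(()) => bump_attempts(ev.id), // retried on the next cycle
        }
    }
}
```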

Dead-letter queue (DLQ)

Events that exhaust retries, or fail to deserialize at all, land in dead_letters. They're not silently discarded — CLI lets you inspect and replay them.

agent dlq list              # show all dead events
agent dlq replay <event_id> # move one back to pending_events
agent dlq purge             # drop the table (destructive!)

Replay moves the entry back to pending_events; the next drain cycle retries it with attempts reset.

Backpressure

Two independent mechanisms:

  • Local broker channels are 256-capacity tokio::mpsc per subscriber. If a subscriber is slow, overflowing events are dropped with a slow consumer warning, but the subscription stays alive.
  • Disk queue applies proportional sleep at >50% capacity (scaled from 0 ms up to MAX_BACKPRESSURE_MS = 500 ms). At the hard cap it additionally drops the oldest event and sleeps 500 ms — an intentional "shed load, don't block the producer forever" stance.

The disk queue's backpressure only matters when NATS is down for a long time and the producer is faster than real time. In normal operation the disk queue stays near-empty.
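
A sketch of the proportional sleep, assuming linear scaling from 0 ms at 50% fill up to the 500 ms cap (the exact curve is an implementation detail, and the hard-cap drop-oldest branch is omitted):

```rust
use std::time::Duration;

const MAX_BACKPRESSURE_MS: u64 = 500;

fn backpressure_delay(len: usize, cap: usize) -> Duration {
    let fill = len as f64 / cap as f64;
    if fill <= 0.5 {
        return Duration::ZERO; // below 50%: no throttling
    }
    // Map (0.5, 1.0] onto (0, MAX_BACKPRESSURE_MS].
    let scaled = ((fill - 0.5) / 0.5) * MAX_BACKPRESSURE_MS as f64;
    Duration::from_millis(scaled.min(MAX_BACKPRESSURE_MS as f64) as u64)
}
```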

Local fallback

When NATS is unreachable or the circuit breaker on the publish path is Open, the runtime degrades gracefully:

  • Inbound events from local plugins (e.g. a Telegram webhook fielded in-process) go through LocalBroker and reach agents immediately
  • Outbound events that target a plugin hosted in the same process (which is every shipped plugin) also go through LocalBroker
  • Anything that would have crossed a real NATS hop sits in the disk queue until reconnection

In practice, single-machine deployments keep working even with no NATS at all — the disk queue and the local broker together are sufficient for one process. NATS starts earning its keep the moment you scale to multiple processes, machines, or regions.

Fault tolerance

Every external call goes through a CircuitBreaker. Every retryable error has a bounded retry policy with jittered exponential backoff. Every event survives a NATS outage. A second process cannot race the first onto the same bus.

This page collects all of those guardrails in one place.

CircuitBreaker

Source: crates/resilience/src/lib.rs.

A three-state machine wrapped around any fallible external call. Once a dependency is failing, the breaker fails fast instead of piling up calls against a dead endpoint; periodic probes let it recover without human intervention.

stateDiagram-v2
    [*] --> Closed
    Closed --> Open: 5 consecutive failures
    Open --> HalfOpen: backoff elapsed
    HalfOpen --> Closed: 2 consecutive successes
    HalfOpen --> Open: any failure<br/>(backoff × 2, capped)

Defaults

| Field | Default | Meaning |
|---|---|---|
| failure_threshold | 5 | consecutive failures before opening |
| success_threshold | 2 | consecutive successes in HalfOpen before closing |
| initial_backoff | 10 s | wait time on first open |
| max_backoff | 120 s | cap on exponential backoff |

Where it wraps

  • LLM calls — one circuit per provider (MiniMax, Anthropic, OpenAI-compat, Gemini). A provider outage doesn't cascade to others.
  • NATS publish — one circuit over the broker. When it opens the disk queue absorbs writes.
  • CDP commands — one circuit per browser session. A dead Chrome doesn't freeze the agent loop.
  • Extension stdio — implicit via the StdioRuntime lifecycle (crashed child → respawn, bounded).

Signals

CircuitBreaker exposes the usual methods (allow(), on_success(), on_failure()) plus two explicit overrides:

  • trip() — force Open from outside (e.g. a health check decided the dep is down before a call fails)
  • reset() — force Closed (e.g. the operator just restored the dep and doesn't want to wait for the probe window)
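
A typical call-site wrapping looks like the sketch below; the stand-in struct only models the three methods named above, and the real breaker lives in crates/resilience:

```rust
// Minimal stand-in so the sketch compiles on its own.
struct CircuitBreaker;
impl CircuitBreaker {
    fn allow(&self) -> bool { true } // Closed / HalfOpen probe permitted?
    fn on_success(&self) {}
    fn on_failure(&self) {}
}

enum GuardError<E> { Open, Inner(E) }

// Check, call, report — the shape every wrapped dependency follows.
async fn guarded_call<T, E>(
    breaker: &CircuitBreaker,
    call: impl std::future::Future<Output = Result<T, E>>,
) -> Result<T, GuardError<E>> {
    if !breaker.allow() {
        return Err(GuardError::Open); // Open: fail fast instead of piling on
    }
    match call.await {
        Ok(v) => { breaker.on_success(); Ok(v) }
        Err(e) => { breaker.on_failure(); Err(GuardError::Inner(e)) }
    }
}
```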

Retry policies

Retries live at a layer above the circuit breaker — they handle transient failures (429, 5xx, network blips) that don't warrant flipping the breaker. Every retry policy uses jittered exponential backoff to avoid thundering-herd reconnection storms.

| Component | Max attempts | Backoff range |
|---|---|---|
| LLM 429 (rate limit) | 5 | 1 s → 60 s, jittered exponential |
| LLM 5xx (server error) | 3 | 1 s → 30 s, jittered exponential |
| NATS publish drain | 3 per event | disk queue drain cycle |
| CDP | via circuit only | backoff = circuit's open window |

These live in crates/llm/src/retry.rs (LLM) and crates/broker/src/disk_queue.rs (NATS drain).

Error classification

Retries only trigger on retryable errors. A 4xx other than 429 — missing key, invalid model, malformed request — fails fast. The rationale: retrying a misconfigured call wastes budget and still fails. Fail loudly, fix the config.
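
Putting the two ideas together — bounded attempts, full-jitter exponential backoff, retry only on retryable statuses — a sketch whose constants mirror the 429 row above (the retry.rs internals may differ):

```rust
use rand::Rng;
use std::time::Duration;

// Retryable: 429 and any 5xx. Other 4xx fail fast.
fn is_retryable(status: u16) -> bool {
    status == 429 || (500..600).contains(&status)
}

// Full jitter: pick uniformly in [0, min(base * 2^attempt, cap)], which
// spreads retries out and avoids thundering herds.
fn backoff(attempt: u32, base: Duration, cap: Duration) -> Duration {
    let exp = base.saturating_mul(2u32.saturating_pow(attempt));
    let capped = exp.min(cap);
    Duration::from_millis(rand::thread_rng().gen_range(0..=capped.as_millis() as u64))
}

fn main() {
    // 429 policy shape: 5 attempts, 1 s base, 60 s cap.
    for attempt in 0..5 {
        if !is_retryable(429) { break; }
        let delay = backoff(attempt, Duration::from_secs(1), Duration::from_secs(60));
        println!("attempt {attempt}: sleep {delay:?} before retrying");
    }
}
```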

No message drop

The broker layer guarantees at-least-once delivery for publishes that reach the runtime:

flowchart LR
    P[publisher] --> TRY{NATS healthy?}
    TRY -->|yes| NATS[(NATS)]
    TRY -->|no| DQ[(disk queue)]
    DQ --> WAIT{reconnect?}
    WAIT -->|yes| DRAIN[drain FIFO]
    DRAIN --> NATS
    DQ -->|3 failed attempts| DLQ[(dead letters)]
    DLQ --> CLI[agent dlq replay]

In the absolute worst case — NATS down forever, disk full — the disk queue starts shedding oldest events at its hard cap, but the producer never crashes and never silently drops.

Single-instance lockfile

A second agent process pointed at the same data directory would double-subscribe every topic, delivering every message twice. To prevent that, boot acquires a lockfile and kicks out any stale or racing instance.

Source: src/main.rs::acquire_single_instance_lock.

flowchart TD
    START[agent boot] --> READ[read data/agent.lock]
    READ --> EXIST{file exists?}
    EXIST -->|no| WRITE[write our PID]
    EXIST -->|yes| PID[parse PID]
    PID --> ALIVE{/proc/PID/ exists?}
    ALIVE -->|no| WRITE
    ALIVE -->|yes| SIGTERM[send SIGTERM]
    SIGTERM --> WAIT[wait up to 5 s<br/>50 × 100 ms polls]
    WAIT --> DEAD{process gone?}
    DEAD -->|yes| WRITE
    DEAD -->|no| SIGKILL[send SIGKILL]
    SIGKILL --> WRITE
    WRITE --> LOCK[RAII handle alive]

The SingleInstanceLock RAII struct stores our own PID. On drop it only removes the lockfile if the stored PID still matches the current one — so a takeover by a third process doesn't let the original owner wipe the lock on its way out.
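
A sketch of that drop guard; field names are illustrative:

```rust
use std::fs;
use std::path::PathBuf;

struct SingleInstanceLock {
    path: PathBuf, // e.g. data/agent.lock
    our_pid: u32,  // PID written at acquire time
}

impl Drop for SingleInstanceLock {
    fn drop(&mut self) {
        // Only remove the lockfile if we still own it — a third process
        // may have taken over and written its own PID in the meantime.
        if let Ok(contents) = fs::read_to_string(&self.path) {
            if contents.trim().parse::<u32>() == Ok(self.our_pid) {
                let _ = fs::remove_file(&self.path);
            }
        }
    }
}
```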

Graceful shutdown

See Agent runtime — Graceful shutdown for the ordered teardown sequence. Key points from a fault-tolerance angle:

  • Dream-sweep loops and MCP sessions get explicit grace windows so in-flight work doesn't produce partial state
  • Plugin intake is stopped before agent runtimes — the runtimes drain anything already in their mailboxes before exiting
  • If the disk queue has unflushed events on SIGTERM, they survive to the next boot

Operator guardrails

Beyond the automatic mechanisms:

  • Skill gating — an extension declaring requires.env = ["FOO"] is skipped at discovery when FOO is unset, instead of being registered and failing on every invocation. See Extensions — manifest.
  • Inbound filter — events with neither text nor media (receipts, typing indicators, reactions-only) are dropped before they reach the LLM, saving cost and avoiding noisy turns.
  • Health endpoints — :8080/ready and :8080/live expose lifecycle state for k8s liveness / readiness probes.
  • Metrics — :9090/metrics (Prometheus) exposes everything from inbound event counts to circuit breaker state; see Metrics.

Transcripts (FTS + redaction)

Per-session JSONL transcripts under agents.<id>.transcripts_dir are the canonical record of every turn. Two optional layers wrap that record:

  • FTS5 index — a SQLite virtual table that mirrors transcript content for MATCH queries. Backs the session_logs tool's search action when present.
  • Redaction — a regex pre-processor that rewrites entry content before it ever reaches disk. Patterns target common credentials and home-directory paths.

Source: crates/core/src/agent/transcripts_index.rs, crates/core/src/agent/redaction.rs, crates/core/src/agent/transcripts.rs.

Configuration

config/transcripts.yaml (optional; absent → defaults below):

fts:
  enabled: true                       # default
  db_path: ./data/transcripts.db      # default

redaction:
  enabled: false                      # default — opt in
  use_builtins: true                  # only relevant if enabled
  extra_patterns:
    - { regex: "TENANT-[0-9]+", label: "tenant_id" }

JSONL is the source of truth. The FTS index is derivable; if the DB is corrupted or deleted, agent transcripts reindex (planned) can rebuild it from disk.

FTS schema

CREATE VIRTUAL TABLE transcripts_fts USING fts5(
    content,
    agent_id        UNINDEXED,
    session_id      UNINDEXED,
    timestamp_unix  UNINDEXED,
    role            UNINDEXED,
    source_plugin   UNINDEXED,
    tokenize = 'unicode61 remove_diacritics 2'
);

The DB is shared across agents; isolation is enforced at query time by WHERE agent_id = ?. User queries are escaped as a single FTS5 phrase so operators (OR, NOT, :) in the user input never reach the engine as syntax.
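
FTS5 phrase escaping is small enough to show inline: wrap the input in double quotes and double any embedded quotes. This is standard FTS5 syntax; the exact helper in transcripts_index.rs is not reproduced here:

```rust
// Escape arbitrary user input as a single FTS5 phrase so OR / NOT / ':'
// are matched literally, never parsed as operators.
fn escape_fts_phrase(user_query: &str) -> String {
    format!("\"{}\"", user_query.replace('"', "\"\""))
}

fn main() {
    // `reembolso OR 1=1` stays a literal phrase, not a boolean query.
    println!("{}", escape_fts_phrase("reembolso OR 1=1"));
    assert_eq!(escape_fts_phrase(r#"a"b"#), r#""a""b""#);
}
```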

session_logs integration

When the index is available, the search action returns:

{
  "ok": true,
  "query": "reembolso",
  "backend": "fts5",
  "count": 3,
  "hits": [
    {
      "session_id": "…",
      "timestamp": "2026-04-25T18:00:00Z",
      "role": "user",
      "source_plugin": "wa",
      "preview": "...quería un [reembolso] del pedido..."
    }
  ]
}

If the index is None (FTS disabled or init failed), the action falls back to the legacy substring scan over JSONL. The shape is the same minus backend: "fts5".

Redaction patterns

| Label | Detects | Example match |
|---|---|---|
| bearer_jwt | Bearer eyJ… JWT triplets | Bearer eyJhbGc.eyJzdWI.dGVzdA |
| anthropic_key | Anthropic API keys | sk-ant-abcdef… |
| openai_key | sk- prefix API keys (OpenAI etc.) | sk-abc123… |
| aws_access_key | AWS access key id | AKIAIOSFODNN7EXAMPLE |
| hex_token_32 | Long hex strings | 5d41402abc4b2a76b9719d911017c592 |
| home_path | Linux/macOS home dirs | /home/familia, /Users/alice |

Each match is replaced with [REDACTED:<label>]. Patterns run in the order above, so more specific shapes (Bearer JWT, Anthropic) win over generic catch-alls below.

A 40-char base64 pattern targeting AWS secret keys was deliberately omitted — it produces too many false positives on legitimate hashes and opaque ids. Operators who need it can add it scoped via extra_patterns.
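
A sketch of the ordered pass, with simplified stand-in regexes (the real built-ins are stricter):

```rust
use regex::Regex;

// First-listed (most specific) patterns win, because later patterns
// only ever see already-redacted text.
fn redact(patterns: &[(Regex, &str)], mut content: String) -> String {
    for (re, label) in patterns {
        content = re
            .replace_all(&content, format!("[REDACTED:{label}]").as_str())
            .into_owned();
    }
    content
}

fn main() {
    let patterns = vec![
        (Regex::new(r"sk-ant-[A-Za-z0-9_-]{8,}").unwrap(), "anthropic_key"),
        (Regex::new(r"sk-[A-Za-z0-9_-]{8,}").unwrap(), "openai_key"),
    ];
    let out = redact(&patterns, "key: sk-ant-abcdefgh123".into());
    // The more specific anthropic_key pattern matched first.
    assert_eq!(out, "key: [REDACTED:anthropic_key]");
}
```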

Custom patterns

redaction:
  enabled: true
  extra_patterns:
    - { regex: "TENANT-[0-9]+",   label: "tenant_id" }
    - { regex: "internal\\.acme", label: "internal_host" }

Custom patterns run after built-ins. Invalid regex aborts boot with a message naming the offending index and label.

What redaction does not do

  • It does not maintain a reverse map. Once content is redacted on disk the original is gone — by design. A reversible mapping would recreate the leak surface this feature is meant to close.
  • It does not rewrite previously-written JSONL files. New entries redact going forward; historical content stays as-is.
  • It does not redact tracing logs — that's a separate concern.
  • The FTS index stores the redacted text, so search results never surface the original secrets either.

Operational notes

  • The FTS index uses WAL journaling and capped pool size of 4 — it shares the same idiom as the long-term memory DB.
  • Insert is best-effort. If an FTS write fails (disk full, lock contention) the tool logs at warn and the JSONL append still succeeds. The source of truth is never compromised.
  • Boot logs include transcripts FTS index ready (or the warn that it fell back) and transcripts redaction active when the redactor has any rule loaded.

nexo-rs vs OpenClaw

OpenClaw is the closest reference point in the multi-channel-agent-gateway space. nexo-rs mined OpenClaw's plugin SDK, channel boundaries, and skills layout for ideas, then rebuilt the runtime in Rust with stricter operational guarantees. This page lays out the differences honestly — including where OpenClaw still has the edge.

Substrate

| Dimension | OpenClaw | nexo-rs |
|---|---|---|
| Language | TypeScript | Rust |
| Runtime | Node 22+ | none — single statically-linked binary |
| Install footprint | pnpm install over ~42 runtime deps + 24 dev deps | one binary, 34 MB built (29 MB stripped, 13 MB gzipped) |
| Cold start | node boot + module resolution | direct exec — sub-100 ms to agent serve |
| Mobile target | feasible with Termux + Node | first-class on Termux, no root, no Docker |
| Memory safety | runtime errors | Rust ownership: data races, use-after-free, and null derefs refused at compile time |

The single-binary shape is the reason nexo-rs runs comfortably on a phone (Termux) and on a fresh VPS without a Node ecosystem underneath. cargo build --release and ship target/release/agent — that is the whole deliverable.

Process & messaging

| Dimension | OpenClaw | nexo-rs |
|---|---|---|
| Process model | single Node process | multi-process via NATS, in-process LocalBroker fallback when NATS is offline |
| Subject namespace | n/a (in-process buses) | plugin.inbound.<plugin>[.instance] / plugin.outbound.… / agent.route.<id> / taskflow.resume |
| Fault tolerance | best-effort | NatsBroker wraps every publish in a CircuitBreaker; failures spill to a SQLite-backed disk queue and drain on reconnect |
| At-least-once delivery | n/a | drain path documented as at-least-once; consumers dedupe by event.id |
| DLQ | n/a | failed events land in dead_letters after 3 attempts; agent dlq list/replay/purge from the CLI |
| Subscription survival | restart | NATS subscriptions auto-resubscribe on reconnect with backoff (250 ms → 10 s) |

Hot reload

| Dimension | OpenClaw | nexo-rs |
|---|---|---|
| Config change | restart | agent reload (or file-watcher trigger) swaps a RuntimeSnapshot via ArcSwap — in-flight turns finish on the old snapshot, the next event picks up the new one |
| Watched files | — | agents.yaml, agents.d/*.yaml, llm.yaml (extra paths via runtime.yaml) |
| Per-agent reload channel | — | mpsc to each AgentRuntime, the coordinator drains acks to confirm |

Per-agent capability sandbox

OpenClaw's plugin allowlist is global to the gateway. nexo-rs pushes the allowlist down to the agent and the binding (the inbound channel surface):

agents:
  - id: kate
    plugins: [whatsapp, telegram, browser, taskflow]
    allowed_tools: ["whatsapp_*", "browser_navigate", "memory_*"]
    outbound_allowlist:
      whatsapp: ["+57…"]
      telegram: [123456789]
    skill_overrides:
      ffmpeg-tools: warn
    accept_delegates_from: ["ana"]
    inbound_bindings:
      - plugin: whatsapp
        instance: kate_wa
        # per-binding overrides for the same agent
        allowed_tools: ["whatsapp_*"]
        outbound_allowlist:
          whatsapp: ["+57…"]

What that buys:

  • An LLM running under kate cannot send messages to a number not in outbound_allowlist, even if a prompt injection asks it to.
  • Two channels exposed to the same agent (sales WA, private TG) carry different capability surfaces — the sales binding doesn't get the private one's tool set.
  • Skill modes (strict / warn / disable) are decided per agent, with explicit requires.bin_versions semver constraints (probed at boot, process-cached).

Secrets

| Dimension | OpenClaw | nexo-rs |
|---|---|---|
| Credential resolution | env vars | agents.<id>.credentials block per channel; resolver maps to per-channel stores (gauntlet validates at boot) |
| 1Password | n/a | op CLI extension + inject_template tool: render {{ op://Vault/Item/field }} and pipe to allowlisted commands without exposing the secret |
| Audit log | n/a | append-only JSONL at OP_AUDIT_LOG_PATH: every read_secret and inject_template records agent_id, session_id, fingerprint, reveal_allowed — never the value |
| Capability inventory | n/a | agent doctor capabilities [--json] enumerates every write/reveal env toggle (OP_ALLOW_REVEAL, CLOUDFLARE_*, DOCKER_API_*, PROXMOX_*, SSH_EXEC_*) with state + risk |

Transcripts

OpenClaw stores transcripts as JSONL and greps them. nexo-rs keeps the JSONL (source of truth) and adds:

  • SQLite FTS5 index (data/transcripts.db) — write-through from TranscriptWriter::append_entry. The session_logs search agent tool uses MATCH queries with phrase-escaped user input so operator strings can't inject FTS operators.
  • Pre-persistence redactor (opt-in) — regex pass over content before write. 6 built-in patterns (Bearer JWT, sk-…, sk-ant-…, AWS access keys, 64+ hex tokens, home paths) plus operator-defined extra_patterns. JSONL and FTS receive the same redacted text.
  • Atomic header writes — OpenOptions::create_new(true) so 16 concurrent first-appends to the same session result in exactly one header line.

Durable workflows

OpenClaw doesn't ship a durable-flow primitive. nexo-rs has TaskFlow:

  • taskflow LLM tool with actions start | status | advance | wait | finish | fail | cancel | list_mine.
  • Three wait conditions: Timer { at }, ExternalEvent { topic, correlation_id }, Manual.
  • Single global WaitEngine ticks every 5 s (configurable), resumes flows whose deadlines have passed.
  • taskflow.resume NATS subject lets external services wake external_event flows: publish {flow_id, topic, correlation_id, payload} and the bridge calls try_resume_external.
  • agent flow list/show/cancel/resume from the CLI.
  • Guardrails: timer_max_horizon (default 30 days) blocks unbounded waits; non-empty topic + correlation_id required for external_event.

LLM auth

| Dimension | OpenClaw | nexo-rs |
|---|---|---|
| Anthropic | API key | API key and claude_subscription OAuth PKCE flow — uses the operator's Claude Code subscription quota instead of API billing |
| MiniMax | API key | API key and Token Plan / Coding Plan OAuth bundle (api_flavor: anthropic_messages) |
| OpenAI-compat | API key | API key + DeepSeek wired out of the box (OpenAI-compat reuse) |
| Gemini | not in core | first-class client |

MCP

OpenClaw supports MCP as a client. nexo-rs is both:

  • Client — stdio and HTTP transports, full tool / resource / prompt catalog, tools/list_changed hot-reload.
  • Server — agent mcp-server exposes the agent's own tools (filtered by allowlist) over stdio for Claude Desktop / Cursor / any MCP-aware host. Proxy tools (ext_*, mcp_*) are unconditionally hidden so the agent doesn't become an open relay.

Build size

target/release/agent             34 MB
target/release/agent (stripped)  29 MB
target/release/agent (.gz -9)    13 MB

For comparison, an OpenClaw install (Node + node_modules after pnpm install) sits in the hundreds of megabytes — most of it needed at runtime, not just build-time.

Where OpenClaw is still ahead

Honest list:

  • Installer & onboarding flow — OpenClaw's openclaw doctor family and the bundled installer give a smoother first-run UX than nexo-rs's agent setup wizard, especially for non-Rust developers.
  • TS familiarity — the JS / TS audience for plugin authors is larger than the Rust audience; if your team writes mostly TypeScript, contributing back to OpenClaw is faster.
  • Track record — OpenClaw has a longer release history, more maintainers, and more shipped extensions in the wild.
  • Apps surface — OpenClaw ships iOS / Android / macOS companion apps; nexo-rs only ships the daemon and the loopback web admin (admin-ui Phase A0–A11 still in progress).

Summary

If you want operational guarantees (single binary, fault-tolerant broker, per-agent sandbox, durable workflows, secrets audit) and you're OK with Rust, nexo-rs.

If you want fast onboarding, a TS plugin ecosystem, and the OpenClaw apps, OpenClaw.

The two projects share enough vocabulary that moving an extension between them is mostly a port, not a rewrite. The plugin SDK shape (stdio-spoken JSON-RPC + a plugin.toml manifest) is deliberately compatible.

Driver subsystem (Phase 67)

The driver subsystem turns the nexo-rs agent runtime into the "human in the loop" for another agent — typically the Claude Code CLI. It runs a goal-bound experiment: spawn the external CLI, watch its tool-use stream, decide allow/deny on every action, feed back acceptance failures, and stop only when the CLI claims "done" AND objective verification passes.

This page describes the architectural shape; concrete impl details live with each sub-phase.

Why

Claude Code (or any other local CLI agent) is excellent at writing code, but it sometimes:

  • over-claims completion — says "done" when tests are red;
  • proposes destructive shell commands when stuck;
  • forgets which approaches it already tried and failed.

A second agent — driven by nexo-rs, backed by a different LLM (MiniMax M2.5), with persistent memory — closes those gaps.

Architecture

nexo-rs daemon
│
├─ "claude-driver" agent
│   ├─ LLM: MiniMax M2.5
│   ├─ memory: short_term + long_term + vector + transcripts
│   └─ skills: claude_cli, git_checkpoint, test_runner,
│              acceptance_eval, escalate
│
└─ MCP server (in-process)
    └─ tool: permission_prompt(tool_name, input) → {allow|deny, message}

claude  (subprocess, one per turn)
└─ claude --resume <id>
          --output-format stream-json
          --permission-prompt-tool mcp__nexo-driver__permission_prompt
          --add-dir <worktree>
          --allowedTools "Read,Grep,Glob,LS,WebFetch"
          -p "<turn prompt>"

Termination model

Claude says "done" — driver does NOT trust it. Driver runs the goal's acceptance criteria (cargo build, cargo test, cargo clippy, PHASES marker, custom verifiers). Only when all pass is the goal declared Done. Otherwise the failures are folded into the next turn's prompt: "you said done, but here's what still fails — fix it".

The driver also stops on budget exhaustion: max turns, wall-time, tokens, or consecutive denies. On exhaustion the driver escalates to the operator (WhatsApp / Telegram via existing channel plugins) with a state dump.

Foundational types — nexo-driver-types

The contract — AgentHarness trait + Goal / Attempt / Decision / AcceptanceCriterion / BudgetGuards types — lives in the leaf crate nexo-driver-types. Every value is serde-serializable so the contract can travel through NATS, get re-imported by extensions, and power admin-ui dashboards without dragging in the daemon.

How a turn flows (Phase 67.1)

#![allow(unused)]
fn main() {
    use std::time::Duration;
    use nexo_driver_claude::{ClaudeCommand, spawn_turn};
    use nexo_driver_types::CancellationToken;

    async fn doc(session_id: String) -> anyhow::Result<()> {
        let cmd = ClaudeCommand::discover("Implementa Phase 26.z")?
            .resume(session_id)
            .allowed_tools(["Read", "Grep", "Glob", "LS"])
            .permission_prompt_tool("mcp__nexo-driver__permission_prompt")
            .cwd("/tmp/claude-runs/26-z");

        let cancel = CancellationToken::new();
        let mut turn =
            spawn_turn(cmd, &cancel, Duration::from_secs(600), Duration::from_secs(1)).await?;

        while let Some(ev) = turn.next_event().await? {
            // dispatch on ev (Assistant tool_use → permission_prompt; Result → done check)
            let _ = ev;
        }
        let _exit = turn.shutdown().await?;
        Ok(())
    }
}

next_event cooperatively races three signals via tokio::select!: the cancel token, the per-turn deadline, and the JSONL stream. Errors land as Cancelled, Timeout, ParseLine, etc. Cleanup always goes through shutdown(); ChildHandle::Drop is the panic safety net.

Persistence (Phase 67.2)

SqliteBindingStore keeps (goal_id → claude session_id) plus timestamps in a single claude_session_bindings table. Two filters are applied on get:

  • idle TTL — last_active_at must be within idle_ttl of now;
  • max age — created_at + max_age must be in the future.

Either filter can be None (no filter) or Duration::ZERO (alias).
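
As a sketch, with illustrative field names:

```rust
use std::time::{Duration, SystemTime};

struct Binding { created_at: SystemTime, last_active_at: SystemTime }

// `None` disables a filter; `Duration::ZERO` is treated the same way.
fn is_fresh(b: &Binding, idle_ttl: Option<Duration>, max_age: Option<Duration>) -> bool {
    let now = SystemTime::now();
    let idle_ok = match idle_ttl {
        Some(ttl) if !ttl.is_zero() => {
            now.duration_since(b.last_active_at).map_or(false, |d| d <= ttl)
        }
        _ => true, // no idle filter
    };
    let age_ok = match max_age {
        Some(max) if !max.is_zero() => b.created_at + max > now,
        _ => true, // no age filter
    };
    idle_ok && age_ok
}
```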

Three soft-delete-friendly operations live alongside clear:

  • mark_invalid(goal_id) flips last_session_invalid = 1 instead of deleting the row. Phase 67.8 (replay-policy) calls this when Claude rejects a session id mid-turn; the row stays for forensics.
  • touch(goal_id) bumps last_active_at only. Driver loop calls it per observed event so the idle filter doesn't need a structural upsert per turn.
  • purge_older_than(cutoff) reaps rows the operator no longer cares about. Phase 67.6 (worktree janitor) calls it nightly.

Schema migrations: PRAGMA user_version = 1 is the sentinel; every open() runs CREATE TABLE/INDEX IF NOT EXISTS. Future v2 will extend that helper.

Permission flow (Phase 67.3)

Every Claude tool call that isn't on the static allowlist (Read,Grep,Glob,LS,WebFetch) goes through the MCP server before execution:

Claude Code ─── tools/call mcp__nexo-driver__permission_prompt ───▶
                                                                    │
                                                          stdio JSON-RPC
                                                                    │
                                                                    ▼
                                              nexo-driver-permission-mcp (child)
                                                                    │
                                                            calls PermissionDecider
                                                                    │
                                                                    ▼
                                                     {behavior: allow|deny, ...}

PermissionMcpServer exposes one tool, permission_prompt. The in-process AllowSession cache keyed on (tool_name, hash(input)) short-circuits repeat calls (a Claude turn that re-reads the same file pays the decider once).

Outcomes Claude receives are always one of two shapes:

{ "behavior": "allow" }                   // optional updatedInput
{ "behavior": "deny", "message": "..." }

Internally the driver tracks five outcomes — AllowOnce, AllowSession{scope}, Deny, Unavailable, Cancelled — collapsing the last three to deny on the wire. Unavailable (timeout) is fail-closed by design.

Phase 67.3 ships the bin in placeholder modes (--allow-all for dev, --deny-all <reason> for shadow). Phase 67.4 will swap those flags for --socket <path> so the bin asks the daemon's LlmDecider (MiniMax + memory) for each decision.

Goal lifecycle (Phase 67.4)

nexo-driver run goal.yaml
        │
        ▼
DriverOrchestrator::run_goal
        │
        ├─ workspace_manager.ensure(&goal)        ─┐
        │                                          │
        ├─ write_mcp_config(workspace,             ├─ side-effects in
        │     bin_path, socket_path)               │   <workspace>/
        │                                          │
        ├─ DriverSocketServer (already running) ──┘
        │     spawned by builder, owned via JoinHandle
        │
        └─ for each turn:
             ├─ budget.is_exhausted? → BudgetExhausted{axis}
             ├─ AttemptStarted event
             ├─ run_attempt(ctx, params)
             │     spawn `claude --resume <id> ... --mcp-config ...`
             │     event-loop on stream-json
             │     binding_store.upsert(session_id)
             │     acceptance.evaluate(criteria, workspace)
             │     return AttemptResult { outcome }
             ├─ AttemptCompleted event
             └─ match outcome:
                Done            → break, GoalCompleted{Done}
                NeedsRetry{f}   → next turn with prior_failures
                Continue{...}   → next turn (e.g. session-invalid retry)
                Cancelled       → break
                BudgetExhausted → break
                Escalate{r}     → emit Escalate event, break

AttemptOutcome::Continue covers two cases the loop treats the same: the stream ended without Result::Success (Claude crashed early), and a session not found reply that triggered binding_store.mark_invalid so the next turn starts fresh.

NATS subjects emitted (when feature = "nats" and emit_nats_events: true):

  • agent.driver.goal.{started,completed}
  • agent.driver.attempt.{started,completed}
  • agent.driver.decision (Phase 67.7 will populate when LlmDecider records its rationale)
  • agent.driver.acceptance
  • agent.driver.budget.exhausted
  • agent.driver.escalate
  • agent.driver.replay (Phase 67.8 — replay-policy verdict)
  • agent.driver.compact (Phase 67.9 — compact-policy scheduled a /compact <focus> turn)

Compact policy (Phase 67.9)

Long agentic runs let Claude's context grow without bound. The orchestrator runs a CompactPolicy after every successful work turn: when running tokens cross threshold * context_window, the next iteration is rewritten as a /compact <focus> slash command turn so Claude Code shrinks its own context before the next work turn. Compact turns absorb token usage but do not bump the goal's turn counter, so they don't burn the budget. min_turns_between_compacts prevents back-to-back compacts. Set context_window: 0 (or enabled: false) in compact_policy: to disable.
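
The decision reduces to a small predicate; field names below are assumptions based on this description:

```rust
struct CompactPolicy {
    enabled: bool,
    threshold: f64,                  // fraction of the context window
    context_window: u64,             // 0 disables, like enabled: false
    min_turns_between_compacts: u32,
}

impl CompactPolicy {
    // Evaluated after every successful work turn.
    fn should_compact(&self, running_tokens: u64, turns_since_last: u32) -> bool {
        self.enabled
            && self.context_window > 0
            && turns_since_last >= self.min_turns_between_compacts
            && (running_tokens as f64) > self.threshold * self.context_window as f64
    }
}
```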

Sub-phases

| Phase | What | Status |
|---|---|---|
| 67.0 | AgentHarness trait + types | |
| 67.1 | claude_cli skill (spawn + stream-json + resume) | |
| 67.2 | Session-binding store (SQLite) | |
| 67.3 | MCP permission_prompt in-process | |
| 67.4 | Driver agent loop + budget guards | |
| 67.5 | Acceptance evaluator | |
| 67.6 | Git worktree sandboxing + per-turn checkpoint | |
| 67.7 | Semantic decision memory | |
| 67.8 | Replay policy (resume after a mid-turn crash) | |
| 67.9 | Opportunistic compact | |
| 67.10 | Escalation to WhatsApp / Telegram | |
| 67.11 | Shadow mode (calibration) | |
| 67.12 | Parallel multi-goal | |
| 67.13 | Cost dashboard + admin-ui A4 tile | |

See also

  • crates/driver-types/README.md — contract surface and layering
  • proyecto/PHASES.md — Phase 67 sub-phase status of record
  • OpenClaw reference: research/src/agents/harness/types.ts
  • OpenClaw subprocess pattern: research/extensions/codex/src/app-server/transport-stdio.ts

Project tracker + multi-agent dispatch (Phase 67.A–H)

The project-tracker subsystem lets a nexo-rs agent answer "which phase is development at" through Telegram / WhatsApp / a shell, and lets it dispatch async programmer agents that ship phases on its behalf.

The implementation is layered:

| Layer | Crate | Responsibility |
|---|---|---|
| Project files | nexo-project-tracker | Parse PHASES.md + FOLLOWUPS.md, watch for changes, expose read tools. |
| Multi-agent state | nexo-agent-registry | DashMap + SQLite store of every in-flight goal, cap + queue + reattach. |
| Goal control | nexo-driver-loop | spawn_goal / pause_goal / resume_goal / cancel_goal per-goal. |
| Tool surface | nexo-dispatch-tools | program_phase, dispatch_followup, hook system, agent control + query, admin. |
| Capability gate | nexo-config + nexo-core | DispatchPolicy per agent / binding, ToolRegistry filter. |

Project tracker (Phase 67.A)

FsProjectTracker reads <root>/PHASES.md (required) and <root>/FOLLOWUPS.md (optional) at startup, caches parsed state behind a parking_lot RwLock with a 60 s TTL, and starts a notify watcher on the parent directory that invalidates the cache on Modify | Create | Remove events.

Read tools register through nexo_dispatch_tools::READ_TOOL_NAMES (project_status, project_phases_list, followup_detail, git_log_for_phase).

Set ${NEXO_PROJECT_ROOT} to point at a workspace other than the daemon's cwd.

Multi-agent registry (Phase 67.B)

AgentRegistry is the single source of truth for every goal the driver has admitted. Each entry holds an ArcSwap<AgentSnapshot> (turn N/M, last acceptance, last decision summary, diff_stat) so list_agents / agent_status readers never block writers.

  • admit(handle, enqueue) enforces the global cap. Beyond the cap, enqueue=true parks the goal as Queued; enqueue=false rejects.
  • release(goal_id, terminal) returns the next-up queued goal so the orchestrator can promote it via promote_queued once the worktree / binding is ready.
  • apply_attempt(AttemptResult) refreshes the live snapshot. Idempotent against out-of-order replay (lower turn_index ignored).
  • Reattach (Phase 67.B.4) walks the SQLite store at boot and rehydrates Running rows. With resume_running=false they flip to LostOnRestart and surface to the operator.

LogBuffer keeps a per-goal ring of recent driver events for the agent_logs_tail tool — bounded so a chatty goal cannot OOM the process.

Persistence wiring (Phase 71)

The bin reads agent_registry.store from config/project-tracker/project_tracker.yaml and opens SqliteAgentRegistryStore when the resolved path is non-empty. Env placeholders (${NEXO_AGENT_REGISTRY_DB:-./data/agents.db}) are expanded before the open. Path open failures fall back to MemoryAgentRegistryStore with a warn so a corrupt sqlite file never bricks boot.

When the registry is sqlite-backed and reattach_on_boot: true, the bin runs the reattach sweep with resume_running=false. Every prior-run Running row flips to LostOnRestart, and any notify_origin / notify_channel hook attached to that goal fires once with an [abandoned] summary so the originating chat learns the goal could not be resumed. Subprocess respawn is intentionally not attempted — restoring a Claude Code worktree the daemon no longer owns is unsafe to do silently and lives under Phase 67.C.1.

Shutdown drain (Phase 71.3)

On SIGTERM the bin runs nexo_dispatch_tools::drain_running_goals before plugin teardown so notify_origin reaches WhatsApp / Telegram while their adapters are still alive. Each Running goal's Cancelled hooks fire with a [shutdown] summary; per-hook dispatch is bounded by a 2 s timeout so a stuck publish cannot hold shutdown hostage. The row then flips to LostOnRestart so the next boot's reattach sweep does not re-fire the same notification.

[shutdown] daemon stopping — goal `<id>` was running and has
been marked abandoned. Re-dispatch with `program_phase
phase_id=<phase>` if you still need it.

SIGKILL still bypasses this — the boot-time reattach sweep is the safety net for that case.

Turn-level audit log (Phase 72)

Live state (AgentSnapshot) only carries the latest decision / diff / acceptance per goal. Once a turn rolls forward the previous turn's data is gone. To answer "what did the agent actually do across its 40 turns?" the runtime now writes a durable row per turn into a goal_turns table on the same agents.db:

goal_turns(
    goal_id      TEXT,
    turn_index   INTEGER,
    recorded_at  INTEGER,
    outcome      TEXT,        -- done | continue | needs_retry | …
    decision     TEXT,        -- last Decision rendered as
                              --   "<tool> (allow|deny:msg|observe:note) — rationale"
    summary      TEXT,        -- mirror of AgentSnapshot.last_progress_text
    diff_stat    TEXT,
    error        TEXT,        -- pre-rendered for needs_retry / escalate / budget
    raw_json     TEXT,        -- full AttemptResult payload
    PRIMARY KEY (goal_id, turn_index)
);

EventForwarder writes a row on every AttemptResult event, upsert-on-conflict so a replay can't dup history. The new chat tool agent_turns_tail goal_id=<uuid> [n=20] returns a markdown table of the last N rows (default 20, capped at 1000):

showing 20 of 40 turn(s) for `…`

| turn | outcome | decision | summary | error |
|---|---|---|---|---|
| 21 | continue | Edit (allow) — patch crate slack | wired Plugin trait | - |
| 22 | needs_retry | Bash (allow) — cargo build | … | E0432 in slack/src/lib.rs |
…

Best-effort writes: an append failure logs a warn but never blocks the driver loop. When the registry isn't sqlite-backed (memory fallback), the tool reports "set agent_registry.store in project_tracker.yaml" rather than silently returning empty.

Async dispatch (Phase 67.C + 67.E)

DriverOrchestrator::spawn_goal(self: Arc<Self>, goal) returns a tokio::task::JoinHandle so the calling tool returns the goal id instantly without waiting for the run to finish. Per-goal pause / cancel signals (watch<bool> and CancellationToken::child_token) let pause_agent / cancel_agent target one goal without taking down the rest of the orchestrator.

program_phase_dispatch is the heart of the dispatch surface: it reads the sub-phase out of PHASES.md, runs DispatchGate::check, constructs a Goal with the dispatcher / origin metadata, asks the registry for a slot, and either spawns the goal or returns Queued / Forbidden / NotFound. dispatch_followup is the mirror that pulls the description from a FOLLOWUPS.md item.

Capability gate (Phase 67.D)

DispatchPolicy { mode, max_concurrent_per_dispatcher, allowed_phase_ids, forbidden_phase_ids } lives on AgentConfig and (as Option<DispatchPolicy>) on InboundBinding. The per-binding override fully replaces the agent-level value so an operator can be precise per channel ("asistente is none everywhere except this Telegram chat where it is full").

DispatchGate::check short-circuits in this order:

  1. capability None → CapabilityNone (every kind).
  2. ReadOnly capability + write kind → CapabilityReadOnly.
  3. write + require_trusted + !sender_trusted → SenderNotTrusted. Read tools bypass the trust gate so list_agents stays open for unpaired senders.
  4. forbidden_phase_ids match → PhaseForbidden.
  5. non-empty allowed_phase_ids + no match → PhaseNotAllowed.
  6. dispatcher / sender / global caps. Global cap with queue_when_full=true is admitted; the orchestrator queues it. Without queue → GlobalCapReached.
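
The same order as straight-line code — verdict names come from the list above, everything else is illustrative, and step 6 folds the dispatcher / sender caps into one flag for brevity:

```rust
enum Capability { None, ReadOnly, Full }
enum Verdict {
    CapabilityNone, CapabilityReadOnly, SenderNotTrusted,
    PhaseForbidden, PhaseNotAllowed, GlobalCapReached,
    Queued, Admitted,
}

fn check(
    cap: Capability, is_write: bool, require_trusted: bool, sender_trusted: bool,
    phase_id: &str, forbidden: &[String], allowed: &[String],
    cap_reached: bool, queue_when_full: bool,
) -> Verdict {
    match cap {
        Capability::None => return Verdict::CapabilityNone,                     // 1
        Capability::ReadOnly if is_write => return Verdict::CapabilityReadOnly, // 2
        _ => {}
    }
    if is_write && require_trusted && !sender_trusted {
        return Verdict::SenderNotTrusted; // 3 — read tools bypass this gate
    }
    if forbidden.iter().any(|p| p == phase_id) {
        return Verdict::PhaseForbidden; // 4
    }
    if !allowed.is_empty() && !allowed.iter().any(|p| p == phase_id) {
        return Verdict::PhaseNotAllowed; // 5
    }
    if cap_reached { // 6 — dispatcher / sender / global caps
        return if queue_when_full { Verdict::Queued } else { Verdict::GlobalCapReached };
    }
    Verdict::Admitted
}
```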

ToolRegistry::apply_dispatch_capability(policy, is_admin) prunes the registry of dispatch tool names not allowed by the resolved policy. ToolRegistryCache::get_or_build_with_dispatch builds the per-binding filtered registry that respects both allowed_tools and dispatch_policy. Hot reload (Phase 18) constructs a fresh ToolRegistryCache per snapshot, so a new dispatch_policy lands on the next intake without restart; in-flight goals keep their pre-reload tool surface so a hot reload never preempts.

Completion hooks (Phase 67.F)

Each hook is (on: HookTrigger, action: HookAction, id). Triggers fire on Done | Failed | Cancelled | Progress { every_turns }. Actions:

  • notify_origin — publish a markdown summary to the chat that triggered the goal. No-op when origin.plugin == "console".
  • notify_channel { plugin, instance, recipient } — publish to an explicit channel different from the origin (escalate to ops).
  • dispatch_phase { phase_id, only_if } — chain another goal when only_if matches the firing transition. Implemented via a pluggable DispatchPhaseChainer so the runtime owns program_phase_dispatch plumbing.
  • nats_publish { subject } — JSON payload to a custom subject.
  • shell { cmd, timeout } — opt-in via allow_shell_hooks. Capability PROGRAM_PHASE_ALLOW_SHELL_HOOKS registered with the setup inventory so agent doctor capabilities flags it the moment the operator exports the env var. Receives NEXO_HOOK_GOAL_ID / PHASE_ID / TRANSITION / PAYLOAD_JSON env vars.

HookIdempotencyStore (SQLite) keeps (goal_id, transition, action_kind, action_id) UNIQUE so at-least-once NATS replay or a mid-hook restart cannot fire a hook twice.

HookRegistry (in-memory DashMap<GoalId, Vec<CompletionHook>>) backs add_hook / remove_hook / agent_hooks_list.

NATS subjects (Phase 67.H.2)

| Subject | Producer |
|---|---|
| agent.dispatch.spawned | program_phase_dispatch admitted |
| agent.dispatch.denied | DispatchGate::check denied |
| agent.tool.hook.dispatched | hook fired ok |
| agent.tool.hook.failed | hook attempt errored |
| agent.registry.snapshot.<goal_id> | per-goal periodic beacon |
| agent.driver.progress | every Nth completed work-turn |

Plus the existing Phase 67.0–67.9 subjects: agent.driver.{goal,attempt}.{started,completed}, agent.driver.{decision,acceptance,budget.exhausted,escalate,replay,compact}.

CLI (Phase 67.H.1)

nexo-driver-tools mirrors the chat tool surface for shell use:

nexo-driver-tools status [--phase <id> | --followups]
nexo-driver-tools dispatch <phase_id>
nexo-driver-tools agents list [--filter running|queued|...]
nexo-driver-tools agents show <goal_id>
nexo-driver-tools agents cancel <goal_id> [--reason "…"]

origin.plugin = "console" so notify_origin is a no-op (the operator sees stdout, not a chat reply).

Built-in registration (nexo daemon)

The default nexo agent binary registers every dispatch tool definition at boot via nexo_core::agent::dispatch_handlers::register_dispatch_tools_into. The LLM sees program_phase, list_agents, agent_status, etc. in its toolset; per-binding dispatch_capability (config/agents.yaml) prunes the write tools for bindings that opted out.

What's NOT bundled by default is the runtime DispatchToolContext — the orchestrator + registry + tracker references the handlers consult. Without it, a tool call returns a clean "dispatch tools require AgentContext.dispatch to be set at boot" error instead of pretending success. Two integration paths from there:

  • In-process orchestrator — boot a DriverOrchestrator alongside the agents, share one AgentRegistry. See the next section for the wiring sample.
  • NATS-based dispatch — agent bin publishes a message to agent.driver.dispatch.request that a separate nexo-driver daemon consumes. This is the topology to use when the Claude subprocess needs hardware (GPU box) the agent daemon doesn't have. The dispatch tool surface only changes in the registry it consults; operators can swap the in-process AgentRegistry for one that mirrors a NATS-backed registry without touching the handlers.

Boot wiring (B8)

The integrator's main.rs ties everything together. Minimal shape:

use std::sync::Arc;
use nexo_agent_registry::{AgentRegistry, MemoryAgentRegistryStore, LogBuffer};
use nexo_core::agent::{
    dispatch_handlers::{register_dispatch_tools_into, DispatchToolContext},
    tool_registry::ToolRegistry,
};
use nexo_dispatch_tools::{
    event_forwarder::EventForwarder,
    hooks::{DefaultHookDispatcher, HookRegistry, NoopNatsHookPublisher},
    policy_gate::CapSnapshot,
    NoopTelemetry,
};
use nexo_pairing::PairingAdapterRegistry;
use nexo_project_tracker::FsProjectTracker;

// 1. Project tracker.
let tracker: Arc<dyn nexo_project_tracker::ProjectTracker> =
    Arc::new(FsProjectTracker::open(std::env::current_dir().unwrap())?);

// 2. Agent registry + log buffer.
let registry = Arc::new(AgentRegistry::new(
    Arc::new(MemoryAgentRegistryStore::default()),
    4,
));
let log_buffer = Arc::new(LogBuffer::new(200));
let hook_registry = Arc::new(HookRegistry::new());

// 3. Hook dispatcher with the channel adapters that Phase 26
//    registered (whatsapp / telegram).
let pairing = PairingAdapterRegistry::new();
// pairing.register(WhatsappPairingAdapter::new(...));
// pairing.register(TelegramPairingAdapter::new(...));
let hook_dispatcher = Arc::new(DefaultHookDispatcher::new(
    pairing,
    Arc::new(NoopNatsHookPublisher),
));

// 4. Orchestrator with EventForwarder so registry / log_buffer /
//    hooks see every driver event.
let inner_sink: Arc<dyn nexo_driver_loop::DriverEventSink> =
    Arc::new(nexo_driver_loop::NoopEventSink);
let event_sink: Arc<dyn nexo_driver_loop::DriverEventSink> =
    Arc::new(EventForwarder::new(
        registry.clone(),
        log_buffer.clone(),
        hook_registry.clone(),
        hook_dispatcher.clone(),
        inner_sink,
    ));
// (the orchestrator builder consumes event_sink and yields `orch`,
//  which step 5 below refers to)

// 5. Bundle for AgentContext.dispatch.
let dispatch_ctx = Arc::new(DispatchToolContext {
    tracker,
    orchestrator: orch.clone(),
    registry,
    hooks: hook_registry,
    log_buffer,
    default_caps: CapSnapshot {
        queue_when_full: true,
        ..Default::default()
    },
    require_trusted: true,
    telemetry: Arc::new(NoopTelemetry),
});

// 6. Register the handlers into the base ToolRegistry. The
//    per-binding cache prunes write tools when capability=None
//    or read_only.
let base = ToolRegistry::new();
register_dispatch_tools_into(&base);

// 7. Per-session AgentContext.with_dispatch(dispatch_ctx)
//    + .with_sender_trusted(true) + .with_inbound_origin(plugin,
//    instance, sender).

Without step 6 the handlers exist but aren't reachable by the LLM. Without step 4 the registry / log_buffer / hooks stay inert. Without step 5 the handlers return MissingDispatchCtx.

See also

  • proyecto/PHASES.md — Phase 67.A–H sub-phase status of record.
  • architecture/driver-subsystem.md — Phase 67.0–67.9 driver loop + replay + compact policies.

Configuration layout

nexo-rs loads configuration from a single directory (passed via --config <path>, default ./config). The runtime reads a small set of required YAML files and a handful of optional ones.

Source: crates/config/src/lib.rs::AppConfig::load.

Directory tree

config/
├── agents.yaml              # required — base agent catalog
├── agents.d/                # optional — drop-in agents, merged in alpha order
│   ├── ana.example.yaml     # template (committed)
│   └── *.yaml               # real definitions (gitignored)
├── broker.yaml              # required — NATS / local broker + disk queue
├── llm.yaml                 # required — LLM providers
├── memory.yaml              # required — short-term + long-term + vector
├── extensions.yaml          # optional — extension search paths, toggles
├── mcp.yaml                 # optional — MCP servers the agent consumes
├── mcp_server.yaml          # optional — expose this agent as an MCP server
├── tool_policy.yaml         # optional — per-tool / per-agent policy
├── runtime.yaml             # optional — hot-reload watcher settings
├── plugins/
│   ├── whatsapp.yaml
│   ├── telegram.yaml
│   ├── email.yaml
│   ├── browser.yaml
│   ├── google.yaml
│   └── gmail-poller.yaml
└── docker/                  # optional — overrides for containerized runs
    ├── agents.yaml
    ├── llm.yaml
    └── …

Required vs optional

The loader fails startup if any required file is missing or malformed. Optional files return None when absent and unlock related features only if present.

| File | Kind |
|---|---|
| agents.yaml | required |
| broker.yaml | required |
| llm.yaml | required |
| memory.yaml | required |
| extensions.yaml | optional |
| mcp.yaml | optional |
| mcp_server.yaml | optional |
| tool_policy.yaml | optional |
| runtime.yaml | optional — hot-reload knobs; defaults enable reload at 500 ms debounce. See Config hot-reload. |
| plugins/*.yaml | optional (only needed for plugins you enable) |

Drop-in agents

Files under config/agents.d/*.yaml are merged into the base agents.yaml in lexicographic filename order. Each file has the same top-level shape (agents: [...]); entries append to the base list.

Common patterns:

  • 00-dev.yaml / 10-prod.yaml — control override order by numeric prefix
  • Keep agents.yaml public-safe and drop sensitive business content (sales prompts, pricing, phone numbers) into gitignored config/agents.d/ana.yaml
  • Ship config/agents.d/<name>.example.yaml as a template so the shape stays discoverable

Details in Drop-in agents.

Docker layout

config/docker/ mirrors the main layout and is consumed when the compose file mounts it at /app/config/docker:

# docker-compose.yml
command: ["agent", "--config", "/app/config/docker"]

Secrets inside Docker containers live at /run/secrets/<name> — the compose definitions use ${file:/run/secrets/...} references. See LLM config — auth for the full secret resolution rules.

Env vars and secrets in YAML

YAML values can reference env vars and files:

| Syntax | Meaning |
|---|---|
| ${VAR} | read env var, fail if unset or empty |
| ${VAR:-fallback} | env var if set and non-empty, else fallback |
| ${VAR-fallback} | env var if set (even empty), else fallback |
| ${file:./secrets/x} | read file contents, trimmed of whitespace |

Path-traversal rules for ${file:...}:

  • Relative paths are rooted at the current working directory
  • .. segments are rejected outright
  • Absolute paths must sit under one of these whitelisted roots:
    • /run/secrets/ (Docker secrets)
    • /var/run/secrets/ (Kubernetes projected volumes)
    • ./secrets/ (project-local)
    • the directory pointed at by $CONFIG_SECRETS_DIR (operator-defined)

Everything else is refused at parse time with an explicit error naming the invalid path and the allowed roots.
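
A sketch of that guard — the ./secrets/ root is covered here by the blanket relative-path rule, and the exact loader code is not reproduced:

```rust
use std::path::{Component, Path};

fn file_ref_allowed(raw: &str, config_secrets_dir: Option<&Path>) -> bool {
    let p = Path::new(raw);
    // `..` segments are rejected outright.
    if p.components().any(|c| matches!(c, Component::ParentDir)) {
        return false;
    }
    if p.is_relative() {
        return true; // resolved against the current working directory
    }
    // Absolute paths must sit under one of the whitelisted roots.
    let mut roots = vec![
        Path::new("/run/secrets/"),     // Docker secrets
        Path::new("/var/run/secrets/"), // Kubernetes projected volumes
    ];
    if let Some(r) = config_secrets_dir {
        roots.push(r); // $CONFIG_SECRETS_DIR
    }
    roots.iter().any(|root| p.starts_with(root))
}
```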

Validation

All config structs deserialize with #[serde(deny_unknown_fields)], so typos fail fast:

unknown field `modl`, expected `model`
at line 4, column 5 in config/agents.yaml

Missing required fields produce the same kind of message:

missing field `model`
at line 5, column 3 in config/agents.yaml

Env / file resolution errors identify the placeholder and the file:

env var MINIMAX_API_KEY not set (referenced in llm.yaml)
${file:../etc/passwd}: `..` not allowed in file reference (in broker.yaml)

Boot sequence

flowchart TD
    START([agent --config path]) --> LOAD[AppConfig::load]
    LOAD --> REQ{required files<br/>present & parseable?}
    REQ -->|no| FAIL([fail fast, exit 1])
    REQ -->|yes| OPT[read optional files]
    OPT --> DROP[merge config/agents.d/]
    DROP --> RESOLVE[resolve env / file placeholders]
    RESOLVE --> VAL[struct-level validation<br/>deny_unknown_fields]
    VAL --> SEM[semantic validation<br/>validate_agents, MCP headers]
    SEM --> READY([AppConfig ready])

Next

agents.yaml

The agent catalog. One entry per agent; each entry declares the model, channels, tools, sandboxing, and behavioral knobs for that agent.

Source: crates/config/src/types/agents.rs.

Top-level shape

agents:
  - id: ana
    model:
      provider: minimax
      model: MiniMax-M2.5
    plugins: [whatsapp]
    inbound_bindings:
      - plugin: whatsapp
    allowed_tools:
      - whatsapp_send_message
    outbound_allowlist:
      whatsapp:
        - "573000000000"
    system_prompt: |
      You are Ana, …

Full field reference

All fields use #[serde(deny_unknown_fields)] — typos fail fast.

Identity & model

| Field | Type | Required | Default | Purpose |
|---|---|---|---|---|
| id | string | yes | — | Unique agent id. Used as session key, subject suffix, workspace dir name. |
| model.provider | string | yes | — | Provider key in llm.yaml (e.g. minimax, anthropic). |
| model.model | string | yes | — | Model id understood by that provider. |
| description | string | no | "" | Human-readable role. Injected into # PEERS for delegation discovery. |

Channels

| Field | Type | Default | Purpose |
|---|---|---|---|
| plugins | [string] | [] | Plugin ids this agent wants to expose tools for (whatsapp, telegram, browser, …). |
| inbound_bindings | array | [] | Per-plugin binding list. Empty = legacy wildcard (receive everything). |

Each inbound_bindings[] entry can override the agent-level defaults for that channel: allowed_tools, outbound_allowlist, skills, model, system_prompt_extra, sender_rate_limit, allowed_delegates. Useful for running the same agent on two channels with different rules. See Per-binding capability override below for the full override surface and merge rules.

Tool sandboxing

| Field | Type | Default | Purpose |
|---|---|---|---|
| allowed_tools | [string] | [] | Build-time pruning of the tool registry. Glob suffix * allowed. Empty = all tools registered. |
| tool_rate_limits | object | null | Per-tool rate limit patterns. Glob-matched. |
| tool_args_validation.enabled | bool | true | Toggle JSON-schema validation of tool arguments. |
| outbound_allowlist | object | {} | Per-plugin recipient allowlist (e.g. phone numbers, chat ids). Defense-in-depth for send tools. |

allowed_tools semantics:

  • For legacy agents (no inbound_bindings) the allowlist is applied at registry-build time — tools not matching the patterns are removed from the registry before the LLM sees them.
  • For agents with inbound_bindings the base registry keeps every tool and enforcement happens per-binding at turn time (see Per-binding capability override) so a binding's override can both narrow AND expand within the registry. Defense-in-depth: the LLM only receives tools allowed by the matched binding, and the tool-call execution path rejects any hallucinated name outside the same allowlist.

In both modes the LLM never receives disallowed tool definitions; the difference is where the filter is applied.
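
A minimal sketch of the two modes (agent ids illustrative):

agents:
  - id: legacy_bot                   # no inbound_bindings → pruned at registry-build time
    allowed_tools: ["memory_*"]      # only the memory_* family ever gets registered
  - id: bound_bot                    # has bindings → full registry, filtered per turn
    inbound_bindings:
      - plugin: telegram
        allowed_tools: ["memory_*", "telegram_send_message"]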

System prompt & workspace

| Field | Type | Default | Purpose |
|---|---|---|---|
| system_prompt | string | "" | Prepended to every LLM turn. Defines persona, rules, examples. |
| workspace | path | "" | Directory with IDENTITY.md, SOUL.md, USER.md, AGENTS.md, MEMORY.md. Loaded at turn start. See Soul, identity & learning. |
| extra_docs | [path] | [] | Workspace-relative markdown files appended as # RULES — <filename>. |
| transcripts_dir | path | "" | Directory for per-session JSONL transcripts. Empty = disabled. |
| skills_dir | path | "./skills" | Base directory for local skill files. |
| skills | [string] | [] | Local skill ids to inject into the system prompt. Resolved from skills_dir. |
| language | string | null | Output language for the LLM's reply. ISO code ("es", "en", "en-US") or human name ("Spanish", "español"). When set, the runtime renders a # OUTPUT LANGUAGE system block telling the model to keep workspace docs in English (single source of truth, plays nicely with recall + dreaming) but reply to the user in the configured language. Per-binding language overrides this for the matched channel. See Output language. |

Heartbeat

heartbeat:
  enabled: true
  interval: 30s

| Field | Type | Default | Purpose |
|---|---|---|---|
| heartbeat.enabled | bool | false | Turn heartbeat on for this agent. |
| heartbeat.interval | humantime | "5m" | Interval between on_heartbeat() fires. |

See Agent runtime — Heartbeat.

Runtime knobs

config:
  debounce_ms: 2000
  queue_cap: 32

| Field | Type | Default | Purpose |
|---|---|---|---|
| config.debounce_ms | u64 | 2000 | Debounce window for burst-of-messages coalescing. |
| config.queue_cap | usize | 32 | Per-agent mailbox capacity. |
| sender_rate_limit.rps | f64 | | Per-sender token-bucket refill rate. |
| sender_rate_limit.burst | u64 | | Bucket size. |
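
The sender_rate_limit knob sits alongside config at the agent level; a sketch using the same values the per-binding example further down uses:

config:
  debounce_ms: 2000
  queue_cap: 32
sender_rate_limit:
  rps: 0.5          # refills one token every 2 s per sender
  burst: 3          # a sender can burst 3 messages before throttling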

Agent-to-agent delegation

| Field | Type | Default | Purpose |
|---|---|---|---|
| allowed_delegates | [glob] | [] | Peers this agent may delegate to. Empty = no restriction. |
| accept_delegates_from | [glob] | [] | Inverse gate: peers allowed to delegate to this agent. |

Routing uses agent.route.<target_id> over NATS with a correlation_id. See Event bus — Agent-to-agent routing.
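
For instance, a front-desk agent restricted to ops peers, with the inverse gate on the receiving side (agent ids illustrative):

agents:
  - id: frontdesk
    allowed_delegates: ["ops_*"]          # may delegate only to ops_* peers
  - id: ops_billing
    accept_delegates_from: ["frontdesk"]  # only frontdesk may delegate here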

Dreaming (memory consolidation)

dreaming:
  enabled: false
  interval_secs: 86400
  min_score: 0.35
  min_recall_count: 3
  min_unique_queries: 2
  max_promotions_per_sweep: 20
  weights:
    frequency: 0.24
    relevance: 0.30
    recency: 0.15
    diversity: 0.15
    consolidation: 0.10

Defaults shown. See Soul — Dreaming.

Workspace-git

workspace_git:
  enabled: false
  author_name: "agent"
  author_email: "agent@localhost"

When enabled, the agent's workspace directory is a git repo that the runtime commits to after dream sweeps, forge_memory_checkpoint, and session close. Good for forensic replay.
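
Because the workspace is a plain git repo, ordinary git tooling works for review (paths illustrative):

git -C ./data/workspace/ana log --oneline           # one commit per sweep / checkpoint / session close
git -C ./data/workspace/ana show HEAD -- MEMORY.md  # inspect the latest memory change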

Google auth (per-agent OAuth)

google_auth:
  client_id: ${GOOGLE_CLIENT_ID}
  client_secret: ${file:./secrets/google_secret.txt}
  scopes:
    - https://www.googleapis.com/auth/gmail.readonly
  token_file: ./data/workspace/ana/google_token.json
  redirect_port: 17653

Used by crates/plugins/google to run OAuth PKCE per agent.

Deprecated in Phase 17 — prefer declaring Google accounts in a dedicated config/plugins/google-auth.yaml and binding them from credentials.google (see next section). Inline google_auth still boots with a warn so existing deployments keep working; it is auto-migrated into the credential store at startup.

Credentials (per-agent WhatsApp / Telegram / Google)

Pins each agent to the plugin instance / Google account it may use for outbound traffic. The runtime resolves the target at publish time from the agent id — the LLM cannot pick the instance via tool args, closing the prompt-injection vector.

credentials:
  whatsapp: personal          # must match whatsapp.yaml instance label
  telegram: ana_bot           # must match telegram.yaml instance label
  google:   ana@gmail.com     # must match google-auth.yaml accounts[].id
  # Silence the "inbound ≠ outbound" warning when intentional:
  # telegram_asymmetric: true

Validated at boot by the gauntlet (agent --check-config runs the same checks without starting the daemon). Omitting credentials: keeps the legacy single-account behavior for back-compat.

Full schema + migration guide: config/credentials.md.

Relationship diagram

flowchart LR
    AG[agent entry] --> MOD[model provider]
    AG --> PL[plugins list]
    AG --> IB[inbound_bindings]
    AG --> AT[allowed_tools]
    AG --> OA[outbound_allowlist]
    AG --> WS[workspace]
    AG --> HB[heartbeat]
    AG --> DEL[delegation gates]
    IB -->|per-binding override| AT
    IB -->|per-binding override| OA
    MOD -->|resolved from| LLM[llm.yaml]
    PL -->|tools from| PLUG[plugins/*.yaml]
    WS -->|files| SOUL[SOUL.md /<br/>IDENTITY.md /<br/>MEMORY.md]

Per-binding capability override

A single agent can expose distinct capability surfaces per InboundBinding without running two agent processes. Typical use: the same Ana agent answers WhatsApp with a narrow sales-only surface and Telegram with the full catalogue.

Schema

Every inbound_bindings[] entry accepts the following optional overrides. Unset fields inherit the agent-level value.

| Field | Type | Strategy | Notes |
|---|---|---|---|
| allowed_tools | [string] | replace | ["*"] = every registered tool |
| outbound_allowlist | object | replace (whole) | WhatsApp / Telegram recipient lists |
| skills | [string] | replace | Resolved from agent-level skills_dir |
| model | object | replace | Must keep the same provider |
| system_prompt_extra | string | append | Rendered as # CHANNEL ADDENDUM block |
| sender_rate_limit | inherit \| disable \| {rps, burst} | 3-way | Untagged enum |
| allowed_delegates | [string] | replace | Peer allowlist for the delegate tool |
| language | string | replace | Output language for replies on this channel. Falls through to the agent-level language field when omitted. See Output language. |

Anything else (workspace, transcripts_dir, heartbeat, memory, workspace_git, google_auth) stays at the agent level — identity and persistent state do not change per channel.

Example

agents:
  - id: ana
    model: { provider: anthropic, model: claude-haiku-4-5 }
    plugins: [whatsapp, telegram]
    workspace: ./data/workspace/ana
    skills_dir: ./skills
    system_prompt: |
      You are Ana.
    allowed_tools: []            # agent-level = permissive; bindings narrow
    outbound_allowlist: {}
    inbound_bindings:
      - plugin: whatsapp
        allowed_tools: [whatsapp_send_message]
        outbound_allowlist:
          whatsapp: ["573115728852"]
        skills: []
        sender_rate_limit: { rps: 0.5, burst: 3 }
        system_prompt_extra: |
          Channel: WhatsApp sales. Follow the ETB/Claro lead flow.
      - plugin: telegram
        instance: ana_tg
        allowed_tools: ["*"]
        outbound_allowlist:
          telegram: [1194292426]
        skills: [browser, github, openstreetmap]
        model: { provider: anthropic, model: claude-sonnet-4-5 }
        allowed_delegates: ["*"]
        sender_rate_limit: disable
        system_prompt_extra: |
          Channel: private Telegram. Full tool access allowed.

Boot-time validation

The runtime rejects configs with:

  • Duplicate (plugin, instance) tuples in the same agent.
  • Telegram instance referenced by a binding but not declared in config/plugins/telegram.yaml.
  • Binding model.provider different from the agent-level provider (the LLM client is wired once per agent).
  • Skills listed in a binding whose directory does not exist under skills_dir.

A binding that sets no overrides is allowed but logs a warn.

Matching order

Bindings are evaluated top-to-bottom; the first match wins. If you have both {plugin: telegram, instance: None} (wildcard) and {plugin: telegram, instance: "admin"}, declare the specific entry first — otherwise the wildcard consumes every Telegram event, as sketched below.
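
inbound_bindings:
  - plugin: telegram
    instance: admin                    # specific: matched first
    allowed_tools: ["*"]
  - plugin: telegram                   # wildcard: catches every other Telegram event
    allowed_tools: [telegram_send_message]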

Runtime isolation

  • Tool list shown to the LLM is filtered through the binding's allowed_tools; tools hidden on WhatsApp remain invisible even if the LLM hallucinates the name.
  • Tool-call execution re-checks the allowlist and returns not_allowed for anything outside — stops hallucination loops without executing the forbidden tool.
  • Outbound tools (whatsapp_send_message, telegram_send_message) read outbound_allowlist from the matched binding, so WhatsApp sends on the sales channel cannot reach numbers that only the private channel allows.
  • Sender rate limit buckets are keyed per binding; flood on one channel cannot drain the quota on another.

Back-compat

Agents without inbound_bindings keep the pre-feature behavior byte-for-byte: the agent-level allowed_tools is pruned into the base registry at boot, and the runtime synthesises a policy from agent-level defaults (keyed at binding_index = usize::MAX).

Output language

Operators pin the language an agent replies in without rewriting workspace markdown. Workspace docs (IDENTITY, SOUL, MEMORY, USER, AGENTS) and tool descriptions stay in English — the single source of truth that recall, dreaming, vector search, and developer tooling all read. The runtime injects a # OUTPUT LANGUAGE system block right after the agent's system_prompt, telling the model to read those docs as-is but reply to the user in the configured language.

Where to set it

agents:
  - id: ana
    language: es                # default for every binding on this agent
    inbound_bindings:
      - plugin: whatsapp
        # → uses Spanish (inherits from the agent)
      - plugin: telegram
        instance: support_intl
        language: en            # → uses English on this channel only
      - plugin: telegram
        instance: bilingual_qa
        language: ""            # → no directive (model picks)

Resolution

Precedence (first non-empty wins):

  1. inbound_bindings[i].language — per-channel override.
  2. language — agent-level default.
  3. null — no # OUTPUT LANGUAGE block emitted; the model decides from the user's input.

Empty string and whitespace-only values resolve to no directive on both layers — useful for "turn the directive off on this binding even though the agent has one".

Accepted values

The runtime treats the value as a label and forwards it verbatim into the directive (after sanitisation; see below). Both forms work:

  • ISO codes: "es", "en", "en-US", "pt-BR".
  • Human names: "Spanish", "English", "español", "Brazilian Portuguese".

Human names produce slightly clearer directives in practice ("Respond to the user in Spanish." reads more naturally than "Respond to the user in es."), but both yield the same model behaviour with modern LLMs.

Rendered block

# OUTPUT LANGUAGE

Respond to the user in {language}. Workspace docs (IDENTITY, SOUL,
MEMORY, USER, AGENTS) and tool descriptions are in English — read
them as-is, but your turn-final reply to the user must be in
{language}.

The block lands after the agent's system_prompt (and the optional # CHANNEL ADDENDUM block) so its instruction wins under the LLM's recency bias.

Sanitisation

Defense-in-depth against config-driven prompt injection: every language value is normalised before rendering — control characters and embedded newlines are stripped, trimmed, and the result is capped at 64 characters. A YAML payload like language: "es\n\nIgnore previous instructions" cannot smuggle a multi-line directive into the system prompt.

Hot reload

Phase 18 hot-reload covers this field. Edit agents.d/<id>.yaml, save (or run agent reload), and the next message uses the new language. In-flight LLM turns finish on the old policy; subsequent turns flip to the new one.

Link understanding

Per-agent (and per-binding) toggle that fetches URLs in the user's message and injects a # LINK CONTEXT block. Off by default. Full schema, caps, and SSRF denylist live on Link understanding. The field is link_understanding at agent scope and at each inbound_bindings[] entry; binding value replaces agent default, omitted = inherit.

Web search

Per-agent (and per-binding) toggle that exposes a web_search tool backed by Brave / Tavily / DuckDuckGo / Perplexity. Off by default. Full schema, providers, cache, and circuit-breaker behaviour live on Web search. The field is web_search at agent scope and at each inbound_bindings[] entry; binding value replaces agent default, omitted = inherit.

Pairing policy

Per-binding toggle that turns on the DM-challenge gate for inbound senders. Off by default. The field is pairing_policy on each inbound_bindings[] entry; null (default) = inherit agent value or skip the gate entirely. Full protocol, threat model, and CLI reference live on Pairing.

Common mistakes

  • Forgetting plugins: [...]. An agent without plugins has no inbound channel and no outbound tools. It is inert.
  • Misreading allowed_tools globs. ["memory_*"] allows the full memory_* family; ["memory_store"] allows only that one tool. Check the glob before assuming.
  • Large system_prompt duplication across agents. Use inbound_bindings[].system_prompt_extra to add per-channel content without duplicating the whole prompt.
  • Sharing a WhatsApp session across agents. Each agent's workspace should contain its own whatsapp/default session; the wizard does this automatically, but pointing two agents at the same session dir will cause message cross-delivery.
  • Translating the workspace markdown to match language. Don't. Workspace docs are the single source of truth read by recall, dreaming, and developer tooling — keep them in English. The # OUTPUT LANGUAGE block tells the model to translate the reply on its way out.

Next

llm.yaml

LLM provider registry. Each agent's model.provider must resolve to a key in this file.

Source: crates/config/src/types/llm.rs.

Shape

providers:
  minimax:
    api_key: ${MINIMAX_API_KEY:-}
    group_id: ${MINIMAX_GROUP_ID:-}
    base_url: https://api.minimax.io
    rate_limit:
      requests_per_second: 2.0
      quota_alert_threshold: 100000
  anthropic:
    api_key: ${ANTHROPIC_API_KEY:-}
    base_url: https://api.anthropic.com
    rate_limit:
      requests_per_second: 2.0
    auth:
      mode: oauth_bundle
      bundle: ./secrets/anthropic_oauth.json
retry:
  max_attempts: 5
  initial_backoff_ms: 1000
  max_backoff_ms: 60000
  backoff_multiplier: 2.0

Per-provider fields

| Field | Type | Required | Default | Purpose |
|---|---|---|---|---|
| api_key | string | | | API key. Supports ${ENV_VAR} and ${file:…}. |
| base_url | url | | | API endpoint. Override to use a proxy or a local server. |
| group_id | string | | | MiniMax-only. Group identifier. |
| rate_limit.requests_per_second | f64 | | 2.0 | Outbound throttle. |
| rate_limit.quota_alert_threshold | u64 | | | Optional soft-alarm tokens-per-day threshold. |
| api_flavor | enum | | openai_compat | openai_compat or anthropic_messages. Lets MiniMax expose the Anthropic wire. |
| embedding_model | string | | | Override model used for embeddings (e.g. Gemini's text-embedding-004). |
| safety_settings | JSON | | | Gemini-only; attached verbatim to requests. |

Top-level retry block

Applies to every provider that doesn't define its own:

| Field | Default | Purpose |
|---|---|---|
| max_attempts | 5 | Total attempts including the first try. |
| initial_backoff_ms | 1000 | First backoff. |
| max_backoff_ms | 60000 | Cap. |
| backoff_multiplier | 2.0 | Exponential factor. |

Retries are jittered to avoid thundering-herd reconnects. See Fault tolerance — Retry policies.

Auth modes

auth:
  mode: auto | static | token_plan | oauth_bundle
  bundle: ./secrets/anthropic_oauth.json
  setup_token_file: ./secrets/anthropic_setup.json
  refresh_endpoint: https://auth.example.com/refresh
  client_id: your-oauth-client

| mode | When |
|---|---|
| auto | Let the provider client decide from available credentials. |
| static | Use api_key verbatim. |
| token_plan | MiniMax "Token Plan" OAuth bundle. |
| oauth_bundle | Anthropic PKCE OAuth bundle written by agent setup. |

Supported providers

| Key | Notes |
|---|---|
| minimax | Primary provider. MiniMax M2.5. OpenAI-compat or Anthropic-flavour wire. |
| anthropic | Claude models. API key or OAuth subscription. |
| openai | OpenAI API and anything speaking its wire (Ollama, Groq, local proxies). |
| deepseek | DeepSeek hosted models; OpenAI-compatible wire (see DeepSeek). |
| gemini | Google Gemini, including embedding support. |

Common mistakes

  • api_key: sk-… committed to git. Use ${ENV_VAR} or ${file:./secrets/…}; the secrets/ directory is gitignored.
  • Mismatched embedding_model dimensions. The vector store asserts embedding.dimensions matches the model output. A mismatch aborts startup with an explicit message.
  • Setting both api_key and auth.mode: oauth_bundle. The auth mode wins. The api_key is kept as a fallback for tools that bypass the OAuth path.

Input-token reduction (context_optimization)

Four independent kill switches for prompt caching, online history compaction, pre-flight token counting, and the workspace bundle cache. Full schema, defaults, and rollout guidance in Operations → Context optimization.

broker.yaml

Broker topology, disk persistence, and fallback behavior.

Source: crates/config/src/types/broker.rs.

Shape

broker:
  type: nats          # nats | local
  url: nats://localhost:4222
  auth:
    enabled: false
    nkey_file: ./secrets/nats.nkey
  persistence:
    enabled: true
    path: ./data/queue
  limits:
    max_payload: 4MB
    max_pending: 10000
  fallback:
    mode: local_queue
    drain_on_reconnect: true

Fields

| Field | Type | Default | Purpose |
|---|---|---|---|
| type | nats \| local | local | local keeps the whole bus in-process; nats uses a real NATS server. |
| url | url | | NATS connection URL (ignored when type: local). |
| auth.enabled | bool | false | Turn on NKey mTLS. |
| auth.nkey_file | path | | Path to the NKey file when auth.enabled. |
| persistence.enabled | bool | true | Turn on the SQLite disk queue. |
| persistence.path | path | ./data/queue | Directory for the disk queue SQLite DB. |
| limits.max_payload | size | 4MB | Reject events larger than this. |
| limits.max_pending | u64 | 10000 | Hard cap on the disk queue; past this, oldest events are shed. |
| fallback.mode | local_queue \| drop | local_queue | What to do when NATS is unreachable. |
| fallback.drain_on_reconnect | bool | true | Replay the disk queue when NATS returns. |

Operational notes

  • type: local for single-machine dev. You don't need NATS running just to try the agent. The local broker matches NATS subject semantics, so everything works the same.
  • Disk queue always on in production. Even on a single machine. It's the guarantee against losing events on a NATS blip.
  • drain_on_reconnect: true is FIFO. See Event bus — Disk queue.
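
A minimal single-machine dev config following those notes (everything else inherits the defaults above):

broker:
  type: local            # in-process bus; no NATS server needed
  persistence:
    enabled: true        # keep the disk queue even in dev
    path: ./data/queue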

memory.yaml

Short-term sessions, long-term SQLite storage, and optional vector search.

Source: crates/config/src/types/memory.rs.

Shape

short_term:
  max_history_turns: 50
  session_ttl: 24h
  max_sessions: 10000

long_term:
  backend: sqlite
  sqlite:
    path: ./data/memory.db

vector:
  enabled: false
  backend: sqlite-vec
  default_recall_mode: hybrid
  embedding:
    provider: http
    base_url: https://api.openai.com/v1
    model: text-embedding-3-small
    api_key: ${OPENAI_API_KEY}
    dimensions: 1536
    timeout_secs: 30

Short-term

Per-session conversation buffer held in memory by SessionManager.

| Field | Default | Purpose |
|---|---|---|
| max_history_turns | 50 | Turns kept before oldest are pruned into long-term memory. |
| session_ttl | 24h | How long a session lives idle before eviction. humantime syntax. |
| max_sessions | 10000 | Soft cap. On overflow the oldest-idle session is evicted (fires on_expire). 0 = unbounded. |

Long-term

Persisted memory, durable across restarts.

| Field | Options | Default | Purpose |
|---|---|---|---|
| backend | sqlite \| redis | sqlite | Storage engine. |
| sqlite.path | path | ./data/memory.db | SQLite file (with sqlite-vec extension loaded when vector enabled). |
| redis.url | url | | Redis connection string (when backend: redis). |

Vector

Opt-in semantic memory.

| Field | Default | Purpose |
|---|---|---|
| enabled | false | Opt-in. |
| backend | sqlite-vec | Zero-extra-infrastructure vector index. |
| default_recall_mode | hybrid | Used when the memory tool call omits mode. Options: keyword, vector, hybrid. |
| embedding.provider | http | Where to fetch embeddings. http = any OpenAI-compatible embeddings server. |
| embedding.base_url | | Embeddings endpoint. |
| embedding.model | | Model id, e.g. text-embedding-3-small, nomic-embed-text. |
| embedding.api_key | | Key for the embeddings server. Supports ${ENV_VAR} / ${file:…}. |
| embedding.dimensions | | Must match the model output (1536 for OpenAI 3-small; 768 for nomic). Mismatch aborts startup. |
| embedding.timeout_secs | 30 | Embeddings request timeout. |
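
For example, a local Ollama embedder (an untested sketch; the key point is that dimensions must match the model):

vector:
  enabled: true
  backend: sqlite-vec
  embedding:
    provider: http
    base_url: http://localhost:11434/v1   # Ollama speaks the OpenAI embeddings wire
    model: nomic-embed-text
    api_key: "unused"                     # Ollama ignores the key
    dimensions: 768                       # nomic output size; a mismatch aborts startup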

Memory layers

flowchart LR
    MSG[incoming message] --> STM[short-term<br/>in-memory buffer]
    STM -->|turns exceed max| PRUNE[prune]
    PRUNE --> LTM[(long-term<br/>SQLite)]
    LTM --> EMB{vector<br/>enabled?}
    EMB -->|yes| VEC[(sqlite-vec index)]
    TOOL[memory tool] --> RECALL{recall mode}
    RECALL -->|keyword| LTM
    RECALL -->|vector| VEC
    RECALL -->|hybrid| LTM
    RECALL -->|hybrid| VEC

Per-agent isolation

Each agent's memory DB lives under its workspace when workspace_git is enabled — keeps memories forensically reviewable and prevents one agent from reading another's history.

Drop-in agents

config/agents.d/*.yaml is a merge-directory for agent definitions that should not live in agents.yaml — typically anything with business content (sales prompts, pricing tables, internal phone numbers, customer-facing identities).

Source: crates/config/src/lib.rs (merge logic).

Why it exists

  • Keep agents.yaml public-safe and checked into git
  • Keep sensitive content gitignored and loaded at runtime
  • Compose layered configs (00-dev.yaml, 10-prod.yaml) without editing a single monolithic file
  • Ship .example.yaml templates so the shape stays discoverable

.gitignore rules include:

config/agents.d/*.yaml
!config/agents.d/*.example.yaml

The .example.yaml files are committed and serve as templates; the real .yaml files are not.

Merge order

Files are loaded in lexicographic filename order and their agents arrays are concatenated to the base agents.yaml:

flowchart TD
    BASE[agents.yaml] --> MERGE[merged catalog]
    D1[agents.d/00-shared.yaml] --> MERGE
    D2[agents.d/10-ana.yaml] --> MERGE
    D3[agents.d/20-kate.yaml] --> MERGE
    EX[agents.d/ana.example.yaml] -.->|committed template<br/>usually not loaded| MERGE

Every file must have the top-level agents: [...] shape:

# config/agents.d/10-ana.yaml
agents:
  - id: ana
    model:
      provider: minimax
      model: MiniMax-M2.5
    plugins: [whatsapp]
    inbound_bindings:
      - plugin: whatsapp
    system_prompt: |
      …private content…

Agent id collisions

Two files cannot define the same agent.id. On collision the loader fails fast with a clear message. If you want to override an agent, either:

  • Replace the entry (rename or remove the original)
  • Use inbound_bindings[] per-binding overrides inside a single entry

Common patterns

Public vs. private split

config/agents.yaml                  # committed, only support/ops agents
config/agents.d/ana.yaml            # gitignored, full sales prompt
config/agents.d/kate.yaml           # gitignored, personal assistant
config/agents.d/ana.example.yaml    # committed, empty template

Environment layering

config/agents.d/00-common.yaml      # shared defaults
config/agents.d/10-dev.yaml         # dev-only overrides (loaded only on dev box)

Swap the 10-*.yaml file per environment. Docker compose can mount the right one from a secret volume.
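
A hedged sketch of the compose side (service name and host path are illustrative):

services:
  agent:
    image: nexo-rs:latest
    volumes:
      # bind-mount the environment's drop-in over the gitignored path
      - ./deploy/10-prod.yaml:/app/config/agents.d/10-prod.yaml:ro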

Validation

  • #[serde(deny_unknown_fields)] still applies to every file
  • validate_agents() runs after the merge — checks duplicate ids, missing plugin references, invalid skill directories
  • Errors name the file and the offending agent id

Per-agent credentials

Bind each agent to specific WhatsApp / Telegram / Google accounts so outbound traffic originates from the right number, bot, or mailbox — never from a shared pool.

Mental model

Three layers:

  1. Plugin instance — a labelled WhatsApp session or Telegram bot in config/plugins/{whatsapp,telegram}.yaml. Each instance owns its own token / session_dir and an optional allow_agents list.
  2. Google account — an entry in the optional config/plugins/google-auth.yaml. Each account is 1:1 with an agent_id.
  3. Agent binding — in config/agents.d/<agent>.yaml, the credentials: block pins the agent to the instance / account it may use for outbound tool calls.

The runtime runs a boot-time gauntlet that cross-checks all three layers before any plugin boots. Every invariant violation surfaces in a single report so you can fix the full YAML in one edit.

Config schemas

config/agents.d/ana.yaml

agents:
  - id: ana
    credentials:
      whatsapp: personal        # must match whatsapp.yaml instance
      telegram: ana_bot         # must match telegram.yaml instance
      google:   ana@gmail.com   # must match google-auth.yaml accounts[].id
      # Opt-out for the symmetric-binding warning when inbound bot and
      # outbound bot are intentionally different:
      # telegram_asymmetric: true
    inbound_bindings:
      - { plugin: whatsapp, instance: personal }
      - { plugin: telegram, instance: ana_bot }

config/plugins/whatsapp.yaml

whatsapp:
  - instance: personal
    session_dir: ./data/workspace/ana/whatsapp/personal
    media_dir:   ./data/media/whatsapp/personal
    allow_agents: [ana]           # defense-in-depth ACL
  - instance: work
    session_dir: ./data/workspace/kate/whatsapp/work
    media_dir:   ./data/media/whatsapp/work
    allow_agents: [kate]

config/plugins/telegram.yaml

telegram:
  - instance: ana_bot
    token: ${file:./secrets/telegram/ana_token.txt}
    allow_agents: [ana]
    allowlist:
      chat_ids: [1194292426]
  - instance: kate_bot
    token: ${file:./secrets/telegram/kate_token.txt}
    allow_agents: [kate]

config/plugins/google-auth.yaml

google_auth:
  accounts:
    - id: ana@gmail.com
      agent_id: ana                       # 1:1 — the gauntlet enforces it
      client_id_path:     ./secrets/google/ana_client_id.txt
      client_secret_path: ./secrets/google/ana_client_secret.txt
      token_path:         ./secrets/google/ana_token.json
      scopes:
        - https://www.googleapis.com/auth/gmail.modify

Agents that still declare the legacy inline google_auth block are auto-migrated into this store on boot (a warning tells you to migrate).

What the gauntlet validates

| Check | Lenient | Strict |
|---|---|---|
| Duplicate session_dir across instances | error | error |
| session_dir that is a parent of another | error | error |
| Credential file with lax permissions (linux 0o077) | error | error |
| credentials.<ch> points to an instance that does not exist | error | error |
| Agent listens on >1 instance without declaring credentials.<ch> | error | error |
| Instance allow_agents excludes a binding agent | error | error |
| Inbound instance ≠ outbound instance (no <ch>_asymmetric) | warn | error |
| Inline agents.<id>.google_auth without matching google-auth.yaml | warn | warn |

Linux permission check is skipped for /run/secrets/* (Docker secrets) and can be disabled entirely with CHAT_AUTH_SKIP_PERM_CHECK=1.
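
To satisfy the permission check, keep secret files owner-only (paths illustrative):

chmod 700 ./secrets
chmod 600 ./secrets/telegram/ana_token.txt   # any group/other bits (the 0o077 mask) trip the gauntlet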

Topics

Outbound tool calls land on instance-suffixed topics when the resolver has a binding:

plugin.outbound.whatsapp.<instance>
plugin.outbound.telegram.<instance>

Unlabelled (instance: None) plugin entries keep publishing to the legacy bare topic plugin.outbound.whatsapp / plugin.outbound.telegram for full back-compat.

CLI gate

# Run the full gauntlet without booting the daemon. Exits 0 clean,
# 1 on errors, 2 on warnings-only.
agent --config ./config --check-config

# Promote warnings to errors (CI lane).
agent --config ./config --check-config --strict

The gate scans agents.yaml, every agents.d/*.yaml, whatsapp.yaml, telegram.yaml, and google-auth.yaml. Sample failure:

credentials: FAILED with 1 error(s):
   1. agent 'ana_per_binding_example' binds credentials.telegram='ana_tg' but no such telegram instance exists (available: [])

Secrets in logs

The credential layer never logs a raw account id. Every reference is via an 8-byte sha256(account_id) fingerprint rendered as hex:

2025-04-24T16:03:42Z INFO credentials.audit agent="ana" channel="whatsapp" fp=a3f2…7c direction=outbound

The fingerprint is pinned — switching the algorithm is an explicit breaking change tracked by crates/auth/tests/fingerprint_stability.rs.

Observability

Nine Prometheus series land at /metrics:

| Series | Type | Labels |
|---|---|---|
| credentials_accounts_total | gauge | channel |
| credentials_bindings_total | gauge | agent, channel |
| channel_account_usage_total | counter | agent, channel, direction, instance |
| channel_acl_denied_total | counter | agent, channel, instance |
| credentials_resolve_errors_total | counter | channel, reason |
| credentials_breaker_state | gauge | channel, instance |
| credentials_boot_validation_errors_total | counter | kind |
| credentials_insecure_paths_total | gauge | |
| credentials_google_token_refresh_total | counter | account_fp, outcome |

Back-compat

  • Configs without a credentials: block keep working — the resolver infers outbound from the single inbound_bindings entry when it is unambiguous; otherwise outbound tools are marked unbound and fall back to the legacy bare topic.
  • Plugin entries with instance: None stay on the legacy bare topic.
  • agents.<id>.google_auth still registers google_* tools for that agent; google-auth.yaml is preferred going forward.

Hot-reload (no daemon restart)

Edit agents.d/*.yaml, plugins/whatsapp.yaml, plugins/telegram.yaml, or plugins/google-auth.yaml, then trigger a reload via the loopback admin endpoint:

curl -fsSX POST http://127.0.0.1:9091/admin/credentials/reload | jq
{
  "accounts_wa": 2,
  "accounts_tg": 2,
  "accounts_google": 1,
  "warnings": [],
  "version": 4
}

The resolver runs the gauntlet against the fresh files, then atomically swaps bindings in place. Plugin tools holding Arc<…> references see the new state on their next call. Failure mode: gauntlet errors return HTTP 400 with the error list; the previous bindings stay active so a typo in YAML does not knock out the runtime.

CredentialHandles already issued to in-flight tool calls keep working — handles are by-value clones; the resolver only mediates lookup of future calls.

What the reload does NOT cover

  • Adding a brand-new WhatsApp / Telegram instance still requires a restart for the plugin (each instance owns its own session_dir + websocket). The resolver picks up the new account but the plugin side stays as-was until next boot.
  • Removing an account leaks its breaker entry in BreakerRegistry until restart. No correctness impact.

Google client_id / client_secret rotation

Rewriting the secret files (./secrets/<agent>_google_client_id.txt, ..._client_secret.txt) is picked up automatically on the next google_* tool call — GoogleAuthClient checks the file mtime before each network hop and re-reads when it has advanced. No reload call required for that case. Audit log line:

INFO credentials.audit event="google_secrets_refreshed" \
  google_*: re-read client_id/client_secret after on-disk rotation

Strict mode

agent --check-config --strict promotes warnings to errors. Two checks behave differently under strict:

| Condition | Lenient | Strict |
|---|---|---|
| Inline agents.<id>.google_auth block (legacy) | warn + auto-migrate | BuildError::LegacyInlineGoogleAuth, fail boot |
| Asymmetric inbound ≠ outbound (no <ch>_asymmetric: true) | warn | error |

Run --strict in CI to gate PRs that touch credential YAML.

Migrating

  1. Add instance: + allow_agents: to each entry in whatsapp.yaml / telegram.yaml.
  2. Create config/plugins/google-auth.yaml with one accounts[] per agent that needs Gmail.
  3. Add credentials: to each agents.d/*.yaml.
  4. Run agent --check-config --strict. Fix every listed error.
  5. Commit.

pollers.yaml

The Phase 19 generic poller subsystem. One runner orchestrates N modules — each module is an impl Poller (gmail, rss, calendar, webhook_poll, or anything you write yourself) — and every module shares the same scheduler, lease, breaker, cursor persistence, and outbound dispatch via Phase 17 credentials.

Source: crates/poller/, crates/config/src/types/pollers.rs.

Top-level shape

pollers:
  enabled: true
  state_db: ./data/poller.db
  default_jitter_ms: 5000
  lease_ttl_factor: 2.0
  failure_alert_cooldown_secs: 3600
  breaker_threshold: 5
  jobs:
    - id: ana_leads
      kind: gmail
      agent: ana
      schedule: { every_secs: 60 }
      config:
        query: "is:unread subject:lead"
        deliver: { channel: whatsapp, to: "57300...@s.whatsapp.net" }
        message_template: |
          New lead 🚨
          {snippet}

Absent file → subsystem off (no jobs spawn, no admin endpoint).

Top-level fields

| Field | Default | Purpose |
|---|---|---|
| enabled | true | Master switch. false skips everything below. |
| state_db | ./data/poller.db | SQLite path for poll_state + poll_lease. Created if missing. |
| default_jitter_ms | 5000 | Random offset added to next_run_at when a job's schedule does not declare its own. Avoids thundering herd. |
| lease_ttl_factor | 2.0 | Lease TTL = factor × interval (min 30s). A daemon that crashes mid-tick releases the lease via expiry; another worker takes over without rerunning side effects unless your module is non-idempotent. |
| failure_alert_cooldown_secs | 3600 | Per-job cooldown for failure_to alerts. Persisted in poll_state.last_failure_alert_at so it survives restarts. |
| breaker_threshold | 5 | Consecutive Transient errors before the per-job circuit breaker opens. |
| jobs | [] | Per-job entries (see below). |

Per-job fields

| Field | Required | Purpose |
|---|---|---|
| id | yes | Unique. Used as session key for state, metrics, admin endpoints, lease. |
| kind | yes | Discriminator. Must match a registered Poller::kind() (see Built-ins and Build a poller). |
| agent | yes | Agent whose Phase 17 credentials this job uses. The runner looks up the binding for whatever channel the module needs (Google for fetch, WhatsApp/Telegram for outbound, etc). |
| schedule | yes | One of every, cron, at (see Schedules). |
| config | yes | Module-specific options. Validated by Poller::validate at boot. Bad config rejects this job only — siblings keep loading. |
| failure_to | no | { channel, to } for an alert when consecutive_errors crosses breaker_threshold. Optional — omit to log only. |
| paused_on_boot | no (default false) | Persist paused = 1 in state at startup. Useful for staged rollouts. |

Schedules

# Repeat every N seconds. Most common.
schedule: { every_secs: 60 }

# 6-field cron: sec min hour dom mon dow.
schedule:
  cron: "0 */5 * * * *"          # every 5 minutes on the boundary
  tz: "America/Bogota"           # accepted; evaluated in UTC unless cron-tz feature on
  stagger_jitter_ms: 2000        # local override for this job

# One-shot at an RFC3339 instant. After it fires the job stays paused.
schedule: { at: "2026-04-26T15:00:00Z" }

Built-ins

| kind | Purpose | Cursor | Auth |
|---|---|---|---|
| gmail | Search Gmail, regex extract, dispatch | Reserved (Gmail UNREAD + mark_read does dedup) | Phase 17 Google |
| rss | RSS / Atom feeds | ETag + bounded seen-id ring | None |
| webhook_poll | Generic JSON GET / POST | Bounded seen-id ring | None / custom headers |
| google_calendar | Calendar v3 events incremental sync | nextSyncToken | Phase 17 Google |

gmail

- id: ana_leads
  kind: gmail
  agent: ana
  schedule: { every_secs: 60 }
  config:
    query: "is:unread subject:(lead OR interesado)"
    newer_than: "1d"             # avoids back-filling years on first deploy
    max_per_tick: 20
    dispatch_delay_ms: 1000      # throttle between dispatches in same tick
    sender_allowlist: ["@mycompany.com"]
    extract:
      name: "Nombre:\\s*(.+)"
      phone: "Tel:\\s*(\\+?\\d+)"
    require_fields: [name, phone]
    message_template: |
      New lead 🚨 {name} — {phone}
      {snippet}
    mark_read_on_dispatch: true
    deliver: { channel: whatsapp, to: "57300...@s.whatsapp.net" }

Multiple gmail jobs for the same agent share a cached GoogleAuthClient — token refreshes happen once across all jobs.

google_* errors are classified: 401 / invalid_grant / revoked → Permanent (auto-pause); 5xx / network → Transient (backoff).

rss

- id: ana_blog_watch
  kind: rss
  agent: ana
  schedule: { every_secs: 600 }
  config:
    feed_url: https://example.com/feed.xml
    max_per_tick: 5
    message_template: "{title}\n{link}"
    deliver: { channel: telegram, to: "1194292426" }

ETag from the previous response is sent as If-None-Match. 304 Not Modified produces a zero-cost tick.

webhook_poll

- id: ana_jira_assigned
  kind: webhook_poll
  agent: ana
  schedule: { every_secs: 300 }
  config:
    url: https://company.atlassian.net/rest/api/3/search
    method: GET
    headers:
      Authorization: "Bearer ${JIRA_TOKEN}"
      Accept: "application/json"
    items_path: "issues"        # dotted path to the array; "" for root
    id_field: "id"              # field used for dedup
    max_per_tick: 10
    message_template: "[{key}] {fields}"
    deliver: { channel: telegram, to: "1194292426" }
    # SSRF guard — must opt in to hit private / loopback hosts:
    # allow_private_networks: true

401 / 403 → Permanent. Any other 4xx → Permanent. 5xx → Transient.

google_calendar

- id: ana_calendar_sync
  kind: google_calendar
  agent: ana
  schedule: { every_secs: 300 }
  config:
    calendar_id: primary
    skip_cancelled: true
    message_template: "📅 {summary} — {start}\n{html_link}"
    deliver: { channel: telegram, to: "1194292426" }

First tick captures nextSyncToken and dispatches nothing (baseline). Subsequent ticks use syncToken=... and dispatch the diff. 410 Gone (token expired) is classified Permanent — operator runs agent pollers reset <id> to re-baseline.

Multi-job per built-in

Same agent + same kind, multiple jobs — completely independent. The runner gives each its own cursor, breaker, schedule, metrics, and pause/resume controls. The GoogleAuthClient is the only thing shared (intentional, so quota and refresh costs aren't multiplied).

# Three Gmail polls for Ana, all independent
- id: ana_leads
  kind: gmail
  agent: ana
  schedule: { every_secs: 60 }
  config:
    query: "is:unread label:lead"
    deliver: { channel: whatsapp, to: "57300...@s.whatsapp.net" }
    # …

- id: ana_invoices
  kind: gmail
  agent: ana
  schedule: { every_secs: 600 }
  config:
    query: "is:unread label:invoice"
    deliver: { channel: telegram, to: "1194292426" }
    # …

- id: ana_alerts
  kind: gmail
  agent: ana
  schedule: { cron: "0 */15 * * * *" }
  config:
    query: "is:unread from:monitor@infra.com"
    deliver: { channel: telegram, to: "9876543210" }
    # …

Pause ana_invoices independently with agent pollers pause ana_invoices.

CLI

agent pollers list                 # plain table; --json for machine output
agent pollers show ana_leads      # detail of one job
agent pollers run ana_leads       # manual tick (bypasses schedule + lease)
agent pollers pause ana_invoices  # paused = 1
agent pollers resume ana_invoices
agent pollers reset ana_calendar_sync --yes  # destructive; clears cursor
agent pollers reload              # re-read pollers.yaml + diff

The daemon must be running (CLI hits the loopback admin server at 127.0.0.1:9091).

Admin endpoints

GET  /admin/pollers
GET  /admin/pollers/<id>
POST /admin/pollers/<id>/run
POST /admin/pollers/<id>/pause
POST /admin/pollers/<id>/resume
POST /admin/pollers/<id>/reset
POST /admin/pollers/reload

reload returns a ReloadPlan JSON: { add, replace, remove, keep }. Validation runs across every job in the new file before any task is touched — a typo never knocks healthy siblings offline.
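
A sketch of the call, mirroring the credentials reload endpoint earlier on this page (the job ids in the response are illustrative):

curl -fsSX POST http://127.0.0.1:9091/admin/pollers/reload | jq
# → { "add": ["ana_blog_watch"], "replace": [], "remove": [], "keep": ["ana_leads"] }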

Agent tools

When the poller subsystem is up, every agent gets six LLM-callable tools registered on its ToolRegistry:

| Tool | Effect |
|---|---|
| pollers_list | List every job + status |
| pollers_show | Inspect one job |
| pollers_run | Trigger a tick out-of-band |
| pollers_pause | Set paused = 1 |
| pollers_resume | Set paused = 0 |
| pollers_reset | Wipe cursor + errors (destructive) |

Each registered Poller impl can also expose per-kind custom tools via Poller::custom_tools() — gmail ships gmail_count_unread out of the box. See Build a poller.

Create / delete are intentionally not exposed: prompt-injection could plant a webhook_poll aimed at internal infra. Operators own pollers.yaml + agent pollers reload.

Failure-destination

- id: ana_leads
  kind: gmail
  # …
  failure_to:
    channel: telegram
    to: "1194292426"     # alerts on the operator's chat

When the per-job circuit breaker trips (consecutive_errors >= breaker_threshold), the runner publishes a text message to the configured channel (resolved via Phase 17 just like the happy path) and records the timestamp for cooldown gating. The cooldown uses the global failure_alert_cooldown_secs default; per-job overrides are planned for a future revision.

Observability

Seven Prometheus series exposed under /metrics:

| Series | Type | Labels |
|---|---|---|
| poller_ticks_total | counter | kind, agent, job_id, status={ok,transient,permanent,skipped} |
| poller_latency_ms | histogram | kind, agent, job_id |
| poller_items_seen_total | counter | kind, agent, job_id |
| poller_items_dispatched_total | counter | kind, agent, job_id |
| poller_consecutive_errors | gauge | job_id |
| poller_breaker_state | gauge | job_id (0=closed, 1=half-open, 2=open) |
| poller_lease_takeovers_total | counter | job_id |

Migrating from gmail-poller.yaml

The legacy crate nexo-plugin-gmail-poller keeps its YAML schema but no longer drives its own loop. On boot the wizard auto-translates every legacy job into a kind: gmail entry, folds it into cfg.pollers.jobs, and logs a deprecation warn. Explicit entries in pollers.yaml win on id collision so a manual migration is never clobbered.

To migrate cleanly:

  1. Run agent --check-config to print every translated id.
  2. Copy each into config/pollers.yaml under pollers.jobs, adjusting the agent: field if the legacy agent_id was inferred.
  3. Delete config/plugins/gmail-poller.yaml.

MiniMax M2.5

MiniMax M2.5 is the primary LLM provider for nexo-rs. It's the first provider implemented and the recommended default for new agents.

Source: crates/llm/src/minimax.rs, crates/llm/src/minimax_auth.rs.

Why it's primary

  • Strong tool-calling support on both the OpenAI-compat wire and the Anthropic Messages wire
  • Token Plan auth lets you run agents on a subscription without per-request billing headaches
  • Aggressive price/performance for multi-agent deployments

If you don't have a specific reason to pick another provider, start with MiniMax.

Configuration

# config/llm.yaml
providers:
  minimax:
    api_key: ${MINIMAX_API_KEY:-}
    group_id: ${MINIMAX_GROUP_ID:-}
    base_url: https://api.minimax.io
    rate_limit:
      requests_per_second: 2.0
      quota_alert_threshold: 100000

Per-agent selection:

# config/agents.d/ana.yaml
agents:
  - id: ana
    model:
      provider: minimax
      model: MiniMax-M2.5

Wire formats (api_flavor)

MiniMax exposes two HTTP shapes. The client auto-detects from base_url but can be overridden via api_flavor.

| api_flavor | Endpoint | Shape | When |
|---|---|---|---|
| openai_compat (default) | {base_url}/text/chatcompletion_v2 | OpenAI chat completions | Regular API keys, most use cases |
| anthropic_messages | {base_url}/v1/messages | Anthropic Messages | Token Plan / Coding keys served at api.minimax.io/anthropic |

Auto-detection: if base_url ends in /anthropic, the client picks anthropic_messages automatically.

Authentication

Static API key

Simple path: put the key in env or a secrets file.

Credential precedence (first wins):

  1. MINIMAX_CODE_PLAN_KEY
  2. MINIMAX_CODING_API_KEY
  3. ./secrets/minimax_code_plan_key.txt
  4. api_key field in llm.yaml

Token Plan OAuth bundle

For subscription-based access. The wizard writes a bundle to ./secrets/minimax_token_plan.json:

{
  "access_token": "...",
  "refresh_token": "...",
  "expires_at": "2026-05-01T12:00:00Z",
  "region": "https://api.minimax.io"
}

Auto-refresh: 60 seconds before expires_at, a background task POSTs to {region}/oauth/token with grant_type=refresh_token and rewrites the bundle atomically. Concurrent refreshes are serialized behind a mutex — you never get two refresh calls in flight.

Mid-flight 401: if an API call returns 401 while holding what we thought was a valid token (clock skew, revocation), the client force-refreshes once and retries the request. A second 401 is surfaced as a credential error.

Shared OAuth client id for the MiniMax Portal flow: 78257093-7e40-4613-99e0-527b14b39113.

Request / response flow

sequenceDiagram
    participant A as Agent loop
    participant RL as RateLimiter
    participant C as MiniMaxClient
    participant AU as AuthSource
    participant MX as MiniMax API

    A->>C: chat(ChatRequest)
    C->>RL: acquire()
    C->>AU: fresh_bearer()
    AU->>AU: refresh if <60s to expiry
    AU-->>C: access_token
    C->>MX: POST chatcompletion_v2 / v1/messages
    alt 200
        MX-->>C: ChatResponse
    else 401
        C->>AU: force_refresh()
        C->>MX: retry once
    else 429
        MX-->>C: Retry-After
        C-->>A: LlmError::RateLimit
    else 5xx
        MX-->>C: error body
        C-->>A: LlmError::ServerError
    end

Supported features

| Feature | OpenAI-compat | Anthropic-messages |
|---|---|---|
| Chat completions | ✅ | ✅ |
| Tool calling | ✅ | ✅ |
| Streaming (SSE) | ✅ | ✅ |
| Token usage in stream | ✅ (stream_options.include_usage) | ✅ native |
| Multimodal (images) | | |
| JSON mode | limited | |

Rate limiting

Per-provider token bucket. requests_per_second: 2.0 refills one slot every 500 ms. Acquired before every request.

An optional quota_alert_threshold emits a structured warn log when the remaining quota (if the provider reports it) crosses the threshold. Useful for Prometheus alerting.

Error classification

| Response | Mapping | Behavior |
|---|---|---|
| 429 | LlmError::RateLimit { retry_after_ms } | Retried by the LLM retry layer (up to 5 attempts) |
| 5xx | LlmError::ServerError { status, body } | Retried (up to 3 attempts) |
| 401 | Internal auth refresh + single retry, then LlmError::CredentialInvalid | Fail-fast after refresh attempt |
| Other 4xx | LlmError::Other | Fail fast |

See Retry & rate limiting.

Common mistakes

  • Forgetting group_id. MiniMax requires a group id alongside the key for most endpoints. The wizard sets this; manual configs often miss it.
  • Pointing base_url at /anthropic with a regular API key. That endpoint is for Token Plan / Coding keys only — regular keys will 401. Leave base_url at https://api.minimax.io.
  • Refreshing the bundle manually mid-flight. The client already serializes refreshes. Editing the file while the agent runs can race with the client's atomic rewrite — stop the agent, edit, restart.

Anthropic / Claude

Native Anthropic client with multiple authentication paths: static API key, setup tokens, full OAuth PKCE subscription flow, or automatic import from the local Claude Code CLI.

Source: crates/llm/src/anthropic.rs, crates/llm/src/anthropic_auth.rs. Phase 15 added the subscription flow end-to-end.

Configuration

# config/llm.yaml
providers:
  anthropic:
    api_key: ${ANTHROPIC_API_KEY:-}
    base_url: https://api.anthropic.com
    rate_limit:
      requests_per_second: 2.0
    auth:
      mode: oauth_bundle
      bundle: ./secrets/anthropic_oauth.json

Per-agent selection:

model:
  provider: anthropic
  model: claude-haiku-4-5

Authentication modes

| auth.mode | Credential | Header |
|---|---|---|
| static | api_key (sk-ant-…) | x-api-key: <key> |
| setup_token | sk-ant-oat01-… (min 80 chars) | Authorization: Bearer <key> + anthropic-beta: oauth-2025-04-20 |
| oauth_bundle | {access, refresh, expires_at} JSON | Authorization: Bearer <access> |
| auto | tries all of the above in order | |

auto resolution order

Used when auth.mode: auto or omitted:

flowchart TD
    START[anthropic client build] --> B1{oauth_bundle<br/>file exists?}
    B1 -->|yes| USE1[use OAuth bundle]
    B1 -->|no| B2{Claude Code CLI<br/>credentials found?}
    B2 -->|yes| USE2[import from<br/>~/.claude/.credentials.json]
    B2 -->|no| B3{setup_token<br/>file exists?}
    B3 -->|yes| USE3[use setup token]
    B3 -->|no| B4{api_key<br/>set?}
    B4 -->|yes| USE4[use static key]
    B4 -->|no| FAIL([fail: no credentials])

OAuth bundle

The wizard runs a PKCE flow in the browser and writes the bundle to ./secrets/anthropic_oauth.json:

{
  "access_token": "...",
  "refresh_token": "...",
  "expires_at": "2026-05-01T12:00:00Z"
}

  • Refresh endpoint: https://console.anthropic.com/v1/oauth/token
  • Refresh cadence: 60 seconds before expires_at, background task POSTs grant_type=refresh_token
  • Concurrency: all refreshes serialize behind a mutex
  • Shared OAuth client id: 9d1c250a-e61b-44d9-88ed-5944d1962f5e
  • Stale-token handling: a 401 mid-flight marks the token stale so the next refresh fires immediately instead of waiting for the expiry window

CLI credentials import

If you're already running Claude Code CLI on the same host, the client auto-detects and imports ~/.claude/.credentials.json. Zero config — if it exists and is valid, it's used.

Tool calling

Native Anthropic shape:

  • Tool definitions: {name, description, input_schema}
  • Tool invocation: tool_use blocks with id, name, input
  • Tool result: tool_result blocks correlated via tool_use_id

Streaming uses native SSE; a dedicated parser in crates/llm/src/stream.rs handles message_start, content_block_*, and message_delta events.

Error classification

| Response | Mapping | Behavior |
|---|---|---|
| 429 | LlmError::RateLimit { retry_after_ms } (fallback 60s) | Retried |
| 401 / 403 | LlmError::CredentialInvalid with context (API vs OAuth) | Marks OAuth token stale; fails fast so the operator sees it |
| 5xx | LlmError::ServerError | Retried |
| Other 4xx | LlmError::Other | Fail fast |

Supported features

  • Chat completions ✅
  • Tool calling ✅
  • Streaming (SSE) ✅
  • Multimodal (images) ✅
  • Prompt caching ✅ (via Anthropic beta headers)
  • Extended thinking ✅ (model-dependent)

Common mistakes

  • Setup-token string under 80 chars. The setup-token validator refuses it at parse time. Make sure you pasted the full string.
  • api_key + oauth_bundle both set. The auth mode wins. The static key is kept only as a fallback the auto-resolver may pick up if the bundle is missing.
  • Claude Code CLI credentials being used unintentionally. If auto mode is on and you installed CLI on the host, that path wins before api_key. Set auth.mode: static to pin the static key.

OpenAI-compatible

Client for OpenAI itself and for any upstream that speaks the same wire: Ollama, Groq, OpenRouter, LM Studio, vLLM, Azure OpenAI, or your own proxy.

Source: crates/llm/src/openai_compat.rs.

Configuration

# config/llm.yaml
providers:
  openai:
    api_key: ${OPENAI_API_KEY:-}
    base_url: https://api.openai.com/v1
    rate_limit:
      requests_per_second: 2.0

Per-agent:

model:
  provider: openai
  model: gpt-4o

Known-working upstreams

Point base_url at any of these and it works out of the box:

| Upstream | base_url |
|---|---|
| OpenAI | https://api.openai.com/v1 |
| Ollama | http://localhost:11434/v1 |
| Groq | https://api.groq.com/openai/v1 |
| OpenRouter | https://openrouter.ai/api/v1 |
| LM Studio | http://localhost:1234/v1 |
| vLLM | http://<host>:<port>/v1 |
| Azure OpenAI | Azure resource URL (watch for differences) |
| MiniMax (compat mode) | https://api.minimax.io |

Authentication

Single mode: static API key sent as Authorization: Bearer <key>. Some upstreams ignore the key entirely (Ollama, local vLLM) — supply any non-empty string to satisfy the config validator.
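
For example, a local Ollama upstream (the placeholder key is arbitrary):

providers:
  openai:
    api_key: "unused"                    # Ollama ignores it; the validator only wants non-empty
    base_url: http://localhost:11434/v1
    rate_limit:
      requests_per_second: 2.0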

Features & gaps

| Feature | Status |
|---|---|
| Chat completions | ✅ |
| Tool calling | ✅ (OpenAI function-calling shape) |
| Streaming | ✅ |
| tool_choice: auto \| required \| none \| {type:function} | ✅ |
| JSON mode / structured outputs | upstream-dependent |
| Multimodal | upstream-dependent |
| Embeddings | supported for OpenAI proper; other upstreams may vary |

Feature gating when the upstream lacks support: we do not pre-probe features — a call that requires a feature the upstream doesn't speak will fail with the upstream's own error (typically a 400). The error bubbles up as LlmError::Other and does not retry, so you notice quickly.

Error classification

| Response | Mapping | Behavior |
|---|---|---|
| 429 | LlmError::RateLimit (fallback 30s) | Retried |
| 5xx | LlmError::ServerError | Retried |
| Other 4xx | LlmError::Other | Fail fast |

Common mistakes

  • Trailing slash in base_url. Some upstreams are lenient, some are not. Stick to the form shown in the table.
  • Using Azure OpenAI without the deployment path. Azure requires an extra segment (/openai/deployments/<name>/chat/completions) that the vanilla OpenAI path doesn't. Currently not supported out of the box; use a proxy or a custom provider if you need Azure.
  • Relying on JSON mode everywhere. Many local servers don't enforce schemas. Validate the response yourself when using Ollama / LM Studio for critical tool args.

DeepSeek

Connector for DeepSeek's hosted models. The API is OpenAI-compatible end to end (same /v1/chat/completions shape, same SSE streaming, same Bearer auth) so the connector is a thin factory that wraps OpenAiClient with DeepSeek's default endpoint.

Source: crates/llm/src/deepseek.rs.

Configuration

# config/llm.yaml
providers:
  deepseek:
    api_key: ${DEEPSEEK_API_KEY}
    # base_url defaults to https://api.deepseek.com/v1 when blank.
    # Override only for self-hosted gateways or testing fixtures.
    base_url: ""
    rate_limit:
      requests_per_second: 2.0
      quota_alert_threshold: 100000

Pin the agent to it:

agents:
  - id: ana
    model:
      provider: deepseek
      model: deepseek-chat

Models

| Model id | Use case |
|---|---|
| deepseek-chat | General-purpose. Supports tool calling. |
| deepseek-reasoner | Long-form reasoning. No tool calling in current API revision. |

deepseek-reasoner agents must therefore leave allowed_tools empty (or list only tools the agent never plans to invoke). Tool calls fired against the reasoner endpoint return an error from upstream.
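
A sketch of a reasoner-pinned agent (id illustrative), matching the boot-validation advice in Known limitations below:

agents:
  - id: analyst
    model:
      provider: deepseek
      model: deepseek-reasoner
    allowed_tools: []        # no tool calling on the reasoner endpoint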

Streaming

Identical to OpenAI's SSE format, so OpenAiClient::chat_stream parses it without per-provider code. nexo_llm_stream_ttft_seconds and nexo_llm_stream_chunks_total Prometheus series labelled with provider="deepseek" show up automatically.

Tool calling

deepseek-chat follows OpenAI's tool-calling spec verbatim. JSON arguments deserialise the same way; parallel_tool_calls is honoured.

Rate limits

DeepSeek returns standard 429 with a retry-after header. The existing retry plumbing (crates/llm/src/retry.rs) consumes that header so 429s back off cleanly without touching the connector.

Quota / cost

DeepSeek's pricing is per-1M-tokens; the TokenUsage returned by each ChatResponse is forwarded to the standard agent_llm_tokens_total counter (labels: provider="deepseek", model, usage_kind).

Known limitations

  • No native embeddings client — DeepSeek does not currently publish an embeddings endpoint. Use a different provider for embedding_model if your agent needs vector search.
  • Reasoner tool-call gap — see Models. Validate at boot by leaving allowed_tools: [] on agents pinned to deepseek-reasoner.
  • Cache awareness — DeepSeek's KV-cache hit information is surfaced through the same cache_usage field the OpenAI client reports.

Rate limiting & retry

Every LLM provider client sits behind a token bucket and a bounded retry policy with decorrelated jittered exponential backoff. This page is the definitive reference for those two mechanisms.

Source: crates/llm/src/retry.rs, crates/llm/src/rate_limiter.rs, crates/llm/src/quota_tracker.rs.

Rate limiter

Token bucket, acquired before every outbound request.

  • interval = 1 / requests_per_second
  • One token per request
  • One slot refills per interval
  • Per-provider, per-agent — each client has its own bucket, so one noisy agent can't starve another even when they share a provider

rate_limit:
  requests_per_second: 2.0
  quota_alert_threshold: 100000   # optional

At 2.0 rps, the bucket tops up a slot every 500 ms. A burst of 3 requests will wait briefly on the third.

Quota tracker

Optional. When a provider returns remaining-quota info (header, response body), quota_tracker records it via record_usage() on the token response. If the remaining crosses quota_alert_threshold, a structured warn log is emitted:

WARN quota threshold crossed  provider=minimax remaining=99500 threshold=100000

Pair with a Prometheus log-scraping rule for an alert.

Retry policy

Retries live above the circuit breaker. They handle transient failures that don't warrant flipping the breaker.

| Error class | Max attempts | Backoff curve |
|---|---|---|
| 429 (rate limit) | 5 | max(retry-after, jittered_backoff) |
| 5xx (server) | 3 | jittered_backoff |
| 401 (auth) | 1 refresh + 1 retry | (internal to the client) |
| Other 4xx | 0 (fail fast) | |

Decorrelated jittered backoff

Not simple exponential — the next backoff is a uniform random draw in a growing range:

next = uniform(base, max(base, last × multiplier))

Defaults from llm.yaml retry block:

| Field | Default |
|---|---|
| initial_backoff_ms | 1000 |
| max_backoff_ms | 60000 |
| backoff_multiplier | 2.0 |

Why decorrelated jitter: multiple clients hitting the same 429 don't re-fire in lockstep. Desynchronization is built-in.
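
A worked run with the defaults (the draws are random; the numbers only illustrate the growing range):

next₁ = uniform(1000, max(1000, 1000 × 2.0)) = uniform(1000, 2000)   → e.g. 1700 ms
next₂ = uniform(1000, max(1000, 1700 × 2.0)) = uniform(1000, 3400)   → e.g. 2900 ms
next₃ = uniform(1000, max(1000, 2900 × 2.0)) = uniform(1000, 5800)   → e.g. 4100 ms
Every wait is additionally capped at max_backoff_ms = 60000.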

flowchart LR
    REQ[request] --> API{API response}
    API -->|200| OK[return ChatResponse]
    API -->|429| RL[RateLimit]
    API -->|5xx| SE[ServerError]
    API -->|401| AU[CredentialInvalid]
    API -->|4xx| F[Other fail fast]

    RL --> D1{attempts<br/>< 5?}
    SE --> D2{attempts<br/>< 3?}
    AU --> REF[auth refresh<br/>+ single retry]
    D1 -->|yes| BO1[wait max(retry_after,<br/>jittered_backoff)]
    D1 -->|no| F
    D2 -->|yes| BO2[wait jittered_backoff]
    D2 -->|no| F
    BO1 --> REQ
    BO2 --> REQ
    REF --> REQ

Error classification per provider

The providers classify HTTP responses into a shared LlmError so the retry layer can be common code:

| HTTP | LlmError variant | Retried? |
|---|---|---|
| 200 | Ok(ChatResponse) | — |
| 429 | RateLimit { retry_after_ms } | ✅ up to 5 |
| 5xx | ServerError { status, body } | ✅ up to 3 |
| 401 / 403 | CredentialInvalid | ❌ (client handles refresh internally) |
| Other 4xx | Other | ❌ (fail fast) |

Tuning

  • Bursty workloads: bump requests_per_second cautiously; the upstream's own rate limits won't move, so you'll just pay more 429s to find the ceiling.
  • Flaky networks: raise max_attempts for 5xx; keep max_backoff_ms bounded so slow agents don't spiral.
  • Subscription plans: lower requests_per_second to keep daily usage under caps; pair with quota_alert_threshold.

WhatsApp

End-to-end WhatsApp channel: Signal Protocol pairing, inbound message bridge, outbound send/reply/reaction/media tools, optional voice transcription.

Source: crates/plugins/whatsapp/ (thin wrapper over the whatsapp-rs crate).

Topics

| Direction | Subject | Notes |
|---|---|---|
| Inbound | plugin.inbound.whatsapp | Legacy single-account |
| Inbound | plugin.inbound.whatsapp.<instance> | Multi-account routing |
| Outbound | plugin.outbound.whatsapp | Legacy single-account |
| Outbound | plugin.outbound.whatsapp.<instance> | Multi-account routing |

During pairing the plugin also publishes qr lifecycle events on the inbound topic so the wizard can render the QR.

Config

# config/plugins/whatsapp.yaml
whatsapp:
  enabled: true
  session_dir: ""            # empty → per-agent default
  media_dir: ./data/media/whatsapp
  instance: default
  acl:
    allow_list: []           # empty + empty env = open ACL
    from_env: WA_AGENT_ALLOW
  behavior:
    ignore_chat_meta: true
    ignore_from_me: true
    ignore_groups: false
  bridge:
    response_timeout_ms: 30000
    on_timeout: noop         # noop | apology_text
  transcriber:
    enabled: false
    skill: whisper
  public_tunnel:
    enabled: false
    only_until_paired: true

Key fields:

| Field | Default | Purpose |
|---|---|---|
| session_dir | per-agent | Signal Protocol state. Each account needs its own dir. |
| instance | None | Label for multi-account routing. Unlabelled keeps the legacy bare topic. |
| allow_agents | [] | Agents permitted to publish from this instance. Empty = accept any agent holding a resolver handle. Defense-in-depth for the per-agent credentials binding. |
| acl.allow_list | [] | Bare JIDs allowed to reach the agent. Empty + empty env = open. |
| behavior.ignore_chat_meta | true | Skip muted / archived / locked chats on the phone. |
| behavior.ignore_from_me | true | Drop the agent's own replies to prevent loops. |
| behavior.ignore_groups | false | Skip group chats entirely when true. |
| bridge.response_timeout_ms | 30000 | Per-message handler deadline. |
| bridge.on_timeout | noop | noop (no reply) or apology_text. |
| transcriber.enabled | false | Voice → text via skill. |
| public_tunnel.enabled | false | Expose /whatsapp/pair through a Cloudflare tunnel. |
| public_tunnel.only_until_paired | true | Tear down the tunnel after Connected. |

Pairing

Pairing is setup-time only. The runtime refuses to start without paired credentials.

sequenceDiagram
    participant U as Operator
    participant W as agent setup
    participant WA as whatsapp-rs Client
    participant P as Phone

    U->>W: setup pair whatsapp --agent ana
    W->>WA: new_in_dir(session_dir)
    WA-->>W: QR image
    W-->>U: render QR (Unicode blocks)
    U->>P: Settings → Linked Devices → scan
    P->>WA: pair
    WA-->>W: Connected
    W->>W: persist creds to session_dir/.whatsapp-rs/creds.json

  • Credentials at <session_dir>/.whatsapp-rs/creds.json
  • Daemon-collision check at <session_dir>/.whatsapp-rs/daemon.json blocks a second process on the same account
  • Multi-account via Client::new_in_dir() — no XDG_DATA_HOME mutation
  • Credential expiry mid-run (401 loop) → operator must re-pair; no runtime QR fallback

Tools exposed to the LLM

| Tool | Signature | Notes |
|---|---|---|
| whatsapp_send_message | (to, text) | Send to arbitrary JID. |
| whatsapp_send_reply | (chat, reply_to_msg_id, text) | Quote a specific inbound message. |
| whatsapp_send_reaction | (chat, msg_id, emoji) | Emoji tap-back. |
| whatsapp_send_media | (to, file_path, caption?, mime?) | File attachment. |

All tools honor the per-binding outbound_allowlist.whatsapp — empty list = unrestricted, populated = hard allowlist.
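
For example, the smallest outbound send as the LLM would invoke it (arguments per the signature table above):

{ "to": "573000000000@s.whatsapp.net", "text": "Hola 👋" }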

Event shapes

Inbound payloads (on plugin.inbound.whatsapp[.<instance>]):

// message
{
  "kind": "message",
  "from": "573000000000@s.whatsapp.net",
  "chat": "573000000000@s.whatsapp.net",
  "text": "hi",
  "reply_to": null,
  "is_group": false,
  "timestamp": 1714000000,
  "msg_id": "3EB0..."
}

// media_received
{
  "kind": "media_received",
  "from": "...",
  "chat": "...",
  "msg_id": "...",
  "local_path": "./data/media/whatsapp/abc.jpg",
  "mime": "image/jpeg",
  "caption": null
}

// qr  (pairing only)
{"kind": "qr", "ascii": "...", "png_base64": "...", "expires_at": ...}

// lifecycle
{"kind": "connected" | "disconnected" | "reconnecting" | "credentials_expired"}

// observability
{"kind": "bridge_timeout", "msg_id": "...", "waited_ms": 30000}

Gotchas

  • Shared session_dir across agents = cross-delivery. Each agent should point at its own <workspace>/whatsapp/default. The wizard does this automatically; manual configs need care.
  • ignore_chat_meta: true silently skips muted/archived chats. If a user archives a chat on the phone, the agent never sees it again until they unarchive.
  • Credential expiry is irreversible without re-pair. whatsapp-rs will loop on 401. Watch for credentials_expired lifecycle events and alert.

See Setup wizard — WhatsApp pairing.

Telegram

Bot API channel with long-polling intake, multi-bot routing, full send/reply/reaction/edit/location/media tool surface, and optional voice auto-transcription.

Source: crates/plugins/telegram/.

Topics

| Direction | Subject | Notes |
|---|---|---|
| Inbound | plugin.inbound.telegram | Legacy single-bot |
| Inbound | plugin.inbound.telegram.<instance> | Per-bot routing |
| Outbound | plugin.outbound.telegram | Legacy single-bot |
| Outbound | plugin.outbound.telegram.<instance> | Per-bot routing |

Each instance subscribes only to its own outbound topic, so two bots in the same process don't cross-wire.

Config

# config/plugins/telegram.yaml
telegram:
  token: ${file:./secrets/telegram_token.txt}
  instance: sales_bot
  polling:
    enabled: true
    interval_ms: 25000
    offset_path: ./data/media/telegram/sales_bot.offset
  allowlist:
    chat_ids: []        # empty = accept all
  auto_transcribe:
    enabled: false
    command: ./extensions/openai-whisper/target/release/openai-whisper
    language: es
  bridge_timeout_ms: 120000

Key fields:

| Field | Default | Purpose |
|---|---|---|
| token | — (required) | Bot API token from @BotFather. |
| instance | None | Label for multi-bot routing. Unlabelled keeps the legacy bare topic. |
| allow_agents | [] | Agents permitted to publish from this bot. Empty = accept any agent holding a resolver handle. Defense-in-depth for the per-agent credentials binding. |
| polling.enabled | true | Long-polling intake. Webhook not yet supported. |
| polling.interval_ms | 25000 | Long-poll timeout hint. Telegram clamps to [1 s, 50 s]. |
| polling.offset_path | ./data/media/telegram/offset | File to persist update offset across restarts. |
| allowlist.chat_ids | [] | Numeric chat ids allowed. Empty = accept all. |
| auto_transcribe.enabled | false | Voice → text. |
| auto_transcribe.command | ./extensions/openai-whisper/.../openai-whisper | Path to whisper binary. |
| bridge_timeout_ms | 120000 | Handler deadline before a bridge_timeout event fires. |

Auth

Single mode: static bot token. No OAuth. Store it under ./secrets/ and reference via ${file:...}.

flowchart LR
    SETUP[agent setup] --> ASK[ask for bot token]
    ASK --> F[./secrets/telegram_token.txt]
    F -.->|${file:...}| CFG[config/plugins/telegram.yaml]
    CFG --> RUN[runtime: HTTP Bot API with long-poll]

Tools exposed to the LLM

| Tool | Notes |
|---|---|
| telegram_send_message | Send text to chat id (negative for groups/channels). |
| telegram_send_reply | Quote a specific prior message. |
| telegram_send_reaction | Emoji on a message. |
| telegram_edit_message | Modify a prior message's text. |
| telegram_send_location | GPS coordinates. |
| telegram_send_media | File upload with caption and mime hint. |

All tools enforce outbound_allowlist.telegram per binding.

Event shapes

// message
{
  "kind": "message",
  "from": "12345",
  "chat": "12345",
  "chat_type": "private",
  "text": "hi",
  "reply_to": null,
  "is_group": false,
  "timestamp": 1714000000,
  "msg_id": "42",
  "username": "jdoe",
  "media": [],
  "latitude": null,
  "longitude": null,
  "forward": null
}

// media item (inside `media`)
{
  "kind": "voice" | "photo" | "video" | "document" | "audio",
  "local_path": "./data/media/telegram/....ogg",
  "file_id": "AgACAgEA...",
  "mime_type": "audio/ogg",
  "duration_s": 4,
  "width": null,
  "height": null,
  "file_name": null
}

// callback_query  (inline-keyboard button press, auto-ACKed)
{"kind": "callback_query", "from": "...", "chat": "...", "data": "buy"}

// chat_membership
{"kind": "chat_membership", "chat": "...", "status": "added" | "kicked" | ...}

// lifecycle
{"kind": "connected" | "disconnected"}
{"kind": "bridge_timeout", "msg_id": "...", "waited_ms": ...}

Forwarded messages include a forward object:

"forward": {
  "source": "user" | "channel" | "chat",
  "from_user_id": 12345,
  "from_chat_id": null,
  "date": 1714000000
}

Gotchas

  • Webhook mode is not supported yet. Long-polling only.
  • polling.interval_ms is clamped by Telegram. Values outside [1000, 50000] get capped by the server side; default 25000 is a good middle ground.
  • Negative chat ids are groups/channels. Telegram uses negative ids for group chats; positive for private. Don't strip the sign.
  • Auto-transcribe requires the whisper skill extension. The command path must point at a working binary, otherwise inbound voice messages arrive without text.

Email

Generic SMTP/IMAP plugin. Scaffolded but not yet wired — config shape is defined, but no tool surface or inbound bridge ships today. For a working Gmail → agent pipeline today, use gmail-poller.

Source: crates/plugins/email/ (empty lib.rs), config in crates/config/src/types/plugins.rs.

Config

# config/plugins/email.yaml
email:
  smtp:
    host: smtp.example.com
    port: 587
    username: agent@example.com
    password: ${file:./secrets/email_password.txt}
  imap:
    host: imap.example.com
    port: 993

| Field | Default | Purpose |
|---|---|---|
| smtp.host | — (required) | SMTP server. |
| smtp.port | 587 | SMTP port. |
| smtp.username | — (required) | SMTP auth user. |
| smtp.password | — (required) | SMTP auth password. |
| imap.host | — | IMAP server (inbound). |
| imap.port | 993 | IMAP port. |

Status

  • No NATS topics active
  • No tools exposed to the LLM
  • No inbound bridge
  • Config schema reserved so future phases can land incrementally

What to use instead

For inbound triage:

  • gmail-poller — cron-style Gmail polling with regex capture groups and template-based dispatch to any plugin.outbound.* topic. Production-ready.

For outbound notifications:

  • Delegate to a send agent wired to a transactional-email provider via a custom extension, until this plugin lands.

Track progress under the future Phase 17 in ../PHASES.md.

Browser (Chrome DevTools Protocol)

Drives a real Chrome/Chromium instance via CDP. Agents can navigate, click, fill, screenshot, and run JS — with stable element refs that work across DOM mutations within a single turn.

Source: crates/plugins/browser/.

Topics

| Direction | Subject | Notes |
|---|---|---|
| Outbound | plugin.outbound.browser | Tool invocations |
| Events | plugin.events.browser.<method_suffix> | Mirrored CDP notifications |

Browser is an outbound-only plugin — there is no unsolicited inbound event from a web page to the agent.

Config

# config/plugins/browser.yaml
browser:
  headless: false
  executable: ""                     # empty → search PATH
  cdp_url: ""                        # empty → launch new Chrome
  user_data_dir: ./data/browser/profile
  window_width: 1280
  window_height: 800
  connect_timeout_ms: 10000
  command_timeout_ms: 15000
  args: []                           # extra CLI flags for Chrome

| Field | Default | Purpose |
|---|---|---|
| headless | false | Launch Chrome without a UI. |
| executable | "" | Chrome binary path. Empty = search PATH. |
| cdp_url | "" | Connect to an existing Chrome DevTools server (e.g. http://127.0.0.1:9222). Empty = launch a new instance. |
| user_data_dir | ./data/browser/profile | Chrome profile cache. Keeps cookies, logins. |
| window_width / window_height | 1280 / 800 | Viewport. |
| connect_timeout_ms | 10000 | How long to wait for Chrome startup / remote connect. |
| command_timeout_ms | 15000 | Per-CDP-command execution timeout. |
| args | [] | Extra CLI flags forwarded verbatim to the spawned Chrome. Ignored when cdp_url is set. Later args win — use this to override built-in flags when a restricted environment needs it (e.g. --no-sandbox on Termux). |

Auth

None. CDP is an unauthenticated protocol — use cdp_url only with a loopback / firewalled Chrome.

Tools exposed to the LLM

| Tool | Purpose |
|---|---|
| browser_navigate | Load URL and wait for load event. |
| browser_click | Click by element ref (@e12) or CSS selector. |
| browser_fill | Type into input / textarea / contenteditable. Replaces content. |
| browser_screenshot | Base64 PNG of the viewport. |
| browser_evaluate | Run JS, return value as JSON. |
| browser_snapshot | Text DOM tree with stable element refs. |
| browser_scroll_to | Scroll a target element into view. |
| browser_current_url | Current page URL. |
| browser_wait_for | Poll for an element to appear. |
| browser_go_back / browser_go_forward | Navigation history. |
| browser_press_key | Keyboard events. |

All tools are prefixed browser_* for glob filtering in allowed_tools.

Element refs

browser_snapshot emits a text tree where every actionable element has a ref like @e12. Those refs are stable within the snapshot turn but invalidated by any subsequent DOM mutation:

sequenceDiagram
    participant A as Agent
    participant B as Browser plugin
    participant C as Chrome

    A->>B: browser_snapshot
    B->>C: DOM.describeNode(..)
    C-->>B: tree
    B-->>A: "Login @e12\nEmail @e13\n..."
    A->>B: browser_fill(@e13, "user@…")
    B->>C: DOM.focus + Input.dispatch
    A->>B: browser_click(@e12)
    Note over A,B: refs still valid<br/>(same snapshot turn)
    A->>B: browser_snapshot
    Note over B: refs from prior snapshot<br/>now INVALID

Rule: take a snapshot, act on refs from that snapshot, take a new snapshot before acting again.

Gotchas

  • browser_fill replaces content. No append mode. To add text to existing content, read the current value first (via evaluate), then send the merged string; see the sketch after this list.
  • Connecting to an existing Chrome (cdp_url) skips the profile setup. Any login state is whatever that Chrome already has.
  • Element refs expire on DOM mutation. The plugin does not auto-refresh — refs from a stale snapshot will error or misfire.
  • Headless sites break. Some sites detect headless Chrome and behave differently. Use headless: false for those.
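
A sketch of that read-then-merge sequence as two tool calls. The argument names below are guesses for illustration; check the registered tool schemas for the real ones:

// 1. browser_evaluate — read what's already there (selector is hypothetical)
{ "expression": "document.querySelector('#notes').value" }

// 2. browser_fill — write back the merged string
{ "target": "#notes", "text": "<current value><appended text>" }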

Google (OAuth, Gmail, Calendar, Drive) + gmail-poller

Two related subsystems:

  • google plugin — per-agent OAuth client plus a generic google_call tool that lets an agent hit any Google API the granted scopes allow
  • gmail-poller plugin — cron-style scheduler that polls Gmail, matches subjects/bodies with regex, and dispatches results to any outbound topic (WhatsApp, Telegram, another agent)

Sources: crates/plugins/google/ and crates/plugins/gmail-poller/.

google — per-agent OAuth

Config

Two shapes supported:

Preferred (Phase 17) — declare accounts in a dedicated store and bind them from the agent via credentials.google:

# config/plugins/google-auth.yaml
google_auth:
  accounts:
    - id: ana@gmail.com
      agent_id: ana                     # 1:1; gauntlet enforces the binding
      client_id_path:     ./secrets/google/ana_client_id.txt
      client_secret_path: ./secrets/google/ana_client_secret.txt
      token_path:         ./secrets/google/ana_token.json
      scopes:
        - https://www.googleapis.com/auth/gmail.modify

Gmail-poller picks these up automatically; agents see google_* tools when the store has an entry matching their agent_id.

Legacy inline (still works, logs a migration warn):

# agents.yaml
google_auth:
  client_id: ${GOOGLE_CLIENT_ID}
  client_secret: ${file:./secrets/google_secret.txt}
  scopes:
    - gmail.readonly
    - gmail.send
    - calendar
    - drive.file
  token_file: ./data/workspace/ana/google_token.json
  redirect_port: 17653

| Field | Default | Purpose |
|---|---|---|
| client_id / client_secret | — | OAuth app creds from Google Cloud Console. |
| scopes | — | OAuth scopes. Short-form (gmail.readonly) auto-expanded to full URL. |
| token_file | google_tokens.json | Persistent refresh-token JSON. Relative paths resolve from workspace. |
| redirect_port | 8765 | Loopback callback port. Must match the "Authorized redirect URI" in the OAuth client. |

Pairing flow

sequenceDiagram
    participant A as Agent LLM
    participant T as google_auth_start
    participant B as Browser
    participant L as Loopback listener<br/>127.0.0.1:<port>/callback
    participant G as Google OAuth

    A->>T: invoke
    T->>L: start listener
    T-->>A: return consent URL
    A->>B: ask user to open URL
    B->>G: consent flow
    G->>L: redirect w/ code
    L->>G: exchange code → tokens
    L->>L: persist refresh_token<br/>(mode 0o600)
    L-->>A: success

The wizard wraps this as a one-shot step, but runtime tools expose the same primitives for re-auth.

Device-code flow (headless setup)

agent setup google offers a second consent path that does not require a local browser — useful for servers, CI, and SSH-only environments. The wizard:

  1. POSTs to oauth2.googleapis.com/device/code with the account's client_id and scopes.
  2. Prints a 6-character user_code + a verification_url to the terminal.
  3. Polls oauth2.googleapis.com/token (default every 5 s) until the operator approves on any device.
  4. Persists the resulting refresh_token at token_path with mode 0o600.

╭─ Device-code OAuth ───────────────────────────────────────
│  Open in any browser:          https://www.google.com/device
│  Code to enter:                HBQM-WLNF
│  (valid for 1800s)
╰───────────────────────────────────────────────────────────

Waiting for approval...
✔ Tokens persisted at ./secrets/ana_google_token.json.

The Google Cloud Console OAuth client must be type "TVs and Limited Input devices" for this flow — Desktop/Web clients reject device-code with client_type_disabled.
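
For orientation, the two HTTP calls behind steps 1 and 3 look like this when issued by hand (standard Google device-flow parameters; the wizard's exact request body lives in the setup code):

# 1. request a device code
curl -s https://oauth2.googleapis.com/device/code \
  -d client_id="$CLIENT_ID" \
  -d scope="https://www.googleapis.com/auth/gmail.modify"

# 3. poll until the operator approves
curl -s https://oauth2.googleapis.com/token \
  -d client_id="$CLIENT_ID" \
  -d client_secret="$CLIENT_SECRET" \
  -d device_code="$DEVICE_CODE" \
  -d grant_type="urn:ietf:params:oauth:grant-type:device_code"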

Lazy-refresh of client_id / client_secret

GoogleAuthClient.config is ArcSwap<GoogleAuthConfig>. Every network call (exchange_code, request_device_code, poll_device_token, refresh_token) first invokes refresh_secrets_if_changed, which compares mtime on client_id_path and client_secret_path and re-reads them when they advance. Rotating the secret files (e.g. quarterly key rotation in Google Cloud Console) takes effect on the next tool call without a daemon restart.

Steady-state cost: one fs::metadata call per outbound request. Audit trail (target credentials.audit):

INFO event="google_secrets_refreshed" \
  google_*: re-read client_id/client_secret after on-disk rotation

Tools exposed

| Tool | Purpose |
|---|---|
| google_auth_start | Start OAuth, return the consent URL. |
| google_auth_status | Report {authenticated, expires_in_secs, has_refresh, scopes}. Safe to poll. |
| google_call | Generic {method, url, body?} against any *.googleapis.com endpoint. Auto-refreshes access token. |
| google_auth_revoke | Revoke the refresh token; forces full re-auth. |

Supported APIs

Anything under *.googleapis.com that the granted scopes permit. Common call shapes:

  • Gmail v1 — https://gmail.googleapis.com/gmail/v1/users/me/messages?q=is:unread
  • Calendar v3 — https://www.googleapis.com/calendar/v3/calendars/primary/events
  • Drive v3 — https://www.googleapis.com/drive/v3/files?q=mimeType='application/pdf'
  • Sheets v4 — https://sheets.googleapis.com/v4/spreadsheets/<id>/values/A1:D10
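
For example, an unread-mail lookup through google_call (arguments follow the {method, url, body?} shape from the tools table):

{
  "method": "GET",
  "url": "https://gmail.googleapis.com/gmail/v1/users/me/messages?q=is:unread"
}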

Gotchas

  • 401 means the refresh token was revoked. Re-auth via google_auth_start.
  • 403 means a scope wasn't granted. Add the scope, revoke, re-auth.
  • Token file leaks → revoke immediately. The file holds a refresh token with the granted scopes.

gmail-poller — cron-style Gmail bridge

Poll Gmail, extract fields via regex, render a template, dispatch to any outbound topic. Multi-account, allowlisted by sender substring, rate-limited per dispatch.

Config

# config/plugins/gmail-poller.yaml
gmail_poller:
  enabled: true
  interval_secs: 60
  accounts:
    - id: default
      agent_id: ana           # Phase 17 — binds the account to an agent; defaults to `id` when omitted
      token_path: ./data/workspace/ana/google_token.json
      client_id_path: ./secrets/google_client_id.txt
      client_secret_path: ./secrets/google_client_secret.txt
  jobs:
    - name: lead_forward
      account: default
      query: "is:unread subject:(lead OR interesado)"
      newer_than: 1d
      interval_secs: 120
      forward_to_subject: plugin.outbound.whatsapp.default
      forward_to: "573000000000@s.whatsapp.net"
      extract:
        name: "Nombre:\\s*(.+)"
        phone: "Tel:\\s*(\\+?\\d+)"
      require_fields: [name, phone]
      message_template: |
        New lead 🚨
        {name} — {phone}
        Subject: {subject}
        {snippet}
      mark_read_on_dispatch: true
      max_per_tick: 20
      dispatch_delay_ms: 1000
      sender_allowlist: ["@mycompany.com", "partners@"]

Per-job fields

| Field | Default | Purpose |
|---|---|---|
| name | — (required) | Job id. |
| account | "default" | Which OAuth account to use. |
| query | — (required) | Gmail search (is:unread, etc.). |
| newer_than | — | Gmail newer_than: suffix (1d, 2h) — avoids back-filling. |
| interval_secs | root interval | Override per-job poll cadence. |
| forward_to_subject | — | Broker topic to publish dispatched message. |
| forward_to | — | Recipient passed through (JID, chat id, phone). |
| extract | {} | Named regex groups applied to the email body. First group wins. |
| require_fields | [] | Skip dispatch if any listed extracted field is empty. |
| message_template | — (required) | Template with {field}, {subject}, {snippet} placeholders. |
| mark_read_on_dispatch | true | Mark the thread as read after successful dispatch. |
| dispatch_delay_ms | 1000 | Sleep between multi-match dispatches. |
| max_per_tick | 20 | Hard cap per poll cycle. |
| sender_allowlist | [] | Substring/domain filter on From: header. Empty = accept all. |

Event shape

{
  "to": "<forward_to>",
  "kind": "text",
  "text": "<rendered message_template>",
  "subject": "<email subject>",
  "<extract key>": "<captured group>"
}

Published to <forward_to_subject>.

Error backoff

Sustained errors are backed off: [0, 0, 0, 30, 60, 120, 300] seconds (caps at 300). Transient failures don't stop the poll loop.

Gotchas

  • Gmail API only — no IMAP. This plugin is Google-specific. For generic IMAP triage, use a custom extension.
  • sender_allowlist is substring, not regex. Simpler to read, simpler to get wrong. Quote boundary characters explicitly.
  • extract regex must compile. Invalid regex fails the whole job at boot with an error naming the field.

Short-term memory

Per-session conversational buffer held entirely in memory. Tracks the turns of the ongoing conversation so the LLM has context on every completion request.

Source: crates/core/src/session/ (types.rs, manager.rs) — the Session struct owns the short-term buffer.

What lives in a session

Each Session stores:

| Field | Type | Purpose |
|---|---|---|
| history | Vec<Interaction> | FIFO of turns (role + content + timestamp) |
| context | serde_json::Value | Free-form JSON blob for per-session state |
| last_access | timestamp | Used by TTL sweeper and cap eviction |

An Interaction is {role: User | Assistant | Tool, content, timestamp}.

Sliding window — max_history_turns

short_term:
  max_history_turns: 50

Hard cap, sliding FIFO. When history.len() > max_history_turns, the oldest entry is removed on the next push:

flowchart LR
    MSG[new turn] --> PUSH[history.push]
    PUSH --> CHECK{len > max?}
    CHECK -->|no| DONE[done]
    CHECK -->|yes| DROP[history.remove(0)]
    DROP --> DONE

Old content is lost, not promoted. If you need long-term persistence, the agent must explicitly call the memory tool with action remember. See Long-term memory.
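
A sketch of that push path (simplified stand-ins, not the types in session/types.rs):

// Illustrative only.
struct Interaction { role: String, content: String, timestamp: u64 }

fn push_turn(history: &mut Vec<Interaction>, turn: Interaction, max_history_turns: usize) {
    history.push(turn);
    if history.len() > max_history_turns {
        // The oldest turn falls off the window; it is lost unless the agent
        // promoted it to long-term memory first.
        history.remove(0);
    }
}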

Session cap and eviction

short_term:
  max_sessions: 10000

Soft cap across the whole process. On overflow, the oldest-idle session (lowest last_access) is evicted to make room. Eviction fires the on_expire callbacks — used by workspace-git to checkpoint before tearing down the session.

max_sessions: 0 disables the cap (unbounded). Leave it at the default unless you have a specific reason — the cap is DoS protection against a spammer rotating chat_ids.

TTL sweeper

short_term:
  session_ttl: 24h

Sessions expire after session_ttl of inactivity. The sweeper runs every ttl / 4 (so every 6 h with the default 24 h TTL) and drops expired sessions.

stateDiagram-v2
    [*] --> Active: first message
    Active --> Active: message / event<br/>(last_access updated)
    Active --> Expired: idle > session_ttl
    Active --> Evicted: cap exceeded
    Expired --> [*]: sweeper
    Evicted --> [*]: on_expire callbacks fire

Expiry also fires on_expire — good place to hook session-close commits to a workspace-git repo.

Relationship to other memory layers

flowchart LR
    STM[short-term<br/>in-memory Vec] -.->|tool call:<br/>memory.remember| LTM[(long-term<br/>SQLite)]
    LTM -.->|vector enabled| VEC[(sqlite-vec)]
    STM -.->|transcripts_dir| TR[(JSONL transcripts)]
    STM -.->|session close| WSG[(workspace-git)]

STM does not auto-promote to LTM. Promotion happens via:

  • Explicit memory.remember tool call from the agent
  • Dream sweeps (Phase 10.6) that scan recall-event signals and promote hot memories
  • Session-close commits to workspace-git if enabled

Gotchas

  • Lost turns are gone. Once a turn falls off the sliding window it is not recoverable. If it mattered, save it to LTM before the next turn.
  • max_sessions: 0 has no DoS guard. Only do this in single-tenant setups where you control the sender id space.
  • last_access updates on any access. That includes heartbeat ticks if they read the session — effectively keeping a session alive past its TTL as long as the agent is alive.

Long-term memory (SQLite)

Durable memory shared by every agent in the process. One SQLite file, multi-tenant via an agent_id column on every row. Survives restarts.

Source: crates/memory/src/long_term.rs.

Storage location

long_term:
  backend: sqlite
  sqlite:
    path: ./data/memory.db

One file for all agents. Per-agent isolation is enforced by WHERE agent_id = ? on every query — not by separate DB files. An idx_memories_agent(agent_id, created_at DESC) index keeps those queries fast.

If you want per-agent file separation, override sqlite.path per agent via an inbound_bindings[] override or a per-agent config directory.

Schema

The runtime creates these tables at boot if they don't exist.

memories — atomic facts

CREATE TABLE memories (
  id            TEXT PRIMARY KEY,  -- UUID
  agent_id      TEXT NOT NULL,
  content       TEXT NOT NULL,
  tags          TEXT DEFAULT '[]', -- JSON array
  concept_tags  TEXT DEFAULT '[]', -- auto-derived (phase 10.7)
  created_at    INTEGER NOT NULL   -- ms since epoch
);
CREATE INDEX idx_memories_agent ON memories(agent_id, created_at DESC);

memories_fts — full-text search (FTS5)

CREATE VIRTUAL TABLE memories_fts USING fts5(
  content,
  id        UNINDEXED,
  agent_id  UNINDEXED
);

Powers the keyword recall mode with BM25 ranking.
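
Illustratively, the keyword path boils down to a query of this shape (a sketch; the real SQL lives in long_term.rs):

SELECT m.id, m.content, m.tags
FROM memories_fts
JOIN memories m ON m.id = memories_fts.id
WHERE memories_fts MATCH ?        -- query OR tag1 OR tag2 OR tag3
  AND m.agent_id = ?
ORDER BY bm25(memories_fts)       -- lower = better match
LIMIT ?;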

interactions — conversation archive

CREATE TABLE interactions (
  id          TEXT PRIMARY KEY,
  session_id  TEXT NOT NULL,
  agent_id    TEXT NOT NULL,
  role        TEXT,
  content     TEXT,
  created_at  INTEGER
);
CREATE INDEX idx_interactions_session ON interactions(session_id, created_at DESC);

reminders — phase 7 heartbeat reminders

CREATE TABLE reminders (
  id            TEXT PRIMARY KEY,
  agent_id      TEXT NOT NULL,
  session_id    TEXT NOT NULL,
  plugin        TEXT,
  recipient     TEXT,
  message       TEXT,
  due_at        INTEGER,
  claimed_at    INTEGER,
  delivered_at  INTEGER,
  created_at    INTEGER
);
CREATE INDEX idx_reminders_due
  ON reminders(agent_id, delivered_at, due_at ASC);

recall_events — signal tracking (phase 10.5)

CREATE TABLE recall_events (
  id         INTEGER PRIMARY KEY AUTOINCREMENT,
  agent_id   TEXT,
  memory_id  TEXT,
  query      TEXT,
  score      REAL,
  ts_ms      INTEGER
);

Every recall() hit records a row. Dream sweeps read this to decide what to promote.

memory_promotions — dreaming ledger (phase 10.6)

CREATE TABLE memory_promotions (
  memory_id    TEXT PRIMARY KEY,
  agent_id     TEXT,
  promoted_at  INTEGER,
  score        REAL,
  phase        TEXT
);

Prevents double-promotion across sweeps.

vec_memories — vector index (phase 5.4, optional)

Created on demand when vector.enabled: true. See Vector search.

What gets written when

| Action | Writes to |
|---|---|
| Agent calls memory.remember(content, tags) | memories, memories_fts, vec_memories (if enabled) |
| Every turn | interactions (used for transcripts, not promoted into memories) |
| Agent calls forge_reminder(...) | reminders |
| Every recall() hit | recall_events (one row per result returned) |
| Dream sweep promotes hot memory | memory_promotions |

Memory tool

Single unified tool with three actions, visible to the LLM as memory:

| Action | Required | Optional | Returns |
|---|---|---|---|
| remember | content | tags[], context | {ok, id} |
| recall | query | limit (default 5), mode (keyword \| vector \| hybrid) | {ok, results: [{id, content, tags}]} |
| forget | id | — | {ok} |

Results do not include similarity scores — only content and tags. Scores are used internally for dreaming signal tracking but aren't surfaced to the LLM to avoid encouraging score-gaming prompts.
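
For example, a remember round-trip (shapes per the action table above):

// request
{ "action": "remember", "content": "Client prefers morning calls", "tags": ["client", "scheduling"] }

// response
{ "ok": true, "id": "4f7c..." }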

Other memory-related tools:

  • forge_memory_checkpoint — snapshot the workspace-git repo (phase 10.9)
  • memory_history — git log + optional unified diff (phase 10.9)

Per-agent isolation

flowchart TB
    subgraph PROC[agent process]
        DB[(./data/memory.db<br/>single SQLite file)]
    end
    A1[agent: ana] -->|WHERE agent_id = 'ana'| DB
    A2[agent: kate] -->|WHERE agent_id = 'kate'| DB
    A3[agent: ops] -->|WHERE agent_id = 'ops'| DB

One LongTermMemory instance per process, shared across agents via Arc. The MemoryTool attached to each agent passes ctx.agent_id to every query.

Workspace-git (phase 10.9)

A separate per-agent git repo lives in the agent's workspace directory (not inside the memory DB). When workspace_git.enabled: true, the runtime commits after:

  • Dream sweeps (Phase 10.6)
  • forge_memory_checkpoint tool calls
  • Session close (on_expire)

Good for forensic replay — you can git log to see the memory state at any point. See Soul — MEMORY.md.

Gotchas

  • One DB, multi-tenant. A query missing its agent_id filter would leak across agents. All runtime code goes through the LongTermMemory API which injects it automatically.
  • Vacuum is manual. SQLite does not auto-compact after deletes. Run VACUUM; periodically (or PRAGMA auto_vacuum=incremental from day one).
  • recall_events grows unboundedly. Dream sweeps periodically prune, but a dreaming-disabled agent's table will grow forever. Add a retention job if you run without dreaming.

Vector search

Optional semantic memory via sqlite-vec — a virtual table inside the same SQLite file used for long-term memory. No separate service, no extra process, no migration.

Source: crates/memory/src/vector.rs, crates/memory/src/embedding/.

Turning it on

vector:
  enabled: true
  backend: sqlite-vec
  default_recall_mode: hybrid
  embedding:
    provider: http
    base_url: https://api.openai.com/v1
    model: text-embedding-3-small
    api_key: ${OPENAI_API_KEY}
    dimensions: 1536
    timeout_secs: 30

Dimension must match the model output:

| Model | Dimensions |
|---|---|
| text-embedding-3-small | 1536 |
| text-embedding-3-large | 3072 |
| nomic-embed-text | 768 |
| Gemini text-embedding-004 | 768 |

A mismatch aborts startup with an explicit error. If you already have vectors at a different dimension, you must delete the DB (or the vector table) and rebuild the index.

Storage

CREATE VIRTUAL TABLE vec_memories USING vec0(
  memory_id TEXT PRIMARY KEY,
  embedding FLOAT[<dimensions>]
);

The virtual table lives in the same SQLite file as memories. A join on memory_id brings you back the content and tags.
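
A sketch of that lookup, using sqlite-vec's KNN MATCH syntax (the real query in vector.rs may differ in details such as candidate count):

SELECT m.content, m.tags, v.distance
FROM (
  SELECT memory_id, distance
  FROM vec_memories
  WHERE embedding MATCH :query_embedding   -- KNN search
    AND k = 10                             -- over-fetch candidates (N*2)
) v
JOIN memories m ON m.id = v.memory_id
WHERE m.agent_id = :agent_id
ORDER BY v.distance;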

Embedding provider

trait EmbeddingProvider {
    fn dimension(&self) -> usize;
    async fn embed(&self, texts: &[String]) -> Result<Vec<Vec<f32>>>;
}

Phase 5.4 ships one provider: http — any OpenAI-compatible /embeddings endpoint. That covers OpenAI, Gemini (via its API), Ollama, LM Studio, and self-hosted inference.

Local-only providers (fastembed, candle) are intentional follow-ups — the HTTP provider is enough to unblock everything downstream.

Recall modes

Set the default in memory.yaml and override per tool call with the mode argument.

keyword — FTS5 + concept expansion

flowchart LR
    Q[query] --> CT[derive 3 concept tags]
    Q --> M[FTS5 MATCH<br/>query OR tag1 OR tag2 OR tag3]
    CT --> M
    M --> R[rank by BM25]
    R --> RES[top N]

  • Fast, no embedding cost
  • Misses semantic neighbors that don't share vocabulary
  • The extra concept tags are auto-derived from the query and help narrow down concept matches

vector — nearest-neighbor

flowchart LR
    Q[query] --> EMB[embed]
    EMB --> VEC[vec_memories<br/>MATCH k=N*2]
    VEC --> JOIN[join memories<br/>filter by agent_id]
    JOIN --> RES[top N by distance]

  • Catches paraphrases and cross-vocabulary matches
  • Embedding request on every call — watch costs and latency
  • Provider errors fall back to keyword in hybrid mode only; in pure vector mode they surface to the caller

hybrid — Reciprocal Rank Fusion

The default recommendation. Runs both keyword and vector, then fuses ranks with the RRF formula 1 / (K + rank + 1) where K = 60:

flowchart LR
    Q[query] --> K[keyword search]
    Q --> V[vector search]
    K --> RRF[RRF fusion<br/>K=60]
    V --> RRF
    RRF --> RES[top N by fused score]

Vector errors degrade gracefully to keyword-only without raising.
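
A sketch of the fusion step (illustrative; the id lists stand in for the two ranked result sets):

use std::collections::HashMap;

/// score(id) = Σ over lists of 1 / (K + rank + 1), with K = 60.
fn rrf_fuse(keyword: &[String], vector: &[String]) -> Vec<(String, f64)> {
    const K: f64 = 60.0;
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in [keyword, vector] {
        for (rank, id) in list.iter().enumerate() {
            *scores.entry(id.clone()).or_insert(0.0) += 1.0 / (K + rank as f64 + 1.0);
        }
    }
    let mut fused: Vec<_> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused // top N by fused score
}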

Tool interaction

The memory tool takes an optional mode param:

{
  "action": "recall",
  "query": "what's the client's address?",
  "limit": 5,
  "mode": "hybrid"
}

If omitted, default_recall_mode is used.

Cost and latency profile

| Mode | Per recall |
|---|---|
| keyword | 1 SQL query, no LLM call |
| vector | 1 embedding HTTP call + 1 SQL query |
| hybrid | 1 embedding HTTP call + 2 SQL queries + fusion |

For high-throughput agents that recall on every turn, start with keyword and upgrade to hybrid only where the miss rate actually hurts.

Gotchas

  • Changing embedding model = full reindex. The dimension check catches the obvious case, but even same-dimension model swaps produce semantically different vectors; the old index becomes stale.
  • sqlite3_auto_extension registers once per process. Not a problem in production, but test suites that instantiate multiple SQLite connections across tests may hit edge cases.
  • Vector returns distance, not similarity. Lower is closer. Hybrid fusion normalizes across both, so callers don't see this directly unless they bypass the tool.

Manifest (plugin.toml)

Every extension ships a plugin.toml at its root. It declares identity, transport, capabilities, runtime requirements, and any bundled MCP servers. The runtime parses and validates the manifest before spawning anything.

Source: crates/extensions/src/manifest.rs.

Minimal example

[plugin]
id = "weather"
version = "0.1.0"
name = "Weather"
description = "Fetch weather by city name."
min_agent_version = "0.1.0"
priority = 0

[capabilities]
tools = ["get_weather"]
hooks = []

[transport]
type = "stdio"
command = "./weather"
args = []

[requires]
bins = ["curl"]
env = ["WEATHER_API_KEY"]

[context]
passthrough = false

[meta]
author = "you"
license = "MIT OR Apache-2.0"

Sections

[plugin]

| Field | Purpose |
|---|---|
| id | Unique id. Regex ^[a-z][a-z0-9_-]*$, ≤ 64 chars. Must not be a reserved id (see below). |
| version | Semver. |
| name | Human-readable label. |
| description | ≤ 512 UTF-8 chars. |
| min_agent_version | Semver. Checked against the running agent version at load time. |
| priority | i32, default 0. Lower fires first in hook chains. |

Reserved ids: agent, browser, core, email, heartbeat, memory, telegram, whatsapp. The host may register more via register_reserved_ids().

[capabilities]

[capabilities]
tools = ["get_weather", "get_forecast"]
hooks = ["before_message", "after_tool_call"]
channels = []
providers = []

At least one capability list must be non-empty. Names match ^[a-z][a-z0-9_]*$, ≤ 64 chars, no duplicates.

[transport]

One of three forms:

# stdio — spawn a child process
[transport]
type = "stdio"
command = "./my-extension"
args = ["--verbose"]

# nats — talk over a NATS subject prefix
[transport]
type = "nats"
subject_prefix = "ext.myext"

# http — call over HTTP
[transport]
type = "http"
url = "https://localhost:8080"

Validation: command, subject_prefix, url non-empty; url must be http(s)://.

[requires]

[requires]
bins = ["ffmpeg", "imagemagick"]
env  = ["OPENAI_API_KEY"]

Declarative preconditions used for gating: when the runtime discovers the extension, it calls Requires::missing(). If any listed bin is missing from $PATH or any listed env var is unset, the extension is skipped (warn, not fail) and its tools are not registered.

See Stdio runtime — Gating.

[context]

[context]
passthrough = true

When true, every tool call sent to this extension has _meta = { agent_id, session_id } injected into the JSON args. Lets the extension tell calls apart per-agent without the runtime having to encode the split into every tool signature.

[mcp_servers] (phase 12.7)

Inline MCP server declarations bundled with the extension:

[mcp_servers.gmail]
type = "stdio"
command = "./gmail-mcp"
args = []

[mcp_servers.calendar]
type = "streamable_http"
url = "https://mcp.example.com/calendar"

Each server name must match ^[a-z][a-z0-9_-]*$, ≤ 32 chars. Alternatively, drop a sidecar .mcp.json next to plugin.toml if the manifest has no [mcp_servers] section.
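
A plausible sidecar shape, mirroring the [mcp_servers] entries above; the exact JSON schema is an assumption here, so verify with agent ext validate before relying on it:

{
  "gmail":    { "type": "stdio", "command": "./gmail-mcp", "args": [] },
  "calendar": { "type": "streamable_http", "url": "https://mcp.example.com/calendar" }
}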

Validation at a glance

flowchart TD
    READ[read plugin.toml] --> PARSE[parse TOML]
    PARSE --> ID{id valid?<br/>regex + length<br/>+ not reserved}
    ID --> VER{version<br/>valid semver?}
    VER --> MIN{min_agent_version<br/>satisfied?}
    MIN --> CAPS{at least one<br/>capability declared?}
    CAPS --> NAMES{capability names<br/>valid + unique?}
    NAMES --> TRANS{transport<br/>non-empty +<br/>http scheme valid?}
    TRANS --> MCP{mcp_server names<br/>valid?}
    MCP --> OK([Manifest accepted])
    ID --> FAIL([Diagnostic: Error])
    VER --> FAIL
    MIN --> FAIL
    CAPS --> FAIL
    NAMES --> FAIL
    TRANS --> FAIL
    MCP --> FAIL

Any failure produces a DiagnosticLevel::Error in the discovery report — the candidate is dropped but scanning continues so an operator sees every broken manifest at once.

Agent-version gating

[plugin]
min_agent_version = "0.2.0"

On load the runtime compares against the agent build version. A mismatch logs a diagnostic and drops the candidate. Useful for shipping a manifest that relies on a newer host API without crash-looping older deployments. The host can override the reported version for tests via set_agent_version().

Next

  • Discovery and NATS runtime — how the manifest drives spawn
  • CLI — agent ext validate <path> checks a manifest without touching the registry
  • Templates — prebuilt skeletons to copy

Stdio runtime + Discovery

The stdio runtime is the default way extensions run: a child process speaking line-delimited JSON-RPC over stdin/stdout. This page covers how the runtime discovers, spawns, supervises, and registers tools from a stdio extension.

Source: crates/extensions/src/discovery.rs, crates/extensions/src/runtime/stdio.rs.

Discovery

# config/extensions.yaml
extensions:
  enabled: true
  search_paths: [./extensions]
  ignore_dirs: [node_modules, .git, target]
  disabled: []
  allowlist: []            # empty = all allowed
  max_depth: 4
  follow_links: false
  watch:
    enabled: false
    debounce_ms: 500

ExtensionDiscovery walks each search path, looking for plugin.toml files:

flowchart TD
    ROOT[search_paths root] --> WALK[walkdir max_depth]
    WALK --> IGNORE{dir in<br/>ignore_dirs?}
    IGNORE -->|yes| SKIP[skip]
    IGNORE -->|no| FIND[find plugin.toml]
    FIND --> PARSE[parse + validate manifest]
    PARSE --> SIDE[sidecar .mcp.json if manifest<br/>has no mcp_servers]
    SIDE --> PRUNE[prune nested candidates]
    PRUNE --> DEDUP[dedupe by id]
    DEDUP --> DIS[apply disabled filter]
    DIS --> ALLOW[apply allowlist filter]
    ALLOW --> SORT[sort by root_index, id]
    SORT --> CANDS[DiscoveryReport<br/>candidates + diagnostics]

Prune-nested removes any candidate whose root_dir is a strict descendant of another — avoids registering an extension twice if it happens to live inside another extension's tree. Algorithm is O(N × depth).

follow_links = false is the default (monorepo-safe). When enabled, symlink escapes out of the root raise DiagnosticLevel::Error.

Gating

Before spawn, Requires::missing() runs:

flowchart LR
    CAND[candidate] --> REQ[requires.bins<br/>+ requires.env]
    REQ --> BINS{all on $PATH?}
    BINS -->|no| SKIP1[warn + skip]
    BINS -->|yes| ENV{all env set?}
    ENV -->|no| SKIP2[warn + skip]
    ENV -->|yes| SPAWN[spawn runtime]

A skipped extension does not register any tools. The warn log names exactly which bin or env var was missing.

Spawn model

sequenceDiagram
    participant H as Host (agent)
    participant S as StdioRuntime
    participant C as Child process

    H->>S: spawn(manifest, cwd)
    S->>C: tokio::process::Command
    S->>C: {"jsonrpc":"2.0","method":"initialize",<br/>"params":{"agent_version","extension_id"},"id":0}
    C-->>S: {"result":{"server_version","tools":[...],"hooks":[...]}}
    S-->>H: HandshakeInfo
    H->>H: register each tool as ExtensionTool
    H->>H: register each hook as ExtensionHook

  • Child is spawned with the extension's directory as cwd
  • stdin + stdout is the RPC channel (line-delimited JSON)
  • stderr is routed to the agent's tracing output
  • Handshake timeout: default 10 s

Tool descriptors

{
  "name": "get_weather",
  "description": "Look up weather by city.",
  "input_schema": { "type": "object", "properties": { "city": { "type": "string" } }, "required": ["city"] }
}

The host wraps each descriptor in an ExtensionTool:

  • Registered name: ext_{plugin_id}_{tool_name} (truncated with hash suffix if it exceeds 64 chars)
  • Description prefixed with [ext:{id}] so the LLM knows the origin
  • input_schema copied to the registered tool
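
For example, the get_weather tool from the weather manifest on the previous page registers as ext_weather_get_weather, with its description shown to the LLM as [ext:weather] Look up weather by city.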

Context passthrough

If the manifest sets context.passthrough = true, every call() injects:

{ "_meta": { "agent_id": "...", "session_id": "..." }, ...user_args }

The extension can decide how to split state per agent or session.

Env injection

The host passes through most env vars to the child, but blocks secret-like names via substring/suffix rules:

  • Suffixes: _TOKEN, _KEY, _SECRET, _PASSWORD, _CREDENTIAL, _PAT, _AUTH, _APIKEY, _BEARER, _SESSION
  • Substrings: PASSWORD, SECRET, CREDENTIAL, PRIVATE_KEY

Extensions that need a secret should read it from a file path the host passes by argument, or declare the variable in their own requires.env, which whitelists it past the block and makes the exposure an explicit operator decision.

Supervision

stateDiagram-v2
    [*] --> Spawning
    Spawning --> Ready: handshake ok
    Ready --> Restarting: child crash
    Restarting --> Ready: handshake ok again
    Restarting --> Failed: max attempts<br/>in restart_window
    Ready --> Shutdown: graceful signal
    Failed --> Shutdown
    Shutdown --> [*]

Supervisor policy:

  • Max restart attempts within a sliding restart_window
  • Exponential backoff from base_backoff up to max_backoff
  • Each transport is wrapped in a CircuitBreaker named ext:stdio:{id} so hung children don't freeze the agent loop

Graceful shutdown sends an empty message, waits shutdown_grace (default 3 s), then kills the child.

Watcher (phase 11.2 follow-up)

With extensions.watch.enabled: true the runtime watches search_paths for changes to any plugin.toml. Change-set is debounced (debounce_ms) and compared by SHA-256 of the file to squash spurious writes.

On change the runtime logs — it does not auto-reload. The operator restarts the agent to pick up the new manifest. Hot reload is a future phase.

Gotchas

  • Blocked env vars surprise extensions. If an extension expected OPENAI_API_KEY to come through and it wasn't declared in requires.env, the name-based block may silently strip it. Declare the env you need — that whitelists it.
  • follow_links: true + symlinked monorepo layouts can cause discovery to traverse out of the search root. Keep follow_links: false unless you know the layout is bounded.
  • Children crashing during handshake. You get a single DiagnosticLevel::Error per candidate, not a retry loop. Fix the binary, restart the host.

NATS runtime

For extensions that run out-of-process and manage their own lifecycle — a long-lived service on another machine, a container in an orchestrator, an operator-maintained daemon. The agent talks to them over NATS RPC instead of stdin/stdout.

Source: crates/extensions/src/runtime/nats.rs.

When to pick NATS over stdio

| Use stdio | Use NATS |
|---|---|
| Extension is a binary you ship with the agent | Extension is a separate service you operate |
| Lifecycle is tied to the agent | Lifecycle is independent (k8s, systemd) |
| Fast local startup; co-resident on same host | Might be remote or shared between hosts |
| Dev-loop: install once and forget | Sensitive deployment — deploy independently of the agent |

Stdio is the default. Reach for NATS when the extension's failure domain must be separated from the agent's.

Manifest

[plugin]
id = "heavy-compute"
version = "0.3.0"

[capabilities]
tools = ["long_running_job"]

[transport]
type = "nats"
subject_prefix = "ext.heavy-compute"

Wire shape

Single request/reply subject:

{subject_prefix}.{extension_id}.rpc

sequenceDiagram
    participant A as Agent
    participant N as NATS
    participant E as Extension service

    A->>N: publish ext.heavy-compute.rpc<br/>{method:"initialize", ...}
    N->>E: deliver
    E->>N: reply HandshakeInfo
    N-->>A: tools + hooks
    A->>A: register ExtensionTool per tool
    Note over A,E: steady state
    loop tool call
        A->>N: {method:"tools/long_running_job", params, id}
        N->>E: deliver
        E-->>N: result
        N-->>A: reply
    end

The JSON-RPC shape is identical to stdio — only the transport changes. Extensions don't need to know which form the host chose.

Liveness

Instead of supervising a child process, the NATS runtime uses heartbeats:

| Field | Default | Purpose |
|---|---|---|
| heartbeat_interval | 15 s | Expected beacon cadence from the extension. |
| heartbeat_grace_factor | 3 | Mark failed after grace_factor × interval silence. |

A failed extension logs a warn and is marked unavailable. Tools stay registered in the registry but calls error out immediately. When the extension starts beaconing again, it's automatically marked available.

Circuit breaker

Same pattern as stdio: one CircuitBreaker per extension, ext:nats:{id}, wrapping every RPC. Prevents a flapping extension from piling up outstanding calls against it.

Deployment recipes

Docker compose side service

services:
  agent:
    image: nexo-rs:latest
    depends_on: [nats, heavy-compute]
  nats:
    image: nats:2.10-alpine
  heavy-compute:
    image: my-ext:0.3.0
    command: ["--nats-url", "nats://nats:4222",
              "--subject-prefix", "ext.heavy-compute"]

Kubernetes

Run the extension as its own Deployment with its own resource limits, rollouts, and observability. Share the NATS cluster via a Service. Scale extensions independently of agents.

Gotchas

  • subject_prefix collisions. Two extensions with the same prefix will step on each other. Enforce uniqueness in your ops convention.
  • Latency. NATS over LAN is sub-millisecond, but any network hop is orders of magnitude slower than stdio's pipe. Don't pick NATS for a 1 kHz tool call pattern.
  • Auth on the broker. NATS auth applies to extensions too — if you turn on NKey mTLS, every extension service must be enrolled.

CLI (agent ext)

Operator-facing commands for discovering, installing, validating, and toggling extensions. Every subcommand accepts --json for scripting.

Source: crates/extensions/src/cli/.

Subcommands

agent ext list                           [--json]
agent ext info <id>                      [--json]
agent ext enable <id>
agent ext disable <id>
agent ext validate <path>
agent ext doctor                         [--runtime] [--json]
agent ext install <path>                 [--update] [--enable] [--dry-run] [--link] [--json]
agent ext uninstall <id> --yes           [--json]

list — discovered extensions

Walks the configured search_paths, prints each candidate, its transport, and its enabled/disabled state.

info <id> — manifest + status

Prints the full parsed manifest, the runtime state if the agent is currently running, and any diagnostics attached to the candidate.

enable / disable — toggle in extensions.yaml

Rewrites the disabled list in config/extensions.yaml:

extensions:
  disabled: [weather]

No runtime side effect; operator must restart the agent to apply.

validate <path> — manifest check without registering

Parses and validates a plugin.toml at <path>. Good for CI checks on an extension's manifest before shipping.

doctor — preflight checks

Runs the same Requires::missing() logic as discovery, plus transport-specific checks:

flowchart TB
    START([agent ext doctor]) --> DISC[discover candidates]
    DISC --> REQ[check requires.bins + requires.env]
    REQ --> RUNT{--runtime?}
    RUNT -->|yes| SPAWN[spawn each stdio extension<br/>and handshake]
    RUNT -->|no| DONE([report table])
    SPAWN --> DONE

--runtime actually spawns each stdio extension and runs the handshake — useful to catch a broken binary before production boot.

Adds an extension to the active search_paths:

agent ext install ./extensions/weather
agent ext install /abs/path/to/my-ext --link --enable

  • --update replaces an existing extension with the same id
  • --enable marks it enabled in extensions.yaml (new installs default to disabled until you enable them)
  • --dry-run prints what would happen without writing
  • --link creates a symlink instead of copying — requires an absolute source path. Good for dev loops.

uninstall <id> --yes

Removes the extension's directory from the active search path (or the symlink, in --link installs). --yes is mandatory — no accidental destruction.

Exit codes

| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Extension not found / --update target missing |
| 2 | Invalid manifest / invalid source / --link needs absolute path |
| 3 | Config write failed |
| 4 | Invalid id (reserved or empty) |
| 5 | Target exists (use --update) |
| 6 | Id collision across roots |
| 7 | uninstall missing --yes confirmation |
| 8 | Copy / atomic swap failed |
| 9 | Runtime check(s) failed (doctor --runtime) |

Non-zero codes are stable for scripting.

JSON mode

Every subcommand that produces human output also supports --json for machine consumption. Fields are stable per code-phase; schema is not officially frozen yet — pin to a specific agent version in CI.

Common ops flows

Ship an extension to staging

agent ext validate ./my-ext/plugin.toml
agent ext install ./my-ext --link --enable
agent ext doctor --runtime

Disable a flapping extension without redeploying

agent ext disable weather   # writes to extensions.yaml
systemctl reload agent       # or restart, depending on deployment

CI gate

# .github/workflows/extension.yml
- run: cargo build --release
- run: agent ext validate ./plugin.toml

Templates

The repo ships two extension templates as starting points. Copy one, rename it, fill in the tools, done.

Location: extensions/template-rust/ and extensions/template-python/.

What's shared

Both templates follow the same wire protocol and directory shape:

<your-ext>/
├── plugin.toml        # manifest (see ./manifest.md)
├── README.md          # what the extension does
├── <binary or script> # stdio-RPC entry point
└── ...                # build files specific to the language

The agent talks to both in the same JSON-RPC 2.0 shape:

  • initialize — handshake; returns {server_version, tools, hooks}
  • tools/<name> — tool invocation; returns the tool's result
  • hooks/<name> — hook invocation (when any hook is declared)

Line-delimited JSON over stdin/stdout. stderr is forwarded to the agent's tracing output — that's your debug log.

Rust template (extensions/template-rust/)

Standalone Cargo project outside the agent workspace — its own Cargo.toml, own Cargo.lock, own target/. Keeps your extension's deps independent of the agent's.

template-rust/
├── Cargo.toml
├── Cargo.lock
├── plugin.toml
├── README.md
├── src/
│   └── main.rs        # JSON-RPC loop
└── target/            # (gitignore)

src/main.rs implements:

// pseudocode
loop {
    let line = read_line_from_stdin();
    let req: JsonRpcRequest = parse(line);
    let result = match req.method.as_str() {
        "initialize" => handshake_info(),
        "tools/ping" => ping(req.params),
        "tools/add"  => add(req.params),
        "hooks/before_message" => pass(),
        _ => method_not_found(),
    };
    write_line_to_stdout(json!({ "jsonrpc": "2.0", "id": req.id, "result": result }));
}

Build with cargo build --release; the release binary at ./target/release/template-rust is what plugin.toml::transport.command points at.

Python template (extensions/template-python/)

template-python/
├── plugin.toml
├── main.py       # #!/usr/bin/env python3
└── README.md

stdlib only (no pip install). Same JSON-RPC loop over stdin/stdout. Logs to stderr via print(..., file=sys.stderr).

Good for quick extensions where Python's startup and runtime overhead is acceptable (batch workloads, cron-ish tasks, one-off scripting).

Promoting a template to your own extension

flowchart LR
    COPY[copy template-rust<br/>to my-extension] --> EDIT[edit plugin.toml<br/>id, version, tools]
    EDIT --> CODE[implement tools/...]
    CODE --> BUILD[cargo build --release]
    BUILD --> VAL[agent ext validate<br/>./my-extension/plugin.toml]
    VAL --> INSTALL[agent ext install<br/>./my-extension --link --enable]
    INSTALL --> DOCTOR[agent ext doctor<br/>--runtime]

Conventions in the shipped templates

  • plugin.toml declares the minimum required capabilities — no phantom hooks or tools
  • requires.bins / requires.env left empty; add your own
  • [context] passthrough = false — opt in explicitly when you need per-agent / per-session state
  • License left blank — pick one and add it to [meta]

Gotchas

  • Rust template builds in its own workspace. Don't cargo add from the repo root — that edits the agent workspace, not the extension.
  • Python template spawns a new interpreter per extension, not per tool call. Stdin/stdout stay open for the life of the process. Don't exit after one tool call.
  • JSON-RPC ids must echo back. If your handler drops the id field, the agent can't correlate the reply.

1Password extension

A bundled stdio extension that wraps the op CLI with a service-account token. Read-only: it never creates or edits secrets. Two main use cases:

  • Look up a secret you don't already have in env (read_secret).
  • Use a secret in a command without ever exposing it to the agent (inject_template).

Source: extensions/onepassword/. Skill prompt: skills/onepassword/SKILL.md.

Tools

| Tool | Reveals secret? | Audited |
|---|---|---|
| status | no | no |
| whoami | no | no |
| list_vaults | no | no |
| list_items | no — strips field values | no |
| read_secret | only if OP_ALLOW_REVEAL=true | yes |
| inject_template | template-only mode reveals only with OP_ALLOW_REVEAL=true; exec mode never reveals to the LLM | yes |

read_secret

{ "action": "read_secret", "reference": "op://Prod/Stripe/api_key" }

Default response (reveal off):

{
  "ok": true,
  "reference": "op://Prod/Stripe/api_key",
  "vault": "Prod", "item": "Stripe", "field": "api_key",
  "length": 26,
  "fingerprint_sha256_prefix": "3f9a7c2e1b48d5a0",
  "reveal": false
}

With OP_ALLOW_REVEAL=true|1|yes set on the agent process, the response also contains { "value": "...", "reveal": true }.

inject_template

Resolves {{ op://Vault/Item/field }} placeholders via op inject. Two execution paths:

Template-only

{ "action": "inject_template",
  "template": "Authorization: Bearer {{ op://Prod/API/token }}\n" }

  • Reveal off → { length, fingerprint_sha256_prefix, reveal: false }
  • Reveal on → { rendered: "Authorization: Bearer abc…", reveal: true }

Exec (piped to a command)

{ "action": "inject_template",
  "template": "Bearer {{ op://Prod/API/token }}",
  "command": "curl",
  "args": ["-H", "@-", "https://api.example.com/me"] }

  • command must be in OP_INJECT_COMMAND_ALLOWLIST (comma-separated). Default empty → exec mode disabled.
  • Rendered template is never returned to the LLM. Only the downstream command's exit_code, stdout (capped at max_stdout_bytes, default 4096, max 16384), and stderr.
  • Both stdout and stderr are redacted before being returned — Bearer JWT, sk-…, sk-ant-…, AKIA…, and 32+ char hex tokens are replaced with [REDACTED:<label>].
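
A sketch of the redaction idea with an abbreviated pattern list (the shipped set and its exact regexes live in the extension; regex crate assumed). Note the ordering: the sk-ant- pattern must run before the broader sk- pattern.

use regex::Regex;

// Illustrative redaction pass; not the shipped pattern set.
fn redact(s: &str) -> String {
    let patterns = [
        (r"sk-ant-[A-Za-z0-9_-]{10,}", "anthropic-key"),
        (r"sk-[A-Za-z0-9]{20,}", "openai-key"),
        (r"AKIA[0-9A-Z]{16}", "aws-access-key-id"),
        (r"\b[0-9a-fA-F]{32,}\b", "hex-token"),
    ];
    let mut out = s.to_string();
    for (pattern, label) in patterns {
        let re = Regex::new(pattern).expect("static pattern");
        out = re
            .replace_all(&out, format!("[REDACTED:{label}]").as_str())
            .into_owned();
    }
    out
}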

Dry run

{ "action": "inject_template",
  "template": "{{ op://A/B/c }} {{ op://X/Y/z }}",
  "dry_run": true }

Validates each op:// reference's shape without resolving values. Returns references_validated.
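
The shape check is simple enough to sketch (hypothetical helper; the real validation lives in the extension): a reference must be op:// followed by exactly vault/item/field, each non-empty.

// Hypothetical shape check for op://Vault/Item/field references.
fn is_valid_op_reference(reference: &str) -> bool {
    reference
        .strip_prefix("op://")
        .map(|rest| {
            let parts: Vec<&str> = rest.split('/').collect();
            parts.len() == 3 && parts.iter().all(|p| !p.is_empty())
        })
        .unwrap_or(false)
}

fn main() {
    assert!(is_valid_op_reference("op://Prod/Stripe/api_key"));
    assert!(!is_valid_op_reference("op://Bad/Ref")); // two segments, rejected
}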

Configuration

Environment variables consumed by the extension:

| Var | Purpose | Default |
|---|---|---|
| OP_SERVICE_ACCOUNT_TOKEN | 1Password service-account token used by the op CLI | required |
| OP_ALLOW_REVEAL | true/1/yes to allow value reveal | off |
| OP_AUDIT_LOG_PATH | JSONL audit log path | ./data/secrets-audit.jsonl |
| OP_INJECT_COMMAND_ALLOWLIST | comma-separated allowed exec commands | empty (exec disabled) |
| OP_INJECT_TIMEOUT_SECS | per-call timeout (capped at MAX_TIMEOUT_SECS) | 30 |
| OP_TIMEOUT_SECS | per-call timeout for non-inject commands | 15 |
| AGENT_ID | injected by the host on spawn — appears in audit | — |
| AGENT_SESSION_ID | injected by the host on spawn | — |

Audit log

read_secret and inject_template append one JSON line per call to OP_AUDIT_LOG_PATH. The log is append-only and contains only metadata — never the secret value.

{"ts":"2026-04-25T18:00:00Z","action":"read_secret","agent_id":"kate","session_id":"f1...","op_reference":"op://Prod/Stripe/token","fingerprint_sha256_prefix":"a1b2c3d4e5f6789a","reveal_allowed":false,"ok":true}
{"ts":"2026-04-25T18:00:05Z","action":"inject_template","agent_id":"kate","session_id":"f1...","references":["op://Prod/Stripe/token"],"command":"curl","args_count":4,"dry_run":false,"ok":true,"exit_code":0,"stdout_total_bytes":124,"stdout_returned_bytes":124,"stdout_truncated":false}
{"ts":"2026-04-25T18:00:10Z","action":"inject_template","agent_id":"kate","session_id":null,"references":["op://Bad/Ref"],"command":"rm","args_count":0,"dry_run":false,"ok":false,"error":"command_not_in_allowlist"}

Failures writing the log are reported to stderr and never block the tool — the secret has already been read or piped; refusing to log would be worst-of-both-worlds.

Rotate with logrotate or any append-aware rotator. Keeping the log on a partition with limited write access (separate user, AppArmor, or dedicated tmpfs) reduces forensic tampering surface.
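
Append-only JSONL is a one-liner to reproduce in your own extensions. A sketch of the write path under the semantics described above (serde_json assumed; names hypothetical):

use std::fs::OpenOptions;
use std::io::Write;

// One JSON object per line; a failed write warns on stderr but never
// fails the tool call itself.
fn append_audit(path: &str, event: &serde_json::Value) {
    let result = OpenOptions::new()
        .create(true)
        .append(true)
        .open(path)
        .and_then(|mut f| writeln!(f, "{event}"));
    if let Err(e) = result {
        eprintln!("audit log write failed: {e}");
    }
}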

Threat model

  • The agent process is trusted. Reveal is gated by an env var the operator controls; once on, the value is just a string in memory that flows through the LLM, transcripts, and any tool that touches it.
  • Exec mode is the recommended path for any operation that does not require the agent to see the secret. The LLM only knows that the operation succeeded, not what the credential looked like.
  • Redaction is best-effort. Stdout from a poorly-behaved command could still leak a secret in a shape we don't recognize. Cap the max_stdout_bytes aggressively when in doubt.
  • The audit log is not encrypted. It contains references and fingerprints, not values. If even the references are sensitive, put the log on a permissioned filesystem.

Model Context Protocol (MCP)

nexo-rs is both an MCP client (consumes tools from external MCP servers) and an MCP server (exposes its own tools so editors like Claude Desktop, Cursor, Zed can use them). Same wire, different directions.

Source: crates/mcp/, bridges in crates/core/src/agent/mcp_*.

The two directions

flowchart LR
    subgraph IDE[MCP clients]
        CD[Claude Desktop]
        CUR[Cursor]
        ZED[Zed]
    end
    subgraph AGENT[agent process]
        AS[Agent-as-server<br/>stdio bridge]
        AC[Agent-as-client<br/>session runtime]
    end
    subgraph EXT[External MCP servers]
        GS[Gmail MCP]
        DB[DB MCP]
        WF[Workflow MCP]
    end

    IDE --> AS
    AS --> AR[Agent tools registry]
    AC --> EXT
    AR --> AC
  • Server side — an MCP client (e.g. Claude Desktop) runs agent mcp serve. The agent's internal tools appear as MCP tools in that client.
  • Client side — the agent spawns external MCP servers (stdio or HTTP) and registers their tools into its own ToolRegistry, so agents can call them exactly like built-ins or extensions.

Phase map

| Phase | What it adds |
|---|---|
| 12.1 | MCP client over stdio |
| 12.2 | MCP client over HTTP (streamable + SSE fallback) |
| 12.3 | Tool catalog — merge MCP tools with extensions and built-ins |
| 12.4 | Session runtime — per-session child spawn, sentinel-shared default |
| 12.5 | Resources — resources/list + resources/read with optional LRU cache |
| 12.6 | Agent as MCP server (stdio) |
| 12.7 | MCP servers declared by extensions |
| 12.8 | tools/list_changed debounced hot-reload |

All eight landed. See PHASES.md.

Why both sides

Being a client lets agents tap any MCP ecosystem without needing a custom extension per service — if the thing you want speaks MCP, you can reach it today.

Being a server lets the carefully-sandboxed tool surface of nexo-rs (allowed_tools, outbound_allowlist, etc.) be reused from any MCP-speaking client. Your LLM-driven IDE gets access to WhatsApp send, Gmail poll, browser CDP, and everything else — without you wiring each one into the IDE's config.

Wire shape (both directions)

JSON-RPC 2.0, over three transports:

  • stdio — child process, line-delimited JSON on stdin/stdout
  • streamable HTTP — modern MCP 2024-11-05 shape
  • SSE — legacy; used as automatic fallback
sequenceDiagram
    participant H as Host (agent or IDE)
    participant S as MCP server

    H->>S: initialize (id=0)
    S-->>H: InitializeResult (capabilities, serverInfo)
    H->>S: notifications/initialized (fire-and-forget)
    loop steady state
        H->>S: tools/list
        S-->>H: tools[]
        H->>S: tools/call {name, args}
        S-->>H: content blocks
    end
    alt tool list changes
        S-->>H: notifications/tools/list_changed
        H->>S: tools/list (debounced refresh)
    end

Where to go next

MCP client (stdio + HTTP)

How nexo-rs consumes tools from external MCP servers. Every MCP tool ends up in the same ToolRegistry that hosts built-ins and extensions — the LLM calls them identically.

Source: crates/mcp/src/client.rs, crates/mcp/src/http/client.rs, crates/mcp/src/manager.rs, crates/mcp/src/session.rs, crates/core/src/agent/mcp_catalog.rs.

Config

# config/mcp.yaml
mcp:
  enabled: true
  session_ttl: 30m
  idle_reap_interval: 60s
  connect_timeout_ms: 10000
  call_timeout_ms: 30000
  shutdown_grace_ms: 3000
  servers:
    gmail:
      transport:
        type: stdio
        command: ./mcp-gmail
        args: []
      env:
        GMAIL_TOKEN: ${file:./secrets/gmail_token.json}
    workflow:
      transport:
        type: http
        url: https://mcp.example.com/workflow
        mode: auto          # streamable_http | sse | auto
        headers:
          Authorization: Bearer ${WORKFLOW_TOKEN}
  resource_cache:
    enabled: true
    ttl: 30s
    max_entries: 256
  resource_uri_allowlist: []   # empty = permissive
  strict_root_paths: false
  context:
    passthrough: true
  sampling:
    enabled: false
  watch:
    enabled: false
    debounce_ms: 200

Transports

stdio

Child process per server. Line-delimited JSON-RPC 2.0 over stdin/stdout. stderr is routed to the agent's tracing output.

sequenceDiagram
    participant M as McpRuntimeManager
    participant S as Server (child process)

    M->>S: spawn Command(cmd, args, env)
    M->>S: {"method":"initialize","id":0, ...}
    S-->>M: capabilities + serverInfo
    M->>S: notifications/initialized (no-reply)
    Note over M,S: steady state — tools/list, tools/call, resources/*
    M->>S: notifications/cancelled (per in-flight id)<br/>then shutdown_grace

HTTP — streamable vs SSE

Three modes selectable per server:

| mode | Behavior |
|---|---|
| streamable_http | MCP 2024-11-05 spec — modern |
| sse | Legacy Server-Sent Events fallback |
| auto (default) | Try streamable_http; on 404/405/415, fall back to SSE |

Each connection gets an mcp-session-id header. Additional headers (auth, routing) pass through a HeaderMap; values are env-resolved at config load.

Session runtime

A single McpRuntimeManager lives per process. Inside, a SessionMcpRuntime per conversation session keeps its own map of live MCP clients:

flowchart TB
    MGR[McpRuntimeManager<br/>one per process]
    MGR --> SENT[Sentinel session<br/>UUID = nil<br/>shared by all agents]
    MGR --> S1[session A runtime]
    MGR --> S2[session B runtime]
    SENT --> C1[mcp client: gmail]
    SENT --> C2[mcp client: workflow]
    S1 --> CX[session-scoped clients<br/>for stateful servers]
  • Sentinel session (UUID = nil) is the default shared namespace — all agents see the same clients, avoiding duplicate child processes for servers that don't need per-session isolation
  • Per-session runtimes are spawned when a server genuinely needs independent state (example: a workflow engine that tracks its own context per user)
  • Idle reap — every idle_reap_interval, the manager disposes sessions unused for longer than session_ttl, shutting their clients down gracefully
  • Config fingerprinting — changes to the servers set produce a new fingerprint; runtimes are rebuilt on request; concurrent requests de-dupe so only one rebuild happens

Tool catalog

McpToolCatalog::build() calls tools/list on every configured server in parallel and merges the results:

flowchart LR
    LIST[tools/list per server<br/>parallel] --> PREFIX[prefix names:<br/>server_toolname]
    PREFIX --> MERGE[merge into ToolRegistry]
    MERGE --> LLM[tools visible to LLM]
    LIST -.->|single-server error| ERR[non-fatal:<br/>server visible with error=...]
  • Names are always prefixed {server_name}_{tool_name} so collisions across servers can't happen
  • Duplicates within the same server → first wins, warn log
  • input_schema is passed through verbatim
  • Server capability resources unlocks two meta-tools for reading resources

Tool call flow

sequenceDiagram
    participant A as Agent
    participant C as McpCatalog tool
    participant R as SessionMcpRuntime
    participant S as MCP server
    participant CB as CircuitBreaker

    A->>C: invoke gmail_list_messages(...)
    C->>R: call(server=gmail, tool=list_messages, args)
    R->>CB: allow?
    CB-->>R: yes
    R->>S: tools/call {name, args, _meta}
    S-->>R: content blocks
    R-->>C: content
    C-->>A: result

Every RPC goes through a per-server CircuitBreaker. If the breaker is open, the call fails fast instead of hanging on a dead server.
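
A minimal sketch of that fail-fast behavior (not the shipped CircuitBreaker, which has its own thresholds and half-open policy): after N consecutive failures the breaker opens, and calls are rejected until a cooldown elapses.

use std::time::{Duration, Instant};

// Toy consecutive-failure breaker; the real one lives in the runtime.
struct Breaker {
    failures: u32,
    threshold: u32,
    cooldown: Duration,
    opened_at: Option<Instant>,
}

impl Breaker {
    fn allow(&mut self) -> bool {
        match self.opened_at {
            // Open and still cooling down: fail fast, no RPC attempted.
            Some(at) if at.elapsed() < self.cooldown => false,
            // Cooldown elapsed: half-open, let one probe through.
            Some(_) => {
                self.opened_at = None;
                self.failures = 0;
                true
            }
            None => true,
        }
    }

    fn record(&mut self, ok: bool) {
        if ok {
            self.failures = 0;
        } else {
            self.failures += 1;
            if self.failures >= self.threshold {
                self.opened_at = Some(Instant::now());
            }
        }
    }
}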

Context passthrough

When mcp.context.passthrough: true, tools/call injects:

{ "_meta": { "agent_id": "ana", "session_id": "..." }, ...args }

Server-side code can use this to scope state per agent without the schema leaking that concern.

Resources

Servers advertising resources capability unlock:

  • resources/list (paginated via cursor, max 64 pages)
  • resources/read (optionally cached via LRU)
  • resources/templates/list (URI templates)

Cache config:

resource_cache:
  enabled: true
  ttl: 30s
  max_entries: 256

Cache invalidates on notifications/resources/list_changed. Optional per-scheme allowlist (resource_uri_allowlist: ["file", "db"]) rejects unknown URI schemes before dispatch.
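
The cache semantics reduce to a TTL map plus LRU eviction. A toy sketch of the TTL half (eviction by max_entries omitted for brevity; names hypothetical):

use std::collections::HashMap;
use std::time::{Duration, Instant};

// TTL-only sketch; the shipped cache also evicts by max_entries (LRU)
// and is invalidated wholesale on resources/list_changed.
struct ResourceCache {
    ttl: Duration,
    entries: HashMap<String, (Instant, String)>,
}

impl ResourceCache {
    fn get(&mut self, uri: &str) -> Option<String> {
        if let Some((stored_at, body)) = self.entries.get(uri) {
            if stored_at.elapsed() < self.ttl {
                return Some(body.clone());
            }
        }
        self.entries.remove(uri); // expired or absent
        None
    }

    fn put(&mut self, uri: String, body: String) {
        self.entries.insert(uri, (Instant::now(), body));
    }
}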

Hot reload (phase 12.8)

flowchart LR
    S[server notifies<br/>tools/list_changed] --> DBC[200 ms debounce]
    DBC --> REL[catalog rebuild]
    REL --> REG[ToolRegistry re-populated<br/>with new schema]

Same flow for resources. Agents in flight at the moment of the rebuild keep their references to the old tool definitions — next turn uses the refreshed registry.

Gotchas

  • One MCP child per server by default. Turn on per-session isolation only for servers that genuinely need it; spawning a child per session multiplies resource cost.
  • notifications/initialized is fire-and-forget. If the server insists on acknowledging it, you have a broken server.
  • SSE is a last resort. It's in auto for compatibility; new server deployments should speak streamable HTTP.
  • Circuit breakers are per-server. One bad server doesn't freeze the catalog; but a flapping one still slows the agent loop via backoff waits.

Agent as MCP server

Expose the agent's tools over MCP so Claude Desktop, Cursor, Zed, or any other MCP-speaking client can use them. Stdio transport; the agent runs as a child process of the consuming client.

Source: crates/mcp/src/server/, crates/core/src/agent/mcp_server_bridge.rs.

Config

# config/mcp_server.yaml
enabled: true
name: agent
allowlist: []            # empty = every native tool; populated = strict allowlist
expose_proxies: false    # set true to also expose ext_* and mcp_* proxy tools
auth_token_env: ""       # optional env var holding a shared bearer token

| Field | Default | Purpose |
|---|---|---|
| enabled | false | Must be true for the server subcommand to start. |
| name | "agent" | Reported as serverInfo.name in the handshake. |
| allowlist | [] | Empty = all native tools. Populated = only these names reach the MCP client. Globs (memory_*) supported. |
| expose_proxies | false | Whether ext_* (extension) and mcp_* (upstream MCP) proxy tools are surfaced. |
| auth_token_env | "" | If set, the initialize request must present this token; unauthenticated clients get rejected. |

Running it

agent mcp serve --config ./config

The process reads JSON-RPC from stdin and writes responses to stdout — exactly the shape Claude Desktop, Cursor, etc. expect.

Claude Desktop example

~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "nexo": {
      "command": "/usr/local/bin/agent",
      "args": ["mcp", "serve", "--config", "/srv/nexo-rs/config"],
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-..."
      }
    }
  }
}

The Anthropic client spawns the agent, handshakes, and then every agent tool shows up in the conversation's tool list.

Wire flow

sequenceDiagram
    participant IDE as MCP client (Claude Desktop)
    participant A as agent mcp serve
    participant TR as ToolRegistry
    participant AG as Agent tools

    IDE->>A: initialize (auth_token if configured)
    A-->>IDE: capabilities + serverInfo (name, version)
    IDE->>A: notifications/initialized
    loop every turn
        IDE->>A: tools/list
        A->>TR: filtered by allowlist + expose_proxies
        A-->>IDE: tool defs
        IDE->>A: tools/call {name, args}
        A->>AG: invoke tool
        AG-->>A: result
        A-->>IDE: content blocks
    end

Tool exposure rules

flowchart TD
    ALL[every tool registered in ToolRegistry]
    ALL --> FILT1{allowlist<br/>empty?}
    FILT1 -->|yes| NATIVE[keep native tools only]
    FILT1 -->|no| GLOB[keep tools matching allowlist]
    NATIVE --> FILT2{expose_proxies?}
    GLOB --> FILT2
    FILT2 -->|yes| OUT[include ext_* and mcp_* too]
    FILT2 -->|no| SKIP[drop ext_* and mcp_*]
    OUT --> EMIT[tools/list response]
    SKIP --> EMIT
  • Native tools: memory_*, whatsapp_*, telegram_*, browser_*, forge_*, etc.
  • Proxy tools: ext_<id>_<tool> for extensions, <server>_<tool> for upstream MCP. Hidden by default to avoid proxying an external server through to another external client.

Capabilities advertised

  • tools — always
  • resources — advertised only if the agent exposes any via the server handler (phase 12.5 puts the groundwork in, consumer features follow)
  • prompts — reserved, not advertised yet
  • logging — conditional on handler implementation

Auth

When auth_token_env is set, the initialize request must present the token (via a server-specific header convention or as a _meta field). Clients that don't know the token are rejected before anything else happens. Useful when the agent is launched through a shared-host proxy rather than a local command spawn.

Security model

  • Read-only by default? No — the server exposes whatever the allowlist permits. Model it explicitly:
    allowlist:
      - memory_recall    # read memory
      - memory_store     # write memory  (remove for read-only)
    
  • Outbound channels (whatsapp_send_message, telegram_send_message) will send real messages from the agent's configured accounts. Include them in the allowlist only if the IDE user should be able to do that.
  • expose_proxies: true is transitive power. It gives the IDE the full tool set of every extension and upstream MCP server too.

Gotchas

  • Allowlist globs match tool names, not prefixes. memory_* matches memory_recall and memory_store but not memory_history (phase 10.9 tool). Write the pattern to match the real set.
  • No per-IDE-user identity. The server has one identity = the agent's configured credentials. If multiple humans share the IDE, they share the agent's blast radius.
  • Proxies forward the agent's rate limits. Calling whatsapp_send_message through the MCP server is the same as an agent calling it — counts against the same WhatsApp rate bucket.

Skills catalog

nexo-rs uses "skill" to mean two different things. Both are covered on this page; gating semantics for each live in Gating by env / bins.

  1. Extension skills — shipped under extensions/ in the repo, discovered and spawned like any other stdio extension. 22 of them landed in Phase 13.
  2. Local skills — markdown files under an agent's skills_dir/ that get injected into the system prompt at turn start.

The two overlap in name but not in mechanism:

| | Extension skill | Local skill |
|---|---|---|
| Where it lives | extensions/<id>/ with plugin.toml | skills/<name>/SKILL.md |
| How it's loaded | Extension discovery → stdio spawn | SkillLoader at turn time |
| What it produces | Tools in ToolRegistry | Text injected into the prompt |
| Gating | Warn + continue, tools still registered | Warn + skip entirely |

Extension skills (Phase 13)

All shipped as stdio extensions written in Rust. _common is a shared Rust library (circuit-breaker primitives), not an extension itself.

Core utilities

| Id | Purpose | Requires |
|---|---|---|
| weather | Current + forecast via Open-Meteo (no auth). | — |
| openstreetmap | Forward / reverse geocoding via Nominatim. | — |
| wikipedia | Article search + summaries. | — |
| fetch-url | HTTP GET / POST with SSRF guard, retries, circuit breaker. | — |
| rss | Fetch & parse RSS / Atom / JSON feeds. | — |
| dns-tools | A/AAAA/MX/TXT/NS/SOA/SRV + reverse + whois. | — |
| endpoint-check | HTTP probe (status + latency) + TLS cert inspection. | — |
| pdf-extract | Extract text from PDFs. | — |
| translate | LibreTranslate self-hosted or DeepL API. | — |
| summarize | Chat-based text/file summary via OpenAI-compat endpoint. | — |
| openai-whisper | Audio transcription via OpenAI-compat /audio/transcriptions. | — |

Search & knowledge

| Id | Purpose | Requires |
|---|---|---|
| brave-search | Web search. | env BRAVE_SEARCH_API_KEY |
| goplaces | Google Places text search + details. | — |
| wolfram-alpha | Computational queries (short + full pods). | env WOLFRAM_APP_ID |

Infra & ops

| Id | Purpose | Requires | Write-gate |
|---|---|---|---|
| github | REST API: PRs, checks, issues. | env GITHUB_TOKEN | — |
| cloudflare | DNS, zones, cache purge. | env CLOUDFLARE_API_TOKEN | — |
| docker-api | ps, inspect, logs, stats, start, stop, restart. | bin docker | env DOCKER_API_ALLOW_WRITE |
| proxmox | Proxmox VE: nodes, VMs, containers, lifecycle. | env PROXMOX_TOKEN | env PROXMOX_ALLOW_WRITE, env PROXMOX_INSECURE_TLS for self-signed certs |
| onepassword | 1Password secrets metadata; reveal gated. | bin op, env OP_SERVICE_ACCOUNT_TOKEN | env OP_ALLOW_REVEAL |
| ssh-exec | Remote command execution with host allowlist. | bin ssh, scp | host allowlist in config |
| tmux-remote | Drive tmux sessions (create, send keys, capture, kill). | bin tmux | — |

Media & content

| Id | Purpose | Requires |
|---|---|---|
| msedge-tts | Text-to-speech via Edge Read Aloud. | — |
| rtsp-snapshot | Frames / clips from RTSP or HTTP camera streams. | bin ffmpeg |
| video-frames | Extract frames + audio from videos. | bin ffmpeg, ffprobe |
| tesseract-ocr | OCR with language packs + PSM modes. | bin tesseract |
| yt-dlp | Download video / audio / metadata. | bin yt-dlp |
| spotify | Now-playing, search, play, pause, skip. | env SPOTIFY_ACCESS_TOKEN |

Google (phase 13.18)

Single google extension covering 32 tools across Gmail, Calendar, Tasks, Drive, People, and Photos. Uses OAuth refresh-token flow. Writes gated by five independent env flags:

  • GOOGLE_ALLOW_SEND — Gmail send
  • GOOGLE_ALLOW_CALENDAR_WRITE
  • GOOGLE_ALLOW_DRIVE_WRITE
  • GOOGLE_ALLOW_TASKS_WRITE
  • GOOGLE_ALLOW_PEOPLE_WRITE

See Plugins — Google for the OAuth setup and the generic google_call tool that fronts the extension.

LLM providers (phase 13.19)

anthropic and gemini are native LLM clients living under crates/llm/, not extensions. See LLM providers and children.

Templates

| Id | Purpose | Language |
|---|---|---|
| template-rust | Copy-and-edit skeleton (ping, add). | Rust |
| template-python | stdlib-only skeleton. | Python |

See Extensions — Templates.

Local skills

Local skills are markdown files loaded by SkillLoader and injected into the system prompt at turn time. Defined in the agent config:

# agents.yaml
agents:
  - id: kate
    skills_dir: ./skills
    skills:
      - weather
      - github
      - summarize
      - google-auth

Each entry resolves to <skills_dir>/<name>/SKILL.md:

---
name: "Weather"
description: "Current conditions and forecasts"
requires:
  bins: ["curl"]
  env: ["WEATHER_API_KEY"]
max_chars: 5000
---
# Weather skill

Call `weather_forecast(city)` to get a 3-day forecast.
Use metric units. Default to the user's locale when unspecified.

Loading flow

flowchart TD
    CFG[agents.yaml skills: list] --> LOOP[for each name]
    LOOP --> READ[read skills_dir/name/SKILL.md]
    READ --> FM[parse YAML frontmatter]
    FM --> GATE{bins on PATH<br/>AND env set?}
    GATE -->|no| SKIP[warn + skip<br/>not injected]
    GATE -->|yes| RENDER[render into prompt:<br/>heading + blockquote + body]
    RENDER --> TRUNC[truncate to max_chars]
    TRUNC --> INJECT[inject into system prompt]

Why local skills skip-on-miss (vs extensions warn-and-continue)

A local skill is a text instruction to the LLM describing a capability. If the backing bin/env isn't available the tool will fail — but worse, the LLM was told the capability exists and will repeatedly try to use it. Skipping the skill prevents lying to the model.

An extension is a registered tool. If the LLM invokes it and the backing bin is missing, the tool returns an error — the LLM observes and adapts. Warn-and-continue is fine.

See Gating for the full semantics.

How to pick

  • Need the LLM to know how to do something (usage pattern, style rules, examples)? → local skill.
  • Need the LLM to do something (make a call, return data)? → extension skill.
  • Both? → ship the extension and write a local skill next to it that explains when to use it.

Gating by env / bins

Both kinds of skills (extension skills under extensions/ and local skills under skills_dir) declare what they need to work. The runtime checks those preconditions at load time and reacts differently depending on skill kind.

The declaration

Both kinds use the same shape. For an extension, it lives in plugin.toml:

[requires]
bins = ["ffmpeg", "ffprobe"]
env  = ["OPENAI_API_KEY"]

For a local skill it lives in the YAML frontmatter of SKILL.md:

---
name: "Whisper transcription"
requires:
  bins: ["ffmpeg"]
  env: ["OPENAI_API_KEY"]
---

Check semantics (source: crates/extensions/src/manifest.rs Requires::missing(), crates/core/src/agent/skills.rs):

  • bins — each name looked up on $PATH. On Windows also <bin>.exe.
  • env — each name must be set and non-empty.
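
A sketch of both checks under those semantics (find_on_path is a hypothetical helper here; the real code is Requires::missing() in the manifest module):

use std::env;
use std::path::PathBuf;

// Sketch: bins resolved against $PATH, env vars must be set and non-empty.
fn missing(bins: &[&str], envs: &[&str]) -> (Vec<String>, Vec<String>) {
    let missing_bins = bins
        .iter()
        .filter(|b| find_on_path(b).is_none())
        .map(|b| b.to_string())
        .collect();
    let missing_env = envs
        .iter()
        .filter(|e| env::var(e).map(|v| v.trim().is_empty()).unwrap_or(true))
        .map(|e| e.to_string())
        .collect();
    (missing_bins, missing_env)
}

// Hypothetical helper; on Windows the real check also tries "<bin>.exe".
fn find_on_path(bin: &str) -> Option<PathBuf> {
    let path = env::var_os("PATH")?;
    env::split_paths(&path)
        .map(|dir| dir.join(bin))
        .find(|candidate| candidate.is_file())
}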

Two reactions, one mechanism

flowchart TD
    CHECK[Requires::missing] --> ANY{missing bin<br/>or env?}
    ANY -->|no| OK[proceed]
    ANY -->|yes| KIND{skill kind}
    KIND -->|extension| WARN[warn<br/>continue<br/>tools still registered]
    KIND -->|local skill| SKIP[warn<br/>skip<br/>not injected into prompt]

| Skill kind | On missing preconditions |
|---|---|
| Extension | Warn log, still spawn + register tools. A subsequent tool call will fail visibly when the bin/env is absent. |
| Local skill | Warn log, do not inject into the system prompt. The LLM never hears the skill existed. |

Why the difference

A local skill is a description the LLM reads and internalizes — "you have a transcription skill, call whisper_transcribe." If the backing binary is missing, the tool call will fail. But the LLM was told the capability exists, so it will keep trying. Not injecting the skill prevents promising capabilities that can't be delivered.

An extension tool is observable: the LLM calls it, gets a concrete error back ("command tesseract not found on PATH"), and can adapt in the same turn. Warn-and-continue is the friendlier behavior — the operator sees the warning and can fix the config without the agent crash-looping.

Where this is logged

Both kinds emit the same structured warn log fields:

WARN skill=weather missing_bins=[] missing_env=[WEATHER_API_KEY]
     "skill disabled: required env vars unset or empty"
WARN extension=docker-api missing_bins=[docker] missing_env=[]
     "extension preflight: declared requires not satisfied (continuing anyway)"

Filter on missing_env or missing_bins to alert proactively.

Pre-deploy verification

Use the CLI:

agent ext doctor --runtime

This runs Requires::missing() for every discovered extension, and with --runtime actually spawns each stdio extension to run the handshake. Nothing is left to chance.

For local skills, skipped skills are logged at turn time — a dry run against the smallest scripted input gives you the same signal without needing a separate command.

Reserved env for secrets

Extensions receive a filtered copy of the host's env. Names matching the secret-like patterns below are stripped before spawn (crates/extensions/src/runtime/stdio.rs):

  • Suffixes: _TOKEN, _KEY, _SECRET, _PASSWORD, _PASSWD, _PWD, _CREDENTIAL, _CREDENTIALS, _PAT, _AUTH, _APIKEY, _BEARER, _SESSION
  • Substrings: PASSWORD, SECRET, CREDENTIAL, PRIVATE_KEY

Declaring an env in requires.env whitelists it past the blocklist. That's the only supported way for an extension to receive a secret env var. Gating and whitelisting come from the same field — preconditions you declare travel alongside the value you want.
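
A sketch of that spawn-time filter with an abbreviated pattern list (the full suffix and substring sets are the ones above; names hypothetical):

// Abbreviated pattern set; the full lists are documented above.
const SECRET_SUFFIXES: &[&str] = &["_TOKEN", "_KEY", "_SECRET", "_PASSWORD"];
const SECRET_SUBSTRINGS: &[&str] = &["PASSWORD", "SECRET", "CREDENTIAL", "PRIVATE_KEY"];

// Keep a var if the extension declared it in requires.env, or if it
// doesn't look like a secret.
fn filter_env(
    host_env: Vec<(String, String)>,
    declared: &[&str],
) -> Vec<(String, String)> {
    host_env
        .into_iter()
        .filter(|(name, _)| {
            declared.contains(&name.as_str())
                || (!SECRET_SUFFIXES.iter().any(|s| name.ends_with(s))
                    && !SECRET_SUBSTRINGS.iter().any(|s| name.contains(s)))
        })
        .collect()
}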

Write-gating in practice

Some shipped extensions gate destructive operations behind dedicated flags — separate from requires.env:

| Extension | Write gate env var |
|---|---|
| docker-api | DOCKER_API_ALLOW_WRITE |
| proxmox | PROXMOX_ALLOW_WRITE |
| onepassword | OP_ALLOW_REVEAL (reveal vs metadata-only) |
| google | GOOGLE_ALLOW_SEND, GOOGLE_ALLOW_CALENDAR_WRITE, GOOGLE_ALLOW_DRIVE_WRITE, GOOGLE_ALLOW_TASKS_WRITE, GOOGLE_ALLOW_PEOPLE_WRITE |

These are not handled by the generic gating layer — the extension reads them itself and refuses destructive methods when unset. Good pattern to adopt when your own extension wraps an API with destructive endpoints.

Gotchas

  • Empty env counts as missing. EXAMPLE_KEY= is treated the same as EXAMPLE_KEY unset. This is intentional — empty strings rarely mean "use the default" for a secret.
  • requires.bins checks $PATH at discovery. A binary installed after the agent starts won't be picked up until restart — or until you run agent ext doctor --runtime as a secondary gate.
  • Local-skill skip is silent to the LLM. If you expected a skill to be present and you don't see it in the system prompt, check the warn logs for the skip reason before debugging agent behavior.

Dependencies — modes and bin versions

A skill that depends on a CLI tool or an environment variable can declare those needs in requires. The runtime resolves the declarations at load time and decides whether to expose the skill, hide it, or expose it with a visible warning the LLM can see.

---
name: ffmpeg-tools
requires:
  bins: [ffmpeg]
  env:  [TRANSCODE_OUTPUT_DIR]
  bin_versions:
    ffmpeg: ">=4.0"
  mode: strict          # default
---

Modes

| Mode | When deps are missing | LLM sees the skill? |
|---|---|---|
| strict (default) | Skill is dropped | No |
| warn | Skill loads with a > ⚠️ MISSING DEPS … banner prepended to its body | Yes — with the warning inline |
| disable | Skill is always dropped, even when deps are satisfied | No |

Per-agent override

Operators override a skill's declared mode without editing the skill file:

agents:
  - id: kate
    skills: [ffmpeg-tools]
    skill_overrides:
      ffmpeg-tools: warn

Resolution order:

  1. agents.<id>.skill_overrides[<name>] (operator wins)
  2. Skill frontmatter requires.mode
  3. strict (built-in default)

Bin versions

requires.bin_versions adds a semver constraint on top of mere bin presence. Failing the constraint is treated like a missing dep — the active mode decides whether to skip or warn.

Constraint syntax

semver request strings:

| Want | Constraint |
|---|---|
| At least 4.0 | ">=4.0" |
| Any 4.x compatible release | "^4.0" |
| 4.x but no 5 | ">=4.0, <5.0" |
| Exact 4.2.1 | "=4.2.1" |
| Patch-compatible to 5.1.3 | "~5.1.3" |

Versions like 4.2 are normalized to 4.2.0 before comparison so constraint matching works against partial outputs.

Custom probe

Defaults: <bin> --version, regex \d+\.\d+(?:\.\d+)?. Override when a tool emits something idiosyncratic:

requires:
  bin_versions:
    curl:
      constraint: ">=8.0"
      command: "--help"
      regex: 'curl (\d+\.\d+(?:\.\d+)?)'

The shorthand form bin: ">=4.0" and the long form bin: { constraint: …, command: …, regex: … } are both accepted.
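
A sketch of the normalize-then-match step using the semver crate (the probe and regex extraction happen before this point; constants here are illustrative):

use semver::{Version, VersionReq};

// Pad partial versions ("4.2" → "4.2.0") so semver parsing succeeds,
// then test the constraint. Parse failures map to parse_failed /
// invalid_constraint in the real pipeline.
fn satisfies(found: &str, constraint: &str) -> bool {
    let padded = match found.matches('.').count() {
        0 => format!("{found}.0.0"),
        1 => format!("{found}.0"),
        _ => found.to_string(),
    };
    match (Version::parse(&padded), VersionReq::parse(constraint)) {
        (Ok(version), Ok(req)) => req.matches(&version),
        _ => false,
    }
}

With this shape, satisfies("3.4.2", ">=4.0") is false, which is exactly the version-mismatch case shown in the warn banner below.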

Probe fail modes

| Reason | When |
|---|---|
| bin_not_found | Binary not on PATH |
| probe_failed | Spawn errored or timed out (5 s cap) |
| parse_failed | The default regex (or override) didn't match |
| constraint_unsatisfied | Found version doesn't match the constraint |
| invalid_constraint | Constraint string couldn't be parsed as semver |

Invalid constraints log at error level; the skill is treated as having a missing dep — boot continues so a typo in one skill doesn't take the whole agent down. Probes are cached process-wide by absolute path so a bin shared across skills only spawns once.

When mode: warn and any dep is missing, the skill body is rendered to the LLM with this prefix:

> ⚠️ MISSING DEPS for skill `ffmpeg-tools`:
>   - bin not found: ffmpeg
>   - env unset: TRANSCODE_OUTPUT_DIR
>   - version mismatch: ffmpeg requires >=4.0 (found 3.4.2)
> Calls into this skill may fail.

The LLM treats this like any other markdown context, so it has the information it needs to either avoid the skill or report a useful error to the user when a tool call fails.

Backwards compatibility

Skills without requires.mode, requires.bin_versions, or agents.<id>.skill_overrides keep the prior behavior (strict, no version checks). The defaults are chosen so an unmodified skill catalog and existing agents.yaml continue to work unchanged.

TaskFlow model

TaskFlow is a durable, multi-step flow runtime that survives process restarts and external waits. It's designed for work that spans more LLM turns than a single conversation buffer can hold — approvals, data pipelines, delegated subtasks, scheduled actions.

Source: crates/taskflow/ (types.rs, store.rs, engine.rs).

When to use it

Use TaskFlow when any of the following apply:

  • A task needs to pause and resume later (hours, days)
  • Multiple agents collaborate on one outcome
  • You need a full audit trail of what happened and when
  • You need recovery from a crash mid-task

If it's a one-shot turn, don't reach for TaskFlow — the runtime's normal session buffer is enough.

Flow shape

A flow is an opaque state_json (free-form JSON) plus metadata:

| Field | Purpose |
|---|---|
| id | UUID generated on creation. |
| controller_id | String label identifying the flow definition (e.g. kate/inbox-triage). |
| goal | Human-readable statement of intent. |
| owner_session_key | agent:<id>:session:<session_id> — hard tenancy gate. |
| requester_origin | Who asked (user id, external system id). |
| current_step | String label for the current phase ("classify", "await_approval", …). |
| state_json | Free-form JSON owned by the flow — the LLM mutates this over time. |
| wait_json | Current wait condition while status = Waiting. |
| status | See state machine below. |
| cancel_requested | Sticky flag that forces the next valid transition to Cancelled. |
| revision | Monotonic integer; increments on every update. Used for optimistic concurrency. |
| created_at / updated_at | Timestamps. |

state_json is shallow-merged on updates: a patch { "foo": 1 } replaces only the foo key, everything else is preserved.
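
The merge rule in code, as a sketch (serde_json):

use serde_json::Value;

// Shallow merge: top-level keys in the patch replace keys in the state;
// nested objects are replaced wholesale, never merged recursively.
fn shallow_merge(state: &mut Value, patch: Value) {
    if let (Some(state_map), Value::Object(patch_map)) = (state.as_object_mut(), patch) {
        for (key, value) in patch_map {
            state_map.insert(key, value);
        }
    }
}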

State machine

stateDiagram-v2
    [*] --> Created
    Created --> Running: start_running
    Running --> Waiting: set_waiting(condition)
    Waiting --> Running: resume
    Running --> Finished: finish
    Running --> Failed: fail
    Waiting --> Failed: fail
    Created --> Cancelled: cancel
    Running --> Cancelled: cancel
    Waiting --> Cancelled: cancel
    Finished --> [*]
    Failed --> [*]
    Cancelled --> [*]
  • Terminal states: Finished, Failed, Cancelled. No further transitions allowed.
  • Sticky cancel: cancel_requested = true forces the next allowed transition to land on Cancelled. The flag survives restart and is idempotent — multiple cancel requests converge on the same outcome.

Persistence

SQLite-backed via sqlx, pool size 5. Default path ./data/taskflow.db, override with TASKFLOW_DB_PATH.

Tables

CREATE TABLE flows (
  id                  TEXT PRIMARY KEY,
  controller_id       TEXT,
  goal                TEXT,
  owner_session_key   TEXT,
  requester_origin    TEXT,
  current_step        TEXT,
  state_json          TEXT,
  wait_json           TEXT,
  status              TEXT,
  cancel_requested    BOOLEAN,
  revision            INTEGER,
  created_at          INTEGER,
  updated_at          INTEGER
);

CREATE TABLE flow_steps (
  id                  TEXT PRIMARY KEY,
  flow_id             TEXT NOT NULL,
  runtime             TEXT,              -- Managed | Mirrored
  child_session_key   TEXT,
  run_id              TEXT,
  task                TEXT,
  status              TEXT,
  result_json         TEXT,
  created_at          INTEGER,
  updated_at          INTEGER,
  UNIQUE (flow_id, run_id)
);

CREATE TABLE flow_events (
  id          INTEGER PRIMARY KEY AUTOINCREMENT,
  flow_id     TEXT NOT NULL,
  kind        TEXT,
  payload_json TEXT,
  at          INTEGER
);
  • flows.revision drives optimistic concurrency (see FlowManager).
  • flow_events is append-only — every transition leaves a trail.
  • flow_steps.(flow_id, run_id) UNIQUE catches duplicate observations at the DB layer, not in a race-prone managerial check.

Wait conditions

Persisted in wait_json while status = Waiting.

#![allow(unused)]
fn main() {
enum WaitCondition {
    Timer { at: DateTime<Utc> },                        // auto-resume at time
    ExternalEvent { topic: String, correlation_id: String }, // resume when matching event arrives
    Manual,                                              // resume only via explicit call
}
}

| Condition | Resumed by |
|---|---|
| Timer | WaitEngine::tick() when now >= at |
| ExternalEvent | try_resume_external(flow_id, topic, correlation_id, payload) |
| Manual | FlowManager::resume(id, patch) — typically via CLI or a deliberate LLM turn |

There is no timeout built into the wait itself; you implement one by pairing the wait with a Timer fallback (e.g. "wait for approval OR 24 h elapsed") in the flow's step logic.

Audit trail

Every transition writes a flow_events row with:

  • kind: created, started, waiting, resumed, finished, failed, cancelled, state_updated, step_observed, ...
  • payload_json: contextual data (wait condition, result, reason, step info)
  • at: timestamp

The audit append happens inside the same SQLite transaction as the state update — you can never see a flow state that doesn't have a matching audit event, even after a crash mid-operation.

Mirrored flows

Beyond Managed flows (owned by FlowManager), you can create Mirrored flows that just observe externally-driven work:

  • create_mirrored(input) inserts a flow already in Running state
  • record_step_observation(StepObservation) upserts into flow_steps by (flow_id, run_id) — new observations merge with existing rows
  • Emits step_observed audit events

Useful for tracking tasks executed elsewhere — a delegation to another agent, a subprocess spawned out-of-band — while keeping one unified audit surface.

Next

  • FlowManager — the mutation API, revision retry, and agent-facing tools

FlowManager, tools, and CLI

FlowManager owns the mutation API for flows. It wraps the FlowStore with revision-checked atomic updates, the agent-facing taskflow tool, the WaitEngine, and the agent flow CLI.

Source: crates/taskflow/src/manager.rs, crates/taskflow/src/engine.rs, crates/core/src/agent/taskflow_tool.rs.

Responsibilities

flowchart LR
    subgraph FM[FlowManager]
        CREATE[create_managed<br/>create_mirrored]
        RUN[start_running<br/>set_waiting<br/>resume<br/>finish<br/>fail<br/>cancel]
        PATCH[update_state<br/>request_cancel]
        QUERY[get / list_by_owner / list_by_status / list_steps]
        OBS[record_step_observation]
    end
    FM --> STORE[FlowStore<br/>SQLite]
    FM --> ENG[WaitEngine]
    TOOL[taskflow tool<br/>agent-facing] --> FM
    CLI[agent flow CLI] --> FM
    ENG --> STORE

One manager per store — typically one per process. Same database file can be opened by multiple managers safely as long as each goes through the revision protocol.

Optimistic concurrency

Every mutation follows this loop:

flowchart TD
    START[mutation requested] --> FETCH[fetch current flow]
    FETCH --> APPLY[apply closure:<br/>transition, patch, etc.]
    APPLY --> SAVE[store.update_and_append<br/>WHERE id=? AND revision=?]
    SAVE --> RES{result}
    RES -->|ok| DONE([return updated flow])
    RES -->|RevisionMismatch| REFETCH[refetch + retry]
    REFETCH --> LIMIT{attempts >= 2?}
    LIMIT -->|no| APPLY
    LIMIT -->|yes| ERR([surface RevisionMismatch])
  • revision is a monotonic integer on every flow
  • Update runs UPDATE ... WHERE id=? AND revision=? — only one writer wins per revision
  • Retry budget is 2 attempts (1 fetch + 1 refetch); persistent conflict bubbles up to the caller
  • Update and audit-event append happen inside a single SQLite transaction — crash mid-operation cannot produce a desync between state and audit trail
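
The guarded write is a single statement. A simplified sqlx-style sketch (column list trimmed; the real update also persists status, wait_json, and the audit-event row in the same transaction):

// Returns false on a revision mismatch; the caller refetches and retries.
async fn guarded_update(
    pool: &sqlx::SqlitePool,
    id: &str,
    expected_revision: i64,
    new_state_json: &str,
) -> Result<bool, sqlx::Error> {
    let result = sqlx::query(
        "UPDATE flows
         SET state_json = ?, revision = revision + 1
         WHERE id = ? AND revision = ?",
    )
    .bind(new_state_json)
    .bind(id)
    .bind(expected_revision)
    .execute(pool)
    .await?;
    Ok(result.rows_affected() == 1)
}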

WaitEngine

Broker-agnostic scheduler. Pull-based tick() advances any flow whose wait condition has fired.

flowchart LR
    TICK[WaitEngine::tick_at] --> SCAN[scan all Waiting flows]
    SCAN --> EVAL{evaluate wait}
    EVAL -->|Timer expired| RESUME1[resume]
    EVAL -->|still future| STAY1[stay waiting]
    EVAL -->|ExternalEvent / Manual| STAY2[stay waiting]
    EVAL -->|cancel_requested| CAN[transition to Cancelled]
    EXT[try_resume_external<br/>topic + correlation_id] --> MATCH{wait condition<br/>matches?}
    MATCH -->|yes| RESUME2[resume + merge payload into<br/>state.resume_event]
    MATCH -->|no| NOOP[no-op]
  • tick_at(now) — a single scan. Returns a TickReport with counters: scanned, resumed, cancelled, still waiting, errors.
  • run(interval, shutdown_token) — long-running loop; drive from heartbeat or a dedicated tokio task.
  • try_resume_external(flow_id, topic, correlation_id, payload) — called by a NATS subscriber or the CLI when an external event arrives; matches against the flow's persisted wait_json and resumes if it fits.

Correlation ids are caller-chosen strings. Typical pattern: when a flow delegates to another agent via agent.route.<target_id>, include the flow's id or a fresh UUID as the correlation id, and have the receiver echo it on reply.

Agent-facing tool

Single taskflow tool with dispatch by action:

| Action | Params | Result |
|---|---|---|
| start | controller_id, goal, optional current_step (default "init"), optional state | {ok, flow} — auto-transitions Created → Running |
| status | flow_id | {ok, flow} or {ok:false, error:"not_found"} |
| advance | flow_id, optional patch, optional current_step | {ok, flow} with merged state |
| cancel | flow_id | {ok, flow} |
| list_mine | — | {ok, count, flows: [...]} |

Session tenancy

Every call derives owner_session_key = "agent:<id>:session:<session_id>". The manager rejects any mutation whose owner does not match the flow's — "belongs to a different session" error. Cross-session access from the LLM is not possible.

Revision hidden from the LLM

The tool fetches the flow before every mutation and uses the live revision internally. The LLM never sees or reasons about revision numbers — fewer tokens, fewer mistakes.

CLI

agent flow list          [--json]
agent flow show <id>     [--json]
agent flow cancel <id>
agent flow resume <id>
  • list prints a table sorted by updated_at DESC
  • show prints the flow plus every recorded step
  • cancel calls manager.cancel(id)
  • resume is a manual unblock for Manual or ExternalEvent waits — useful in ops / testing when an expected event never arrived

All commands honor TASKFLOW_DB_PATH (default ./data/taskflow.db).

End-to-end example

From crates/taskflow/tests/e2e_test.rs:

#![allow(unused)]
fn main() {
// 1. Create + run + park.
let f = manager.create_managed(input).await?;
let f = manager.start_running(f.id).await?;
let f = manager.set_waiting(f.id, json!({"kind": "manual"})).await?;

// 2. Process exits. Reopen the SAME database file from a fresh manager.
let reloaded = manager.get(f.id).await?.unwrap();
assert_eq!(reloaded.status, FlowStatus::Waiting);
assert_eq!(reloaded.state_json["verses_done"], 10);  // partial work survived

// 3. Resume picks up where we left off.
let resumed = manager.resume(reloaded.id, None).await?;
assert_eq!(resumed.status, FlowStatus::Running);
}

Shipped shape of CreateManagedInput:

{
  "controller_id": "kate/inbox-triage",
  "goal": "triage inbox",
  "owner_session_key": "agent:kate:session:abc",
  "requester_origin": "user-1",
  "current_step": "classify",
  "state_json": { "messages": 10, "processed": 0 }
}

There is no YAML flow-definition format — flows are built in code (or driven by the taskflow tool's start action).

Garbage collection

store.prune_terminal_flows(retain_days) deletes flows whose terminal state is older than the retention window. Wire this into a scheduled job when your flows pile up — audit trails accumulate forever otherwise.

Gotchas

  • state_json is shallow-merged. Nested updates require the caller to build the full replacement object for the key being changed.
  • revision conflicts retry only twice. If two callers are fighting over a flow continuously, the second persistently surfaces RevisionMismatch — treat that as a signal that you should either serialize at a higher level, or have the loser retry at the app layer.
  • No flow-level mutex. The DB-level UNIQUE (flow_id, run_id) on steps keeps step-observation races safe; revision checks keep mutation races safe. But two observers can read a flow simultaneously — don't rely on read-time consistency for decisions.
  • wait_json is cleared on resume. If you need to remember the wait condition for audit purposes, the flow_events table has it.

Wait / resume

Durable flows can park themselves between steps. The runtime drives parked flows back to Running either on a wall-clock deadline (timer), when an external signal arrives (NATS), or when an operator resumes them by hand (manual).

Two pieces wire this together:

  • WaitEngine — single global tokio task. Every tick_interval it scans Waiting flows and resumes any whose timer has fired or whose cancel intent has been set.
  • taskflow.resume bridge — single broker subscriber that translates incoming events into WaitEngine::try_resume_external calls.

Source: crates/taskflow/src/engine.rs, src/main.rs::spawn_taskflow_resume_bridge.

Wait conditions

The wait_json column on a flow stores one of:

| Kind | Shape | Resumed by |
|---|---|---|
| timer | {kind:"timer", at:"<RFC3339>"} | WaitEngine.tick() once now >= at |
| external_event | {kind:"external_event", topic:"…", correlation_id:"…"} | taskflow.resume bridge with matching (topic, correlation_id) |
| manual | {kind:"manual"} | Explicit manager.resume(...) (CLI / ops) |

Timer.at is validated by the tool against taskflow.timer_max_horizon (default 30 days). Past deadlines, empty topics, and empty correlation_ids are rejected before the flow ever enters Waiting.
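
A sketch of that guardrail (chrono types; the horizon is the configured taskflow.timer_max_horizon):

use chrono::{DateTime, Duration, Utc};

// Reject past deadlines and deadlines beyond the configured horizon.
fn validate_timer_at(at: DateTime<Utc>, max_horizon: Duration) -> Result<(), &'static str> {
    let now = Utc::now();
    if at <= now {
        return Err("timer deadline is in the past");
    }
    if at - now > max_horizon {
        return Err("timer deadline exceeds timer_max_horizon");
    }
    Ok(())
}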

Tool actions

The taskflow tool exposes the LLM-facing surface. Beyond the existing start | status | advance | cancel | list_mine, three actions drive the wait/resume lifecycle:

wait

{
  "action": "wait",
  "flow_id": "…uuid…",
  "wait_condition": {"kind": "timer", "at": "2026-04-26T09:00:00Z"}
}

Move flow Running → Waiting. Validates wait_condition shape and guardrails before persisting.

finish

{
  "action": "finish",
  "flow_id": "…uuid…",
  "final_state": {"result": "ok"}
}

Move flow → Finished. final_state (optional) is shallow-merged into state_json before transition.

fail

{
  "action": "fail",
  "flow_id": "…uuid…",
  "reason": "downstream-error"
}

Move flow → Failed. reason is required. The reason is stamped under state_json.failure.reason and recorded in the audit event.

NATS resume bridge

A single subscriber lives at taskflow.resume. Anything that wants to wake a parked flow publishes a JSON message there:

{
  "flow_id": "f5e0…",
  "topic": "agent.delegate.reply",
  "correlation_id": "corr-42",
  "payload": {"answer": 42}
}

The bridge calls WaitEngine::try_resume_external(flow_id, topic, correlation_id, payload). If the flow is Waiting with a matching external_event condition, it resumes; the payload (if any) is merged into state_json.resume_event. Mismatches and unknown flow ids are silent debug logs.

Example with the nats CLI:

nats pub taskflow.resume '{
  "flow_id": "f5e0…",
  "topic": "agent.delegate.reply",
  "correlation_id": "corr-42",
  "payload": {"answer": 42}
}'

Single subject (no flow_id in suffix) is intentional — it keeps the subject namespace flat and avoids per-flow subscription churn. Volume is expected to be low (<10/s); if that ever changes, the bridge can shard internally without protocol changes.
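
A sketch of the bridge loop with async-nats (error handling trimmed; the hand-off to the engine is shown as a comment since the WaitEngine receiver isn't reproduced here):

use futures::StreamExt;

// Decode each taskflow.resume message and hand it to the WaitEngine.
async fn resume_bridge(client: async_nats::Client) -> Result<(), Box<dyn std::error::Error>> {
    let mut sub = client.subscribe("taskflow.resume").await?;
    while let Some(msg) = sub.next().await {
        let event: serde_json::Value = match serde_json::from_slice(&msg.payload) {
            Ok(v) => v,
            Err(_) => continue, // malformed payloads are skipped (debug log in the real bridge)
        };
        let flow_id = event["flow_id"].as_str().unwrap_or_default();
        let topic = event["topic"].as_str().unwrap_or_default();
        let correlation_id = event["correlation_id"].as_str().unwrap_or_default();
        // engine.try_resume_external(flow_id, topic, correlation_id, event["payload"].clone())
        let _ = (flow_id, topic, correlation_id);
    }
    Ok(())
}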

Configuration

config/taskflow.yaml (optional; absent → defaults):

tick_interval: 5s        # WaitEngine cadence
timer_max_horizon: 30d   # max future Timer.at allowed by tool
db_path: ./data/taskflow.db   # also honored via TASKFLOW_DB_PATH

agents.yaml enables the tool per agent:

agents:
  - id: kate
    plugins: [taskflow, memory]

Without taskflow in plugins, the agent does not see the tool — the engine and bridge still run process-wide.

Tick interval guidance

  • 5s (default) is plenty for human-scale timers.
  • Bring it down to 1s only if you have sub-minute timers and care about the worst-case lag.
  • The tick is idempotent and pull-based; missing a tick is harmless.

Telemetry

Each tick logs at debug level when scanned > 0:

DEBUG wait engine tick scanned=3 resumed=1 cancelled=0 still_waiting=2 errors=0

The bridge logs at info on each successful resume:

INFO taskflow resumed via NATS flow_id=… topic=…

Identity & workspace

Every agent has a workspace directory — a small set of markdown files that describe who it is, what it knows, and how it's meant to behave. The runtime loads those files at session start and injects them into the system prompt. The agent reads them; some of them, the agent also writes back to.

Source: crates/core/src/agent/workspace.rs, crates/core/src/agent/self_report.rs.

Workspace files

<workspace>/
├── IDENTITY.md        # 10.1 — persona facts (name, vibe, emoji)
├── SOUL.md            # 10.2 — prompt-like character document
├── USER.md            # who the human is (if single-user)
├── AGENTS.md          # peers this agent knows about
├── MEMORY.md          # 10.3 — self-curated facts index
├── DREAMS.md          # dreaming diary (10.6)
├── notes/             # per-day notes
└── .git/              # 10.9 — per-agent repo for forensics

Configured per agent:

agents:
  - id: kate
    workspace: ./data/workspace/kate
    workspace_git:
      enabled: true

IDENTITY.md (phase 10.1)

Short, structured. Five optional fields parsed from a markdown bullet list:

- **Name:** Kate
- **Creature:** octopus
- **Vibe:** warm but sharp
- **Emoji:** 🐙
- **Avatar:** https://.../kate.png

The parser:

  • Silently skips template placeholders in parens (e.g. _(pick something)_) so the bootstrap template never leaks into the persona
  • Produces an AgentIdentity { name, creature, vibe, emoji, avatar } struct, all fields Option<String>

Rendered into the system prompt as a single line:

# IDENTITY
name=Kate, emoji=🐙, vibe=warm but sharp
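
The bullet parse itself is small. A sketch (hypothetical helper; the shipped parser assembles the full AgentIdentity struct from these pairs):

// "- **Name:** Kate" → Some(("Name", "Kate")); template placeholders
// like "- **Creature:** _(pick something)_" are skipped.
fn parse_identity_line(line: &str) -> Option<(String, String)> {
    let rest = line.trim().strip_prefix("- **")?;
    let (key, value) = rest.split_once(":**")?;
    let value = value.trim();
    if value.is_empty() || (value.starts_with("_(") && value.ends_with(")_")) {
        return None;
    }
    Some((key.to_string(), value.to_string()))
}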

SOUL.md (phase 10.2)

Free-form markdown. No parsing. Injected verbatim after the IDENTITY block. This is where long-form character, operating principles, tone, and hard rules live.

Loaded on every session start. Main and shared sessions both see SOUL.md — the privacy boundary is MEMORY.md, not SOUL.md (shared groups should never leak private memories, but the persona is fine to surface).

MEMORY.md (phase 10.3)

The agent's self-curated index of things it remembers. Markdown sections with bullet lists — no special schema:

## People

- Luis prefers Spanish but is fine switching to English.
- Ana uses a Samsung, not an iPhone.

## Dreamed 2026-04-23 03:00 UTC

- User's timezone is America/Bogota _(score=0.42, hits=5, days=3)_
- Prefers short replies on WhatsApp _(score=0.38, hits=4, days=2)_

## Open questions

- What phone carrier does Luis use?

Scope rules:

  • Loaded only in main (DM-style) sessions. Group and broadcast sessions never see MEMORY.md — per-user facts must not leak into multi-user chats.
  • Appended automatically by dreaming sweeps (Phase 10.6)
  • Truncation: 12 000 chars per file cap (whole workspace total budget: 60 000 chars). Exceeding files get a [truncated] marker.

USER.md and AGENTS.md

  • USER.md — who this agent is talking to. Loaded in main sessions only.
  • AGENTS.md — which peers this agent can delegate to. Pairs with allowed_delegates in agents.yaml.

Both are free-form markdown read into the prompt.

Transcripts (phase 10.4)

Per-session, append-only JSONL files in transcripts_dir:

{"type":"session","version":1,"id":"<uuid>","timestamp":"2026-04-24T...","agent_id":"kate","source_plugin":"telegram"}
{"type":"entry","timestamp":"...","role":"user","content":"hello","message_id":"...","source_plugin":"telegram","sender_id":"user123"}
{"type":"entry","timestamp":"...","role":"assistant","content":"hello Luis","source_plugin":""}
  • One file per session at <transcripts_dir>/<session_id>.jsonl
  • No time-based rotation (session close = file close)
  • First line is a session header with metadata, every subsequent line is a turn

Transcripts are write-only from the runtime's point of view — they're for replay, audit, and human review, not read-back into the prompt.

Self-report tools (phase 10.8)

Four tools let the agent inspect its own state:

| Tool | Returns | Use |
|---|---|---|
| who_am_i | {agent_id, model, workspace_dir, identity{…}, soul_excerpt} | When asked "who are you?" |
| what_do_i_know | {sections: [{heading, bullets}], truncated} with optional filter | Search MEMORY.md by section name |
| my_stats | {sessions_total, memories_stored, memories_promoted, last_dream_ts, recall_events_7d, top_concept_tags_7d, workspace_files_present} | Meta-awareness |
| session_logs | {ok, sessions/entries/hits, …} — actions: list_sessions, read_session, search, recent | Inspect own JSONL transcripts for self-reflection, debugging, cross-session search |

The first three return concise JSON designed for the LLM to consume in one turn. Soul excerpt in who_am_i is truncated to 2 048 chars; what_do_i_know caps at 6 144 bytes serialized with at most 10 bullets per section.

session_logs is registered automatically when the agent has a non-empty transcripts_dir. It is scoped to that directory — agents cannot read each other's transcripts. Default limits: 50 entries per call (max 500), 200 chars per content preview (max 4 000). When recent is invoked without session_id, it defaults to the current session. If the agent's allowed_tools patterns exclude session_logs, it is filtered after registration like every other tool.

Load flow

flowchart TD
    SESSION[new session] --> LOADER[WorkspaceLoader.load scope]
    LOADER --> SCOPE{scope}
    SCOPE -->|Main| FULL[load IDENTITY + SOUL + USER +<br/>AGENTS + daily notes + MEMORY]
    SCOPE -->|Shared| SHARED[load IDENTITY + SOUL +<br/>AGENTS only]
    FULL --> TRUNC[enforce 12k/file, 60k total]
    SHARED --> TRUNC
    TRUNC --> RENDER[render_system_blocks<br/>into prompt]
    RENDER --> PROMPT[# IDENTITY<br/># SOUL<br/># USER<br/># AGENTS<br/># MEMORY]

Next

  • MEMORY.md — write cadence and promotion rules
  • Dreaming — how sleeps turn recall signals into MEMORY.md entries

MEMORY.md + recall signals + workspace-git

This page covers everything about how what the agent knows evolves over time: the MEMORY.md index, the recall signals that drive dreaming, how concept tags are derived, and how the workspace-git repo captures a full audit history.

For the underlying storage mechanics (tables, queries, vector index), see Memory — long-term.

What goes where

flowchart LR
    subgraph DB[SQLite data/memory.db]
        MEM[memories]
        FTS[memories_fts]
        REC[recall_events]
        PROM[memory_promotions]
    end
    subgraph WS[workspace dir]
        MD[MEMORY.md]
        DRM[DREAMS.md]
        GIT[.git]
    end

    TOOL[memory.remember] --> MEM
    TOOL --> FTS
    MEM -. recall hits .-> REC
    REC --> DRM2[dream sweep]
    DRM2 --> PROM
    DRM2 --> MD
    DRM2 --> DRM
    CHK[forge_memory_checkpoint] --> GIT
    DRM2 --> GIT

Three layers, each with a different update cadence:

| Layer | Write trigger | Consumer |
|---|---|---|
| memories table | Agent calls memory.remember | Next turn's memory.recall |
| recall_events table | Every memory.recall hit | Dream sweep (10.6) |
| memory_promotions table | Promotion during dream | Prevents double-promote across sweeps |
| MEMORY.md | Dream sweep (10.6) | Next session's system prompt (main scope only) |
| DREAMS.md | Dream sweep (10.6) | Historical diary for humans + my_stats |
| .git | Dream finish, session close, forge_memory_checkpoint | memory_history tool, post-mortem via git log |

Recall signals (phase 10.5)

The recall_events table captures every hit of memory.recall:

CREATE TABLE recall_events (
  id         INTEGER PRIMARY KEY AUTOINCREMENT,
  agent_id   TEXT,
  memory_id  TEXT,
  query      TEXT,  -- the search string that surfaced this memory
  score      REAL,  -- relevance score from the recall call
  ts_ms      INTEGER
);

Aggregation over a per-memory window produces the signals struct consumed by dreaming:

| Signal | Meaning |
|---|---|
| frequency | Log-normalized count of hits |
| relevance | Mean score across hits |
| recency | Exponential decay from last-hit timestamp |
| diversity | Distinct query strings, normalized (saturates at 5+) |
| recall_count | Raw hit count — used by gates |
| unique_days | Distinct UTC days the memory was surfaced |

Each signal is weighted and summed into the score that drives promotion (see Dreaming).
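
As one concrete example of the signal shapes, a plausible recency decay (the half-life here is illustrative, not the shipped constant):

// Exponential decay from the last-hit timestamp: 1.0 right after a hit,
// 0.5 after one half-life, approaching 0 as the memory goes cold.
fn recency_signal(last_hit_ms: i64, now_ms: i64) -> f64 {
    let age_days = (now_ms - last_hit_ms).max(0) as f64 / 86_400_000.0;
    let half_life_days = 7.0; // illustrative constant
    0.5_f64.powf(age_days / half_life_days)
}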

Concept tags (phase 10.7)

Every memory row has a concept_tags JSON column populated at insert time — not via TF-IDF but via a deterministic pipeline:

  1. Glossary match. Hard-coded list of protected tech terms (multilingual) — backup, openai, migration, etc.
  2. Compound tokens. Regex preserves file paths and identifiers (src/main.rs, camelCaseNames).
  3. Unicode word segmentation. UAX #29 word boundaries split the rest.
  4. Per-token rules:
    • NFKC normalization + lowercase
    • 32-char max; 3-char min for Latin, 2-char min for CJK
    • Reject pure digits, ISO dates, and 100+ shared stop-words across English, Spanish, and path noise
    • Underscores converted to dashes

Output capped at 8 tags per memory. Stored as JSON array on the memories row; expanded into keyword recall searches as part of the FTS5 MATCH query.

Dream sweeps backfill tags for older memories that were created before the tagging pipeline existed.

MEMORY.md write cadence

Dreaming sweeps append blocks:

## Dreamed 2026-04-24 03:00 UTC

- Luis lives in Bogota and prefers Spanish _(score=0.42, hits=5, days=3)_
- Kate should default to short WhatsApp replies _(score=0.38, hits=4, days=2)_
  • One block per sweep
  • Promoted memories shown as bullets with score, hit count, unique days
  • Existing sections preserved; the file is only ever appended to (manual editing by humans is fine — the dream sweep appends a new block rather than rewriting anything)

Privacy rules:

  • MEMORY.md is injected into main-scope sessions only. Groups / broadcasts never see it.
  • transcripts_dir is separate from workspace and is not committed to workspace-git by default.

Workspace-git (phase 10.9)

When workspace_git.enabled: true, the agent's workspace directory is a git repo. Commits happen automatically at three moments:

flowchart LR
    T1[dream sweep finishes] --> C[commit_all promote]
    T2[session close<br/>on_expire callback] --> C2[commit_all session-close]
    T3[forge_memory_checkpoint<br/>tool call] --> C3[commit_all checkpoint:note]
    C --> LOG[.git history]
    C2 --> LOG
    C3 --> LOG

Mechanics (crates/core/src/agent/workspace_git.rs):

  • Staged: every non-ignored file (respects auto-generated .gitignore)
  • Skipped: files larger than 1 MiB (MAX_COMMIT_FILE_BYTES)
  • Idempotent: no-op commit when the tree is clean
  • Author: {agent_id} <agent@localhost> (configurable via workspace_git.author_name / author_email)
  • Auto .gitignore excludes transcripts/, media/, *.tmp, *.swp, .DS_Store
  • No remote configured by default; operators add one if forensic archival matters

Tools that touch git

| Tool | Purpose | Returns |
|---|---|---|
| forge_memory_checkpoint(note) | Commit right now with checkpoint: <note> subject | {ok, oid(short), subject, skipped} |
| memory_history(limit?, include_diff?) | git log of the last limit commits (max 100); optional unified diff oldest→HEAD | {commits: [...], diff?} |

Good uses of explicit checkpoints:

  • Before a risky update sequence the agent is about to perform
  • After receiving a non-obvious instruction from the user
  • As bookends around a taskflow step boundary

Gotchas

  • MEMORY.md can grow unbounded over years. Workspace-git keeps the history, but the in-prompt view is truncated at 12 KB. Keep an eye on size; prune old ## Dreamed blocks if they stop being useful.
  • Concept-tag derivation is deterministic per content. Editing a memory's content in-place does not re-derive tags — the tags that were computed at insert stick. Re-insert to refresh.
  • git log replays tell the truth. If you're debugging a surprising agent behavior, memory_history --include-diff is the fastest way to see what the agent wrote to itself and when.

Dreaming

"Dreaming" is a scheduled offline sweep that consolidates an agent's memory. It reads recall signals, scores each memory that was recently surfaced, promotes the strongest ones into MEMORY.md, and commits the workspace-git repo.

Source: crates/core/src/agent/dreaming.rs.

When it runs

# agents.yaml
agents:
  - id: kate
    heartbeat:
      enabled: true
      interval: 30s
    dreaming:
      enabled: false
      interval_secs: 86400        # 24 h
      min_score: 0.35
      min_recall_count: 3
      min_unique_queries: 2
      max_promotions_per_sweep: 20
      weights:
        frequency: 0.24
        relevance: 0.30
        recency: 0.15
        diversity: 0.15
        consolidation: 0.10

Dreaming is heartbeat-driven: it ticks inside the heartbeat loop and actually sweeps when interval_secs has elapsed since the last sweep. Disable the heartbeat and dreaming stops firing.

Default interval_secs: 86400 (24 hours). Run nightly or tune down for high-throughput agents.

Three phases (Light / REM / Deep)

The three-phase model is conceptually borrowed from the OpenClaw design; nexo-rs ships Light → Deep:

flowchart LR
    START[sweep tick] --> LIGHT[Light:<br/>gather memories with<br/>>=1 recall event]
    LIGHT --> DEEP[Deep:<br/>score + gate + promote]
    DEEP --> WRITE[append MEMORY.md block]
    WRITE --> DIARY[append DREAMS.md entry]
    DIARY --> GIT[commit workspace]

(REM — thematic summarization with an LLM — is intentionally deferred.)

Scoring

For each candidate memory:

score = w.frequency × frequency
      + w.relevance × relevance
      + w.recency   × recency
      + w.diversity × diversity
      + w.consolidation × consolidation

Where the signals come from recall_events.

Consolidation is a modest bias toward memories that recurred in diverse queries over multiple days — taking the memory from "hit once" to "actually load-bearing."

Gates

A candidate is promoted only if all of these hold:

Gate | Default | Meaning
recall_count >= min_recall_count | 3 | Surfaced at least 3 times
unique_days >= 1 | 1 | Not all hits on the same day
distinct_queries >= min_unique_queries | 2 | More than one query style hit it
score >= min_score | 0.35 | Weighted composite over the threshold
!is_promoted(memory_id) | — | Not already promoted in a prior sweep

Up to max_promotions_per_sweep (default 20) promoted per run; ordered by descending score.
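
Score and gates together, as a hedged sketch; field names follow this page, not the actual dreaming.rs types:

// Sketch only; the real types live in crates/core/src/agent/dreaming.rs.
struct Signals {
    frequency: f64, relevance: f64, recency: f64,
    diversity: f64, consolidation: f64,
    recall_count: u32, unique_days: u32, distinct_queries: u32,
}
struct Weights { frequency: f64, relevance: f64, recency: f64, diversity: f64, consolidation: f64 }

fn dream_score(s: &Signals, w: &Weights) -> f64 {
    w.frequency * s.frequency
        + w.relevance * s.relevance
        + w.recency * s.recency
        + w.diversity * s.diversity
        + w.consolidation * s.consolidation
}

fn promotable(s: &Signals, score: f64, already_promoted: bool) -> bool {
    s.recall_count >= 3            // min_recall_count
        && s.unique_days >= 1
        && s.distinct_queries >= 2 // min_unique_queries
        && score >= 0.35           // min_score
        && !already_promoted       // memory_promotions lookup
}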

Outputs

MEMORY.md append

## Dreamed 2026-04-24 03:00 UTC

- Luis lives in Bogota and prefers Spanish _(score=0.42, hits=5, days=3)_
- Kate should default to short WhatsApp replies _(score=0.38, hits=4, days=2)_

Only memories promoted this sweep appear in the block.

DREAMS.md diary

A longer-form diary entry the agent can read back in my_stats().last_dream_ts context. One per sweep.

Side effects

  • memory_promotions row per promoted memory (prevents double-promote across sweeps)
  • concept_tags backfilled on older memories that were created before the tagging pipeline landed
  • workspace_git.commit_all("promote", <body with delta>) captures the full change

Idempotency

Re-running a sweep during the same interval is a no-op:

  • Promotions consult memory_promotions before writing
  • MEMORY.md is appended to, not rewritten
  • Git commit returns cleanly with skipped: true when the tree is unchanged

You can safely call a manual "dream now" during a stuck session (currently via restart with a lowered interval_secs) without corrupting state.

Safety rails

  • Shutdown cancellation. Dream sweeps run under a cancellation token tied to the shutdown sequence. Partial sweeps don't leave inconsistent state — the atomic trio (DB row + MEMORY.md append + git commit) runs after all candidates are scored and gated.
  • Heartbeat-only. Dreaming never fires from a user message turn, so a long sweep cannot block a user response.
  • Read-mostly. Sweep reads from recall_events; the only writes are memory_promotions, MEMORY.md append, DREAMS.md append, and git commit. Existing memory rows are untouched except for tag backfill.

What dreaming is not

  • Not a summarizer. It does not rewrite content.
  • Not a deduplicator. Two similar memories remain two memories; the recall layer will simply surface both and let the LLM pick.
  • Not an LLM call. The whole sweep is deterministic — no model inference, no per-sweep cost.

Tuning

Situation | Change
Memories stay too cold to promote | Lower min_score (e.g. 0.25)
Too many noise promotions | Raise min_recall_count to 5
MEMORY.md grows too fast | Lower max_promotions_per_sweep
Very chatty agent | Increase interval_secs — 24 h is already safe

Observability

Every sweep emits a summary log line with:

  • candidates scanned
  • candidates promoted
  • skipped (already promoted)
  • score range of the promoted set
  • workspace-git commit OID (or "clean tree")

Wire it into Prometheus via log scraping if you want time-series counters — no dedicated metric is exposed yet.

Gotchas

  • Turning dreaming on with min_score default produces a long first sweep. If the agent has been running for weeks without dreaming, there are a lot of candidates. Expect the first sweep to promote near the cap and subsequent sweeps to tail off.
  • Concept-tag backfill is O(candidates). Large backlogs will show first-sweep latency proportional to the candidate count. Not a bug — run the first sweep in a maintenance window if the backlog is large.
  • interval_secs is measured from last completed sweep. A failed sweep does not reset the clock — a retry will fire on the next heartbeat tick regardless.

CLI reference

Single source of truth for every agent subcommand, flag, exit code, and env var. agent is the one binary you'll ever run in production — this is everything it can do.

Source: src/main.rs (Mode enum + parse_args), crates/extensions/src/cli/, crates/setup/src/.

Invocation

agent [--config <dir>] [<subcommand> ...]
  • Arg parser: hand-rolled, not clap. --help / -h work; -c is not an alias for --config (case-sensitive exact match).
  • No subcommand → run the daemon (default).
  • Global flag: --config <dir> (default ./config).

Global environment variables

Variable | Values | Purpose
RUST_LOG | tracing-subscriber filter | Log level (e.g. info,agent=debug). Default info.
AGENT_LOG_FORMAT | pretty / compact / json | Log format. Default pretty.
AGENT_ENV | production (or prod) | Triggers JSON logs unless AGENT_LOG_FORMAT overrides.
TASKFLOW_DB_PATH | file path | Flow CLI DB (default ./data/taskflow.db).
CONFIG_SECRETS_DIR | dir path | Whitelists an extra root for ${file:...} YAML refs.

Exit codes (generic)

Code | Meaning
0 | Success
1 | General failure (not found, config invalid, connection refused)
2 | Warnings-only outcome (currently only --check-config non-strict)

The ext subcommand has its own richer exit-code table — see below.

Subcommand index

Subcommand | Purpose
(default) | Run the agent daemon
setup | Interactive credential wizard
status | Query running agent instances
dlq | Dead-letter queue inspection
ext | Extension management
flow | TaskFlow operations
mcp-server | Run as MCP stdio server
admin | Run the web admin UI behind a Cloudflare quick tunnel
reload | Trigger config hot-reload on a running daemon
--check-config | Pre-flight config validation
--dry-run | Load config and print the plan

Daemon (default)

agent [--config ./config]

Boots every configured agent runtime, connects to the broker (NATS or local fallback), starts metrics (:9090), health (:8080), and admin (:9091 loopback) servers.

Exit codes:

  • 0 — clean shutdown via SIGTERM / Ctrl+C
  • 1 — config load failed, broker unreachable at startup, plugin failed to initialize

Logs to: stderr. See Logging.


setup

Interactive credential wizard. Launches a prompt-driven flow for every service you want to enable — LLM keys, WhatsApp QR, Telegram bot token, Google OAuth, etc.

agent setup                    # full interactive wizard
agent setup list               # list installable service ids
agent setup <service>          # configure one service (e.g. minimax, whatsapp)
agent setup doctor             # validate every credential / token (also runs the Phase 70.6 pairing-store audit)
agent setup telegram-link      # print Telegram bot link-to-chat URL

Exit codes: 0 on completion; 1 on error.

See Setup wizard for the step-by-step.


status

Query the running daemon via the loopback admin console.

agent status                                   # every agent, table
agent status ana                               # one agent, table
agent status --json                            # raw JSON
agent status --endpoint http://remote:9091     # override endpoint

Table output columns: ID | MODEL | BINDINGS | DELEGATES | DESCRIPTION

Exit codes:

  • 0 — query succeeded
  • 1 — endpoint unreachable or agent id not found

dlq

Dead-letter queue inspection. See DLQ operations for the full picture.

agent dlq list                 # plain-text table, up to 1000 entries
agent dlq replay <id>          # move back to pending_events for retry
agent dlq purge                # drop every entry (destructive)

Exit codes: 0 success; 1 failure (entry not found, DB error).

list columns: id | topic | failed_at | reason.


ext

Extension management. See Extensions — CLI for details and workflows.

agent ext list                         [--json]
agent ext info <id>                    [--json]
agent ext enable <id>
agent ext disable <id>
agent ext validate <path>
agent ext doctor                       [--runtime] [--json]
agent ext install <path>               [--update] [--enable] [--dry-run] [--link] [--json]
agent ext uninstall <id> --yes         [--json]

Flags:

Flag | Where | Purpose
--json | list / info / doctor / install / uninstall | Machine-readable output
--runtime | doctor | Also spawn stdio extensions to verify handshake
--update | install | Overwrite if already installed
--enable | install | Flip to enabled: true in extensions.yaml
--link | install | Symlink source (absolute path required) instead of copy
--dry-run | install | Validate without writing
--yes | uninstall | Required confirmation

Exit codes (extension-specific):

Code | Meaning
0 | Success
1 | Extension not found / --update target missing
2 | Invalid manifest / invalid source / --link needs absolute path
3 | Config write failed
4 | Invalid id (reserved or empty)
5 | Target exists (use --update)
6 | Id collision across roots
7 | uninstall missing --yes confirmation
8 | Copy / atomic swap failed
9 | Runtime check(s) failed (doctor --runtime)

flow

TaskFlow operations. See TaskFlow — FlowManager.

agent flow list                [--json]
agent flow show <id>           [--json]
agent flow cancel <id>
agent flow resume <id>

Env var: TASKFLOW_DB_PATH (default ./data/taskflow.db).

Exit codes: 0 success; 1 on error (flow not found, wrong state, DB inaccessible).

list sorts by updated_at DESC; show includes every recorded step; resume only works on Manual or ExternalEvent waits.


mcp-server

Run the agent as an MCP stdio server so MCP clients (Claude Desktop, Cursor, Zed) can consume its tools.

agent mcp-server
  • Reads JSON-RPC from stdin, writes responses to stdout
  • Does not boot a daemon or broker
  • Requires config/mcp_server.yaml with enabled: true

Exit codes: 0 on clean exit; 1 if mcp_server.yaml disabled.

See MCP — Agent as MCP server for deployment recipes (Claude Desktop config, allowlist, auth token).


admin

Run the web admin UI behind a fresh Cloudflare quick tunnel. A new ephemeral trycloudflare.com URL is minted on every launch — no account, no DNS, no TLS setup.

agent admin                  # listen on 127.0.0.1:9099 (default)
agent admin --port 9199      # pick a different loopback port
agent admin --port=9199      # same thing, equals form

What happens on launch:

  1. Install cloudflared if missing. The tunnel crate detects the host OS/arch and downloads the matching cloudflared binary into the platform data dir. Subsequent launches reuse the cached copy.
  2. Mint a fresh random password. 24 URL-safe characters from the OS RNG. Printed once to stdout — copy it now; there is no recovery short of relaunching agent admin.
  3. Start a loopback HTTP server. Listens on 127.0.0.1:<port> and serves the React bundle embedded at Rust compile time (see admin-ui/) behind HTTP Basic Auth. A bundle-missing fallback page is served if admin-ui/dist/ was empty when cargo build ran.
  4. Open a quick tunnel. cloudflared tunnel --url http://127.0.0.1:<port> returns an ephemeral https://…trycloudflare.com URL, which the command prints to stdout alongside the username (admin) and the freshly-minted password.
  5. Wait for Ctrl+C / SIGTERM. Graceful shutdown kills the cloudflared child and stops the HTTP listener.

Exit codes:

  • 0 — clean shutdown
  • 1 — cloudflared install failed, port already bound, or tunnel negotiation failed

Notes:

  • URL is re-generated every launch. If you need a stable URL, switch to a named Cloudflare tunnel (requires an account and wrangler config — out of scope for this command).
  • Auth is HTTP Basic for now; the browser prompts for admin / <password> on first load. Username is fixed; password is fresh every launch. Keep the shell scrollback if you need to re-paste it.
  • The password is never persisted — losing it means stopping agent admin and starting again (which also rotates the tunnel URL).

reload

Triggers a config hot-reload on a running daemon. Publishes control.reload on the broker the daemon is listening to (resolved from broker.yaml), subscribes-before-publish to control.reload.ack, waits up to 5 s, and prints the outcome.

agent reload                 # human-readable summary
agent reload --json          # serialized ReloadOutcome

Example output:

$ agent reload
reload v7: applied=2 rejected=0 elapsed=18ms
  ✓ ana
  ✓ bob

Exit codes:

  • 0 — at least one agent reloaded
  • 1 — no ack within 5 s (daemon not running)
  • 2 — every agent rejected

Full semantics — what's reloaded, apply-on-next-message, failure modes — in Config hot-reload.


--check-config

Pre-flight validation. Loads every YAML file, resolves env vars, checks schema, validates credentials. No broker, no daemon. Meant for CI.

agent --check-config                    # warnings-only mode
agent --check-config --strict           # warnings become errors

Exit codes:

  • 0 — all clear
  • 1 — hard errors (missing required creds, invalid schema)
  • 2 — warnings only (non-strict mode)
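
Exit code 2 is what makes non-strict mode useful in CI. A sketch of a pre-merge gate (messages are placeholders):

agent --check-config
status=$?
if [ "$status" -eq 2 ]; then
  echo "config warnings, review before merging"   # soft signal, job still passes
elif [ "$status" -ne 0 ]; then
  echo "config invalid" >&2
  exit "$status"
fi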

--dry-run

Load the config and print a plan. Doesn't connect to the broker or start any runtime task.

agent --dry-run
agent --dry-run --json

Output (plain text):

  • Config directory
  • Broker kind (nats | local)
  • Plugin list
  • Agent directory table (id, model, bindings, delegates, description)

Exit codes: 0 valid; 1 on error.

Daemon admin endpoints

Reference for status --endpoint and anyone wiring a custom dashboard:

Endpoint | Method | Bind | Purpose
/admin/agents | GET | 127.0.0.1:9091 | List every agent (JSON)
/admin/agents/<id> | GET | 127.0.0.1:9091 | Single agent (JSON)
/admin/tool-policy | GET | 127.0.0.1:9091 | Tool policy queries
/admin/credentials/reload | POST | 127.0.0.1:9091 | Phase 17 — re-read agents/plugins YAML and atomically swap the credential resolver. Returns ReloadOutcome JSON. See config/credentials.md.
/health | GET | 0.0.0.0:8080 | Liveness probe
/ready | GET | 0.0.0.0:8080 | Readiness probe
/metrics | GET | 0.0.0.0:9090 | Prometheus
/whatsapp/pair* | GET | 0.0.0.0:8080 | WhatsApp pairing QR (first instance)
/whatsapp/<instance>/pair* | GET | 0.0.0.0:8080 | Multi-instance WhatsApp pairing
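
Quick manual checks from the daemon host (the agent id is a placeholder; jq is only there to pretty-print):

curl -s http://127.0.0.1:9091/admin/agents | jq .
curl -s http://127.0.0.1:9091/admin/agents/ana | jq .                  # "ana" is a placeholder id
curl -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1:8080/ready   # 200 ready, 503 not_ready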

Gotchas

  • Hand-rolled parser. Unexpected flag ordering can produce "unknown argument" errors that are less forgiving than clap-based CLIs. Stick to the form shown in each subcommand.
  • Global --config must come before the subcommand. agent --config ./x ext list works; agent ext list --config ./x does not.
  • Admin console is loopback-only. status --endpoint against a remote host requires a tunnel; it won't listen publicly.

Docker

Production deployment as a compose stack: nats broker + nexo runtime, Docker secrets for credentials, persistent volumes for SQLite data and the disk queue.

Source: docker-compose.yml, Dockerfile, config/docker/.

Pre-built image at GHCR

Every push to main and every v* tag publishes a multi-arch image (linux/amd64 + linux/arm64) at:

ghcr.io/lordmacu/nexo-rs:latest          # latest tagged release
ghcr.io/lordmacu/nexo-rs:v0.1.1          # exact version
ghcr.io/lordmacu/nexo-rs:edge            # latest main commit
ghcr.io/lordmacu/nexo-rs:main-<sha>      # pinned to a specific commit

Pull and run:

docker pull ghcr.io/lordmacu/nexo-rs:latest
docker run --rm \
  -v $(pwd)/config:/app/config:ro \
  -v $(pwd)/data:/app/data \
  -p 8080:8080 -p 9090:9090 \
  ghcr.io/lordmacu/nexo-rs:latest

Build pipeline: .github/workflows/docker.yml. Tags + labels follow OCI image spec and are generated by docker/metadata-action. Image carries SBOM and SLSA provenance attestations (verify with docker buildx imagetools inspect).

Compose layout

flowchart LR
    subgraph STACK[docker-compose]
        NATS[nats:2.10<br/>:4222 client<br/>:8222 monitoring]
        AG[nexo<br/>:8080 health<br/>:9090 metrics]
    end
    AG --> NATS

    VOL1[(./config RO)] --> AG
    VOL2[(./data RW)] --> AG
    VOL3[(./extensions RO)] --> AG
    SEC[/run/secrets/...] --> AG

    IDE[MCP clients] -.->|port 8080| AG
    PROM[Prometheus] -.->|port 9090| AG

docker-compose.yml

Two services, healthchecks on both, shared volumes:

  • nats — nats:2.10-alpine, exposes :4222 for agent clients and :8222 for monitoring (healthcheck hits :8222/healthz)
  • nexo — the main runtime
    • Ports: :8080 (health), :9090 (metrics)
    • Environment: RUST_LOG=info, AGENT_ENV=production
    • shm_size: 1gb — required for Chrome processes (browser plugin)
    • Bind mounts: ./config:/app/config:ro, ./data:/app/data:rw, ./extensions:/app/extensions:ro
    • depends_on: { nats: { condition: service_healthy } }

Dockerfile

Multi-stage:

  1. Builder — Rust cargo build --release --locked
  2. Runtime — debian:bookworm-slim with operational tools baked in:
    • ca-certificates, libsqlite3-0
    • Python + ffmpeg + tmux + yt-dlp + tesseract (for skills that need them)
    • Google Chrome on amd64 (OAuth + Widevine work); falls back to Chromium on arm64
    • cloudflared (downloaded per TARGETARCH at build time)
    • dumb-init as PID 1

Entry point: /usr/local/bin/nexo --config /app/config.

Exposed ports: 8080, 9090.

Config overrides — config/docker/

Mirrors the main config layout. The compose service mounts the production overrides path:

command: ["nexo", "--config", "/app/config/docker"]

Key differences in the docker overrides:

  • broker.yaml — NATS URL points at the Docker service name (nats://nats:4222); persistence at /app/data/queue/broker.db
  • llm.yaml — reads API keys from /run/secrets/<name>
  • Other files (agents.yaml, memory.yaml, extensions.yaml) override defaults for container paths

Secrets

The compose file declares Docker secrets and the config overrides reference them:

services:
  nexo:
    secrets:
      - minimax_api_key
      - minimax_group_id
      - google_client_id
      - google_client_secret
secrets:
  minimax_api_key:
    file: ./secrets/minimax_api_key.txt
  minimax_group_id:
    file: ./secrets/minimax_group_id.txt
  ...

Config reads them via the ${file:/run/secrets/...} syntax. Secrets appear as mode-0400 files inside the container — nothing ever touches env vars.
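
As a hedged sketch, the matching llm.yaml override might look like this; only the ${file:...} reference syntax is the documented contract, the provider block layout is illustrative:

# config/docker/llm.yaml (sketch)
providers:
  minimax:
    api_key: ${file:/run/secrets/minimax_api_key}
    group_id: ${file:/run/secrets/minimax_group_id}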

See Configuration — layout.

Operating the stack

docker compose up -d           # start
docker compose logs -f nexo   # follow logs
docker compose exec nexo nexo ext list
docker compose exec nexo nexo dlq list
docker compose restart nexo   # rolling reload (SIGTERM → 5 s grace)
docker compose down            # stop (preserves volumes)

Scaling

  • Horizontal scaling needs an external NATS cluster. Running the compose with two agent replicas pointed at a single NATS server works for isolated workloads, but the compose itself does nothing to prevent duplicate delivery across agents on the same topic — and the single-instance lockfile (see Fault tolerance) assumes one agent process per data directory.
  • For real scale: one NATS cluster + N agent processes, each with its own ./data/ volume.

Health checks for orchestration

services:
  nexo:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://127.0.0.1:8080/ready"]
      interval: 10s
      timeout: 3s
      retries: 3
      start_period: 30s

Readiness gate is /ready (covered in metrics + health). start_period needs to cover first-boot extension discovery + all agent runtimes attaching to their topics.

Gotchas

  • Volume ownership. Don't mount ./data as root-owned if your container runs as non-root. The runtime will fail to write the SQLite files and you'll only see cryptic readonly database errors.
  • Chrome needs /dev/shm space. The shm_size: 1gb is not optional when the browser plugin is active — Chrome processes silently corrupt their state if starved.
  • config/docker/ is committed, secrets are not. ./secrets/ is gitignored. Populate it before the first compose up.

Metrics & health

Prometheus metrics on :9090/metrics, health/readiness on :8080, admin console on 127.0.0.1:9091. Everything an operator or orchestrator needs to decide "is the agent healthy?" without reading logs.

Source: crates/core/src/telemetry.rs, src/main.rs.

Ports at a glance

Port | Binding | Purpose
:9090 | 0.0.0.0 | Prometheus /metrics scrape
:8080 | 0.0.0.0 | Health /health, readiness /ready, WhatsApp pairing pages
:9091 | 127.0.0.1 | Admin console (loopback only)

Ports are not configurable yet — if you need to remap, port-forward outside the agent (Docker, k8s service).

/metrics (Prometheus)

Exposed metrics:

Name | Type | Labels | What
llm_requests_total | counter | agent, provider, model | Every LLM completion request
llm_latency_ms | histogram | agent, provider, model | Buckets 50, 100, 250, 500, 1000, 2500, 5000, 10000 ms
messages_processed_total | counter | agent | Inbound messages that reached an agent
nexo_extensions_discovered | counter | status={ok,disabled,invalid} | Emitted on every discovery sweep
nexo_tool_calls_total | counter | agent, outcome={ok,error,blocked,unknown}, tool | Tool invocations
nexo_tool_cache_events_total | counter | agent, event={hit,miss,put,evict}, tool | Tool-level memoization
nexo_tool_latency_ms | histogram | agent, tool | Per-tool latency
circuit_breaker_state | gauge | breaker | 0 = Closed, 1 = Open; always includes nats
credentials_accounts_total | gauge | channel | Per-channel labelled instance count (Phase 17)
credentials_bindings_total | gauge | agent, channel | 1 when the agent has a credential bound, 0 otherwise
channel_account_usage_total | counter | agent, channel, direction={inbound,outbound}, instance | Every credential use
channel_acl_denied_total | counter | agent, channel, instance | Outbound calls rejected by allow_agents
credentials_resolve_errors_total | counter | channel, reason | Resolver failures (unbound, not_found, not_permitted)
credentials_breaker_state | gauge | channel, instance | 0=closed, 1=half-open, 2=open. Per-(channel, instance) circuit breaker — a 429 from one number cannot trip the breaker for a sibling account.
credentials_boot_validation_errors_total | counter | kind | Gauntlet errors by kind at boot
credentials_insecure_paths_total | gauge | — | Credential files with lax permissions at boot
credentials_google_token_refresh_total | counter | account_fp, outcome={ok,err} | Google OAuth refresh attempts (fp = sha256[..8], not raw email)
pairing_inbound_challenged_total | counter | channel, result={delivered_via_adapter,delivered_via_broker,publish_failed,no_adapter_no_broker_topic} | DM-challenge dispatch attempts (Phase 26.x)
pairing_approvals_total | counter | channel, result={ok,expired,not_found} | nexo pair approve outcomes (Phase 26.y)
pairing_codes_expired_total | counter | — | Setup codes pruned past TTL or rejected as expired on approve
pairing_bootstrap_tokens_issued_total | counter | profile | Bootstrap tokens minted by BootstrapTokenIssuer::issue
pairing_requests_pending | gauge | channel | Pending pairing requests (push-tracked; PairingStore::refresh_pending_gauge exposed for drift recovery after a daemon restart)

Circuit-breaker state for the nats breaker is sampled at scrape time from broker readiness, so a stalled publish path shows up in the next scrape without needing an eager push.

The credentials_* and channel_* series are documented with full schema examples in config/credentials.md. account_fp is always an 8-byte sha256 fingerprint of the account id, never the raw JID or email, so scraped metrics stay safe to share.

Useful alerts

LLM provider flapping

- alert: LlmError5xxHigh
  expr: sum(rate(llm_requests_total{outcome="error"}[5m])) by (provider) > 0.1
  for: 5m

NATS circuit open

- alert: NatsBreakerOpen
  expr: circuit_breaker_state{breaker="nats"} == 1
  for: 1m

Tool call failures

- alert: ToolErrorSpike
  expr: |
    sum(rate(nexo_tool_calls_total{outcome="error"}[5m])) by (tool) > 0.5
  for: 10m

Health endpoints

flowchart LR
    GET1[GET /health] --> OK[200 OK<br/>always<br/>{status:ok}]
    GET2[GET /ready] --> CHK{broker ready<br/>AND agents > 0?}
    CHK -->|yes| RDY[200 OK<br/>{status:ready,<br/>agents_running:N}]
    CHK -->|no| NOT[503 Service Unavailable<br/>{status:not_ready,<br/>broker_ready,<br/>agents_running}]
  • GET /health — liveness probe. Returns 200 as long as the process is accepting connections. Don't use this as a traffic gate.
  • GET /ready — readiness probe. Returns 200 only when the broker is ready and at least one agent runtime is attached to inbound topics. Returns 503 during boot, shutdown, or broker outage.
  • GET /whatsapp/* — QR pairing pages and the /whatsapp/pair tunnel endpoint; see WhatsApp plugin.

Kubernetes probes

livenessProbe:
  httpGet: { path: /health, port: 8080 }
  initialDelaySeconds: 10
  periodSeconds: 10
readinessProbe:
  httpGet: { path: /ready, port: 8080 }
  initialDelaySeconds: 30
  periodSeconds: 5

initialDelaySeconds: 30 for readiness covers extension discovery and every agent runtime attaching its subscriptions.

Admin console (:9091)

Loopback-only. Exposes:

Path | Purpose
/admin/agents | Agent directory with live status, session counts
/admin/tool-policy | Query the tool-policy registry

The agent status [--endpoint URL] [--agent-id ID] [--json] CLI subcommand hits this endpoint and prints a table or JSON; good for scripting ops without grepping logs.

Remote access requires an explicit tunnel — the port is never exposed publicly by default.

Scrape config sample

# prometheus.yml
scrape_configs:
  - job_name: nexo-rs
    scrape_interval: 15s
    static_configs:
      - targets: ['nexo:9090']

For Docker compose: the service name is nexo (per the docker-compose.yml above). For k8s: use the service DNS.

Gotchas

  • circuit_breaker_state only labels per-breaker, not per-provider. Multiple LLM providers each have their own breaker instance, but they surface as distinct breaker label values. If you expected {provider="anthropic"} you'll need a label rename in your Prometheus relabel config.
  • Histograms are non-configurable. Buckets are compiled in. If your SLO requires fine-grained buckets below 50 ms, it is worth opening an issue.
  • /ready 503 during shutdown is expected. Don't alert on 5 s of 503 bursts — alert on rate(> 30 s).

Logging

tracing under the hood. Human-readable in dev, JSON in production, always to stderr (stdout is reserved for wire protocols like MCP JSON-RPC).

Source: src/main.rs::init_tracing.

Quick reference

Env var | Default | Meaning
RUST_LOG | info | EnvFilter syntax (nexo_core=debug,async_nats=warn,*=info)
AGENT_LOG_FORMAT | pretty (json in AGENT_ENV=production) | pretty / compact / json
AGENT_ENV | unset | Set to production to default to JSON logs

Levels

Pick the lowest verbosity that still surfaces the signal you care about:

Level | Use
error | Unrecoverable — operator action needed
warn | Degraded but running (circuit open, retry budget burning)
info | Lifecycle (startup, shutdown, reconnects)
debug | Per-turn detail (tool invoked, session created)
trace | Per-event firehose — only when chasing a bug

Log formats

pretty (dev default)

Coloured, multi-line. Good at the terminal, bad in log pipelines.

2026-04-24T17:22:13Z  INFO agent::runtime: agent runtime ready
    at src/main.rs:1243
    in agent_boot with agent="ana"

compact

One line per event. Middle ground.

2026-04-24T17:22:13Z INFO agent="ana" agent runtime ready

json

Structured. One JSON object per line. Default when AGENT_ENV=production.

{"ts_unix_ms":1714000000000,"level":"INFO","target":"agent::runtime","thread_id":"ThreadId(3)","file":"src/main.rs","line":1243,"spans":[{"name":"agent_boot","agent":"ana"}],"message":"agent runtime ready"}

Every entry carries:

  • ts_unix_ms — milliseconds since epoch (stable for ingestion)
  • level, target
  • thread_id, file, line — for pinpointing
  • spans — span hierarchy with attached fields
  • Any structured fields passed via tracing::info!(agent = %id, ...)

Correlating across agents

Cross-agent work lands on agent.route.<target_id> with a correlation_id. In logs, the correlation id shows up as a field on every event that happened inside a delegation span.

flowchart LR
    A[agent A<br/>info: tool_call agent.route.ops] --> MSG[NATS message<br/>correlation_id=req-123]
    MSG --> B[agent B<br/>info: handling agent.route with correlation_id=req-123]
    B --> REPLY[reply on agent.route.A<br/>correlation_id=req-123]
    REPLY --> A2[agent A<br/>info: delegation returned correlation_id=req-123]

Grep logs by correlation_id to see the whole fan-out+in as a single thread.
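
With JSON logs that grep is a single jq filter. A sketch, assuming the id rides on the event and/or as a span field per the layout above:

docker compose logs nexo \
  | jq -c 'select(.correlation_id == "req-123" or any(.spans[]?; .correlation_id == "req-123"))'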

Structured-field conventions

Convention for fields that show up across the codebase:

Field | Where
agent | Any log tied to a specific agent runtime
session | Any log inside a session context (usually UUID)
extension (or ext) | Any log from extension runtimes
tool | Any tool invocation log
provider, model | LLM client logs
correlation_id | Delegation-related logs
topic | Broker publish/subscribe logs

When adding new code, reuse these names — log pipelines can count on them.

Where stdout goes

stdout is reserved for:

  • MCP server mode (agent mcp-server) — JSON-RPC traffic
  • CLI subcommands that return data (agent ext list --json, agent flow show --json, agent dlq list)

Everything else, including normal log output, goes to stderr. Don't pipe agent … 2>&1 | jq unless you know the subcommand never writes non-JSON to stdout.

Practical setups

Local dev

export RUST_LOG=agent=debug,nexo_core=debug,info
cargo run --bin agent -- --config ./config

Production (Docker)

services:
  nexo:
    environment:
      AGENT_ENV: production
      RUST_LOG: info,async_nats=warn

Everything lands on stderr → container runtime picks it up → your log pipeline ingests JSON directly.

Chasing a specific agent

export RUST_LOG=agent=info
# then grep by field
docker compose logs nexo | jq 'select(.spans[]?.agent == "ana")'

Gotchas

  • tracing is compile-time filtered. If you grep logs for a debug-level event and see nothing, verify RUST_LOG covers the module.
  • JSON mode drops ANSI colors. Rightly so — but don't pipe it through a TTY colorizer and then be confused by escape sequences.
  • stderr ordering isn't guaranteed against stdout. Never assume a log line printed right after a println! happens in log order — pipes buffer independently.

Dead-letter queue operations

The DLQ is where events end up when they exhaust their retry budget or fail to deserialize at all. The runtime never silently drops an event — if it can't be delivered, it lands here for an operator to inspect or replay.

Source: crates/broker/src/disk_queue.rs, src/main.rs (agent dlq ... subcommands).

When items land there

flowchart LR
    PUB[publish event] --> NATS{NATS up?}
    NATS -->|yes| OK[delivered]
    NATS -->|no| DQ[pending_events]
    DQ --> DRAIN[disk queue drain]
    DRAIN -->|attempts < 3| DQ
    DRAIN -->|attempts >= 3| DLQ[dead_letters]
    DQ -.->|deserialization error| DLQ
  • 3 attempts (DEFAULT_MAX_ATTEMPTS) without success → row moves to dead_letters
  • Unparseable payload → moves immediately (a poison pill is not worth retrying)
  • Circuit-breaker-open on publish counts as an attempt — if the breaker stays open, the queue will eventually flush into DLQ

See Fault tolerance for the full retry flow.

The DeadLetter row

struct DeadLetter {
    id: String,          // UUID
    topic: String,       // NATS subject
    payload: String,     // JSON event body
    failed_at: i64,      // unix timestamp (ms)
    reason: String,      // error text
}

Storage: SQLite table dead_letters in the broker DB (typically ./data/queue/broker.db).

CLI

agent dlq list              # list up to 1000 entries
agent dlq replay <id>       # move one entry back to pending_events
agent dlq purge             # delete every entry

list output

Columns: id | topic | failed_at | reason. Plain text, one entry per line, suitable for grep / awk piping.

2f9c2e4a-...  plugin.inbound.whatsapp  2026-04-24T17:22:13Z  circuit breaker open
b1a3a9f5-...  plugin.outbound.telegram 2026-04-24T17:23:01Z  deserialization error: unexpected field `...`

replay

Moves the row back to pending_events with attempts = 0:

$ agent dlq replay 2f9c2e4a-...
replayed 2f9c2e4a-... → pending_events (next daemon drain will retry it)

The retry happens on the next drain() cycle of the running agent — replay itself does not attempt delivery. That way a running agent in a different shell picks it up; a stopped agent leaves the event safely in pending_events for its next startup.

purge

Destructive. Drops every row in dead_letters:

$ agent dlq purge
purged 42 dead-letter entries

Use with care — there is no per-topic filter. If you need a scoped purge, inspect with list, selectively replay what you want to keep, then purge the rest.

Exit codes

Code | Meaning
0 | Success
1 | Failure (event not found for replay, DB access error, etc.)

Common workflows

Post-outage triage

# See what piled up during the NATS outage
agent dlq list | wc -l

# Spot-check
agent dlq list | head
agent dlq list | awk '{print $2}' | sort | uniq -c

# If reasons look transient (circuit open, timeouts):
agent dlq list | awk '{print $1}' | while read id; do
  agent dlq replay "$id"
done

Poison-pill cleanup

If reason mentions deserialization errors, the payload is malformed — no amount of retry will help. Collect the offenders, fix the producer side, then:

agent dlq list | grep deserialization | awk '{print $1}' > /tmp/poison.txt
# ... verify they're truly poison ...
agent dlq purge

Preview without modifying

The CLI has no --dry-run flag today. Use agent dlq list to preview first; the DB rows are stable until you explicitly replay or purge.

Monitoring

There is no dedicated DLQ metric yet. Approximations:

  • Time spent with circuit_breaker_state{breaker="nats"} == 1 strongly predicts DLQ growth — alert on it.
  • Consider wrapping agent dlq list | wc -l in a cron job that pushes the count to Prometheus via the textfile collector if you want a direct gauge.
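
A sketch of the textfile-collector route; the metric name, script path, and collector directory are local choices, not anything nexo-rs ships:

#!/bin/sh
# cron: */5 * * * * (push the DLQ depth as a gauge)
dir=/var/lib/node_exporter/textfile
count=$(agent dlq list | wc -l)
printf 'nexo_dlq_entries %s\n' "$count" > "$dir/nexo_dlq.prom.$$"
mv "$dir/nexo_dlq.prom.$$" "$dir/nexo_dlq.prom"   # atomic swap so the scraper never sees a partial file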

Gotchas

  • replay doesn't wake a stopped agent. If no agent is running against the same data directory, the row just moves back to pending_events and waits for the next startup drain.
  • No replay deduplication. Replaying an event that was already successfully delivered later will deliver it again. If your consumer isn't idempotent, spot-check downstream state before replaying.
  • purge is global. Scope it with list | replay selectively if you need to preserve a subset.

Config hot-reload

Operators rotate per-agent knobs (allowlists, model strings, prompts, rate limits, delegation gates) without restarting the daemon. Sessions currently handling a message finish their turn on the old snapshot; the next event picks up the new one (apply-on-next-message). Plugin configs (whatsapp.yaml, telegram.yaml, …) are not hot-reloadable yet — see limitations.

What triggers a reload

Trigger | Source
File save under config/ | notify-based watcher, debounced 500 ms
agent reload CLI | Publishes control.reload on the broker
Direct broker publish | Any integration can emit control.reload

What's reloaded

Files watched by default (paths relative to the config dir):

  • agents.yaml
  • agents.d/ (recursive)
  • llm.yaml
  • runtime.yaml

Extra paths listed under runtime.reload.extra_watch_paths are appended to the list.

The fields that apply live without a restart:

Field | Location | Effect
allowed_tools (agent + binding) | agents.d/*.yaml | Tool list visible to the LLM + per-call guard
outbound_allowlist | same | Defense-in-depth in whatsapp_send_* / telegram_send_*
skills | same | Skill blocks rendered into the system prompt
model.model (binding-level) | same | LLM model string on next turn
system_prompt + system_prompt_extra | same | System block composition
sender_rate_limit | same | Per-binding token bucket
allowed_delegates | same | Delegation ACL
providers.<name>.api_key | llm.yaml | Rotated via a fresh LlmClient on next turn

Fields that require a restart (logged as warn during reload):

  • id, plugins, workspace, skills_dir, transcripts_dir
  • heartbeat.enabled, heartbeat.interval
  • config.debounce_ms, config.queue_cap
  • model.provider (binding-level provider must match agent provider — the LlmClient is wired once per agent)
  • broker.yaml, memory.yaml, mcp.yaml, extensions.yaml

Adding or removing an agent also requires a restart in this release; see limitations.

Configuration

config/runtime.yaml is optional. Defaults:

reload:
  enabled: true           # master switch
  debounce_ms: 500        # notify-debouncer-full window
  extra_watch_paths: []   # appended to the built-in list

Set enabled: false to turn off the file watcher + the control.reload subscriber. The CLI agent reload still works — the daemon never opens a privileged socket, it just listens on the shared broker.

The reload pipeline

file save / CLI / broker
        │
        ▼
  debouncer (500 ms)
        │
        ▼
  AppConfig::load (YAML + env resolution)
        │
        ▼
  validate_agents_with_providers  ──fail──▶  log warn, bump
        │                                    config_reload_rejected_total,
        ▼                                    keep old snapshot
  RuntimeSnapshot::build (per agent)
        │
        ▼
  ArcSwap::store  (atomic per agent)
        │
        ▼
  events.runtime.config.reloaded

Validation failure never swaps. The daemon always serves a snapshot that passed its boot gauntlet.

CLI

# Human-readable output
$ agent reload
reload v7: applied=2 rejected=0 elapsed=18ms
  ✓ ana
  ✓ bob

# Machine-readable
$ agent reload --json
{
  "version": 7,
  "applied": ["ana", "bob"],
  "rejected": [],
  "elapsed_ms": 18
}

Exit codes:

  • 0 — at least one agent reloaded.
  • 1 — no control.reload.ack within 5 s (daemon not running).
  • 2 — every agent rejected (partial-fail signal for CI).
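
A deploy-script sketch using the --json form; jq paths follow the ReloadOutcome shape under Broker contract below:

#!/bin/sh
if out=$(agent reload --json); then
  echo "reloaded: $(echo "$out" | jq -r '.applied | join(", ")')"
else
  code=$?
  if [ "$code" -eq 2 ]; then
    echo "$out" | jq -r '.rejected[] | "\(.agent_id): \(.reason)"' >&2
  fi
  exit "$code"
fi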

Broker contract

Topic | Direction | Payload
control.reload | → daemon | {requested_by: string}
control.reload.ack | ← daemon | serialized ReloadOutcome

ReloadOutcome JSON shape:

{
  "version": 7,
  "applied": ["ana", "bob"],
  "rejected": [
    {"agent_id": "ana", "reason": "snapshot build: ..."}
  ],
  "elapsed_ms": 18
}

Telemetry

Metric | Type | Labels
config_reload_applied_total | counter | —
config_reload_rejected_total | counter | —
config_reload_latency_ms | histogram | —
runtime_config_version | gauge | agent_id

Scrape via the metrics endpoint (ops/metrics).

Apply-on-next-message semantics

A reload does not interrupt sessions that are currently handling a message. Specifically:

  • The LLM turn in flight keeps its captured Arc<RuntimeSnapshot> for the life of the turn — tool calls inside that turn all see the same policy, even if several reloads land during the turn.
  • The next event delivered to the agent reads the latest snapshot via snapshot.load() on the intake hot path.

If you need a "force-apply now" semantic (terminate in-flight sessions, respawn), use agent reload --kick-sessions — not implemented yet, tracked in Phase 19.

Security model

  • control.reload topic has no application-level auth. Anyone with broker publish rights can trigger a reload. In production with NATS, restrict the control.> subject pattern via NATS account permissions; see NATS with TLS + auth. The local-broker fallback is in-process only — no remote attack surface.
  • File-watcher trust = filesystem write. Whoever can edit config/agents.d/*.yaml can change capability surface. Treat the config dir as a privileged resource: 0600 on YAML files, 0700 on the directory.
  • events.runtime.config.reloaded payload includes agent ids and rejection reasons. Subscribers see them. Single-process deployments are fine; in multi-tenant setups, gate the events.runtime.> pattern in NATS auth.
  • Outbound allowlist scope. The Phase 16 outbound allowlist governs WhatsApp + Telegram tools only. Google tools are gated by the OAuth scopes granted at credential creation (see Per-agent credentials) — there is no per-recipient list for Google.
  • Apply-on-next-message and tightening reloads. A reload that narrows an allowlist for security reasons does not affect in-flight sessions until they next receive an event. If you need the change to take effect immediately, restart the daemon (or wait for the upcoming agent reload --kick-sessions flag in Phase 19).
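
A hedged nats-server config sketch of that subject restriction (user names and secrets are placeholders; the deployment-ready version lives on the NATS with TLS + auth page):

# nats-server.conf (sketch)
authorization {
  users = [
    { user: daemon, password: "<secret>",
      permissions: { publish: ">", subscribe: ">" } },
    { user: ops, password: "<secret>",
      permissions: { publish: ["control.>"], subscribe: ["control.reload.ack"] } }
  ]
}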

Failure modes

  • Bad YAML: AppConfig::load fails. Old snapshot keeps serving. config_reload_rejected_total bumps. The warn log names the file + line.
  • Validation errors: aggregate — every problem across every agent shows in one warn block. Fix them in one edit instead of restart-and-repeat.
  • Unknown provider: rejected at boot + at reload by KnownProviders check. Boot validation lists what's registered.
  • Missing tool in binding's allowed_tools: caught by the post-registry validation pass during reload.
  • Agent added / removed: Phase 18 rejects these with a clear message; restart the daemon to reshape the fleet.

Limitations

Intentional scope gaps for Phase 18, tracked for Phase 19:

  • Add / remove agent at runtime. The coordinator rejects new ids and left-over registered handles with an actionable message. Restart needed.
  • Plugin config hot-reload (whatsapp.yaml, telegram.yaml, browser.yaml, email.yaml). Plugin daemons own I/O (QR pairing, long-polling). Reshaping them live requires a dedicated lifecycle refactor.
  • config_reloaded hook for extensions to react. Pending.
  • SIGHUP trigger as an extra UX path. Deferred — use the broker topic or the CLI.

See also

Capability toggles

Several bundled extensions ship with dangerous capabilities off by default — write paths, secret reveal, cache purges. Each capability is gated by a single environment variable. The operator flips it on by exporting the var in the agent process's environment.

agent doctor capabilities enumerates every known toggle, its current state, and a hint for enabling it.

$ agent doctor capabilities
Capability toggles
──────────────────────────────────────────────────────────────────
EXT          ENV VAR                       STATE     RISK     EFFECT
onepassword  OP_ALLOW_REVEAL               disabled  HIGH     Reveal raw secret values…
onepassword  OP_INJECT_COMMAND_ALLOWLIST   disabled  HIGH     Allow `inject_template` to pipe…
cloudflare   CLOUDFLARE_ALLOW_WRITES       disabled  HIGH     Create / update / delete DNS…
cloudflare   CLOUDFLARE_ALLOW_PURGE        disabled  CRITICAL Purge zone cache…
docker-api   DOCKER_API_ALLOW_WRITE        disabled  HIGH     Start / stop / restart…
proxmox      PROXMOX_ALLOW_WRITE           disabled  CRITICAL VM / container lifecycle…
ssh-exec     SSH_EXEC_ALLOWED_HOSTS        disabled  HIGH     Allow `ssh_run` against…
ssh-exec     SSH_EXEC_ALLOW_WRITES         disabled  CRITICAL Allow `scp_upload`…

Pass --json for machine-readable output (admin UI, dashboards):

agent doctor capabilities --json

Toggle reference

Env var | Extension | Kind | Risk | Effect
OP_ALLOW_REVEAL | onepassword | bool | high | Returns secret values verbatim instead of fingerprints
OP_INJECT_COMMAND_ALLOWLIST | onepassword | allowlist | high | Enables inject_template exec mode for the listed commands
CLOUDFLARE_ALLOW_WRITES | cloudflare | bool | high | Authorizes create_dns_record, update_dns_record, delete_dns_record
CLOUDFLARE_ALLOW_PURGE | cloudflare | bool | critical | Authorizes purge_cache
DOCKER_API_ALLOW_WRITE | docker-api | bool | high | Authorizes start_container, stop_container, restart_container
PROXMOX_ALLOW_WRITE | proxmox | bool | critical | Authorizes VM/container lifecycle actions
SSH_EXEC_ALLOWED_HOSTS | ssh-exec | allowlist | high | Hosts the agent may target with ssh_run
SSH_EXEC_ALLOW_WRITES | ssh-exec | bool | critical | Authorizes scp_upload

Boolean kinds accept true, 1, or yes (case-insensitive). Anything else — including unset — counts as disabled.

Allowlist kinds are comma-separated. Empty / whitespace-only inputs count as disabled. The agent never falls back to "anything goes" when the variable is unset.
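
The parsing contract restated as a sketch; illustrative, not the actual code in crates/setup/src/capabilities.rs:

// Sketch of the toggle semantics described above.
fn bool_toggle(var: &str) -> bool {
    std::env::var(var)
        .map(|v| matches!(v.trim().to_ascii_lowercase().as_str(), "true" | "1" | "yes"))
        .unwrap_or(false) // unset or unreadable counts as disabled
}

fn allowlist_toggle(var: &str) -> Vec<String> {
    std::env::var(var)
        .map(|v| {
            v.split(',')
                .map(str::trim)
                .filter(|s| !s.is_empty()) // whitespace-only entries count as disabled
                .map(String::from)
                .collect()
        })
        .unwrap_or_default() // empty list means disabled, never "anything goes"
}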

When to enable

The default is off because every toggle moves the agent from "informational" to "consequential" — failures are no longer just a bad reply, they can mutate real systems or leak secrets.

Enable a toggle only when:

  1. The agent will provably need that capability for the next session.
  2. The operator (you) is present and the session is observed.
  3. There is a way to revert quickly — a wrapper script, a per-shell .envrc, or a systemd unit drop-in you can comment out.

Avoid enabling toggles globally in ~/.profile. Scope them to the specific shell or systemd unit that runs the agent.
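
A systemd drop-in keeps the grant scoped, visible, and easy to comment out (unit and file names are placeholders):

# /etc/systemd/system/nexo-agent.service.d/capabilities.conf
[Service]
Environment=CLOUDFLARE_ALLOW_WRITES=true
# revoke: comment the line out, then
#   systemctl daemon-reload && systemctl restart nexo-agent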

How to revoke

  • Boolean: unset CLOUDFLARE_ALLOW_WRITES (or restart the shell / service).
  • Allowlist: unset OP_INJECT_COMMAND_ALLOWLIST to disable, or export OP_INJECT_COMMAND_ALLOWLIST= (empty string) to keep the intent visible while still treating the feature as disabled.

The agent reads these on each call (no caching), so revocation is immediate without a restart. Even OP_INJECT_COMMAND_ALLOWLIST, which you might expect to be captured at extension spawn, is read at tool-call time, so it too picks up changes live.

Adding a new toggle

When a future extension introduces a new write/reveal env var, add a matching CapabilityToggle to crates/setup/src/capabilities.rs::INVENTORY. Without that entry, agent doctor capabilities is silently incomplete — the inventory is the operator-facing source of truth.

Context optimization

Four independent mechanisms reduce the number of tokens sent to the LLM on every request, without changing the agent's behavior. They live under llm.context_optimization in llm.yaml and can be flipped per agent under agents.<id>.context_optimization.

# config/llm.yaml
context_optimization:
  prompt_cache:
    enabled: true                   # default
    long_ttl_providers: [anthropic, vertex]
  compaction:
    enabled: false                  # default off — opt in per agent
    compact_at_pct: 0.75
    tail_keep_tokens: 20000
    tool_result_max_pct: 0.30
    summarizer_model: ""            # empty = reuse the agent's main model
    lock_ttl_seconds: 300
  token_counter:
    enabled: true                   # default
    backend: auto                   # auto | anthropic_api | tiktoken
    cache_capacity: 1024
  workspace_cache:
    enabled: true                   # default
    watch_debounce_ms: 500
    max_age_seconds: 0              # 0 = never force refresh (notify is authoritative)

1. Prompt caching

Materializes the system prompt as a list of cache_control blocks on the Anthropic wire so the stable prefix (workspace + skills + tool catalog + binding glue) is billed at 0.1× input cost on every cache hit. OpenAI / DeepSeek paths surface their automatic prompt_tokens_details.cached_tokens field through the same CacheUsage struct. Gemini and MiniMax flatten the blocks into the legacy system slot today (warned once per process).

Block layout (4 cache breakpoints, the Anthropic max):

  1. workspace — IDENTITY / SOUL / USER / AGENTS / MEMORY (Ephemeral1h)
  2. skills — per-binding skill catalog (Ephemeral1h)
  3. binding_glue — peer directory + per-binding system prompt + language directive (Ephemeral1h)
  4. channel_meta — sender id + per-turn context (Ephemeral5m)

Tools array is sorted alphabetically by name (the registry iterates a non-deterministic DashMap) and the last tool gets a 1h cache_control marker when cache_tools=true.

What to watch

  • llm_cache_read_tokens_total{agent, provider, model} — should dominate llm_cache_creation_tokens_total after the first turn of a warm session.
  • llm_cache_hit_ratio{agent} — target >0.7 on multi-turn agents; <0.3 means you're paying the write premium without the discount.

When to flip off

  • Provider rejects the request with a 400 mentioning cache_control (very old model). Mitigation: the framework already strips markers for claude-2.x; if Anthropic adds another exception, override ANTHROPIC_CACHE_BETA="..." to disable the beta header.
  • A custom-built LLM gateway in front of Anthropic doesn't pass the cache_control field through.

2. Compaction (online history folding)

When the pre-flight token estimate crosses compact_at_pct * effective_window, the agent runs a secondary LLM call to fold history[..tail_start] into a single summary string. The summary replaces the head; the last tail_keep_tokens worth of turns ride forward verbatim. Subsequent turns prepend the summary as a synthetic user/assistant pair so Anthropic's role-alternation rule stays valid.
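
The trigger and split point, sketched; token accounting and the summarizer call are elided, and the names are illustrative, not the real types:

struct Turn { tokens: usize }

// Returns Some(i): fold history[..i] into a summary; history[i..] rides verbatim.
fn compact_plan(history: &[Turn], effective_window: usize) -> Option<usize> {
    let compact_at_pct = 0.75;       // compact_at_pct
    let tail_keep_tokens = 20_000;   // tail_keep_tokens
    let total: usize = history.iter().map(|t| t.tokens).sum();
    if (total as f64) < compact_at_pct * effective_window as f64 {
        return None;                 // under the threshold, nothing to fold
    }
    let mut tail = 0;
    for (i, t) in history.iter().enumerate().rev() {
        tail += t.tokens;
        if tail >= tail_keep_tokens {
            return Some(i);          // everything before i becomes the summary
        }
    }
    None                             // whole history fits in the tail budget
}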

Defaults are intentionally conservative: off by default. Roll out per agent via agents.<id>.context_optimization.compaction: true.

agents:
  - id: ana
    context_optimization:
      compaction: true   # ana opts in early, others stay off

What to watch

  • llm_compaction_triggered_total{agent, outcome} — outcomes are ok, failed, lock_held, no_boundary, tool_result_truncated.
  • llm_compaction_duration_seconds{agent, outcome="ok"|"failed"} — a rising p99 means the summarizer model is overloaded; lower compact_at_pct so triggers are smaller (cheaper) and more frequent.

When to flip off

  • Quality regression in long sessions — the summary may be losing active-task state. Inspect compactions_v1 rows in the SQLite store to see what was folded; bump tail_keep_tokens so more verbatim context survives.
  • Lock contention spikes — multiple processes (NATS multi-node) racing on the same session. The lock is per-session so this only happens with sticky-session misrouting; fix at the broker level rather than disabling compaction.

Safety nets

  • compaction_locks_v1 carries TTL (lock_ttl_seconds) — a crashed compactor doesn't deadlock the session; the next acquire after the TTL wins automatically.
  • Audit log: every successful compaction inserts a row in compactions_v1 with the summary text + token cost. Inspect with sqlite3 memory.db "SELECT * FROM compactions_v1 WHERE session_id = ? ORDER BY compacted_at DESC".
  • Failure path: 3 retries with backoff; on total failure the original history goes to the LLM unchanged (graceful degradation, never silent data loss).

3. Token counting (pre-flight sizing)

TokenCounter trait with two backends:

  • AnthropicTokenCounter — calls POST /v1/messages/count_tokens. Exact (matches billing). LRU-cached on blake3(payload): the stable tools+identity prefix hashes the same on every turn, so the network round-trip happens ~once per process lifetime.
  • TiktokenCounter — offline cl100k_base approximation. Drift vs Anthropic billing measured at 5–15%. Fine for budget gating, not for hard limits.

The cascade wraps the primary in a CircuitBreaker (failure_threshold=3, 30s→300s backoff): on count_tokens outage the agent loop falls back to tiktoken so the request still goes through. Once the breaker has opened at least once, is_exact() flips to false for the rest of the process so dashboards don't conflate sample populations.
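
The cascade shape, sketched; trait and struct names are illustrative, not the core crate's actual API:

use std::sync::atomic::{AtomicBool, Ordering};

trait TokenCounter {
    fn count(&self, payload: &str) -> Result<usize, String>;
}

struct Cascade<P: TokenCounter, F: TokenCounter> {
    primary: P,            // exact backend, wrapped in a circuit breaker
    fallback: F,           // offline tiktoken approximation
    degraded: AtomicBool,  // sticky: flips once the breaker has opened
}

impl<P: TokenCounter, F: TokenCounter> Cascade<P, F> {
    fn count(&self, payload: &str) -> usize {
        match self.primary.count(payload) {
            Ok(n) => n,
            Err(_) => {
                self.degraded.store(true, Ordering::Relaxed);
                self.fallback.count(payload).unwrap_or(0)
            }
        }
    }
    fn is_exact(&self) -> bool {
        !self.degraded.load(Ordering::Relaxed) // false for the rest of the process
    }
}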

What to watch

  • llm_prompt_tokens_estimated{agent, provider, model} — compare against llm_prompt_tokens_drift{...} (histogram in percent).
  • A drift p99 climbing past 20% means the active backend is wrong for your model — switch from tiktoken to anthropic_api (or vice versa for non-Anthropic providers).

When to flip off

  • The agent runs against a self-hosted gateway that doesn't honor count_tokens. Set backend: tiktoken to skip the round-trip.

4. Workspace bundle cache

Reads of IDENTITY / SOUL / USER / AGENTS / MEMORY MDs go through an in-memory Arc<WorkspaceBundle> cache keyed by (root, scope, sorted extras). A notify-debouncer-full watcher (default 500ms) drops every entry under a workspace root when any *.md changes. Non-MD file changes are ignored.

What to watch

  • workspace_cache_hits_total{path} should dominate workspace_cache_misses_total{path} once the cache is warm.
  • workspace_cache_invalidations_total{path} rising without operator edits points to a tool that writes to the workspace too aggressively.

When to flip off

  • NFS / FUSE filesystems where notify(7) drops events. Set workspace_cache.max_age_seconds: 60 (or similar) to force a refresh after the absolute TTL even without a watch event.

Per-agent overrides

The four enables — and only the enables — can be flipped per agent in agents.yaml. The numeric knobs (compact_at_pct, tail_keep_tokens, watch_debounce_ms, …) stay global to keep the surface narrow.

agents:
  - id: ana
    context_optimization:
      prompt_cache: true
      compaction: true
      token_counter: true
      workspace_cache: true
  - id: bob
    context_optimization:
      prompt_cache: false  # bob runs against a gateway that strips cache_control

Hot-reload behavior

Changing global knobs (llm.yaml) takes effect on the next request once the reload coordinator picks up the file change (Phase 18). For per-agent enables, the override rides on Arc<AgentConfig> inside RuntimeSnapshot and is observed on the next policy_for(...) lookup. The LlmAgentBehavior struct itself still caches its compactor / prompt_cache_enabled fields at construction — toggling those without a process restart requires the future ArcSwap<CompactionRuntime> refactor noted in proyecto/FOLLOWUPS.md.

Rollout playbook

  1. Deploy with everything at defaults — prompt_cache=true, compaction=false, token_counter=true, workspace_cache=true.
  2. Watch llm_cache_hit_ratio for 24h. Expect it to climb to >0.7 on chatty agents; if it stays low, check that the workspace bundle is stable across turns (no MD writes mid-session).
  3. Pick one agent, opt it into compaction (agents.<id>.context_optimization.compaction: true), reload config, watch for a week.
  4. If llm_compaction_triggered_total{outcome="ok"} > 0 and quality feedback is positive, roll compaction out to the rest of the fleet.
  5. If drift on llm_prompt_tokens_drift is consistently <10%, leave token_counter.backend: auto. If higher, consider backend: tiktoken for non-Anthropic providers — saves the round-trip without losing accuracy you didn't have anyway.

Link understanding

When a user message contains URLs, the runtime can fetch them, extract the main text, and inject a # LINK CONTEXT block into the system prompt for that turn. The agent stops saying "I can't see what's at that link" and starts answering against the actual page content.

The feature is off by default. Opt in per agent (and optionally override per binding).

Per-agent config

# config/agents.yaml
agents:
  - id: ana
    link_understanding:
      enabled: true              # default: false
      max_links_per_turn: 3      # cap URLs fetched per message
      max_bytes: 262144          # 256 KiB per response, streamed
      timeout_ms: 8000           # per-fetch HTTP timeout
      cache_ttl_secs: 600        # 0 disables cache
      deny_hosts:                # appended to built-in denylist
        - internal.corp

Built-in denylist (always applied, cannot be removed): localhost, 127.0.0.1, ::1, metadata.google.internal, 169.254.169.254. Defense against SSRF to internal endpoints.
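
The host gate, sketched. Note a production SSRF guard must also check resolved IPs, since DNS can re-point a benign-looking name; the built-in list below is the documented one, the function names are illustrative:

const BUILTIN_DENY: &[&str] = &[
    "localhost", "127.0.0.1", "::1",
    "metadata.google.internal", "169.254.169.254",
];

fn host_allowed(host: &str, deny_hosts: &[String]) -> bool {
    let h = host.to_ascii_lowercase();
    !BUILTIN_DENY.contains(&h.as_str())
        && !deny_hosts.iter().any(|d| d.eq_ignore_ascii_case(&h))
}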

Per-binding override

Per-binding link_understanding overrides the agent default. Useful to disable on a noisy channel:

agents:
  - id: ana
    link_understanding: { enabled: true }
    bindings:
      - inbound: plugin.inbound.whatsapp.*
        link_understanding: { enabled: false }   # narrow on WA
      - inbound: plugin.inbound.telegram.*
        # inherits agent default (enabled: true)

null / omitted = inherit. Any object = full replace.

What gets injected

For each fetched URL, one bullet:

# LINK CONTEXT

- https://example.com/post — Title of the page
  First paragraphs of main text, collapsed to ~max_bytes characters,
  HTML stripped, scripts and styles dropped.

The block lands inside the system prompt for that turn only. Cache hits skip the fetch but still render the block.

Hard caps (cannot be raised by config)

| Cap | Value |
| --- | --- |
| URL length | 2048 chars |
| Redirect chain | 5 hops |
| User-Agent | nexo-link-understanding/0.1 |
| Response stream cutoff | max_bytes (drops the rest) |
| Newlines / control chars in extracted text | sanitised (prompt-injection guard) |

Operations

  • A single shared LinkExtractor (HTTP client + LRU cache, capacity 256) is built at boot and reused by every agent runtime in the process.
  • Cache is in-process only. Restarts cold.
  • Telemetry exported on /metrics:
    • nexo_link_understanding_fetch_total{result="ok|blocked|timeout|non_html|too_big|error"} — counter, one increment per fetch attempt.
    • nexo_link_understanding_cache_total{hit="true|false"} — counter, incremented on every TTL-cached lookup so dashboards can compute hit-rate without instrumenting the agent loop.
    • nexo_link_understanding_fetch_duration_ms — histogram (single series, no labels). Only observed for attempts that actually issued an HTTP request — cache hits and host-blocked URLs skip it so latency percentiles reflect real network work.

When to leave it off

  • Agents talking to untrusted senders where the agent must not be pivoted into fetching attacker-controlled URLs.
  • Channels with strict latency budgets — a fetch can add up to timeout_ms to the turn.
  • Privacy-sensitive deployments where outbound HTTP from the agent host is not allowed.

Web search

The web_search built-in tool lets an agent query the web through one of four providers: Brave, Tavily, DuckDuckGo, Perplexity. The runtime owns provider selection, caching, sanitisation, and circuit breaking — agents only see results.

The feature is off by default. Operators opt in per agent (and optionally override per binding).

Per-agent config

# config/agents.yaml
agents:
  - id: ana
    web_search:
      enabled: true               # default false
      provider: auto              # "auto" | "brave" | "tavily" | "duckduckgo" | "perplexity"
      default_count: 5            # 1..=10
      cache_ttl_secs: 600         # 0 disables cache
      expand_default: false       # default value of `expand` arg

provider: auto

Picks the first credentialed provider in this order:

  1. brave (env BRAVE_SEARCH_API_KEY)
  2. tavily (env TAVILY_API_KEY)
  3. perplexity (env PERPLEXITY_API_KEY, requires the perplexity feature)
  4. duckduckgo (no key — bundled by default; the always-available fallback)

DuckDuckGo scrapes html.duckduckgo.com and is rate-limited / captcha-prone; the runtime detects bot challenges and trips the breaker so the next call rotates to a different provider.

Per-binding override

Same shape as link_understanding: null (default) inherits the agent value, any object replaces it.

agents:
  - id: ana
    web_search: { enabled: true }
    bindings:
      - inbound: plugin.inbound.whatsapp.*
        web_search: { enabled: false }   # silent on WA
      - inbound: plugin.inbound.telegram.*
        # inherits agent default

Tool surface

The LLM sees this signature:

{
  "name": "web_search",
  "parameters": {
    "query":     "string  (required)",
    "count":     "integer (1-10, optional)",
    "provider":  "string  (optional override)",
    "freshness": "day | week | month | year (optional)",
    "country":   "ISO-3166 alpha-2 (optional)",
    "language":  "ISO-639-1 (optional)",
    "expand":    "boolean (optional)"
  }
}

Return shape:

{
  "provider": "brave",
  "query":    "rust async runtimes",
  "from_cache": false,
  "results": [
    {
      "url": "https://example.com/post",
      "title": "Title",
      "snippet": "First 4 KiB of the description, sanitised.",
      "site_name": "example.com",
      "published_at": "2026-04-20T00:00:00Z"
    }
  ]
}

When expand: true and Phase 21 link understanding is enabled, the top three hits also get a body field populated by the shared LinkExtractor. Bodies obey the same denylist + size caps that Link understanding describes.

Cache

In-process SQLite cache shared across every agent. Key format:

sha256(SCHEMA_VERSION || provider || query || canonical_params)

canonical_params excludes provider (router decides) and expand (post-processing). cache_ttl_secs: 0 disables caching entirely.
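
A shell illustration of that recipe. The separator and parameter order here are assumptions; the runtime's exact byte layout may differ:

# Illustrative only — mirrors the documented key recipe, not the runtime's bytes
SCHEMA_VERSION=1
provider=brave
query="rust async runtimes"
canonical_params="count=5;freshness=;country=;language="   # provider + expand excluded

printf '%s|%s|%s|%s' "$SCHEMA_VERSION" "$provider" "$query" "$canonical_params" \
  | sha256sum | cut -d' ' -f1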

Operators that want a separate cache file or schema migration set web_search.cache.path in web_search.yaml (planned — see FOLLOWUPS).

Circuit breaker

Every provider call goes through nexo_resilience::CircuitBreaker keyed web_search:<provider>. Default config: 5 consecutive failures trip the breaker, exponential backoff up to 120 s. Open-state calls return ProviderUnavailable(provider) immediately and the router rotates to the next candidate (when called via auto-detect).

Sanitisation

Every title, url, and snippet returned by a provider passes through sanitise_for_prompt:

  • control chars stripped,
  • CR / LF / tab collapsed to single spaces,
  • runs of whitespace collapsed,
  • byte-capped at 4 KiB (snippet) / 512 B (title) / 2 KiB (URL),
  • truncation respects UTF-8 char boundaries.

This is the same defence-in-depth Phase 19 (language directive) and Phase 21 (# LINK CONTEXT) apply: SERPs are attacker-controlled input.
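
A rough shell equivalent of those transforms, for intuition only (the real path is Rust and truncates on UTF-8 char boundaries, which cut -c does not guarantee):

printf 'Title\r\nwith\ttabs\007and   runs' |
  tr -d '\000-\010\013\014\016-\037' |   # strip control chars except \t \n \r
  tr '\r\n\t' '   ' |                    # CR / LF / tab -> single spaces
  tr -s ' ' |                            # collapse whitespace runs
  cut -c1-4096                           # cap at 4 KiB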

Telemetry

Exported on /metrics:

  • nexo_web_search_calls_total{provider,result} — counter, one increment per provider attempt. result is ok (provider returned hits), error (network / HTTP / parse failure), or unavailable (the breaker short-circuited the call before it left the process).
  • nexo_web_search_cache_total{provider,hit} — counter, every TTL-cached lookup. provider is the first candidate (the one the cache key is built from). Compute hit rate as cache_total{hit="true"} / sum(cache_total) (spelled out as PromQL after this list).
  • nexo_web_search_breaker_open_total{provider} — counter; one increment per request the breaker rejected. Pair with circuit_breaker_state{breaker="web_search:<provider>"} to alert on sustained open state vs a flap.
  • nexo_web_search_latency_ms{provider} — histogram. Only observed for attempts that issued an HTTP request, so the percentile reflects real provider latency (cache hits and breaker short-circuits would pull p50 down to 0 and hide regressions).
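
The hit-rate formula as a PromQL query, assuming a Prometheus server that scrapes /metrics (promtool usage mirrors the cost-controls examples later in this doc):

promtool query instant http://127.0.0.1:9090 \
  'sum by (provider) (nexo_web_search_cache_total{hit="true"})
     / sum by (provider) (nexo_web_search_cache_total)'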

When to leave it off

  • Privacy-sensitive deployments where outbound HTTP from the agent host is not allowed.
  • Channels where the cost of a noisy SERP in the prompt outweighs the agent's value (use per-binding enabled: false).
  • Agents that already have link_understanding for the URLs the user shares — no need for SERP duplication.

Web fetch

The web_fetch built-in tool lets an agent retrieve the cleaned body text + title for one or more URLs the agent already knows. Companion to Web search: web_search finds URLs, web_fetch retrieves them.

Distinct from web_search.expand=true because the agent often knows the URL up-front (skill output, RSS poll, calendar attachment, user message) and would otherwise have to either hallucinate a search query or shell out to a fetch-url extension.

When to use which

| Scenario | Tool |
| --- | --- |
| Agent needs to find content matching a query | web_search |
| Agent has a URL from a web_search hit and wants the body | web_search(expand=true) |
| Agent has a URL from a poller / skill / user message | web_fetch |
| Agent has a list of URLs to triage | web_fetch(urls=[...]) |

Tool signature

{
  "name": "web_fetch",
  "parameters": {
    "urls":      ["https://example.com/article", "https://other.com/page"],
    "max_bytes":  65536          // optional; clamped to deployment cap
  }
}

Response shape:

{
  "results": [
    {
      "url":   "https://example.com/article",
      "title": "Example article",
      "body":  "First paragraph...",
      "ok":    true
    },
    {
      "url":    "https://internal.intranet.local/private",
      "ok":     false,
      "reason": "fetch failed (host blocked, timeout, non-HTML, oversized, or transport error). Check `nexo_link_understanding_fetch_total{result}` for the bucket."
    }
  ],
  "count": 2
}

A bad URL returns a {ok: false, reason} row instead of bailing the whole call, so the agent can still consume the successful ones. Per-call cap of 5 URLs; longer lists get trimmed with a warn log.

Configuration

web_fetch has no dedicated config. It rides on Link understanding:

  • link_understanding.enabled — gates the tool entirely. With it false, every fetch returns {ok: false, reason: "disabled by policy"}.
  • link_understanding.max_bytes — deployment-wide ceiling. The tool's max_bytes arg can shrink but never grow past this.
  • link_understanding.deny_hosts — host blocklist (loopback, private subnets, internal cloud metadata endpoints, plus whatever the operator added).
  • link_understanding.timeout_ms — per-fetch HTTP timeout.
  • link_understanding.cache_ttl_secs — cache TTL. Successful fetches are cached so a second web_fetch of the same URL inside the TTL is free.

Per-binding overrides via EffectiveBindingPolicy::link_understanding (see Per-binding capability override).

Telemetry

web_fetch reuses every counter the auto-link pipeline emits. There's no separate dashboard:

  • nexo_link_understanding_fetch_total{result}ok / blocked / timeout / non_html / too_big / error.
  • nexo_link_understanding_cache_total{hit}true / false.
  • nexo_link_understanding_fetch_duration_ms — histogram, only populated when an HTTP request actually went out (cache hits and host-blocked URLs skip it so percentiles reflect real fetch work).

The bundled Grafana dashboard (ops/grafana/nexo-llm.json) already plots all three.

Why a per-call cap of 5 URLs

A runaway agent given the prompt "fetch every link in this 10k RSS dump" would otherwise queue thousands of HTTP requests synchronously, blowing the prompt budget and hammering the target hosts. 5 covers every realistic agentic workflow (read 3 candidates, pick the best two, summarise) while leaving a clear ceiling. Operators who want batch behaviour should spawn a TaskFlow that calls web_fetch in chunks with cursor persistence.
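
A hypothetical shape for that chunking, outside TaskFlow, just to show the batching arithmetic (urls.txt is an assumed one-URL-per-line file; the JSON mirrors the tool signature above):

split -l 5 urls.txt batch_            # one file per web_fetch call
for f in batch_*; do
  jq -R -s -c '{name: "web_fetch",
                parameters: {urls: (split("\n") | map(select(length > 0)))}}' "$f"
done
rm -f batch_*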

Comparison to extensions

The fetch-url Python extension does roughly the same thing. web_fetch differs in three ways:

  1. In-process — no subprocess spawn, no Python interpreter, no extension wire protocol. Sub-100ms cold path on the happy case.
  2. Shared cache + telemetry — links the user shares (auto-expanded by Phase 21 link-understanding) AND links the agent fetches via web_fetch populate the same LRU. The second access is always free.
  3. Same security defaults — same deny-host list, same size cap, same timeout. Operators tune one knob, two surfaces honour it.

Use the extension when the runtime path is the wrong shape (custom auth, post-only endpoints, non-HTML responses you want raw). Use web_fetch for the standard "give me the article" case, which is most of them.

Implementation

The tool lives at crates/core/src/agent/web_fetch_tool.rs::WebFetchTool and is registered for every agent unconditionally in src/main.rs. The per-binding link_understanding.enabled policy gates whether the underlying fetch happens; the tool itself is always visible in the agent's tool list so operators can write "call web_fetch on URL X" prompts without needing a per-agent web_fetch.enabled flag.

Source of truth for FOLLOWUPS W-2 closure.

Pairing protocol

Two coexisting protocols ship in nexo-pairing:

  • DM-challenge inbound gate — opt-in per binding. Unknown senders on WhatsApp / Telegram receive a one-time human-friendly code; the operator approves them via CLI. Existing senders pass through unchanged.
  • Setup-code QR — operator-initiated. nexo pair start issues a short-lived HMAC-signed bearer token + a gateway URL, packs them into a base64url payload, and renders a QR. A companion app scans, presents the token to the daemon, and gets a session token in return.

The feature is off by default. Existing setups see no behaviour change until the operator flips pairing_policy.auto_challenge on a binding.

DM-challenge gate

Per-binding config

# config/agents.yaml
agents:
  - id: ana
    inbound_bindings:
      - plugin: whatsapp
        instance: personal
        pairing_policy:
          auto_challenge: true   # default false

The gate runs before the plugin publishes to the broker. Three outcomes per inbound message:

| Outcome | When | Plugin action |
| --- | --- | --- |
| Admit | sender in pairing_allow_from (or policy off) | publish as normal |
| Challenge { code } | unknown sender, auto_challenge: true, slot free | reply with code, drop message |
| Drop | max-pending exhausted (3 per channel/account) | silent drop |

Operator workflow

$ nexo pair list
CODE       CHANNEL         ACCOUNT          CREATED                     SENDER
K7M9PQ2X   whatsapp        personal         2026-04-25T13:21:00Z        +57311...

$ nexo pair approve K7M9PQ2X
Approved whatsapp:personal:+57311... (added to allow_from)

The next message from +57311... admits through the gate.

pair list only shows pending challenges by default. Use --all to also dump every active row in pairing_allow_from (approved + seeded), and --include-revoked to keep soft-deleted entries in the listing for audit:

$ nexo pair list --all
No pending pairing requests.

CHANNEL         ACCOUNT           SENDER                    VIA         APPROVED                    REVOKED
telegram        cody_nexo_bot     1194292426                seed        2026-04-26 17:52:10 UTC     -
whatsapp        personal          +57311...                 cli         2026-04-25 13:21:00 UTC     -

$ nexo pair list --all --include-revoked --json | jq '.allow[0]'
{
  "channel": "whatsapp",
  "account_id": "personal",
  "sender_id": "+57311...",
  "approved_via": "cli",
  "approved_at": "2026-04-25T13:21:00Z"
}

--json always returns { "pending": [...], "allow": [...] } so consumers get a stable shape regardless of --all.

Cache + revoke

The gate caches decisions for 30 s to keep SQLite off the hot path. Revokes (and freshly-seeded admits) are eventually consistent within that window:

$ nexo pair revoke whatsapp:+57311...
Revoked whatsapp:+57311...

For an immediate effect, trigger a hot-reload — the coordinator runs PairingGate::flush_cache as a post-reload hook (Phase 70.7), so nexo reload (or any file-watched config edit) drops the cache and the next inbound message re-queries the store:

$ nexo reload

A daemon restart still works as a hammer when reload is disabled.

Migrating an existing bot

If you already have known senders, seed them so the gate doesn't challenge mid-conversation when you flip auto_challenge: true:

$ nexo pair seed whatsapp personal +57311... +57222... +57333...
Seeded 3 sender(s) into whatsapp:personal allow_from

seed is idempotent; running it twice is safe and re-activates any sender that was previously revoked.

Setup-code QR

Issuing

$ nexo pair start --public-url wss://nexo.example.com --qr-png /tmp/p.png --json
{
  "url": "wss://nexo.example.com",
  "url_source": "pairing.public_url",
  "bootstrap_token": "eyJwcm9maWxlIjoi...",
  "expires_at": "2026-04-25T13:32:00Z",
  "payload": "eyJ1cmwi..."
}

payload is what goes in the QR. The companion decodes it to recover {url, bootstrap_token, expires_at}, opens the WebSocket, and presents the token as Authorization: Bearer <bootstrap_token>.

URL resolution

Priority chain (first non-empty wins):

  1. --public-url (CLI flag)

  2. tunnel.url (Phase tunnel — TODO: wire when accessor lands)

  3. gateway.remote.url

  4. LAN bind address (when gateway.bind=lan)

  5. fail-closed: the daemon refuses to issue a code on a loopback-only gateway. As of Phase 70.5 the CLI also prints a ready-to-run nexo pair seed <channel> <account> <SENDER> for every plugin instance configured under config/plugins/, so a dev-machine operator can skip the QR flow entirely:

    $ nexo pair start --ttl-secs 300
    Pairing-start needs a non-loopback gateway URL.
    For local testing you usually don't need the QR flow at all —
    seed the operator's chat into the allowlist directly:
    
      nexo pair seed telegram cody_nexo_bot <YOUR_TELEGRAM_USER_ID>
      nexo pair seed whatsapp default <YOUR_WHATSAPP_NUMBER>
    
    Or, to keep using the QR flow, set one of:
      - `pairing.public_url` in config/pairing.yaml
      - `--public-url <wss://…>` flag
      - run `nexo` with the tunnel enabled (writes tunnel.url)
    

ws/wss security policy

Cleartext ws:// is allowed only on hosts the operator can reasonably trust to be private:

  • 127.0.0.1 / ::1 (loopback)
  • RFC1918 (10/8, 172.16/12, 192.168/16)
  • link-local (169.254/16)
  • *.local mDNS hostnames
  • 10.0.2.2 (Android emulator)
  • Any host listed in pairing.ws_cleartext_allow_extra

Everything else requires wss://. This matches OpenClaw's posture in research/src/pairing/setup-code.ts.

Token format

b64u(claims_json) + "." + b64u(hmac_sha256(secret, claims_json))
  • claims_json = {"profile":"companion-v1","expires_at":"...","nonce":"<32 hex>","device_label":"..."}
  • secret = 32 bytes in ~/.nexo/secret/pairing.key (auto-generated on first boot with 0600 perms; rotate by deleting + restarting).

Verification is constant-time (subtle crate) so timing leaks don't discriminate between "wrong sig" and "wrong claims".
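
To inspect (not verify) a token's claims from the shell — verification needs the HMAC secret and belongs in the daemon. This assumes bootstrap_token in the pair start JSON is the full claims.sig string:

token=$(nexo pair start --json | jq -r .bootstrap_token)
claims="${token%%.*}"                  # keep the part before the first dot

# base64url -> base64, re-pad, decode
printf '%s' "$claims" \
  | tr '_-' '/+' \
  | awk '{ while (length($0) % 4) $0 = $0 "="; print }' \
  | base64 -d | jq .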

Threat model

| Concern | Mitigation |
| --- | --- |
| Brute-force pairing code | 32^8 ≈ 10^12 keyspace; 60 min TTL; max 3 pending per (channel, account) |
| Token replay after expiry | TTL on expires_at (default 10 min); HMAC verify fails closed |
| Token forgery | HMAC-SHA256 with 32-byte secret; constant-time compare |
| Secret leak | Rotate via rm ~/.nexo/secret/pairing.key && restart; all in-flight tokens invalidate |
| TOCTOU on approve | Single SQL transaction (approve reads + insert + delete in one tx) |
| ws cleartext on hostile network | Refuse to issue cleartext URL outside private-host allowlist |
| DoS via flood of pending requests | Max 3 per (channel, account); TTL 60 min auto-prunes |

Storage layout

Two SQLite tables in <memory_dir>/pairing.db:

pairing_pending (channel, account_id, sender_id PRIMARY KEY,
                 code, created_at, meta_json)

pairing_allow_from (channel, account_id, sender_id PRIMARY KEY,
                    approved_at, approved_via, revoked_at)

Soft-delete (revoked_at) keeps historical context: an operator can later see "+57311 was approved on X, revoked on Y" for audit.
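
With soft-delete in place, that audit is a plain query (substitute your configured <memory_dir>; columns per the schema above):

sqlite3 "<memory_dir>/pairing.db" \
  "SELECT channel, account_id, sender_id, approved_at, approved_via, revoked_at
     FROM pairing_allow_from
    WHERE sender_id LIKE '+57311%';"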

When to leave it off

  • Single-user setups where the operator is the only sender — the gate adds a SQL hit per message for no security gain.
  • Bots that take public input by design (e.g. a self-service support bot) — the gate would block every customer.
  • Until a guided agent-setup wizard ships (in the style of the web-search opt-in flow), manual pair seed is the only friendly migration path.

Adapter registry

Each channel that participates in pairing implements PairingChannelAdapter in its plugin crate. The adapter owns three channel-specific decisions the runtime cannot make on its own:

  • normalize_sender(raw) — canonicalise inbound sender ids before the gate hits the store. WhatsApp strips @c.us / @s.whatsapp.net and prepends +; Telegram lower-cases @username and passes numeric chat ids through.
  • format_challenge_text(code) — render the operator-facing pairing message. The default is plain UTF-8; the Telegram adapter overrides it to escape MarkdownV2 reserved characters and wrap the code in backticks so the user can long-press to copy.
  • send_reply(account, to, text) — publish the challenge through the channel's outbound topic (plugin.outbound.{whatsapp,telegram}[.<account>]) using the payload shape that channel's dispatcher expects.

The bin (src/main.rs) constructs a PairingAdapterRegistry at boot and registers the WhatsApp + Telegram adapters. The runtime consults the registry on every inbound event whose binding has pairing.auto_challenge: true. Channels with no registered adapter fall back to a hardcoded broker publish that mirrors the legacy text on plugin.outbound.{channel} — operators still see the challenge in their channel, but without per-channel formatting.

Telemetry lives under pairing_inbound_challenged_total{channel,result} with result one of delivered_via_adapter, delivered_via_broker, publish_failed, no_adapter_no_broker_topic, so dashboards can split adapter vs. fallback delivery rates per channel.

CLI reference

nexo pair start [--for-device <name>] [--public-url <url>]
                 [--qr-png <path>] [--ttl-secs <n>] [--json]
nexo pair list  [--channel <id>] [--all] [--include-revoked] [--json]
nexo pair approve <CODE> [--json]
nexo pair revoke <channel>:<sender_id>
nexo pair seed <channel> <account_id> <sender_id> [<sender_id>...]
nexo pair help

Anonymous telemetry (opt-in)

Nexo can emit a weekly heartbeat with anonymous, aggregated deployment shape so the project knows what configurations are actually in production. The heartbeat is disabled by default — nothing leaves your host until you explicitly opt in.

This page documents exactly what's sent, what isn't, and how to inspect the payload before enabling it.

What is sent

Every 7 days (drift-resistant — 7d ± 1h jitter), if telemetry is enabled, Nexo POSTs a single JSON document to https://telemetry.lordmacu.dev/nexo over HTTPS:

{
  "schema_version": 1,
  "instance_id": "0fa3...",
  "version": "0.1.1",
  "rust_version": "1.80.1",
  "os": "linux",
  "arch": "aarch64",
  "uptime_days": 14,

  "agents": {
    "total": 3,
    "active_24h": 2
  },

  "channels": {
    "whatsapp": 1,
    "telegram": 1,
    "email": 0,
    "browser": 1
  },

  "llm_providers": [
    "minimax",
    "anthropic"
  ],

  "memory_backend": "sqlite-vec",

  "sessions": {
    "average_per_agent_24h": 12,
    "p95_per_agent_24h": 28
  },

  "extensions_loaded": 4,

  "broker_kind": "nats"
}

What is not sent

  • Message content. Not a single byte of any conversation, prompt, response, or tool call ever leaves the host.
  • Identifiers. No phone numbers, email addresses, contact names, agent names, channel handles. The instance_id is a random UUID generated on first opt-in and stored in ~/.nexo/telemetry-id; it can't be tied to anything except a rerun of the same install.
  • API keys / tokens / secrets. None. The provider list is the literal string "minimax", never the key.
  • IP addresses. The receiving server (telemetry.lordmacu.dev) drops the source IP at ingress before the payload hits any database. The HTTP access log retains only a country code, derived from the IP at ingress before the IP is discarded, used solely to plot the geographic distribution gauge on the public dashboard.
  • Hostname. Not in the payload. Not derived from anything in the payload.
  • Time of day. The heartbeat is jittered so the timestamp doesn't reveal a pattern.

Why opt in

It's the only honest signal the project has about what's actually deployed. Without it, every roadmap discussion is guessing. With it, prioritization improves: if 80% of opt-in deployments use Anthropic + WhatsApp, then a regression on that combo gets a hot-fix; a niche feature goes to maintenance mode.

The aggregate dashboard at https://lordmacu.github.io/nexo-rs/usage/ (published once Phase 41 fully ships) shows everyone what everyone else is doing in aggregate — same data the maintainers see.

Enable / disable

# Show current state + what would be sent right now
nexo telemetry status

# Enable (writes to /etc/nexo-rs/telemetry.yaml or ~/.nexo/telemetry.yaml)
nexo telemetry enable

# Inspect exactly what tomorrow's heartbeat will contain
nexo telemetry preview

# Disable + remove the instance_id file
nexo telemetry disable

Hot-reload aware (Phase 18) — toggling doesn't require a daemon restart. The runtime watches the telemetry config; the next heartbeat tick respects whatever is currently on disk.

First-launch banner

On first nexo boot in a fresh install, the daemon prints once to the journal:

========================================================================
  nexo telemetry is DISABLED.
  Enabling it sends an anonymous, aggregated weekly heartbeat
  describing your deployment shape (channel mix, LLM provider mix,
  agent count). No message content, no identifiers, no API keys.
  Inspect the payload:        nexo telemetry preview
  Enable:                     nexo telemetry enable
  Read the full spec:         https://lordmacu.github.io/nexo-rs/ops/telemetry.html
========================================================================

Subsequent boots stay silent. Toggling on or off prints a one-line confirmation.

Server-side guarantees

The receiving endpoint at telemetry.lordmacu.dev:

  1. Drops the source IP at the load balancer, before the request reaches any application code or log aggregator.
  2. Stores the JSON document verbatim with no enrichment.
  3. Aggregates documents per instance_id only to compute the active_install_count cardinality on the public dashboard.
  4. Retains raw documents for 90 days, then aggregates and deletes the originals.
  5. Does not correlate documents across instance_id rotations — if you nexo telemetry disable && nexo telemetry enable, you become a fresh install in the dataset.

The server source code lives at https://github.com/lordmacu/nexo-telemetry-server (deferred — opens once Phase 41 finishes server side). Reproducible build, verifiable signatures.

Inspecting in transit

The HTTP request is plain HTTPS POST with the JSON payload above as the body. Easy to mitm in a corp environment:

mitmproxy -p 8888 -s drop_telemetry.py &
NEXO_TELEMETRY_PROXY=http://127.0.0.1:8888 nexo telemetry preview

The runtime respects HTTPS_PROXY / HTTP_PROXY / standard proxy env vars for the heartbeat HTTP client (it goes through the same reqwest client every other Nexo egress uses).

Disabling at the firewall

If you just want to make sure no telemetry can leave even if it gets accidentally enabled:

sudo iptables -A OUTPUT -d telemetry.lordmacu.dev -j REJECT

The runtime will see a network error in its logs every 7 days (rate-limited to once-per-week to not flood). It does not retry-forever — one attempt per scheduled tick. Note that iptables resolves the hostname to IPs once, at rule insertion; re-apply the rule if the endpoint's address ever changes.

Compliance notes

  • GDPR: anonymous aggregate data with no identifiers and no PII falls outside Article 4(1) "personal data". The instance_id is technical metadata, not a pseudonym — it can't be re-tied to a natural person via any data the project holds.
  • HIPAA: no PHI is collected; the field set is infrastructure metadata only.
  • Corporate sec teams: the receiving endpoint speaks only HTTPS, no fallback to HTTP. The server cert is publicly pinnable. The payload schema is documented + versioned; new fields require bumping schema_version and a documented changelog entry below.

Schema changelog

| Version | Released | What changed |
| --- | --- | --- |
| 1 | TBD when Phase 41 ships | Initial schema as documented above |

Future schema changes append a row here. Old clients are not forced to upgrade — the server accepts every advertised schema_version indefinitely (rolled-up dashboard panels include only the fields a given schema carries).

Out of scope

  • Per-agent / per-binding metrics — that's the Prometheus /metrics endpoint, scraped locally by your own Prometheus (see Grafana dashboards). The telemetry heartbeat is deployment-shape only.
  • Crash reports — Nexo emits anyhow backtraces to the local journal but never sends them off-host.
  • Real-time analytics — heartbeat is once weekly. There's no call-home for live metrics, ever.

Benchmarks

The workspace ships criterion benchmark suites for every hot path that runs on the data plane. CI executes them on every PR + weekly on main so regressions are visible before merge.

Quick run

# Single crate:
cargo bench -p nexo-resilience

# Single bench within a crate:
cargo bench -p nexo-broker --bench topic_matches

# Single group within a bench:
cargo bench -p nexo-broker --bench topic_matches -- 'topic_matches/wildcard'

Output goes to target/criterion/. Open index.html under that directory in a browser for the full HTML report.

Coverage matrix

| Crate | Bench | What it measures | Target |
| --- | --- | --- | --- |
| nexo-resilience | circuit_breaker | CircuitBreaker::allow (closed + open), on_success, on_failure, 8-task concurrent allow contention | sub-100ns per call |
| nexo-broker | topic_matches | NATS-style pattern matching (exact, single-wildcard *, multi-wildcard >, 50-pattern storm) | sub-100ns per match |
| nexo-broker | local_publish | End-to-end LocalBroker::publish with 0 / 1 / 10 / 50 subscribers (DashMap scan + try_send + slow-consumer drop counter) | sub-10µs at 50 subs |
| nexo-llm | sse_parsers | OpenAI / Anthropic / Gemini SSE parsers, 50-chunk fixtures (typical short answer) | chunks/sec scales linearly |
| nexo-taskflow | tick | WaitEngine::tick at 10 / 100 / 1 000 active waiting flows | sub-millisecond at single-host scale |

What's NOT benched yet

These are tracked under Phase 35.5 follow-up:

  • nexo-core transcripts FTS search — needs SQLite fixture seed before the bench is meaningful.
  • nexo-core redaction pipeline — wait for the local-LLM redaction backend (Phase 68.7) so we measure the real path operators ship.
  • nexo-mcp encode_request / parse_notification_method — cheap to add; will land alongside an MCP-stdio round-trip bench.
  • nexo-memory vector-search recall — needs a public dataset baseline.

Add a bench by following the patterns in crates/<x>/benches/:

  1. [dev-dependencies] adds criterion = "0.5" (with async_tokio if you need a runtime).
  2. [[bench]] registers name = "<bench>" and harness = false.
  3. Bench file uses Throughput::Elements(N) so output is ops/sec, not raw ns/iter.
  4. Each criterion_group! covers a distinct conceptual path — don't bundle unrelated paths.
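
In shell terms, steps 1-2 look roughly like this (crate directory and bench name are illustrative; match them to the crate you are touching):

cargo add --dev --package nexo-broker criterion@0.5 --features async_tokio

cat >> crates/broker/Cargo.toml <<'EOF'

[[bench]]
name = "my_new_bench"
harness = false
EOF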

CI integration

.github/workflows/bench.yml runs the matrix on:

  • every PR that touches crates/**, Cargo.lock, or Cargo.toml
  • weekly on Sunday 04:00 UTC against main
  • manual workflow_dispatch

Each run uploads target/criterion/ as an artifact retained 30 days. PR runs save with --save-baseline pr-<number>; main runs save as main. Compare locally with:

# Pull the artifact for PR #42
gh run download <run-id> --name bench-nexo-broker-<run-id>

# Compare against the local main baseline
cargo bench -p nexo-broker -- --baseline main

Today the CI job is informational — a regression doesn't fail the PR. Once we have ~10 main runs of baseline data per crate, the workflow gates on >10% regression per group. That's Phase 35.6 done-criteria.

Known limitations

  • GitHub Actions runners are noisy. The ubuntu-latest shared runner tier shows ±5-10% variance on microbenchmarks. This is why we don't gate on small regressions yet — the baseline noise floor is itself ~5%.
  • Benches don't measure cold cache. cargo bench's warm-up phase reaches steady-state CPU caches; first-call latency on a cold runtime is not captured. Add a separate bench_cold_* group when this matters (it usually doesn't — hot path is what matters at scale).
  • No cross-crate end-to-end benchmark yet. Phase 35.3 (load test rig) covers that; today's suites are per-crate microbenchmarks.

Reading criterion output

A typical run prints:

publish/mixed_50_subs   time:   [12.347 µs 12.451 µs 12.567 µs]
                        thrpt:  [3.9786 Melem/s 4.0153 Melem/s 4.0494 Melem/s]
                 change: time:   [-0.4%  +0.3%  +1.1%]    (p = 0.62 > 0.05)
                         thrpt:  [-1.1% -0.3% +0.4%]
                         No change in performance detected.
  • time is the per-iteration latency (lower better).
  • thrpt is throughput (higher better) — only present when the bench declared Throughput::Elements(N).
  • change compares against the previous run on the same hardware. p > 0.05 means the difference is within noise.

Look for change reporting "Performance has regressed" with a red bar — that's the signal a PR introduced a regression.

Backup + restore

Nexo state lives under NEXO_HOME (default ~/.nexo/ for native installs, /var/lib/nexo-rs/ for the systemd package, /app/data/ in the Docker image). Backing it up + restoring it is the operator's responsibility today; a proper nexo backup / nexo restore subcommand is tracked under Phase 36.

Quickest path — scripts/nexo-backup.sh

The repo ships a shell script that does the right thing without stopping the daemon:

# Single-shot, output to ./
NEXO_HOME=/var/lib/nexo-rs sudo -E scripts/nexo-backup.sh

# Custom output dir, exclude secrets (default)
scripts/nexo-backup.sh --out /backups/

# Include secrets/ for full recovery (encrypt the archive yourself)
scripts/nexo-backup.sh --include-secrets

What it does:

  1. Hot snapshot every SQLite DB via sqlite3 .backup — the official online-backup mechanism. Captures a consistent point-in-time image even with concurrent writers; no daemon stop required (the dot-command is sketched after this list).
  2. rsync non-DB state — JSONL transcripts, the agent workspace-git dir if Phase 10.9 is enabled, any operator files dropped under NEXO_HOME. Skips *.tmp, *.lock, and the queue/ disk-queue dir (replays on next boot from NATS, no need to back up).
  3. secret/ excluded by default. Re-run with --include-secrets to include them; encrypt the resulting tarball before transit (use age, gpg, or push to an encrypted bucket).
  4. sha256 manifest at MANIFEST.sha256 inside the archive so restore can verify integrity.
  5. zstd-19 compression — typical 10× ratio over raw SQLite.
  6. Sidecar <archive>.sha256 with the archive's outer hash so backup pipelines can detect transit corruption.
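
Step 1's hot snapshot is the standard sqlite3 online-backup dot-command; per database it boils down to (staging path illustrative — the script manages its own temp dir):

sqlite3 /var/lib/nexo-rs/memory.db ".backup '/backups/staging/memory.db'"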

Restore

# Pull the archive locally first
scp ops@host:/backups/nexo-backup-20260426T121500Z.tar.zst .

# Extract
zstd -dc nexo-backup-20260426T121500Z.tar.zst | tar -xf -

# Verify the manifest
cd nexo-backup-20260426T121500Z
sha256sum -c MANIFEST.sha256

# Stop the daemon (state must not be mid-write)
sudo systemctl stop nexo-rs

# Replace state
sudo rsync -a --delete --chown=nexo:nexo \
  ./ /var/lib/nexo-rs/

# Start
sudo systemctl start nexo-rs
sudo journalctl -u nexo-rs -f

The daemon must be stopped during the rsync — SQLite WAL files do not survive a parallel-write replacement.

Cron schedule

Drop in /etc/cron.daily/nexo-backup:

#!/bin/sh
set -eu
ARCHIVE_DIR=/backups/nexo
mkdir -p "$ARCHIVE_DIR"

# Snapshot, retain locally
NEXO_HOME=/var/lib/nexo-rs \
    /opt/nexo-rs/scripts/nexo-backup.sh --out "$ARCHIVE_DIR"

# Push to remote (Backblaze, S3, Wasabi, etc.)
rclone copy --include '*.tar.zst*' "$ARCHIVE_DIR" remote:nexo-backups/

# Retain 30 days locally + 90 days remote
find "$ARCHIVE_DIR" -name 'nexo-backup-*.tar.zst*' -mtime +30 -delete
rclone delete --min-age 90d remote:nexo-backups/

chmod +x /etc/cron.daily/nexo-backup. Single-host operators get a tested daily backup pipeline in 6 lines.

What survives a backup

| Component | In backup | Notes |
| --- | --- | --- |
| Long-term memory (vector + relational) | ✅ | memory.db |
| Transcripts | ✅ | transcripts/ JSONL + transcripts.db FTS |
| TaskFlow state | ✅ | taskflow.db |
| Pairing store + setup-code key | ⚠️ | DB included; key only with --include-secrets |
| LLM credentials | ⚠️ | secret/ only with --include-secrets |
| Per-agent SOUL.md + MEMORY.md | ✅ | rsync from workspace |
| Agent workspace git | ✅ | full .git dir included if Phase 10.9 is on |
| Disk-queue (NATS replay buffer) | ❌ | regenerates from NATS on boot |
| Process logs | ❌ | journalctl handles those separately |

Migrations

Schema migrations across Nexo versions are still ad-hoc — ALTER TABLE … .ok() patterns inside the runtime. Phase 36 adds:

  • nexo migrate status — show the applied vs available migration set
  • nexo migrate up [target] — apply pending migrations forward
  • nexo migrate down [target] — roll back if a release ships reversible migrations
  • A migrations/ dir with versioned, checksummed SQL files

Until then, pin to a specific Nexo version per deployment and test upgrades on a copy of the backup before applying to production.

Status

Tracked as Phase 36 — Backup, restore, migrations.

| Sub-phase | Status |
| --- | --- |
| scripts/nexo-backup.sh shell bridge | ✅ shipped |
| Operator doc (this page) | ✅ shipped |
| nexo backup --out <dir> subcommand | ⬜ deferred |
| nexo restore --from <archive> subcommand | ⬜ deferred |
| nexo migrate up/down/status versioned migrations | ⬜ deferred |
| Encrypted archive output (age / gpg) | ⬜ deferred |
| CI test that backup → restore round-trips on a fixture | ⬜ deferred |

The shell script + this doc are the bridge. Once the runtime subcommands ship, this page rewrites to point at them and the script gets retired.

Privacy toolkit

GDPR-style operator workflows for handling user data requests until the proper nexo forget / nexo export-user subcommands ship (tracked under Phase 50).

Right to be forgotten

scripts/nexo-forget-user.sh does cascading delete across every SQLite DB and JSONL transcript under NEXO_HOME, then VACUUMs the databases so the deleted rows don't survive in free pages.

# Stop the daemon first — SQLite WAL doesn't survive parallel writes
sudo systemctl stop nexo-rs

# DRY RUN — shows what would be deleted, doesn't change anything
NEXO_HOME=/var/lib/nexo-rs sudo -E scripts/nexo-forget-user.sh \
  --id "+5491155556666"

# When the dry-run looks right, re-run with --apply
NEXO_HOME=/var/lib/nexo-rs sudo -E scripts/nexo-forget-user.sh \
  --id "+5491155556666" \
  --apply

# Restart
sudo systemctl start nexo-rs

What gets deleted (cascading across all DBs):

| Table column | Match | Source DB |
| --- | --- | --- |
| user_id | exact | every DB |
| sender_id | exact | every DB (used in pairing, transcripts) |
| account_id | exact | every DB (used in WA / TG plugins) |
| contact_id | exact | memory + transcripts |
| peer_id | exact | agent-to-agent routing |

Plus JSONL transcript lines where any of those keys equals the target id.

The script emits forget-user-<id>-<timestamp>.json with the exact deletion counts — this is the operator's GDPR audit trail, ship it back to the requester as proof of compliance.

--keep-audit flag

Strict GDPR says even the admin-audit row recording the deletion should be removed (the user has the right to no trace). But that breaks operator audit chains. Use --keep-audit to opt out of that single specific erasure:

nexo-forget-user.sh --id "<id>" --apply --keep-audit

The script keeps the admin_audit table row showing that the deletion happened (without the user-id field, which is hashed). Other tables are fully wiped either way.

Right to data export

Until nexo export-user --id <id> ships, manual SQL works:

USER_ID="+5491155556666"
OUT_DIR="export-${USER_ID}-$(date -u +%Y%m%dT%H%M%SZ)"
mkdir -p "$OUT_DIR"

# Stop the daemon for a consistent point-in-time export
sudo systemctl stop nexo-rs

# Per-DB extraction
# Assumes $USER_ID contains no single quotes (phone numbers / chat ids)
for db in /var/lib/nexo-rs/*.db; do
    name=$(basename "$db" .db)
    : > "$OUT_DIR/${name}.json"
    # Every table that carries one of the known id columns
    sqlite3 "$db" "SELECT m.name FROM sqlite_master m
                   WHERE m.type = 'table'
                     AND EXISTS (SELECT 1 FROM pragma_table_info(m.name) p
                                 WHERE p.name IN ('user_id','sender_id','account_id'));" |
    while IFS= read -r t; do
        # WHERE clause over whichever id columns this table actually has
        where=$(sqlite3 "$db" "SELECT group_concat(name || ' = ''${USER_ID}''', ' OR ')
                               FROM pragma_table_info('$t')
                               WHERE name IN ('user_id','sender_id','account_id');")
        # Each table appends one JSON array to the per-DB export file
        sqlite3 -json "$db" \
            "SELECT '$t' AS table_name, * FROM \"$t\" WHERE $where;" \
            >> "$OUT_DIR/${name}.json"
    done
done

# Per-JSONL extraction
for f in /var/lib/nexo-rs/transcripts/*.jsonl; do
    name=$(basename "$f")
    jq -c \
        --arg id "$USER_ID" \
        'select((.user_id // .sender_id // .account_id // "") == $id)' \
        "$f" > "$OUT_DIR/$name"
done

# Restart
sudo systemctl start nexo-rs

# Tar + zstd, optionally encrypt
tar -C "$(dirname "$OUT_DIR")" -cf - "$(basename "$OUT_DIR")" | \
    zstd -19 -T0 > "${OUT_DIR}.tar.zst"

# (Recommended) age-encrypt before transit
age -r age1xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx \
    -o "${OUT_DIR}.tar.zst.age" \
    "${OUT_DIR}.tar.zst"
shred -u "${OUT_DIR}.tar.zst"

The result is a tarball the operator hands to the requester — JSON files per DB + filtered transcript JSONLs — encrypted with the requester's age public key.

When nexo export-user --id <id> ships, this whole shell pipeline collapses into one command with built-in encryption.

Retention policy

Operator-defined per deployment. Recommended defaults:

| Surface | Retention | Why |
| --- | --- | --- |
| Transcripts | 90 days | Enough for ops debugging + agent recall |
| Memory (long-term) | indefinite | Agent's working memory; pruned by recall signals |
| TaskFlow finished flows | 30 days | Audit trail for completed work |
| TaskFlow failed flows | 365 days | Forensics |
| Admin audit log | 365 days | Compliance |
| Disk-queue (NATS replay) | 7 days | Disaster recovery |
| Pairing pending requests | 60 min | TTL-enforced by the store |

Apply via cron (until nexo retention apply ships):

# /etc/cron.daily/nexo-retention
#!/bin/sh
set -eu
DB=/var/lib/nexo-rs/transcripts.db

# 90-day rolling window on transcripts
sqlite3 "$DB" "DELETE FROM transcripts
                WHERE timestamp < strftime('%s', 'now', '-90 days');"
sqlite3 "$DB" 'VACUUM;'

# Same for taskflow finished + failed
DB=/var/lib/nexo-rs/taskflow.db
sqlite3 "$DB" "DELETE FROM flows
                WHERE status='Finished'
                  AND finished_at < datetime('now', '-30 days');"
sqlite3 "$DB" "DELETE FROM flows
                WHERE status='Failed'
                  AND finished_at < datetime('now', '-365 days');"

PII detection (deferred)

Phase 50 plans inbound PII flagging — separate from the existing outbound redactor. The rough shape:

  • Regex pre-screen for SSN-shape, credit-card-shape (Luhn-checked), phone-number-shape per locale.
  • Optional LLM-backed second-pass via the future Phase 68 local tier (gemma3-270m).
  • Hits land in data/pii-flags.jsonl for operator review; agent dialog continues unimpeded.

Today: nothing automated. The outbound redactor in crates/core/src/redaction.rs (regex-based) catches the obvious shapes before they reach long-term memory or the LLM, but doesn't emit a queue for operator review.

Encryption at rest

Two roads, both deferred to Phase 50.x:

  • Application-levelsqlcipher build of libsqlite3-sys with a key fed from secrets/. Every page encrypted; backups need the same key to restore.
  • Filesystem-leveldm-crypt / LUKS on the volume hosting NEXO_HOME. Operator does it once at provision, no Nexo changes required.

The native install + Hetzner / Fly recipes assume filesystem-level crypto handled by the host (LUKS on Hetzner, encrypted EBS on AWS, Fly volumes are encrypted at rest by default). When sqlcipher is ready we'll document switching tiers.

Status

| Capability | Status |
| --- | --- |
| scripts/nexo-forget-user.sh cascading delete | ✅ shipped |
| Operator data-export shell pipeline (above) | ✅ documented |
| Retention policy + cron template | ✅ documented |
| nexo forget --user <id> subcommand | ⬜ deferred |
| nexo export-user --id <id> subcommand | ⬜ deferred |
| Inbound PII detection + review queue | ⬜ deferred |
| sqlcipher encryption at rest | ⬜ deferred |
| Admin-action audit log (separate from this script's manifest) | ⬜ deferred |

Tracked as Phase 50 — Privacy toolkit.

Health checks

Three layers of health probes for a Nexo deployment, each tuned for a different consumer:

  1. /health — liveness. Cheap (atomic flag check). HTTP 200 means the process is up; doesn't guarantee it can serve work.
  2. /ready — readiness. Expensive (verifies broker connection, agents loaded, snapshot warm). HTTP 200 means the runtime can accept inbound traffic. Use this for load-balancer health checks.
  3. scripts/nexo-health.sh — operator + monitoring. JSON summary with counter snapshots. Bridge until nexo doctor health (Phase 44) ships.

Liveness — /health

Returns HTTP 200 + ok body when the agent process is alive. The runtime sets a RUNNING flag at startup and clears it on graceful shutdown. Does not verify any subsystem — useful for "is the daemon there at all" probes.

curl -fsSL http://127.0.0.1:8080/health
# ok

Kubernetes liveness probe:

livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 3
  failureThreshold: 3

A failing liveness probe should restart the container. Be generous on initialDelaySeconds — first-boot extension discovery + memory open + agent runtime spin-up can take 15-25s.

Readiness — /ready

Returns 200 only when all of:

  • Broker (NATS or local) is reachable
  • Every configured agent has loaded its tool registry
  • The hot-reload snapshot has been warmed (Phase 18)
  • Pairing store is open (if pairing_policy.auto_challenge is on)

Returns 503 with a JSON body listing the failing subsystem otherwise:

{
  "ready": false,
  "reasons": [
    {"subsystem": "broker", "detail": "nats://localhost:4222: connection refused"}
  ]
}

Use this for load-balancer / service-mesh routing decisions. A node that's live but not ready should not receive traffic.

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 5
  timeoutSeconds: 2
  failureThreshold: 1

Operator one-shot — scripts/nexo-health.sh

Single-shot JSON summary intended for watch -n 5 nexo-health.sh during ops, cron health-mailers, and uptime monitors that want one structured payload covering everything.

# Default — pretty human output
scripts/nexo-health.sh

# JSON only (cron, monitoring scrapers)
scripts/nexo-health.sh --json

# Custom hosts (e.g., probing through a service mesh)
scripts/nexo-health.sh --host nexo.internal:8080 \
                      --metrics-host nexo.internal:9090

# Strict mode — open circuit breaker counts as unhealthy.
# Default mode tolerates breaker-open (degraded-but-up).
scripts/nexo-health.sh --strict

Pretty output:

============================================================
 nexo-rs health  ·  2026-04-26T15:30:00Z
============================================================

  overall:      ok
  admin:        127.0.0.1:8080
  metrics:      127.0.0.1:9090

  probes:
    ✓ live       ok
    ✓ ready      ok
    ✓ metrics    ok

  counters:
    tool_calls_total              4711
    llm_stream_chunks_total       28391
    web_search_breaker_open_total 0

JSON shape (for monitoring scrapers):

{
  "overall": "ok",
  "timestamp": "2026-04-26T15:30:00Z",
  "endpoints": { "admin": "127.0.0.1:8080", "metrics": "127.0.0.1:9090" },
  "probes": [
    {"name": "live",    "status": "ok", "detail": "ok"},
    {"name": "ready",   "status": "ok", "detail": "{...}"},
    {"name": "metrics", "status": "ok", "detail": "# HELP nexo_..."}
  ],
  "counters": {
    "tool_calls_total":              4711,
    "llm_stream_chunks_total":       28391,
    "web_search_breaker_open_total": 0
  }
}

Exit codes:

  • 0 — overall healthy
  • 1 — at least one probe failed (or --strict and a breaker is open)

Cron health mailer

# /etc/cron.d/nexo-health
*/5 * * * * nexo /opt/nexo-rs/scripts/nexo-health.sh --json --strict \
    >> /var/log/nexo-rs/health.jsonl 2>&1 \
    || (tail -1 /var/log/nexo-rs/health.jsonl | mail -s "nexo unhealthy" ops@yourorg)

Five-minute resolution, one line of JSONL per check, mail on failure.

Uptime monitor integration

UptimeRobot / BetterStack / Pingdom:

URL:        https://nexo.example.com/ready
Interval:   60s
Timeout:    5s
Expected:   HTTP 200

That's all most monitors need. The JSON body of /ready explains the failure when the alert fires.

What nexo-health.sh adds beyond /ready

| Signal | /ready | nexo-health.sh |
| --- | --- | --- |
| Process up + accepting traffic | ✅ | ✅ |
| Counter snapshot (tool calls, LLM chunks) | ❌ | ✅ |
| Web-search breaker state | ❌ | ✅ |
| Single JSON payload | ❌ (HTTP 200/503) | ✅ |
| Suitable for HTTP probe | ✅ | ❌ (shells out) |

Use /ready for the orchestrator. Use nexo-health.sh for the operator's eyeballs and the alerting pipeline.

Status

Tracked as Phase 44 — Auxiliary observability surfaces.

| Capability | Status |
| --- | --- |
| /health liveness endpoint | ✅ shipped (Phase 9) |
| /ready readiness endpoint | ✅ shipped (Phase 9) |
| scripts/nexo-health.sh operator one-shot | ✅ shipped |
| Operator runbook (this page) | ✅ shipped |
| nexo doctor health aggregating subcommand | ⬜ deferred |
| nexo inspect <session_id> state-transition pretty-print | ⬜ deferred |
| Per-session structured event log under data/events/ | ⬜ deferred |

Cost & quota controls

Operator runbook for tracking + capping LLM spend. Today the runtime emits enough Prometheus metrics for an operator to build their own picture; the proper nexo costs subcommand + budget caps land in Phase 45.

Estimating spend — scripts/nexo-cost-report.sh

Aggregates nexo_llm_stream_chunks_total by provider, multiplies by a price table, prints (or emits JSON) per-provider rolling totals.

# Human-readable report against the local /metrics endpoint
scripts/nexo-cost-report.sh

# JSON for monitoring / dashboards
scripts/nexo-cost-report.sh --json

# Custom price table (your negotiated enterprise rates)
scripts/nexo-cost-report.sh --prices ~/our-enterprise-rates.tsv

# Probe a remote daemon
scripts/nexo-cost-report.sh --metrics-host nexo.internal:9090

Pretty output:

============================================================
 nexo-rs cost report  ·  2026-04-26T15:30:00Z
============================================================

  PROVIDER                    CHUNKS     EST_TOKENS    EST_USD
  anthropic                    28391          85173    $0.7666
  minimax                       4711          14133    $0.0042
  ollama                        1208           3624    $0.0000

  total estimated: $0.7708

  disclaimer: heuristic estimate. Calibrate
    NEXO_TOKENS_PER_CHUNK once you have a measured baseline.

Calibration

The default tokens-per-chunk = 3 is a heuristic. To get an accurate number for your deployment:

  1. Find a typical conversation in transcripts (session_logs tool output).
  2. Sum the usage.total_tokens from the chat.completion end event(s).
  3. Divide by the total chunk count emitted during that conversation (visible in nexo_llm_stream_chunks_total{provider="...",kind="text_delta"}).
  4. Set NEXO_TOKENS_PER_CHUNK env to the result (worked through below).
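
Worked through with the numbers from the sample report above (85173 tokens over 28391 chunks rounds to 3 tokens per chunk):

total_tokens=85173   # summed usage.total_tokens for the session
chunks=28391         # chunk-count delta over the same window

awk -v t="$total_tokens" -v c="$chunks" \
    'BEGIN { printf "NEXO_TOKENS_PER_CHUNK=%d\n", int(t / c + 0.5) }'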

Example:

# Anthropic typical: 4-token granularity per delta
NEXO_TOKENS_PER_CHUNK=4 scripts/nexo-cost-report.sh

# OpenAI typical: 1 token per delta on streaming
NEXO_TOKENS_PER_CHUNK=1 scripts/nexo-cost-report.sh

When the runtime ships nexo_llm_tokens_total{provider,model,direction} (Phase 45 deliverable), the heuristic is replaced by direct token counts and the calibration step disappears.

Built-in price table

| Provider | Model | $/1M in | $/1M out |
| --- | --- | --- | --- |
| anthropic | claude-opus-4 | 15.00 | 75.00 |
| anthropic | claude-sonnet-4 | 3.00 | 15.00 |
| anthropic | claude-haiku-4 | 0.80 | 4.00 |
| openai | gpt-4o | 2.50 | 10.00 |
| openai | gpt-4o-mini | 0.15 | 0.60 |
| minimax | abab6.5s | 0.20 | 0.60 |
| minimax | M2.5 | 0.30 | 1.50 |
| gemini | gemini-1.5-pro | 1.25 | 5.00 |
| gemini | gemini-1.5-flash | 0.075 | 0.30 |
| deepseek | deepseek-chat | 0.14 | 0.28 |
| ollama | * | 0.00 | 0.00 |

These are public list prices as of 2026-04. Operators with enterprise contracts override via --prices:

provider	model	in_per_1m	out_per_1m
anthropic	claude-sonnet-4	2.40	12.00
openai	gpt-4o	2.00	8.00

(One row per provider×model. * model = applies to any model from that provider.)

Daily budget alerts via cron

Snapshot every 24h, mail the operator if estimated spend > cap:

# /etc/cron.daily/nexo-cost-alert
#!/bin/sh
set -eu
CAP=10.00            # $/day soft cap

REPORT=$(/opt/nexo-rs/scripts/nexo-cost-report.sh --json)
TOTAL=$(echo "$REPORT" | jq -r '.total_estimated_usd')

if awk -v t="$TOTAL" -v c="$CAP" 'BEGIN { exit !(t > c) }'; then
    echo "$REPORT" | mail -s "nexo daily spend over \$$CAP: \$$TOTAL" \
        ops@yourorg.com
fi

This is alerting only, not enforcement — the runtime keeps serving traffic. For hard caps, wait for Phase 45.

Hard quota caps (deferred)

Phase 45 ships per-agent monthly budget caps:

# config/agents.yaml — once 45.x lands
agents:
  - id: kate
    cost_cap_usd:
      monthly: 50.00
      daily: 5.00
      action: refuse_new_turns   # or: warn_only, throttle
      warn_topic: alerts.kate.budget

When hit:

  • refuse_new_turns — agent returns a fixed response ("I've reached my budget for the period; please ask the operator to extend.") to every new inbound. Existing in-flight turns finish.
  • warn_only — log + telemetry but keep serving.
  • throttle — switch to a cheaper model variant (claude-haiku-4 instead of claude-opus-4) for the rest of the period.

Per-binding token rate limits (e.g. "WhatsApp sales binding capped at 5k tokens/hour") layer on top of the existing sender_rate_limit. Phase 45.x.

Inspecting the metrics directly

If the script is too coarse:

# Top providers by total chunks (last 5m rate)
curl -sS http://127.0.0.1:9090/metrics | \
    awk '/^nexo_llm_stream_chunks_total/{gsub(/.*provider="/, "", $1); gsub(/".*/, "", $1); n[$1]+=$2} END{for (p in n) print n[p], p}' | \
    sort -rn

# TTFT p95 by provider (curl + jq if you have promtool):
promtool query instant http://127.0.0.1:9090 \
    'histogram_quantile(0.95, sum by (provider, le) (rate(nexo_llm_stream_ttft_seconds_bucket[5m])))'

The full metric inventory lives in Grafana dashboards → metric coverage (in repo as ops/grafana/README.md).

Status

Tracked as Phase 45 — Cost & quota controls.

| Capability | Status |
| --- | --- |
| scripts/nexo-cost-report.sh heuristic estimator | ✅ shipped |
| Operator runbook (this page) | ✅ shipped |
| nexo_llm_tokens_total{provider,model,direction} metric | ⬜ deferred |
| Per-agent monthly budget cap (config + enforcement) | ⬜ deferred |
| agents.<id>.cost_cap_usd schema | ⬜ deferred |
| Per-binding token rate limit | ⬜ deferred |
| Pre-flight token-count predictor in agent prompt | ⬜ deferred |
| nexo costs CLI rolling 24h/7d/30d aggregator | ⬜ deferred |
| /api/costs admin endpoint | ⬜ deferred |

Recipes

End-to-end walkthroughs that wire multiple subsystems together. Each recipe runs against a clean checkout of nexo-rs — prerequisites are at the top.

| Recipe | What you build |
| --- | --- |
| WhatsApp sales agent | A drop-in agent that greets WhatsApp leads, asks qualifying questions, and notifies a human on hot leads. |
| Agent-to-agent delegation | Route work from one agent to another using agent.route.* with correlation ids. |
| Python extension | Write a stdlib-only extension that adds a custom tool to any agent. |
| MCP server from Claude Desktop | Expose the agent's tools to the Anthropic desktop client. |
| NATS with TLS + auth | Harden the broker for a multi-node deployment. |
| Rotating config without downtime | Three Phase 18 hot-reload scenarios: API key rotation, A/B prompt swap, narrowing an outbound allowlist mid-incident. |

If a recipe drifts from reality, open an issue — it means the docs didn't get updated alongside a code change.

WhatsApp sales agent

Build a drop-in agent that handles a sales line on WhatsApp:

  • Greets the lead with the right operator (ETB / Claro / generic)
  • Qualifies via a short scripted flow (address, package, budget)
  • Notifies a human on hot leads, narrows the tool surface so the LLM only ever sees the lead-notification tool

This is the production shape of the shipped ana agent.

Prerequisites

  • agent built (cargo build --release)
  • NATS running (docker run -p 4222:4222 nats:2.10-alpine)
  • A MiniMax M2.5 key
  • A phone with WhatsApp ready to scan a QR

1. Provide the LLM key

export MINIMAX_API_KEY=...
export MINIMAX_GROUP_ID=...

2. Create a gitignored agent file

config/agents.d/ana.yaml is gitignored; put the business-sensitive content there.

agents:
  - id: ana
    model:
      provider: minimax
      model: MiniMax-M2.5
    plugins: [whatsapp]
    inbound_bindings:
      - plugin: whatsapp
    allowed_tools:
      - notify_lead                        # only this tool is visible
    outbound_allowlist:
      whatsapp:
        - "573000000000@s.whatsapp.net"    # human advisor's WA
    workspace: ./data/workspace/ana
    workspace_git:
      enabled: true
    heartbeat:
      enabled: false
    system_prompt: |
      You are Ana, a sales advisor for ETB and Claro. Help customers
      choose the best internet, TV, and phone package.

      On the first incoming message:
      - If it contains "etb" -> route directly to the ETB flow.
      - If it contains "claro" -> route directly to the Claro flow.
      - Otherwise, ask which operator they prefer.

      Capture: name, address, socioeconomic stratum, preferred package
      (internet only / internet+TV / triple play).

      When the lead is ready, invoke `notify_lead` with JSON containing:
      {name, phone, address, operator, package, notes}. Do not call any
      other tool — this is your only tool.

3. Pair WhatsApp for this agent

./target/release/agent setup whatsapp

The wizard creates ./data/workspace/ana/whatsapp/default/, flips config/plugins/whatsapp.yaml::whatsapp.session_dir to point at it, and renders a QR. Scan from the WhatsApp app.

4. Ship the notify_lead tool as an extension

Copy the Rust template and rename:

cp -r extensions/template-rust extensions/notify-lead
cd extensions/notify-lead

Edit plugin.toml:

[plugin]
id = "notify-lead"
version = "0.1.0"

[capabilities]
tools = ["notify_lead"]

[transport]
type = "stdio"
command = "./target/release/notify-lead"

Implement tools/notify_lead in src/main.rs — it should publish to plugin.outbound.whatsapp.default with a recipient = the human advisor number you listed in outbound_allowlist.
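
Before writing the Rust, you can smoke-test the outbound leg with the nats CLI. The payload fields here are an assumption based on this recipe; check the WhatsApp dispatcher's expected envelope before relying on them:

nats pub plugin.outbound.whatsapp.default \
  '{"to": "573000000000@s.whatsapp.net", "text": "🚨 New lead — Luis, 573111111111, triple play"}'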

Build and install:

cargo build --release
cd ../..
./target/release/agent ext install ./extensions/notify-lead --link --enable
./target/release/agent ext doctor --runtime

5. Run

./target/release/agent --config ./config

Flow diagram

sequenceDiagram
    participant U as Lead
    participant WA as WhatsApp
    participant N as NATS
    participant A as Ana
    participant H as Human advisor

    U->>WA: "Hi, I want internet service"
    WA->>N: plugin.inbound.whatsapp
    N->>A: deliver
    A->>A: qualify (address, package)
    A->>A: invoke notify_lead(json)
    A->>N: plugin.outbound.whatsapp (advisor number)
    N->>WA: deliver
    WA->>H: "🚨 New lead — Luis, 573111111111, triple play"

Why this shape works

  • allowed_tools: [notify_lead] prevents the LLM from hallucinating other actions — the model literally cannot see other tools.
  • outbound_allowlist.whatsapp is defense-in-depth: even if the LLM crafts a send to an unexpected number, the runtime rejects it.
  • workspace_git.enabled: true lets you audit what Ana remembered over time via memory_history — useful for reviewing tough calls.
  • Gitignored agents.d/ana.yaml keeps tarifarios and business content out of the public repo.

Testing

  • Open WhatsApp on a second phone and send "hi, ETB"
  • Watch agent status ana for session activity
  • Watch docker compose logs agent | jq 'select(.agent == "ana")' for turn-by-turn reasoning

Agent-to-agent delegation

Route work from one agent to another using agent.route.<target_id> with a correlation id. Typical shapes:

  • Kate delegates research to ops and waits for the reply
  • Ana fans out lead data to crm-bot, ticket-bot, and logger
  • A supervisor agent orchestrates specialist subagents

Prerequisites

  • Two agents configured in config/agents.yaml (and/or agents.d/)
  • NATS running
  • Either agent can be the caller or callee; the topology is symmetric

Agent config

agents:
  - id: kate
    model: { provider: minimax, model: MiniMax-M2.5 }
    plugins: [telegram]
    inbound_bindings: [{ plugin: telegram }]
    allowed_delegates: [ops, crm-bot]
    description: "Personal assistant; delegates research to ops."

  - id: ops
    model: { provider: minimax, model: MiniMax-M2.5 }
    accept_delegates_from: [kate]
    description: "Operations agent; answers factual questions about systems."

Key fields:

  • allowed_delegates (on the caller) — globs of peer ids this agent may route to. Empty = no restriction.
  • accept_delegates_from (on the callee) — inverse gate. Empty = no restriction.
  • description — injected into both sides' # PEERS block so the LLM knows who can do what.

Both gates are glob lists and can be set on either side or both.

Wire shape

sequenceDiagram
    participant K as Kate
    participant B as NATS
    participant O as Ops

    Note over K: LLM decides to delegate
    K->>B: publish agent.route.ops<br/>{correlation_id: "req-abc", body: "what's the latest DB migration status?"}
    B->>O: deliver
    O->>O: on_message + LLM turn
    O->>B: publish agent.route.kate<br/>{correlation_id: "req-abc", body: "migration 0042 is running..."}
    B->>K: deliver
    K->>K: correlate reply by req-abc

Correlation ids are caller-chosen strings. The callee echoes the id back on the reply; the caller uses it to match replies to requests (especially for fan-out + reassemble patterns).

Using the delegate tool

The runtime exposes a delegate tool whenever allowed_delegates is non-empty. LLM call shape:

{
  "name": "delegate",
  "args": {
    "to": "ops",
    "body": "what's the latest DB migration status?"
  }
}

The runtime:

  1. Generates a fresh correlation_id
  2. Publishes to agent.route.ops with that id
  3. Waits (bounded) for the reply on agent.route.kate
  4. Returns the body as the tool result

Timeouts and retry policy match the broker defaults — the circuit breaker on the target topic protects against an unreachable callee.

Fan-out

To fan out to multiple peers, the LLM can issue several delegate calls in one turn. The runtime issues each with a unique correlation_id and gathers the replies in parallel.
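
For example, a single turn whose tool-call list fans out to two peers (shown as bare JSON; the exact envelope depends on the LLM provider):

[
  { "name": "delegate", "args": { "to": "crm-bot",    "body": "create a lead: Luis, triple play" } },
  { "name": "delegate", "args": { "to": "ticket-bot", "body": "open an onboarding ticket for Luis" } }
]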

Guardrails

  • Self-delegation is rejected at the manager level.
  • Unknown target id → tool returns an error result, no broker traffic.
  • allowed_delegates empty + no constraint means the agent can delegate to any peer — prefer an explicit list in production.

Observability

Every delegation emits two log lines (dispatch + reply) with structured fields:

{"agent": "kate", "target": "ops", "correlation_id": "...", "event": "delegate_dispatch"}
{"agent": "kate", "target": "ops", "correlation_id": "...", "event": "delegate_reply", "latency_ms": 1342}

Filter on correlation_id to trace a single delegation end to end.
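
With JSON-formatted logs that is a one-liner (assuming agent.log holds one JSON object per line, as in the examples above):

jq 'select(.correlation_id == "req-abc")' agent.log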

Python extension

Ship a custom tool written in Python — no dependencies beyond stdlib. The agent spawns your script, handshakes with it over stdin/stdout, and exposes your tool to the LLM.

Prerequisites

  • python3 on the host $PATH
  • A running nexo-rs install with extensions.enabled: true

1. Copy the template

cp -r extensions/template-python extensions/word-count
cd extensions/word-count

2. Edit plugin.toml

[plugin]
id = "word-count"
version = "0.1.0"
description = "Count words in a piece of text."
priority = 0

[capabilities]
tools = ["count_words"]

[transport]
type = "stdio"
command = "python3"
args = ["./main.py"]

[requires]
bins = ["python3"]

[meta]
license = "MIT OR Apache-2.0"

[requires] bins = ["python3"] gates the extension: if Python isn't on $PATH, the runtime skips the extension with a warn log instead of crash-looping.

3. Write main.py

#!/usr/bin/env python3
import sys, json

def reply(id, result=None, error=None):
    msg = {"jsonrpc": "2.0", "id": id}
    if error is None:
        msg["result"] = result
    else:
        msg["error"] = error
    sys.stdout.write(json.dumps(msg) + "\n")
    sys.stdout.flush()

def log(*args):
    print(*args, file=sys.stderr, flush=True)

HANDSHAKE = {
    "server_version": "0.1.0",
    "tools": [{
        "name": "count_words",
        "description": "Count whitespace-separated words in a string.",
        "input_schema": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"]
        }
    }],
    "hooks": []
}

def main():
    log("word-count starting")
    for line in sys.stdin:
        try:
            req = json.loads(line)
        except json.JSONDecodeError:
            continue
        method = req.get("method", "")
        rid = req.get("id")
        if method == "initialize":
            reply(rid, HANDSHAKE)
        elif method == "tools/count_words":
            params = req.get("params", {}) or {}
            text = params.get("text", "")
            count = len(text.split())
            reply(rid, {"count": count})
        else:
            reply(rid, error={"code": -32601, "message": f"unknown method: {method}"})

if __name__ == "__main__":
    main()

Make it executable:

chmod +x main.py
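
Before wiring it into the agent, you can smoke-test the JSON-RPC loop by hand; the ids are arbitrary, and you should see the handshake followed by {"count": 3} on stdout:

printf '%s\n' \
  '{"jsonrpc":"2.0","id":1,"method":"initialize"}' \
  '{"jsonrpc":"2.0","id":2,"method":"tools/count_words","params":{"text":"one two three"}}' \
  | python3 main.py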

4. Validate and install

cd ../..
./target/release/agent ext validate ./extensions/word-count/plugin.toml
./target/release/agent ext install ./extensions/word-count --link --enable
./target/release/agent ext doctor --runtime

--link creates a symlink instead of a copy — good for the edit-test loop. doctor --runtime actually spawns the extension and runs the handshake, so a Python error that kills the interpreter during init surfaces here rather than in production logs.

5. Allow the tool per agent

The registered tool name is ext_word-count_count_words. Add it to the right agent's allowed_tools (or use a glob):

agents:
  - id: kate
    allowed_tools:
      - ext_word-count_*
      # ...

6. Run

./target/release/agent --config ./config

Send a message that would prompt the LLM to use the tool; watch the logs for tools/count_words on stderr.

Debugging

  • stderr of the Python process is forwarded to the agent's log pipeline. print(..., file=sys.stderr) lines show up in the agent's tracing output with the extension=word-count field.
  • Handshake failures are visible in ext doctor --runtime and prevent the tool from being registered at all.
  • Per-tool latency shows up in the nexo_tool_latency_ms{tool="ext_word-count_count_words"} Prometheus histogram.

Productionizing

  • Pin command to an absolute path or a virtualenv-local interpreter; python3 on $PATH may vary across hosts.
  • Pick your dependency strategy carefully — the template is stdlib-only. If you need requests or similar, ship a requirements.txt plus a bootstrap script, or switch to the Rust template.
  • If the extension holds a connection to a remote service, add a heartbeat loop so you can detect liveness.
  • For long-running tool calls, print status events to stderr — they become structured log entries and help debug hung tools.

MCP server from Claude Desktop

Expose nexo-rs tools (memory, Gmail, WhatsApp send, browser, etc.) to the Anthropic desktop app so your agent-sandboxed capabilities show up inside Claude conversations.

Same technique works for Cursor, Zed, and anything else that speaks MCP — the config shape is identical.

Prerequisites

  • Built agent binary at a known path (e.g. /usr/local/bin/agent)
  • A working config/ directory (reuse the one your daemon normally uses, or point at a dedicated one)
  • Anthropic API key (or OAuth bundle) configured for the agent

1. Enable the MCP server

config/mcp_server.yaml:

enabled: true
name: nexo
allowlist:
  - memory_*           # recall + store + history
  - forge_memory_checkpoint
  - google_*           # if you paired Google OAuth
  - browser_*          # if you want Claude to drive Chrome
expose_proxies: false  # hide ext_* and mcp_* from the IDE
auth_token_env: ""     # leave empty for local spawn; set if tunneling

Pick the smallest allowlist that covers what you want the IDE to do. Each glob is power you're handing the IDE user.

2. Wire Claude Desktop

Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):

{
  "mcpServers": {
    "nexo": {
      "command": "/usr/local/bin/agent",
      "args": ["mcp-server", "--config", "/srv/nexo-rs/config"],
      "env": {
        "RUST_LOG": "info",
        "AGENT_LOG_FORMAT": "json"
      }
    }
  }
}

Restart Claude Desktop. The nexo block should appear in the tool picker; pick tools from it the same way you pick built-ins.

3. Verify

Ask Claude: "use the nexo tool my_stats and show me the output."

If it works, Claude calls agent mcp-server as a subprocess, which emits JSON-RPC over stdin/stdout. Logs hit Claude's app-level log file plus stderr of the spawned agent (configurable via AGENT_LOG_FORMAT=json).

Wire shape

sequenceDiagram
    participant CD as Claude Desktop
    participant A as agent mcp-server
    participant TR as ToolRegistry
    participant MEM as Memory tool
    participant LTM as SQLite

    CD->>A: spawn subprocess
    CD->>A: initialize
    A-->>CD: {capabilities: {tools}}
    CD->>A: notifications/initialized
    CD->>A: tools/list
    A->>TR: enumerate (allowlist-filtered)
    TR-->>A: tool defs
    A-->>CD: [memory_recall, memory_store, …]
    CD->>A: tools/call {name: memory_recall, args: {query: "..."}}
    A->>MEM: invoke
    MEM->>LTM: SELECT ...
    LTM-->>MEM: rows
    MEM-->>A: result
    A-->>CD: content

Recipes within the recipe

Recall my cross-session memory from Claude

Allowlist:

allowlist:
  - memory_recall
  - memory_history

Now inside a Claude conversation: "recall what I told you about Luis's address last week." Claude calls memory_recall on your agent's SQLite — Claude itself has no persistent memory; this is how you give it one.

Post to WhatsApp from Claude

Allowlist:

allowlist:
  - whatsapp_send_message

⚠ Be careful. This gives whoever sits at the IDE the ability to send WhatsApp messages from your paired account. Only enable if you trust the IDE user as much as you'd trust the agent.

Read-only Gmail from Claude

Allowlist:

allowlist:
  - google_auth_status
  - google_call

Pair with GOOGLE_ALLOW_SEND= (unset) to keep the google_call tool read-only.

Auth token

If you expose the MCP server over a tunnel (not a local spawn), set auth_token_env to guard the initialize call:

auth_token_env: NEXO_MCP_TOKEN

Then set NEXO_MCP_TOKEN in the agent's env and have the client send it on initialize. Clients that don't present the token are rejected.

Gotchas

  • expose_proxies: true transitively exposes every upstream MCP server. If the agent already consumes a Gmail MCP server, turning this on lets Claude reach through — usually not what you want.
  • Allowlist globs match whole tool names. memory_* is OK; mem* is not — enumerate the real tool names with agent ext list before wiring globs.
  • Rate limits still apply. whatsapp_send_message through this path counts against the same WhatsApp rate bucket as the agent's own uses.

NATS with TLS + auth

Harden the broker for a multi-node deployment: mTLS on the client connection, NKey-based authentication, and a separate NATS server process (not the throwaway Docker-compose one).

Prerequisites

  • A NATS server ≥ 2.10
  • nsc CLI for generating NKeys
  • The agent binary deployed where it will run

1. Generate NKeys

nsc add operator --generate-signing-key nexo-ops
nsc add account --name nexo-prod
nsc add user --name agent-kate --account nexo-prod
nsc generate creds --account nexo-prod --name agent-kate > secrets/agent-kate.nkey

secrets/agent-kate.nkey is a single-file credential that contains both the NKey seed and the signed JWT. Treat it like any other secret — gitignored, Docker-secret, k8s-secret.
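
For reference, the generated file follows the standard NATS creds layout: a user JWT block plus the NKey seed, separated by a warning banner (contents truncated here):

-----BEGIN NATS USER JWT-----
eyJ0eXAiOiJKV1QiLCJhbGciOiJlZDI1NTE5LW5rZXkifQ...
------END NATS USER JWT------

-----BEGIN USER NKEY SEED-----
SUA...
------END USER NKEY SEED------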

2. Configure the NATS server

nats-server.conf:

listen: 0.0.0.0:4222
http: 0.0.0.0:8222

tls {
  cert_file: "/etc/nats/tls/server.crt"
  key_file:  "/etc/nats/tls/server.key"
  ca_file:   "/etc/nats/tls/ca.crt"
  verify:    true       # require client certs too (mTLS)
}

# Operator-mode auth is configured with top-level directives,
# not an authorization {} block:
operator: "/etc/nats/nsc/operator.jwt"
resolver: MEMORY
resolver_preload: {
  # <account public key>: "<account JWT string>"
  # (get both from `nsc describe account nexo-prod`)
}

Start the server:

nats-server -c nats-server.conf

3. Configure the agent

config/broker.yaml:

broker:
  type: nats
  url: tls://nats.example.com:4222
  auth:
    enabled: true
    nkey_file: ./secrets/agent-kate.nkey
  persistence:
    enabled: true
    path: ./data/queue
  fallback:
    mode: local_queue
    drain_on_reconnect: true

The agent reads nkey_file at startup and presents it on every connection.

4. Verify the client

Before starting the full agent, smoke-test the credentials with the nats CLI:

nats --creds ./secrets/agent-kate.nkey \
     --tlsca /etc/nats/tls/ca.crt \
     -s tls://nats.example.com:4222 \
     pub test.topic "hello"

If this works, the agent will too.

5. Deploy

Start the agent as usual:

agent --config ./config

On boot the agent:

  1. Opens a TLS connection to the broker
  2. Presents its NKey + JWT
  3. Server validates against the operator/account JWT
  4. Subscribes only to subjects its account is allowed to access

6. Multi-agent isolation

Give each agent its own NKey and an export/import declaration in the NSC account so agents can talk to each other on specific subjects only. Example policy:

# allow kate to publish agent.route.ops
# deny kate from publishing plugin.outbound.* (only the WA plugin should)

The agent does not enforce NATS auth itself — it just presents credentials. The broker enforces. That's the point: you can revoke a compromised agent without touching the agent's code or config.
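
A sketch of that policy with nsc (flag names per nsc edit user; subjects as used in this recipe):

nsc edit user --account nexo-prod --name agent-kate \
    --allow-pub "agent.route.ops" \
    --allow-sub "agent.route.kate" \
    --deny-pub  "plugin.outbound.>"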

Observability

  • circuit_breaker_state{breaker="nats"} flips to 1 if the broker rejects the credentials on startup or after a refresh
  • disk queue buffers every publish while the circuit is open — see Event bus — disk queue
  • nats-server --trace on the server side logs every auth failure with the rejected subject

Gotchas

  • verify: true (mTLS) means clients must present certs on top of NKey auth. Running both, or only one, is a policy choice — just don't half-configure (verify: true with no client certs distributed locks every client out).
  • JWT expiry. Account JWTs expire; NSC's push command renews them against the resolver.
  • Disk queue on client side. Even with auth misconfigured, the agent keeps running on the local fallback; operators may miss the outage without alerting on circuit_breaker_state.

Rotating config without downtime

Three practical hot-reload scenarios. Each shows the YAML edit, how to trigger the swap, and what the operator should see in the logs and on the metrics endpoint. Reference: Config hot-reload.

Prerequisites

  • A running daemon (agent in another terminal or under systemd).
  • Broker reachable from the same host (broker.yaml).
  • Phase 16 + Phase 18 features enabled (default since 0.x of nexo-rs).

A quick sanity check:

$ agent reload
reload v1: applied=1 rejected=0 elapsed=14ms
  ✓ ana

If you get exit 1 with "no control.reload.ack received within 5s", the daemon isn't running or runtime.reload.enabled is false — fix that first.
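
If it is the config gate, the knob named in the error maps onto the runtime config; a sketch, assuming the key path matches the error message, after which you restart the daemon once:

runtime:
  reload:
    enabled: true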


1. Rotate an LLM API key

The Anthropic key on production rotates every 90 days; the old key stays valid for an hour after the rotation.

Edit

config/llm.yaml:

 providers:
   anthropic:
-    api_key: ${file:./secrets/anthropic_old.txt}
+    api_key: ${file:./secrets/anthropic_new.txt}
     base_url: https://api.anthropic.com

Apply

# Drop the new key first, THEN trigger the reload — the file watcher
# would also do it 500 ms after the save, the CLI is just explicit.
$ printf '%s' "sk-ant-..." > secrets/anthropic_new.txt
$ chmod 600 secrets/anthropic_new.txt
$ agent reload
reload v2: applied=2 rejected=0 elapsed=22ms
  ✓ ana
  ✓ bob

Verify

# The aggregate counter bumped:
$ curl -s localhost:9090/metrics | grep config_reload_applied_total
config_reload_applied_total 2

# Per-agent versions advanced:
$ curl -s localhost:9090/metrics | grep runtime_config_version
runtime_config_version{agent_id="ana"} 2
runtime_config_version{agent_id="bob"} 2

# Watch one agent's next turn — the new key is used by the LlmClient
# rebuilt inside RuntimeSnapshot::build:
$ tail -f agent.log | grep "llm request"

In-flight LLM calls keep using the old client (the in-flight Arc<dyn LlmClient> is captured per-turn). They land in <30 s; the old key is still valid for the hour the auth team gave you.


2. A/B test a system prompt

You want to roll out a friendlier sales pitch on Ana's WhatsApp binding without touching the Telegram one (which has a longer support persona).

Edit

config/agents.d/ana.yaml:

 inbound_bindings:
   - plugin: whatsapp
     allowed_tools: [whatsapp_send_message]
     outbound_allowlist:
       whatsapp: ["573115728852"]
-    system_prompt_extra: |
-      Channel: WhatsApp sales. Follow the ETB/Claro lead-capture flow.
+    system_prompt_extra: |
+      Channel: WhatsApp sales (variant B — warmer tone).
+      Follow the ETB/Claro lead-capture flow but lead with a personal
+      greeting and use first names.
   - plugin: telegram
     instance: ana_tg
     allowed_tools: ["*"]
     ...

Apply

The file watcher picks the save up automatically:

$ tail -f agent.log
INFO config reload applied version=3 applied=["ana"] rejected_count=0 elapsed_ms=18

Or trigger manually:

$ agent reload
reload v3: applied=1 rejected=0 elapsed=18ms
  ✓ ana

Verify

Send one message on each channel and tail the LLM request log to see which prompt block went to the model.

$ grep "snapshot_version=3" agent.log
INFO inbound matched binding agent_id=ana plugin=whatsapp \
  binding_index=0 snapshot_version=3

Telegram binding's system_prompt_extra is unchanged; only the WA binding picks up variant B.

Roll back

If variant B underperforms, git revert the YAML and agent reload. Sessions in flight finish their turn on B; the next inbound is back on A.


3. Tighten an outbound allowlist after an incident

A jailbroken prompt almost made Ana send WhatsApp messages to arbitrary numbers (Phase 16's defense-in-depth caught it). Until you investigate, narrow the allowlist to the on-call advisor only.

Edit

config/agents.d/ana.yaml:

 inbound_bindings:
   - plugin: whatsapp
     allowed_tools: [whatsapp_send_message]
     outbound_allowlist:
       whatsapp:
-        - "573115728852"
-        - "573215555555"
-        - "573009999999"
+        - "573115728852"   # incident-only: on-call advisor

Apply

$ agent reload
reload v4: applied=1 rejected=0 elapsed=15ms
  ✓ ana

Verify

Send a test message that steers the LLM toward a previously allowed but now blocked number. The LLM will try; the tool will reject:

ERROR tool_call rejected reason="recipient 573215555555 is not in \
  this agent's whatsapp outbound allowlist"

The session's Arc<RuntimeSnapshot> is captured at the start of each turn, so even mid-conversation the next user reply re-loads from the new snapshot and the allowlist update takes effect immediately.


What you cannot reload (yet)

  • Adding or removing agents — restart the daemon. Phase 19.
  • Plugin instances (whatsapp.yaml, telegram.yaml instance blocks) — restart the daemon. Plugin sessions own QR pairing / long-polling state that needs lifecycle plumbing. Phase 19.
  • broker.yaml, memory.yaml — restart the daemon. Long-lived connections + storage handles aren't safe to swap mid-flight.
  • workspace, skills_dir, transcripts_dir on an agent — restart that agent.

The daemon logs every restart-required field that changed during a reload as warn so you don't have to remember which knob lives where.

Build a poller module

Three steps. No main.rs edit, no scheduler, no breaker, no SQLite work. The runner gives you all of that — your code only describes what to fetch, what to dispatch, and (optionally) what kind-specific LLM tools to expose.

Reference: crates/poller/src/builtins/ for in-tree examples (gmail.rs, rss.rs, webhook_poll.rs, google_calendar.rs).

Step 1 — implement the trait

// crates/poller/src/builtins/jira.rs
use std::sync::Arc;

use nexo_poller::{
    OutboundDelivery, PollContext, Poller, PollerError, TickOutcome,
};
use async_trait::async_trait;
use serde::Deserialize;
use serde_json::{json, Value};

#[derive(Debug, Deserialize, Clone)]
#[serde(deny_unknown_fields)]
struct JiraConfig {
    base_url: String,
    project_key: String,
    deliver: nexo_poller::builtins::gmail::DeliverCfg,
}

pub struct JiraPoller;

#[async_trait]
impl Poller for JiraPoller {
    fn kind(&self) -> &'static str { "jira" }

    fn description(&self) -> &'static str {
        "Polls Jira for newly assigned issues in a project."
    }

    fn validate(&self, config: &Value) -> Result<(), PollerError> {
        serde_json::from_value::<JiraConfig>(config.clone())
            .map(drop)
            .map_err(|e| PollerError::Config {
                job: "<jira>".into(),
                reason: e.to_string(),
            })
    }

    async fn tick(&self, ctx: &PollContext) -> Result<TickOutcome, PollerError> {
        let cfg: JiraConfig = serde_json::from_value(ctx.config.clone())
            .map_err(|e| PollerError::Config {
                job: ctx.job_id.clone(),
                reason: e.to_string(),
            })?;

        // 1. Pull data. Use ctx.cursor for incremental fetches.
        // 2. Decide what to dispatch.
        // 3. Build OutboundDelivery items — the runner publishes them
        //    via Phase 17 credentials so you never touch the broker.

        let payload = json!({ "text": "(jira tick — replace with real fetch)" });
        Ok(TickOutcome {
            items_seen: 0,
            items_dispatched: 1,
            deliver: vec![OutboundDelivery {
                channel: nexo_auth::handle::TELEGRAM,
                recipient: cfg.deliver.to.clone(),
                payload,
            }],
            next_cursor: None,
            next_interval_hint: None,
        })
    }
}

If Poller::validate returns Err(PollerError::Config { … }), the job fails at boot — sibling jobs keep going.

Poller::tick returns:

  • Ok(TickOutcome) — the runner persists next_cursor, increments counters, dispatches every OutboundDelivery via the agent's Phase 17 binding, and sleeps until next slot.
  • Err(PollerError::Transient(…)) — counts toward the breaker; next tick retries with backoff.
  • Err(PollerError::Permanent(…)) — auto-pauses the job and fires the failure_to alert.

PollContext.stores exposes the credential stores when your module needs paths (e.g., Gmail / Calendar built-ins read client_id_path from there). Plain ctx.credentials.resolve(…) is enough when you only need a CredentialHandle.

Step 2 — register

// crates/poller/src/builtins/mod.rs
pub mod gmail;
pub mod google_calendar;
pub mod jira;          // ← new
pub mod rss;
pub mod webhook_poll;

pub fn register_all(runner: &PollerRunner) {
    runner.register(Arc::new(gmail::GmailPoller::new()));
    runner.register(Arc::new(rss::RssPoller::new()));
    runner.register(Arc::new(webhook_poll::WebhookPoller::new()));
    runner.register(Arc::new(google_calendar::GoogleCalendarPoller::new()));
    runner.register(Arc::new(jira::JiraPoller));   // ← new
}

That is the only place wiring is touched. main.rs already calls register_all.

Step 3 — declare a job

# config/pollers.yaml
pollers:
  jobs:
    - id: ana_jira_assigned
      kind: jira
      agent: ana
      schedule: { every_secs: 300 }
      config:
        base_url: https://company.atlassian.net
        project_key: ENG
        deliver:
          channel: telegram
          to: "1194292426"

Run the daemon. Verify with:

agent pollers list                # ana_jira_assigned shows up
agent pollers run ana_jira_assigned   # tick on demand

Add per-kind LLM tools

Your module can ship its own tools alongside the generic pollers_* ones. Override Poller::custom_tools:

fn custom_tools(&self) -> Vec<nexo_poller::CustomToolSpec> {
    use nexo_llm::ToolDef;
    use nexo_poller::{CustomToolHandler, CustomToolSpec, PollerRunner};
    use async_trait::async_trait;

    struct JiraSearch;
    #[async_trait]
    impl CustomToolHandler for JiraSearch {
        async fn call(
            &self,
            runner: Arc<PollerRunner>,
            args: Value,
        ) -> anyhow::Result<Value> {
            // Use `runner` to inspect / mutate jobs the same way
            // built-in `pollers_*` tools do — list_jobs, run_once,
            // set_paused, reset_cursor are all available.
            let id = args["id"]
                .as_str()
                .ok_or_else(|| anyhow::anyhow!("`id` required"))?;
            let outcome = runner.run_once(id).await?;
            Ok(json!({ "matching": outcome.items_seen }))
        }
    }

    vec![CustomToolSpec {
        def: ToolDef {
            name: "jira_search".into(),
            description: "Run the Jira poll job once without persisting state.".into(),
            parameters: json!({
                "type": "object",
                "properties": {
                    "id": { "type": "string" }
                },
                "required": ["id"]
            }),
        },
        handler: Arc::new(JiraSearch),
    }]
}

The agent then sees jira_search automatically — no extra registration step. The adapter in nexo-poller-tools::register_all walks every registered Poller's custom_tools() and wires each spec into the per-agent ToolRegistry.

What the runner gives you for free

  • Per-job tokio task with every | cron | at schedule + jitter.
  • Cross-process atomic lease in SQLite (lease takeover after TTL expiry — daemon crash mid-tick is recoverable).
  • Cursor persistence — your next_cursor is the next tick's ctx.cursor. Survives restarts. agent pollers reset <id> clears it.
  • Exponential backoff on Transient, auto-pause on Permanent.
  • Per-job circuit breaker keyed on ("poller", job_id).
  • Outbound dispatch via Phase 17 — OutboundDelivery lands at plugin.outbound.<channel>.<instance> resolved from the agent's binding. You never touch the broker.
  • 7 Prometheus series labelled by kind, agent, job_id, status. Audit log under target=credentials.audit.
  • Admin endpoints + CLI subcommands (agent pollers …).
  • Six generic LLM tools (pollers_list, pollers_show, pollers_run, pollers_pause, pollers_resume, pollers_reset).
  • Hot-reload via POST /admin/pollers/reload — an add | replace | remove | keep plan applied atomically (see the curl sketch below).
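
For example, assuming the admin server sits on the 8080 port used in the deploy recipes:

curl -X POST http://127.0.0.1:8080/admin/pollers/reload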

Test pattern

#[tokio::test]
async fn validate_accepts_minimal() {
    let p = JiraPoller;
    let cfg = json!({
        "base_url": "https://x.atlassian.net",
        "project_key": "ENG",
        "deliver": { "channel": "telegram", "to": "1" },
    });
    p.validate(&cfg).unwrap();
}

#[tokio::test]
async fn validate_rejects_unknown_field() {
    let p = JiraPoller;
    let cfg = json!({ "wat": true, "deliver": { "channel": "x", "to": "1" }});
    assert!(p.validate(&cfg).is_err());
}

Cursor / dispatch tests follow the same pattern as the in-tree built-ins (gmail.rs, rss.rs, webhook_poll.rs).

Anti-patterns

  • Don't publish to the broker directly from tick. Return OutboundDelivery so the runner uses Phase 17 + audit log.
  • Don't share global state across modules. Use cursors for per-job state; use DashMap inside your struct for per-account caches (gmail does this for GoogleAuthClient).
  • Don't sleep inside tick for backoff. Return PollerError::Transient and let the runner own the backoff schedule — that way agent pollers reset and hot-reload still cancel cleanly.
  • Don't auto-create jobs from inside an LLM tool. The runner intentionally exposes only read + control on existing jobs. Operators own pollers.yaml.

Deploy on Hetzner Cloud (CX22)

A concrete recipe for a single-VPS production deploy. CX22 is the Hetzner sweet spot — €3.79/mo, 2 vCPU, 4 GB RAM, 40 GB SSD, ARM64, 20 TB transfer included. Runs the Nexo daemon + an internal NATS broker comfortably with headroom for the browser plugin (Chrome).

This recipe targets a single-tenant personal-agent deploy. For multi-tenant or multi-process see Phase 32.

What you end up with

  • Nexo daemon under systemd, auto-start on boot
  • NATS broker on the same host (nats-server from the official Debian package), auto-start
  • Cloudflare Tunnel for inbound HTTPS without opening ports
  • UFW firewall: only outbound + cloudflared
  • Unattended security upgrades
  • TLS handled by Cloudflare; no Let's Encrypt cert renewal to babysit

Estimated cost: ~€4/month (CX22 only; Cloudflare Tunnel is free).

0. Prerequisites

  • Hetzner Cloud account with API token
  • Cloudflare account with a domain pointed at it
  • SSH key uploaded to Hetzner (hcloud ssh-key create --name ops --public-key-from-file ~/.ssh/id_ed25519.pub)

1. Provision the VPS

Via Hetzner Cloud console: New Server → Location: any close to your users → Image: Debian 12 → Type: CX22 (ARM64, shared vCPU). Add your SSH key. Name it nexo-1.

CLI alternative:

hcloud server create \
  --name nexo-1 \
  --type cx22 \
  --image debian-12 \
  --ssh-key ops \
  --location nbg1

Wait ~30s, grab the IPv4 from the dashboard.

2. Initial hardening (one-time)

SSH in as root, then drop privileges to a sudo user:

ssh root@<ip>
adduser ops
usermod -aG sudo ops
rsync --archive --chown=ops:ops ~/.ssh /home/ops
exit

ssh ops@<ip>
sudo apt update && sudo apt full-upgrade -y
sudo apt install -y unattended-upgrades ufw fail2ban
sudo dpkg-reconfigure -p low unattended-upgrades

# Firewall: deny inbound, allow outbound + ssh from your IP only
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow from <your-home-ip> to any port 22 proto tcp
sudo ufw enable

# Disable root SSH + password auth
sudo sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin no/' /etc/ssh/sshd_config
sudo sed -i 's/^#\?PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
sudo systemctl restart ssh

3. Install Nexo from the .deb

Once Phase 27.4 ships and a release exists with an arm64 .deb:

curl -LO https://github.com/lordmacu/nexo-rs/releases/latest/download/nexo-rs_arm64.deb

# Verify the signature first (Phase 27.3):
curl -LO https://github.com/lordmacu/nexo-rs/releases/latest/download/nexo-rs_arm64.deb.bundle
cosign verify-blob \
  --bundle nexo-rs_arm64.deb.bundle \
  --certificate-identity-regexp 'https://github.com/lordmacu/nexo-rs/.*' \
  --certificate-oidc-issuer https://token.actions.githubusercontent.com \
  nexo-rs_arm64.deb \
  || { echo "REFUSING TO INSTALL UNSIGNED PACKAGE"; exit 1; }

sudo apt install ./nexo-rs_arm64.deb

The post-install script creates the nexo user, takes ownership of /var/lib/nexo-rs/, and prints next steps. It does not auto-start the service — that comes after we wire config.

4. Install + enable NATS

# Debian's repos don't ship nats-server; use the upstream .deb
NATS_VERSION=2.10.20
curl -LO "https://github.com/nats-io/nats-server/releases/download/v${NATS_VERSION}/nats-server-v${NATS_VERSION}-linux-arm64.deb"
sudo apt install ./nats-server-v${NATS_VERSION}-linux-arm64.deb
sudo systemctl enable --now nats-server

NATS must listen on 127.0.0.1:4222 (loopback only) — only Nexo running on the same host should reach it. Note the upstream .deb binds 0.0.0.0:4222 by default (see Troubleshooting below).
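
To pin it to loopback, set the host in the server config (matching the Troubleshooting entry below) and restart:

# /etc/nats-server/nats-server.conf
host: 127.0.0.1
port: 4222

sudo systemctl restart nats-server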

5. Wire Nexo config

sudo -u nexo nexo setup

The wizard asks for:

  • LLM provider keys (Anthropic / MiniMax / etc.) — paste them; they land in /var/lib/nexo-rs/secret/ mode 0600 owned by nexo:nexo
  • WhatsApp / Telegram pairing — defer if not needed yet
  • Memory backend — pick sqlite-vec (default for single-host)

The wizard writes /etc/nexo-rs/{agents,broker,llm,memory}.yaml. Verify broker.yaml points at nats://127.0.0.1:4222.
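
The relevant lines, in the same shape as the broker.yaml from the NATS TLS recipe, minus TLS and auth for this loopback setup:

broker:
  type: nats
  url: nats://127.0.0.1:4222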

6. Cloudflare Tunnel for HTTPS

The Nexo admin port (8080) shouldn't be exposed directly. Use a tunnel:

# Install cloudflared
curl -LO https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-arm64.deb
sudo apt install ./cloudflared-linux-arm64.deb

# Authenticate (opens a browser link — visit it on your laptop)
cloudflared tunnel login

# Create tunnel
cloudflared tunnel create nexo-1

# Route a hostname
cloudflared tunnel route dns nexo-1 nexo.yourdomain.com

# Config
sudo mkdir -p /etc/cloudflared
sudo tee /etc/cloudflared/config.yml >/dev/null <<EOF
tunnel: nexo-1
credentials-file: /home/ops/.cloudflared/<UUID>.json

ingress:
  - hostname: nexo.yourdomain.com
    service: http://127.0.0.1:8080
  - service: http_status:404
EOF

# Run as a service
sudo cloudflared service install
sudo systemctl enable --now cloudflared

Now https://nexo.yourdomain.com reaches the Nexo admin via Cloudflare's edge — TLS terminated at Cloudflare, no cert renewal, DDoS protection bundled.

7. Start Nexo

sudo systemctl enable --now nexo-rs
sudo journalctl -u nexo-rs -f

You should see the boot sequence: config validated → broker connected → agents loaded → ready.

8. Verify

# Local health check (over the loopback)
curl -fsSL http://127.0.0.1:8080/health

# External via the tunnel
curl -fsSL https://nexo.yourdomain.com/health

# Metrics endpoint
curl -fsSL http://127.0.0.1:9090/metrics | head -20

9. Backups

The state lives in /var/lib/nexo-rs/. Daily snapshot to S3 / Backblaze:

# /etc/cron.daily/nexo-backup
#!/bin/sh
set -eu
TIMESTAMP=$(date -u +%Y%m%dT%H%M%SZ)
BACKUP="/tmp/nexo-${TIMESTAMP}.tar.zst"

# Pause the runtime briefly so SQLite isn't mid-write.
systemctl stop nexo-rs

tar -I 'zstd -19 -T0' \
    -cf "$BACKUP" \
    -C /var/lib/nexo-rs \
    --exclude='./queue/*.tmp' \
    .

systemctl start nexo-rs

# Upload — adjust to your storage backend
rclone copy "$BACKUP" remote:nexo-backups/
rm "$BACKUP"

# Prune backups older than 30 days
rclone delete --min-age 30d remote:nexo-backups/

chmod +x /etc/cron.daily/nexo-backup.

For a sub-second pause-free backup, use SQLite's VACUUM INTO-based hot backup — track Phase 36 (backup, restore, migrations) for the upcoming nexo backup subcommand.

10. Updates

# Pull the latest .deb
curl -LO https://github.com/lordmacu/nexo-rs/releases/latest/download/nexo-rs_arm64.deb
# Verify (always)
cosign verify-blob ...
# Install (apt restarts the service automatically)
sudo apt install ./nexo-rs_arm64.deb

Or wire the apt repo (Phase 27.4 follow-up) and run apt upgrade nexo-rs like any other system package.

Limits + escape hatches

  • Browser plugin uses ~300 MB RAM per Chrome process. CX22 has 4 GB; budget 2 instances tops. Bump to CX32 (€7/mo, 4 vCPU, 8 GB) when you start hitting OOM.
  • NATS on the same host is fine for single-tenant; for multi-host, run NATS on its own VM (CX12, €3.29/mo).
  • TLS at Cloudflare only means traffic between Cloudflare's edge and your VPS is plain HTTP over the tunnel. The tunnel is encrypted at the transport layer (QUIC + mTLS to Cloudflare), so this is fine — but if you want defense-in-depth, terminate TLS again locally with caddy or nginx.

Troubleshooting

  • Tunnel disconnects after reboot — check systemctl status cloudflared. The credentials file moved if you reinstalled cloudflared with a different service install. Re-run cloudflared service install after cloudflared tunnel login.
  • NATS refuses connections — the upstream .deb binds 0.0.0.0:4222 by default. Edit /etc/nats-server/nats-server.conf to set host: 127.0.0.1 and systemctl restart nats-server.
  • Nexo can't write to /var/lib/nexo-rs/ — run sudo chown -R nexo:nexo /var/lib/nexo-rs && sudo chmod 0750 /var/lib/nexo-rs.
See also

  • Docker compose — single-machine but containerized (vs systemd-native here)
  • Native install — the underlying mechanics of step 3 if you skip the .deb
  • Phase 27.4 (Debian / RPM packages) — source of the .deb this recipe consumes

Deploy on Fly.io

Recipe for a single-region Fly.io deploy. Fly's strengths fit Nexo well: persistent volumes (for the SQLite state), health checks, free TLS, easy multi-region scale-out, and a generous free tier (up to 3 shared-1x VMs free) that covers a personal agent.

What you end up with

  • Nexo daemon + bundled local NATS broker on a single Fly machine
  • Persistent volume mounted at /app/data (NEXO_HOME) — SQLite, transcripts, and secret/ live there
  • Free TLS via fly.io subdomain (custom domain optional)
  • Auto-redeploy on every git push to main (via Fly GitHub Action)
  • Fly's built-in metrics + log streaming

Estimated cost: $0–$5/mo (free tier covers shared-1x VM + small volume; bigger Chrome workloads = $5-15/mo on a performance-1x).

0. Prerequisites

# Install flyctl
curl -L https://fly.io/install.sh | sh
fly auth login
fly auth signup     # if first time

# Confirm:
fly version

1. Initialize the app

From the repo root:

fly launch \
  --name nexo-yourname \
  --region <closest-region>  \
  --vm-cpu-kind shared       \
  --vm-cpus 1                \
  --vm-memory 1024           \
  --no-deploy

--no-deploy lets us tweak the generated fly.toml before the first build.

2. fly.toml

Replace the auto-generated fly.toml with this:

app = "nexo-yourname"
primary_region = "ams"           # or whichever closest

# Use the published GHCR image instead of building per-deploy.
[build]
  image = "ghcr.io/lordmacu/nexo-rs:latest"

# Persistent state — Fly volumes survive restarts and are
# mounted into the VM. SQLite + transcripts + secret/ live here.
[mounts]
  source = "nexo_data"
  destination = "/app/data"

# Override the container CMD so config + state align with the
# fly volume layout. NEXO_HOME defaults to /app/data so
# everything writable lands on the volume.
[env]
  RUST_LOG = "info"
  NEXO_HOME = "/app/data"

# `services` block tells Fly which container ports to expose.
[[services]]
  internal_port = 8080
  protocol = "tcp"
  auto_stop_machines = false   # keep the agent running 24/7
  auto_start_machines = true
  min_machines_running = 1

  [[services.ports]]
    port = 80
    handlers = ["http"]
    force_https = true

  [[services.ports]]
    port = 443
    handlers = ["tls", "http"]

  [services.concurrency]
    type = "connections"
    soft_limit = 200
    hard_limit = 250

  [[services.tcp_checks]]
    interval = "15s"
    timeout = "2s"
    grace_period = "30s"

# Metrics endpoint — Fly scrapes Prometheus-style automatically.
[metrics]
  port = 9090
  path = "/metrics"

# VM sizing — bump to performance-1x when the browser plugin is on.
[[vm]]
  cpu_kind = "shared"
  cpus = 1
  memory_mb = 1024

3. Create the volume

fly volumes create nexo_data --region ams --size 3

3 GB covers SQLite + a few months of transcripts. Bump as needed.

4. Set secrets

Fly's secret store injects them as env vars at runtime. Reference them from config/llm.yaml via ${ENV_VAR} placeholders:

fly secrets set ANTHROPIC_API_KEY=sk-ant-...
fly secrets set MINIMAX_API_KEY=...
fly secrets set MINIMAX_GROUP_ID=...
# Anything else your llm.yaml references via ${...}

The Nexo config loader resolves ${ANTHROPIC_API_KEY} placeholders from the process env — works the same whether the env vars come from /run/secrets/, ~/.bashrc, or Fly secrets.
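
For example, in the llm.yaml you ship with the image (same providers shape as in the key-rotation recipe):

providers:
  anthropic:
    api_key: ${ANTHROPIC_API_KEY}
    base_url: https://api.anthropic.com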

5. Pre-bake the config

Fly mounts /app/data from the volume but /app/config lives inside the image. Two options:

Option A — bake config into a custom image (recommended). Wrap the GHCR image in a tiny Dockerfile:

# Dockerfile.fly
FROM ghcr.io/lordmacu/nexo-rs:latest

# Copy your operator config tree into the image. Adjust to
# whatever your setup needs — just don't ship secrets here, use
# fly secrets for those.
COPY ./config/fly /app/config

# fly.toml's CMD already passes `--config /app/config`.

Then change fly.toml:

[build]
  dockerfile = "Dockerfile.fly"

Option B — write config to the volume on first boot. Use a Fly machine init script that runs nexo setup --non-interactive --from-env once, then exits.

6. Deploy

fly deploy

First deploy spins up the volume + machine. Subsequent deploys hot-swap the image with zero-downtime rolling restart.

7. Verify

# Health
fly status
curl https://nexo-yourname.fly.dev/health

# Metrics (over the Fly internal network)
fly proxy 9090:9090 -a nexo-yourname &
curl http://127.0.0.1:9090/metrics | head -20

# Logs
fly logs

# SSH in if something looks off
fly ssh console

8. Custom domain

fly certs add nexo.yourdomain.com
# Add the CNAME to your DNS as instructed
fly certs check nexo.yourdomain.com

9. Continuous deploy on push

Drop this into .github/workflows/fly-deploy.yml:

name: fly-deploy
on:
  push:
    branches: [main]
permissions:
  contents: read
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: superfly/flyctl-actions/setup-flyctl@master
      - run: flyctl deploy --remote-only
        env:
          FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}

Get a token: fly tokens create deploy -x 999999h. Drop in repo secrets as FLY_API_TOKEN.

10. Backups

# Manual snapshot
fly volumes snapshots create nexo_data
fly volumes snapshots list  nexo_data

# Restore (creates a new volume from the snapshot)
fly volumes create nexo_data_restored \
  --snapshot-id vs_xxxxxxxxxxxx \
  --region ams

For automated backups, set up a daily Fly cron machine that runs fly volumes snapshots create against the data volume.

Limits + escape hatches

  • Free tier shared-1x has 1 vCPU + 256 MB RAM — too small for the browser plugin. Disable Chrome (plugins.browser.enabled: false) on shared-1x; or bump to performance-1x ($15/mo, 1 vCPU + 2 GB).
  • Single-region by default — Fly has a multi-region story but the broker (NATS) doesn't speak Fly's distributed primitives. For multi-region, run NATS on a dedicated VM with NatsBroker cluster mode and pin Nexo machines to the same region as their broker.
  • Volume snapshots cost $0.15/GB/month — small but adds up if you keep many. Auto-prune via the snapshot cron.

Troubleshooting

  • Volume mount fails on machine start — fly volumes list must show the volume in the same region as the machine. Mismatch = create the volume in the right region or move the machine.
  • Out of memory + machine cycles — most likely the browser plugin loaded Chrome on a shared-1x. Check fly logs for OOM killer messages; bump VM size or disable the browser plugin.
  • Secrets not picked up after deploy — Fly redacts them in logs but they're in the env. SSH in (fly ssh console), run printenv | grep ANTHROPIC to verify.
See also

  • Docker GHCR — same image Fly pulls
  • Hetzner deploy — bare-VM alternative if you outgrow Fly's free tier or want full control
  • Phase 27.5 (Docker GHCR) — source of the image this recipe pulls

Deploy on AWS (EC2)

Recipe for a single-AZ AWS deploy on t4g.small (ARM Graviton). Fits a personal-agent or small team; production multi-AZ scale-out needs Phase 32 multi-host orchestration.

What you end up with

  • Nexo daemon under systemd on EC2 + EBS gp3 for state
  • Nginx + ACM cert for TLS termination (free)
  • Route53 hostname pointing at the instance
  • IAM role granting only SES send + S3 backup-bucket access (no console / no read of other AWS resources)
  • Daily snapshot of the EBS volume + lifecycle policy retaining 30
  • CloudWatch agent shipping /var/log/nexo-rs/*.log + metrics

Estimated cost (us-east-1, on-demand):

  • t4g.small instance: ~$13.43/mo
  • gp3 16 GB EBS: ~$1.28/mo
  • Route53 hosted zone: $0.50/mo
  • ACM cert: free
  • SES outbound (5k emails/mo on free tier first 12 months): free then $0.10/1k
  • Total: ~$15-20/mo

Cheaper alternative for personal-agent budgets: use Hetzner's CX22 at €4/mo if you don't need AWS-specific integrations.

0. Prerequisites

  • AWS account with billing alarms set
  • Route53 hosted zone for your domain
  • AWS CLI installed and aws configure'd locally
  • Terraform 1.5+ if you want infra-as-code (recommended)

1. Provision with Terraform

The repo will eventually ship deploy/terraform/aws/ (Phase 40 follow-up). Until then, here's a minimal main.tf:

terraform {
  required_providers {
    aws = { source = "hashicorp/aws", version = "~> 5.0" }
  }
}

provider "aws" {
  region = "us-east-1"
}

# --- VPC + subnet -----------------------------------------------------
resource "aws_vpc" "nexo" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_support   = true
  enable_dns_hostnames = true
  tags = { Name = "nexo" }
}

resource "aws_subnet" "nexo_public" {
  vpc_id                  = aws_vpc.nexo.id
  cidr_block              = "10.0.1.0/24"
  availability_zone       = "us-east-1a"
  map_public_ip_on_launch = true
}

resource "aws_internet_gateway" "nexo" {
  vpc_id = aws_vpc.nexo.id
}

resource "aws_route_table" "nexo_public" {
  vpc_id = aws_vpc.nexo.id
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.nexo.id
  }
}

resource "aws_route_table_association" "nexo_public" {
  subnet_id      = aws_subnet.nexo_public.id
  route_table_id = aws_route_table.nexo_public.id
}

# --- security group ----------------------------------------------------
resource "aws_security_group" "nexo" {
  name   = "nexo"
  vpc_id = aws_vpc.nexo.id

  # SSH only from your home IP — replace 1.2.3.4/32 with yours.
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["1.2.3.4/32"]
  }

  # 443 open to the world, terminated at nginx on the instance.
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # 80 only to redirect to https.
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# --- IAM role: SES + S3 backups, nothing else --------------------------
resource "aws_iam_role" "nexo" {
  name = "nexo-instance"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy" "nexo" {
  name = "nexo-instance-policy"
  role = aws_iam_role.nexo.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      { Effect = "Allow", Action = ["ses:SendEmail","ses:SendRawEmail"], Resource = "*" },
      { Effect = "Allow", Action = ["s3:PutObject","s3:GetObject","s3:DeleteObject","s3:ListBucket"], Resource = ["arn:aws:s3:::your-nexo-backups","arn:aws:s3:::your-nexo-backups/*"] }
    ]
  })
}

resource "aws_iam_instance_profile" "nexo" {
  name = "nexo-instance"
  role = aws_iam_role.nexo.name
}

# --- AMI lookup: latest Debian 12 arm64 -------------------------------
data "aws_ami" "debian" {
  most_recent = true
  owners      = ["136693071363"]   # Debian official
  filter {
    name   = "name"
    values = ["debian-12-arm64-*"]
  }
}

# --- instance ----------------------------------------------------------
resource "aws_instance" "nexo" {
  ami                    = data.aws_ami.debian.id
  instance_type          = "t4g.small"
  subnet_id              = aws_subnet.nexo_public.id
  vpc_security_group_ids = [aws_security_group.nexo.id]
  iam_instance_profile   = aws_iam_instance_profile.nexo.name
  key_name               = "your-existing-aws-keypair-name"

  root_block_device {
    volume_size = 16
    volume_type = "gp3"
    encrypted   = true
  }

  tags = {
    Name = "nexo-1"
  }
}

# --- Route53 DNS -------------------------------------------------------
data "aws_route53_zone" "main" {
  name = "yourdomain.com."
}

resource "aws_route53_record" "nexo" {
  zone_id = data.aws_route53_zone.main.zone_id
  name    = "nexo.yourdomain.com"
  type    = "A"
  ttl     = 300
  records = [aws_instance.nexo.public_ip]
}

output "nexo_ip" {
  value = aws_instance.nexo.public_ip
}

Then:

terraform init
terraform apply
# review the plan; type 'yes'

2. Hardening + install (post-provision)

SSH in:

ssh admin@nexo.yourdomain.com
sudo apt update && sudo apt full-upgrade -y
sudo apt install -y unattended-upgrades ufw fail2ban nginx certbot python3-certbot-nginx
sudo dpkg-reconfigure -p low unattended-upgrades

# UFW — defense in depth on top of the security group
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow 22/tcp
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw enable

# Disable root SSH + password auth
sudo sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin no/' /etc/ssh/sshd_config
sudo sed -i 's/^#\?PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
sudo systemctl restart ssh

Install Nexo (when 27.4 .deb is available):

curl -LO https://github.com/lordmacu/nexo-rs/releases/latest/download/nexo-rs_arm64.deb
# Verify Cosign signature first (Phase 27.3) — see verify.md
sudo apt install ./nexo-rs_arm64.deb

NATS:

NATS_VERSION=2.10.20
curl -LO "https://github.com/nats-io/nats-server/releases/download/v${NATS_VERSION}/nats-server-v${NATS_VERSION}-linux-arm64.deb"
sudo apt install ./nats-server-v${NATS_VERSION}-linux-arm64.deb
sudo systemctl enable --now nats-server

3. nginx + Let's Encrypt via certbot

sudo tee /etc/nginx/sites-available/nexo >/dev/null <<'EOF'
server {
    listen 80;
    server_name nexo.yourdomain.com;
    return 301 https://$server_name$request_uri;
}

server {
    listen 443 ssl http2;
    server_name nexo.yourdomain.com;

    # Cert paths populated after `certbot --nginx`
    ssl_certificate     /etc/letsencrypt/live/nexo.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/nexo.yourdomain.com/privkey.pem;
    ssl_protocols       TLSv1.2 TLSv1.3;

    # Health check — proxied through to the daemon
    location /health    { proxy_pass http://127.0.0.1:8080; access_log off; }
    location /ready     { proxy_pass http://127.0.0.1:8080; access_log off; }

    # Admin surface (auth via the daemon's session token)
    location /api/      { proxy_pass http://127.0.0.1:8080; }
    location /admin/    { proxy_pass http://127.0.0.1:8080; }

    # Block /metrics from public — scrape internally only
    location /metrics   { return 403; }
}
EOF
sudo ln -s /etc/nginx/sites-available/nexo /etc/nginx/sites-enabled/nexo
sudo nginx -t

# Issue the cert via Let's Encrypt (ACME)
sudo certbot --nginx -d nexo.yourdomain.com --non-interactive --agree-tos -m ops@yourdomain.com
sudo systemctl reload nginx

If you want AWS ACM specifically (instead of Let's Encrypt), front the EC2 with an ALB and attach an ACM cert there — adds ~$18/mo for the ALB. Most personal deploys don't need it.

4. Wire SES for outbound email

The IAM role grants ses:SendEmail. Configure the email plugin:

plugins:
  email:
    provider: ses
    aws_region: us-east-1
    # Credentials come from the EC2 instance profile — no keys
    # in the YAML.
    sender: "agent@nexo.yourdomain.com"

Verify the sender domain in SES first:

aws ses verify-domain-identity --domain yourdomain.com
# Add the printed TXT record to Route53
aws ses set-identity-mail-from-domain --identity yourdomain.com \
    --mail-from-domain mail.yourdomain.com

If your SES account is still in sandbox, request production access via the SES console — required to send to non-verified recipients.

5. EBS snapshots + lifecycle

# Daily snapshot via DLM (Data Lifecycle Manager) — set up once
# in Terraform or via the console:

aws dlm create-lifecycle-policy \
    --description "nexo daily snapshots, retain 30" \
    --state ENABLED \
    --execution-role-arn arn:aws:iam::ACCT:role/AWSDataLifecycleManagerDefaultRole \
    --policy-details '{...}'   # see DLM docs

Or the cheap way: cron + aws ec2 create-snapshot on the instance itself, pruning snapshots older than 30 days yourself.

6. CloudWatch logs + metrics

sudo apt install -y amazon-cloudwatch-agent
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard
# Point at /var/log/nexo-rs/*.log + 9090/metrics scrape

The Prometheus metrics endpoint can be pulled by CloudWatch Container Insights via the EMF agent if you go in that direction. For most personal deploys, journalctl + a Grafana Cloud free-tier scrape is cheaper.

Limits + escape hatches

  • t4g.small RAM (2 GB) is tight if the browser plugin is on. Bump to t4g.medium (4 GB, ~$26/mo) before turning on Chrome.
  • Single AZ. AZ outage = full downtime. Multi-AZ needs Phase 32 + an external NATS cluster. Acceptable for personal agents; not for SLAs.
  • SES sandbox limit (200 emails/day) until you request production. Plan for this if email channel is primary.
  • EIP not allocated. Stop/start the instance and the public IP changes. Allocate an Elastic IP (free when attached) if the Route53 record can't auto-update.

Troubleshooting

  • Nexo can't send email — aws sts get-caller-identity from the instance must show the nexo-instance role. If empty, the instance profile is missing.
  • certbot --nginx fails — DNS hasn't propagated yet. Wait 5-10 min after the Route53 record creation.
  • /health returns 503 — broker not ready. systemctl status nats-server; if good, check journalctl -u nexo-rs for credential errors (instance profile didn't propagate, or config/llm.yaml references a key the instance can't reach).
See also

  • Hetzner Cloud — bare-VM, cheaper
  • Fly.io — easier scaling, less AWS lock-in
  • Phase 27.4 (Debian package) — source of the .deb this recipe consumes
  • Phase 27.3 (Cosign) — signature verification before install

Architecture Decision Records

Short documents capturing why the architecture is the way it is. Each ADR names an alternative that was considered and rejected, and the forces that drove the choice. Read these when you're tempted to change something load-bearing.

Format loosely follows Michael Nygard's ADR template: context, decision, consequences.

Index

  • 0001 — Single-process runtime over microservices (Accepted)
  • 0002 — NATS as the broker (Accepted)
  • 0003 — sqlite-vec for vector search (Accepted)
  • 0004 — Per-agent tool sandboxing at registry build time (Accepted)
  • 0005 — Drop-in agents.d/ directory for private configs (Accepted)
  • 0006 — Per-agent git repo for memory forensics (Accepted)
  • 0007 — WhatsApp via whatsapp-rs (Signal Protocol) (Accepted)
  • 0008 — MCP dual role — client and server (Accepted)
  • 0009 — Dual MIT / Apache-2.0 licensing (Accepted)

Writing a new ADR

  1. Copy an existing ADR as the template (0001 is a good reference)
  2. Number sequentially: NNNN-short-slug.md
  3. Set status: Proposed while in review, flip to Accepted or Rejected after the discussion settles
  4. Link from this index
  5. Do not edit accepted ADRs in place. Create a new ADR that supersedes it and mark the old one Superseded by NNNN.

ADRs are load-bearing documentation — they're how future you (and future contributors) learn that "NATS over RabbitMQ was not an accident."

ADR 0001 — Single-process runtime over microservices

Status: Accepted
Date: 2026-01

Context

nexo-rs hosts N agents, each with its own LLM client, channel plugins, memory views, and extensions. The natural first instinct for Rust systems targeting real uptime is to split this into microservices: an agent service, a plugin service per channel, a memory service, etc., wired over the broker.

Every microservice adds:

  • A serialization boundary (more CPU, more latency)
  • A deployment artifact (more Dockerfiles, more CI)
  • A failure mode (service down vs process down)
  • An ops surface (metrics, health, logs per service)

The alternative — one binary hosting every subsystem as tokio tasks — gives up none of the durability (the disk queue + DLQ survive a process restart anyway) and keeps all in-memory caches naturally shared.

Decision

Ship one binary (agent) that hosts:

  • Every agent runtime (one tokio task per agent)
  • Every channel plugin (WhatsApp, Telegram, browser, …)
  • Broker client + disk queue + DLQ
  • Memory (short-term in-mem, long-term SQLite, vector sqlite-vec)
  • Extension runtimes (stdio / NATS)
  • MCP client and server
  • TaskFlow runtime
  • Metrics + health + admin HTTP servers

Coordination between tasks happens over the broker (NATS or the local mpsc fallback) exactly as if they were separate processes. Swapping to microservices later requires zero code changes on either side of the bus.
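
A minimal sketch of that shape, with hypothetical signatures (the real Broker trait in nexo-broker differs in detail):

use std::sync::Arc;
use async_trait::async_trait;

// Hypothetical simplification: in-process tasks and remote processes speak
// the same trait, so the topology is swappable without touching agent code.
#[async_trait]
pub trait Broker: Send + Sync {
    async fn publish(&self, subject: &str, payload: Vec<u8>) -> anyhow::Result<()>;
    async fn subscribe(&self, subject: &str)
        -> anyhow::Result<tokio::sync::mpsc::Receiver<Vec<u8>>>;
}

// Each agent is just a tokio task wired to the bus.
async fn run_agent(id: String, broker: Arc<dyn Broker>) -> anyhow::Result<()> {
    let mut inbox = broker.subscribe(&format!("agent.route.{id}")).await?;
    while let Some(msg) = inbox.recv().await {
        // ...run the turn, then answer over the same bus...
        broker.publish(&format!("agent.events.{id}"), msg).await?;
    }
    Ok(())
}

Because the only coupling is the subject namespace, moving run_agent into its own process is a deployment change, not a code change.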

Consequences

Positive

  • One Dockerfile, one health probe, one metrics endpoint
  • No IPC overhead on hot paths (LLM tool calls go ToolRegistry → Extension through a tokio channel, not a network hop)
  • Memory caches (session, tool registry) are naturally shared
  • Simpler ops: one log stream, one trace span hierarchy

Negative

  • A bug that panics the process takes down every agent at once (the single-instance lockfile mitigates the blast radius by preventing silent double-boot)
  • Scaling out means running more agent processes pointed at the same NATS — isolation between them requires deliberate NATS subject partitioning

Escape hatch

If a subsystem needs its own lifecycle (example: a GPU-heavy inference service), ship it as a NATS extension — it's automatically out-of-process and auto-discovered by the agent. Microservices by the back door, without splitting the monolith first.

ADR 0002 — NATS as the broker

Status: Accepted — Date: 2026-01

Context

The event bus sits under every inter-plugin and inter-agent communication. Requirements:

  • Subject-based routing with wildcards (plugin.inbound.*, agent.route.<id>)
  • Low-latency pub/sub (sub-millisecond on LAN)
  • No broker-side state to manage unless we opt in
  • Clustered production deployments
  • Mature async Rust client

Alternatives considered:

  • RabbitMQ — heavier, queue-per-binding mental model fits less well for fan-out across plugin instances, ops overhead higher
  • Redis streams / pub-sub — streams are great for durable event logs but the stream-per-subject model clashes with free-form plugin.outbound.<channel>.<instance> naming; pub-sub has no durability
  • Kafka — overkill for sub-millisecond request/reply loops, heavy ops, partition count becomes a thing you think about
  • Custom over TCP — too much invented complexity

Additional implementation note: a crate literally called natsio came up in early design research; it does not exist on crates.io. The real Rust client is async-nats (from the NATS org itself), matching the NATS 2.10 server line.

Decision

Use NATS as the broker. Specifically:

  • Client: async-nats = "0.35" (pinned in Cargo.toml)
  • Subject namespace: plugin.inbound.*, plugin.outbound.*, plugin.health.*, agent.events.*, agent.route.*
  • Fallback: a local tokio::mpsc bus implementing the same Broker trait for offline / single-machine runs
  • Durability: SQLite disk queue in front of every publish; drains FIFO on reconnect; 3 attempts before DLQ
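
For flavor, the core pub/sub loop with async-nats looks roughly like this (endpoint and subjects illustrative):

use futures::StreamExt;

#[tokio::main]
async fn main() -> Result<(), async_nats::Error> {
    let client = async_nats::connect("nats://127.0.0.1:4222").await?;

    // One wildcard subscription covers every channel's inbound traffic.
    let mut inbound = client.subscribe("plugin.inbound.*").await?;

    client
        .publish("plugin.inbound.whatsapp", "hello".into())
        .await?;

    if let Some(msg) = inbound.next().await {
        println!("{:?}: {:?}", msg.subject, msg.payload);
    }
    Ok(())
}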

Consequences

Positive

  • Standard ops path (monitor on :8222/healthz, prometheus exporter, clustering via well-known recipes)
  • Pub/sub semantics are trivial to reason about
  • Swapping in JetStream later for persistent streams is additive
  • Zero broker state in the happy path — restart NATS without catastrophe thanks to the disk queue

Negative

  • NATS auth (NKey / JWT) has its own learning curve — see the NATS TLS + auth recipe
  • No built-in message ordering guarantee across subjects (only per-subscriber). Callers that need ordering (e.g. delegation with correlation id) must enforce it themselves

Forbidden anti-pattern

  • Do not use natsio or any other non-async-nats client. The crate doesn't exist on crates.io; copy-paste from older design docs will mislead.

ADR 0003 — sqlite-vec for vector search

Status: Accepted — Date: 2026-02

Context

Agents benefit from semantic recall — surface a memory whose text doesn't share keywords with the query but shares meaning. The usual playbook: run a dedicated vector database.

Requirements:

  • Zero extra infrastructure for single-machine deployments
  • Same durability and transactional model as the rest of memory
  • Embedding-dimension sanity checks at startup
  • Hybrid retrieval (keyword + vector) without a separate query plane

Alternatives considered:

  • Qdrant / Weaviate / Milvus — all excellent; all require an extra service, network hop, and ops surface
  • pgvector — would force Postgres everywhere, abandoning SQLite for long-term memory
  • Simple numpy file + linear scan — works for small datasets, falls over past ~10k memories per agent

Decision

Use sqlite-vec: a SQLite extension that adds a vec0 virtual table in the same DB file as long-term memory.

  • One SQLite file holds memories, memories_fts, and vec_memories — a single JOIN returns content + tags alongside similarity
  • Dimension is checked at schema init; mismatch between config and existing rows aborts startup with an explicit message
  • sqlite3_auto_extension registers once per process
  • Hybrid retrieval uses Reciprocal Rank Fusion (K=60) over the keyword FTS5 hits and the vector neighbors
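
The fusion step is small enough to show. A sketch (illustrative, not the crate's exact code):

use std::collections::HashMap;

/// Classic Reciprocal Rank Fusion: score(d) = Σ 1 / (K + rank(d)),
/// with ranks starting at 1. K = 60 damps the head of each list.
fn rrf_fuse(keyword_hits: &[i64], vector_hits: &[i64], k: f64) -> Vec<(i64, f64)> {
    let mut scores: HashMap<i64, f64> = HashMap::new();
    for list in [keyword_hits, vector_hits] {
        for (rank, id) in list.iter().enumerate() {
            *scores.entry(*id).or_default() += 1.0 / (k + (rank + 1) as f64);
        }
    }
    let mut fused: Vec<(i64, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.total_cmp(&a.1));
    fused
}

A row ranked near the top of both the FTS5 list and the vector list outscores one that tops only a single list, which is exactly the hybrid behavior wanted here.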

Consequences

Positive

  • Zero-infra single-machine deploys keep working — no extra service to run
  • Backups, replication, export are all just "copy the .db file"
  • Transactional writes: INSERT into memories + vec_memories in one transaction; no dual-write races
  • Hybrid retrieval is easy (see vector docs)

Negative

  • sqlite-vec is newer than Qdrant; its indexing algorithm improves over time. Large indexes may need periodic rebuilding
  • Changing embedding models (even same-dimension ones) produces a stale index — the ADR doesn't solve this, users must reindex
  • The sqlite3_auto_extension registration happens once per process and has caught test suites that spawn many short-lived connections off-guard

Swap-out path

EmbeddingProvider is a trait and the recall_mode = vector branch is a single code path. Replacing sqlite-vec with Qdrant is a day's work, not a rewrite.

ADR 0004 — Per-agent tool sandboxing at registry build time

Status: Accepted — Date: 2026-02

Context

The same process hosts agents with very different blast radii. Ana runs on WhatsApp against leads; Kate manages a personal Telegram; ops has Proxmox credentials. The LLM in one agent must never see — let alone invoke — tools registered for another agent.

Three enforcement points are possible:

  1. Prompt-level sandboxing — "don't use these tools." Relies on model compliance. Fails under adversarial prompts.
  2. Runtime filter — every tools/call checks a policy before dispatch. Robust, but the LLM still sees the tools in tools/list and can hallucinate calls.
  3. Registry build-time pruning — the agent's ToolRegistry is built with only the allowed tools. The LLM literally cannot see the others.

Decision

Default to registry build-time pruning.

  • allowed_tools: [] (empty) = every registered tool visible
  • allowed_tools: [glob, …] = strict allowlist, tools not matching are removed from the registry before the LLM's tools/list call is answered
  • For agents with inbound_bindings[], the base registry keeps every tool, and per-binding overrides apply the same pruning when each turn's registry view is built — a single agent can narrow its surface differently per channel (a sketch of the pruning step follows the next list)

Additional layers stack on top:

  • outbound_allowlist.<channel>: [recipients] — even with whatsapp_send_message in the registry, the runtime rejects sends to unlisted recipients (defense in depth)
  • tool_rate_limits — per-tool rate limiting for side-effectful tools
  • Per-agent workspace and long-term memory (WHERE agent_id = ?) — data-level isolation
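
A toy version of the build-time pruning step, using the glob crate for pattern matching (names hypothetical; the real ToolRegistry is richer):

/// Empty allowlist = everything stays visible; otherwise strict glob allowlist.
fn prune_registry(all_tools: Vec<String>, allowed: &[String]) -> Vec<String> {
    if allowed.is_empty() {
        return all_tools;
    }
    let patterns: Vec<glob::Pattern> = allowed
        .iter()
        .filter_map(|g| glob::Pattern::new(g).ok())
        .collect();
    all_tools
        .into_iter()
        .filter(|tool| patterns.iter().any(|p| p.matches(tool)))
        .collect()
}

With allowed = ["whatsapp_*", "memory_search"], a registered proxmox_reboot never reaches the LLM's tools/list response in the first place.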

Consequences

Positive

  • Adversarial prompts can't invoke missing tools — the model has no token string for them
  • Easy mental model: grep allowed_tools to see what an agent can do
  • Prompt tokens stay small (tool list scales with allowlist, not registry)

Negative

  • A misconfigured allowed_tools silently hides tools the LLM expected to use — the agent returns "I can't do that," puzzling both user and developer. Mitigation: agent status shows the effective tool set per agent
  • Dynamic granting mid-session is not supported (would require re-handshake with the MCP clients)

ADR 0005 — Drop-in agents.d/ directory for private configs

Status: Accepted — Date: 2026-02

Context

Two kinds of agent content coexist in the same project:

  • Public — the framework demo agents, ops helpers, templates
  • Private — sales prompts, pricing tables, internal phone numbers, compliance-flagged customer scripts

The obvious "one agents.yaml" approach forces everything to be either committed (leaking business content) or gitignored (losing the template reference). Neither is acceptable.

Decision

Split by path convention:

  • config/agents.yaml — committed, public-safe defaults
  • config/agents.d/*.yaml — gitignored drop-in directory
  • config/agents.d/*.example.yaml — committed templates
  • Merge happens at load time: every .yaml in agents.d/ gets its agents: array concatenated to the base list (sketched after this list)
  • Files load in lexicographic filename order, so 00-common.yaml → 10-prod.yaml composes predictably
  • .gitignore includes:
    config/agents.d/*.yaml
    !config/agents.d/*.example.yaml
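
The merge itself stays small. A sketch under the conventions above (helper names hypothetical; whether .example.yaml files are skipped at load time is an assumption here):

use std::{fs, path::Path};

/// Concatenate every drop-in `agents:` array onto the base list,
/// in lexicographic filename order (00-common.yaml before 10-prod.yaml).
fn merge_agents_d(base: &mut Vec<serde_yaml::Value>, dir: &Path) -> anyhow::Result<()> {
    let mut paths: Vec<_> = fs::read_dir(dir)?
        .filter_map(|entry| entry.ok().map(|e| e.path()))
        .filter(|p| p.extension().is_some_and(|ext| ext == "yaml"))
        .filter(|p| !p.to_string_lossy().ends_with(".example.yaml")) // templates stay inert
        .collect();
    paths.sort();
    for path in paths {
        let doc: serde_yaml::Value = serde_yaml::from_str(&fs::read_to_string(&path)?)?;
        if let Some(agents) = doc.get("agents").and_then(|a| a.as_sequence()) {
            base.extend(agents.iter().cloned());
        }
    }
    Ok(())
}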
    

Consequences

Positive

  • Safe to open-source the repo; real business content stays private
  • Templates stay in git (ana.example.yaml) so newcomers can copy and fill
  • Per-environment layering falls out for free (00-dev.yaml vs 10-prod.yaml per deploy)

Negative

  • Agent-id collisions across files are possible — the loader rejects them at startup with an explicit error. Operators must coordinate file naming
  • Not every config is split this way — some operators expected plugins.d/, llm.d/, etc. We decided against the generalization until a concrete need appeared

ADR 0006 — Per-agent git repo for memory forensics

Status: Accepted — Date: 2026-03

Context

An agent's memory evolves over time — dream sweeps promote memories, the agent writes USER.md / AGENTS.md / SOUL.md revisions, session closes append to MEMORY.md. When an agent misbehaves, "what did it know and when?" is a real debugging question.

Options considered:

  • Append-only audit log per write — possible, but means inventing a custom scheme for every file type
  • DB-level revision history — works for LTM rows but not for workspace markdown files
  • Git — battle-tested, standard tooling; git log and git blame are already on every developer's laptop

Decision

When workspace_git.enabled: true, the agent's workspace directory is a per-agent git repository. The runtime commits at three specific moments:

  • Dream sweep finishes — commit subject promote, body lists promoted memories with scores
  • Session close — commit subject session-close, body includes session id and agent id
  • Explicit forge_memory_checkpoint(note) tool call — commit subject checkpoint: {note}

Commit mechanics:

  • Staged: every non-ignored file (respects auto-generated .gitignore that excludes transcripts/, media/, *.tmp)
  • Skipped: files larger than 1 MiB (MAX_COMMIT_FILE_BYTES)
  • Idempotent: no-op commit if tree clean
  • Author: {agent_id} <agent@localhost> (configurable)
  • No remote by default — operators add one if archival matters
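
The commit step reduces to a few process calls. A simplified sketch (hypothetical helper; the 1 MiB size cap and .gitignore generation are omitted):

use std::{io, path::Path, process::Command};

/// Idempotent checkpoint: stage everything, commit only if something changed.
fn checkpoint(workspace: &Path, subject: &str, author: &str) -> io::Result<()> {
    Command::new("git").arg("-C").arg(workspace).args(["add", "-A"]).status()?;

    // `git diff --cached --quiet` exits 0 when nothing is staged.
    let clean = Command::new("git")
        .arg("-C").arg(workspace)
        .args(["diff", "--cached", "--quiet"])
        .status()?
        .success();

    if !clean {
        Command::new("git")
            .arg("-C").arg(workspace)
            .args(["commit", "-m", subject, "--author", author])
            .status()?;
    }
    Ok(())
}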

Consequences

Positive

  • git log gives you a timestamped history of every memory evolution, for free
  • memory_history tool lets the LLM reason about its own past state — e.g. "what did I believe about this user last week?"
  • git diff <oldest>..HEAD is one command away when debugging
  • Familiar tooling for humans (git bisect a misbehaving agent)

Negative

  • Repositories grow over time; operators should add a remote with periodic push-and-repack
  • Commits are process-scoped — an agent process crash between "write MEMORY.md" and "commit" leaves an uncommitted diff. The next commit picks it up, but the two changes then share a single audit event
  • Transcripts are intentionally excluded from commits — they can be enormous and aren't the forensic artifact the ADR is aimed at

ADR 0007 — WhatsApp via whatsapp-rs (Signal Protocol)

Status: Accepted — Date: 2026-02

Context

"Add WhatsApp support" has three common paths:

  1. Official WhatsApp Business API — rate-limited, costs per message, requires business verification, limits proactive outreach to approved templates. Fine for some deployments, a bad fit for "run an agent on your personal number for a small business."
  2. Unofficial web-scraping libraries (e.g. whatsapp-web.js) — pretend to be a browser, fragile against UI changes, frequently banned
  3. Signal Protocol reimplementation — speak the native protocol that the WhatsApp mobile app speaks. Stable, fast, no scraping, permits all message types (voice, media, reactions, edits, etc.)

Decision

Use whatsapp-rs (Cristian's crate) which implements the Signal Protocol handshake + pairing + message layer in Rust. nexo-rs wraps it in crates/plugins/whatsapp:

  • Pairing: setup-time QR scan via Client::new_in_dir() — the wizard creates a per-agent session dir and renders the QR as Unicode blocks
  • Runtime: the plugin subscribes to inbound messages, forwards to plugin.inbound.whatsapp[.<instance>], handles the outbound side via the tool family (whatsapp_send_message, whatsapp_send_reply, whatsapp_send_reaction, whatsapp_send_media)
  • Credentials expiry: the plugin does not fall back to a runtime QR on 401 — the operator must re-pair via the wizard. The runtime refuses to boot without valid creds. This is a deliberate safety net against silent re-pair loops that would cross-deliver to the wrong account
  • Multi-account: each agent points at its own session dir. No XDG_DATA_HOME mutation
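
Illustrative glue for the inbound side, reusing the Broker sketch from ADR 0001 (event shape hypothetical; whatsapp-rs's own API is not shown):

// Forward one decrypted inbound message onto the bus.
async fn forward_inbound(
    broker: &dyn Broker, // trait sketched in ADR 0001
    instance: &str,
    from_jid: &str,
    body: &str,
) -> anyhow::Result<()> {
    let subject = format!("plugin.inbound.whatsapp.{instance}");
    let event = serde_json::json!({
        "channel": "whatsapp",
        "from": from_jid,
        "body": body,
    });
    broker.publish(&subject, serde_json::to_vec(&event)?).await?;
    Ok(())
}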

Consequences

Positive

  • Full feature coverage (voice, media, reactions, edits, groups)
  • No per-message cost beyond the bandwidth
  • No business-verification paperwork
  • Works on a personal number, a secondary SIM, anything you can pair to WhatsApp's Linked Devices

Negative

  • Signal Protocol parity is non-trivial; keeping up with WhatsApp protocol evolution is an ongoing commitment of whatsapp-rs
  • Running an agent on a personal number is a policy choice. WhatsApp's Terms of Service don't love automated accounts; use whatsapp-rs on numbers you own and are ready to re-pair if they get banned
  • Multi-account needs careful session-dir management — see Plugins — WhatsApp gotchas

Forbidden alternatives

  • Puppeteer / whatsapp-web.js / selenium — pulls the entire Chromium runtime into the process, breaks constantly, and is detected and banned faster than the Signal Protocol path
  • Business API — only if the deployment pays for it and the agent flow survives template constraints; ship a separate plugin if this comes up

ADR 0008 — MCP dual role: client and server

Status: Accepted — Date: 2026-03

Context

Model Context Protocol is becoming the de facto integration surface for LLM-driven tools. Two questions arose during the Phase 12 design:

  1. Should the agent be an MCP client (consume external MCP servers as tools)?
  2. Should the agent be an MCP server (expose its own tools to external MCP clients like Claude Desktop, Cursor, Zed)?

These are independent decisions. Picking one does not force the other.

Decision

Do both. Same process, same ToolRegistry, different transports.

  • Client — McpRuntimeManager spawns stdio or HTTP MCP servers per session (with a shared "sentinel session" for servers that don't need per-session isolation). Their tools register into the per-session ToolRegistry with names like {server_name}_{tool_name} and are callable by the agent like any built-in
  • Server — the agent mcp-server subcommand reads JSON-RPC from stdin and writes responses to stdout. An mcp_server.yaml allowlist controls which tools are exposed. A configurable auth_token_env guards the initialize call when the server is exposed through a tunnel

Both sides speak MCP 2024-11-05 (streamable HTTP) with SSE fallback for legacy servers.
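
Conceptually the server side is a read-answer loop over stdio. A heavily reduced sketch (framing, auth, and the rest of the MCP handshake omitted):

use std::io::{self, BufRead, Write};
use serde_json::{json, Value};

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let mut stdout = io::stdout().lock();
    // One JSON-RPC frame per line on stdin; responses go to stdout.
    for line in stdin.lock().lines() {
        let Ok(req) = serde_json::from_str::<Value>(&line?) else { continue };
        if req["method"] == "tools/list" {
            // Only tools passing the mcp_server.yaml allowlist would appear here.
            let resp = json!({
                "jsonrpc": "2.0",
                "id": req["id"].clone(),
                "result": { "tools": [{ "name": "memory_search" }] }
            });
            writeln!(stdout, "{resp}")?;
        }
    }
    Ok(())
}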

Consequences

Positive

  • Being a client: any MCP-speaking tool ecosystem is reachable without writing a custom extension
  • Being a server: the agent's tools + memory become available inside Claude Desktop / Cursor / Zed — cross-session memory, remote actions, etc.
  • Interop with the broader MCP catalog is a configuration change, not a code change

Negative

  • Two independent code paths to keep current as the MCP spec evolves
  • expose_proxies configuration gotcha: enabling it on the server side makes every upstream MCP server transitively visible to the consuming client. Default is false and the docs call this out explicitly
  • MCP spec churn (2024-11-05 vs future versions) needs staying power

ADR 0009 — Dual MIT / Apache-2.0 licensing

Status: Accepted — Date: 2026-04

Context

Open-sourcing nexo-rs required picking a license. Constraints:

  • The Rust ecosystem convention (rustc, tokio, serde, clap, axum…) is dual MIT / Apache-2.0
  • Downstream projects should be able to pick whichever license fits their own project's obligations
  • Attribution to the original author must be legally enforceable — the author explicitly asked that users "use it, just name me"
  • The author doesn't want to ship a custom / restrictive license that confuses or scares off contributors

Alternatives considered:

  • MIT alone — fine, but missing the explicit patent grant that Apache-2 gives (relevant to corporate downstream users)
  • Apache-2 alone — fine, but incompatible with GPLv2 downstream (MIT is compatible)
  • AGPL-3 — forces source-release on SaaS; nexo-rs isn't trying to prevent cloud forks
  • BSL (Business Source License) — source-available with time-delayed open-source conversion; inappropriate for a framework whose value is in wide adoption
  • Custom "use it, name me" — would need a lawyer for every edge case; a solved problem doesn't need a new solution

Decision

Dual-license under MIT OR Apache-2.0:

  • LICENSE-MIT — full text of the MIT License, 2026 Cristian García
  • LICENSE-APACHE — full text of the Apache-2.0 License
  • Cargo.toml: license = "MIT OR Apache-2.0" (SPDX)
  • NOTICE file at repo root (required to be preserved by Apache-2.0 §4(d)) carries the attribution — author, contact, original repo URL
  • README links all three + explains the SPDX choice

Downstream users pick whichever they prefer. Attribution is mandatory under both.

Consequences

Positive

  • Fits existing Rust ecosystem tooling (crates.io, rustdoc headers, CI scanners)
  • Maximum compatibility: GPLv2 projects pick MIT, patent-sensitive corporate projects pick Apache-2
  • NOTICE file gives the author the strongest attribution lever available in permissive OSS: removing it is a license violation

Negative

  • Contributors who want to submit PRs agree (per Apache-2 §5) that their contributions are dual-licensed under the same terms. Some contributors may require a CLA discussion; none so far
  • Trademark on the name "nexo-rs" is not covered — this ADR is about the code, not the brand. If the brand becomes load-bearing, register a trademark separately

See also

  • License — human-facing version of this decision
  • NOTICE — enforceable attribution block

Contributing

PRs welcome. A few ground rules keep the codebase coherent.

Workflow

All feature work follows the /forge pipeline:

/forge brainstorm <topic>  →  /forge spec <topic>  →  /forge plan <topic>  →  /forge ejecutar <topic>

Per-sub-phase done criteria live in PHASES.md.

Rules of the road

  • All code, code comments, and Markdown docs in English.
  • No hardcoded secrets. Use ${ENV_VAR} or ${file:...} in YAML (see the sketch after this list).
  • Every external call goes through CircuitBreaker. No exceptions.
  • Don't commit anything under secrets/.
  • Don't skip hooks (--no-verify). Fix the underlying lint / test issue instead.
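
As a toy illustration of the ${ENV_VAR} / ${file:...} convention above (hypothetical helper; nexo-config's real resolver handles more cases):

use std::{env, fs};

/// Resolve "${VAR}" from the environment and "${file:/path}" from disk;
/// anything else passes through untouched.
fn resolve_placeholder(raw: &str) -> anyhow::Result<String> {
    let Some(inner) = raw.strip_prefix("${").and_then(|s| s.strip_suffix('}')) else {
        return Ok(raw.to_string());
    };
    if let Some(path) = inner.strip_prefix("file:") {
        return Ok(fs::read_to_string(path)?.trim_end().to_string());
    }
    Ok(env::var(inner)?)
}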

Docs must follow

Any change that touches user-visible behavior — features, config fields, CLI flags, tool surfaces, retry policies — must update the mdBook under docs/ in the same commit. Docs phase plan: docs/PHASES.md. All mdBook pages must be written in English.

Pure-internal changes (private renames, refactors, test-only) are exempt — mention that explicitly in the commit body.

Local checks

cargo fmt --all
cargo clippy --workspace --all-targets -- -D warnings
cargo test --workspace
./scripts/check_mdbook_english.sh
./scripts/check_markdown_english.sh
mdbook build docs

CI runs all of the above on every push and every PR.

Git pre-commit hook

The repo ships a pre-commit hook at .githooks/pre-commit that:

  1. Docs-sync gate — rejects the commit if production files under crates/, src/, config/, extensions/, scripts/, .github/, or Cargo.{toml,lock} are staged without anything under docs/.
  2. cargo fmt --all -- --check
  3. cargo clippy --workspace -- -D warnings
  4. cargo test --workspace --quiet

Enable it once per clone:

git config core.hooksPath .githooks

(./scripts/bootstrap.sh does this for you.)

Bypass tags

The docs-sync gate honors a single opt-out tag. Include it in the commit message when the change is genuinely internal and doesn't need docs:

refactor: rename private fn [no-docs]

Acceptable reasons:

  • Private refactor, no change to any public API
  • Test-only changes
  • Dependency bumps with no behavior change
  • CI-config fiddling that doesn't alter ops

Do not use [no-docs] for anything a user would notice. If in doubt, update the docs — it's the lower-regret path.

Full escape hatch

git commit --no-verify disables all hooks (fmt, clippy, tests, docs-sync). Last resort, not a habit.

Reporting issues

Open a GitHub issue with:

  • nexo-rs version / commit hash
  • Rust version (rustc -V)
  • OS / arch
  • Relevant log lines (redact secrets)
  • Minimal reproduction

License of contributions

Contributions are dual-licensed MIT OR Apache-2.0 as described in License.

Releases

Two complementary tools own the release pipeline:

| Tool        | Owns                                                                               |
|-------------|------------------------------------------------------------------------------------|
| release-plz | version bumps, git tags, crates.io publish, per-crate CHANGELOG.md                 |
| cargo-dist  | cross-target binary tarballs, curl \| sh / PowerShell installers, sha256 sidecars  |

They run on the same tag (nexo-rs-v<version>) and stay independent — no overlapping config. Phase 27 brings both online; Phase 27.2 wires the GitHub Actions workflow that combines them on tag push.

What ships

The nexo binary is the only artifact in release tarballs. Every other binary in the workspace (driver subsystem, dispatch tools, companion-tui, mock MCP server) carries [package.metadata.dist] dist = false so cargo-dist excludes it. Dev / smoke programs (browser-test, integration-browser-check, llm_smoke) live as [[example]] entries under examples/ for the same reason.

Build provenance — nexo version

build.rs injects four stamps captured at compile time:

  • NEXO_BUILD_GIT_SHA — short git SHA of the build commit (or unknown outside a git checkout)
  • NEXO_BUILD_TARGET_TRIPLE — full Rust target triple
  • NEXO_BUILD_CHANNEL — opaque channel marker; defaults to source. The release workflow overrides via NEXO_BUILD_CHANNEL=apt-musl (etc.) so support tickets carry install-channel provenance.
  • NEXO_BUILD_TIMESTAMP — UTC ISO8601 timestamp of the build
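
A build.rs emitting stamps like these is a handful of println!s. Sketch (illustrative, not the project's exact script; the timestamp stamp would typically come from the time or chrono crate):

use std::{env, process::Command};

fn main() {
    // Short git SHA, or "unknown" outside a checkout (e.g. a crates.io build).
    let sha = Command::new("git")
        .args(["rev-parse", "--short", "HEAD"])
        .output()
        .ok()
        .filter(|o| o.status.success())
        .map(|o| String::from_utf8_lossy(&o.stdout).trim().to_string())
        .unwrap_or_else(|| "unknown".into());
    println!("cargo:rustc-env=NEXO_BUILD_GIT_SHA={sha}");

    // Cargo sets TARGET for build scripts.
    println!("cargo:rustc-env=NEXO_BUILD_TARGET_TRIPLE={}", env::var("TARGET").unwrap());

    // Channel defaults to "source"; release workflows override it.
    let channel = env::var("NEXO_BUILD_CHANNEL").unwrap_or_else(|_| "source".into());
    println!("cargo:rustc-env=NEXO_BUILD_CHANNEL={channel}");
    println!("cargo:rerun-if-env-changed=NEXO_BUILD_CHANNEL");
}

The binary then reads each stamp at compile time with env!("NEXO_BUILD_GIT_SHA") and friends.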

Operators see them with:

nexo version
# nexo 0.1.1
#   git-sha:   abc1234
#   target:    x86_64-unknown-linux-musl
#   channel:   apt-musl
#   built-at:  2026-04-27T12:34:56Z

nexo --version (without --verbose, and as opposed to the nexo version subcommand) prints the short form nexo <version>.

Local validation

make dist-check

Builds the host-target tarball via dist build --target $(rustc -vV | sed -n 's|host: ||p') and runs scripts/release-check.sh. The smoke gate verifies every present tarball contains the bin + LICENSE-* + README.md and that the host-native --version output matches the workspace version. Targets the local toolchain can't satisfy emit [release-check] WARN lines instead of failing.

Full setup notes (cargo-dist, cargo-zigbuild, zig, rustup targets): packaging/README.md.

What's automatic vs manual

| Step                                           | Owner                                                  |
|------------------------------------------------|--------------------------------------------------------|
| Bump version + open release PR                 | release-plz (CI on push to main)                       |
| Tag commit + crates.io publish                 | release-plz (on PR merge)                              |
| Build 2 musl tarballs (x86_64 + aarch64)       | release.yml (Phase 27.2 ✅) — cargo-dist               |
| Build Termux .deb (aarch64-linux-android)      | release.yml (Phase 27.2 ✅) — packaging/termux/build.sh |
| Upload tarballs + Termux deb + sha256 sidecars | release.yml (Phase 27.2 ✅)                            |
| Smoke-test nexo --version + provenance stamps  | release.yml (Phase 27.2 ✅)                            |
| Sign tarballs + Termux deb (cosign keyless)    | sign-artifacts.yml (Phase 27.3 ✅)                     |
| Generate CycloneDX + SPDX SBOMs                | sbom.yml (Phase 27.9 🔄)                               |
| Apt repo publish + signed Release file         | Phase 27.4 deferred                                    |
| Yum / dnf repo publish                         | Phase 27.4 deferred                                    |
| Termux pkg index                               | Phase 27.8 deferred                                    |
| Homebrew bottle auto-PR                        | Phase 27.6 PARKED (Apple targets dropped)              |
| nexo self-update                               | Phase 27.10 deferred                                   |

Adding a new bin to the release

  1. Declare the [[bin]] in the appropriate crate's Cargo.toml.
  2. If the crate hosting the bin carries [package.metadata.dist] dist = false, either remove that opt-out or move the bin to a new crate that doesn't carry it.
  3. Re-run make dist-check and confirm the new bin shows up under [bin] in the dist plan output.
  4. Update scripts/release-check.sh's per-archive content check if the new bin should be required.

Adding a new target

  1. Append the target triple to targets = […] in dist-workspace.toml.
  2. Append the matching tarball name to EXPECTED_TARBALLS in the smoke gate.
  3. Land the toolchain story in the GH Actions release workflow (Phase 27.2) — without that, the target builds locally only.

License

nexo-rs is dual-licensed under either:

  • MIT License (LICENSE-MIT)
  • Apache License, Version 2.0 (LICENSE-APACHE)

at your option. SPDX: MIT OR Apache-2.0.

Attribution is required

Redistributions — source, binary, modified, or unmodified — must preserve the NOTICE file and the copyright attribution, as required by Section 4(d) of the Apache License.

Nexo-rs
Copyright 2026 Cristian García <informacion@cristiangarcia.co>

This product includes software developed by Cristian García.
Original project: https://github.com/lordmacu/nexo-rs

Why dual-licensed

Dual MIT / Apache-2.0 is the Rust ecosystem convention (rustc, tokio, serde, clap, etc.). It maximizes downstream compatibility:

  • MIT is compatible with GPLv2 (Apache-2.0 is not)
  • Apache-2.0 grants explicit patent rights (MIT does not)

Users pick whichever fits their project.

Contributions

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in nexo-rs by you shall be dual-licensed as above, without any additional terms or conditions — per Section 5 of the Apache License.

API reference (rustdoc)

Every public type, trait, function, and module in the nexo-rs workspace is documented via cargo doc. The CI workflow runs cargo doc --workspace --no-deps and publishes the output under /api/ on the same GitHub Pages deployment as this book.

Open the rustdoc: https://lordmacu.github.io/nexo-rs/api/

What's there

One rustdoc page per workspace crate:

| Crate           | Contents                                                                                                                              |
|-----------------|---------------------------------------------------------------------------------------------------------------------------------------|
| agent           | Top-level binary — mostly wiring; see src/main.rs.                                                                                     |
| nexo-core       | Agent trait, AgentRuntime, SessionManager, ToolRegistry, HookRegistry, agent-facing tools (memory, taskflow, self_report, delegate, workspace_git). |
| nexo-broker     | Broker trait (NatsBroker, LocalBroker), disk queue, DLQ.                                                                               |
| nexo-llm        | LlmClient trait, MiniMax / Anthropic / OpenAI-compat / Gemini clients, retry + rate limiter.                                           |
| nexo-memory     | Short-term / long-term / vector types, LongTermMemory API.                                                                             |
| nexo-config     | YAML struct types, env/file placeholder resolution.                                                                                    |
| nexo-extensions | ExtensionManifest, ExtensionDiscovery, StdioRuntime, CLI.                                                                              |
| nexo-mcp        | MCP client + server primitives.                                                                                                        |
| nexo-taskflow   | Flow, FlowStore, FlowManager, WaitEngine.                                                                                              |
| nexo-resilience | CircuitBreaker.                                                                                                                        |
| nexo-setup      | Wizard field registry, YAML patcher.                                                                                                   |
| nexo-tunnel     | Cloudflared tunnel helper.                                                                                                             |
| nexo-auth       | Per-agent credential gauntlet, resolver, audit.                                                                                        |
| nexo-plugin-*   | Channel plugins (browser, whatsapp, telegram, email, google, gmail-poller).                                                            |

When to read rustdoc vs the book

| Goal                                         | Start here                                       |
|----------------------------------------------|--------------------------------------------------|
| Understand a subsystem's purpose             | this book                                        |
| Read a specific trait's methods / signatures | rustdoc                                          |
| Wire two subsystems together                 | book → rustdoc                                   |
| Embed a crate in your own binary             | rustdoc                                          |
| Audit what's public API                      | rustdoc (anything not in rustdoc is internal)    |

Building locally

# All crates, no dependencies:
cargo doc --workspace --no-deps

# Open the nexo-core rustdoc in a browser:
cargo doc -p nexo-core --no-deps --open

Warnings are rejected in CI (RUSTDOCFLAGS=-D warnings). Run the same locally before pushing if you edited doc comments:

RUSTDOCFLAGS="-D warnings" cargo doc --workspace --no-deps

Public-API stability

The workspace has not committed to semver-level stability yet. Public signatures change between code phases; follow PHASES.md and commit history when upgrading.