Introduction
nexo-rs is a Rust framework for building multi-agent LLM systems that live on real messaging channels — WhatsApp, Telegram, email — instead of a chat webapp. Event-driven (NATS, or a built-in local broker — zero external infra), per-agent tool sandboxes, drop-in configuration for private vs. public agents.
Coming from OpenClaw? That's the closest reference point — TypeScript, Node, single-process. nexo-rs keeps the "agents on real channels" idea and trades JS familiarity for: a single static binary, a fault-tolerant broker layer (NATS or a local stdio bridge — no NATS server required), per-agent capability sandboxes, durable workflows, secrets audit, plugin SDKs in 4 languages, and Termux-first portability. See vs OpenClaw for the full side-by-side.
One process, many agents, many channels. Kate handles your personal Telegram; Ana works the WhatsApp sales line; a cron-style poller sweeps Gmail for leads — all sharing one broker, one tool registry, and one memory layer.
Boots with zero config. nexo runs against documented defaults
when no config dir exists — 0 agents, local broker, admin RPCs +
health endpoint live. Add a persona, a YAML, or nexo init's 19
commented sample files when you're ready. Switch broker mode at
runtime with nexo set-broker {local,nats}.
Single binary, ~90 MB. No Node, no npm, no Docker required.
The release tarball you download is ~15 MB (xz-compressed); the
.deb is ~18 MB, the .rpm ~25 MB. Runs on a fresh VPS, on Termux
without root, or as a systemd unit. Pre-built binaries ship for
Linux (x86_64 / aarch64, static musl), macOS (Intel / Apple Silicon),
and Windows; .deb / .rpm / Termux .deb too.
flowchart LR
WA[WhatsApp] --> NATS[(NATS broker)]
TG[Telegram] --> NATS
MAIL[Email / Gmail poller] --> NATS
BROWSER[Browser CDP] --> NATS
NATS --> ANA[Agent: Ana]
NATS --> KATE[Agent: Kate]
NATS --> OPS[Agent: ops-bot]
ANA --> TOOLS[Tools & extensions]
KATE --> TOOLS
OPS --> TOOLS
TOOLS --> MEM[(Memory: SQLite + sqlite-vec)]
TOOLS --> LLM{{LLM providers}}
Why it exists
Most "agent frameworks" assume one LLM talking to one user through one UI. Real deployments are not shaped that way:
- Several agents with different personas, models, and skills
- Multiple channels (WA + Telegram + mail) feeding the same agents
- Business logic that is not LLM-driven (scheduled tasks, regex email triage, lead notifications) running next to the LLM loop
- Private prompts and pricing tables alongside an open-source core
nexo-rs is opinionated toward that shape.
What's in the box
| Area | What ships |
|---|---|
| Runtime | Multi-agent core, SessionManager, Heartbeat, CircuitBreaker. Boots zero-config; nexo init scaffolds 19 commented sample YAMLs |
| Broker | NATS (async-nats) + disk queue + DLQ + backpressure, or local — stdio-bridge for subprocess plugins, no external server. Flip at runtime: nexo set-broker {local,nats} |
| LLMs | MiniMax M2.5 (primary), Anthropic (OAuth + API), OpenAI-compat, Gemini, DeepSeek |
| Plugins | WhatsApp, Telegram, Email, Browser (CDP), Google (Gmail/Calendar/Drive/Sheets), Web Search (Brave/Tavily/DuckDuckGo/Perplexity). Install with nexo plugin install <owner>/<repo> (GitHub Releases tarball) or cargo install nexo-plugin-{whatsapp,telegram,email,browser,google,web-search} from crates.io |
| Memory | Short-term in-memory, long-term SQLite, vector via sqlite-vec |
| Extensions | TOML manifest, stdio + NATS runtimes, CLI, 20+ skills shipped |
| MCP | Client (stdio + HTTP), agent as MCP server, hot-reload |
| TaskFlow | Durable multi-step flow runtime with wait/resume |
| Soul | Identity, MEMORY.md, dreaming, workspace-git, transcripts |
| Personas | Out-of-tree agent definitions installed via nexo persona install <owner>/<repo> (v2 manifest, GitHub Releases). Cody is the reference pack. |
Who it is for
- Developers who want to run real agents — not a ChatGPT demo with retrieval.
- Multi-tenant single-install — several agents, several channels, isolated by config.
- Fault-tolerance-first teams — disk queue, DLQ, circuit breakers, single-instance lock, no message drop on reconnect.
- Anyone extending with their own stack — stdio extensions in any language, MCP, drop-in private agents.
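To make the fault-tolerance vocabulary above concrete, here is a minimal sketch of the circuit-breaker idea in plain Rust. The `Breaker` type, its fields, and its method names are hypothetical and written for illustration only; they are not the nexo-rs `CircuitBreaker` API, and the threshold and cooldown values are assumptions.

```rust
use std::time::{Duration, Instant};

/// Hypothetical breaker for illustration; not the nexo-rs `CircuitBreaker` type.
#[derive(Debug)]
pub struct Breaker {
    failures: u32,
    threshold: u32,
    cooldown: Duration,
    opened_at: Option<Instant>,
}

impl Breaker {
    pub fn new(threshold: u32, cooldown: Duration) -> Self {
        Self { failures: 0, threshold, cooldown, opened_at: None }
    }

    /// A call may go through when the breaker is closed,
    /// or when it is open but the cooldown has elapsed.
    pub fn allow(&self, now: Instant) -> bool {
        match self.opened_at {
            Some(opened) => now.duration_since(opened) >= self.cooldown,
            None => true,
        }
    }

    /// A success closes the breaker and clears the failure count.
    pub fn record_success(&mut self) {
        self.failures = 0;
        self.opened_at = None;
    }

    /// Once the failure count reaches the threshold, the breaker opens.
    pub fn record_failure(&mut self, now: Instant) {
        self.failures += 1;
        if self.failures >= self.threshold {
            self.opened_at = Some(now);
        }
    }
}
```

A caller would check `allow` before each outbound call (an LLM provider or a channel API, say) and report the outcome afterwards, so a failing dependency is skipped for the cooldown window instead of being retried on every message.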
What it is not
- Not a chatbot, not a webapp. It has no UI of its own.
- Not a replacement for LangChain/LlamaIndex as a "primitives library". It is an operational runtime.
- Not a channel-abstraction layer. WhatsApp behaves like WhatsApp, Telegram like Telegram. The runtime surfaces each channel as it is rather than flattening them into one uniform interface.
Three minutes to a running agent
# 1. Install nexo-rs — pre-built binary, no Rust toolchain needed:
curl -fsSL https://lordmacu.github.io/nexo-rs/install.sh | bash
# (or `cargo install nexo-rs` from crates.io, or download the
# .deb / .rpm / Termux .deb from GitHub Releases.)
# 2. Add a channel plugin (GitHub Releases tarball OR crates.io):
nexo plugin install lordmacu/nexo-plugin-whatsapp
# Built-ins: nexo-plugin-{whatsapp,telegram,email,browser,google,web-search}.
# crates.io path: `cargo install nexo-plugin-web-search` (etc.).
# 3. Install the Cody programmer-pair persona (or any other v2 pack):
nexo persona install lordmacu/nexo-persona-cody
# 4. Boot. The daemon picks up the persona + plugins automatically.
nexo
nexo (no subcommand) runs the daemon against documented
defaults for every YAML when no config dir exists; nexo plugin install drops a channel plugin under <state_dir>/plugins/;
nexo persona install lays down a ready-to-run agent + plugin
bindings under <state_dir>/personas/. To tune from a documented
baseline instead of the bare defaults, run nexo init first to
scaffold 19 commented sample YAMLs. Other install channels
(Docker, Nix, native packages, Termux): see the
installation guide.
Build your own persona pack? See Installing personas for the v2 manifest shape + GitHub Releases wire convention.
Next
- Zero-config quickstart (30s)
- Installation
- Quick start (10min walkthrough)
- Installing personas
- Architecture overview
- API reference (rustdoc) — every public type in the workspace
Why nexo-rs
If you've tried to build a real agent system, you know the gap.
Most "agent frameworks" assume one LLM talking to one user through one UI. Real deployments are not shaped that way.
You have several agents with different personas, models, and skills. Multiple channels (WhatsApp + Telegram + mail) feed the same agents. Business logic that is not LLM-driven (scheduled tasks, regex email triage, lead notifications) runs next to the LLM loop. Private prompts and pricing tables live alongside an open-source core. Customer support agents need a different tool sandbox than your billing bot.
nexo-rs is opinionated toward that shape.
What you get
- One process, many agents, many channels. Kate handles your personal Telegram. Ana works the WhatsApp sales line. A cron-style poller sweeps Gmail for leads — all sharing one broker, one tool registry, and one memory layer.
- Single binary, ~90 MB. No Node, no npm, no Docker required. The release tarball you actually download is ~15 MB (xz-compressed); the .deb is ~18 MB, the .rpm ~25 MB. Runs on a fresh VPS, on Termux without root, or as a systemd unit.
- Production-grade by default. Event bus with disk fallback: NATS for multi-host, or a built-in local broker (stdio-bridge for subprocess plugins, no external server) for single-host; nexo set-broker flips between them. Per-agent capability sandboxes. Cosign-verified plugin marketplace. Multi-tenant SaaS-ready.
- Zero-config boot. nexo runs against documented defaults with no config dir (admin RPCs + health live, 0 agents). Add a persona, a YAML, or nexo init's 19 commented samples when you're ready.
Three layers of extensibility
When you're ready to add functionality, pick the right layer:
| Layer | Use it when | Ships as |
|---|---|---|
| Plugin | You want a new channel (Discord, Slack, custom protocol) or to expose new tools to agents. | Subprocess in your favorite language (Rust, Python, TypeScript, PHP), tarball published via GitHub Releases. |
| Extension | You're bundling tools, advisors, skills, MCP servers as a self-contained unit — typical for SaaS verticals (sales, support, marketing). | Local-path tarball; operator runs nexo ext install. |
| Microapp | You're building a SaaS product on top of nexo-rs. The framework runs in the background; your app owns the UI and the multi-tenant story. | Your own application, talking to nexo-rs over admin RPC (NATS). |
What it's not
- Not a chat webapp. There's no built-in UI; you bring your own channels (and your own UI for microapps).
- Not a single-tenant prototype. Multi-tenant SaaS is a first-class shape, not an afterthought.
- Not a research toy. Every release is signed; the install pipeline has cosign verification + sha256 checks; the broker has disk fallback; the test surface covers all four SDK languages.
How it compares
The closest reference point is OpenClaw (TypeScript, Node). If that's where you're coming from, here's the trade: you give up JS familiarity, you get —
- A single static Rust binary (vs Node + node_modules); pre-built for Linux / macOS / Windows + .deb / .rpm / Termux
- A fault-tolerant broker layer: NATS for multi-host or a local stdio bridge with no external server (vs in-memory only)
- Zero-config boot + nexo init documented YAML scaffolds
- Per-agent capability sandboxes
- Durable workflows (TaskFlow) + secrets audit
- Termux / mobile-first portability
- Plugin SDKs in 4 languages — Rust, Python, TypeScript, PHP (vs TS only)
See vs OpenClaw for the full side-by-side.
Where to next
- New here? → Quickstart: install + first agent in 5 minutes.
- Want to write a plugin? → Plugin contract.
- Building a SaaS? → Microapps · getting started.
- Curious about internals? → Browse the Advanced section in the sidebar — architecture deep-dives, ADRs, design notes.
What you can build
A non-exhaustive gallery of products people are shipping (or could ship by next week) on top of nexo-rs. Each card links to the recipe / template that gets you 80 % of the way.
If you're scanning to decide whether nexo-rs fits your use case, read this page top-to-bottom. If something matches your shape, follow the link.
Channel agents
Installing a channel plugin: each card below needs one. They ship as GitHub Releases tarballs; install with nexo plugin install <owner>/<repo>:
nexo plugin install lordmacu/nexo-plugin-whatsapp
nexo plugin install lordmacu/nexo-plugin-telegram
nexo plugin install lordmacu/nexo-plugin-email
nexo plugin install lordmacu/nexo-plugin-browser
nexo plugin install lordmacu/nexo-plugin-google # Gmail/Calendar/Drive/Sheets
nexo plugin install lordmacu/nexo-plugin-web-search # Brave/Tavily/DuckDuckGo/Perplexity
nexo plugin list
All six also ship to crates.io: cargo install nexo-plugin-web-search (etc.) drops the binary in $HOME/.cargo/bin/ and the daemon's discovery walker picks it up automatically.
(Or build from source: cargo install --git https://github.com/lordmacu/nexo-plugin-whatsapp.) Then reference the channel in your agent YAML, as shown below.
WhatsApp sales agent — qualify leads + book demos
⏱ Build time · 1 afternoon · ⚙️ Layer · agent + WhatsApp plugin
Ana takes inbound WhatsApp messages, qualifies the prospect with a tool that calls your CRM, and books a calendar slot. Persona prompt + 2 tools (crm_lookup, calendar_book) + a YAML, and that's it.
# config/agents.d/ana-sales.yaml
agents:
- id: ana-sales
persona_path: ./personas/ana-sales.md
llm: minimax-m2.5
channels:
- whatsapp:sales-line
tools: [crm_lookup, calendar_book, send_quote]
memory: { long_term: true, vector: true }
→ WhatsApp sales agent recipe → WhatsApp plugin docs
Email triage agent — auto-reply + escalate
⏱ Build time · 1 day · ⚙️ Layer · agent + email plugin + skill bundle
Sweeps Gmail every 5 minutes, classifies inbound messages (invoice / support / spam / sales), auto-replies to the easy buckets, escalates the rest to a human via Telegram with a 1-paragraph summary.
agents:
- id: triage-bot
persona_path: ./personas/triage.md
channels: [email:inbox, telegram:ops-team]
skills: [classify-email, draft-reply, escalate-to-human]
→ Email plugin docs → Skill catalog
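The classify-then-route flow described above can be sketched in a few lines of Rust. This is an illustrative stand-in, not the `classify-email` skill itself: the keyword lists, the `Bucket` and `Action` enums, and the choice of which buckets count as "easy" are all assumptions made for the example (including the `Ignore` action for spam, which the source does not specify).

```rust
/// The four buckets named in the text.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Bucket {
    Invoice,
    Support,
    Spam,
    Sales,
}

/// Hypothetical outcomes; `Ignore` is an assumption for the spam bucket.
#[derive(Debug, PartialEq)]
pub enum Action {
    AutoReply,
    Ignore,
    Escalate,
}

/// Toy keyword classifier standing in for an LLM- or regex-based skill.
pub fn classify(subject: &str) -> Bucket {
    let s = subject.to_lowercase();
    if s.contains("invoice") || s.contains("receipt") {
        Bucket::Invoice
    } else if s.contains("unsubscribe") || s.contains("winner") {
        Bucket::Spam
    } else if s.contains("pricing") || s.contains("quote") {
        Bucket::Sales
    } else {
        Bucket::Support
    }
}

/// Assumed policy: invoices are "easy" and get an auto-reply,
/// spam is ignored, everything else goes to a human.
pub fn route(bucket: Bucket) -> Action {
    match bucket {
        Bucket::Invoice => Action::AutoReply,
        Bucket::Spam => Action::Ignore,
        Bucket::Sales | Bucket::Support => Action::Escalate,
    }
}
```

Keeping classification and routing as two separate functions means the policy (which buckets are safe to answer automatically) can change without touching the classifier.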
Google Workspace agent — Gmail + Calendar + Drive
⏱ Build time · 1 afternoon · ⚙️ Layer · agent + Google plugin
OAuth-authenticated agent that can search Gmail, schedule
calendar events, and pull docs from Drive — all through the
generic google_call tool that wraps any *.googleapis.com
endpoint. Token state lives in the agent's workspace; access
tokens auto-refresh.
cargo install nexo-plugin-google
nexo # daemon discovers + spawns
nexo-plugin-google --oauth-once <agent_id> \
--client-id-file ./secrets/google_client_id.txt \
--client-secret-file ./secrets/google_client_secret.txt \
--token-file ./data/workspace/<agent_id>/google_tokens.json \
--scopes gmail.readonly,calendar,drive.readonly \
--workspace-dir ./data/workspace/<agent_id>
agents:
- id: gws
persona_path: ./personas/gws.md
google_auth:
client_id_file: ./secrets/google_client_id.txt
client_secret_file: ./secrets/google_client_secret.txt
scopes: [gmail.readonly, calendar, drive.readonly]
tools: [google_auth_status, google_call]
→ Google plugin docs → Source · github.com/lordmacu/nexo-rs-plugin-google
Customer support copilot — Telegram bot with KB + handoff
⏱ Build time · 2-3 days · ⚙️ Layer · agent + Telegram + vector memory + MCP
Telegram bot answers from your knowledge base (sqlite-vec). When the LLM's confidence drops, it hands off to a human and posts the transcript to your support channel.
agents:
- id: support-copilot
persona_path: ./personas/support.md
channels: [telegram:support-bot]
memory:
vector: true
vector_collections: [kb-faqs, kb-troubleshooting]
tools: [escalate_to_human, search_kb]
→ Telegram plugin → Vector search
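One way to picture the answer-or-handoff decision is a threshold on the best knowledge-base hit. The sketch below assumes retrieval score is used as the confidence signal, which the text does not state; the `Reply` enum, the `decide` function, and the `min_score` parameter are hypothetical names for illustration, not part of nexo-rs.

```rust
/// Hypothetical outcome of one support turn.
#[derive(Debug, PartialEq)]
pub enum Reply {
    Answer(String),
    Handoff { transcript: Vec<String> },
}

/// `hits` are (passage, similarity score) pairs, as a vector search might return.
/// Answer from the best hit when it clears `min_score`; otherwise hand off
/// to a human and pass the transcript along.
pub fn decide(hits: &[(&str, f32)], min_score: f32, transcript: &[&str]) -> Reply {
    let best = hits.iter().copied().max_by(|a, b| a.1.total_cmp(&b.1));
    match best {
        Some((text, score)) if score >= min_score => Reply::Answer(text.to_string()),
        _ => Reply::Handoff {
            transcript: transcript.iter().map(|line| line.to_string()).collect(),
        },
    }
}
```

An empty result set falls through to the handoff branch, so a question with no matching knowledge-base entry is escalated rather than answered from nothing.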
Multi-agent systems
Multi-agent CRM — intake, qualifier, closer
⏱ Build time · 3-5 days · ⚙️ Layer · 3 agents + agent-to-agent delegation
Three coordinated agents over NATS:
- Intake picks up inbound on every channel, normalizes, hands off
- Qualifier scores the lead (BANT or your framework), tags
- Closer (only on hot leads) drafts proposal + books call
Communicate via agent.route.<target_id> topics with a
correlation_id to match responses.
flowchart LR
WA[WhatsApp] --> INTAKE[Intake]
EMAIL[Email] --> INTAKE
INTAKE -->|hot lead| QUAL[Qualifier]
QUAL -->|score >= 70| CLOSER[Closer]
CLOSER --> CALENDAR[Calendar tool]
QUAL -->|score < 70| NURTURE[(Drip campaign queue)]
→ Agent-to-agent delegation → Multi-agent coordination
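The delegation pattern above (a per-target topic plus a correlation_id to pair each response with its request) can be sketched without a broker. Only the `agent.route.<target_id>` topic shape and the score threshold of 70 come from the text and flowchart; the `Pending` tracker, the `corr-N` id format, and the function names are hypothetical.

```rust
use std::collections::HashMap;

/// Topic for delegating to another agent, following `agent.route.<target_id>`.
pub fn route_topic(target_id: &str) -> String {
    format!("agent.route.{target_id}")
}

/// Mirrors the flowchart: scores of 70 and above go to the Closer agent,
/// anything lower goes to the drip campaign queue (no agent target).
pub fn next_agent(score: u8) -> Option<&'static str> {
    if score >= 70 { Some("closer") } else { None }
}

/// Hypothetical in-memory tracker for requests awaiting a response.
#[derive(Default)]
pub struct Pending {
    next_id: u64,
    waiting: HashMap<String, String>, // correlation_id -> lead id
}

impl Pending {
    /// Register an outgoing request and return its correlation id.
    pub fn send(&mut self, lead: &str) -> String {
        self.next_id += 1;
        let correlation_id = format!("corr-{}", self.next_id);
        self.waiting.insert(correlation_id.clone(), lead.to_string());
        correlation_id
    }

    /// Match a response to its request; each id resolves at most once.
    pub fn resolve(&mut self, correlation_id: &str) -> Option<String> {
        self.waiting.remove(correlation_id)
    }
}
```

Removing the entry on `resolve` means a duplicated or replayed response finds nothing to match, which is the property a correlation id is there to provide.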
Internal ops bot — Slack via MCP + AWS tools + cron
⏱ Build time · 1-2 days · ⚙️ Layer · agent + MCP + cron skills
A bot in your team's Slack (via MCP server) that answers "what's broken in prod", schedules nightly DB snapshots, and posts the daily cost report at 9 AM.
agents:
- id: ops-bot
persona_path: ./personas/ops.md
channels: [mcp:slack-team]
tools: [aws_logs, aws_cost, db_snapshot]
cron:
- "0 9 * * *" # daily cost report
- "0 2 * * *" # nightly DB snapshot
→ MCP channels → Cron schedule tools
SaaS products
WhatsApp meta-creator SaaS — clients build their own agents
⏱ Build time · 4-8 weeks · ⚙️ Layer · microapp + multi-tenant + WhatsApp Web UI
A SaaS where end-users sign up and build their own WhatsApp agent through a WhatsApp-Web-style React UI. Each client gets isolated state, their own agents, their own knowledge base. The framework runs out of view; the microapp owns the UX.
// Provision a tenant from the microapp backend
admin.create_tenant(TenantSpec {
    id: "client-acme".into(),
    plan: "pro".into(),
    quotas: Quotas { agents: 10, llm_tokens_month: 5_000_000 },
}).await?;
→ Microapps · getting started
→ agent-creator reference
→ Multi-tenant SaaS
Vertical SaaS — sales / support / marketing extension pack
⏱ Build time · 2-3 weeks · ⚙️ Layer · extension + multi-tenant
Bundle your domain expertise as an extension: 5 tools + 3 advisors + 8 skills + an MCP server adapter. Operators run nexo ext install ./your-pack and your vertical lights up across all their tenants.
→ Extension manifest → Extension templates
Personas — pre-built agent packs
Persona packs bundle a ready-to-run agent (system prompt + plugin
bindings + workspace seed + secrets templates) you install with one
command. Distinct from plugins (plugins register CODE; personas
register CONFIG). Authored against the v2 manifest schema, published
as GitHub Releases, installed via nexo persona install.
# Browse + install:
nexo persona install lordmacu/nexo-persona-cody
nexo persona list
Available today:
| Pack | Persona | Channels | Use case |
|---|---|---|---|
| lordmacu/nexo-persona-cody | Cody (programmer pair) | Telegram, WhatsApp | Drives Claude Code goals from chat. Reads PHASES.md, dispatches one phase at a time, audits the diff before declaring done. Self-modify by default (with git-worktree isolation); production opts out via NEXO_DISALLOW_SELF_MODIFY=1. |
| lordmacu/nexo-persona-ana-template | Ana (sales / lead capture) | WhatsApp | Hardened single-tool template for inbound WhatsApp lead capture. allowed_tools whitelist + outbound_allowlist.whatsapp defense-in-depth: a jailbroken prompt cannot exfiltrate leads to an attacker number. Operator customizes the advisor phone + sales script before going live. |
| lordmacu/nexo-persona-marketing-multiclient-template | Multi-client marketing | configurable | Three distinct agents (intake / retention / exec) on one daemon, each with its own LLM (MiniMax M2.5 / Claude Haiku 4.5 / DeepSeek v4 flash) + own proactive cadence + own daily turn budget. Demonstrates the multi-tenant single-install pattern. |
More on the way. See the Cody README for the v2 manifest shape if you want to publish your own. Inner-loop dev with nexo persona run /path/to/local/pack boots the daemon against an unpackaged dir.
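The defense-in-depth idea behind the Ana template (a tool whitelist plus an outbound recipient allowlist) can be sketched as two independent checks that must both pass. The `Guard` type, the tool name `whatsapp_send`, and the phone numbers are hypothetical; only the `allowed_tools` and `outbound_allowlist.whatsapp` names come from the table above, and how nexo-rs enforces them internally is not shown here.

```rust
use std::collections::HashSet;

/// Hypothetical guard combining both allowlists.
pub struct Guard {
    allowed_tools: HashSet<String>,
    outbound_whatsapp: HashSet<String>,
}

impl Guard {
    pub fn new(tools: &[&str], numbers: &[&str]) -> Self {
        Self {
            allowed_tools: tools.iter().map(|t| t.to_string()).collect(),
            outbound_whatsapp: numbers.iter().map(|n| n.to_string()).collect(),
        }
    }

    /// Both checks must pass: the tool must be whitelisted AND the
    /// recipient must be on the outbound allowlist.
    pub fn check(&self, tool: &str, recipient: &str) -> Result<(), String> {
        if !self.allowed_tools.contains(tool) {
            return Err(format!("tool `{tool}` is not in allowed_tools"));
        }
        if !self.outbound_whatsapp.contains(recipient) {
            return Err(format!(
                "recipient {recipient} is not in outbound_allowlist.whatsapp"
            ));
        }
        Ok(())
    }
}
```

Because the recipient check runs outside the model, a prompt that talks the agent into calling the send tool with an attacker's number is still rejected before anything leaves the daemon.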
→ Installing personas (full guide)
Specialized agents
Browser scraping agent — URL → structured data
⏱ Build time · 1-2 days · ⚙️ Layer · agent + browser plugin
Receives URLs (via webhook / Telegram / API), uses the browser plugin (Chrome DevTools Protocol) to render JS-heavy pages, extracts structured data, publishes results back to a topic. Useful for price monitoring, competitive intel, lead enrichment.
agents:
- id: scraper
persona_path: ./personas/scraper.md
channels: [webhook:scrape-requests]
tools: [browser_navigate, browser_screenshot, browser_extract_text]
Lead notification poller — RSS / API → Telegram alert
⏱ Build time · half a day · ⚙️ Layer · poller + Telegram
A cron-style poller hits an external RSS feed / API every N minutes, dedupes against state, and pings your sales team in Telegram when something matches. Pure config — no LLM call needed on the hot path.
# config/pollers.yaml
pollers:
- id: linkedin-jobs
cron: "*/15 * * * *"
url: "https://linkedin.example/.../feed.atom"
filter:
keyword: ["VP Sales", "Head of Growth"]
publish: plugin.inbound.telegram.sales-alerts
→ Build a poller module → Pollers config
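The "dedupe against state, alert on match" step can be sketched with a set of seen ids and a keyword filter. This is an illustration under assumptions: the `Entry` and `Poller` types are hypothetical, feed fetching and parsing are left out, the state lives in memory rather than on disk, and matching is a case-insensitive substring test, which may differ from what the poller's filter actually does.

```rust
use std::collections::HashSet;

/// One feed item, already parsed (fetching and parsing are out of scope here).
pub struct Entry<'a> {
    pub id: &'a str,
    pub title: &'a str,
}

/// Hypothetical poller state: seen ids plus lowercase keywords.
pub struct Poller {
    seen: HashSet<String>,
    keywords: Vec<String>,
}

impl Poller {
    pub fn new(keywords: &[&str]) -> Self {
        Self {
            seen: HashSet::new(),
            keywords: keywords.iter().map(|k| k.to_lowercase()).collect(),
        }
    }

    /// Returns the titles to alert on: entries that match a keyword
    /// and have not been alerted on in an earlier tick.
    pub fn tick(&mut self, entries: &[Entry<'_>]) -> Vec<String> {
        let mut alerts = Vec::new();
        for entry in entries {
            let title = entry.title.to_lowercase();
            let matches = self.keywords.iter().any(|k| title.contains(k.as_str()));
            // `insert` returns false when the id was already recorded.
            if matches && self.seen.insert(entry.id.to_string()) {
                alerts.push(entry.title.to_string());
            }
        }
        alerts
    }
}
```

Running `tick` twice over the same feed produces alerts only the first time, which is what keeps a 15-minute cron from re-pinging the team about the same posting.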
MCP server from Claude Desktop — expose your tools to Claude
⏱ Build time · 1 hour · ⚙️ Layer · agent as MCP server
Run nexo-rs as an MCP server. Claude Desktop (or any MCP client) sees every tool / agent / skill you've configured as if they were native. Build internal Claude integrations without writing TS.
→ MCP server from Claude Desktop → Agent as MCP server
Where to next
- Pick the closest match → follow its link → adapt to your data.
- Read the Quickstart first if you don't already have a binary running.
- Browse the Recipes section in the sidebar for end-to-end deploy walkthroughs.
- If you're building a SaaS, jump straight to Microapps · getting started.
Install nexo-rs (LLM-friendly guide)
Share this URL with any AI assistant to teach it how to install nexo-rs end-to-end on Linux or Termux:
https://lordmacu.github.io/nexo-rs/install-for-ai.html
The page is intentionally linear: copy-paste each block in order. No menus to navigate, no marketing, every command is deterministic.
What you are installing
- nexo-rs: a Rust multi-agent framework. Single agent binary + a cloudflared tunnel for the web admin UI.
- Source: https://github.com/lordmacu/nexo-rs
- Docs (full): https://lordmacu.github.io/nexo-rs/
- License: dual MIT / Apache-2.0
Pick your platform
- Linux (Debian / Ubuntu / Arch / Fedora) → §A
- Termux (Android, no root) → §B
Skip the section that doesn't apply.
§A — Linux install
A.1. System packages
Debian / Ubuntu:
sudo apt update
sudo apt install -y build-essential pkg-config libsqlite3-dev git curl
Arch:
sudo pacman -Syu --needed base-devel pkgconf sqlite git curl
Fedora:
sudo dnf install -y @development-tools pkgconf-pkg-config sqlite-devel git curl
A.2. Rust toolchain
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
source "$HOME/.cargo/env"
rustup component add rustfmt clippy
A.3. Clone + build
git clone https://github.com/lordmacu/nexo-rs
cd nexo-rs
cargo build --release --bin agent
The compiled binary is at ./target/release/agent. Copy it into
PATH (optional):
sudo install -m 0755 target/release/agent /usr/local/bin/agent
A.4. First-run wizard
agent setup
Follow the interactive prompts. Defaults are sane. The wizard
writes config/agents.d/<your-agent>.yaml, IDENTITY.md,
SOUL.md, and any channel YAMLs you opt into.
A.5. Run
agent
Or, for the web admin (loopback HTTP + Cloudflare tunnel):
agent admin
The admin command prints a one-time URL + password to stdout. Open the URL, log in, and configure from the browser.
A.6. (Optional) systemd service
sudo useradd -r -s /bin/false -d /srv/nexo-rs nexo
sudo mkdir -p /srv/nexo-rs
sudo cp -r config target/release/agent /srv/nexo-rs/
sudo chown -R nexo:nexo /srv/nexo-rs
sudo tee /etc/systemd/system/nexo-rs.service > /dev/null <<'EOF'
[Unit]
Description=nexo-rs agent
After=network.target
[Service]
Type=simple
User=nexo
WorkingDirectory=/srv/nexo-rs
ExecStart=/srv/nexo-rs/agent --config /srv/nexo-rs/config
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now nexo-rs
Logs: journalctl -u nexo-rs -f.
A.7. (Optional) NATS broker
A single-process install does not need NATS — the runtime falls back to in-process channels. Add NATS only when scaling beyond one host:
curl -L -o /tmp/nats.tar.gz \
https://github.com/nats-io/nats-server/releases/download/v2.10.20/nats-server-v2.10.20-linux-amd64.tar.gz
tar -xzf /tmp/nats.tar.gz -C /tmp
sudo mv /tmp/nats-server-*/nats-server /usr/local/bin/
sudo systemctl enable --now nats-server # if you have a unit file
Then in config/broker.yaml set kind: nats and url: nats://127.0.0.1:4222.
§B — Termux install (Android, no root)
B.1. Termux from F-Droid
Install Termux from https://f-droid.org/en/packages/com.termux/. Do not install from the Google Play Store — that build is outdated.
Open Termux. Then:
pkg update
pkg upgrade -y
B.2. Build dependencies
pkg install -y rust git curl sqlite openssl clang pkg-config
Optional extras (only the ones you'll use):
# media transcoding + OCR + youtube downloads
pkg install -y ffmpeg tesseract yt-dlp
# tmux for long-running tunnels and ssh
pkg install -y tmux openssh
# headless Chromium for the browser plugin
pkg install -y tur-repo
pkg install -y chromium
# Termux:API for sensors / SMS / clipboard
pkg install -y termux-api
# (also install the Termux:API companion app from F-Droid)
B.3. Clone + build
cd ~
git clone https://github.com/lordmacu/nexo-rs
cd nexo-rs
cargo build --release --bin agent
B.4. First-run wizard
./target/release/agent setup
B.5. Run
./target/release/agent
Or with the admin UI (the cloudflared tunnel works on Termux):
./target/release/agent admin
B.6. Keep running with the screen off
Termux apps get killed on doze unless you disable battery optimizations and acquire a wake lock:
- Disable optimizations: Android Settings → Apps → Termux → Battery → Unrestricted.
- Wake lock: in Termux, type:
termux-wake-lock
- (Optional) auto-restart on boot: install Termux:Boot from F-Droid, then create ~/.termux/boot/00-nexo-rs:
mkdir -p ~/.termux/boot
cat > ~/.termux/boot/00-nexo-rs <<'EOF'
#!/data/data/com.termux/files/usr/bin/sh
termux-wake-lock
cd ~/nexo-rs
./target/release/agent --config ./config >> ~/nexo-rs/agent.log 2>&1
EOF
chmod +x ~/.termux/boot/00-nexo-rs
B.7. Termux-specific tip — Chromium flags
The browser plugin (plugins: [browser]) needs the right Chromium
launch flags on Termux. The defaults already cover Android; nothing
extra to set. Just make sure chromium is on PATH (it is, after
pkg install chromium).
Config layout (both platforms)
After agent setup runs, the project tree looks like:
nexo-rs/
├── config/
│ ├── agents.yaml # opt-in dev defaults
│ ├── agents.d/ # your agents land here
│ │ └── <slug>.yaml
│ ├── broker.yaml # NATS or local
│ ├── llm.yaml # provider keys + model
│ └── plugins/ # one YAML per channel plugin
├── secrets/ # mode 0600 token files (gitignored)
├── data/ # SQLite databases (memory, taskflow, transcripts)
├── target/release/agent # the built binary
└── agent.log # if you redirected stdout
Edit YAML by hand or use the web admin (agent admin).
Troubleshooting
- cargo build fails with linker errors on Linux: install build-essential and pkg-config (§A.1).
- cargo build hits out of memory on Termux: close other apps, or build with one job (cargo build --release -j 1).
- agent exits immediately with failed to load config: run agent setup first; the wizard creates the missing files.
- WhatsApp QR pairing fails on Termux: make sure the device is on the same network as your phone, then open the QR pairing URL the daemon prints.
- Admin tunnel URL doesn't respond: Cloudflare's quick tunnel occasionally rotates; restart agent admin and copy the new URL.
Useful commands after install
agent --help # all subcommands
agent doctor capabilities --json # which env toggles are armed
agent setup doctor # audit configured secrets
agent ext doctor --json # extension health
agent flow list # taskflow admin
agent dlq list # dead-letter queue
Full reference: https://lordmacu.github.io/nexo-rs/cli/reference.html
When asking an AI for help
Paste this URL into your prompt:
Install nexo-rs from https://lordmacu.github.io/nexo-rs/install-for-ai.html
on this machine. The OS is <Linux distro / Termux>. Stop after each
section to confirm output looks right.
The page above is the canonical, copy-paste-friendly install path. The full mdBook (https://lordmacu.github.io/nexo-rs/) covers the same ground in more depth — link there once the agent is up.
Installation
Pick the channel that matches your environment. Every channel
produces the same nexo binary; the differences are in how it
gets onto your machine and which dependencies come bundled.
Channel matrix
| Channel | When to pick it | Time to first run | Needs Rust? |
|---|---|---|---|
| Pre-built binary, Linux/macOS (install.sh) | Just want nexo on PATH, fast | ~10 s | No |
| Pre-built binary, Windows (install.ps1) | Native Windows, PowerShell | ~10 s | No |
| crates.io (cargo install nexo-rs) | You already have a Rust toolchain | ~3-5 min build | Yes |
| Debian / Ubuntu (.deb) | systemd host, apt integration | ~10 s | No |
| Fedora / RHEL / Rocky (.rpm) | systemd host, dnf integration | ~10 s | No |
| Termux (.deb, aarch64) | Phone-hosted personal agent | ~10 s | No |
| Docker (GHCR) | Production, CI, "just works" + bundled Chrome/ffmpeg/… | ~30 s | No |
| Nix flake | NixOS, reproducible dev shell | ~3-5 min cold | (Nix) |
| Native (no Docker), from source | Track main, full control | ~10-15 min | Yes |
Quickest path — pre-built binary
Linux / macOS (also Windows from Git Bash):
curl -fsSL https://lordmacu.github.io/nexo-rs/install.sh | bash
Windows (PowerShell):
irm https://lordmacu.github.io/nexo-rs/install.ps1 | iex
Detects your OS + arch (Linux x86_64 / aarch64 static-musl,
macOS Intel / Apple Silicon, Windows x86_64-MSVC), downloads the
matching release artifact from GitHub Releases,
verifies its sha256, drops nexo (nexo.exe on Windows) onto your
PATH, then installs the bundled channel plugins + a persona. Falls
back to cargo install nexo-rs → cargo install --git if there's
no pre-built binary for your platform. Every release artifact is
cosign-signed.
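The platform-detection step can be pictured as a lookup from (OS, architecture) to a build target, with "no match" triggering the cargo fallback. The installer itself is a shell script; the Rust sketch below only illustrates the mapping. The target triples are standard Rust triples that fit the platforms listed above, but whether the release artifacts are actually named this way is an assumption, as are the function names.

```rust
/// Map an (os, arch) pair to a Rust target triple for the platforms the
/// installer is described as supporting. `None` means there is no pre-built
/// binary, so the installer would fall back to `cargo install`.
pub fn release_target(os: &str, arch: &str) -> Option<&'static str> {
    match (os, arch) {
        ("linux", "x86_64") => Some("x86_64-unknown-linux-musl"),
        ("linux", "aarch64") => Some("aarch64-unknown-linux-musl"),
        ("macos", "x86_64") => Some("x86_64-apple-darwin"),
        ("macos", "aarch64") => Some("aarch64-apple-darwin"),
        ("windows", "x86_64") => Some("x86_64-pc-windows-msvc"),
        _ => None,
    }
}

/// Same lookup for the machine this code was compiled for, using the
/// constants the standard library exposes.
pub fn current_target() -> Option<&'static str> {
    release_target(std::env::consts::OS, std::env::consts::ARCH)
}
```

`std::env::consts::OS` reports `"android"` under Termux, so this particular sketch returns `None` there; the Termux .deb listed in the channel matrix is a separate path.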
Then:
nexo # boots the daemon — zero config required
Windows users: download the .zip from
Releases, or
run the installer under WSL (it then sees Linux).
From crates.io
If you already have a Rust toolchain:
cargo install nexo-rs
Builds + installs the nexo binary into $CARGO_HOME/bin. The
whole nexo-* workspace ships to crates.io, so this resolves
cleanly without a git checkout.
From source
For contributors and operators who want to track main directly:
git clone https://github.com/lordmacu/nexo-rs
cd nexo-rs
cargo build --release --bin nexo
./target/release/nexo --help
The workspace compiles ~45 crates and produces the nexo binary
plus two example smoke-test bins (integration-browser-check,
llm_smoke). Toolchain is pinned to Rust 1.80 (MSRV) via
rust-toolchain.toml — no manual channel selection needed. For
faster iterative builds use cargo build --profile release-fast
(same opt-level, no LTO, ~50 % quicker).
Prerequisites
- Rust 1.80+ (rustup recommended)
- NATS is optional: the daemon defaults to broker.type: local (an in-process stdio bridge, no external server). Run a NATS server only for multi-host clusters:
docker run -p 4222:4222 nats:2.10-alpine
nexo set-broker nats --url nats://localhost:4222
See broker shapes / broker.yaml.
- Git (the memory subsystem uses per-agent workspace-git)
- Chrome / Chromium (only if you plan to use the browser plugin)
Verification
./target/release/nexo --version
cargo test --workspace --lib
nexo --version prints the build provenance line (commit + build
timestamp) so a bug report carries enough context to reproduce.
Bootstrap script
For native or Termux installs, ./scripts/bootstrap.sh automates
the whole process — installs the system deps, downloads NATS if
not present, scaffolds config/, and runs the setup wizard.
./scripts/bootstrap.sh # interactive
./scripts/bootstrap.sh --yes # accept all defaults
The script auto-detects Termux ($PREFIX set) and switches to
pkg install + broker.type: local so you don't need root or
NATS on a phone.
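The Termux auto-detection amounts to one environment check that selects both the package tool and the broker mode. The bootstrap script is shell; this Rust sketch only illustrates the decision. The `Plan`, `Installer`, and `Broker` names are hypothetical, and the non-Termux branch (system package manager plus NATS) is an assumption inferred from the script description, not a documented default.

```rust
#[derive(Debug, PartialEq)]
pub enum Installer {
    TermuxPkg,
    SystemPackageManager,
}

#[derive(Debug, PartialEq)]
pub enum Broker {
    Local,
    Nats,
}

/// Hypothetical summary of what the bootstrap would do.
#[derive(Debug, PartialEq)]
pub struct Plan {
    pub installer: Installer,
    pub broker: Broker,
}

/// A non-empty `PREFIX` is treated as "running under Termux".
pub fn plan(prefix: Option<&str>) -> Plan {
    match prefix {
        Some(p) if !p.is_empty() => Plan {
            installer: Installer::TermuxPkg,
            broker: Broker::Local,
        },
        // Assumed behaviour off-Termux; the source only says NATS is
        // downloaded if not present.
        _ => Plan {
            installer: Installer::SystemPackageManager,
            broker: Broker::Nats,
        },
    }
}

/// Reads `PREFIX` from the real environment.
pub fn plan_from_env() -> Plan {
    plan(std::env::var("PREFIX").ok().as_deref())
}
```

Taking the variable as a parameter in `plan` keeps the decision testable without mutating the process environment.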
Next steps
- Quick start: first agent running in five minutes
- Setup wizard: pair channels and wire secrets
- Docker: compose stack, secrets, GHCR pulls
- Nix flake: nix run, dev shell
- Native install: detailed no-Docker setup
- Termux install: phone-hosted personal agent
Zero-config quickstart
The fastest path from an installed nexo binary to a running
daemon — no YAML editing required, no NATS server, no API key
needed to boot. (Install: curl … install.sh | bash,
cargo install nexo-rs, or a .deb/.rpm — see
Installation.) Once running, configure
incrementally via nexo init (scaffold sample YAMLs),
nexo set-broker (switch broker mode), or the operator UI
(admin RPCs).
Total wall-clock time: 30 seconds from installed binary to live daemon serving health + admin RPCs.
This page reflects the post-Phase-92/93/94/95 ergonomics. For the classical full-control walkthrough (manually edit each YAML, pair channels, talk to a Telegram bot end-to-end), see Quickstart.
1. Run the daemon
nexo
That's it. The daemon discovers its config dir per Phase 92.9 precedence, falls back to baked-in defaults for any YAML that's missing (Phase 93), and starts serving on the default health + admin endpoints.
Expected log:
WARN nexo_config: config dir not found — booting with
Default::default() for every YAML
(0 agents, BrokerKind::Local, 0 llm
providers, sqlite memory at default path)
INFO nexo: broker ready kind=Local url=
INFO nexo: long-term memory ready path=./data/memory.db
INFO plugins.discovery: plugin registry wire complete loaded=0
INFO nexo: pairing initialised
INFO nexo: agent ready — waiting for shutdown signal
What's happening:
- 0 agents: daemon waits for nexo/admin/agents/upsert via the operator UI.
- BrokerKind::Local: in-process tokio::mpsc. No NATS server needed. Subprocess plugins bridge through stdio (Phase 92).
- 0 LLM providers: any tool call to an LLM fails loud with provider_not_found; the daemon stays up.
- SQLite memory at ./data/memory.db: auto-created.
The daemon is now ready to accept admin RPCs to populate state. The next sections walk through adding config either via the operator UI (recommended) or by hand-editing YAMLs.
2. Operator UI (recommended)
If you have the agent-creator-microapp extension installed,
it exposes a web UI for managing agents, LLM providers,
channels, and credentials. Every UI action is an admin RPC
behind the scenes:
| UI route | Admin RPC |
|---|---|
| New agent | nexo/admin/agents/upsert |
| Add LLM provider + key | nexo/admin/llm_providers/upsert |
| Pair WhatsApp | nexo/admin/pairing/start |
| Register credentials | nexo/admin/credentials/register |
| Set marketing rules | nexo/admin/marketing/rules/upsert |
Each RPC writes to the corresponding YAML on disk and
notifies the running daemon via config_watch, so changes
take effect without restart for hot-reloadable subsystems.
If you don't yet have a microapp installed, see
agent-creator-microapp
or skip ahead to YAML scaffolding.
3. Scaffold sample YAMLs
When you want to configure something specific and don't want to read the source for field shapes, ask the daemon to write heavily-commented templates:
nexo init # writes 19 sample YAMLs to ~/.config/nexo/
nexo init --yaml broker # only broker.yaml
nexo init --yaml broker,llm # comma-separated names
nexo init --yaml plugins # all plugins/*.yaml templates
nexo init --output /etc/nexo # custom target dir
nexo init --force # overwrite existing files
nexo init --yaml llm --stdout # emit to stdout (for piping)
Templates cover the four required configs (agents,
broker, llm, memory) plus every optional subsystem
(extensions, mcp, runtime, pollers, taskflow,
transcripts, pairing, webhook_receiver) and the plugin +
persona dirs.
Edit any of them, then restart the daemon (or let
config_watch pick up the change). Empty fields stay at
their defaults — you only fill in what you actually want.
Example: adding a MiniMax LLM provider after nexo init:
nexo init --yaml llm --output ~/.config/nexo
# Edit ~/.config/nexo/llm.yaml — uncomment the minimax block
# Set MINIMAX_API_KEY in your env (the YAML uses ${MINIMAX_API_KEY})
export MINIMAX_API_KEY=your-key
nexo
4. Switch the broker at runtime
broker.yaml type: is the single most operator-tweaked
field. Skip the YAML edit and use the dedicated subcommand:
nexo set-broker local # stdio bridge (default)
nexo set-broker nats --url nats://localhost:4222 # multi-host cluster
nexo set-broker local --no-signal # edit YAML only, no respawn
The subcommand edits broker.yaml in your resolved config
dir (auto-creating the file with defaults if missing) and
sends SIGTERM to the running daemon by default. The
supervisor loop (dev-daemon.sh, systemd, etc.) respawns
and picks up the new config — ~3 second blackout.
--no-signal skips the kill; you control the restart
timing yourself.
When to pick which broker shape: broker shapes architecture.
5. Override config via env vars (12-factor)
For Docker / Kubernetes / CI deployments where YAMLs live in secret mounts at non-canonical paths:
NEXO_BROKER_YAML=/run/secrets/broker.yaml \
NEXO_LLM_YAML=/run/secrets/llm.yaml \
NEXO_AGENTS_YAML=/etc/cfg/agents.yaml \
nexo
Each NEXO_<NAME>_YAML env points at an absolute path. If
set, that file wholesale replaces the YAML the daemon
would otherwise load from the config dir.
Currently supported (Phase 94): NEXO_AGENTS_YAML,
NEXO_BROKER_YAML, NEXO_LLM_YAML, NEXO_MEMORY_YAML.
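The lookup described above can be sketched in a few lines of Rust. This is an illustration only, not the daemon's code: the function name resolve_yaml_path and the injected env_lookup closure are hypothetical, and treating a non-absolute value as "not set" is an assumption based on the "absolute path" wording.

```rust
use std::path::{Path, PathBuf};

/// Pick the file to load for one YAML name (for example "broker").
/// `env_lookup` stands in for `std::env::var` so the logic is easy to test.
fn resolve_yaml_path(
    name: &str,
    config_dir: &Path,
    env_lookup: impl Fn(&str) -> Option<String>,
) -> PathBuf {
    // Build NEXO_<NAME>_YAML, e.g. NEXO_BROKER_YAML for "broker".
    let var = format!("NEXO_{}_YAML", name.to_uppercase());
    match env_lookup(&var) {
        // Assumption: only absolute paths count as an override.
        Some(p) if Path::new(&p).is_absolute() => PathBuf::from(p),
        // Otherwise fall back to <config_dir>/<name>.yaml.
        _ => config_dir.join(format!("{name}.yaml")),
    }
}
```

With NEXO_BROKER_YAML=/run/secrets/broker.yaml set and a config dir of /etc/nexo, this sketch returns the secret mount for broker and /etc/nexo/llm.yaml for llm.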
6. Layered overrides (Kustomize-style)
For ConfigMap base + Secret overlay deployments where you want to override specific fields (not whole files):
nexo --config /etc/nexo --override-from /run/secrets
The daemon loads each YAML from /etc/nexo/<name>.yaml,
then deep-merges the same-named file from /run/secrets/<name>.yaml
on top (per-field for mappings; wholesale replace for
sequences and scalars).
Example:
# /etc/nexo/broker.yaml (committed to git, in ConfigMap)
broker:
  type: nats
  url: nats://placeholder
  persistence:
    enabled: true
# /run/secrets/broker.yaml (mounted from Kubernetes Secret)
broker:
  url: nats://prod-cluster.example.com:4222
Effective config the daemon sees:
broker:
  type: nats # from base
  url: nats://prod-cluster.example.com:4222 # from overlay
  persistence:
    enabled: true # from base (overlay didn't touch)
The same chain applies to all four required YAMLs. Both env
vars and --override-from compose with Phase 93 defaults:
when neither layer has a value, the daemon's Default::default()
takes over.
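The merge rule (per-field for mappings, wholesale replace for sequences and scalars) can be sketched as a small recursive function. The Value enum below is a hypothetical stand-in for a parsed YAML tree, and deep_merge, map, and s are illustrative names, not the daemon's API.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for a parsed YAML value (a real loader would use a YAML crate).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Scalar(String),
    Seq(Vec<Value>),
    Map(BTreeMap<String, Value>),
}

/// Merge `overlay` onto `base`: mappings merge key by key,
/// sequences and scalars are replaced wholesale.
fn deep_merge(base: Value, overlay: Value) -> Value {
    match (base, overlay) {
        (Value::Map(mut b), Value::Map(o)) => {
            for (key, o_val) in o {
                let merged = match b.remove(&key) {
                    Some(b_val) => deep_merge(b_val, o_val),
                    None => o_val,
                };
                b.insert(key, merged);
            }
            Value::Map(b)
        }
        // Any other pairing: the overlay value wins outright.
        (_, o) => o,
    }
}

/// Build a mapping from (key, value) pairs.
fn map(entries: Vec<(&str, Value)>) -> Value {
    Value::Map(entries.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

/// Build a scalar.
fn s(v: &str) -> Value {
    Value::Scalar(v.to_string())
}
```

Feeding the base and overlay broker.yaml trees from the example through deep_merge yields the effective config shown above: type and persistence survive from the base, url comes from the overlay.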
Config dir discovery
When --config <dir> is not passed explicitly, the daemon
resolves the config dir in this order:
1. NEXO_CONFIG_DIR env var
2. ./config relative to cwd (legacy, only when present)
3. $XDG_CONFIG_HOME/nexo or $HOME/.config/nexo
4. ./config as a last-resort error path
Subcommands that read or write config (nexo init,
nexo set-broker) auto-create the directory and any
missing files when they need to — operators don't run
mkdir first.
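A rough sketch of that precedence, with the environment and the filesystem check injected so it can be exercised without touching the real machine. The function name and both closures are hypothetical. It also assumes that step 3 applies whenever XDG_CONFIG_HOME or HOME is set, and that step 4 is reached only when neither is; the text above does not spell that out.

```rust
use std::path::{Path, PathBuf};

/// Resolve the config dir. `env` stands in for `std::env::var`,
/// `dir_exists` for a filesystem check.
fn resolve_config_dir(
    explicit: Option<&str>,
    env: impl Fn(&str) -> Option<String>,
    dir_exists: impl Fn(&Path) -> bool,
) -> PathBuf {
    // --config <dir> always wins.
    if let Some(dir) = explicit {
        return PathBuf::from(dir);
    }
    // 1. NEXO_CONFIG_DIR
    if let Some(dir) = env("NEXO_CONFIG_DIR") {
        return PathBuf::from(dir);
    }
    // 2. Legacy ./config, only when it is present.
    let legacy = PathBuf::from("./config");
    if dir_exists(legacy.as_path()) {
        return legacy;
    }
    // 3. $XDG_CONFIG_HOME/nexo, else $HOME/.config/nexo.
    let xdg = env("XDG_CONFIG_HOME")
        .map(|x| PathBuf::from(x).join("nexo"))
        .or_else(|| env("HOME").map(|h| PathBuf::from(h).join(".config").join("nexo")));
    // 4. Last resort: ./config even though it does not exist.
    xdg.unwrap_or(legacy)
}
```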
Composition summary
The four ergonomic phases stack like this:
┌────────────────────────────────────────────────────────────┐
│ Phase 95: nexo init │
│ Scaffolds sample YAMLs with field-level docs. │
│ ↓ │
│ Operator edits YAML OR uses admin RPC OR set-broker │
│ ↓ │
│ Phase 94: env / override-from │
│ NEXO_<NAME>_YAML overrides; --override-from deep-merges.│
│ ↓ │
│ Phase 92.9: --config / NEXO_CONFIG_DIR / XDG default │
│ Base config dir resolution. │
│ ↓ │
│ Phase 93: zero-config defaults │
│ Any YAML still missing → Default::default(). │
│ ↓ │
│ Phase 92: subprocess broker bridge │
│ `broker.yaml type: local` works even with extracted │
│ subprocess plugins. No NATS server required. │
└────────────────────────────────────────────────────────────┘
Each layer is optional; pick the ones your deployment needs.
Where to next
- Want the classical walkthrough? → Quickstart shows manual YAML editing + Telegram bot end-to-end.
- Deploy targets in detail → .deb, .rpm, Termux, Nix, native.
- Add an agent at runtime → agents.yaml or the operator UI's /agents/new page.
- Broker shapes → broker-shapes.md covers when to pick NATS vs local vs embedded.
- Microapp framework → Microapps · getting started for building operator-facing UIs on top of the daemon.
Quickstart
Goal: by the end of this page you have a running nexo-rs daemon with one agent that replies on Telegram (or WhatsApp) when you send it a message.
Total wall-clock time on a fresh laptop: ~10 minutes. The first
cargo build is the slow step — pre-built binaries skip it
entirely.
In a hurry? Phase 92-95 added a much shorter path: nexo alone now boots a working daemon with zero YAMLs + zero external broker. See zero-config quickstart for the 30-second version. This page covers the classical full-control walkthrough.
What you'll have at the end
You (in Telegram) → "what's the weather in Bogotá?"
Your agent (Ana) → "Looking it up..." (via tool)
Your agent (Ana) → "Currently 18 °C, light rain."
Plus everything wired together — the broker (local stdio bridge by default, or NATS), an LLM provider, a channel plugin, the agent runtime, memory. From here you can swap personas, add tools, pair more channels, or move to a multi-tenant deployment.
1. Install the binary
Pick one — the one-liner is the fastest, no Rust toolchain needed:
# Pre-built binary (Linux x86_64/aarch64, macOS Intel/Apple Silicon).
# Detects your platform, verifies sha256, drops `nexo` on PATH.
curl -fsSL https://lordmacu.github.io/nexo-rs/install.sh | bash
nexo --version
Other paths:
# From crates.io (needs a Rust toolchain)
cargo install nexo-rs
# Debian/Ubuntu (.deb), Fedora/RHEL (.rpm), Termux (aarch64 .deb) —
# grab the file for your arch from the latest release, e.g.:
# https://github.com/lordmacu/nexo-rs/releases/latest
sudo apt install ./nexo-rs_0.1.6_amd64.deb # Debian/Ubuntu
sudo dnf install ./nexo-rs-0.1.6-1.x86_64.rpm # Fedora/RHEL
pkg install ./nexo-rs_0.1.6_aarch64.deb # Termux
# Docker
docker pull ghcr.io/lordmacu/nexo-rs:latest
# From source (track main)
git clone https://github.com/lordmacu/nexo-rs.git
cd nexo-rs && cargo build --release && ./target/release/nexo --version
→ More installers: Installation, .deb, .rpm, Termux, Nix.
2. Start NATS (optional)
Phase 92 onwards, NATS is optional. Subprocess plugins
bridge through the daemon's stdio JSON-RPC channel when
broker.yaml type: local, so single-host dev deployments
don't need an external broker server at all. Skip this step
unless you're building a multi-host cluster.
For multi-host / prod-like setups, start NATS:
docker run -d --name nexo-nats -p 4222:4222 nats:2.10-alpine
# OR native install — see broker-shapes architecture doc
Then later, after the daemon is running:
nexo set-broker nats --url nats://localhost:4222
→ Broker shapes explains when to pick which.
3. Provide an LLM key
Pick one provider. MiniMax M2.5 is the primary; Anthropic and OpenAI-compatible APIs are first-class alternatives.
# Option A — MiniMax (default in shipped config)
export MINIMAX_API_KEY=your-key
export MINIMAX_GROUP_ID=your-group-id
# Option B — Anthropic
export ANTHROPIC_API_KEY=sk-ant-...
# Option C — any OpenAI-compatible endpoint
export OPENAI_API_KEY=sk-...
export OPENAI_BASE_URL=https://api.openai.com/v1
The shipped config/llm.yaml reads each via ${ENV_VAR} — no
hardcoded keys.
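To make the ${ENV_VAR} convention concrete, here is a minimal sketch of placeholder expansion. It is not the loader nexo-rs ships: expand_env and its lookup closure are hypothetical, the ${file:...} form mentioned elsewhere in these docs is not handled, and returning an error for a missing variable is an assumption.

```rust
/// Expand `${NAME}` placeholders using `lookup`.
/// Returns Err with a message naming the first variable that is not set.
fn expand_env(input: &str, lookup: impl Fn(&str) -> Option<String>) -> Result<String, String> {
    let mut out = String::new();
    let mut rest = input;
    while let Some(start) = rest.find("${") {
        // Copy everything before the placeholder verbatim.
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find('}') {
            Some(end) => {
                let name = &after[..end];
                let value = lookup(name).ok_or_else(|| format!("missing env var: {name}"))?;
                out.push_str(&value);
                rest = &after[end + 1..];
            }
            None => {
                // Unterminated placeholder: keep the text as written.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    Ok(out)
}
```

In a real process the lookup would be something like `|k| std::env::var(k).ok()`.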
4. Install the channel plugin + pair it
Channels are subprocess plugins (Phase 81.18 onward). Easiest is Telegram — no QR code, no Signal protocol, just a bot token from BotFather:
# Cargo install drops the binary in ~/.cargo/bin/ — the daemon's
# discovery walker scans that directory on boot, no YAML edit
# required (Phase 81.33 Stage 8 auto-detection).
cargo install nexo-plugin-telegram
nexo plugin list
# Tell BotFather to /newbot, save the token:
export TELEGRAM_BOT_TOKEN=123456:ABC-DEF...
For WhatsApp: cargo install nexo-plugin-whatsapp, then
the setup wizard walks you through QR
pairing.
For Google (Gmail / Calendar / Drive):
cargo install nexo-plugin-google, then run
nexo-plugin-google --oauth-once <agent_id> --device (or omit
--device to use the loopback browser flow). See the
Google plugin docs for the full CLI flag
list.
For Web Search (Brave / Tavily / DuckDuckGo / Perplexity):
cargo install nexo-plugin-web-search, then populate
<config_dir>/plugins/web-search.yaml::instances[].providers
with API key file refs. See the
Web Search plugin docs. DuckDuckGo
works with no API key as the fallback provider.
Six canonical plugins live on crates.io:
whatsapp/telegram/email/browser/google/web-search.
How the daemon finds your plugin
The discovery walker (Phase 81.33 Stage 8) probes every search path on boot. Defaults out of the box:
| Path | Use |
|---|---|
| $HOME/.cargo/bin | cargo install nexo-plugin-X lands here |
| $HOME/.local/share/nexo/plugins | XDG-style per-user install |
| /usr/local/libexec/nexo/plugins | system-wide install |
In each path the walker looks for two shapes:
- A directory containing a nexo-plugin.toml manifest + bin/<plugin-id> entrypoint (classic layout — used when you want to ship multiple files together).
- A bare executable named nexo-plugin-<id>. The walker invokes the binary with --print-manifest (2s timeout), parses stdout as TOML, and registers the plugin if validation passes. This is the layout cargo install produces.
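The two shapes can be told apart with a short check like the one below. This is a simplified sketch rather than the actual walker: PluginShape and classify are hypothetical names, and the sketch skips the bin/<plugin-id> check, the executable-bit check, the --print-manifest probe, and Windows .exe suffixes.

```rust
use std::path::Path;

#[derive(Debug, PartialEq)]
enum PluginShape {
    /// Directory holding a `nexo-plugin.toml` manifest (classic layout).
    ManifestDir,
    /// Bare file named `nexo-plugin-<id>`.
    BareBinary { id: String },
}

/// Classify one entry found in a search path, or return None if it is not a plugin.
fn classify(entry: &Path) -> Option<PluginShape> {
    if entry.is_dir() {
        // Classic layout: the manifest file must be present.
        return entry
            .join("nexo-plugin.toml")
            .is_file()
            .then_some(PluginShape::ManifestDir);
    }
    // Bare layout: the file name carries the plugin id after the prefix.
    let name = entry.file_name()?.to_str()?;
    let id = name.strip_prefix("nexo-plugin-")?;
    (!id.is_empty() && entry.is_file()).then(|| PluginShape::BareBinary { id: id.to_string() })
}
```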
Operators can append paths via
config/plugins/discovery.yaml:
discovery:
  search_paths:
    - /opt/nexo-plugins # site-specific install root
    # Default paths above are STILL scanned — supply
    # `search_paths: []` to opt out entirely.
  auto_detect_binaries: true # opt out by setting to false
  disabled: [] # plugin ids to skip
  allowlist: [] # whitelist when non-empty
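One plausible reading of the disabled and allowlist fields, written out as a predicate. The function name is hypothetical, and the rule that disabled takes priority when an id appears in both lists is an assumption; the YAML comments above do not say which wins.

```rust
/// Decide whether a discovered plugin id should be loaded.
fn is_plugin_enabled(id: &str, disabled: &[&str], allowlist: &[&str]) -> bool {
    // Assumption: an explicit "disabled" entry always wins.
    if disabled.contains(&id) {
        return false;
    }
    // An empty allowlist means "allow everything not disabled".
    allowlist.is_empty() || allowlist.contains(&id)
}
```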
→ Authoring your own plugin: Plugin SDKs → Rust SDK
documents the print_manifest_if_requested(MANIFEST) call that
makes binaries discoverable.
5. Drop a minimal agents.yaml
Scaffold every YAML the daemon knows (heavily commented, sane defaults filled in):
nexo init --output ./config
# Writes 19 sample YAMLs you can edit in place.
# Or just the ones you need: nexo init --yaml broker,llm,agents
Then add an agent to config/agents.yaml (or drop it in
config/agents.d/ana.yaml — the runtime merges that directory in,
alphabetical, and hot-reloads it):
agents:
  - id: ana
    model:
      provider: minimax # minimax | anthropic | openai | gemini | deepseek
      model: MiniMax-M2.5
    plugins: [telegram] # the plugin you installed in step 4
    inbound_bindings:
      - plugin: telegram # which channel may trigger this agent
    system_prompt: |
      You are Ana, a helpful assistant. You answer concisely. You
      speak Spanish if the user does, English otherwise. When you
      don't know something, say so — don't make it up.
(YAML config uses #[serde(deny_unknown_fields)] — a typo'd key
fails fast at boot rather than being silently ignored. Full field
list: agents.yaml reference.)
6. Run the daemon
nexo --config ./config
First boot prints a startup summary. With the defaults from
nexo init (broker type: local), look for something like:
✓ broker ready — kind=Local (stdio bridge, no NATS server)
✓ plugin telegram — registered remote tools (registered_count=6)
✓ Telegram bot @YourBotName online
✓ Loaded 1 agent(s): ana
✓ LLM provider: minimax-m2.5 ready
✓ Memory: SQLite at ./data/memory.db
nexo-rs v0.1.6 ready
(If you'd run nexo set-broker nats … in step 2, the first line
reads broker ready — kind=Nats url=nats://… instead.) If
anything is missing, the log line tells you exactly what to fix —
missing env var, wrong YAML key, channel pair failure.
7. Talk to it
Open Telegram, search for your bot's name, send hola. Within
seconds you'll see Ana's reply — the LLM round-trip plus any tools
the agent decided to call.
You: hola
Ana: ¡Hola! ¿En qué te puedo ayudar?
You: ¿qué clima hace en Bogotá?
Ana: Déjame revisarlo...
Ana: En Bogotá ahora hay 18 °C con lluvia ligera.
(Weather requires a web_fetch or weather tool — see agents.yaml to wire one up.)
What you just ran
sequenceDiagram
participant U as You
participant CH as Telegram plugin (subprocess)
participant B as Broker (local stdio bridge, or NATS)
participant A as Ana (agent runtime)
participant L as MiniMax M2.5
U->>CH: "hola"
CH->>B: publish plugin.inbound.telegram
B->>A: deliver to ana
A->>L: chat.completion(messages, tools)
L-->>A: assistant turn
A->>B: publish plugin.outbound.telegram
B->>CH: deliver
CH-->>U: "¡Hola! ¿En qué te puedo ayudar?"
Every arrow is observable: nexo doctor plugins, the daemon log
(plugin.inbound.* / plugin.outbound.* lines), and — in NATS
mode — topic subscribers.
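The publish and deliver hops in the diagram can be imitated with a toy in-process broker. This sketch uses std::sync::mpsc as a stand-in for the tokio::mpsc channel the local broker is described as using; LocalBroker, agent_turn, and the reply_for closure (which replaces the LLM round-trip) are all hypothetical. Only the topic names come from the text above.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Toy in-process broker: each topic fans out to its subscribers.
#[derive(Default)]
struct LocalBroker {
    topics: HashMap<String, Vec<Sender<String>>>,
}

impl LocalBroker {
    /// Register a subscriber and hand back its receiving end.
    fn subscribe(&mut self, topic: &str) -> Receiver<String> {
        let (tx, rx) = channel();
        self.topics.entry(topic.to_string()).or_default().push(tx);
        rx
    }

    /// Deliver `payload` to every live subscriber; returns how many received it.
    fn publish(&self, topic: &str, payload: &str) -> usize {
        self.topics
            .get(topic)
            .map(|subs| {
                subs.iter()
                    .filter(|tx| tx.send(payload.to_string()).is_ok())
                    .count()
            })
            .unwrap_or(0)
    }
}

/// One agent turn: take a pending inbound message and publish a reply.
/// Returns true when a reply reached at least one subscriber.
fn agent_turn(
    broker: &LocalBroker,
    inbound: &Receiver<String>,
    reply_for: impl Fn(&str) -> String,
) -> bool {
    match inbound.try_recv() {
        Ok(text) => broker.publish("plugin.outbound.telegram", &reply_for(&text)) > 0,
        Err(_) => false,
    }
}
```

A channel plugin would subscribe to plugin.outbound.telegram, publish user text on plugin.inbound.telegram, and the agent side would call agent_turn whenever its inbox has data.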
Where to next
You picked the simplest possible path. Common next moves:
- See real product shapes → What you can build — gallery of 10 deployable use cases.
- Multiple agents on multiple channels → drop more YAML files in config/agents.d/. Hot-reload picks them up without a restart. → Drop-in agents
- Add tools your agent can call → wire a built-in tool, write a custom one, or install an extension pack. → agents.yaml reference
- Build a plugin in your language → Plugin contract (Rust, Python, TypeScript, PHP).
- Build a SaaS on top → Microapps · getting started.
- Production deploy → Hetzner, Fly.io, AWS EC2.
Platform support
Honest matrix of what runs on what, plus the prerequisites each operating system needs for the optional voice / browser / WhatsApp features.
Daemon binary (nexo)
The core daemon — the agent loop, NATS bus, plugin supervisor,
admin API, MCP client/server, memory layer, taskflow runtime —
ships as a single static binary. It compiles against pure-Rust TLS
(rustls) and a bundled SQLite C source, so no system OpenSSL or
libsqlite is required at runtime.
| Platform | Arch | Daemon | How to install |
|---|---|---|---|
| Linux (any glibc / musl distro) | x86_64 | ✅ | curl -fsSL https://lordmacu.github.io/nexo-rs/install.sh | bash · or .deb / .rpm · or cargo install nexo-rs |
| Linux (any glibc / musl distro) | aarch64 | ✅ | curl -fsSL https://lordmacu.github.io/nexo-rs/install.sh | bash · or .deb / .rpm · or cargo install nexo-rs |
| macOS | x86_64 (Intel) | ✅ | curl -fsSL https://lordmacu.github.io/nexo-rs/install.sh | bash · or cargo install nexo-rs |
| macOS | aarch64 (Apple Silicon) | ✅ | curl -fsSL https://lordmacu.github.io/nexo-rs/install.sh | bash · or cargo install nexo-rs |
| Windows | x86_64 | ✅ | Download the .zip from Releases, or cargo install nexo-rs (the bash installer doesn't run natively) |
| Windows (WSL) | x86_64 | ✅ | Same install.sh one-liner as the Linux rows |
| Docker (any host) | amd64 + arm64 | ✅ | docker pull ghcr.io/lordmacu/nexo-rs:latest |
| Android (Termux) | aarch64 | ✅ | pkg install ./nexo-rs_<ver>_aarch64.deb (download from Releases) — or pkg install rust && curl -fsSL https://lordmacu.github.io/nexo-rs/install.sh | bash to build from source |
Installer. The install.sh one-liner detects your OS + arch and downloads the matching pre-built tarball from the latest GitHub release (Linux x86_64 / aarch64 static-musl, macOS Intel / Apple Silicon), verifies its sha256, and drops nexo on your PATH — no Rust toolchain needed. It falls back to cargo install nexo-rs → cargo install --git for platforms with no pre-built binary. Every release artifact (tarball, .deb, .rpm) carries a .sha256 sidecar and a cosign signature.
Native Windows (cmd.exe / PowerShell, no WSL): grab the release
.zip or cargo install nexo-rs. The shell installer is bash-only
by design — use it under WSL if you prefer the one-liner.
Optional features — what compiles per OS
The daemon's default feature set works on every platform above. A
microapp built on top of nexo-microapp-sdk
can opt into extra features that pull additional system
dependencies; this is what changes per OS.
| Feature | What it enables | Linux | macOS | Windows | Termux |
|---|---|---|---|---|---|
| stt-candle | Default-track — inbound voice-note transcription via HuggingFace Candle (pure Rust) | ✅ | ✅ | ✅ | ✅ |
| stt | Legacy — same surface via whisper.cpp C++ binding (whisper-rs) | ✅ | ✅ | ⚠️ needs VS Build Tools 2022 + CMake | ⚠️ needs cmake + clang packages |
| stt-cloud | Cloud STT (native variant) — SttProvider trait + OpenAI Whisper-1 + Groq Whisper-large-v3 (REST). CompositeProvider fallback chain. Pulls reqwest with rustls-tls | ✅ | ✅ | ✅ | ✅ |
| stt-cloud-wasm | Cloud STT (wasm32 variant) — same trait + REST providers as stt-cloud, but reqwest pulled without rustls-tls (browser fetch API handles TLS). Use this for wasm32-unknown-unknown microapps | — (use stt-cloud) | — (use stt-cloud) | — (use stt-cloud) | — (use stt-cloud) |
| stt-cloud-anthropic | Adds Anthropic voice_stream WebSocket leg on top of stt-cloud (Claude.ai OAuth-gated; conversation engine + Deepgram Nova 3) | ✅ | ✅ | ✅ | ✅ |
| stt-cloud-local-candle | Bridge — LocalCandleProvider so the Candle backend joins a CompositeProvider chain as the offline fallback leg + *_then_candle convenience constructors | ✅ | ✅ | ✅ | ✅ |
| voice | Outbound voice replies via Microsoft Edge TTS + pure-Rust opus encoder | ✅ | ✅ | ✅ | ✅ |
| wizard | First-run LLM key probe via reqwest (rustls-tls only) | ✅ | ✅ | ✅ | ✅ |
| enrichment | Disposable-domain classifier + tenant-keyed cache | ✅ | ✅ | ✅ | ✅ |
| tracking | HMAC-signed message + link tokens | ✅ | ✅ | ✅ | ✅ |
| email-template | Block-based email composer + render + asset store | ✅ | ✅ | ✅ | ✅ |
STT backend choice (stt-candle vs stt)
Phase 91 introduced the pure-Rust Candle backend
(stt-candle) as the default track. The legacy whisper-rs path
(stt) is retained for one stability window — Phase 91.12 drops
it once telemetry confirms the migration.
Pick the right one:
- stt-candle (recommended for every target) — HuggingFace Candle ML framework, no C++ build chain. Works out of the box on Linux, macOS, Windows, Termux / Android NDK. Model format is HuggingFace SafeTensors (openai/whisper-tiny and friends); the SDK auto-fetches the weights + tokenizer + config from HF Hub on first call when TranscribeConfig::model_id is set, or loads from a local directory pinned via TranscribeConfig::model_path (air-gapped deployments).
- stt (legacy) — whisper-rs binding to whisper.cpp. Slightly faster on CPU, but the C++ build chain requires a per-target toolchain and breaks Android NDK / WASM cross-compile entirely. Keep it only if you've already shipped GGML .bin models you can't easily migrate yet.
Both backends share the audio-decode pipeline (ogg-opus → s16
PCM → f32) and the public TranscribeConfig / transcribe_file
signature, so swapping is a Cargo feature change with no code
edits at consumer sites.
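The last hop of that pipeline (s16 PCM → f32) is a plain rescale. The sketch below shows one common convention, dividing by 32768 so samples land in [-1.0, 1.0); the function name is hypothetical and the exact scaling the SDK applies is not stated here, so treat the constant as an assumption.

```rust
/// Convert signed 16-bit PCM samples to f32 in the range [-1.0, 1.0).
fn s16_to_f32(samples: &[i16]) -> Vec<f32> {
    samples
        .iter()
        // i16 -> f32 is lossless; dividing by 2^15 maps i16::MIN to -1.0.
        .map(|&s| f32::from(s) / 32768.0)
        .collect()
}
```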
GPU acceleration (opt-in, stt-candle-* sub-features)
The default stt-candle build is CPU-only pure-Rust so it
cross-compiles to every target the workspace ships. Hardware
acceleration is opt-in per build target:
| Cargo feature | Backend | Platform |
|---|---|---|
| stt-candle-metal | Apple Metal | macOS / iOS |
| stt-candle-cuda | NVIDIA CUDA | Linux + Windows |
| stt-candle-accelerate | Apple Accelerate (BLAS) | macOS |
Mix at most one per build. The audio decode + tokenizer pipeline stays identical — only the Tensor backend swaps.
Migration from a stt (whisper-rs) deployment
If you already ship a GGML .bin file and want to switch to
stt-candle:
# 1. Download the equivalent SafeTensors model from HF Hub.
huggingface-cli download openai/whisper-tiny \
--local-dir ./data/whisper-tiny
# 2. Point your microapp config at the new directory.
# Either:
# TranscribeConfig.model_path = "./data/whisper-tiny"
# or, to auto-fetch on first call (HF Hub cache):
# TranscribeConfig.model_id = Some("openai/whisper-tiny")
# 3. Flip the Cargo feature.
# Before: nexo-microapp-sdk = { features = ["stt"] }
# After: nexo-microapp-sdk = { features = ["stt-candle"] }
The whisper-rs path keeps working unchanged during the
transition. Do not enable both features at once in a production
build — stt-candle wins the public re-export when both are on,
so the legacy path becomes effectively unreachable through the
default API.
stt (legacy) — when you still need the C++ toolchain
If you stay on the stt feature, the original platform caveats
still apply:
- Linux: apt install clang cmake (or your distro's equivalent). Most dev machines already have it.
- macOS: Xcode Command Line Tools — xcode-select --install. Provides clang; cmake is not part of the Command Line Tools, so install it separately (for example brew install cmake).
- Windows: Visual Studio Build Tools 2022 (the "Desktop development with C++" workload, or just MSVC + CMake from the individual components page) — no full Visual Studio IDE required. Plus cmake from https://cmake.org/download/. After install, open a "Developer Command Prompt for VS 2022" the first time so cl.exe is on PATH.
- Termux: pkg install cmake clang from inside the Termux shell. Note that whisper.cpp performance on Android / Termux is noticeably lower than desktop CPUs; for production STT in Termux, consider stt-candle (which compiles trivially in Termux) or routing transcription to an upstream daemon.
Once the C++ build succeeds the first time, subsequent rebuilds are cached — operators usually pay this cost once during initial setup and never again.
Cloud STT (stt-cloud*) — REST + WebSocket backends
For deployments where on-device inference isn't a good fit (SaaS hot path, WASM browser microapps, metered cellular devices) the SDK ships a cloud STT path. Three providers, one fallback chain primitive, three one-line convenience constructors:
| Cargo feature | What it adds |
|---|---|
| stt-cloud | SttProvider trait + CompositeProvider fallback chain + OpenAiProvider (Whisper-1 REST) + GroqProvider (Whisper-large-v3 REST) + transcribe_file_with_chain helper |
| stt-cloud-anthropic | Adds AnthropicVoiceStream — full WebSocket client for wss://api.anthropic.com/api/ws/speech_to_text/voice_stream (OAuth-gated; the same conversation engine + Deepgram Nova 3 stack Claude Code itself uses for voice input) |
| stt-cloud-local-candle | Adds LocalCandleProvider so the local Candle backend joins fallback chains as the offline-backup leg, plus anthropic_then_candle / openai_then_candle / groq_then_candle convenience constructors |
Cloud-first with local fallback — one line
When stt-cloud-local-candle is on, compose any cloud primary
with a local Candle backup in one call:
use std::sync::Arc;
use nexo_microapp_sdk::stt::{TranscribeConfig, cloud};

let candle_cfg = Arc::new(TranscribeConfig {
    model_id: Some("openai/whisper-tiny".into()),
    lang_hint: Some("es".into()),
    ..Default::default()
});

// Anthropic voice_stream → Candle fallback:
let chain = cloud::anthropic_then_candle(oauth_token, candle_cfg.clone());
// Or OpenAI / Groq REST → Candle fallback:
// let chain = cloud::openai_then_candle(api_key, candle_cfg.clone());
// let chain = cloud::groq_then_candle(api_key, candle_cfg);

let transcript = cloud::transcribe_file_with_chain(
    std::path::Path::new("/tmp/voice-note.ogg"),
    &chain,
    Some("es"),
).await?;
The fallback fires on transport errors (HTTP 5xx, network
unreachable, WebSocket disconnect). Hard audio errors
(EmptyAudio, UnsupportedFormat, Decode) short-circuit —
the next leg would hit the same problem on the same bytes.
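The fall-through rule can be modelled independently of the SDK. In the sketch below, SttError, Leg, and transcribe_with_chain are hypothetical stand-ins (synchronous closures instead of the real async providers); only the split between transport errors and hard audio errors is taken from the paragraph above.

```rust
#[derive(Debug, Clone, PartialEq)]
enum SttError {
    /// HTTP 5xx, network unreachable, WebSocket disconnect.
    Transport(String),
    EmptyAudio,
    UnsupportedFormat,
    Decode,
}

impl SttError {
    /// Only transport failures are worth retrying on the next leg.
    fn is_transport(&self) -> bool {
        matches!(self, SttError::Transport(_))
    }
}

/// One provider in the chain, reduced to a plain closure.
type Leg<'a> = Box<dyn Fn(&[u8]) -> Result<String, SttError> + 'a>;

/// Try each leg in order. Transport errors fall through to the next leg;
/// hard audio errors return immediately because every leg sees the same bytes.
fn transcribe_with_chain(audio: &[u8], legs: &[Leg<'_>]) -> Result<String, SttError> {
    let mut last_err = SttError::Transport("empty chain".to_string());
    for leg in legs {
        match leg(audio) {
            Ok(text) => return Ok(text),
            Err(err) if err.is_transport() => last_err = err,
            Err(err) => return Err(err),
        }
    }
    Err(last_err)
}
```

If every leg fails with a transport error, this sketch reports the last one; how the real CompositeProvider aggregates errors is not covered here.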
Anthropic voice_stream — Claude.ai OAuth required
AnthropicVoiceStream connects to the same endpoint Claude
Code uses internally:
wss://api.anthropic.com/api/ws/speech_to_text/voice_stream.
Requires a Claude.ai subscriber OAuth token (not a regular
Anthropic API key — different auth surface).
Wire format: linear16 PCM @ 16 kHz mono, JSON control frames,
binary audio frames. The SDK collapses the streaming
endpoint to a one-shot call: open WS, send buffer, send
{"type":"CloseStream"}, drain until the 4-trigger finalize
state machine resolves (PostCloseStreamEndpoint @ ~300 ms /
NoDataTimeout @ 1.5 s / SafetyTimeout @ 5 s / WsClose). Live
push-to-talk streaming is a deferred follow-up — see
FOLLOWUPS.md
91.x.wasm.phase-4b.streaming.
WASM (wasm32-unknown-unknown) — REST cloud works, voice_stream deferred
The local backends (stt-candle Candle + stt
whisper-rs) don't compile for wasm32-unknown-unknown today
— the inference stack depends on crates that need kernel
networking (mio) or aren't WASM-clean (opus-wave,
tokenizers with onig, Candle's GEMM kernels).
REST cloud STT works on wasm32. Enable stt-cloud-wasm
(the wasm-clean sibling of stt-cloud — reqwest pulled
without rustls-tls, browser fetch API handles TLS). OpenAI
Whisper-1 + Groq Whisper-large-v3 + the CompositeProvider
fallback chain are fully supported in browser microapps.
The SttProvider trait drops its Send + Sync bounds and uses
async_trait(?Send) on wasm32 because the wasm-bindgen
fetch backend returns futures holding js-sys types that
aren't Send (single-threaded execution model — the bounds
were a native-only thing anyway).
Cross-target microapps select the right feature per-target in their own Cargo.toml:
[target.'cfg(target_arch = "wasm32")'.dependencies]
nexo-microapp-sdk = { workspace = true, features = ["stt-cloud-wasm"] }
[target.'cfg(not(target_arch = "wasm32"))'.dependencies]
nexo-microapp-sdk = { workspace = true, features = ["stt-cloud", "stt-cloud-anthropic", "stt-cloud-local-candle"] }
stt-cloud-anthropic (voice_stream WebSocket) is still
native-only — tokio-tungstenite drags TCP types absent on
wasm32. Browser microapps
wanting voice_stream would need a web-sys::WebSocket-based
swap-in (filed as 91.x.wasm.phase-4c).
Voice (TTS) is portable everywhere
The voice feature uses pure-Rust crates (opus-wave, symphonia,
ogg) plus a websocket call to Microsoft Edge's TTS endpoint. No
C/C++ build, no system audio framework — works the same on Linux,
macOS, Windows, and Termux.
Channels — what Rust + the host OS support
Channels (WhatsApp / Telegram / browser / email) ship as standalone subprocess plugins. Each plugin is its own Rust binary and inherits the same OS support matrix as the daemon:
| Channel | Linux | macOS | Windows | Termux | Notes |
|---|---|---|---|---|---|
| WhatsApp | ✅ | ✅ | ✅ | ✅ | Uses Signal Protocol via the wa-agent upstream crate; pure Rust, all-platform |
| Telegram | ✅ | ✅ | ✅ | ✅ | Bot API long-poll; pure Rust |
| Browser | ✅ | ✅ | ✅ | ⚠️ Chrome must be in PATH; Termux needs pkg install chromium | |
| Email | ✅ | ✅ | ✅ | ✅ | IMAP poll + lettre SMTP; rustls-tls everywhere |
Browser channel caveat — Chromium availability
The browser plugin spawns a Chromium instance via Chrome DevTools
Protocol. The plugin doesn't bundle Chromium; it shells out to
whatever Chrome / Chromium / Edge is in PATH:
- macOS: brew install --cask google-chrome or use an existing Chrome install (/Applications/Google Chrome.app/... path is auto-detected).
- Windows: install Chrome from https://www.google.com/chrome/ and let the plugin auto-detect at default install path.
- Linux servers (headless): install via your distro (apt install chromium) — the plugin runs Chromium headless by default.
- Termux: pkg install chromium — note that Termux's chromium package is significantly older than upstream and some CDP features may misbehave.
What's intentionally NOT in scope today
| Wanted by users? | Why deferred |
|---|---|
| Homebrew formula (brew install nexo-rs) | Requires the macOS targets to land first + a release of the binary on those targets. The tap repo is created; the formula auto-publish will turn on as part of the Phase 27.2 follow-up. |
| npm install -g @nexo-rs/cli | The @nexo-rs/cli npm scope is reserved with a placeholder; the real CLI shim ships when cargo dist re-enables npm in dist-workspace.toml installers. |
| Native Windows MSI / PowerShell installer | Same dist-workspace dependency. The .zip from GH Releases works in the meantime. |
| Apple Silicon / Intel Mac via Homebrew | Tap exists, formula not auto-pushed yet. Curl installer covers both Intel + Apple Silicon directly. |
Reporting platform-specific issues
If nexo --version runs but a particular feature breaks on your
OS, file an issue with the version line + the relevant build
channel (printed by nexo version in verbose mode):
nexo version | head -5
# nexo 0.1.6
# git_sha: …
# channel: tarball-x86_64-apple-darwin
# target: x86_64-apple-darwin
Tag the issue with os:macos, os:windows, os:termux, etc., so
we can track per-platform regressions across releases.
Setup wizard
The setup wizard is the recommended way to configure nexo-rs on a fresh install. It pairs channels, writes secrets, and patches the YAML config files so the runtime boots with everything it needs.
./target/release/agent setup
Run it from the repo root (or wherever your config/ directory lives).
What the wizard does
flowchart TD
START([agent setup]) --> MENU{Menu}
MENU --> LLM[LLM provider]
MENU --> WA[WhatsApp pairing]
MENU --> TG[Telegram bot]
MENU --> GOOG[Google OAuth]
MENU --> MEM[Memory DB location]
MENU --> INFRA[NATS + runtime]
MENU --> SKILLS[Enable / disable skills]
LLM --> WRITE1[Write secrets/<br/>patch llm.yaml]
WA --> QR[Scan QR<br/>write session dir]
TG --> TOKEN[Ask bot token<br/>write secret]
GOOG --> OAUTH[Open browser<br/>PKCE flow]
MEM --> WRITE2[Patch memory.yaml]
INFRA --> WRITE3[Patch broker.yaml]
SKILLS --> WRITE4[Patch extensions.yaml]
WRITE1 --> DONE([Done])
QR --> DONE
TOKEN --> DONE
OAUTH --> DONE
WRITE2 --> DONE
WRITE3 --> DONE
WRITE4 --> DONE
Every step is optional. You can run setup repeatedly — each section
is idempotent.
Steps in detail
LLM provider
Prompts for the default provider (MiniMax, Anthropic, OpenAI-compat,
Gemini). Writes the API key to ./secrets/<provider>_api_key.txt and
ensures config/llm.yaml references it via ${file:...} or the
corresponding env var.
WhatsApp pairing (multi-instance)
Per-agent. Asks which agent you are pairing and which instance label
to use (personal, work, …). Each instance gets its own session
dir under ./data/workspace/<agent>/whatsapp/<instance> and an
allow_agents list (defense-in-depth ACL). The wizard:
- Normalises config/plugins/whatsapp.yaml to sequence form (legacy single-mapping entries are auto-converted on first edit).
- Upserts the entry by instance label.
- Writes credentials.whatsapp: <instance> on the chosen agent's YAML — agents.yaml if the agent lives there, otherwise the matching agents.d/*.yaml.
- Launches the pairing loop and renders the QR as Unicode blocks. Scan with WhatsApp → Settings → Linked Devices.
- Runs the credential gauntlet so any drift surfaces immediately.
Re-run the wizard once per number you want to pair; instance labels are append-friendly.
Telegram bot (multi-instance)
Same shape as WhatsApp. Asks for instance label (default
<agent>_bot) and bot token from @BotFather. Token lands at
./secrets/<instance>_telegram_token.txt with mode 0o600; the
YAML references it via ${file:...} so secrets never live in
telegram.yaml directly. Adds credentials.telegram: <instance>
on the agent.
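A minimal sketch of writing a token file with mode 0o600 on Unix. The helper name write_secret is hypothetical and the wizard's real code may differ; the sketch only illustrates the "owner read/write only" guarantee described above.

```rust
use std::fs::{OpenOptions, Permissions};
use std::io::{self, Write};
use std::os::unix::fs::{OpenOptionsExt, PermissionsExt};
use std::path::Path;

/// Hypothetical helper: write `secret` to `path`, readable by the owner only.
fn write_secret(path: &Path, secret: &str) -> io::Result<()> {
    let mut file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(true)
        .mode(0o600) // only applied when the file is newly created
        .open(path)?;
    // Re-assert the mode in case the file already existed with looser bits.
    file.set_permissions(Permissions::from_mode(0o600))?;
    file.write_all(secret.trim().as_bytes())?;
    file.write_all(b"\n")
}
```

Setting the mode at creation time avoids a window in which the file exists with default permissions.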
Google OAuth
The wizard writes one entry per agent in
config/plugins/google-auth.yaml:
google_auth:
  accounts:
    - id: ana@google
      agent_id: ana
      client_id_path: ./secrets/ana_google_client_id.txt
      client_secret_path: ./secrets/ana_google_client_secret.txt
      token_path: ./secrets/ana_google_token.json
      scopes: [https://www.googleapis.com/auth/gmail.modify]
Two consent flows are offered after the YAML is written:
- Device-code (default; works headless / over SSH): the wizard prints verification_url + a 6-character user_code. Open the URL on any device, type the code, approve. The wizard polls oauth2.googleapis.com/token until approval and persists the refresh_token at token_path (mode 0o600).
- Skip and consent later via the google_auth_start LLM tool, which uses the loopback PKCE flow and requires a local browser.
Scopes are comma-separated at the prompt; defaults to
gmail.modify. Re-running with a different id adds a second
account; re-running with the same id overwrites in place.
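The prompt handling and the add-or-overwrite behaviour can be sketched as below. The Account struct, parse_scopes, and upsert_account are hypothetical names for illustration, and storing the default as a full scope URL is an assumption based on the YAML example.

```rust
/// Assumed default, taken from the YAML example in this section.
const DEFAULT_SCOPE: &str = "https://www.googleapis.com/auth/gmail.modify";

#[derive(Debug, Clone, PartialEq)]
struct Account {
    id: String,
    scopes: Vec<String>,
}

/// Split a comma-separated prompt answer; blank input falls back to the default.
fn parse_scopes(input: &str) -> Vec<String> {
    let scopes: Vec<String> = input
        .split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(String::from)
        .collect();
    if scopes.is_empty() {
        vec![DEFAULT_SCOPE.to_string()]
    } else {
        scopes
    }
}

/// Same id overwrites in place; a new id is appended.
fn upsert_account(accounts: &mut Vec<Account>, account: Account) {
    match accounts.iter_mut().find(|a| a.id == account.id) {
        Some(existing) => *existing = account,
        None => accounts.push(account),
    }
}
```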
Memory DB location
Lets you pick where the SQLite long-term memory file lives. Default is
./data/memory.db. Per-agent isolation is on by default — each agent
gets its own DB file under its workspace.
Infrastructure (NATS + runtime)
Asks for the NATS URL, optional user/password, and timeouts. Patches
config/broker.yaml.
Skills on/off
Lets you selectively disable shipped extensions you don't plan to use (reduces tool surface exposed to the LLM).
Files the wizard touches
| Target | What it writes |
|---|---|
| config/llm.yaml | Provider entries, base_url, auth mode |
| config/plugins/whatsapp.yaml | session_dir, media_dir |
| config/plugins/telegram.yaml | token (via ${file:...}), allow-list |
| config/plugins/google.yaml | OAuth bundle path, scopes |
| config/memory.yaml | DB location |
| config/broker.yaml | NATS URL, creds |
| config/extensions.yaml | enabled/disabled list |
| ./secrets/* | Plaintext secret files (gitignored) |
Every YAML patch preserves existing keys and comments via the
yaml_patch module — your hand edits survive.
Re-running
Re-run agent setup as many times as you want. Paired channels are
detected and skipped unless you explicitly ask to re-pair. To wipe a
paired session:
./target/release/agent setup wipe whatsapp --agent ana
Troubleshooting
- WhatsApp QR expires too fast → the QR refreshes every ~20s; the wizard re-renders. Scan from the phone with a stable network.
- Google OAuth fails with redirect_uri_mismatch → the wizard binds to 127.0.0.1:<port>; make sure your OAuth client allows http://127.0.0.1 as a redirect URI.
- NATS unreachable → the wizard will warn but still write config. The runtime's disk queue will drain once NATS comes back.
Verifying releases
Every Nexo release artifact is signed with Sigstore Cosign using keyless OIDC — no long-lived private key, no PGP key management, no out-of-band trust establishment. The signature is tied to the GitHub Actions workflow run that produced the artifact, and a public record lives in the Rekor transparency log.
Why keyless
Traditional signing requires a long-lived signing key. If it leaks, every past release becomes suspect. Keyless signing instead anchors each signature to:
- The GitHub Actions OIDC identity of the workflow run (https://token.actions.githubusercontent.com)
- The specific repo + workflow file that ran (https://github.com/lordmacu/nexo-rs/.github/workflows/...)
- The commit + ref the workflow built from
A short-lived certificate (10 min validity) is issued by Sigstore's
fulcio CA, the artifact is signed with it, and the whole bundle
is recorded in rekor (immutable). To forge a signature, an
attacker would need to compromise GitHub's OIDC infra and the
exact workflow path — and even then the forgery shows up in the
public log.
Install Cosign
# macOS:
brew install cosign
# Linux (Debian/Ubuntu):
sudo curl -L "https://github.com/sigstore/cosign/releases/latest/download/cosign-linux-amd64" \
  -o /usr/local/bin/cosign
sudo chmod +x /usr/local/bin/cosign
# Linux (Fedora/RHEL):
sudo dnf install cosign
# Verify the install:
cosign version
Verify a Docker image
Every image at ghcr.io/lordmacu/nexo-rs is cosign-signed by the
docker.yml workflow. Verify any tag with:
cosign verify ghcr.io/lordmacu/nexo-rs:latest \
--certificate-identity-regexp 'https://github.com/lordmacu/nexo-rs/.*' \
--certificate-oidc-issuer https://token.actions.githubusercontent.com
A successful verification prints the full certificate + the Rekor entry URL. Anything else (signature missing, identity mismatch, broken cert chain) means don't trust this image — check the release notes, file an issue.
Verify a downloaded binary / .deb / .rpm / .tar.gz
The sign-artifacts.yml workflow attaches three files next to
every release asset:
- <asset>.sig: the raw signature
- <asset>.pem: the leaf certificate
- <asset>.bundle: combined Sigstore bundle (preferred; carries the inclusion proof)
Verify with the bundle (recommended, single command):
cosign verify-blob \
--bundle nexo-rs_0.1.1_amd64.deb.bundle \
--certificate-identity-regexp 'https://github.com/lordmacu/nexo-rs/.*' \
--certificate-oidc-issuer https://token.actions.githubusercontent.com \
nexo-rs_0.1.1_amd64.deb
Or with the standalone .sig + .pem if you prefer:
cosign verify-blob \
--signature nexo-rs_0.1.1_amd64.deb.sig \
--certificate nexo-rs_0.1.1_amd64.deb.pem \
--certificate-identity-regexp 'https://github.com/lordmacu/nexo-rs/.*' \
--certificate-oidc-issuer https://token.actions.githubusercontent.com \
nexo-rs_0.1.1_amd64.deb
Verify in CI / scripted contexts
Drop this in a deploy pipeline:
#!/usr/bin/env bash
set -euo pipefail
ASSET="${1:?usage: $0 <asset-path>}"
BUNDLE="${ASSET}.bundle"
if [ ! -f "$BUNDLE" ]; then
echo "ERROR: $BUNDLE missing — refusing to deploy unsigned artifact" >&2
exit 1
fi
cosign verify-blob \
--bundle "$BUNDLE" \
--certificate-identity-regexp 'https://github.com/lordmacu/nexo-rs/.*' \
--certificate-oidc-issuer https://token.actions.githubusercontent.com \
"$ASSET" \
|| { echo "ERROR: signature verification failed for $ASSET" >&2; exit 2; }
Inspecting the transparency log
Every signature is searchable on Rekor:
# List the signatures, attestations, and SBOMs attached to an image:
cosign tree ghcr.io/lordmacu/nexo-rs:latest
The output shows every cosign-related artifact attached to the image (signatures, attestations, SBOMs) plus the Rekor log index where each was recorded.
What if verification fails
- Identity regex doesn't match: the asset may have been built from a fork / unofficial workflow. Re-download from the GitHub release page directly.
- bundle file missing: older releases (pre-Phase 27.3) don't have signatures. Tag v0.1.1 is the first signed release.
- Cert chain expired / revoked: Sigstore's fulcio root CA has a long lifespan, but the leaf cert is short-lived. cosign automatically fetches the right TUF root; if you see chain errors run cosign initialize to refresh local trust roots.
- Network errors talking to Rekor / Fulcio: both have CDN in front. Retry, or use --insecure-ignore-tlog for local verification (drops the transparency log check; only safe in air-gapped trust contexts).
Out of scope (for now)
- Long-lived PGP keys for the apt / yum repos: needs Phase 27.4 signed-repo work to consume them on the user side. Until that ships, .deb / .rpm signatures live in the Cosign world only.
- A Homebrew bottle-signing path that lets brew validate without the OIDC chain: Phase 27.6 follow-up.
Configuration layout
nexo-rs loads configuration from a single directory (passed via
--config <path>, default ./config). The runtime reads a small set
of required YAML files and a handful of optional ones.
Source: crates/config/src/lib.rs::AppConfig::load.
Directory tree
config/
├── agents.yaml # required — base agent catalog
├── agents.d/ # optional — drop-in agents, merged in alpha order
│ ├── ana.example.yaml # template (committed)
│ └── *.yaml # real definitions (gitignored)
├── broker.yaml # required — NATS / local broker + disk queue
├── llm.yaml # required — LLM providers
├── memory.yaml # required — short-term + long-term + vector
├── extensions.yaml # optional — extension search paths, toggles
├── mcp.yaml # optional — MCP servers the agent consumes
├── mcp_server.yaml # optional — expose this agent as an MCP server
├── tool_policy.yaml # optional — per-tool / per-agent policy
├── runtime.yaml # optional — hot-reload watcher settings
├── plugins/
│ ├── whatsapp.yaml
│ ├── telegram.yaml
│ ├── email.yaml
│ ├── browser.yaml
│ ├── google.yaml
│ └── gmail-poller.yaml
└── docker/ # optional — overrides for containerized runs
├── agents.yaml
├── llm.yaml
└── …
Required vs optional
The loader fails startup if any required file is missing or malformed.
Optional files return None when absent and unlock related features
only if present.
| File | Kind |
|---|---|
| agents.yaml | required |
| broker.yaml | required |
| llm.yaml | required |
| memory.yaml | required |
| extensions.yaml | optional |
| mcp.yaml | optional |
| mcp_server.yaml | optional |
| tool_policy.yaml | optional |
| runtime.yaml | optional. Process runtime knobs: hot-reload + cron policy (one-shot retries and optional cron tool-call execution). Defaults enable reload at 500 ms debounce, one-shot retries (3 attempts, exponential backoff), and keep cron tool-calls disabled. See Config hot-reload. |
| plugins/*.yaml | optional (only needed for plugins you enable) |
Drop-in agents
Files under config/agents.d/*.yaml are merged into the base
agents.yaml in lexicographic filename order. Each file has the
same top-level shape (agents: [...]); entries append to the base
list.
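The merge order can be sketched as follows. YAML parsing is left out: each drop-in file is represented only by the agent ids it declares, and merge_agents is a hypothetical name, not the loader's actual function.

```rust
use std::collections::BTreeMap;

/// Base `agents.yaml` entries first, then each drop-in file in
/// lexicographic filename order; entries append to the list.
fn merge_agents(base: &[&str], drop_ins: &BTreeMap<&str, Vec<&str>>) -> Vec<String> {
    let mut merged: Vec<String> = base.iter().map(|id| id.to_string()).collect();
    // A BTreeMap iterates its keys in sorted order, which gives the
    // filename ordering without an explicit sort.
    for (_filename, agents) in drop_ins {
        merged.extend(agents.iter().map(|id| id.to_string()));
    }
    merged
}
```

With files named 00-dev.yaml and 10-prod.yaml, the entries from 00-dev.yaml land before those from 10-prod.yaml regardless of the order in which the files were created.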
Common patterns:
- 00-dev.yaml / 10-prod.yaml: control override order by numeric prefix
- Keep agents.yaml public-safe and drop sensitive business content (sales prompts, pricing, phone numbers) into gitignored config/agents.d/ana.yaml
- Ship config/agents.d/<name>.example.yaml as a template so the shape stays discoverable
Details in Drop-in agents.
Docker layout
config/docker/ mirrors the main layout and is consumed when the
compose file mounts it at /app/config/docker:
# docker-compose.yml
command: ["agent", "--config", "/app/config/docker"]
Secrets inside Docker containers live at /run/secrets/<name> — the
compose definitions use ${file:/run/secrets/...} references. See
LLM config — auth for the full secret
resolution rules.
Env vars and secrets in YAML
YAML values can reference env vars and files:
| Syntax | Meaning |
|---|---|
| ${VAR} | read env var, fail if unset or empty |
| ${VAR:-fallback} | env var if set and non-empty, else fallback |
| ${VAR-fallback} | env var if set (even empty), else fallback |
| ${file:./secrets/x} | read file contents, trimmed of whitespace |
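The three env-var forms differ only in how an empty value is treated. A minimal sketch, assuming the text between `${` and `}` has already been extracted; resolve_env is a hypothetical name, and a lookup closure stands in for the process environment so the logic can be exercised in isolation.

```rust
/// Resolve the inside of one `${...}` env placeholder.
fn resolve_env(expr: &str, lookup: impl Fn(&str) -> Option<String>) -> Result<String, String> {
    match expr.find('-') {
        // `${VAR}`: unset or empty is an error.
        None => match lookup(expr) {
            Some(value) if !value.is_empty() => Ok(value),
            _ => Err(format!("env var {} not set", expr)),
        },
        Some(dash) => {
            let (head, fallback) = (&expr[..dash], &expr[dash + 1..]);
            match head.strip_suffix(':') {
                // `${VAR:-fallback}`: an empty value counts as unset.
                Some(name) => Ok(match lookup(name) {
                    Some(value) if !value.is_empty() => value,
                    _ => fallback.to_string(),
                }),
                // `${VAR-fallback}`: an empty value is kept as-is.
                None => Ok(lookup(head).unwrap_or_else(|| fallback.to_string())),
            }
        }
    }
}
```

This is why `${MINIMAX_API_KEY:-}` resolves to an empty string instead of failing when the variable is absent.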
Path-traversal rules for ${file:...}:
- Relative paths are rooted at the current working directory
- .. segments are rejected outright
- Absolute paths must sit under one of these whitelisted roots:
  - /run/secrets/ (Docker secrets)
  - /var/run/secrets/ (Kubernetes projected volumes)
  - ./secrets/ (project-local)
  - the directory pointed at by $CONFIG_SECRETS_DIR (operator-defined)
Everything else is refused at parse time with an explicit error naming the invalid path and the allowed roots.
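These rules can be sketched as a pure path check. check_file_ref is a hypothetical name; the sketch assumes relative paths without `..` are accepted as-is and that ./secrets/ means `<cwd>/secrets`. The second error message is illustrative, not the loader's actual wording.

```rust
use std::path::{Component, Path, PathBuf};

/// Validate a `${file:...}` target and return the path that would be read.
/// `secrets_dir` stands in for the value of $CONFIG_SECRETS_DIR.
fn check_file_ref(raw: &str, cwd: &Path, secrets_dir: Option<&Path>) -> Result<PathBuf, String> {
    let path = Path::new(raw);
    if path.components().any(|c| c == Component::ParentDir) {
        return Err(format!("${{file:{}}}: `..` not allowed in file reference", raw));
    }
    if path.is_relative() {
        // Assumption: relative paths are simply rooted at the working directory.
        return Ok(cwd.join(path));
    }
    let project_secrets = cwd.join("secrets");
    let roots = [
        Some(Path::new("/run/secrets")),
        Some(Path::new("/var/run/secrets")),
        Some(project_secrets.as_path()),
        secrets_dir,
    ];
    // `starts_with` compares whole components, so /run/secrets-evil does not match.
    if roots.iter().flatten().any(|root| path.starts_with(root)) {
        Ok(path.to_path_buf())
    } else {
        Err(format!("${{file:{}}}: path is outside the allowed roots", raw))
    }
}
```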
Validation
All config structs deserialize with #[serde(deny_unknown_fields)], so
typos fail fast:
unknown field `modl`, expected `model`
at line 4, column 5 in config/agents.yaml
Missing required fields produce the same kind of message:
missing field `model`
at line 5, column 3 in config/agents.yaml
Env / file resolution errors identify the placeholder and the file:
env var MINIMAX_API_KEY not set (referenced in llm.yaml)
${file:../etc/passwd}: `..` not allowed in file reference (in broker.yaml)
Boot sequence
flowchart TD
START([agent --config path]) --> LOAD[AppConfig::load]
LOAD --> REQ{required files<br/>present & parseable?}
REQ -->|no| FAIL([fail fast, exit 1])
REQ -->|yes| OPT[read optional files]
OPT --> DROP[merge config/agents.d/]
DROP --> RESOLVE[resolve env / file placeholders]
RESOLVE --> VAL[struct-level validation<br/>deny_unknown_fields]
VAL --> SEM[semantic validation<br/>validate_agents, MCP headers]
SEM --> READY([AppConfig ready])
Next
- agents.yaml — full agent schema
- llm.yaml — LLM provider schema + auth modes
- broker.yaml — NATS + disk queue
- memory.yaml — short/long/vector
- Drop-in agents — merge order and patterns
agents.yaml
The agent catalog. One entry per agent; each entry declares the model, channels, tools, sandboxing, and behavioral knobs for that agent.
Source: crates/config/src/types/agents.rs.
Top-level shape
agents:
  - id: ana
    model:
      provider: minimax
      model: MiniMax-M2.5
    plugins: [whatsapp]
    inbound_bindings:
      - plugin: whatsapp
    allowed_tools:
      - whatsapp_send_message
    outbound_allowlist:
      whatsapp:
        - "573000000000"
    system_prompt: |
      You are Ana, …
Full field reference
All fields use #[serde(deny_unknown_fields)] — typos fail fast.
Identity & model
| Field | Type | Required | Default | Purpose |
|---|---|---|---|---|
| id | string | ✅ | — | Unique agent id. Used as session key, subject suffix, workspace dir name. |
| model.provider | string | ✅ | — | Provider key in llm.yaml (e.g. minimax, anthropic). |
| model.model | string | ✅ | — | Model id understood by that provider. |
| description | string | — | "" | Human-readable role. Injected into # PEERS for delegation discovery. |
Channels
| Field | Type | Default | Purpose |
|---|---|---|---|
| plugins | [string] | [] | Plugin ids this agent wants to expose tools for (whatsapp, telegram, browser, …). |
| inbound_bindings | array | [] | Per-plugin binding list. Empty = receive nothing from plugin.inbound.* (strict mode). |
Each inbound_bindings[] entry can override the agent-level
defaults for that channel: allowed_tools, outbound_allowlist,
skills, model, system_prompt_extra, sender_rate_limit,
allowed_delegates. Useful for running the same agent on two channels
with different rules. See Per-binding capability override
below for the full override surface and merge rules.
Binding match rules are strict on (plugin, instance):
- instance omitted/null only matches plugin.inbound.<plugin>
- instance: foo only matches plugin.inbound.<plugin>.foo
Tool sandboxing
| Field | Type | Default | Purpose |
|---|---|---|---|
| allowed_tools | [string] | [] | Build-time pruning of the tool registry. Glob suffix * allowed. Empty = all tools registered. |
| tool_rate_limits | object | null | Per-tool rate limit patterns. Glob-matched. |
| tool_args_validation.enabled | bool | true | Toggle JSON-schema validation of tool arguments. |
| outbound_allowlist | object | {} | Per-plugin recipient allowlist (e.g. phone numbers, chat ids). Defense-in-depth for send tools. |
allowed_tools semantics:
- For legacy agents (no inbound_bindings) the allowlist is applied at registry-build time: tools not matching the patterns are removed from the registry before the LLM sees them.
- For agents with inbound_bindings the base registry keeps every tool and enforcement happens per-binding at turn time (see Per-binding capability override) so a binding's override can both narrow AND expand within the registry. Defense-in-depth: the LLM only receives tools allowed by the matched binding, and the tool-call execution path rejects any hallucinated name outside the same allowlist.
In both modes the LLM never receives disallowed tool definitions; the difference is where the filter is applied.
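The allowlist semantics (empty list permits everything, a trailing `*` is a prefix glob, anything else is an exact name) can be sketched in a few lines. tool_allowed is a hypothetical name, not the runtime's actual function.

```rust
/// True when `tool` is permitted by the agent-level `allowed_tools` patterns.
fn tool_allowed(patterns: &[&str], tool: &str) -> bool {
    patterns.is_empty()
        || patterns.iter().any(|pattern| match pattern.strip_suffix('*') {
            // "memory_*" matches every name starting with "memory_";
            // a bare "*" has an empty prefix and so matches everything.
            Some(prefix) => tool.starts_with(prefix),
            // No wildcard: the name must match exactly.
            None => *pattern == tool,
        })
}
```

The same predicate works for both enforcement points: filtering the tool list shown to the LLM and re-checking a tool name at call time.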
System prompt & workspace
| Field | Type | Default | Purpose |
|---|---|---|---|
| system_prompt | string | "" | Prepended to every LLM turn. Defines persona, rules, examples. |
| workspace | path | "" | Directory with IDENTITY.md, SOUL.md, USER.md, AGENTS.md, MEMORY.md. Loaded at turn start. See Soul, identity & learning. |
| extra_docs | [path] | [] | Workspace-relative markdown files appended as # RULES — <filename>. |
| transcripts_dir | path | "" | Directory for per-session JSONL transcripts. Empty = disabled. |
| skills_dir | path | "./skills" | Base directory for local skill files. |
| skills | [string] | [] | Local skill ids to inject into the system prompt. Resolved from skills_dir. |
| language | string | null | Output language for the LLM's reply. ISO code ("es", "en", "en-US") or human name ("Spanish", "español"). When set, the runtime renders a # OUTPUT LANGUAGE system block telling the model to keep workspace docs in English (single source of truth, plays nicely with recall + dreaming) but reply to the user in the configured language. Per-binding language overrides this for the matched channel. See Output language. |
Heartbeat
heartbeat:
  enabled: true
  interval: 30s
| Field | Type | Default | Purpose |
|---|---|---|---|
| heartbeat.enabled | bool | false | Turn heartbeat on for this agent. |
| heartbeat.interval | humantime | "5m" | Interval between on_heartbeat() fires. |
See Agent runtime — Heartbeat.
Runtime knobs
config:
  debounce_ms: 2000
  queue_cap: 32
| Field | Type | Default | Purpose |
|---|---|---|---|
| config.debounce_ms | u64 | 2000 | Debounce window for burst-of-messages coalescing. |
| config.queue_cap | usize | 32 | Per-agent mailbox capacity. |
| sender_rate_limit.rps | f64 | — | Per-sender token-bucket refill rate. |
| sender_rate_limit.burst | u64 | — | Bucket size. |
Agent-to-agent delegation
| Field | Type | Default | Purpose |
|---|---|---|---|
| allowed_delegates | [glob] | [] | Peers this agent may delegate to. Empty = no restriction. |
| accept_delegates_from | [glob] | [] | Inverse gate: peers allowed to delegate to this agent. |
Routing uses agent.route.<target_id> over NATS with a
correlation_id. See Event bus — Agent-to-agent routing.
Dreaming (memory consolidation)
dreaming:
  enabled: false
  interval_secs: 86400
  min_score: 0.35
  min_recall_count: 3
  min_unique_queries: 2
  max_promotions_per_sweep: 20
  weights:
    frequency: 0.24
    relevance: 0.30
    recency: 0.15
    diversity: 0.15
    consolidation: 0.10
Defaults shown. See Soul — Dreaming.
Workspace-git
workspace_git:
  enabled: false
  author_name: "agent"
  author_email: "agent@localhost"
When enabled, the agent's workspace directory is a git repo that the
runtime commits to after dream sweeps, forge_memory_checkpoint, and
session close. Good for forensic replay.
Google auth (per-agent OAuth)
google_auth:
  client_id: ${GOOGLE_CLIENT_ID}
  client_secret: ${file:./secrets/google_secret.txt}
  scopes:
    - https://www.googleapis.com/auth/gmail.readonly
  token_file: ./data/workspace/ana/google_token.json
  redirect_port: 17653
Used by crates/plugins/google to run OAuth PKCE per agent.
Deprecated in Phase 17 — prefer declaring Google accounts in a
dedicated config/plugins/google-auth.yaml and binding them from
credentials.google (see next section). Inline google_auth still
boots with a warn so existing deployments keep working; it is
auto-migrated into the credential store at startup.
Credentials (per-agent WhatsApp / Telegram / Google)
Pins each agent to the plugin instance / Google account it may use for outbound traffic. The runtime resolves the target at publish time from the agent id — the LLM cannot pick the instance via tool args, closing the prompt-injection vector.
credentials:
  whatsapp: personal # must match whatsapp.yaml instance label
  telegram: ana_bot # must match telegram.yaml instance label
  google: ana@gmail.com # must match google-auth.yaml accounts[].id
  # Silence the "inbound ≠ outbound" warning when intentional:
  # telegram_asymmetric: true
Validated at boot by the gauntlet (agent --check-config runs the same
checks without starting the daemon). Omitting credentials: keeps the
legacy single-account behavior for back-compat.
Full schema + migration guide:
config/credentials.md.
Relationship diagram
flowchart LR
AG[agent entry] --> MOD[model provider]
AG --> PL[plugins list]
AG --> IB[inbound_bindings]
AG --> AT[allowed_tools]
AG --> OA[outbound_allowlist]
AG --> WS[workspace]
AG --> HB[heartbeat]
AG --> DEL[delegation gates]
IB -->|per-binding override| AT
IB -->|per-binding override| OA
MOD -->|resolved from| LLM[llm.yaml]
PL -->|tools from| PLUG[plugins/*.yaml]
WS -->|files| SOUL[SOUL.md /<br/>IDENTITY.md /<br/>MEMORY.md]
Per-binding capability override
A single agent can expose distinct capability surfaces per
InboundBinding without running two agent processes. Typical use:
the same Ana agent answers WhatsApp with a narrow sales-only surface
and Telegram with the full catalogue.
Schema
Every inbound_bindings[] entry accepts the following optional
overrides. Unset fields inherit the agent-level value.
| Field | Type | Strategy | Notes |
|---|---|---|---|
| allowed_tools | [string] | replace | ["*"] = every registered tool |
| outbound_allowlist | object | replace (whole) | WhatsApp/Telegram recipient lists |
| skills | [string] | replace | Resolved from agent-level skills_dir |
| model | object | replace | Must keep the same provider |
| system_prompt_extra | string | append | Rendered as # CHANNEL ADDENDUM block |
| sender_rate_limit | inherit \| disable \| {rps, burst} | 3-way | Untagged enum |
| allowed_delegates | [string] | replace | Peer allowlist for the delegate tool |
| language | string | replace | Output language for replies on this channel. Falls through to the agent-level language field when omitted. See Output language. |
Anything else (workspace, transcripts_dir, heartbeat, memory,
workspace_git, google_auth) stays at the agent level — identity
and persistent state do not change per channel.
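The three strategies in the table (replace, append, 3-way) can be sketched as small merge helpers. All names here are hypothetical, and the exact layout of the rendered addendum block is an assumption; only the block title comes from the table.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct RateLimit {
    rps: f64,
    burst: u64,
}

/// The 3-way `sender_rate_limit` override.
#[derive(Debug, Clone, Copy, PartialEq)]
enum RateOverride {
    Inherit,
    Disable,
    Set(RateLimit),
}

fn effective_rate(agent: Option<RateLimit>, binding: RateOverride) -> Option<RateLimit> {
    match binding {
        RateOverride::Inherit => agent,
        RateOverride::Disable => None,
        RateOverride::Set(limit) => Some(limit),
    }
}

/// "replace": a value set on the binding wins wholesale; unset inherits.
fn replace_or_inherit<T: Clone>(agent: &T, binding: Option<&T>) -> T {
    binding.unwrap_or(agent).clone()
}

/// "append": the binding text is added after the agent prompt.
fn effective_prompt(system_prompt: &str, extra: Option<&str>) -> String {
    match extra {
        Some(extra) => format!("{}\n\n# CHANNEL ADDENDUM\n{}", system_prompt, extra),
        None => system_prompt.to_string(),
    }
}
```

Note that "replace" never merges lists: a binding with allowed_tools: [whatsapp_send_message] sees exactly that one tool, whatever the agent-level list contains.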
Example
agents:
  - id: ana
    model: { provider: anthropic, model: claude-haiku-4-5 }
    plugins: [whatsapp, telegram]
    workspace: ./data/workspace/ana
    skills_dir: ./skills
    system_prompt: |
      You are Ana.
    allowed_tools: [] # agent-level = permissive; bindings narrow
    outbound_allowlist: {}
    inbound_bindings:
      - plugin: whatsapp
        allowed_tools: [whatsapp_send_message]
        outbound_allowlist:
          whatsapp: ["573115728852"]
        skills: []
        sender_rate_limit: { rps: 0.5, burst: 3 }
        system_prompt_extra: |
          Channel: WhatsApp sales. Follow the ETB/Claro lead flow.
      - plugin: telegram
        instance: ana_tg
        allowed_tools: ["*"]
        outbound_allowlist:
          telegram: [1194292426]
        skills: [browser, github, openstreetmap]
        model: { provider: anthropic, model: claude-sonnet-4-5 }
        allowed_delegates: ["*"]
        sender_rate_limit: disable
        system_prompt_extra: |
          Channel: private Telegram. Full tool access allowed.
Boot-time validation
The runtime rejects configs with:
- Duplicate (plugin, instance) tuples in the same agent.
- Telegram instance referenced by a binding but not declared in config/plugins/telegram.yaml.
- Binding model.provider different from the agent-level provider (the LLM client is wired once per agent).
- Skills listed in a binding whose directory does not exist under skills_dir.
A binding that sets no overrides is allowed but logs a warn.
Matching order
Bindings are evaluated top-to-bottom; the first match wins. Because
matching is strict on the instance axis, {plugin: telegram, instance: null}
does not capture plugin.inbound.telegram.admin traffic.
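First-match-wins with strict instance matching can be sketched as an exact subject comparison. The Binding struct and match_binding are hypothetical; the subject layout plugin.inbound.<plugin>[.<instance>] follows the binding match rules described earlier.

```rust
struct Binding<'a> {
    plugin: &'a str,
    instance: Option<&'a str>,
}

impl Binding<'_> {
    /// Subject this binding listens on.
    fn subject(&self) -> String {
        match self.instance {
            Some(instance) => format!("plugin.inbound.{}.{}", self.plugin, instance),
            None => format!("plugin.inbound.{}", self.plugin),
        }
    }
}

/// Index of the first binding whose subject equals `subject` exactly.
/// No prefix matching: a binding without an instance never captures
/// traffic addressed to a named instance.
fn match_binding(bindings: &[Binding<'_>], subject: &str) -> Option<usize> {
    bindings.iter().position(|b| b.subject() == subject)
}
```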
Runtime isolation
- Tool list shown to the LLM is filtered through the binding's allowed_tools; tools hidden on WhatsApp remain invisible even if the LLM hallucinates the name.
- Tool-call execution re-checks the allowlist and returns not_allowed for anything outside, which stops hallucination loops without executing the forbidden tool.
- Outbound tools (whatsapp_send_message, telegram_send_message) read outbound_allowlist from the matched binding, so WhatsApp sends on the sales channel cannot reach numbers that only the private channel allows.
- Sender rate limit buckets are keyed per binding; flood on one channel cannot drain the quota on another.
Back-compat
Agents without inbound_bindings do not consume plugin inbound events.
Internal runtime paths that are not plugin inbound (for example
heartbeat/delegation paths) still synthesize an effective policy from
agent-level defaults.
Output language
Operators pin the language an agent replies in without rewriting
workspace markdown. Workspace docs (IDENTITY, SOUL, MEMORY, USER,
AGENTS) and tool descriptions stay in English — the single source of
truth that recall, dreaming, vector search, and developer tooling
all read. The runtime injects a # OUTPUT LANGUAGE system block
right after the agent's system_prompt, telling the model to read
those docs as-is but reply to the user in the configured language.
Where to set it
agents:
  - id: ana
    language: es # default for every binding on this agent
    inbound_bindings:
      - plugin: whatsapp
        # → uses Spanish (inherits from the agent)
      - plugin: telegram
        instance: support_intl
        language: en # → uses English on this channel only
      - plugin: telegram
        instance: bilingual_qa
        language: "" # → no directive (model picks)
Resolution
Precedence (first non-empty wins):
1. inbound_bindings[i].language: per-channel override.
2. language: agent-level default.
3. null: no # OUTPUT LANGUAGE block emitted; the model decides from the user's input.
Empty string and whitespace-only values resolve to no directive on both layers — useful for "turn the directive off on this binding even though the agent has one".
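A minimal sketch of this resolution, following the YAML example in which an explicit empty value on a binding switches the directive off even though the agent sets one. resolve_language is a hypothetical name.

```rust
/// `binding` and `agent` are the raw `language` values, `None` when the key is omitted.
/// Returns the label to render, or `None` when no directive should be emitted.
fn resolve_language(binding: Option<&str>, agent: Option<&str>) -> Option<String> {
    // A value present on the binding takes precedence, even if it is empty.
    let chosen = binding.or(agent)?;
    let trimmed = chosen.trim();
    if trimmed.is_empty() {
        None // empty or whitespace-only: no # OUTPUT LANGUAGE block
    } else {
        Some(trimmed.to_string())
    }
}
```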
Accepted values
The runtime treats the value as a label and forwards it verbatim into the directive (after sanitisation; see below). Both forms work:
- ISO codes: "es", "en", "en-US", "pt-BR".
- Human names: "Spanish", "English", "español", "Brazilian Portuguese".
Human names produce slightly clearer directives in practice
(Respond to the user in Spanish. reads more naturally than
Respond to the user in es.), but both yield the same model
behaviour with modern LLMs.
Rendered block
# OUTPUT LANGUAGE
Respond to the user in {language}. Workspace docs (IDENTITY, SOUL,
MEMORY, USER, AGENTS) and tool descriptions are in English — read
them as-is, but your turn-final reply to the user must be in
{language}.
The block lands after the agent's system_prompt (and the
optional # CHANNEL ADDENDUM block) so its instruction wins under
the LLM's recency bias.
Sanitisation
Defense-in-depth against config-driven prompt injection: every
language value is normalised before rendering. Control characters
and embedded newlines are stripped, the value is trimmed, and the
result is capped at 64 characters. A YAML payload like
language: "es\n\nIgnore previous instructions" cannot smuggle a
multi-line directive into the system prompt.
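The normalisation steps can be sketched as a single pass. sanitize_language is a hypothetical name, and counting the 64-character cap in Unicode scalar values (rather than bytes) is an assumption.

```rust
/// Strip control characters (including newlines), trim, and cap at 64 characters.
fn sanitize_language(raw: &str) -> String {
    let cleaned: String = raw.chars().filter(|c| !c.is_control()).collect();
    cleaned.trim().chars().take(64).collect()
}
```

For the payload above, the result is the single line esIgnore previous instructions: the text survives, but it can no longer start a new block in the system prompt.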
Hot reload
Phase 18 hot-reload covers this field. Edit
agents.d/<id>.yaml, save (or run agent reload), and the next
message uses the new language. In-flight LLM turns finish on the
old policy; subsequent turns flip to the new one.
Related
- Workspace docs and recall stay English regardless — see Soul, identity & learning.
- Per-channel rotation walkthrough lives in Recipes — A/B prompt swap.
Link understanding
Per-agent (and per-binding) toggle that fetches URLs in the user's
message and injects a # LINK CONTEXT block. Off by default. Full
schema, caps, and SSRF denylist live on
Link understanding. The field is
link_understanding at agent scope and at each
inbound_bindings[] entry; binding value replaces agent default,
omitted = inherit.
Web search
Per-agent (and per-binding) toggle that exposes a web_search tool
backed by Brave / Tavily / DuckDuckGo / Perplexity. Off by default.
Full schema, providers, cache, and circuit-breaker behaviour live on
Web search. The field is web_search at
agent scope and at each inbound_bindings[] entry; binding value
replaces agent default, omitted = inherit.
Pairing policy
Per-binding toggle that turns on the DM-challenge gate for inbound
senders. Off by default. The field is pairing_policy on each
inbound_bindings[] entry; null (default) = inherit agent value
or skip the gate entirely. Full protocol, threat model, and CLI
reference live on Pairing.
Common mistakes
- Forgetting plugins: [...]. An agent without plugins has no inbound channel and no outbound tools. It is inert.
- Setting allowed_tools without a wildcard. ["memory_*"] allows the full memory_* family; ["memory_store"] allows only one. Check the glob before assuming.
- Large system_prompt duplication across agents. Use inbound_bindings[].system_prompt_extra to add per-channel content without duplicating the whole prompt.
- Sharing a WhatsApp session across agents. Each agent's workspace should contain its own whatsapp/default session; the wizard does this automatically, but pointing two agents at the same session dir will cause message cross-delivery.
- Translating the workspace markdown to match language. Don't. Workspace docs are the single source of truth read by recall, dreaming, and developer tooling, so keep them in English. The # OUTPUT LANGUAGE block tells the model to translate the reply on its way out.
Next
- Drop-in agents: merging multiple agent files
- llm.yaml: where model.provider is resolved
- Skills catalog: names that go in allowed_tools
MiniMax M2.5
MiniMax M2.5 is the primary LLM provider for nexo-rs. It's the first provider implemented and the recommended default for new agents.
Source: crates/llm/src/minimax.rs, crates/llm/src/minimax_auth.rs.
Why it's primary
- Strong tool-calling support on both the OpenAI-compat wire and the Anthropic Messages wire
- Token Plan auth lets you run agents on a subscription without per-request billing headaches
- Aggressive price/performance for multi-agent deployments
If you don't have a specific reason to pick another provider, start with MiniMax.
Configuration
# config/llm.yaml
providers:
  minimax:
    api_key: ${MINIMAX_API_KEY:-}
    group_id: ${MINIMAX_GROUP_ID:-}
    base_url: https://api.minimax.io
    rate_limit:
      requests_per_second: 2.0
      quota_alert_threshold: 100000
Per-agent selection:
# config/agents.d/ana.yaml
agents:
  - id: ana
    model:
      provider: minimax
      model: MiniMax-M2.5
Wire formats (api_flavor)
MiniMax exposes two HTTP shapes. The client auto-detects from
base_url but can be overridden via api_flavor.
| api_flavor | Endpoint | Shape | When |
|---|---|---|---|
| openai_compat (default) | {base_url}/text/chatcompletion_v2 | OpenAI chat completions | Regular API keys, most use cases |
| anthropic_messages | {base_url}/v1/messages | Anthropic Messages | Token Plan / Coding keys served at api.minimax.io/anthropic |
Auto-detection: if base_url ends in /anthropic, the client picks
anthropic_messages automatically.
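The detection rule and the two endpoint shapes can be sketched as follows. The enum and function names are hypothetical stand-ins for whatever crates/llm/src/minimax.rs defines, and tolerating a trailing slash on base_url is an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ApiFlavor {
    OpenAiCompat,
    AnthropicMessages,
}

/// An explicit `api_flavor` wins; otherwise the flavor is inferred from `base_url`.
fn detect_flavor(base_url: &str, explicit: Option<ApiFlavor>) -> ApiFlavor {
    explicit.unwrap_or_else(|| {
        if base_url.trim_end_matches('/').ends_with("/anthropic") {
            ApiFlavor::AnthropicMessages
        } else {
            ApiFlavor::OpenAiCompat
        }
    })
}

/// Request URL for a flavor, per the table above.
fn endpoint(base_url: &str, flavor: ApiFlavor) -> String {
    let base = base_url.trim_end_matches('/');
    match flavor {
        ApiFlavor::OpenAiCompat => format!("{}/text/chatcompletion_v2", base),
        ApiFlavor::AnthropicMessages => format!("{}/v1/messages", base),
    }
}
```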
Authentication
Static API key
Simple path: put the key in env or a secrets file.
Env var precedence (first wins):
1. MINIMAX_CODE_PLAN_KEY
2. MINIMAX_CODING_API_KEY
3. ./secrets/minimax_code_plan_key.txt
4. api_key field in llm.yaml
Token Plan OAuth bundle
For subscription-based access. The wizard writes a bundle to
./secrets/minimax_token_plan.json:
{
  "access_token": "...",
  "refresh_token": "...",
  "expires_at": "2026-05-01T12:00:00Z",
  "region": "https://api.minimax.io"
}
Auto-refresh: 60 seconds before expires_at, a background task
POSTs to {region}/oauth/token with grant_type=refresh_token and
rewrites the bundle atomically. Concurrent refreshes are serialized
behind a mutex — you never get two refresh calls in flight.
Mid-flight 401: if an API call returns 401 while holding what we thought was a valid token (clock skew, revocation), the client force-refreshes once and retries the request. A second 401 is surfaced as a credential error.
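The refresh window and the single retry on 401 can be sketched with plain closures. Everything here is illustrative: timestamps are Unix seconds, `send` and `refresh` stand in for the HTTP call and the token endpoint, and the real client is asynchronous and mutex-guarded, which this sketch ignores.

```rust
const REFRESH_MARGIN_SECS: u64 = 60;

/// True once the token is within 60 seconds of `expires_at` (or past it).
fn needs_refresh(now: u64, expires_at: u64) -> bool {
    now + REFRESH_MARGIN_SECS >= expires_at
}

/// Send with `token`; on a 401, force one refresh and retry exactly once.
/// A second 401 is reported as a credential error.
fn call_with_refresh(
    token: String,
    mut send: impl FnMut(&str) -> u16,
    mut refresh: impl FnMut() -> String,
) -> Result<u16, String> {
    let status = send(&token);
    if status != 401 {
        return Ok(status);
    }
    let fresh = refresh();
    match send(&fresh) {
        401 => Err("credential invalid after refresh".to_string()),
        status => Ok(status),
    }
}
```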
Shared OAuth client id for the MiniMax Portal flow:
78257093-7e40-4613-99e0-527b14b39113.
Request / response flow
sequenceDiagram
participant A as Agent loop
participant RL as RateLimiter
participant C as MiniMaxClient
participant AU as AuthSource
participant MX as MiniMax API
A->>C: chat(ChatRequest)
C->>RL: acquire()
C->>AU: fresh_bearer()
AU->>AU: refresh if <60s to expiry
AU-->>C: access_token
C->>MX: POST chatcompletion_v2 / v1/messages
alt 200
MX-->>C: ChatResponse
else 401
C->>AU: force_refresh()
C->>MX: retry once
else 429
MX-->>C: Retry-After
C-->>A: LlmError::RateLimit
else 5xx
MX-->>C: error body
C-->>A: LlmError::ServerError
end
Supported features
| Feature | OpenAI-compat | Anthropic-messages |
|---|---|---|
| Chat completions | ✅ | ✅ |
| Tool calling | ✅ | ✅ |
| Streaming (SSE) | ✅ | ✅ |
| Token usage in stream | ✅ (stream_options.include_usage) | ✅ native |
| Multimodal (images) | ✅ | ✅ |
| JSON mode | ✅ | limited |
Rate limiting
Per-provider token bucket. requests_per_second: 2.0 refills one slot
every 500 ms. Acquired before every request.
An optional quota_alert_threshold emits a structured warn log when
the remaining quota (if the provider reports it) crosses the threshold.
Useful for Prometheus alerting.
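To make the refill arithmetic concrete, here is a minimal token-bucket sketch. It is not the crate's limiter: the type, its fields, and the explicit `capacity` parameter are assumptions, and time is passed in as milliseconds so the behaviour is deterministic.

```rust
/// Minimal token bucket with integer time. All names are hypothetical.
pub struct TokenBucket {
    capacity: u64,
    tokens: u64,
    interval_ms: u64,
    last_refill_ms: u64,
}

impl TokenBucket {
    /// `requests_per_second: 2.0` yields one slot every 500 ms.
    pub fn new(requests_per_second: f64, capacity: u64, now_ms: u64) -> Self {
        let interval_ms = (1000.0 / requests_per_second).round().max(1.0) as u64;
        Self { capacity, tokens: capacity, interval_ms, last_refill_ms: now_ms }
    }

    /// Take one slot, or return how many milliseconds to wait for the next one.
    pub fn try_acquire(&mut self, now_ms: u64) -> Result<(), u64> {
        let elapsed = now_ms.saturating_sub(self.last_refill_ms);
        let gained = elapsed / self.interval_ms;
        self.tokens = (self.tokens + gained).min(self.capacity);
        self.last_refill_ms += gained * self.interval_ms;
        if self.tokens == self.capacity {
            // A full bucket does not bank idle time.
            self.last_refill_ms = now_ms;
        }
        if self.tokens > 0 {
            self.tokens -= 1;
            Ok(())
        } else {
            Err(self.interval_ms - now_ms.saturating_sub(self.last_refill_ms))
        }
    }
}
```

With `requests_per_second: 2.0` and a capacity of one, a second request 100 ms after the first is told to wait the remaining 400 ms.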
Error classification
| Response | Mapping | Behavior |
|---|---|---|
| 429 | LlmError::RateLimit { retry_after_ms } | Retried by the LLM retry layer (up to 5 attempts) |
| 5xx | LlmError::ServerError { status, body } | Retried (up to 3 attempts) |
| 401 | Internal auth refresh + single retry, then LlmError::CredentialInvalid | Fail-fast after refresh attempt |
| Other 4xx | LlmError::Other | Fail fast |
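The table can be read as a single mapping function. The sketch below reuses the variant names from the table, but the field layout, the fallback delay when `Retry-After` is absent, and both function names are assumptions. The 401 branch represents the state after the internal refresh and single retry have already failed.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum LlmError {
    RateLimit { retry_after_ms: u64 },
    ServerError { status: u16, body: String },
    CredentialInvalid,
    Other,
}

/// Assumption: delay used when the provider sends no Retry-After header.
const DEFAULT_RETRY_AFTER_MS: u64 = 1_000;

/// Map an HTTP status to an error; `None` means success (2xx).
pub fn classify(status: u16, retry_after_secs: Option<u64>, body: &str) -> Option<LlmError> {
    match status {
        200..=299 => None,
        429 => Some(LlmError::RateLimit {
            retry_after_ms: retry_after_secs
                .map(|s| s.saturating_mul(1000))
                .unwrap_or(DEFAULT_RETRY_AFTER_MS),
        }),
        401 => Some(LlmError::CredentialInvalid),
        500..=599 => Some(LlmError::ServerError { status, body: body.to_string() }),
        _ => Some(LlmError::Other),
    }
}

/// Attempt budget per error class, following the table above.
pub fn max_attempts(err: &LlmError) -> u32 {
    match err {
        LlmError::RateLimit { .. } => 5,
        LlmError::ServerError { .. } => 3,
        LlmError::CredentialInvalid | LlmError::Other => 1,
    }
}
```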
Common mistakes
- Forgetting group_id. MiniMax requires a group id alongside the key for most endpoints. The wizard sets this; manual configs often miss it.
- Pointing base_url at /anthropic with a regular API key. That endpoint is for Token Plan / Coding keys only; regular keys will 401. Leave base_url at https://api.minimax.io.
- Refreshing the bundle manually mid-flight. The client already serializes refreshes. Editing the file while the agent runs can lead to an atomic write race: stop the agent, edit, restart.
Short-term memory
Per-session conversational buffer held entirely in memory. Tracks the turns of the ongoing conversation so the LLM has context on every completion request.
Source: crates/core/src/session/ (types.rs, manager.rs) — the
Session struct owns the short-term buffer.
What lives in a session
Each Session stores:
| Field | Type | Purpose |
|---|---|---|
| history | Vec<Interaction> | FIFO of turns (role + content + timestamp) |
| context | serde_json::Value | Free-form JSON blob for per-session state |
| last_access | timestamp | Used by TTL sweeper and cap eviction |
An Interaction is {role: User | Assistant | Tool, content, timestamp}.
Sliding window — max_history_turns
short_term:
max_history_turns: 50
Hard cap, sliding FIFO. When history.len() > max_history_turns, the
oldest entry is removed on the next push:
flowchart LR
MSG[new turn] --> PUSH[history.push]
PUSH --> CHECK{len > max?}
CHECK -->|no| DONE[done]
CHECK -->|yes| DROP[history.remove(0)]
DROP --> DONE
Old content is lost, not promoted. If you need long-term
persistence, the agent must explicitly call the memory tool with
action remember. See Long-term memory.
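The sliding window boils down to a push followed by a conditional `remove(0)`. A minimal sketch follows; the `History` wrapper is hypothetical, and returning the dropped turn to the caller is an addition made here for illustration (the text above only says the turn is lost).

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Role {
    User,
    Assistant,
    Tool,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Interaction {
    pub role: Role,
    pub content: String,
    pub timestamp: u64,
}

/// Hypothetical wrapper around the per-session turn buffer.
pub struct History {
    turns: Vec<Interaction>,
    max_history_turns: usize,
}

impl History {
    pub fn new(max_history_turns: usize) -> Self {
        Self { turns: Vec::new(), max_history_turns }
    }

    /// Push a turn; if the cap is exceeded, drop and return the oldest one.
    pub fn push(&mut self, turn: Interaction) -> Option<Interaction> {
        self.turns.push(turn);
        if self.turns.len() > self.max_history_turns {
            Some(self.turns.remove(0))
        } else {
            None
        }
    }

    pub fn turns(&self) -> &[Interaction] {
        &self.turns
    }
}
```

`Vec::remove(0)` shifts every remaining element, which is acceptable at a cap of 50 turns; a `VecDeque` would avoid the shift for much larger windows.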
Session cap and eviction
short_term:
max_sessions: 10000
Soft cap across the whole process. On overflow, the oldest-idle
session (lowest last_access) is evicted to make room. Eviction
fires the on_expire callbacks — used by workspace-git to
checkpoint before tearing down the session.
max_sessions: 0 disables the cap (unbounded). Leave it at the default
unless you have a specific reason — the cap is DoS protection against
a spammer rotating chat_ids.
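Oldest-idle eviction can be sketched with a map from session id to `last_access`. The `SessionTable` type is hypothetical, and the evicted id is simply returned to the caller here, standing in for the point where the on_expire callbacks would fire.

```rust
use std::collections::HashMap;

/// Hypothetical session index: id -> last_access (any monotonic timestamp).
pub struct SessionTable {
    max_sessions: usize, // 0 disables the cap
    last_access: HashMap<String, u64>,
}

impl SessionTable {
    pub fn new(max_sessions: usize) -> Self {
        Self { max_sessions, last_access: HashMap::new() }
    }

    /// Create or touch a session. Returns the id evicted to make room, if any.
    pub fn touch(&mut self, id: &str, now: u64) -> Option<String> {
        let mut evicted = None;
        let is_new = !self.last_access.contains_key(id);
        if is_new && self.max_sessions != 0 && self.last_access.len() >= self.max_sessions {
            // Oldest-idle session = lowest last_access.
            let oldest = self
                .last_access
                .iter()
                .min_by_key(|(_, t)| **t)
                .map(|(k, _)| k.clone());
            if let Some(k) = oldest {
                self.last_access.remove(&k);
                evicted = Some(k);
            }
        }
        self.last_access.insert(id.to_string(), now);
        evicted
    }

    pub fn len(&self) -> usize {
        self.last_access.len()
    }
}
```

The linear scan for the oldest entry only runs on overflow, so the common path stays a single hash-map insert.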
TTL sweeper
short_term:
session_ttl: 24h
Sessions expire after session_ttl of inactivity. The sweeper runs
every ttl / 4 (so every 6 h with the default 24 h TTL) and drops
expired sessions.
stateDiagram-v2
[*] --> Active: first message
Active --> Active: message / event<br/>(last_access updated)
Active --> Expired: idle > session_ttl
Active --> Evicted: cap exceeded
Expired --> [*]: sweeper
Evicted --> [*]: on_expire callbacks fire
Expiry also fires on_expire — good place to hook session-close
commits to a workspace-git repo.
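The sweeper arithmetic is small enough to show directly. In this sketch the function names and the `(id, last_access)` tuple layout are hypothetical; timestamps are `Duration` values measured from an arbitrary common origin.

```rust
use std::time::Duration;

/// The sweeper runs every ttl / 4.
pub fn sweep_interval(session_ttl: Duration) -> Duration {
    session_ttl / 4
}

/// A session is expired once it has been idle for longer than the TTL.
pub fn is_expired(last_access: Duration, now: Duration, ttl: Duration) -> bool {
    now.saturating_sub(last_access) > ttl
}

/// Drop expired sessions and return their ids, so a caller could fire
/// on_expire hooks for each of them.
pub fn sweep(sessions: &mut Vec<(String, Duration)>, now: Duration, ttl: Duration) -> Vec<String> {
    let mut expired = Vec::new();
    sessions.retain(|(id, last_access)| {
        let dead = is_expired(*last_access, now, ttl);
        if dead {
            expired.push(id.clone());
        }
        !dead
    });
    expired
}
```

Because the sweep only runs every ttl / 4, an idle session can linger for up to a quarter of the TTL past its nominal expiry before it is dropped.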
Relationship to other memory layers
flowchart LR
STM[short-term<br/>in-memory Vec] -.->|tool call:<br/>memory.remember| LTM[(long-term<br/>SQLite)]
LTM -.->|vector enabled| VEC[(sqlite-vec)]
STM -.->|transcripts_dir| TR[(JSONL transcripts)]
STM -.->|session close| WSG[(workspace-git)]
STM does not auto-promote to LTM. Promotion happens via:
- Explicit memory.remember tool call from the agent
- Dream sweeps (Phase 10.6) that scan recall-event signals and promote hot memories
- Session-close commits to workspace-git if enabled
Gotchas
- Lost turns are gone. Once a turn falls off the sliding window it is not recoverable. If it mattered, save it to LTM before the next turn.
- max_sessions: 0 has no DoS guard. Only do this in single-tenant setups where you control the sender id space.
- last_access updates on any access. That includes heartbeat ticks if they read the session, effectively keeping a session alive past its TTL as long as the agent is alive.
End-to-end WhatsApp channel: Signal Protocol pairing, inbound message bridge, outbound send/reply/reaction/media tools, optional voice transcription.
Source: standalone repo at
nexo-rs-plugin-whatsapp
(extracted from crates/plugins/whatsapp/ per Phase 81.19.a;
see PHASES.md
for the migration notes). The crate ships as a lib + bin
Shape B package: the lib re-exports WhatsappPlugin for a
future Android embedded host; the bin is the subprocess
entrypoint the daemon spawns per cfg.plugins.whatsapp entry
(Phase 81.18.b.2). Internally the plugin wraps the wa-agent
(a.k.a. whatsapp-rs) crate for Signal Protocol session
lifecycle, QR pairing and the Bot API surface.
Install (Phase 81.18.b.2 — operator action required)
The daemon stopped constructing WhatsappPlugin in-tree as of
Phase 81.18.b.2; it spawns the standalone subprocess binary
per cfg entry. Operators with cfg.plugins.whatsapp populated
must install the binary and surface its directory through
plugins.discovery.search_paths before starting the daemon, or
the discovery walker logs a clear warning and the plugin never
boots:
# Recommended — download the pre-built tarball from the plugin's
# GitHub Releases into the daemon's plugin dir:
nexo plugin install lordmacu/nexo-plugin-whatsapp
nexo plugin list
# Or build from source:
cargo install --git https://github.com/lordmacu/nexo-plugin-whatsapp
nexo plugin install lands the binary + plugin.toml under
<state_dir>/plugins/whatsapp/, which the daemon's discovery
walker scans by default — no search_paths edit needed. If you
build with cargo install --git instead, point discovery at the
install dir in agents.yaml:
plugins:
discovery:
search_paths:
- ~/.cargo/bin # or wherever you installed the binary
Each cfg.plugins.whatsapp[] entry maps to one subprocess; per-
instance state (session_dir Signal Protocol creds, media_dir,
instance topic suffix, bridge.response_timeout_ms,
acl.allow_list) is seeded into the child via
NEXO_PLUGIN_WHATSAPP_* env vars at spawn time. Multi-account
operators get true process isolation — one bot's
creds.json corruption can't take down the others.
The admin RPC /whatsapp/<instance>/pair* HTTP endpoints keep
working: a daemon-side broker subscriber
(spawn_whatsapp_pairing_state_subscriber) listens on
plugin.inbound.whatsapp.> and mirrors the subprocess's
Connected / Disconnected / Reconnecting / Qr events
into a daemon-owned PairingState per instance.
Known limitation (Phase 81.20.c follow-up)
Subprocess whatsapp instances do not currently surface
AgentEventKind::PeerTyping events on the SSE live transcript
stream. The daemon's AgentEventEmitter Arc doesn't cross the
process boundary; bridging typing events through the broker
ships in follow-up 81.20.c.typing-presence-rpc. Inbound
message routing, outbound dispatch, pairing UI, and reconnect
telemetry are unaffected.
Topics
| Direction | Subject | Notes |
|---|---|---|
| Inbound | plugin.inbound.whatsapp | Legacy single-account |
| Inbound | plugin.inbound.whatsapp.<instance> | Multi-account routing |
| Outbound | plugin.outbound.whatsapp | Legacy single-account |
| Outbound | plugin.outbound.whatsapp.<instance> | Multi-account routing |
During pairing the plugin also publishes qr lifecycle events on the
inbound topic so the wizard can render the QR.
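The subject naming in the table follows one rule: append the instance label when there is one, otherwise keep the legacy bare topic. A minimal sketch, with a hypothetical helper name and enum:

```rust
#[derive(Clone, Copy)]
pub enum Direction {
    Inbound,
    Outbound,
}

/// Build the broker subject for a WhatsApp instance.
/// An unlabelled instance keeps the legacy single-account topic.
pub fn whatsapp_topic(direction: Direction, instance: Option<&str>) -> String {
    let dir = match direction {
        Direction::Inbound => "inbound",
        Direction::Outbound => "outbound",
    };
    match instance {
        Some(name) if !name.is_empty() => format!("plugin.{dir}.whatsapp.{name}"),
        _ => format!("plugin.{dir}.whatsapp"),
    }
}
```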
Config
# config/plugins/whatsapp.yaml
whatsapp:
enabled: true
session_dir: "" # empty → per-agent default
media_dir: ./data/media/whatsapp
instance: default
acl:
allow_list: [] # empty + empty env = open ACL
from_env: WA_AGENT_ALLOW
behavior:
ignore_chat_meta: true
ignore_from_me: true
ignore_groups: false
bridge:
response_timeout_ms: 30000
on_timeout: noop # noop | apology_text
transcriber:
enabled: false
skill: whisper
public_tunnel:
enabled: false
only_until_paired: true
Key fields:
| Field | Default | Purpose |
|---|---|---|
| session_dir | per-agent | Signal Protocol state. Each account needs its own dir. |
| instance | None | Label for multi-account routing. Unlabelled keeps the legacy bare topic. |
| allow_agents | [] | Agents permitted to publish from this instance. Empty = accept any agent holding a resolver handle. Defense-in-depth for the per-agent credentials binding. |
| acl.allow_list | [] | Bare JIDs allowed to reach the agent. Empty + empty env = open. |
| behavior.ignore_chat_meta | true | Skip muted / archived / locked chats on the phone. |
| behavior.ignore_from_me | true | Drop the agent's own replies to prevent loops. |
| behavior.ignore_groups | false | Skip group chats entirely when true. |
| bridge.response_timeout_ms | 30000 | Per-message handler deadline. |
| bridge.on_timeout | noop | noop (no reply) or apology_text. |
| transcriber.enabled | false | Voice → text via skill. |
| public_tunnel.enabled | false | Expose /whatsapp/pair through a Cloudflare tunnel. |
| public_tunnel.only_until_paired | true | Tear down the tunnel after Connected. |
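The "empty + empty env = open" rule for acl.allow_list is easy to get backwards, so here is a sketch of one way it could be evaluated. Both function names are hypothetical, the comma-separated format of the env value is an assumption, and JID normalisation is left out.

```rust
/// Merge the YAML allow_list with the value of the `from_env` variable.
/// Assumption: the env value is a comma-separated list of bare JIDs.
pub fn effective_allow_list(yaml_list: &[&str], env_value: Option<&str>) -> Vec<String> {
    let mut out = Vec::new();
    let mut add = |raw: &str| {
        let jid = raw.trim();
        if !jid.is_empty() {
            out.push(jid.to_string());
        }
    };
    for j in yaml_list {
        add(*j);
    }
    for j in env_value.unwrap_or("").split(',') {
        add(j);
    }
    out
}

/// An empty effective list means an open ACL; otherwise it is a strict allowlist.
pub fn is_allowed(sender_jid: &str, allow: &[String]) -> bool {
    allow.is_empty() || allow.iter().any(|j| j == sender_jid)
}
```

Note the consequence: deleting the last entry from both sources opens the agent to every sender rather than closing it.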
Pairing
Pairing is setup-time only. The runtime refuses to start without paired credentials.
sequenceDiagram
participant U as Operator
participant W as agent setup
participant WA as whatsapp-rs Client
participant P as Phone
U->>W: setup pair whatsapp --agent ana
W->>WA: new_in_dir(session_dir)
WA-->>W: QR image
W-->>U: render QR (Unicode blocks)
U->>P: Settings → Linked Devices → scan
P->>WA: pair
WA-->>W: Connected
W->>W: persist creds to session_dir/.whatsapp-rs/creds.json
- Credentials at <session_dir>/.whatsapp-rs/creds.json
- Daemon-collision check at <session_dir>/.whatsapp-rs/daemon.json blocks a second process on the same account
- Multi-account via Client::new_in_dir() (no XDG_DATA_HOME mutation)
- Credential expiry mid-run (401 loop) → operator must re-pair; no runtime QR fallback
Tools exposed to the LLM
| Tool | Signature | Notes |
|---|---|---|
| whatsapp_send_message | (to, text) | Send to arbitrary JID. |
| whatsapp_send_reply | (chat, reply_to_msg_id, text) | Quote a specific inbound message. |
| whatsapp_send_reaction | (chat, msg_id, emoji) | Emoji tap-back. |
| whatsapp_send_media | (to, file_path, caption?, mime?) | File attachment. |
All tools honor the per-binding outbound_allowlist.whatsapp —
empty list = unrestricted, populated = hard allowlist.
Event shapes
Inbound payloads (on plugin.inbound.whatsapp[.<instance>]):
// message
{
"kind": "message",
"from": "573000000000@s.whatsapp.net",
"chat": "573000000000@s.whatsapp.net",
"text": "hi",
"reply_to": null,
"is_group": false,
"timestamp": 1714000000,
"msg_id": "3EB0..."
}
// media_received
{
"kind": "media_received",
"from": "...",
"chat": "...",
"msg_id": "...",
"local_path": "./data/media/whatsapp/abc.jpg",
"mime": "image/jpeg",
"caption": null
}
// qr (pairing only)
{"kind": "qr", "ascii": "...", "png_base64": "...", "expires_at": ...}
// lifecycle
{"kind": "connected" | "disconnected" | "reconnecting" | "credentials_expired"}
// observability
{"kind": "bridge_timeout", "msg_id": "...", "waited_ms": 30000}
Presence indicators
While the agent prepares a reply, the WhatsApp plugin pulses the
<chatstate> stanza on the peer phone so the user sees a live
"typing…" / "recording audio…" indicator instead of dead
silence. The wire shape matches what WhatsApp Web emits natively:
<!-- text reply (default) -->
<chatstate to="JID"><composing/></chatstate>
<!-- voice note about to be sent -->
<chatstate to="JID"><composing media="audio"/></chatstate>
<!-- pulse stops -->
<chatstate to="JID"><paused/></chatstate>
The plugin switches the media attr automatically based on the
outbound OutboundReplyKind:
- Text reply → <composing/> for the LLM round-trip; pauses before the message lands.
- Voice note (PTT) → <composing/> while the LLM thinks, flips to <composing media="audio"/> ~250 ms before the upload + ack so the peer client has time to repaint "recording audio…", then pauses.
- Image / video / document → not media-flagged in v1 (queued as follow-up).
Proactive voice notes (microapp-driven, no inbound trigger) get the same recording-presence wrap via the outbound dispatcher, so the indicator is consistent regardless of who initiated the send.
typing_mode knob
Plugin-instance YAML override. Default reproduces the historic behaviour.
whatsapp:
enabled: true
session_dir: ...
typing_mode: instant # default; see table below
| Value | v1 behaviour |
|---|---|
| instant | Heartbeat starts the moment the handler is invoked. Recommended default. |
| thinking | Documented for parity with future reasoning-stream support; v1 falls back to instant + warn-log. |
| message | Documented for parity with future first-text-delta support; v1 falls back to instant + warn-log. |
| never | Skips the heartbeat entirely. Use when the bot should stay invisible (no presence cycling at all). |
Unknown values warn-degrade to instant rather than failing
boot, so a YAML typo cannot wedge the daemon.
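The warn-degrade behaviour can be sketched as a parser that never fails. The enum and function names are hypothetical, and the warning strings are illustrative rather than the plugin's real log lines.

```rust
/// Effective v1 behaviours (enum name is hypothetical).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum TypingMode {
    Instant,
    Never,
}

/// Resolve the YAML value to the effective v1 mode plus an optional warning.
/// Never returns an error, so a typo cannot block boot.
pub fn resolve_typing_mode(raw: &str) -> (TypingMode, Option<String>) {
    match raw.trim() {
        "instant" => (TypingMode::Instant, None),
        "never" => (TypingMode::Never, None),
        planned @ ("thinking" | "message") => (
            TypingMode::Instant,
            Some(format!("typing_mode `{planned}` is not implemented in v1, using `instant`")),
        ),
        unknown => (
            TypingMode::Instant,
            Some(format!("unknown typing_mode `{unknown}`, using `instant`")),
        ),
    }
}
```

Returning the warning instead of logging it keeps the sketch free of a logging dependency; a caller would forward it to its warn-level log.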
The keepalive cadence (10 s), TTL safety cap (60 s) and
consecutive-failure circuit breaker (2 strikes) are not exposed
as YAML knobs in v1 — the defaults are what every agent wants.
Crate consumers that need other values can pass a
PresenceHeartbeatConfig through
Session::chat_presence_heartbeat_with directly.
Old-client compatibility
Pre-2021 WhatsApp clients ignore the media attribute and paint
"typing…" regardless. That's a degradation but harmless: the
voice note still arrives; only the indicator lies. Affects
<0.5 % of installs.
Agent language and voice (BCP-47 locale)
The agent's language field accepts a full BCP-47 locale —
es-AR, es-ES, es-US, en-GB, pt-BR, etc. — and the
runtime honours both the language and the region for three
things on every turn:
- Per-locale system addendum locks the LLM into the regional register: voseo for es-AR (vos, tenés, podés), tuteo + castellano vocab for es-ES (vosotros, vale, coger), Spanglish-aware for es-US (loanwords like email / parking not auto-translated), British spelling + vocab for en-GB, etc. Operators shipping language: "es" (no region) get a Latam-neutral tuteo template.
- Voice-mode SSML tutorial: when voice mode is toggled for the conversation, the marker tutorial appended to the system prompt uses the locale's native register (so the examples don't teach the LLM a dialect it shouldn't speak).
- Default Edge voice: when the per-conversation voice_id is the install-wide default, the picker resolves a region-matched voice:

  | Locale | Voice |
  |---|---|
  | es-AR | es-AR-ElenaNeural |
  | es-MX | es-MX-DaliaNeural |
  | es-ES | es-ES-ElviraNeural |
  | es-CO | es-CO-SalomeNeural |
  | es-PE | es-PE-CamilaNeural |
  | es-CL | es-CL-CatalinaNeural |
  | es-US | es-US-PalomaNeural |
  | en-US | en-US-AriaNeural |
  | en-GB | en-GB-SoniaNeural |
  | en-AU | en-AU-NatashaNeural |
  | en-CA | en-CA-ClaraNeural |
  | pt-BR | pt-BR-FranciscaNeural |
  | pt-PT | pt-PT-RaquelNeural |
  | fr-FR | fr-FR-DeniseNeural |
  | fr-CA | fr-CA-SylvieNeural |
  | it-IT | it-IT-ElsaNeural |
  | de-DE | de-DE-KatjaNeural |
  | ja-JP | ja-JP-NanamiNeural |
  | zh-CN | zh-CN-XiaoxiaoNeural |

  Language-only locales fall back to the canonical region (es → es-MX, en → en-US, pt → pt-BR, …). Operators with a manually-picked voice_id keep their choice; the picker only fires when the stored voice is the install default.
The supported locale set is closed (lives in
nexo_microapp_sdk::Locale); unsupported strings (klingon,
es-419, zh-Hant) are rejected by the admin RPC with
invalid_locale so a YAML typo cannot reach the daemon.
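The voice picker described above amounts to two table lookups. The sketch below is not the `nexo_microapp_sdk::Locale` implementation; the function and constant names are hypothetical, and only the three language-only fallbacks spelled out in the text are included.

```rust
/// Locale -> default Edge voice, copied from the table above.
const DEFAULT_VOICES: &[(&str, &str)] = &[
    ("es-AR", "es-AR-ElenaNeural"), ("es-MX", "es-MX-DaliaNeural"),
    ("es-ES", "es-ES-ElviraNeural"), ("es-CO", "es-CO-SalomeNeural"),
    ("es-PE", "es-PE-CamilaNeural"), ("es-CL", "es-CL-CatalinaNeural"),
    ("es-US", "es-US-PalomaNeural"), ("en-US", "en-US-AriaNeural"),
    ("en-GB", "en-GB-SoniaNeural"), ("en-AU", "en-AU-NatashaNeural"),
    ("en-CA", "en-CA-ClaraNeural"), ("pt-BR", "pt-BR-FranciscaNeural"),
    ("pt-PT", "pt-PT-RaquelNeural"), ("fr-FR", "fr-FR-DeniseNeural"),
    ("fr-CA", "fr-CA-SylvieNeural"), ("it-IT", "it-IT-ElsaNeural"),
    ("de-DE", "de-DE-KatjaNeural"), ("ja-JP", "ja-JP-NanamiNeural"),
    ("zh-CN", "zh-CN-XiaoxiaoNeural"),
];

/// Language-only fallbacks named in the text; the full list is not shown there.
const CANONICAL_REGION: &[(&str, &str)] = &[("es", "es-MX"), ("en", "en-US"), ("pt", "pt-BR")];

/// Resolve the default voice for a locale, or `None` when it is unsupported.
pub fn default_voice(locale: &str) -> Option<&'static str> {
    let full = CANONICAL_REGION
        .iter()
        .find(|(lang, _)| *lang == locale)
        .map(|(_, region)| *region)
        .unwrap_or(locale);
    DEFAULT_VOICES
        .iter()
        .find(|(loc, _)| *loc == full)
        .map(|(_, voice)| *voice)
}
```

A `None` result corresponds to the case the admin RPC rejects with invalid_locale.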
Behaviour change — language: "es" agents
Before this change, language: "es" agents inherited an
Argentine voseo flavour from the legacy voice-mode addendum
constant. The new behaviour routes language: "es" to the
Latam-neutral template (tuteo, no voseo). Operators who
want the previous Argentine flavour set language: "es-AR"
explicitly.
Gotchas
- Shared session_dir across agents = cross-delivery. Each agent should point at its own <workspace>/whatsapp/default. The wizard does this automatically; manual configs need care.
- ignore_chat_meta: true silently skips muted/archived chats. If a user archives a chat on the phone, the agent never sees it again until they unarchive.
- Credential expiry is irreversible without re-pair. whatsapp-rs will loop on 401. Watch for credentials_expired lifecycle events and alert.
See Setup wizard — WhatsApp pairing.
Skills catalog
nexo-rs uses "skill" to mean two different things. Both are covered on this page; gating semantics for each live in Gating by env / bins.
- Extension skills: shipped under extensions/ in the repo, discovered and spawned like any other stdio extension. 22 of them landed in Phase 13.
- Local skills: markdown files under an agent's skills_dir/ that get injected into the system prompt at turn start.
The two overlap in name but not in mechanism:
| | Extension skill | Local skill |
|---|---|---|
| Where it lives | extensions/<id>/ with plugin.toml | skills/<name>/SKILL.md |
| How it's loaded | Extension discovery → stdio spawn | SkillLoader at turn time |
| What it produces | Tools in ToolRegistry | Text injected into the prompt |
| Gating | Warn + continue, tools still registered | Warn + skip entirely |
Extension skills (Phase 13)
All shipped as stdio extensions written in Rust. _common is a shared
Rust library (circuit-breaker primitives), not an extension itself.
Core utilities
| Id | Purpose | Requires |
|---|---|---|
| weather | Current + forecast via Open-Meteo (no auth). | — |
| openstreetmap | Forward / reverse geocoding via Nominatim. | — |
| wikipedia | Article search + summaries. | — |
| fetch-url | HTTP GET / POST with SSRF guard, retries, circuit breaker. | — |
| rss | Fetch & parse RSS / Atom / JSON feeds. | — |
| dns-tools | A/AAAA/MX/TXT/NS/SOA/SRV + reverse + whois. | — |
| endpoint-check | HTTP probe (status + latency) + TLS cert inspection. | — |
| pdf-extract | Extract text from PDFs. | — |
| translate | LibreTranslate self-hosted or DeepL API. | — |
| summarize | Chat-based text/file summary via OpenAI-compat endpoint. | — |
| openai-whisper | Audio transcription via OpenAI-compat /audio/transcriptions. | — |
Search & knowledge
| Id | Purpose | Requires |
|---|---|---|
| brave-search | Web search. | env BRAVE_SEARCH_API_KEY |
| goplaces | Google Places text search + details. | — |
| wolfram-alpha | Computational queries (short + full pods). | env WOLFRAM_APP_ID |
Infra & ops
| Id | Purpose | Requires | Write-gate |
|---|---|---|---|
| github | REST API: PRs, checks, issues. | env GITHUB_TOKEN | — |
| cloudflare | DNS, zones, cache purge. | env CLOUDFLARE_API_TOKEN | — |
| docker-api | ps, inspect, logs, stats, start, stop, restart. | bin docker | env DOCKER_API_ALLOW_WRITE |
| proxmox | Proxmox VE: nodes, VMs, containers, lifecycle. | env PROXMOX_TOKEN | env PROXMOX_ALLOW_WRITE, env PROXMOX_INSECURE_TLS for self-signed certs |
| onepassword | 1Password secrets metadata; reveal gated. | bin op, env OP_SERVICE_ACCOUNT_TOKEN | env OP_ALLOW_REVEAL |
| ssh-exec | Remote command execution with host allowlist. | bin ssh, scp | host allowlist in config |
| tmux-remote | Drive tmux sessions (create, send keys, capture, kill). | bin tmux | — |
Media & content
| Id | Purpose | Requires |
|---|---|---|
| msedge-tts | Text-to-speech via Edge Read Aloud. | — |
| rtsp-snapshot | Frames / clips from RTSP or HTTP camera streams. | bin ffmpeg |
| video-frames | Extract frames + audio from videos. | bin ffmpeg, ffprobe |
| tesseract-ocr | OCR with language packs + PSM modes. | bin tesseract |
| yt-dlp | Download video / audio / metadata. | bin yt-dlp |
| spotify | Now-playing, search, play, pause, skip. | env SPOTIFY_ACCESS_TOKEN |
Google (phase 13.18)
Single google extension covering 32 tools across Gmail,
Calendar, Tasks, Drive, People, and Photos. Uses OAuth refresh-token
flow. Writes gated by five independent env flags:
- GOOGLE_ALLOW_SEND: Gmail send
- GOOGLE_ALLOW_CALENDAR_WRITE
- GOOGLE_ALLOW_DRIVE_WRITE
- GOOGLE_ALLOW_TASKS_WRITE
- GOOGLE_ALLOW_PEOPLE_WRITE
See Plugins — Google for the OAuth setup and
the generic google_call tool that fronts the extension.
LLM providers (phase 13.19)
anthropic and gemini are native LLM clients living under
crates/llm/, not extensions. See
LLM providers and children.
Templates
| Id | Purpose | Language |
|---|---|---|
| template-rust | Copy-and-edit skeleton (ping, add). | Rust |
| template-python | stdlib-only skeleton. | Python |
Local skills
Local skills are markdown files loaded by SkillLoader and injected
into the system prompt at turn time. Defined in the agent config:
# agents.yaml
agents:
- id: kate
skills_dir: ./skills
skills:
- weather
- github
- summarize
- google-auth
Each entry resolves to <skills_dir>/<name>/SKILL.md:
---
name: "Weather"
description: "Current conditions and forecasts"
requires:
bins: ["curl"]
env: ["WEATHER_API_KEY"]
max_chars: 5000
---
# Weather skill
Call `weather_forecast(city)` to get a 3-day forecast.
Use metric units. Default to the user's locale when unspecified.
Bundled local skills currently shipped in this repo:
| Id | Purpose |
|---|---|
| loop | Bounded auto-iteration: run a prompt up to max_iters until until_predicate matches (regex, exit, or judge). |
| stuck | Bounded auto-debug for repeated cargo build / cargo test failures via failing_command, max_rounds, focus_pattern, and evidence-first diagnosis. |
| simplify | Bounded code simplification for a file/hunk via target, scope, max_passes, preserve_behavior (dead code, redundant guards, duplication, naming). |
| verify | Bounded acceptance verification via acceptance_criterion, candidate_commands, max_rounds, judge_mode (command evidence + explicit judge decision). |
| skillify | Capture a repeatable workflow and convert it into a reusable local SKILL.md with explicit inputs, steps, guardrails, and output contract. |
| remember | Memory-hygiene review flow: classify/promote/dedupe/conflict-resolve memory artifacts before applying any changes. |
| update-config | Safe config-edit skill for Nexo: map behavior changes to config/*.yaml, apply read-before-write merges, and surface hot-reload vs restart requirements. |
All seven bundled skills (loop, stuck, simplify, verify, skillify,
remember, update-config) can be attached from the setup wizard like
any other skill (nexo setup → Configurar agente → Skills): they are
registered in the setup skill catalog and none of them requires secrets.
Loading flow
flowchart TD
CFG[agents.yaml skills: list] --> LOOP[for each name]
LOOP --> READ[read skills_dir/name/SKILL.md]
READ --> FM[parse YAML frontmatter]
FM --> GATE{bins on PATH<br/>AND env set?}
GATE -->|no| SKIP[warn + skip<br/>not injected]
GATE -->|yes| RENDER[render into prompt:<br/>heading + blockquote + body]
RENDER --> TRUNC[truncate to max_chars]
TRUNC --> INJECT[inject into system prompt]
Why local skills skip-on-miss (vs extensions warn-and-continue)
A local skill is a text instruction to the LLM describing a capability. If the backing bin/env isn't available the tool will fail — but worse, the LLM was told the capability exists and will repeatedly try to use it. Skipping the skill prevents lying to the model.
An extension is a registered tool. If the LLM invokes it and the backing bin is missing, the tool returns an error — the LLM observes and adapts. Warn-and-continue is fine.
See Gating for the full semantics.
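The skip-on-miss rule and the truncation step from the loading flow can be sketched together. `SkillMeta`, `render_skill`, and the injected `has_bin` / `has_env` probes are hypothetical, and the exact heading + blockquote layout is an assumption about how SkillLoader renders the text.

```rust
/// Frontmatter fields from SKILL.md (struct name is hypothetical).
pub struct SkillMeta<'a> {
    pub name: &'a str,
    pub description: &'a str,
    pub bins: &'a [&'a str],
    pub env: &'a [&'a str],
    pub max_chars: usize,
}

/// Render a local skill for the system prompt, or return `None` to skip it.
/// `has_bin` and `has_env` are injected probes (PATH lookup, env lookup).
pub fn render_skill(
    meta: &SkillMeta,
    body: &str,
    has_bin: impl Fn(&str) -> bool,
    has_env: impl Fn(&str) -> bool,
) -> Option<String> {
    let gate_ok = meta.bins.iter().all(|b| has_bin(*b)) && meta.env.iter().all(|e| has_env(*e));
    if !gate_ok {
        // Skip entirely: the model is never told about a capability it cannot use.
        return None;
    }
    // Assumed layout: heading, blockquote with the description, then the body.
    let rendered = format!("## {}\n> {}\n\n{}", meta.name, meta.description, body);
    // Truncate on character boundaries so multi-byte text is never split.
    Some(rendered.chars().take(meta.max_chars).collect())
}
```

Truncating by `chars()` rather than by byte index avoids panics on non-ASCII skill text.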
How to pick
- Need the LLM to know how to do something (usage pattern, style rules, examples)? → local skill.
- Need the LLM to do something (make a call, return data)? → extension skill.
- Both? → ship the extension and write a local skill next to it that explains when to use it.
Plugin quickstart — zero to installed in 10 minutes
Phase 31.9. Linear, copy-paste path from empty directory to a plugin running inside an operator's daemon. Work through this page once, end to end, before reading Plugin authoring overview, the Rust SDK reference, or the Plugin contract; those documents make sense faster after you have shipped one toy plugin.
Single-language path (Rust). Python / TypeScript / PHP
quickstarts share the exact same shell commands; only the
--lang flag and the in-repo source tree differ. Pointers to the
sister SDKs appear at the bottom.
What you build
A plugin called hello_plugin that echoes every event arriving
on its inbound topic onto plugin.inbound.hello_plugin_echo. By
the end of this page:
- A new GitHub repository under your account holds the plugin's source.
- A signed (optional) GitHub release ships per-target tarballs.
- An operator on a separate host runs nexo plugin install <you>/<repo> and the daemon spawns your plugin within the next 10 seconds, then logs the handshake.
This is the same pipeline that ships the in-house plugins — see
github.com/lordmacu/nexo-plugin-browser for a real-world
output of the same quickstart, scaled up to 12 tools.
Prerequisites
| Tool | Version | Why |
|---|---|---|
| Rust toolchain | 1.80+ (rustup recommended) | Build the plugin binary. |
| nexo CLI | 0.1.6+ on PATH | plugin new, plugin run, plugin install. |
| git | any | Push to GitHub. |
| GitHub account + a repo you can push to | — | Releases host the install artifacts. |
| cosign (optional) | 2.x | Sign releases for operators on --require-signature. Skip until step 9. |
Verify each:
cargo --version # cargo 1.80+
nexo --version # nexo 0.1.6+
git --version
gh auth status # if using `gh` CLI for the repo create
If nexo is not yet on PATH:
curl -fsSL https://lordmacu.github.io/nexo-rs/install.sh | bash
(or cargo install nexo-rs) — then make sure ~/.cargo/bin is on
PATH.
1. Scaffold
The CLI bundles the same Rust template that produces the
in-tree template-plugin-rust — one command lands a fresh
project on disk:
nexo plugin new hello_plugin --lang rust --owner alice
cd hello_plugin
Flags that matter:
- <id>: the plugin's globally unique id. Must satisfy ^[a-z][a-z0-9_]{0,31}$. It becomes the prefix for any tool name, channel kind, or config namespace your plugin contributes.
- --lang rust: switch to python, typescript, or php for the matching template. The remaining steps are identical.
- --owner alice: your GitHub username. Used in the generated README + CI workflow's release URL.
- --description "...": optional one-liner; flows into the manifest + README + Cargo description.
- --git: runs git init for you and stages the initial commit.
- --dest /custom/path: emit elsewhere (default is ./<id>/).
Re-running the command on a non-empty directory aborts; pass
--force only if you mean it.
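The id rule `^[a-z][a-z0-9_]{0,31}$` can be checked without a regex dependency. This is a standalone sketch with a hypothetical function name, not the CLI's own validator.

```rust
/// Check a plugin id against ^[a-z][a-z0-9_]{0,31}$ using only std.
pub fn is_valid_plugin_id(id: &str) -> bool {
    let mut chars = id.chars();
    // First character: one lowercase ASCII letter.
    match chars.next() {
        Some(c) if c.is_ascii_lowercase() => {}
        _ => return false,
    }
    // Remainder: at most 31 characters from [a-z0-9_].
    let rest = chars.as_str();
    rest.len() <= 31
        && rest
            .chars()
            .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_')
}
```

Ids such as `hello_plugin` pass; `hello-plugin`, `Hello`, and anything longer than 32 characters do not.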
2. Inspect what landed
hello_plugin/
├── Cargo.toml # name = "hello_plugin", bin name matches plugin.id
├── nexo-plugin.toml # manifest — read by the daemon at handshake
├── README.md # operator-facing docs (edit these later)
├── scripts/
│ └── pack-tarball.sh # the per-target tarball packer the CI uses
├── src/
│ └── main.rs # tokio::main + PluginAdapter
└── tests/
└── pack_tarball.rs # regression test for the asset shape
Two files you must know intimately. Open both:
nexo-plugin.toml:
[plugin]
id = "hello_plugin"
version = "0.1.0"
name = "Hello Plugin"
description = "Echoes inbound events back onto the broker."
min_nexo_version = ">=0.1.0"
[plugin.requires]
nexo_capabilities = ["broker"]
[[plugin.channels.register]]
kind = "hello_plugin_inbound"
description = "Inbound events the plugin emits onto the broker."
[plugin.entrypoint]
command = "./bin/hello_plugin" # resolved relative to this file
plugin.entrypoint.command = "./bin/hello_plugin" is the
Phase 31.1.c install convention. The daemon's
discovery walker reads the manifest, then spawns whatever
command resolves to, relative to the manifest's containing
directory. The pack-tarball step (step 8) copies your release
binary into bin/hello_plugin so this entrypoint resolves on
the operator host.
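Resolving the entrypoint relative to the manifest's directory can be illustrated with `std::path`. The helper name is hypothetical and this is only a sketch of the convention, not the discovery walker's code.

```rust
use std::path::{Path, PathBuf};

/// Resolve `plugin.entrypoint.command` against the directory containing
/// the manifest. An absolute `command` replaces the directory entirely,
/// which is how `Path::join` behaves.
pub fn resolve_entrypoint(manifest_path: &Path, command: &str) -> PathBuf {
    let dir = manifest_path.parent().unwrap_or_else(|| Path::new("."));
    dir.join(command)
}
```

For a manifest at `/opt/plugins/hello_plugin/nexo-plugin.toml`, the command `./bin/hello_plugin` resolves inside `/opt/plugins/hello_plugin/`, regardless of the daemon's working directory.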
src/main.rs (truncated):
use nexo_broker::Event;
use nexo_microapp_sdk::plugin::{BrokerSender, PluginAdapter};

const MANIFEST: &str = include_str!("../nexo-plugin.toml");

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    tracing_subscriber::fmt()
        .with_writer(std::io::stderr)
        .init();

    PluginAdapter::new(MANIFEST)?
        .on_broker_event(handle_event)
        .on_shutdown(|| async { Ok(()) })
        .run_stdio()
        .await?;
    Ok(())
}

async fn handle_event(topic: String, event: Event, broker: BrokerSender) {
    let echo = Event::new(
        "plugin.inbound.hello_plugin_echo",
        "hello_plugin",
        serde_json::json!({
            "echoed_from": topic,
            "echoed_payload": event.payload,
        }),
    );
    let _ = broker.publish("plugin.inbound.hello_plugin_echo", echo).await;
}
The contract is small: build a PluginAdapter from your manifest
text, register handlers, call run_stdio().await. The SDK owns
the JSON-RPC envelope — you only see decoded Events.
Stdout discipline: every byte on stdout must be a JSON-RPC frame. Use
eprintln! / tracing::* for plugin-side logs; a stray println! will corrupt the wire and the daemon will tear the subprocess down at handshake.
3. Build
cargo build
A debug binary lands at target/debug/hello_plugin in well
under a second on a warm cache (mold + sccache, configured
machine-wide on the dev box, are not required — vanilla cargo
works too).
4. Smoke-test the handshake
Two probes the daemon performs at boot. Both must pass before
the plugin shows up in nexo plugin list.
4.a — --print-manifest
The discovery walker (Phase 81.33 Stage 8) invokes each
nexo-plugin-* binary with --print-manifest and reads the
embedded TOML from stdout. Confirm yours obeys:
./target/debug/hello_plugin --print-manifest
Expected output: verbatim contents of nexo-plugin.toml,
followed by exit 0. The scaffold wires the
print_manifest_if_requested(MANIFEST) call into the first line
of main(); if you see logs, JSON-RPC frames, or empty stdout,
the helper is missing.
4.b — initialize handshake
Hand-feed a JSON-RPC initialize frame to verify the wire shape:
echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}' \
| ./target/debug/hello_plugin
Expected output (one line of JSON, formatted here for readability):
{
"jsonrpc": "2.0",
"id": 1,
"result": {
"manifest": { "plugin": { "id": "hello_plugin", ... } },
"server_version": "hello_plugin-0.1.0",
"tools": []
}
}
If you see anything else — extra blank lines, panic backtrace, malformed JSON — fix that before moving on. The daemon will reject the same frame your terminal saw.
5. Local dev loop
Boot a daemon with this directory injected at the head of
plugins.discovery.search_paths. No install, no GitHub round
trip, no signature verification — pure inner-loop dev:
nexo plugin run .
Expected stderr trace:
INFO local plugin override applied (plugin_id=hello_plugin)
INFO subprocess plugin spawned (id=hello_plugin, pid=...)
INFO hello_plugin starting
INFO subprocess plugin handshake ok (id=hello_plugin, version=0.1.0)
The plugin is now live inside the daemon. Ctrl+C tears both processes down cleanly.
Edit src/main.rs, re-run cargo build, and the daemon's
hot-reload walker (Phase 81.10) re-spawns the subprocess
automatically — no daemon restart.
For isolated contract debugging without your real
agents.yaml, add --no-daemon-config. See
Local dev loop conventions
for the inner-loop reference.
6. Customize the handler
Replace the body of handle_event with whatever your plugin
should do — call a third-party API, persist to disk, trigger a
downstream agent. Re-publish the API's reply back through
broker.publish so agents can observe it.
Eight common shapes (channel plugin, poller, hybrid bridge, etc.) are pre-baked in Patterns. Browse those once before designing — most plugins fit one of those shapes exactly.
7. Push to GitHub
gh repo create alice/hello_plugin --public --source=. --push
Or, with plain git:
git init && git add . && git commit -m "initial plugin"
git remote add origin git@github.com:alice/hello_plugin.git
git push -u origin main
The scaffolded repo already contains
.github/workflows/release.yml (the Phase 31.2 template). The
workflow fires on tag push (v*).
8. Cut a release
Bump the version in Cargo.toml and nexo-plugin.toml (must
match), then tag and push:
# bump both files first — keep them in lock-step
git add Cargo.toml nexo-plugin.toml
git commit -m "v0.1.0"
git tag v0.1.0
git push origin main v0.1.0
Watch the workflow in the GitHub Actions tab. The four jobs shipped by the template:
- validate-tag: refuses to release if the manifest version does not match the tag.
- build: compiles the plugin for linux-x64 and macos-arm64 (extend the matrix yourself for more targets), runs pack-tarball.sh to produce the bin/hello_plugin + nexo-plugin.toml archive layout operators expect, and uploads <id>-<version>-<target>.tar.gz plus .sha256 sidecars.
- sign (optional, gated by repo variable COSIGN_ENABLED): cosign keyless signature against the GitHub Actions OIDC identity. See Signing & publishing for the full keyless tutorial.
- release: creates the GitHub Release, attaches every tarball, every .sha256, the bare nexo-plugin.toml (so nexo plugin install can resolve manifest URL early), and the optional cosign material.
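The validate-tag check can be reproduced locally before you push. This is a minimal sketch, assuming the tag is the manifest version with a leading `v` (the template's real comparison may differ); `tag_matches_manifest` is a hypothetical helper name.

```rust
/// Returns true when a release tag such as "v0.1.0" names the same version
/// as the manifest ("0.1.0"). The "v" prefix rule is an assumption.
pub fn tag_matches_manifest(tag: &str, manifest_version: &str) -> bool {
    match tag.strip_prefix('v') {
        Some(version) => !version.is_empty() && version == manifest_version,
        None => false,
    }
}
```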
When the workflow turns green, the release page contains the
exact asset shape nexo plugin install resolves against. See
Publishing a plugin for the full asset naming
convention.
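As an illustration of the asset shape, the names for one target can be derived as below. This is a sketch under assumptions: the sidecar is taken to be the tarball name plus `.sha256`, the version carries no `v` prefix, and `release_assets` is a hypothetical helper. Publishing a plugin remains the authority on naming.

```rust
/// Builds the tarball name and its checksum sidecar name for one target.
pub fn release_assets(id: &str, version: &str, target: &str) -> (String, String) {
    let tarball = format!("{}-{}-{}.tar.gz", id, version, target);
    // Assumed sidecar naming: tarball name with ".sha256" appended.
    let sidecar = format!("{}.sha256", tarball);
    (tarball, sidecar)
}
```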
9. Install on an operator host
Three install routes, depending on how you shipped the plugin:
9.a — cargo install (zero config)
Recommended for crates.io-published plugins. The daemon's
discovery defaults already cover $HOME/.cargo/bin:
cargo install nexo-plugin-hello_plugin
nexo plugin list
The discovery walker invokes the binary with --print-manifest
on next daemon boot (or next hot-reload tick under Phase 81.10),
extracts the embedded TOML, and registers the plugin. No
operator YAML edit. Authoring detail:
Auto-discovery quickstart.
9.b — nexo plugin install (GitHub releases)
When you ship tarballs to GitHub releases instead of (or in addition to) crates.io:
nexo plugin install alice/hello_plugin@v0.1.0
The installer (Phase 31.1.c):
- Hits https://api.github.com/repos/alice/hello_plugin/releases/tags/v0.1.0.
- Picks the tarball matching the host's target triple.
- Downloads tarball + .sha256, verifies the digest.
- Optionally verifies cosign signature (default off; flip on with --require-signature after configuring trusted keys; see Plugin trust).
- Extracts to <state_root>/plugins/hello_plugin-0.1.0/{nexo-plugin.toml, bin/hello_plugin, .nexo-install.json}.
- Records the install in the per-host install ledger.
The daemon's
plugins.discovery.search_paths defaults include
$HOME/.local/share/nexo/plugins and
/usr/local/libexec/nexo/plugins. Move (or symlink) the
extracted directory under one of those to skip the YAML edit,
or point a custom search_paths entry at
<state_root>/plugins/.
9.c — Drop-in (manual)
Copy the binary into any default search path:
cp ./target/release/hello_plugin ~/.cargo/bin/nexo-plugin-hello_plugin
chmod +x ~/.cargo/bin/nexo-plugin-hello_plugin
Re-discovery happens on next daemon boot (or hot-reload).
10. Verify
nexo plugin list
Expected output:
ID VERSION TARGET CHANNEL INSTALLED
hello_plugin 0.1.0 x86_64-unknown-linux-gnu latest 2026-05-07T18:20:42+00:00
Daemon stderr should show the handshake within ~5 seconds:
INFO subprocess plugin spawned (id=hello_plugin, pid=...)
INFO subprocess plugin handshake ok (id=hello_plugin, version=0.1.0)
Publish anything onto an inbound topic the plugin subscribed to
(e.g. agent broker publish-equivalent inside your microapp)
and the echo lands on plugin.inbound.hello_plugin_echo.
Iterate
# bump in source, push tag
git tag v0.1.1 && git push origin v0.1.1
# operator-side
nexo plugin upgrade hello_plugin # pulls latest tag
# or pin: nexo plugin upgrade hello_plugin --version v0.1.1
nexo plugin upgrade (Phase 31.8) atomically swaps the on-disk
copy + restarts the subprocess inside the daemon. To roll back,
re-run install against the older tag.
To remove:
nexo plugin remove hello_plugin
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| nexo plugin install errors no asset matching target <triple>. | CI matrix did not build for the operator's host. | Add the missing triple to the workflow matrix, re-tag. |
| Daemon stderr shows subprocess exited at handshake (status=...). | Plugin wrote non-JSON-RPC bytes to stdout (most likely a stray println!) or panicked before the handshake. | Re-run ./target/debug/<id> against a synthetic frame from step 4; the panic is reproducible there. |
| nexo plugin list does not show your plugin after install. | Daemon's plugins.discovery.search_paths does not include <state_root>/plugins/. | Add it to config/plugins/discovery.yaml and restart. |
| nexo plugin install errors signature required. | Operator runs with --require-signature and your release was unsigned. | Sign with cosign; see Signing & publishing. |
| Plugin runs locally with nexo plugin run but the published binary panics on the operator host. | Per-target build skipped a runtime dep (e.g. linked OpenSSL on Linux but not on the operator's distro). | Switch to vendored-openssl or static-link in Cargo.toml; rebuild. |
| Operator on --require-signature rejects your release with cosign verify failed. | Trusted-keys file does not include your identity issuer. | Operator adds your GitHub identity to config/extensions/trusted_keys.toml. See Plugin trust. |
Going deeper
You shipped one plugin. From here:
- Plugin authoring overview — the full picture (plugin vs extension vs microapp, plugin config dir, sandboxing, contributing tools / channel kinds / LLM providers / memory backends / hooks).
- Rust SDK reference: full PluginAdapter surface, manifest schema, per-target tarball convention.
- Plugin contract: the wire spec every SDK implements. Read this once and you can debug any plugin in any language.
- Patterns (8 common shapes) — pre-baked designs for channels, pollers, hybrid bridges.
- Publishing a plugin — full asset naming convention and the 4-job CI workflow shape.
- Signing & publishing — cosign keyless tutorial.
Other languages
Same flow, swap step 1's --lang:
| SDK | Scaffold | Reference |
|---|---|---|
| Python | nexo plugin new hello --lang python --owner alice | Python SDK |
| TypeScript | nexo plugin new hello --lang typescript --owner alice | TypeScript SDK |
| PHP | nexo plugin new hello --lang php --owner alice | PHP SDK |
Steps 2–10 read identically; the source tree differs (no
Cargo.toml, language-appropriate runtime). The wire contract
is the same.
Plugin authoring overview
Phase 31.9. Entry point for authors building anything that extends nexo-rs from the outside. This page gets you to the right deeper guide in 60 seconds.
Read this when
- You want to add capability to nexo-rs and have not yet picked between a plugin, an extension, or a microapp.
- You have picked "plugin" and need to know which language SDK to start with.
- You want a 5-minute end-to-end smoke test before committing to a language choice.
Plugin vs Extension vs Microapp
nexo-rs ships three extension surfaces. They differ in who owns the runtime, who owns the UI, and how operators install them.
| You're building | Use | Owns UI? | Owns auth/billing? | Common languages |
|---|---|---|---|---|
| New channel (Slack, Discord, IRC) or poller | Plugin | No (daemon owns I/O) | No (operator config) | Rust, Python, TypeScript, PHP |
| Bundle of skills, advisors, prompts, or YAML config that operators install with nexo ext install | Extension | No | No | YAML + small Rust stubs |
| End-product on top of nexo-rs (multi-tenant SaaS, internal tool, white-label deploy) | Microapp | ✅ yes | ✅ yes | Any language with a NATS client |
If you are still unsure:
- Plugin if your code is reactive (broker.event fires → you do something) and ships as a binary the daemon spawns.
- Extension if your code is declarative (skills + agents + prompts) and ships as a tarball operators install with nexo ext install.
- Microapp if your code is the product. End users see your UI, your domain, your billing; nexo-rs is invisible infrastructure.
This page covers plugins. For extensions, jump to Manifest reference. For microapps, jump to Microapps · getting started.
Pick a language
All four SDKs implement the same wire contract — your choice
is purely about ergonomics. Operators don't care which SDK you
picked; they just run nexo plugin install <owner>/<repo>.
| Language | Best for | Runtime deps | Per-target binaries? | SDK reference |
|---|---|---|---|---|
| Rust | Performance, single static binary, zero runtime deps. | None — cargo build produces a static ELF/Mach-O. | ✅ yes (one tarball per Rust target) | Rust SDK |
| Python | Existing scripts, ML ecosystem, fast iteration. | python3.11+ on operator host. | No (noarch — single tarball) | Python SDK |
| TypeScript | Existing Node servers, npm ecosystem, frontend devs. | node 20+ on operator host. | No (noarch) | TypeScript SDK |
| PHP | Existing Composer / Symfony / Laravel codebase. | php 8.1+ (Fibers required) on operator host. | No (noarch) | PHP SDK |
Cross-cutting reference: Plugin contract is the wire spec all four SDKs implement. Read it once and you understand every SDK.
SDK packages
nexo plugin new --lang <lang> vendors the SDK for you, so you
don't normally install it by hand — but if you're wiring the SDK
into an existing project, the published packages are:
| Language | Package (registry) | Add to a project |
|---|---|---|
| Rust | nexo-microapp-sdk (feature plugin) + nexo-broker | cargo add nexo-microapp-sdk -F plugin && cargo add nexo-broker |
| Python | nexoai — import name stays nexo_plugin_sdk | pip install nexoai |
| TypeScript | nexo-plugin-sdk | npm install nexo-plugin-sdk |
| PHP | nexo/plugin-sdk | composer require nexo/plugin-sdk |
The Python / TypeScript / PHP SDKs live in one mono-repo —
lordmacu/nexo-plugin-sdks
(per-language release tags python-v* / ts-v* / php-v*). The Rust
SDK ships from this repo (crates/microapp-sdk, feature plugin).
Microapp, not plugin? The product layer uses the same nexo-microapp-sdk crate without the plugin feature (its admin / voice / stt / wizard / events modules instead), plus the @lordmacu/nexo-microapp-ui-react React kit for the frontend. See Microapps · getting started and the agent-creator reference microapp.
5-min quickstart
The shortest path from zero to a running plugin uses Rust
because the toolchain ships with cargo. Adapt the
nexo plugin new --lang <other> step for Python / TypeScript
/ PHP — the rest is identical.
For the full zero-to-installed flow (scaffold → publish to
crates.io → operator-side cargo install nexo-plugin-X →
daemon auto-discovery), see the linear
Plugin quickstart (10 min). This section is
the abridged inner-loop version.
# 1. Scaffold from the bundled template (Phase 31.6).
nexo plugin new my_plugin --lang rust --owner alice
cd my_plugin
# 2. Build (under a second on a warm cache).
cargo build
# 3. Boot the daemon with this directory injected at the head
# of plugins.discovery.search_paths. No install, no verify,
# no GitHub round-trip — pure inner-loop dev.
nexo plugin run .
Expected stderr trace from step 3:
INFO local plugin override applied (plugin_id=my_plugin)
INFO subprocess plugin spawned (id=my_plugin, pid=...)
INFO my_plugin starting
INFO subprocess plugin handshake ok (id=my_plugin, version=0.1.0)
The plugin is now live. Publishing any event on a topic the
plugin's manifest registers (default
plugin.inbound.my_plugin_echo) reaches the handler in
src/main.rs::handle_event.
To exit, send Ctrl+C — the daemon issues a shutdown
request, the plugin's on_shutdown runs, and both processes
return cleanly.
Plugin config dir
Phase 81.4 — operators place per-plugin YAML config under
<config_dir>/plugins/<plugin_id>/. The daemon reads every
*.yaml / *.yml file in that directory at boot, deep-merges
them alphabetically, resolves ${ENV_VAR} placeholders, and
(when your manifest declares a schema_path) validates the
merged tree against your JSONSchema before calling
init(). Validation failure aborts plugin load with
InitOutcome::Failed; the daemon continues without the plugin.
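The `${ENV_VAR}` resolution step can be pictured with the sketch below. It is not the daemon's implementation: `expand_placeholders` is a hypothetical helper, and leaving unknown or unterminated placeholders verbatim is an assumption (the real loader may treat them as errors).

```rust
/// Replaces `${NAME}` placeholders using `lookup`.
/// Assumption: unknown names and unterminated placeholders are left verbatim.
pub fn expand_placeholders<F>(input: &str, lookup: F) -> String
where
    F: Fn(&str) -> Option<String>,
{
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find("${") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find('}') {
            Some(end) => {
                let name = &after[..end];
                match lookup(name) {
                    Some(value) => out.push_str(&value),
                    None => {
                        // Unknown variable: keep the placeholder as written.
                        out.push_str("${");
                        out.push_str(name);
                        out.push('}');
                    }
                }
                rest = &after[end + 1..];
            }
            None => {
                // No closing brace: keep the remainder untouched.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}
```

To resolve against the process environment, pass `|name| std::env::var(name).ok()` as `lookup`.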
Multi-file sharding lets operators split sensitive settings from declarative ones:
<config_dir>/plugins/slack/
01-credentials.yaml # api_token: "${SLACK_BOT_TOKEN}"
02-channels.yaml # channels: [...]
03-allowlist.yaml # rate limits per channel
Mappings deep-merge across files (later wins per-key).
Arrays full-replace — they don't concat — so an operator
override file completely substitutes the array from earlier
files. Comment-only and non-.yaml files are ignored.
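The merge rule (mappings merge per key with later files winning, arrays replaced wholesale) can be sketched with a small stand-in value type. The `Value` enum and `deep_merge` below are illustrative only; the daemon works on serde_yaml values, and its code is not shown in this guide.

```rust
use std::collections::BTreeMap;

/// Tiny stand-in for a YAML value (hypothetical; the daemon uses serde_yaml).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Str(String),
    Seq(Vec<Value>),
    Map(BTreeMap<String, Value>),
}

/// Merges `overlay` into `base`: mappings merge key by key,
/// anything else (arrays included) is replaced wholesale.
pub fn deep_merge(base: &mut Value, overlay: Value) {
    match (base, overlay) {
        (Value::Map(dst), Value::Map(src)) => {
            for (key, value) in src {
                if let Some(existing) = dst.get_mut(&key) {
                    deep_merge(existing, value);
                } else {
                    dst.insert(key, value);
                }
            }
        }
        // Scalars and sequences: the later file wins outright.
        (slot, other) => *slot = other,
    }
}
```

Applying the files in alphabetical order means calling `deep_merge` once per file, with each later file as `overlay`.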
Declare your config schema in nexo-plugin.toml:
[plugin.config]
schema_path = "config.schema.json" # relative to plugin root
hot_reload = true # parsed; wiring lands in 81.4.b
The schema validator currently supports the JSONSchema subset
type / required / properties / additionalProperties /
enum. Plugins needing oneOf / $ref / pattern will get
richer validation in a future 81.4.c slice — for now, those
keywords pass through silently.
Inside your plugin, consume ctx.plugin_config (an
Arc<serde_yaml::Value>):
let api_token = ctx
    .plugin_config
    .get("api_token")
    .and_then(serde_yaml::Value::as_str)
    .ok_or_else(|| anyhow::anyhow!("api_token missing"))?;
When the operator hasn't placed any config files, the value is
an empty mapping — your plugin sees Value::Mapping(empty),
not Null. Plugins with all-optional fields boot cleanly
without operator action.
Contributing channel kinds
Phase 81.24 — subprocess plugins that declare
[plugin.extends].channels = [...] automatically get a
host-side RemoteChannelAdapter registered for each kind. The
daemon's ChannelAdapterRegistry routes outbound dispatches to
your subprocess via three JSON-RPC methods:
- channel.start { kind, instance }: subscribe outbound topics + begin publishing inbound (default 30 s timeout)
- channel.stop { kind }: release resources (30 s)
- channel.send_outbound { kind, msg }: send one outbound message; reply with { message_id, sent_at_unix } (60 s)
Wire spec + error codes: Plugin contract §5.x.
Sketch (Rust subprocess plugin) — handle each request from your adapter's reader loop:
match method {
    "channel.start" => reply_ok(id, serde_json::json!({ "ok": true })),
    "channel.stop" => reply_ok(id, serde_json::json!({ "ok": true })),
    "channel.send_outbound" => {
        let msg = params.get("msg").cloned().unwrap_or_default();
        // Forward `msg` to your provider's API; map the API's
        // response into OutboundAck.
        let ack = send_to_slack(msg).await?;
        reply_ok(id, serde_json::json!({
            "message_id": ack.id,
            "sent_at_unix": ack.ts,
        }));
    }
    _ => reply_method_not_found(id, method),
}
For typed errors (rate-limit, recipient invalid, etc.), reply
with the channel-specific error codes from the contract table —
the host's adapter maps them to ChannelAdapterError variants
the agent runtime understands.
The matching SDK helpers (handle_channel_start /
handle_channel_send_outbound etc.) ship in Phase 81.24.b.
Until then, hand-handle the JSON-RPC frames using the SDK's
existing primitives.
Contributing LLM providers
Phase 81.25 — subprocess plugins that declare
[plugin.extends].llm_providers = [...] get one host-side
RemoteLlmFactory registered into LlmRegistry per provider
name. When the agent runtime resolves
model.provider = "<name>", the factory builds a
RemoteLlmClient that translates trait calls into llm.chat
JSON-RPC requests over your subprocess plugin's stdio pipe.
[plugin.extends]
llm_providers = ["cohere", "mistral"]
Two modes supported on the wire:
- Sync: params.stream = false; reply once with WireChatResponse (default 60 s timeout).
- Streaming: params.stream = true; emit zero or more llm.chat.delta { request_id, chunk } notifications + one final response carrying usage / finish_reason (default 300 s timeout).
Wire spec + error codes: Plugin contract §5.y.
Sketch (Rust subprocess plugin) — handle llm.chat from your
adapter's reader loop:
match method {
    "llm.chat" => {
        let provider = params["provider"].as_str().unwrap_or("");
        let stream = params["stream"].as_bool().unwrap_or(false);
        let request = serde_json::from_value::<WireChatRequest>(
            params["request"].clone()
        )?;
        if stream {
            // Emit zero or more deltas
            send_notification("llm.chat.delta", json!({
                "request_id": id,
                "chunk": { "type": "text_delta", "delta": "Hello" },
            }));
            // ...then the final response.
            reply_ok(id, /* WireChatResponse with usage + finish_reason */);
        } else {
            // Sync: call your provider's API, build WireChatResponse.
            let resp = call_my_provider(provider, &request).await?;
            reply_ok(id, resp);
        }
    }
    _ => reply_method_not_found(id, method),
}
For typed errors (rate-limit, auth failed, model not found),
reply with the LLM-specific error codes from the contract table
— the host's RemoteLlmClient surfaces them as anyhow::Error
with operator-greppable messages.
The matching SDK helpers (PluginAdapter::handle_llm_chat,
streaming sender, etc.) ship in Phase 81.25.b. Until then,
hand-handle the JSON-RPC frames.
Contributing hook handlers
Phase 81.27 — subprocess plugins that declare
[plugin.extends].hooks = [...] get one host-side
RemoteHookHandler registered into HookRegistry per hook
name. When the daemon fires that hook, the handler translates
the call into a hook.on_hook JSON-RPC request over your
subprocess plugin's stdio pipe.
[plugin.extends]
hooks = ["before_message", "after_message"]
Wire spec + error semantic: Plugin contract §5.z.
The reply shape is the existing HookResponse struct:
HookResponse {
    abort: bool,                      // legacy block signal
    reason: Option<String>,           // operator-readable
    override: Option<Value>,          // key-by-key mutation
    decision: Option<String>,         // "allow" | "block" | "transform"
    transformed_body: Option<String>, // for "transform"
    do_not_reply_again: bool,         // anti-loop signal
}
Sketch (Rust subprocess plugin) — handle each hook by name:
match method {
    "hook.on_hook" => {
        let hook_name = params["hook_name"].as_str().unwrap_or("");
        let event = params["event"].clone();
        let response = match hook_name {
            "before_message" => check_pii(&event)?, // your logic
            "after_message" => log_audit(&event)?,
            _ => HookResponse::default(), // Continue
        };
        reply_ok(id, serde_json::to_value(&response)?);
    }
    _ => reply_method_not_found(id, method),
}
Continue-on-error semantic — the host swallows every
dispatch failure (timeout, malformed reply, JSON-RPC error)
and returns HookResponse::default() so the registry's fire
loop keeps iterating. Failures land in tracing::warn! for
operator debugging but never break the agent flow. This means:
- Returning -32601 method_not_found for an unknown hook is fine: the host logs and continues.
- A hung subprocess hook eventually times out (5 s default; NEXO_PLUGIN_HOOK_TIMEOUT_MS env override) and the agent proceeds.
- Returning a malformed HookResponse still continues; only well-formed responses with abort: true or decision: "block" actually block.
Hooks fire on the message hot path — keep handler latency
low (<50 ms typical). Use the decision: "transform" path
sparingly: every transform rewrites the event payload for
subsequent handlers.
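The continue-on-error rule reduces to a small decision function. The sketch below uses a local stand-in struct (`HookReply`, hypothetical) carrying only the two fields that can block, and models every dispatch failure as `None`; it illustrates the semantics described above and is not the host's code.

```rust
/// Local stand-in for the two HookResponse fields that can block.
#[derive(Debug, Default, Clone)]
pub struct HookReply {
    pub abort: bool,
    pub decision: Option<String>,
}

/// `None` models any dispatch failure (timeout, malformed reply,
/// JSON-RPC error), which the host treats as a non-blocking default.
pub fn should_block(reply: Option<&HookReply>) -> bool {
    match reply {
        Some(r) => r.abort || r.decision.as_deref() == Some("block"),
        None => false,
    }
}
```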
Contributing memory backends
Phase 81.26 — subprocess plugins that declare
[plugin.extends].memory_backends = [...] get one
host-side RemoteVectorBackend registered into the daemon's
VectorBackendRegistry per backend name. v1 covers VECTOR
storage only — short/long-term memory keep their SQLite
implementation; plugins replace only the vector index. Primary
use case: Pinecone / Qdrant / Weaviate / pgvector.
[plugin.extends]
memory_backends = ["pinecone"]
Three wire methods (default timeouts 30s upsert/delete, 10s search):
- memory.vector_upsert { backend, collection, records } → { count }
- memory.vector_search { backend, collection, query } → { matches: [...] }
- memory.vector_delete { backend, collection, ids } → { count }
NEXO_PLUGIN_MEMORY_TIMEOUT_MS env overrides all three.
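The timeout selection can be sketched as follows. `memory_timeout` is a hypothetical helper, and ignoring an unparsable override is an assumption; the defaults (30 s for upsert and delete, 10 s for search) come from the text above.

```rust
use std::time::Duration;

/// Resolves the timeout for one memory.* wire method.
/// `env_override` is the raw NEXO_PLUGIN_MEMORY_TIMEOUT_MS value, if set.
/// Assumption: an unparsable override is ignored and the default applies.
pub fn memory_timeout(method: &str, env_override: Option<&str>) -> Duration {
    if let Some(ms) = env_override.and_then(|raw| raw.trim().parse::<u64>().ok()) {
        return Duration::from_millis(ms);
    }
    match method {
        "memory.vector_search" => Duration::from_secs(10),
        _ => Duration::from_secs(30), // upsert and delete
    }
}
```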
Wire spec + error codes: Plugin contract §5.w.
Sketch (Rust subprocess plugin) — handle each method by name:
match method {
    "memory.vector_upsert" => {
        let collection = params["collection"].as_str().unwrap_or("");
        let records: Vec<VectorRecord> = serde_json::from_value(
            params["records"].clone()
        )?;
        let count = my_pinecone_client.upsert(collection, records).await?;
        reply_ok(id, serde_json::json!({"count": count}));
    }
    "memory.vector_search" => {
        let collection = params["collection"].as_str().unwrap_or("");
        let query: VectorQuery = serde_json::from_value(
            params["query"].clone()
        )?;
        let matches = my_pinecone_client.search(collection, query).await?;
        reply_ok(id, serde_json::json!({"matches": matches}));
    }
    "memory.vector_delete" => {
        let collection = params["collection"].as_str().unwrap_or("");
        let ids: Vec<String> = serde_json::from_value(
            params["ids"].clone()
        )?;
        let count = my_pinecone_client.delete(collection, ids).await?;
        reply_ok(id, serde_json::json!({"count": count}));
    }
    _ => reply_method_not_found(id, method),
}
For typed errors (collection-not-found, dimension-mismatch,
rate-limited, write-failed), reply with the memory-specific
error codes from the contract table — the host's
RemoteVectorBackend surfaces them as anyhow::Error with
operator-greppable messages.
v1 limitation: registered backends are NOT yet consumed at
runtime — LongTermMemory.recall_vector still uses sqlite-vec.
Operators audit registered backends today via
wire.vector_backend_registry.names(). Consumer-side dispatch
(agents.yaml.<id>.vector_backend = "pinecone") lands in
Phase 81.26.b.
Contributing tools
Phase 81.29 — subprocess plugins can expose tools that the
daemon's LLM picks via function-calling. Each tool name lives
in [plugin.extends].tools = [...] plus the subprocess
advertises the matching schema at handshake.
[plugin]
id = "browser"
# ... other manifest fields ...
[plugin.extends]
tools = ["browser_navigate", "browser_click"]
The subprocess MUST advertise these tools in the
initialize-reply's tools array:
{
"jsonrpc": "2.0",
"id": 1,
"result": {
"manifest": { "plugin": { "id": "browser", ... } },
"server_version": "browser-0.1.1",
"tools": [
{
"name": "browser_navigate",
"description": "Navigate to a URL",
"input_schema": {
"type": "object",
"properties": { "url": { "type": "string" } },
"required": ["url"]
}
}
]
}
}
When the agent's LLM picks a tool the daemon issues a
tool.invoke JSON-RPC request to the subprocess:
{
"jsonrpc": "2.0",
"id": 80,
"method": "tool.invoke",
"params": {
"plugin_id": "browser",
"tool_name": "browser_navigate",
"args": { "url": "https://example.com" },
"agent_id": "shopper"
}
}
The plugin replies with the tool's result (typically a
ToolResponse-shaped object). Errors use the
-33401..=-33405 band documented in
Plugin contract §5.t.
Validation rules (host-side):
- Tool names MUST satisfy the per-plugin namespace policy from Phase 81.3 (<plugin_id>_* or ext_<plugin_id>_*).
- The advertised tools array MUST be a subset of extends.tools; drift in this direction is a hard failure at handshake.
- Manifest entries WITHOUT an advertised counterpart are tolerated but logged at warn; runtime calls yield -33401 ToolNotFound.
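The three validation rules can be sketched as one check. `check_tools` is a hypothetical helper written from the rules as stated here; the host's real validator, its error text, and the exact namespace matching may differ.

```rust
/// Host-side checks sketched from the rules above.
/// Ok(warnings) lists declared tools the plugin did not advertise;
/// Err(..) is a hard handshake failure.
pub fn check_tools(
    plugin_id: &str,
    declared: &[&str],   // [plugin.extends].tools from the manifest
    advertised: &[&str], // tools array from the initialize reply
) -> Result<Vec<String>, String> {
    let plain = format!("{}_", plugin_id);
    let ext = format!("ext_{}_", plugin_id);
    // Rule 1: every name lives in the plugin's namespace.
    for name in declared.iter().chain(advertised.iter()) {
        if !(name.starts_with(plain.as_str()) || name.starts_with(ext.as_str())) {
            return Err(format!("tool `{}` is outside the `{}` namespace", name, plugin_id));
        }
    }
    // Rule 2: advertised tools must be a subset of extends.tools.
    for name in advertised {
        if !declared.contains(name) {
            return Err(format!("tool `{}` advertised but not in extends.tools", name));
        }
    }
    // Rule 3: declared-but-unadvertised tools only produce warnings.
    Ok(declared
        .iter()
        .filter(|name| !advertised.contains(*name))
        .map(|name| format!("tool `{}` declared but not advertised", name))
        .collect())
}
```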
Plugin-side responsibilities:
- Validate args against the published input_schema before executing (defense in depth: the host already validates, but plugins should re-check).
- Return -33402 ToolArgumentInvalid with details: <Value> pointing to the offending field if validation fails.
- Return -33404 ToolUnavailable with data: { retry_after_ms: <u64> } for transient failures (rate-limits, locked resources).
Default timeout: 60 s (matches the LLM band — tools span
fast browser_click to slow browser_navigate). Operator
override via NEXO_PLUGIN_TOOL_TIMEOUT_MS.
v1 limitations — see follow-ups in FOLLOWUPS.md:
streaming tools (chunked outputs via tool.invoke.delta),
per-tool timeout knobs in manifest, SDK helper
PluginAdapter::on_tool(name, handler).
Sandboxing your plugin
Phase 81.22 — Linux subprocess plugins can opt into bubblewrap
isolation by declaring [plugin.sandbox] in nexo-plugin.toml.
Default is disabled, so existing plugins keep today's behavior;
opt in when you want defense-in-depth.
[plugin.sandbox]
enabled = true
network = "deny" # "deny" | "host"
fs_read_paths = ["/etc/ssl/certs"] # absolute paths only
fs_write_paths = ["${state_dir}"] # ${state_dir} = per-plugin
# state root, the only safe
# writable place by default
drop_user = true # nobody:nogroup uid mapping
Linux prereq: install bubblewrap (apt install bubblewrap
on Debian/Ubuntu, available on Arch + Alpine + Fedora). The
daemon discovers bwrap once at boot via PATH lookup. Without
bwrap, sandbox-enabled plugins log a warning and run unsandboxed
unless NEXO_PLUGIN_SANDBOX_REQUIRE=1 is set (in which case
boot fails).
macOS: not yet enforced. The daemon logs a tracing::warn!
at every spawn and runs the plugin without sandbox. Native
sandbox-exec integration is deferred to follow-up 81.22.macos
(deprecated-API risk noted).
Network policy:
- network = "deny" → plugin runs in an isolated network namespace with only loopback. Use the host's daemon-mediated RPCs (llm.complete, memory.recall, broker.publish) for any external IO. Recommended default.
- network = "host" → plugin shares the daemon's network. Operator must opt in via NEXO_PLUGIN_SANDBOX_HOST_NET_ALLOW=1; manifest validation rejects this otherwise. Use only when daemon-mediated RPCs cannot satisfy your plugin's IO needs.
Filesystem allowlist:
- fs_read_paths = host paths bound read-only into the sandbox (bwrap --ro-bind). Common: /etc/ssl/certs for outbound TLS verification.
- fs_write_paths = host paths bound read-write (bwrap --bind). The literal ${state_dir} token expands at spawn time to <state_root>/plugins/<plugin_id>, your plugin's own per-instance scratch space. It is the only token recognized, and it is valid only in fs_write_paths.
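The `${state_dir}` expansion can be sketched as below. `expand_write_path` is a hypothetical helper, and treating only an entry that is exactly `${state_dir}` as the token is an assumption; the text does not say whether the token may appear as a prefix of a longer path.

```rust
use std::path::{Path, PathBuf};

/// Expands one fs_write_paths entry. Only an entry that is exactly
/// `${state_dir}` is treated as the token here (an assumption).
pub fn expand_write_path(entry: &str, state_root: &Path, plugin_id: &str) -> PathBuf {
    if entry == "${state_dir}" {
        state_root.join("plugins").join(plugin_id)
    } else {
        PathBuf::from(entry)
    }
}
```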
Hard denylist (compile-time, not configurable): allowlist entries that equal or include any of these paths are rejected at manifest validation:
/etc/shadow, /etc/sudoers, /etc/sudoers.d
/proc/sys, /proc/kcore, /proc/kallsyms
/sys/firmware, /sys/kernel
/dev/mem, /dev/kmem, /dev/port
/var/run/docker.sock, /run/docker.sock,
/private/var/run/docker.sock
/root, /boot
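A manifest author can pre-screen allowlist entries against this list. The sketch below reads "equal or include" as: the entry equals a denied path, sits beneath one, or is an ancestor that would expose one. That reading is an assumption, and `is_denied` is a hypothetical helper, not the daemon's validator.

```rust
use std::path::Path;

/// Paths copied from the denylist above.
const DENYLIST: &[&str] = &[
    "/etc/shadow", "/etc/sudoers", "/etc/sudoers.d",
    "/proc/sys", "/proc/kcore", "/proc/kallsyms",
    "/sys/firmware", "/sys/kernel",
    "/dev/mem", "/dev/kmem", "/dev/port",
    "/var/run/docker.sock", "/run/docker.sock",
    "/private/var/run/docker.sock",
    "/root", "/boot",
];

/// Assumption: an entry is rejected when it equals a denied path, sits
/// beneath one, or is an ancestor that would expose one.
pub fn is_denied(entry: &str) -> bool {
    let entry = Path::new(entry);
    DENYLIST.iter().any(|denied| {
        let denied = Path::new(denied);
        // Path::starts_with compares whole components, so "/rootfs"
        // is not treated as being under "/root".
        entry.starts_with(denied) || denied.starts_with(entry)
    })
}
```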
Operator capability env knobs:
| Env var | Effect |
|---|---|
| NEXO_PLUGIN_SANDBOX_REQUIRE=1 | Refuse to spawn any plugin without sandbox.enabled = true. |
| NEXO_PLUGIN_SANDBOX_HOST_NET_ALLOW=1 | Permit network = "host" manifests. Default off. |
Recommended pattern: enabled = true, network = "deny", fs_write_paths = ["${state_dir}"], no fs_read_paths unless
your plugin truly needs to read host config (e.g. CA bundles
for TLS). Use daemon-mediated RPCs for everything else.
v1 out of scope — see follow-ups in FOLLOWUPS.md:
granular network egress allowlist (81.22.b), per-syscall
seccomp filters (81.22.c), nexo agent doctor plugins sandbox
section (81.22.d), native macOS sandbox-exec (81.22.macos).
Future capability extensions
Phase 81.28 — subprocess plugins that contribute new
channel kinds, LLM providers, memory backends, or
HookInterceptor IDs declare them via an additive
[plugin.extends] manifest section:
[plugin.extends]
channels = ["slack"] # paired with Phase 81.24 wrapper
llm_providers = ["cohere"] # paired with Phase 81.25
memory_backends = ["pinecone"] # paired with Phase 81.26
hooks = ["pii_redact"] # paired with Phase 81.27
Each list names the IDs the plugin contributes. Validation rules + the canonical schema live in Plugin contract §2.1. Daemon dispatch wiring (actually populating the matching registry slots) ships per-registry across Phase 81.24-27 — the schema is shipped today so subprocess plugin authors can declare intent ahead of those wrappers landing.
Local dev loop conventions
- nexo plugin run <path>: boots the daemon with one local plugin overriding discovery; the rest of the system (broker, agents, channels) runs as configured.
- nexo plugin run <path> --no-daemon-config: same, but clears cfg.agents.agents so the plugin runs in isolation for contract debugging.
- Rebuild → respawn: Phase 81.10 hot-reload re-walks search_paths periodically, so a fresh cargo build triggers the daemon to respawn the subprocess automatically. No --watch flag yet (Phase 31.7.b deferred).
Next steps
- Rust SDK — full Rust API + manifest example.
- Python SDK, TypeScript SDK, PHP SDK — language-specific references with the same shape.
- Plugin contract — wire spec; read this once and you can debug any SDK.
- Patterns (8 common shapes) — pre-baked designs for channel plugins, pollers, hybrid bridges.
- Publishing a plugin — asset naming convention + 4-job CI workflow shape.
- Signing & publishing: cosign keyless tutorial that operators on --require-signature need.
- Plugin trust (trusted_keys.toml): operator-side verification policy your readers will configure to trust your releases.
Plugin manifest (Phase 81.13 unified)
Phase 81.13 unified the framework's two manifest parsers
(nexo-extensions::manifest Phase 11 + nexo-plugin-manifest
Phase 31.5+) into a single source of truth. Plugin authors now
ship one TOML manifest at the plugin root that declares
both the legacy contributions (tools / hooks / channels /
providers / pollers) AND the modern admin RPC + HTTP server
capabilities.
Filename
The canonical filename is plugin.toml. The framework also
accepts nexo-plugin.toml as a legacy fallback for one
deprecation cycle so existing plugins keep loading without an
immediate rename. When both files are present in the same plugin
root, plugin.toml wins and the daemon emits a warning.
Plugins authored after 81.13 should ship plugin.toml only.
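The filename precedence can be summarized in a few lines. `pick_manifest` is a hypothetical helper; it returns the file to load plus a flag for the "both files present" warning, and it does not model any separate deprecation notice the daemon may emit for the legacy name.

```rust
/// Returns the manifest filename to load plus whether to warn that both exist.
pub fn pick_manifest(has_plugin_toml: bool, has_legacy: bool) -> Option<(&'static str, bool)> {
    match (has_plugin_toml, has_legacy) {
        // plugin.toml always wins; warn only when the legacy file is also there.
        (true, both) => Some(("plugin.toml", both)),
        (false, true) => Some(("nexo-plugin.toml", false)),
        (false, false) => None,
    }
}
```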
Versioning
The TOML root may carry a manifest_version integer:
- omitted or 1 → legacy v1 shape (flat [capabilities], [transport], [meta], [mcp_servers], [outbound_bindings], [context], [requires]). The parser auto-translates to v2 in memory and emits a one-shot deprecation warn per plugin.
- 2 → canonical Phase 81.13 shape. New plugins should set this explicitly to opt out of the deprecation warn.
Unknown values produce a clear parse error.
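The version dispatch amounts to a three-way match. The sketch below uses hypothetical names (`ManifestShape`, `manifest_shape`) and an invented error message; only the mapping of omitted/1/2/other comes from the text above.

```rust
#[derive(Debug, PartialEq)]
pub enum ManifestShape {
    LegacyV1,
    CanonicalV2,
}

/// Maps the optional `manifest_version` key to the shape the parser expects.
pub fn manifest_shape(manifest_version: Option<i64>) -> Result<ManifestShape, String> {
    match manifest_version {
        None | Some(1) => Ok(ManifestShape::LegacyV1),
        Some(2) => Ok(ManifestShape::CanonicalV2),
        Some(other) => Err(format!("unsupported manifest_version {}", other)),
    }
}
```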
ID regex
Plugin ids match ^[a-z][a-z0-9_-]{0,63}$ (lowercase, starts with a
letter, followed by letters/digits/underscores/hyphens, 1 to 64 characters in total).
Both agent_creator and agent-creator styles are valid; the
framework normalises neither so plugin authors get to pick.
Reserved ids that no plugin can claim (defended at boot):
agent, browser, core, email, heartbeat, memory,
telegram, whatsapp.
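The id rule and the reserved list can be checked without a regex crate. This is a minimal sketch: `validate_plugin_id` and its error strings are hypothetical, while the character classes, length bound, and reserved ids are taken from the text above.

```rust
const RESERVED_IDS: &[&str] = &[
    "agent", "browser", "core", "email",
    "heartbeat", "memory", "telegram", "whatsapp",
];

/// Hand-rolled equivalent of ^[a-z][a-z0-9_-]{0,63}$ plus the reserved list.
pub fn validate_plugin_id(id: &str) -> Result<(), String> {
    let mut chars = id.chars();
    let first_ok = matches!(chars.next(), Some(c) if c.is_ascii_lowercase());
    let rest_ok = chars.all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_' || c == '-');
    // All accepted characters are ASCII, so byte length equals character count.
    if !first_ok || !rest_ok || id.len() > 64 {
        return Err(format!("`{}` does not match ^[a-z][a-z0-9_-]{{0,63}}$", id));
    }
    if RESERVED_IDS.contains(&id) {
        return Err(format!("`{}` is a reserved id", id));
    }
    Ok(())
}
```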
Where the legacy fields land
Pre-81.13 plugins kept their plugin.toml flat (Phase 11 shape).
Those still parse — the compat layer translates each section as
follows:
| v1 location | v2 location |
|---|---|
| [plugin] | [plugin] (renames min_agent_version → min_nexo_version) |
| [capabilities] | [plugin.capabilities] |
| [capabilities.admin] | [plugin.capabilities.admin] |
| [capabilities.http_server] | [plugin.capabilities.http_server] |
| [transport] (kind = "stdio") | [plugin.entrypoint] |
| [transport] (kind = "nats" or "http") | DROPPED with warn |
| [meta] | [plugin.meta] |
| [requires] (bins/env) | DROPPED with warn (preserved in 81.13.b) |
| [mcp_servers] (top-level) | DROPPED with warn |
| [outbound_bindings] (top-level) | DROPPED with warn |
| [context] | DROPPED with warn |
| [plugin] priority | DROPPED with warn |
| [capabilities] tools/hooks/channels/providers/pollers | DROPPED with warn |
The "DROPPED" entries don't break boot — the parser logs the list
of legacy fields it saw + skipped per plugin. Consumers that
needed those fields keep reading them via the legacy
nexo-extensions::manifest::ExtensionManifest::from_path path,
which still parses the v1 shape directly.
Single-file canonical example
manifest_version = 2
[plugin]
id = "agent_creator"
version = "0.0.35"
name = "Agent Creator"
description = "Operator UI microapp."
min_nexo_version = ">=0.1.0"
[plugin.entrypoint]
command = "./agent-creator"
args = []
[plugin.capabilities.admin]
required = ["agents_crud", "skills_crud", "llm_keys_crud"]
optional = ["channels_crud", "auth_rotate", "secrets_write"]
[plugin.capabilities.http_server]
port = 8765
bind = "127.0.0.1"
token_env = "AGENT_CREATOR_TOKEN"
health_path = "/healthz"
[plugin.meta]
author = "Cristian García"
license = "MIT OR Apache-2.0"
homepage = "https://example.com"
repository = "https://github.com/x/y"
Pre-81.13 example (still valid via compat)
[plugin]
id = "agent-creator"
version = "0.0.34"
name = "Agent Creator"
description = "Operator UI microapp."
[capabilities]
tools = ["agent_list", "agent_get"]
hooks = ["before_message"]
[capabilities.admin]
required = ["agents_crud"]
optional = ["channels_crud"]
[transport]
kind = "stdio"
command = "./agent-creator"
[meta]
author = "Cristian"
The framework parses this as manifest_version = 1 (auto-detected),
translates it to v2 in memory, and emits a deprecation warning
once at boot. Operators can migrate at their own pace.
Deferred (sub-phase 81.13.b)
- Preserve legacy mcp_servers / outbound_bindings / context / requires.bins+env / capabilities.tools+hooks+channels+providers+pollers / transport.kind=nats|http / plugin.priority in the canonical v2 shape so the migrator stops dropping them.
- Hard removal of nexo-plugin.toml filename + manifest_version = 1 mode (target: 0.2.0).
- JSON-Schema export for editor autocomplete (mirrors OpenClaw's openclaw.plugin.json).
[plugin.config_schema] (Phase 93.1)
Plugins ship their config contract inside their own manifest so
the daemon never hardcodes a per-plugin field. Optional — plugins
without a config block (or those still on typed cfg.plugins.X
through the Phase 93.5 deprecation window) omit the section.
# Multi-instance plugin (telegram, whatsapp, …).
[plugin.config_schema]
shape = "array"
schema = """{
"type": "object",
"properties": {
"instance": { "type": "string" },
"bot_token_env": { "type": "string" },
"enabled": { "type": "boolean" }
},
"required": ["instance", "bot_token_env"]
}"""
# Single-instance plugin (email, browser, google).
[plugin.config_schema]
shape = "object"
schema = """{
"type": "object",
"properties": {
"imap_host": { "type": "string" },
"smtp_host": { "type": "string" },
"username_env": { "type": "string" }
},
"required": ["imap_host", "smtp_host", "username_env"]
}"""
Fields
| Field | Type | Default | Meaning |
|---|---|---|---|
| schema | JSON-Schema string | (none) | Draft-07 subset (see config_schema). Root MUST be "type":"object" even when shape = "array"; the schema describes ONE element. |
| shape | "object" \| "array" | (none) | YAML wire shape at cfg.plugins.<plugin_id>. object = single map; array = Vec<map> for multi-instance plugins. |
| hot_reload | boolean | true | Mirrors [ConfigSection::hot_reload]; plugins set false only if config touches state that requires a restart. |
Static validation
cargo run --bin nexo manifest validate rejects malformed
schemas with ManifestError::PluginConfigInvalidSchema { plugin_id, reason }:
- Empty schema string.
- schema is not valid JSON.
- schema parses to a JSON value that is not an object.
- Root object's "type" is not "object".
Validation of operator YAML against the schema runs at boot (Phase 93.2) using
the same lightweight validator already shipping for install-time
microapp config (Phase 83.17).
Runtime delivery (Phase 93.2)
Once schema validation passes, the host calls the plugin's
NexoPlugin::configure(value) async hook with the operator's
YAML slice. The trait method has a default no-op so plugins
that haven't migrated keep working through the Phase 93.5
deprecation window.
Subprocess plugins receive the same value over their stdio
JSON-RPC channel as a plugin.configure request:
{
"jsonrpc": "2.0",
"id": 3,
"method": "plugin.configure",
"params": { "value": <operator-YAML-as-JSON> }
}
The host BUFFERS the value during the brief window between
configure(value) and the child's spawn completing; the
buffered value is delivered automatically after initialize
acks. Plugin SDKs should treat plugin.configure as
re-entrant — hot-reload sends a fresh request when the
operator's YAML changes.
Three error categories from PluginConfigureError:
| Variant | Source | Meaning |
|---|---|---|
| SchemaValidation | host | Operator YAML failed [plugin.config_schema] walker before the plugin ran. |
| PluginRejected | plugin | Plugin's own runtime check (typed deserialise, secret resolve, probe) failed. |
| SubprocessRpc | host | Subprocess plugin didn't ack plugin.configure (transport, timeout, error). |
Configure-then-init is the boot order: init's registrations
may inspect what configure accepted, so the plugin sees a
consistent world from the first init call onward.
Operator config delivery (Phase 93.3)
The daemon walks <config_dir>/plugins/*.yaml at boot and feeds
each file into cfg.plugins.entries.<plugin_id> keyed by the
filename stem. A new community plugin lands by dropping
<config_dir>/plugins/slack.yaml; the daemon discovers it,
matches the file's stem to manifest.plugin.id == "slack", and
routes the parsed value into NexoPlugin::configure(value).
Zero daemon-side edits.
Outer-wrapper-key strip is conservative — only when the YAML's single top-level key matches the filename stem:
# config/plugins/slack.yaml
slack: # ← stripped
token_env: SLACK_BOT_TOKEN
becomes entries["slack"] == { token_env: SLACK_BOT_TOKEN }. If
the operator's top-level key doesn't match the stem, the whole
mapping is preserved verbatim — plugins decide how to interpret.
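The conservative strip rule can be sketched as follows. The Yaml enum is a tiny stand-in invented for this example (the daemon's real value type is not shown in this section), and strip_wrapper is a hypothetical name.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a parsed YAML value.
#[derive(Debug, Clone, PartialEq)]
enum Yaml {
    Scalar(String),
    Map(BTreeMap<String, Yaml>),
}

/// Strip the outer key only when the document is a mapping with exactly one
/// top-level key and that key equals the filename stem. Anything else is
/// returned untouched.
fn strip_wrapper(stem: &str, doc: Yaml) -> Yaml {
    match doc {
        Yaml::Map(mut map) if map.len() == 1 && map.contains_key(stem) => {
            map.remove(stem).expect("key presence checked in the guard")
        }
        other => other,
    }
}
```

With stem "slack", a document whose only top-level key is slack yields the inner mapping; with any other stem the same document comes back unchanged.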
discovery.yaml is filtered (framework-internal). Parse failures
log tracing::warn! on the plugins.config target and skip the
file; other plugins still boot. Init-loop emits a
tracing::info! when both entries.<id> AND the legacy
plugins/<id>/*.yaml subdir populate — operator-visible
deprecation-window state; entries always wins.
[plugin.pairing.adapter] — manifest-driven pairing adapter
Phase 81.33.b.real (Stage 1 of the plugin auto-discovery design). Status: shipped 2026-05-15.
Plugins that expose a channel with a DM-pairing flow declare the
broker dispatch contract for the daemon's
PairingChannelAdapter in
nexo-plugin.toml. The daemon constructs a
GenericBrokerPairingAdapter from the manifest and registers it
into the shared PairingAdapterRegistry. This replaces the
previous design where the daemon had to hardcode
Arc::new(<plugin>PairingAdapter::new(broker)) blocks for every
canonical plugin.
Manifest section
[plugin.pairing.adapter]
channel_id = "whatsapp"
broker_topic_prefix = "plugin.whatsapp"
# Optional knobs:
# format_challenge_text_kind = "broker" # default: trait's built-in formatter
# normalize_cache_ttl_seconds = 3600 # default: unbounded
Required fields:
- channel_id: stable string id matching what the gate stores in pairing_pending.channel and pairing_allow_from.channel. The registry uses this as the key under register(). Plugins that also ship [plugin.pairing] UI (kind = "qr" | "form" | "info" | "custom") should use the same channel_id so the registry + UI agree.
- broker_topic_prefix: broker subject prefix the daemon dispatches JSON-RPC requests under (see contract below).
Optional fields:
- format_challenge_text_kind: "default" (the value the trait already supplies) or "broker" (asks the plugin per challenge). Default is default; channels needing custom challenge formatting (e.g. Telegram MarkdownV2 escape) flip to broker.
- normalize_cache_ttl_seconds: TTL for normalize_sender cache entries. None (default) = unbounded (cache lives the daemon's lifetime). Set when the plugin can return different canonical forms for the same raw over time.
Broker JSON-RPC dispatch contract
Daemon publishes JSON-RPC request messages on
<broker_topic_prefix>.pairing.<method>. Plugin subscribes,
handles, replies. All payloads are JSON.
<prefix>.pairing.normalize_sender
Request. { "raw": "<raw-sender>" }.
Reply. { "normalized": "<canonical>" } to accept, or
{ "normalized": null } to reject (the gate treats reject as
Decision::Drop).
Examples:
| Plugin | Input | Output |
|---|---|---|
| whatsapp | 573001112222@c.us | +573001112222 |
| whatsapp | 573001112222@s.whatsapp.net | +573001112222 |
| telegram | @User_Name | @user_name |
| telegram | 12345678 | 12345678 |
| telegram | not_a_handle | null (reject) |
The daemon caches (raw → normalized) in memory. Cache hits
do NOT round-trip the broker. With normalize_cache_ttl_seconds
unset the cache grows bounded by distinct-sender count;
typically < 10⁴ entries.
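A minimal sketch of such a cache, assuming a plain HashMap keyed by the raw sender. The struct, its method names, and the choice to pass the current time in as a parameter are all assumptions made for illustration; rejected senders are not modelled here.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical in-memory `raw -> normalized` cache.
struct NormalizeCache {
    ttl: Option<Duration>, // None = unbounded, entries never expire
    entries: HashMap<String, (String, Instant)>,
}

impl NormalizeCache {
    fn new(ttl_seconds: Option<u64>) -> Self {
        NormalizeCache {
            ttl: ttl_seconds.map(Duration::from_secs),
            entries: HashMap::new(),
        }
    }

    /// Returns the cached value, or `None` on a miss or an expired entry.
    /// On `None` the caller would round-trip the broker and call `put`.
    fn get(&self, raw: &str, now: Instant) -> Option<String> {
        let (value, stored_at) = self.entries.get(raw)?;
        match self.ttl {
            Some(ttl) if now.duration_since(*stored_at) >= ttl => None,
            _ => Some(value.clone()),
        }
    }

    fn put(&mut self, raw: &str, normalized: &str, now: Instant) {
        self.entries
            .insert(raw.to_string(), (normalized.to_string(), now));
    }
}
```

Passing the clock in keeps the expiry logic easy to test; a real implementation might read Instant::now() internally instead.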
<prefix>.pairing.send_reply
Request. { "account": "<inst>", "to": "<sender>", "text": "<challenge>" }.
Reply. { "ok": true } or
{ "ok": false, "error": "<message>" }.
Delivers the pairing challenge text. account is the
multi-instance discriminator (operators set it via
config/plugins/<channel>.yaml's instance field).
<prefix>.pairing.send_qr_image
Request. { "account": "<inst>", "to": "<sender>", "png_base64": "<base64>" }.
Reply. { "ok": true } or
{ "ok": false, "error": "<message>" }.
Plugin decodes the base64 PNG and delivers it as a media
message. Plugins whose channel cannot send media should reply
{ "ok": false, "error": "channel does not support media" } —
the trait's default impl bails for plugins that haven't overridden,
but explicit failure with a clear error helps operators diagnose.
<prefix>.pairing.format_challenge_text (only when format_challenge_text_kind = "broker")
Request. { "code": "<setup-code>" }.
Reply. { "text": "<formatted-challenge>" }.
When the manifest sets format_challenge_text_kind = "broker"
the daemon issues this RPC instead of using the trait's
built-in formatter. Used for channels that need plugin-specific
escaping (Telegram MarkdownV2, Discord embed JSON, …).
Migration path
Plugins migrate by adding the manifest section in their next
release. Until then the daemon's legacy hardcoded
build_known_pairing_registry() registration (gated by
#[cfg(feature = "plugin-<id>")]) continues to serve. When a
plugin ships the manifest section, the registry's register()
overwrites by channel_id so the generic adapter wins
without operator action.
Canonical plugins
- nexo-plugin-whatsapp: pending. When shipped, the daemon's L1654 Arc::new(WhatsappPairingAdapter::new(broker)) becomes dead code (still cfg-gated, removable in a follow-up).
- nexo-plugin-telegram: pending. Same shape, L1657.
- Future plugins (signal, matrix, sms, discord): drop manifest in plugins.discovery.search_paths; no daemon edit needed.
Implementing the broker side (plugin authors)
Subprocess plugins built on the nexo-microapp-sdk register the
three handlers (normalize_sender, send_reply, send_qr_image)
during PluginAdapter::init. Reference implementations land in
the next nexo-plugin-whatsapp + nexo-plugin-telegram release;
until then plugins can mirror the canonical
<channel>PairingAdapter::normalize_sender Rust logic into the
broker handler verbatim.
Daemon-side wiring
SubprocessNexoPlugin::build_pairing_adapter()
(crates/core/src/agent/nexo_plugin_registry/subprocess.rs)
checks self.cached_manifest.plugin.pairing.adapter. When
present, returns
Some(Arc::new(GenericBrokerPairingAdapter::from_manifest(broker, section))).
The daemon's boot loop (src/main.rs:6416) iterates every
loaded plugin handle and calls build_pairing_adapter(broker.clone()).
Same loop runs in the hot-spawn path (src/main.rs:7224+).
Dispatch-ctx mode (boot_dispatch_ctx_if_enabled /
autonomous-worker) uses only the legacy hardcoded registry today;
threading plugin_handles_cell into that function is a follow-up
if dispatch-ctx flows grow pairing-aware hooks.
Validation
- cargo build --release-fast --bin nexo (default): 3m09s.
- cargo build --release-fast --bin nexo --no-default-features: 2m50s (nexo-plugin-whatsapp + nexo-plugin-telegram absent from the compile graph; manifest-section parsing still compiles because it's in nexo-plugin-manifest).
- cargo nextest run --workspace: 6280/6280 (4 new tests for GenericBrokerPairingAdapter).
- cargo nextest run -p nexo-pairing: 67/67.
- cargo nextest run --no-default-features -p nexo-rs: 105/105.
Trade-offs
| Concern | Decision |
|---|---|
| Sync normalize_sender blocking on broker | tokio::task::block_in_place + Handle::block_on. Requires multi-threaded runtime (every daemon hot path qualifies). Cache makes the cost a one-time miss per unique sender. |
| Cache eviction | Default unbounded; TTL knob if needed. Pairing-only volume keeps unbounded safe at scale. |
| Custom challenge text per channel | Optional format_challenge_text_kind = "broker". Default uses the trait's built-in formatter (covers 90% of channels). |
| channel_id 'static lifetime | Box::leak at construction. One-time leak per plugin per daemon run; bounded by plugin count. |
| Manifest schema change without daemon recompile | New optional fields use #[serde(default)]; backward compatible. New required fields = manifest schema version bump. |
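The Box::leak row can be illustrated in a few lines. The function name is hypothetical; Box::leak and String::into_boxed_str are standard-library calls.

```rust
/// Turn an owned channel id read from a manifest into a `&'static str`.
/// The allocation is never freed, which is acceptable only because it
/// happens once per plugin per daemon run.
fn leak_channel_id(channel_id: String) -> &'static str {
    Box::leak(channel_id.into_boxed_str())
}
```

The returned reference can then be stored wherever a 'static string is required, at the cost of one small allocation per plugin that lives until process exit.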
[plugin.pairing.trigger] — manifest-driven pairing trigger
Phase 81.20.x Stage 7 Phase 2. Status: introduced 2026-05-16.
Plugins that own a QR-pairing pump (e.g. WhatsApp wa-agent, future
Signal QR-link) declare which admin RPC methods the daemon should
forward pairing/start and pairing/cancel to. The daemon
constructs a BrokerPairingTrigger (in nexo-pairing) from the
manifest and inserts it into the dispatcher's
PairingChannelTriggers map under the same channel_id as the
sibling [plugin.pairing.adapter] section.
This replaces the previous design where the daemon had to import
nexo_plugin_whatsapp::pairing_trigger::WhatsappPairingTrigger and
call from_configs(...) — daemon no longer needs to link the
plugin crate to drive its pump.
Manifest section
[plugin.pairing]
kind = "qr"
label = "WhatsApp"
[plugin.pairing.trigger]
start_method = "nexo/admin/whatsapp/pairing/start"
cancel_method = "nexo/admin/whatsapp/pairing/cancel"
# Optional knob:
# timeout_seconds = 120 # default: PAIRING_DEFAULT_TIMEOUT (180s)
Required fields:
- start_method: full admin RPC method name the daemon forwards the pump-start request to. MUST live under the plugin's own [plugin.admin] method_prefix (e.g. "nexo/admin/<plugin_id>/pairing/start"). Forwarded via PluginAdminRouter so the plugin's existing admin subscriber pipeline serves it.
- cancel_method: full admin RPC method name for pump cancellation. Same routing rules as start_method.
Optional fields:
- timeout_seconds: per-call broker forward timeout. Defaults to PAIRING_DEFAULT_TIMEOUT (180s), the upper bound for the whole pairing handshake.
Compatibility constraints
- [plugin.pairing.trigger] is only valid with kind = "qr". Form and Info kinds are operator-driven and need no remote pump. Custom kinds use their own nexo/notify/<rpc_namespace>/status_changed channel. Manifest validation rejects mismatched combinations at boot.
- start_method and cancel_method MUST be non-empty.
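The constraints above, together with the rule that both methods live under the plugin's admin prefix, could be checked roughly like this. The error enum and function name are hypothetical; the real validator's types are not shown in this section.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum TriggerError {
    KindNotQr,
    EmptyMethod,
    OutsideAdminPrefix,
}

/// `kind` is the `[plugin.pairing] kind` value; `admin_prefix` is the
/// plugin's `[plugin.admin] method_prefix`.
fn validate_trigger(
    kind: &str,
    admin_prefix: &str,
    start_method: &str,
    cancel_method: &str,
) -> Result<(), TriggerError> {
    if kind != "qr" {
        return Err(TriggerError::KindNotQr);
    }
    if start_method.is_empty() || cancel_method.is_empty() {
        return Err(TriggerError::EmptyMethod);
    }
    if !start_method.starts_with(admin_prefix) || !cancel_method.starts_with(admin_prefix) {
        return Err(TriggerError::OutsideAdminPrefix);
    }
    Ok(())
}
```

The check order (kind first, then emptiness, then prefix) is an arbitrary choice for the example.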
Broker JSON-RPC dispatch contract
The daemon's BrokerPairingTrigger.start(ctx):
- Forwards start_method via PluginAdminRouter, passing { challenge_id, agent_id, instance } as params.
- Plugin's admin handler spawns its pump (wa-agent QR pump for whatsapp). The handler returns immediately; the pump runs in the subprocess.
- The plugin publishes QR and state updates on plugin.inbound.<channel_id>.<instance>.pairing.qr and plugin.inbound.<channel_id>.<instance>.pairing.state (new contract). Daemon's single generic subscriber updates ctx.store.update_qr + notify_status.
- On pairing/cancel, daemon forwards cancel_method via the same router. Plugin tears down the pump cleanly.
Inbound topics (plugin → daemon)
plugin.inbound.<channel_id>.<instance>.pairing.qr
plugin.inbound.<channel_id>.<instance>.pairing.state
QR payload: { "challenge_id": "<uuid>", "png_base64": "<base64>", "rotates_at": "<rfc3339>" }.
State payload:
{ "challenge_id": "<uuid>", "state": "Linked" | "Error" | "Pending", "error": "<msg-if-Error>" }.
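Building the two inbound subjects is mechanical. A small sketch, with a hypothetical enum and function name:

```rust
/// Which of the two inbound pairing topics to address.
#[derive(Clone, Copy)]
enum PairingEvent {
    Qr,
    State,
}

/// Build `plugin.inbound.<channel_id>.<instance>.pairing.{qr,state}`.
fn pairing_inbound_topic(channel_id: &str, instance: &str, event: PairingEvent) -> String {
    let leaf = match event {
        PairingEvent::Qr => "qr",
        PairingEvent::State => "state",
    };
    format!("plugin.inbound.{}.{}.pairing.{}", channel_id, instance, leaf)
}
```

For channel whatsapp and instance default, the QR topic comes out as plugin.inbound.whatsapp.default.pairing.qr.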
Migration path
Plugins migrate by adding the manifest section in their next
release. Until a plugin ships the section, the daemon falls back
to its legacy hardcoded WhatsappPairingTrigger import (gated by
#[cfg(feature = "plugin-whatsapp")]). Once the plugin manifests
the section, the boot loop's generic registration overwrites by
channel_id, so the broker trigger wins without operator action.
Canonical plugins
- nexo-plugin-whatsapp: pending v0.4.4. When shipped, the daemon's WhatsappPairingTrigger::from_configs registration in src/main.rs (cfg-gated) becomes dead code; removable once v0.4.4 lands on crates.io.
- Future pairing channels (signal, matrix, sms with QR-link): drop manifest into plugins.discovery.search_paths; no daemon edit needed.
Implementing the broker side (plugin authors)
Subprocess plugins built on the nexo-microapp-sdk register the
start_method and cancel_method handlers via their existing
[plugin.admin] subscriber (the broker topic prefix is
<plugin.admin.broker_topic_prefix>.<suffix>, where <suffix> is
the trailing portion of the admin method after the plugin's
prefix).
Reference impl lands in nexo-plugin-whatsapp v0.4.4. The handler
should:
- Read { challenge_id, agent_id, instance } from params.
- Spawn an async task that drives the pump (wa-agent's pair_with_callback).
- As QR frames rotate, publish to plugin.inbound.whatsapp.<instance>.pairing.qr.
- On connect success, publish state Linked. On terminal error, publish state Error with the error message.
- Return { "ok": true } from the admin RPC reply (the start was accepted; success/failure flows through the inbound topics).
Cancellation: the daemon forwards cancel_method with
{ challenge_id }. Handler aborts the spawned pump task and
returns { "ok": true }.
Validation
- cargo nextest run -p nexo-plugin-manifest: manifest schema + validator tests.
- cargo nextest run -p nexo-pairing: BrokerPairingTrigger unit tests.
- cargo nextest run -p nexo-rs: daemon boot-loop integration.
Trade-offs
| Concern | Decision |
|---|---|
| Daemon import of plugin crate for trigger | Eliminated — the broker trigger forwards via existing PluginAdminRouter. |
| Reuse of [plugin.admin] topic vs. dedicated [plugin.pairing.trigger] topic | Reuse. Cleaner: one less manifest section, one less subscriber to maintain in the plugin. Admin handler already exists in canonical plugins. |
| Plugin needs to push QR back to daemon | New plugin.inbound.<channel>.<inst>.pairing.{qr,state} broker topics. Daemon's single generic subscriber consumes both. |
| Backwards compat for plugins without the section | Daemon falls back to the legacy hardcoded trigger registration. Removed after the canonical plugin ships the new manifest. |
[plugin.public_tunnel] — manifest-driven Cloudflare quick tunnel
Phase 81.20.x Stage 7 Phase 2. Status: introduced 2026-05-16.
Plugins that expose HTTP routes the operator might want to reach
from outside the LAN (e.g. WhatsApp pairing while the daemon runs
on a desktop and the QR is scanned on a phone) declare this section.
The daemon spawns a Cloudflare quick tunnel pointed at its HTTP
port; plugin routes exposed via [plugin.http] become reachable at
https://*.trycloudflare.com/<plugin-mount-prefix>/....
Two-key opt-in
The daemon opens a public tunnel only when both are true:
- Plugin manifest: [plugin.public_tunnel] enabled = true.
- Operator capability env: NEXO_PLUGIN_PUBLIC_TUNNEL_ALLOW=1.
A manifest declaration alone is not enough — the daemon still
honours the operator's hardening choice. Declaring the section
with enabled = false (or omitting it) keeps the plugin
forever-private even if the operator flips the env on for another
plugin.
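The two-key rule reduces to a logical AND. In this sketch the function name is hypothetical, and treating only the literal value 1 as "on" is an assumption; the text shows NEXO_PLUGIN_PUBLIC_TUNNEL_ALLOW=1 but does not say whether other values count.

```rust
/// `manifest_enabled` is `None` when the plugin declares no
/// `[plugin.public_tunnel]` section; `allow_env` is the raw value of
/// NEXO_PLUGIN_PUBLIC_TUNNEL_ALLOW, if set.
fn tunnel_allowed(manifest_enabled: Option<bool>, allow_env: Option<&str>) -> bool {
    let plugin_opted_in = manifest_enabled.unwrap_or(false);
    // Assumption: only the literal "1" enables the capability.
    let operator_opted_in = allow_env == Some("1");
    plugin_opted_in && operator_opted_in
}
```

A caller could feed the second argument from std::env::var("NEXO_PLUGIN_PUBLIC_TUNNEL_ALLOW").ok().as_deref().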
Manifest section
[plugin.public_tunnel]
enabled = true
# Optional — when set, the daemon subscribes to this exact broker
# subject and closes the tunnel on the first published message.
# Wildcards (`*`, `>`) are rejected at manifest validation.
close_on_event = "plugin.lifecycle.whatsapp.tunnel_done"
Fields
- enabled (bool, default false): plugin-side opt-in. When false, the section behaves identically to "no section declared".
- close_on_event (string, optional): literal broker subject. When set, the daemon subscribes once and tears the tunnel down on the first inbound message. Typical use: a pairing-channel plugin publishes <plugin>.tunnel_done after the operator completes pairing so the public URL stops responding immediately. When None, the tunnel stays up for the daemon's lifetime.
Validation
Rejected at boot:
- close_on_event is the empty string or whitespace (PublicTunnelCloseEventEmpty).
- close_on_event contains a NATS wildcard (* or >). The daemon refuses wildcards so a stray plugin event can't race-close a healthy tunnel (PublicTunnelCloseEventWildcard).
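Both rejections can be sketched as a single function. The variant names mirror the ones quoted above; the enclosing enum name and the function name are hypothetical.

```rust
/// Hypothetical error type; variant names follow the text.
#[derive(Debug, PartialEq)]
enum CloseEventError {
    PublicTunnelCloseEventEmpty,
    PublicTunnelCloseEventWildcard,
}

/// `None` means `close_on_event` was not set, which is always valid.
fn validate_close_on_event(subject: Option<&str>) -> Result<(), CloseEventError> {
    let subject = match subject {
        None => return Ok(()),
        Some(s) => s,
    };
    if subject.trim().is_empty() {
        return Err(CloseEventError::PublicTunnelCloseEventEmpty);
    }
    if subject.contains('*') || subject.contains('>') {
        return Err(CloseEventError::PublicTunnelCloseEventWildcard);
    }
    Ok(())
}
```

Checking for emptiness before wildcards means a whitespace-only value is always reported as empty.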
URL sidecar file
When a tunnel comes up, the daemon writes the public URL to
$NEXO_HOME/state/tunnel.url (or ~/.nexo/state/tunnel.url).
The nexo pair start CLI reads this file directly so the operator
doesn't have to copy/paste from logs.
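Resolving the sidecar path might look like the sketch below. The function name is hypothetical, and treating an empty NEXO_HOME as unset is an assumption not stated in the text.

```rust
use std::path::PathBuf;

/// Resolve the sidecar path: `$NEXO_HOME/state/tunnel.url` when NEXO_HOME is
/// set, otherwise `<user_home>/.nexo/state/tunnel.url`.
fn tunnel_url_path(nexo_home: Option<&str>, user_home: &str) -> PathBuf {
    let base = match nexo_home {
        // Assumption: an empty value falls back to the default location.
        Some(dir) if !dir.is_empty() => PathBuf::from(dir),
        _ => PathBuf::from(user_home).join(".nexo"),
    };
    base.join("state").join("tunnel.url")
}
```

A reader such as the pairing CLI could then pass the result to std::fs::read_to_string and trim the trailing newline.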
Migration from wa_tunnel_cfg
Earlier daemon revisions extracted public_tunnel from each
WhatsApp YAML entry under config/plugins/whatsapp.yaml and
spawned a Cloudflare tunnel inline. That orchestration was
removed (Phase 81.20.x Stage 7 Phase 2) — plugins now declare the
intent in their manifest, the daemon iterates uniformly, and the
operator's env flag stays authoritative.
Daemon-side wiring
main.rs iterates wire.plugin_handles after
wire_plugin_registry returns. For every plugin with
[plugin.public_tunnel] enabled = true, the daemon:
- Spawns nexo_tunnel_quick::TunnelManager::new(8080).start().
- Logs the URL + writes it to the sidecar file.
- If close_on_event is set, spawns a broker subscriber that awaits one message then calls tunnel.shutdown().await.
The capability gate is checked once at the top of the iterator block. When OFF, the iterator logs a single informational line ("declared by at least one plugin but env is not set") and skips every tunnel spawn — useful for hardened deployments that want a visible audit trail.
Threat model
https://*.trycloudflare.com/<plugin-prefix>/... is reachable
from anywhere the URL is shared. Cloudflare provides DDoS
protection + edge TLS but does NOT enforce authentication. The
plugin's HTTP handler is responsible for any access control on
the exposed paths.
Pairing pages are time-limited (the QR rotates roughly every 20s, the
challenge expires in 5 minutes). When close_on_event is set,
the tunnel is torn down immediately after pairing: the URL becomes
404 the moment the plugin signals completion.
Validation commands
- cargo nextest run -p nexo-plugin-manifest: manifest schema + validator tests.
- Manual smoke: NEXO_PLUGIN_PUBLIC_TUNNEL_ALLOW=1 cargo run --bin nexo with a manifest declaring the section. Daemon log should show "public tunnel up" with the assigned URL.
[plugin.http] — daemon-proxied HTTP routes
Phase 81.33.b.real Stage 2 (Layer 8 of the plugin auto-discovery design). Status: shipped 2026-05-15.
Plugins that need to expose HTTP endpoints to operators or
external callers declare a mount prefix in their
nexo-plugin.toml. The daemon's HTTP server (port :8080)
matches every request against the registered prefixes and
forwards matches to the plugin's subprocess via broker
JSON-RPC. The plugin handles internal routing under the prefix.
Distinct from [plugin.http_server], which advertises a
plugin-bound port the daemon does NOT proxy.
Manifest section
[plugin.http]
mount_prefix = "/whatsapp"
# Optional:
# timeout_seconds = 60 # default: 30
Fields:
- mount_prefix (required): path prefix the daemon mounts. Must start with /. Cannot contain ? or #. The daemon matches path.starts_with(mount_prefix).
- timeout_seconds (optional): per-request broker RPC timeout. Default 30s. Plugins serving slow flows (image generation, OAuth dances) can extend it.
Broker JSON-RPC contract
Daemon → plugin on plugin.<id>.http.request:
{
"method": "GET",
"path": "/whatsapp/pair",
"query": "instance=default",
"headers": [["Host", "127.0.0.1:8080"], ["User-Agent", "..."]],
"body_base64": ""
}
Plugin replies:
{
"status": 200,
"headers": [["Content-Type", "text/html; charset=utf-8"]],
"body_base64": "<base64-encoded body bytes>"
}
Body bytes are base64-encoded so binary payloads (PNGs, PDFs, small file uploads) round-trip cleanly through JSON. The daemon decodes server-side and writes raw bytes back to the TCP stream.
Route matching
The router stores all prefixes in longest-first order. A request matches the most-specific prefix:
| Registered prefixes | Request | Matched plugin |
|---|---|---|
| /api, /api/v1 | /api/v1/users | /api/v1 plugin |
| /api, /api/v1 | /api/v2/users | /api plugin |
| /api | /health | none (fallthrough) |
A miss falls through to the daemon's legacy hardcoded paths
(/health, /metrics, etc.).
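Longest-first matching can be sketched with a sorted Vec. The Router type and its methods are hypothetical names for this example; only the ordering rule and the starts_with match come from the text.

```rust
/// Hypothetical router: (mount_prefix, plugin_id) pairs, longest prefix first.
struct Router {
    routes: Vec<(String, String)>,
}

impl Router {
    fn new() -> Self {
        Router { routes: Vec::new() }
    }

    fn register(&mut self, mount_prefix: &str, plugin_id: &str) {
        self.routes
            .push((mount_prefix.to_string(), plugin_id.to_string()));
        // Keep the most specific (longest) prefix at the front.
        self.routes.sort_by(|a, b| b.0.len().cmp(&a.0.len()));
    }

    /// First hit wins; because of the ordering, that is the longest match.
    /// `None` means the request falls through to the daemon's own paths.
    fn resolve(&self, path: &str) -> Option<&str> {
        self.routes
            .iter()
            .find(|(prefix, _)| path.starts_with(prefix.as_str()))
            .map(|(_, plugin_id)| plugin_id.as_str())
    }
}
```

Registering /api and /api/v1 in either order gives the same result as the table: /api/v1/users resolves to the /api/v1 plugin and /api/v2/users to the /api plugin.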
Reserved prefixes. The router refuses to register any plugin mount under these daemon-internal paths:
- /health: liveness checks
- /metrics: Prometheus scrape
- /pair: daemon's WS pairing companion
- /admin: admin RPC surface (port 9091, not 8080, but reserved here too for safety against future shared-port designs)
- /.well-known: protocol probes (RFC 8615)
A plugin manifest declaring any of these (or a sub-path like
/health/foo) is rejected at registration with a warn-level log;
the plugin's broker handler stays unhooked for that prefix and
the daemon's internal handler serves uninterrupted.
Plugins choosing non-reserved prefixes (/whatsapp, /oauth,
/api/v1/..., /healthy, etc.) register normally.
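Since /health/foo is rejected while /healthy registers normally, the reserved check appears to be segment-aware rather than a plain string prefix test. A sketch under that assumption, with a hypothetical function name:

```rust
/// Daemon-internal paths listed in the text.
const RESERVED_MOUNTS: [&str; 5] = ["/health", "/metrics", "/pair", "/admin", "/.well-known"];

/// True when `mount_prefix` equals a reserved path or sits beneath one.
fn is_reserved_mount(mount_prefix: &str) -> bool {
    RESERVED_MOUNTS.iter().any(|reserved| {
        match mount_prefix.strip_prefix(*reserved) {
            // Exact match, or the remainder starts a new path segment.
            Some(rest) => rest.is_empty() || rest.starts_with('/'),
            None => false,
        }
    })
}
```

Under this rule /pair and /pair/ws are reserved, while /pairing is not.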
Error rendering
| Failure | Status | Body |
|---|---|---|
| Broker timeout (plugin slow / unresponsive) | 504 Gateway Timeout | {"error":"plugin gateway timeout"} |
| Plugin replied with malformed JSON | 502 Bad Gateway | {"error":"plugin reply malformed"} |
| Plugin replied with status: 500 etc. | passes through | plugin's body verbatim |
The daemon writes the plugin's headers + body_base64 verbatim
for non-error replies — the plugin owns the response shape.
Limitations (Stage 2)
- No streaming responses. SSE / chunked transfer / progressive rendering NOT supported. Plugin must buffer the full response before replying. Use [plugin.http_server] (own port) for streaming endpoints.
- No WebSocket upgrades. The daemon's /pair WS handshake is daemon-internal (Phase 87 pairing companion). Plugins wanting WS endpoints bind their own port.
- Body size cap. Daemon reads up to 16KB upfront. Larger bodies require streaming, which Stage 2 does not support.
- No request-id header injection. Plugins should generate their own trace ids if needed; the daemon does not auto-stamp X-Request-Id.
Implementing the plugin side
Subprocess plugins built on the Rust nexo-microapp-sdk
register a broker handler on plugin.<plugin_id>.http.request:
// Sketch (final SDK helpers ship alongside the next plugin release):
ctx.broker
    .subscribe("plugin.<id>.http.request")
    .await?
    .for_each(|msg| async {
        let req: HttpRequest = serde_json::from_value(msg.payload)?;
        let res = my_router.dispatch(&req).await;
        broker
            .publish(&msg.reply_to.unwrap(), Event::new(/* ... */, res))
            .await
    });
Reference impl lands with the next nexo-plugin-whatsapp release;
until then plugins copy the dispatch logic from the daemon's
current nexo_plugin_whatsapp::pairing::dispatch_route.
Migration status
Canonical plugin crates have NOT yet shipped a manifest revision
adding [plugin.http]. Daemon's legacy hardcoded
if let Some(rest) = path.strip_prefix("/whatsapp/") block
(gated by #[cfg(feature = "plugin-whatsapp")] via 93.12.c.2)
continues to serve. When a plugin ships the section, the router
matches before the legacy block and the plugin's broker handler
takes over without operator action; the legacy block becomes
dead code (still cfg-gated, removable in a follow-up after the
canonical plugins migrate).
Validation
- cargo build --release-fast --bin nexo (default): 3m11s.
- cargo build --release-fast --bin nexo --no-default-features: 2m53s.
- cargo nextest run --workspace: 6294/6294 (8 new tests in plugin_http::tests covering router matching, response decoding, broker error path).
- cargo nextest run --no-default-features -p nexo-rs: 105/105.
- cargo nextest run -p nexo-pairing: 75/75.
Trade-offs
| Concern | Decision |
|---|---|
| Sync HTTP handler blocking on broker | Daemon-side handler is async already; broker RPC stays async. No sync→async bridge needed. |
| Binary body via base64 | Acceptable for pairing pages (≤100KB) + OAuth callbacks. Streaming follow-up if large-upload demand appears. |
| Header passthrough | Daemon forwards all request headers; reply sets Content-Type from plugin response. Cookies need explicit Set-Cookie from plugin. |
| Mount prefix conflicts with daemon routes | Daemon reserves /health, /metrics, /pair, /admin, /.well-known. Router registration rejects any plugin asking for these prefixes or sub-paths (logged at warn level). A plugin cannot hijack health checks or admin RPCs even if its manifest declares the matching prefix. |
[plugin.admin] — daemon-forwarded admin RPC commands
Phase 81.33.b.real Stage 4 (Layer 9 of the plugin auto-discovery design). Status: shipped 2026-05-15.
Plugins that expose admin RPC commands (bot inspectors, account
listers, channel-specific control planes) declare a method
prefix in their nexo-plugin.toml. The daemon's admin
dispatcher matches every incoming method against the registered
prefixes and forwards matches to the plugin's subprocess via
broker JSON-RPC. The plugin handles internal dispatch.
Replaces the previous pattern where each plugin needed a
hardcoded .with_<plugin>_handle(Arc<dyn XxxHandle>) builder
method on the dispatcher (e.g. the legacy with_wa_bot_handle
integration for nexo/admin/whatsapp/bot/*).
Manifest section
[plugin.admin]
method_prefix = "nexo/admin/whatsapp/"
broker_topic_prefix = "plugin.whatsapp.admin"
# Optional:
# timeout_seconds = 30
Fields:
- method_prefix (required): admin RPC method prefix the plugin owns. Must start with nexo/admin/ and end with /. Example: nexo/admin/whatsapp/.
- broker_topic_prefix (required): broker subject prefix the daemon forwards under. Example: plugin.whatsapp.admin.
- timeout_seconds (optional): per-method broker RPC timeout. Default 30s.
Broker JSON-RPC contract
Daemon → plugin on <broker_topic_prefix>.<verb>:
{
"method": "nexo/admin/whatsapp/bot/list",
"params": { "agent_id": "kate" }
}
Plugin replies:
{ "ok": true, "result": { "bots": [...] } }
{ "ok": false, "error": "session not connected" }
The daemon emits the broker subject by stripping method_prefix
from the incoming method, replacing / with ., and appending
to broker_topic_prefix. So
nexo/admin/whatsapp/bot/list →
bot/list → bot.list →
plugin.whatsapp.admin.bot.list.
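The translation is small enough to show in full. The function name is hypothetical, and returning None for a method that equals the bare prefix is an assumption; the text does not cover that case.

```rust
/// Strip `method_prefix`, turn `/` into `.`, append to `broker_topic_prefix`.
/// Returns `None` when the method is not under this plugin's prefix.
fn admin_subject(method_prefix: &str, broker_topic_prefix: &str, method: &str) -> Option<String> {
    let rest = method.strip_prefix(method_prefix)?;
    if rest.is_empty() {
        // Assumption: a bare prefix with no verb is not routable.
        return None;
    }
    Some(format!("{}.{}", broker_topic_prefix, rest.replace('/', ".")))
}
```

Calling it with nexo/admin/whatsapp/, plugin.whatsapp.admin and nexo/admin/whatsapp/bot/list reproduces the subject derived above.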
Reserved prefixes
The router refuses to register any plugin whose method_prefix
collides with daemon-internal admin handlers. The reserved list:
- nexo/admin/agents/
- nexo/admin/credentials/
- nexo/admin/pairing/
- nexo/admin/llm/
- nexo/admin/channels/
- nexo/admin/tenants/
- nexo/admin/memory/
- nexo/admin/sessions/
- nexo/admin/snapshots/
- nexo/admin/policy/
Comparison is bidirectional — nexo/admin/agents/sneaky/
(subpath of reserved) AND nexo/admin/ (super-prefix that
would shadow reserved) are both rejected. Plugins choosing
non-reserved prefixes (nexo/admin/whatsapp/,
nexo/admin/signal/, nexo/admin/oauth/) register normally.
Rejected registrations log at warn level; the daemon's internal handlers continue to serve uninterrupted.
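The bidirectional comparison amounts to two starts_with checks per reserved entry. A sketch with a hypothetical function name, using the reserved list from the text:

```rust
const RESERVED_ADMIN_PREFIXES: [&str; 10] = [
    "nexo/admin/agents/",
    "nexo/admin/credentials/",
    "nexo/admin/pairing/",
    "nexo/admin/llm/",
    "nexo/admin/channels/",
    "nexo/admin/tenants/",
    "nexo/admin/memory/",
    "nexo/admin/sessions/",
    "nexo/admin/snapshots/",
    "nexo/admin/policy/",
];

/// True when `method_prefix` is a subpath of a reserved prefix, or a
/// super-prefix that would shadow one.
fn collides_with_reserved(method_prefix: &str) -> bool {
    RESERVED_ADMIN_PREFIXES.iter().any(|reserved| {
        method_prefix.starts_with(*reserved) || reserved.starts_with(method_prefix)
    })
}
```

Because every prefix ends with /, nexo/admin/signal/ does not collide with nexo/admin/sessions/ even though the two share leading characters.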
Capability enforcement
Plugin admin methods that match the router are gated by the
existing channels_crud capability (reused so operators
already granted channel admin can call plugin admin without a
fresh capability). Per-plugin capability grants are a follow-up
when finer control is needed.
Error rendering
| Failure | Status |
|---|---|
| Broker timeout (plugin slow / unresponsive) | AdminRpcError::Internal("plugin admin forward failed: broker error: ...") → JSON-RPC error code -32603 |
| Plugin replied with malformed JSON | same |
| Plugin replied with ok: false | AdminRpcError::Internal("<plugin error message>") |
The daemon does NOT translate plugin error strings; the typed error surfaces verbatim through the JSON-RPC response.
Migration status
The legacy with_wa_bot_handle builder on
AdminRpcDispatcher is preserved. Until nexo-plugin-whatsapp
ships a manifest revision with [plugin.admin], the daemon's
hardcoded nexo/admin/whatsapp/bot/list + bot/send match
arms continue to serve. When whatsapp ships the section, the
router matches BEFORE the typed match arms (generic forward
wins) and the legacy block becomes dead code (cfg-gated,
removable in follow-up).
Implementing the plugin side
Subprocess plugins built on nexo-microapp-sdk subscribe to
<broker_topic_prefix>.> and dispatch:
// Sketch (final SDK helpers ship with the next plugin release):
ctx.broker
    .subscribe("plugin.whatsapp.admin.>")
    .await?
    .for_each(|msg| async {
        let topic = msg.topic.strip_prefix("plugin.whatsapp.admin.").unwrap();
        match topic {
            "bot.list" => handle_bot_list(msg.payload).await,
            "bot.send" => handle_bot_send(msg.payload).await,
            _ => reply_err(msg, "method not found"),
        }
    });
Validation
- cargo build --release-fast --bin nexo (default): 3m10s clean.
- cargo build --release-fast --bin nexo --no-default-features: 2m53s clean.
- cargo nextest run --workspace: 6312/6312 (9 new tests in plugin_admin::tests).
- cargo nextest run -p nexo-pairing: covers router matching, reserved-prefix rejection, subpath/super-prefix safety, method-to-broker-suffix translation, broker error path.
- cargo nextest run --no-default-features -p nexo-rs: 105/105.
- cargo nextest run --no-default-features -p nexo-setup: 317/317.
Trade-offs
| Concern | Decision |
|---|---|
| Per-plugin capability gates | Reuse channels_crud. Follow-up if finer grant needed. |
| Plugin reply schema | { ok: bool, result: Value, error: String }. Single envelope; plugin owns the result shape per method. |
| Method-to-topic translation | / → . mechanical. Plugin's broker handler design is straightforward. |
| Reserved-prefix safety | Bidirectional collision check (both subpath AND super-prefix rejected). |
| Interior-mutability router | Arc<PluginAdminRouter> with internal RwLock<Vec<Route>> so daemon can populate AFTER wire_plugin_registry returns without rebuilding the dispatcher. |
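The mechanical method-to-topic translation from the table can be illustrated as follows. This is a sketch under assumptions: the function name and the idea of passing both prefixes as arguments are hypothetical, and only the `/` to `.` rule and the example method and topic come from this document.

```rust
/// Translates an admin method into a broker topic by stripping the plugin's
/// `method_prefix`, replacing `/` with `.`, and prepending the plugin's
/// `broker_topic_prefix`. Returns None when the method is outside the prefix.
/// (Hypothetical helper, not the daemon's actual API.)
pub fn method_to_topic(
    method: &str,
    method_prefix: &str,
    broker_topic_prefix: &str,
) -> Option<String> {
    let suffix = method.strip_prefix(method_prefix)?;
    if suffix.is_empty() {
        // A bare prefix names no method, so there is nothing to forward.
        return None;
    }
    Some(format!("{}.{}", broker_topic_prefix, suffix.replace('/', ".")))
}
```

With `method_prefix = "nexo/admin/whatsapp/"` and `broker_topic_prefix = "plugin.whatsapp.admin"`, the method `nexo/admin/whatsapp/bot/list` maps to `plugin.whatsapp.admin.bot.list`.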
[plugin.metrics] — Prometheus scrape contribution
Phase 81.33.b.real Stage 5 (Layer 11 of the plugin auto-discovery design). Status: shipped 2026-05-15.
Plugins exposing Prometheus metrics declare a broker topic the
daemon scrapes on every /metrics HTTP request. The plugin's
subprocess handles the scrape, returns Prometheus text, and the
daemon concatenates it into the aggregate response.
Replaces the previous pattern where each plugin's metrics call
was hardcoded inside src/main.rs::run_metrics_server (e.g. the
legacy nexo_plugin_email::metrics::render_prometheus(...)
direct call).
Manifest section
[plugin.metrics]
prometheus = true
broker_topic_prefix = "plugin.email"
# Optional:
# timeout_seconds = 5
Fields:
- prometheus (default false): opt the plugin into the /metrics scrape loop.
- broker_topic_prefix (required when prometheus = true): daemon publishes to <broker_topic_prefix>.metrics.scrape.
- timeout_seconds (optional, default 5s): per-scrape broker RPC timeout. Scrapes happen per /metrics HTTP request so the daemon-side latency budget is tight; plugins exceeding the timeout warn-log + contribute nothing for that scrape.
Broker JSON-RPC contract
Daemon → plugin on <broker_topic_prefix>.metrics.scrape:
{}
Plugin replies:
{ "text": "# HELP <metric> ...\n# TYPE ...\n<metric> <value>\n..." }
Empty / missing text is treated as a successful scrape with no
metrics. The daemon does NOT validate Prometheus text shape —
plugin owns the surface entirely. Adding a trailing newline is
optional; the daemon appends \n if missing.
Daemon-side aggregation
run_metrics_server (src/main.rs:15097+) concatenates from:
- nexo_core::telemetry::render_prometheus(nats_open): daemon-internal counters.
- nexo_llm::telemetry::render_prometheus(): LLM provider stats.
- nexo_mcp::telemetry::render_prometheus() + server-side dispatch metrics.
- nexo_poller::telemetry::render_prometheus(): Gmail / generic poller counters.
- nexo_plugin_email::metrics::render_prometheus(...): legacy direct call, kept until email plugin migrates to broker scrape.
- nexo_tunnel_quick::metrics::render_prometheus_for(...): tunnel supervisor counters.
- Phase 5: nexo_pairing::plugin_metrics::scrape_all(...): every plugin that declared [plugin.metrics] prometheus = true.
Order matters for Prometheus scrape — duplicate metric names
across sources are not deduplicated; the LAST occurrence wins
when the scraper rebuilds its state. Plugins should namespace
their metrics with a prefix (my_plugin_<metric>) to avoid
collisions.
Failure isolation
One slow / unresponsive plugin does NOT stall the /metrics
response. Each scrape has its own timeout (default 5s). On
failure (timeout, broker error, malformed reply) the daemon:
- Logs a warn-level event with plugin id + error string.
- Contributes empty string for that plugin in the aggregate.
- Continues with the remaining plugins.
This trades immediate observability of plugin metric outages for operator UX — a watchdog scraping every 15s sees gaps when a plugin is unhealthy, but the daemon's own metrics (CPU, memory, LLM, MCP, tunnels) keep flowing.
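The aggregation and failure-isolation behaviour described above can be modelled with a small pure function. This is a sketch, not the daemon's code: `aggregate_metrics` and `push_block` are hypothetical names, each scrape outcome is represented as a `Result<String, String>`, and the warn-level log is stood in for by `eprintln!`.

```rust
/// Concatenates the daemon's own metrics with each plugin's scrape result.
/// A failed scrape (timeout, broker error, malformed reply) is logged and
/// contributes nothing; the remaining plugins are still included.
pub fn aggregate_metrics(
    daemon_text: &str,
    plugin_results: &[(&str, Result<String, String>)],
) -> String {
    let mut out = String::new();
    push_block(&mut out, daemon_text);
    for (plugin_id, result) in plugin_results {
        match result {
            Ok(text) => push_block(&mut out, text),
            // Stand-in for the warn-level log with plugin id + error string.
            Err(err) => eprintln!("warn: metrics scrape failed for {plugin_id}: {err}"),
        }
    }
    out
}

/// Appends one Prometheus text block, adding the trailing newline if missing.
/// Empty text counts as a successful scrape with no metrics.
fn push_block(out: &mut String, text: &str) {
    if text.is_empty() {
        return;
    }
    out.push_str(text);
    if !text.ends_with('\n') {
        out.push('\n');
    }
}
```

The loop is sequential, matching the current design; a failing entry only costs that plugin's contribution.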
Implementing the plugin side
Subprocess plugins subscribe to
<broker_topic_prefix>.metrics.scrape and reply:
// Sketch (final SDK helpers ship with the next plugin release):
ctx.broker
    .subscribe("plugin.<id>.metrics.scrape")
    .await?
    .for_each(|msg| async {
        let text = my_metrics_module::render_prometheus(...);
        broker.publish(
            &msg.reply_to.unwrap(),
            json!({ "text": text }),
        ).await
    });
Reference impl lands with the next nexo-plugin-email release;
until then the daemon's legacy hardcoded call keeps email
metrics flowing.
Migration status
- nexo-plugin-email: NOT migrated. Legacy in-process call at src/main.rs:15295 continues to serve. When email ships the manifest section, BOTH paths fire (legacy direct call AND broker scrape) until the legacy call is retired in a follow-up.
- Other canonical plugins: none currently expose Prometheus metrics. New plugins opting into metrics declare the manifest section directly with no legacy fallback to maintain.
Validation
- cargo build --release-fast --bin nexo (default): 3m clean.
- cargo build --release-fast --bin nexo --no-default-features: 3m01s clean.
- cargo nextest run --workspace: 6321/6321 (5 new tests in plugin_metrics::tests covering descriptor construction, empty-descriptors short-circuit, failure isolation across multiple plugins, broker error path).
- mdbook build docs: clean.
Trade-offs
| Concern | Decision |
|---|---|
| Sequential vs concurrent scrape | Sequential. Concurrent would shave latency for n > 3 plugins but adds a futures dep edge. Acceptable at current scale (≤10 plugins typical). |
| Per-scrape timeout | 5s default. Plugins exceeding this contribute empty (warn-log). Trades immediate visibility for daemon /metrics SLO. |
| Duplicate metric name collisions | Daemon does NOT deduplicate. Plugins namespace with my_plugin_<metric> prefix per Prometheus convention. |
| Plugin reply shape | { text: String }. Simple envelope; daemon appends newline if missing. Adding labels / timestamps would be a follow-up if a plugin needs them. |
| Email migration fallback | Legacy nexo_plugin_email::metrics::render_prometheus call kept. When email ships manifest section both paths run until cleanup follow-up retires the legacy. |
[plugin.dashboard] — setup wizard surface
Phase 81.33.b.real Stage 6 (Layer 10 polish of the plugin auto-discovery design). Status: shipped 2026-05-15.
Plugins that want to appear in the setup wizard's channel
dashboard declare how the daemon detects their instances + auth
state via this manifest section. A generic
ManifestDashboardSource in nexo-setup consumes the section
and runs the right discovery / auth check, eliminating the need
for per-channel hardcoded
crates/setup/src/services/channels_dashboard.rs impls (the 3
canonical ones — telegram, whatsapp, email — stay as fallback
until canonical plugin crates ship manifest revisions).
Manifest section
[plugin.dashboard]
# Instance enumeration strategy:
[plugin.dashboard.layout]
kind = "single" # | "workspace_walk"
# Auth-state probe strategy:
[plugin.dashboard.auth_check]
kind = "file_presence" # | "session_dir_files"
path = "telegram_bot_token.txt" # for file_presence (relative to secrets_dir)
Telegram shape (single instance, file-presence auth)
[plugin.dashboard.layout]
kind = "single"
[plugin.dashboard.auth_check]
kind = "file_presence"
path = "telegram_bot_token.txt"
Email shape
[plugin.dashboard.layout]
kind = "single"
[plugin.dashboard.auth_check]
kind = "file_presence"
path = "email_password.txt"
Whatsapp shape (multi-instance via workspace walk, session-dir auth)
[plugin.dashboard.layout]
kind = "workspace_walk"
subdir = "whatsapp"
[plugin.dashboard.auth_check]
kind = "session_dir_files"
candidates = ["session.db", "state.db", "device.json", "registration.json"]
Field reference
layout
- kind = "single": exactly one instance labelled "default". Used by channels with one account per agent (telegram, email).
- kind = "workspace_walk", subdir = "<name>": walk <workspace>/<agent>/<subdir>/<instance>/ for every directory entry. Used by channels with multi-instance per-agent layouts (whatsapp). subdir must be a single segment (no /).
auth_check
- kind = "file_presence", path = "<rel>": authenticated if <secrets_dir>/<rel> exists + is non-empty. Path must be relative (no leading /). <secrets_dir> is the operator's secrets root (typically ~/.nexo/secrets or $NEXO_HOME/secrets).
- kind = "session_dir_files", candidates = [...]: authenticated if the per-instance directory contains ANY of the listed filenames. Only meaningful with layout.kind = "workspace_walk"; the per-instance dir is <workspace>/<agent>/<subdir>/<instance>/. If the directory exists with OTHER files (but none of the candidates), reports Stale; empty dir reports NotAuthenticated.
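The two auth checks reduce to small predicates. The sketch below is illustrative only: the `AuthState` enum, its `Authenticated` variant, and both function names are assumptions (only `Stale` and `NotAuthenticated` are named in this document), and the session-dir check takes an already-listed set of filenames rather than walking the filesystem itself.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical result type; `Stale` and `NotAuthenticated` follow the
/// wording above, `Authenticated` is an assumed name.
#[derive(Debug, PartialEq, Eq)]
pub enum AuthState {
    Authenticated,
    Stale,
    NotAuthenticated,
}

/// file_presence: true when <secrets_dir>/<rel> exists and is non-empty.
/// Absolute paths are refused, mirroring the "must be relative" rule.
pub fn file_presence_authenticated(secrets_dir: &Path, rel: &str) -> bool {
    if Path::new(rel).is_absolute() {
        return false;
    }
    fs::metadata(secrets_dir.join(rel))
        .map(|meta| meta.is_file() && meta.len() > 0)
        .unwrap_or(false)
}

/// session_dir_files: classifies a per-instance directory from the names of
/// the entries it contains.
pub fn classify_session_dir(entries: &[&str], candidates: &[&str]) -> AuthState {
    if entries.is_empty() {
        return AuthState::NotAuthenticated;
    }
    if entries.iter().any(|name| candidates.contains(name)) {
        AuthState::Authenticated
    } else {
        // Directory has files, but none of the expected session artifacts.
        AuthState::Stale
    }
}
```

Keeping the classification separate from directory listing makes the three outcomes easy to unit-test without touching disk.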
Daemon-side dispatch
crates/setup/src/services/channels_dashboard.rs ships:
- pub trait ChannelDashboardSource: channel_id() + discover().
- pub struct ManifestDashboardSource: generic impl that consumes a parsed PluginDashboardSection + plugin id.
- pub fn dashboard_sources_from_manifests(manifests) -> Vec<Box<dyn ChannelDashboardSource>>: helper that filters manifests + builds a source per declaring plugin.
- pub fn default_dashboard_sources() -> Vec<Box<dyn ChannelDashboardSource>>: the 3 hardcoded canonical impls (telegram, whatsapp, email). Kept for backwards compat until canonical plugin crates ship manifest revisions.
Operators combining both:
let mut sources = default_dashboard_sources();
sources.extend(dashboard_sources_from_manifests(&discovered_manifests));
let entries = detect_channels_with_sources(&sources, &config_dir, &secrets_dir)?;
Migration
Canonical plugin crates currently ship NO [plugin.dashboard]
section. The 3 hardcoded sources in nexo-setup continue to
serve the wizard. When a plugin ships the section AND the
wizard caller discovers manifests + extends the source list,
the manifest-driven source contributes alongside the hardcoded
one. Once all 3 canonical plugins migrate, the hardcoded sources
can be retired in a Stage 7 cleanup follow-up.
New canonical channels added in the future (signal, sms, …) ship the manifest section directly + skip the hardcoded path entirely.
Validation
- cargo build --release-fast --bin nexo (default): 3m13s.
- cargo build --release-fast --bin nexo --no-default-features: 2m54s.
- cargo nextest run --workspace: 6334/6334 (13 new tests: 8 in nexo-plugin-manifest::dashboard::tests, 5 in nexo-setup::services::channels_dashboard::manifest_dashboard_tests).
- mdbook build docs: clean.
Trade-offs
| Concern | Decision |
|---|---|
| Schema enumerates known shapes | 2 layouts (single / workspace_walk) + 2 auth checks (file_presence / session_dir_files). Covers the 3 canonical channels. New shapes = schema extension (typed enum variant + interpreter branch). |
| Plugin-side auth check via broker (alternative) | Rejected: the wizard runs WITHOUT the plugin subprocess alive in many scenarios (initial setup, secret rotation, plugin-binary-not-yet-installed). The daemon performing the FS check directly is more robust. |
| channel_id 'static lifetime | Process-wide intern table keyed by plugin id; one-time leak per plugin per process. Bounded by plugin count. |
| Workspace walk path resolution | <config_dir>.parent() / data/workspace — matches the layout used by canonical plugins today. Manifest does NOT let plugins override the workspace root path (security: prevent arbitrary FS reads via a malicious manifest). |
| Symbol exports | ManifestDashboardSource + dashboard_sources_from_manifests are public so callers (admin wizard, setup CLI, future microapp) can wire them without re-implementing. |
nexo Plugin Contract
| Field | Value |
|---|---|
| contract_version | 1.10.0 |
| Status | Stable |
| Authoritative reference | This document |
| Reference implementations | Host: crates/core/src/agent/nexo_plugin_registry/subprocess.rs. Rust child: crates/microapp-sdk/src/plugin.rs (feature plugin). Python / TypeScript / PHP children: github.com/lordmacu/nexo-plugin-sdks. See §11. |
This contract describes how an out-of-tree plugin binary communicates with the nexo daemon. A conforming plugin can be written in any language — Rust, Python, TypeScript, Go, etc. — as long as it implements the protocol defined here.
1. Transport
- Plugin runs as a child process of the daemon.
- Daemon writes to the child's stdin. Child writes to its stdout.
- stderr is discarded by the daemon (currently redirected to /dev/null; Phase 81.23 will collect it into structured tracing).
- Each direction is a stream of newline-delimited UTF-8 lines.
- Each line is exactly one JSON-RPC 2.0 message — request, response, or notification.
- Lines must not exceed the platform's atomic pipe-write limit (PIPE_BUF, 4 KiB on Linux); fragmenting one JSON object across multiple lines is not supported.
2. Manifest
The plugin ships a nexo-plugin.toml file — schema defined by
the nexo-plugin-manifest crate. The fields relevant to this
contract are:
[plugin]
id = "slack" # ASCII slug, ^[a-z][a-z0-9_]{0,31}$
version = "0.2.0" # semver
name = "Slack Channel"
description = "..."
min_nexo_version = ">=0.1.0"
[plugin.requires]
nexo_capabilities = ["broker"]
# Phase 81.14 — subprocess entrypoint.
[plugin.entrypoint]
command = "/usr/local/bin/plugin-slack" # absolute path or PATH binary
args = ["--mode", "stdio"] # optional
env = { "RUST_LOG" = "info" } # optional, MUST NOT begin with "NEXO_"
# Phase 81.8 — channel kinds the plugin exposes. Drives the
# broker subscribe / publish allowlist (see §6).
[[plugin.channels.register]]
kind = "slack"
adapter = "SlackChannelAdapter"
The host parses this manifest at boot and uses
plugin.id to verify the child's identity in the initialize
handshake (§4.1). It uses plugin.entrypoint.command to spawn
the child process. Any env key beginning with NEXO_ is
rejected at boot — those names are reserved for the daemon's
own runtime configuration.
2.1 Extends section (Phase 81.28)
A subprocess plugin that contributes to a daemon-side registry
beyond [plugin.channels.register] declares its capabilities in
an additive [plugin.extends] section:
[plugin.extends]
channels = ["slack", "discord"] # paired with Phase 81.24 wrapper
llm_providers = ["cohere", "mistral"] # paired with Phase 81.25
memory_backends = ["pinecone", "qdrant"] # paired with Phase 81.26
hooks = ["pii_redact"] # paired with Phase 81.27
Each list names the IDs the plugin contributes to the matching registry. Validation rules:
- Each id MUST match ^[a-z][a-z0-9_]{0,31}$.
- No duplicates within a single list.
- No cross-list duplicates — an id MUST occupy at most one of the four lists within a single plugin.
- All four fields default to empty; legacy manifests parse unchanged.
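The validation rules above are simple enough to sketch without a regex crate. The function names and the `(list name, ids)` input shape below are hypothetical and chosen for illustration; the manifest crate's real types are not shown in this document.

```rust
use std::collections::HashMap;

/// Mirrors the pattern ^[a-z][a-z0-9_]{0,31}$ with plain character checks.
pub fn is_valid_id(id: &str) -> bool {
    let mut chars = id.chars();
    let first_ok = matches!(chars.next(), Some(c) if c.is_ascii_lowercase());
    first_ok
        && id.len() <= 32
        && chars.all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_')
}

/// Checks id syntax, duplicates within a list, and cross-list duplicates.
/// `lists` pairs a list name (e.g. "channels") with the ids it declares.
pub fn validate_extends(lists: &[(&str, Vec<&str>)]) -> Result<(), String> {
    // Maps each id seen so far to the list that first declared it.
    let mut seen: HashMap<&str, &str> = HashMap::new();
    for (list_name, ids) in lists {
        for id in ids {
            if !is_valid_id(id) {
                return Err(format!("{list_name}: invalid id {id:?}"));
            }
            if let Some(previous) = seen.insert(*id, *list_name) {
                return Err(if previous == *list_name {
                    format!("{list_name}: duplicate id {id:?}")
                } else {
                    format!("id {id:?} appears in both {previous} and {list_name}")
                });
            }
        }
    }
    Ok(())
}
```

A single map covers both duplicate rules, since an id that is already present is either a repeat in the same list or a clash with another list.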
The four canonical sections are fixed in code
(EXTENDS_SECTIONS); adding a new capability surface requires a
manifest-crate change. This is intentional — the closed schema
keeps serde(deny_unknown_fields) defense intact and gates new
extension points behind a coordinated rollout.
[plugin.extends] is the declarative half of the
capability story. Daemon dispatch wiring — actually populating
LlmClientRegistry / memory backend store / HookInterceptor
registry — ships with Phase 81.24 (channels), 81.25 (LLM
providers), 81.26 (memory backends), and 81.27 (hooks).
Capability-negotiation handshake (verifying the subprocess's
initialize reply matches the declared extensions) is a
follow-up (81.28.b).
[plugin.extends].channels exists in parallel with
[plugin.channels.register] (§6 — topic allowlist). Use
extends for subprocess plugins routed through the future
remote ChannelAdapter wrapper; use register for in-tree
adapters that link directly into the daemon binary. Both
surfaces stay independent.
2.2 Sandbox section (Phase 81.22)
Subprocess plugins on Linux can opt into bubblewrap-based
isolation via an additive [plugin.sandbox] section. Default =
disabled — every existing manifest parses unchanged; the daemon
spawns the plugin as a normal child process.
[plugin.sandbox]
enabled = true # default false (opt-in)
network = "deny" # "deny" | "host"
fs_read_paths = ["/etc/ssl/certs"] # absolute, ro-bind into sandbox
fs_write_paths = ["${state_dir}"] # absolute, rw-bind. ${state_dir}
# token expands to the plugin's
# per-instance state root.
drop_user = true # default true; bwrap maps the
# child to nobody:nogroup (uid
# 65534) via --unshare-user.
When enabled, the daemon wraps the spawn Command with bwrap
flags:
- Process hardening: --die-with-parent --unshare-pid --unshare-uts --unshare-ipc --new-session.
- Filesystem skeleton: --proc /proc --dev /dev --tmpfs /tmp plus read-only binds for /usr /bin /sbin /lib /lib64 /etc/ssl. The plugin command's parent dir is also auto-bound read-only so the binary is reachable inside the sandbox.
- Network: --unshare-net for network = "deny". For network = "host" the operator must set the NEXO_PLUGIN_SANDBOX_HOST_NET_ALLOW=1 capability env var; the manifest validator otherwise rejects the field.
- User: --unshare-user --uid 65534 --gid 65534 when drop_user = true.
- Allowlist: each fs_read_paths entry becomes --ro-bind <path> <path>; each fs_write_paths entry becomes --bind <path> <path> after ${state_dir} expansion.
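To make the flag assembly concrete, here is a rough sketch of how the bwrap argument vector could be built from a parsed sandbox section. `SandboxConfig`, `bwrap_args`, and the argument order are assumptions made for illustration; only the flags themselves are taken from the list above.

```rust
/// Hypothetical, simplified view of a parsed [plugin.sandbox] section.
pub struct SandboxConfig {
    pub network_deny: bool,
    pub drop_user: bool,
    pub fs_read_paths: Vec<String>,
    pub fs_write_paths: Vec<String>,
}

fn push(args: &mut Vec<String>, parts: &[&str]) {
    args.extend(parts.iter().map(|part| part.to_string()));
}

/// Builds the bwrap flags that precede the plugin command.
/// `command_dir` is the plugin binary's parent directory and `state_dir`
/// the plugin's per-instance state root.
pub fn bwrap_args(cfg: &SandboxConfig, command_dir: &str, state_dir: &str) -> Vec<String> {
    let mut args = Vec::new();
    // Process hardening.
    push(&mut args, &["--die-with-parent", "--unshare-pid", "--unshare-uts", "--unshare-ipc", "--new-session"]);
    // Filesystem skeleton plus read-only system binds.
    push(&mut args, &["--proc", "/proc", "--dev", "/dev", "--tmpfs", "/tmp"]);
    for dir in ["/usr", "/bin", "/sbin", "/lib", "/lib64", "/etc/ssl"] {
        push(&mut args, &["--ro-bind", dir, dir]);
    }
    push(&mut args, &["--ro-bind", command_dir, command_dir]);
    if cfg.network_deny {
        push(&mut args, &["--unshare-net"]);
    }
    if cfg.drop_user {
        push(&mut args, &["--unshare-user", "--uid", "65534", "--gid", "65534"]);
    }
    for path in &cfg.fs_read_paths {
        push(&mut args, &["--ro-bind", path.as_str(), path.as_str()]);
    }
    for path in &cfg.fs_write_paths {
        // Expand the ${state_dir} token before binding read-write.
        let expanded = path.replace("${state_dir}", state_dir);
        push(&mut args, &["--bind", expanded.as_str(), expanded.as_str()]);
    }
    args
}
```

The resulting vector would be passed to `bwrap` ahead of the plugin's own command and arguments.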
Operators control sandbox enforcement via two env knobs:
| Env var | Purpose |
|---|---|
| NEXO_PLUGIN_SANDBOX_REQUIRE | When 1, the daemon refuses to spawn any plugin without sandbox.enabled = true. Strict-mode operator gate. |
| NEXO_PLUGIN_SANDBOX_HOST_NET_ALLOW | When 1, manifests declaring network = "host" validate. Default off. |
Hard denylist (compile-time const) — operator-supplied allowlists that equal or include any of these paths are rejected at manifest load:
- /etc/shadow, /etc/sudoers, /etc/sudoers.d
- /proc/sys, /proc/kcore, /proc/kallsyms
- /sys/firmware, /sys/kernel
- /dev/mem, /dev/kmem, /dev/port
- /var/run/docker.sock, /run/docker.sock, /private/var/run/docker.sock
- /root, /boot
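One way to read "equal or include" is that an allowlist entry is rejected when it is a denied path or an ancestor of one. The sketch below implements exactly that reading with component-wise path comparison; the function name is hypothetical, and whether the real validator also rejects descendants of denied paths (for example a path under /root) is not stated here.

```rust
use std::path::Path;

/// Hard denylist, copied from the list in this section.
const DENYLIST: &[&str] = &[
    "/etc/shadow", "/etc/sudoers", "/etc/sudoers.d",
    "/proc/sys", "/proc/kcore", "/proc/kallsyms",
    "/sys/firmware", "/sys/kernel",
    "/dev/mem", "/dev/kmem", "/dev/port",
    "/var/run/docker.sock", "/run/docker.sock", "/private/var/run/docker.sock",
    "/root", "/boot",
];

/// Returns the first denied path that `allow_path` equals or contains.
/// `Path::starts_with` compares whole components, so "/etc/sudo" does not
/// match "/etc/sudoers".
pub fn touches_denylist(allow_path: &str) -> Option<&'static str> {
    let allow = Path::new(allow_path);
    DENYLIST
        .iter()
        .copied()
        .find(|&denied| Path::new(denied).starts_with(allow))
}
```

Allowlisting `/etc` is therefore rejected because it would expose `/etc/shadow`, while `/etc/ssl/certs` passes.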
Validation errors surface as ManifestError::Sandbox* variants
(SandboxAllowlistTouchesDenylist, SandboxRelativePath,
SandboxInvalidStateDirInterpolation,
SandboxHostNetworkWithoutCapability).
Platform support: Linux requires bubblewrap in PATH
(apt install bubblewrap). macOS is currently a no-op + tracing::warn!
log per spawn — native sandbox-exec integration is deferred to
follow-up 81.22.macos. With NEXO_PLUGIN_SANDBOX_REQUIRE=1 on
macOS, the daemon refuses to spawn (treats macOS as unsupported).
Out of scope for v1:
- Granular network egress allowlist (network = "allowlist", network_allowlist = ["host:port"]): defers to 81.22.b (slirp4netns + nftables).
- Per-syscall seccomp filters: defers to 81.22.c.
- Cgroup / rlimit resource caps — Phase 81.21.c.
- Doctor CLI surface — defers to 81.22.d.
3. JSON-RPC envelope
All frames are valid JSON-RPC 2.0:
Request
{
"jsonrpc": "2.0",
"id": <integer or string>,
"method": "<method-name>",
"params": <object | null>
}
Response (success)
{
"jsonrpc": "2.0",
"id": <same as request>,
"result": <object | null>
}
Response (error)
{
"jsonrpc": "2.0",
"id": <same as request, null if request was un-parseable>,
"error": {
"code": <integer>,
"message": "<string>"
}
}
Notification
A notification is a request without an id field. The
peer must not reply.
{
"jsonrpc": "2.0",
"method": "<method-name>",
"params": <object | null>
}
The contract uses notifications for unidirectional broker events — see §5.
4. Lifecycle
4.1 initialize (host → child request)
After spawning the child, the daemon writes one initialize
request and awaits the response. The child must respond before
NEXO_PLUGIN_INIT_TIMEOUT_MS (default 5000) elapses or the
daemon kills it and surfaces PluginInitError::Other.
Request:
{
"jsonrpc": "2.0",
"id": 1,
"method": "initialize",
"params": { "nexo_version": "0.1.5" }
}
Response:
{
"jsonrpc": "2.0",
"id": 1,
"result": {
"manifest": { "plugin": { "id": "slack", "version": "0.2.0", ... } },
"server_version": "slack-0.2.0"
}
}
The child must echo a manifest whose plugin.id matches the
id the daemon expected (the id under which the plugin was
registered in the factory registry). Mismatch is a hard failure
— the daemon kills the child and refuses to load the plugin.
This defends against an out-of-tree binary impersonating a
different plugin.
server_version is a free-form string identifying the running
binary; the SDK defaults it to <id>-<version> from the
manifest.
4.1.1 Tool catalog advertisement (Phase 81.29, optional)
Plugins declaring [plugin.extends].tools = [...] MUST include
a tools array in the initialize-reply result. Each entry is
a RemoteToolDef:
{
"jsonrpc": "2.0",
"id": 1,
"result": {
"manifest": { "plugin": { "id": "browser", ... } },
"server_version": "browser-0.1.1",
"tools": [
{
"name": "browser_navigate",
"description": "Navigate to a URL",
"input_schema": {
"type": "object",
"properties": { "url": { "type": "string" } },
"required": ["url"]
}
}
]
}
}
Validation rules at the host:
- The tools field is OPTIONAL when extends.tools is empty. Required (non-empty) when the manifest declares any tool.
- Every advertised name MUST appear in manifest.plugin.extends.tools. Drift in this direction (advertised but not declared) is a hard failure: the daemon kills the child and refuses to load.
- Manifest entries WITHOUT an advertised counterpart are tolerated but logged at warn; runtime calls to those tools yield -33401 ToolNotFound.
- name must satisfy the per-plugin namespace rule (<plugin_id>_* or ext_<plugin_id>_*).
- input_schema is an arbitrary JSONSchema object; the daemon caches it for arg validation before each tool.invoke.
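The host-side catalog rules can be summarised as one check over two name lists. This is a sketch with hypothetical names (`CatalogError`, `check_tool_catalog`); treating an entirely missing catalog as an error when tools are declared is an assumption drawn from the "Required (non-empty)" wording, and the returned list stands in for the warn-level log.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum CatalogError {
    /// Manifest declares tools but the initialize reply advertised none.
    MissingTools,
    /// Advertised but not declared in extends.tools (hard failure).
    Undeclared(String),
    /// Name violates the <plugin_id>_* / ext_<plugin_id>_* rule.
    BadNamespace(String),
}

/// On success, returns the declared names that were NOT advertised, which the
/// host would log at warn level.
pub fn check_tool_catalog(
    plugin_id: &str,
    declared: &[&str],
    advertised: &[&str],
) -> Result<Vec<String>, CatalogError> {
    if !declared.is_empty() && advertised.is_empty() {
        return Err(CatalogError::MissingTools);
    }
    let plain = format!("{plugin_id}_");
    let ext = format!("ext_{plugin_id}_");
    for name in advertised {
        if !declared.contains(name) {
            return Err(CatalogError::Undeclared(name.to_string()));
        }
        if !(name.starts_with(&plain) || name.starts_with(&ext)) {
            return Err(CatalogError::BadNamespace(name.to_string()));
        }
    }
    Ok(declared
        .iter()
        .filter(|name| !advertised.contains(name))
        .map(|name| name.to_string())
        .collect())
}
```

The asymmetry is deliberate in the contract: advertising more than was declared is fatal, while advertising less only degrades to a warning.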
4.2 shutdown (host → child request)
The daemon sends shutdown when it wants the plugin to exit
gracefully. The child should flush state, then reply.
Request:
{
"jsonrpc": "2.0",
"id": 2,
"method": "shutdown",
"params": { "reason": "host requested" }
}
Response:
{
"jsonrpc": "2.0",
"id": 2,
"result": { "ok": true }
}
Reply with an error object instead of result if shutdown
fails — the host surfaces PluginShutdownError::Other to the
operator.
After the reply, the daemon waits 1 second for the process
to exit on its own. If the child is still alive, the daemon
sends SIGKILL. So: reply, then exit.
5. Broker bridge
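The wire-level shape of the broker bridge is two notifications, broker.event (§5.1) and broker.publish (§5.4). The child → host request methods memory.recall (§5.2) and llm.complete (§5.3) are documented between them: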
5.1 broker.event (host → child)
Whenever the daemon's broker delivers an event on a topic
matching one of the plugin's outbound subscriptions (derived
from manifest.channels.register[].kind — see §6), the daemon
sends:
{
"jsonrpc": "2.0",
"method": "broker.event",
"params": {
"topic": "plugin.outbound.slack.team_a",
"event": {
"id": "01940000-0000-0000-0000-000000000001",
"timestamp": "2026-05-01T00:00:00Z",
"topic": "plugin.outbound.slack.team_a",
"source": "agent.coordinator",
"session_id": "01940000-0000-0000-0000-000000000099",
"payload": { "text": "hello", ... }
}
}
}
The event field is a serialised nexo_broker::Event. The
plugin processes the event (e.g. forwards payload.text to
Slack's API) and may respond by sending a broker.publish
notification (§5.4), but it is not required to.
5.2 memory.recall (child → host request) <Phase 81.20.a>
When the plugin needs to look up agent memory entries, it issues
a JSON-RPC request to the daemon. Unlike broker.event /
broker.publish which are notifications, this is a
request-response flow: the child sends with an id and
awaits the matching reply.
Child → host request:
{
"jsonrpc": "2.0",
"id": 42,
"method": "memory.recall",
"params": {
"agent_id": "ventas_v1",
"query": "user prefers concise answers",
"limit": 5
}
}
Host → child reply (success):
{
"jsonrpc": "2.0",
"id": 42,
"result": {
"entries": [
{
"id": "01940000-0000-0000-0000-000000000001",
"agent_id": "ventas_v1",
"content": "user prefers concise answers",
"tags": ["preference"],
"concept_tags": [],
"created_at": "2026-04-30T18:22:31Z",
"memory_type": null
}
]
}
}
Host → child reply (error):
- -32601 method not found (only memory.recall wired in 81.20.a; llm.complete / tool.dispatch ship in 81.20.b/.c).
- -32602 invalid params (missing agent_id / wrong type for query).
- -32603 memory not configured (operator hasn't enabled long-term memory) OR memory backend returned an error.
limit defaults to 10, capped hard at 1000. The handler calls
LongTermMemory::recall(agent_id, query, limit) which already
expands the query with up to 3 derived concept tags so FTS5 hits
memories whose stored content diverges from the query surface.
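The default and hard cap on limit amount to a one-line clamp. The constant and function names below are hypothetical; only the values 10 and 1000 come from the text.

```rust
pub const DEFAULT_RECALL_LIMIT: u64 = 10;
pub const MAX_RECALL_LIMIT: u64 = 1000;

/// Resolves the `limit` param of memory.recall: absent means the default,
/// and anything above the cap is clamped down to it.
pub fn effective_limit(requested: Option<u64>) -> u64 {
    requested
        .unwrap_or(DEFAULT_RECALL_LIMIT)
        .min(MAX_RECALL_LIMIT)
}
```

A plugin asking for 5000 entries would therefore receive at most 1000.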
5.3 llm.complete (child → host request) <Phase 81.20.b>
When the plugin needs an LLM completion, it issues a request and awaits the response.
Child → host request:
{
"jsonrpc": "2.0",
"id": 50,
"method": "llm.complete",
"params": {
"provider": "minimax",
"model": "minimax-m2.5",
"messages": [
{"role": "user", "content": "summarize this in one line: ..."}
],
"max_tokens": 1024,
"temperature": 0.7,
"system_prompt": "You answer concisely."
}
}
messages[].role is one of system, user, assistant, tool.
max_tokens defaults to 4096; temperature defaults to 0.7;
system_prompt is optional.
Host → child reply (success):
{
"jsonrpc": "2.0",
"id": 50,
"result": {
"content": "Concise reply text.",
"finish_reason": "stop",
"usage": {
"prompt_tokens": 25,
"completion_tokens": 8
}
}
}
finish_reason is one of stop, length, tool_use,
other:<reason>.
Host → child reply (errors):
- -32602 invalid params (missing provider / model / messages, malformed message, empty messages array).
- -32603 LLM not configured (operator hasn't wired the registry to the subprocess pipeline) OR client build failed (provider name not registered, config invalid) OR chat() call returned an error.
- -32601 provider returned tool calls instead of text; the MVP surfaces this as not_implemented. The tool-call wire shape (which lets the child re-submit tool_result follow-ups) lands in a future contract bump.
Daemon-side caps max_tokens at u32::MAX. Streaming via
llm.complete.delta notifications is opt-in via
params.stream = true (Phase 81.20.b.c).
Streaming flow
When the request includes "stream": true, the host calls
LlmClient::stream instead of chat. Each text chunk arrives
as a notification correlated to the original request id:
{
"jsonrpc": "2.0",
"method": "llm.complete.delta",
"params": { "request_id": 50, "chunk": "hello" }
}
{
"jsonrpc": "2.0",
"method": "llm.complete.delta",
"params": { "request_id": 50, "chunk": " world" }
}
The final reply matches the original id but carries only
finish_reason + usage — content is omitted because the
child reassembled it from deltas:
{
"jsonrpc": "2.0",
"id": 50,
"result": {
"finish_reason": "stop",
"usage": { "prompt_tokens": 12, "completion_tokens": 7 }
}
}
Tool-call deltas in streaming mode are dropped (same scope as
the non-streaming MVP). If the provider returns ONLY tool calls
during a stream (no text), the final reply is -32601
not_implemented.
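On the child side, reassembling streamed content means buffering chunks per request id until the final reply arrives. The sketch below assumes integer request ids (the envelope also allows strings) and uses a hypothetical `DeltaAssembler` type; JSON decoding of the notifications is left out.

```rust
use std::collections::HashMap;

/// Buffers llm.complete.delta chunks keyed by the originating request id.
#[derive(Default)]
pub struct DeltaAssembler {
    pending: HashMap<u64, String>,
}

impl DeltaAssembler {
    /// Call for each llm.complete.delta notification.
    pub fn on_delta(&mut self, request_id: u64, chunk: &str) {
        self.pending.entry(request_id).or_default().push_str(chunk);
    }

    /// Call when the final reply for `request_id` arrives. Returns the
    /// reassembled content (empty if no text deltas were received) and
    /// drops the buffer.
    pub fn on_final(&mut self, request_id: u64) -> String {
        self.pending.remove(&request_id).unwrap_or_default()
    }
}
```

Keying by request id lets several streaming completions interleave on the same stdin stream without mixing their text.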
5.4 broker.publish (child → host)
When the plugin wants to push an event onto the broker (e.g. delivering an inbound message from Slack), it writes:
{
"jsonrpc": "2.0",
"method": "broker.publish",
"params": {
"topic": "plugin.inbound.slack.team_a",
"event": {
"id": "01940000-0000-0000-0000-000000000002",
"timestamp": "2026-05-01T00:01:00Z",
"topic": "plugin.inbound.slack.team_a",
"source": "slack",
"session_id": null,
"payload": { "from": "U01ABC", "text": "hi", ... }
}
}
}
The host validates the topic against the allowlist (§6)
before forwarding to the broker. Topics outside the
allowlist are dropped with a tracing::warn! log and never
reach the broker.
5.x Channel methods (Phase 81.24)
Subprocess plugins that contribute new channel kinds (declared
in [plugin.extends].channels, §2.1) implement three host-
initiated request methods. The host's RemoteChannelAdapter
wraps each ChannelAdapter trait method into a JSON-RPC
request; the child replies with the corresponding result or a
typed error.
Every payload carries kind so a single subprocess advertising
multiple kinds (extends.channels = ["slack", "discord"]) can
dispatch via one request handler.
channel.start
// host → child
{
"jsonrpc": "2.0",
"id": 42,
"method": "channel.start",
"params": {
"kind": "slack",
"instance": "primary" // null when no per-instance multiplexing
}
}
// child → host
{ "jsonrpc": "2.0", "id": 42, "result": { "ok": true } }
Subscribe to plugin.outbound.<kind> (or per-instance
plugin.outbound.<kind>.<instance> when instance is set) and
begin publishing inbound events. Default host-side timeout
30 seconds.
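The topic a plugin subscribes to on channel.start follows directly from kind and the optional instance. A minimal sketch, with a hypothetical function name:

```rust
/// Builds the outbound topic for a channel kind, adding the per-instance
/// suffix only when `instance` is set (it is null in the request otherwise).
pub fn outbound_topic(kind: &str, instance: Option<&str>) -> String {
    match instance {
        Some(instance) => format!("plugin.outbound.{kind}.{instance}"),
        None => format!("plugin.outbound.{kind}"),
    }
}
```

For kind `slack` and instance `team_a` this yields `plugin.outbound.slack.team_a`, the topic used in the broker.event example in §5.1.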
channel.stop
// host → child
{
"jsonrpc": "2.0",
"id": 43,
"method": "channel.stop",
"params": { "kind": "slack" }
}
// child → host
{ "jsonrpc": "2.0", "id": 43, "result": { "ok": true } }
Release resources, drop subscriptions, stop publishing inbound. Idempotent. Default host-side timeout 30 seconds.
channel.send_outbound
// host → child
{
"jsonrpc": "2.0",
"id": 44,
"method": "channel.send_outbound",
"params": {
"kind": "slack",
"msg": { "kind": "text", "to": "U123", "body": "hi" }
}
}
// child → host (success)
{
"jsonrpc": "2.0",
"id": 44,
"result": { "message_id": "1234.5678", "sent_at_unix": 1741032000 }
}
msg.kind is one of text, media, or custom (see
OutboundMessage in §3 for the full enum). Default host-side
timeout 60 seconds. Operator override via
NEXO_PLUGIN_CHANNEL_TIMEOUT_MS env (single value applied to
all 3 methods).
Channel-specific error codes
In addition to the JSON-RPC standard codes (§7), channel.*
methods MAY return:
| Code | Meaning | Maps to ChannelAdapterError |
|---|---|---|
| -33001 | channel.connection_failed | Connection { source: <message> } |
| -33002 | channel.authentication_failed | Authentication { reason: <message> } |
| -33003 | channel.recipient_invalid | Recipient { recipient: <data.recipient>, reason: <data.reason \| message> } |
| -33004 | channel.rate_limited | RateLimited { retry_after_secs: <data.retry_after_secs> } |
| -33005 | channel.unsupported_feature | Unsupported { feature: <data.feature \| message> } |
Error example:
{
"jsonrpc": "2.0",
"id": 44,
"error": {
"code": -33004,
"message": "rate limited",
"data": { "retry_after_secs": 42 }
}
}
-32601 method_not_found from a child means the plugin declared
the kind in extends.channels but did not implement the
requested method; the host surfaces this as
ChannelAdapterError::Unsupported { feature: "<method>" }.
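The code-to-variant mapping in the table, including the -32601 case, can be sketched as a single match. The local `ChannelAdapterError` enum below only mirrors the variant shapes shown in the table; `ErrorData`, the `Other` fallback variant, and the defaults used when `data` fields are absent are assumptions made for illustration.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ChannelAdapterError {
    Connection { source: String },
    Authentication { reason: String },
    Recipient { recipient: String, reason: String },
    RateLimited { retry_after_secs: u64 },
    Unsupported { feature: String },
    /// Hypothetical catch-all for codes outside the table.
    Other { code: i64, message: String },
}

/// Hypothetical decoded form of the optional JSON-RPC `error.data` object.
#[derive(Default)]
pub struct ErrorData {
    pub recipient: Option<String>,
    pub reason: Option<String>,
    pub retry_after_secs: Option<u64>,
    pub feature: Option<String>,
}

pub fn map_channel_error(method: &str, code: i64, message: &str, data: ErrorData) -> ChannelAdapterError {
    let message = message.to_string();
    match code {
        -33001 => ChannelAdapterError::Connection { source: message },
        -33002 => ChannelAdapterError::Authentication { reason: message },
        -33003 => ChannelAdapterError::Recipient {
            recipient: data.recipient.unwrap_or_default(),
            // Prefer data.reason, fall back to the error message.
            reason: data.reason.unwrap_or(message),
        },
        -33004 => ChannelAdapterError::RateLimited {
            // Assumed default of 0 when the plugin omits the hint.
            retry_after_secs: data.retry_after_secs.unwrap_or(0),
        },
        -33005 => ChannelAdapterError::Unsupported {
            feature: data.feature.unwrap_or(message),
        },
        // Kind declared in extends.channels but method not implemented.
        -32601 => ChannelAdapterError::Unsupported { feature: method.to_string() },
        _ => ChannelAdapterError::Other { code, message },
    }
}
```

Applied to the error example above (code -33004 with `retry_after_secs: 42`), this yields `RateLimited { retry_after_secs: 42 }`.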
5.y LLM provider methods (Phase 81.25)
Subprocess plugins that contribute LLM providers (declared in
[plugin.extends].llm_providers, §2.1) implement one host-
initiated request method with two modes (sync + streaming). The
host's RemoteLlmClient wraps each LlmClient trait call into
a JSON-RPC request; the child replies with the corresponding
result or a typed error.
Every payload carries provider so a single subprocess
advertising multiple providers (extends.llm_providers = ["cohere", "mistral"]) can dispatch via one request handler.
llm.chat (non-streaming)
// host → child
{
"jsonrpc": "2.0",
"id": 50,
"method": "llm.chat",
"params": {
"provider": "cohere",
"model": "command-r",
"stream": false,
"request": {
"model": "command-r",
"messages": [{ "role": "user", "content": "hi" }],
"max_tokens": 1024,
"temperature": 0.7
}
}
}
// child → host
{
"jsonrpc": "2.0",
"id": 50,
"result": {
"content": { "type": "text", "text": "Hello world" },
"usage": { "prompt_tokens": 12, "completion_tokens": 4 },
"finish_reason": { "kind": "stop" }
}
}
The full request schema mirrors nexo_llm::types::ChatRequest
fields (messages / tools / max_tokens / temperature /
system_prompt / stop_sequences / tool_choice /
system_blocks / cache_tools). tool_choice serializes as
{"kind":"auto"|"any"|"none"|"specific","name":"<n>"?}.
The full result schema:
- content: {type:"text", text:"..."} OR {type:"tool_calls", tool_calls:[{id, name, arguments}]}
- usage: {prompt_tokens, completion_tokens}
- finish_reason: {kind:"stop"|"tool_use"|"length"|"other", "reason":"<r>"?}
- cache_usage: optional {cache_read_input_tokens, cache_creation_input_tokens, input_tokens, output_tokens}
Default host-side timeout 60 seconds.
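A child advertising several providers can route on `params.provider` inside one handler. The sketch below is a minimal illustration under stated assumptions: `fake_completion` stands in for a real provider call, `PROVIDERS` and `handle_llm_chat` are hypothetical names, and replying `-32602` for an unknown provider is a choice made here, not something this contract mandates.

```python
def fake_completion(model, messages):
    """Stand-in for a real provider call (hypothetical).

    Returns (text, prompt_tokens, completion_tokens).
    """
    last = messages[-1]["content"] if messages else ""
    return f"echo: {last}", len(messages), 2


# provider name -> callable; keys mirror extends.llm_providers in the manifest.
PROVIDERS = {"cohere": fake_completion, "mistral": fake_completion}


def handle_llm_chat(frame):
    """Answer a non-streaming llm.chat request with a result or an error frame."""
    params = frame["params"]
    backend = PROVIDERS.get(params["provider"])
    if backend is None:
        # Assumption: an unknown provider is reported as invalid params.
        return {"jsonrpc": "2.0", "id": frame["id"],
                "error": {"code": -32602,
                          "message": f"unknown provider {params['provider']!r}"}}
    request = params["request"]
    text, prompt_tokens, completion_tokens = backend(params["model"], request["messages"])
    return {
        "jsonrpc": "2.0",
        "id": frame["id"],  # the reply id must match the request id
        "result": {
            "content": {"type": "text", "text": text},
            "usage": {"prompt_tokens": prompt_tokens,
                      "completion_tokens": completion_tokens},
            "finish_reason": {"kind": "stop"},
        },
    }
```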
llm.chat (streaming)
// host → child
{
"jsonrpc": "2.0",
"id": 51,
"method": "llm.chat",
"params": {
"provider": "cohere",
"model": "command-r",
"stream": true,
"request": { "...": "as above" }
}
}
// child → host: zero or more deltas
{
"jsonrpc": "2.0",
"method": "llm.chat.delta",
"params": {
"request_id": 51,
"chunk": { "type": "text_delta", "delta": "Hello" }
}
}
{
"jsonrpc": "2.0",
"method": "llm.chat.delta",
"params": {
"request_id": 51,
"chunk": { "type": "text_delta", "delta": " world" }
}
}
// child → host: final response (id matches request)
{
"jsonrpc": "2.0",
"id": 51,
"result": {
"content": { "type": "text", "text": "" },
"usage": { "prompt_tokens": 12, "completion_tokens": 4 },
"finish_reason": { "kind": "stop" }
}
}
chunk.type values:
- text_delta: { delta: "<text>" }
- tool_call_start: { id, name }
- tool_call_args_delta: { id, delta }
- tool_call_end: { id }
- usage: { usage: {prompt_tokens, completion_tokens} }
- end: { finish_reason: {kind, reason?} }
Default host-side stream timeout 300 seconds. Operator override
via NEXO_PLUGIN_LLM_TIMEOUT_MS env (single value applied to
both sync + streaming).
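The streaming exchange can be sketched as a generator that emits the delta notifications followed by the final reply. This is an illustration, not the SDK's implementation: `stream_frames` and `reassemble` are hypothetical helpers, and only the `text_delta` chunk type is covered.

```python
def stream_frames(request_id, pieces, usage):
    """Yield llm.chat.delta notifications, then the final response frame."""
    for piece in pieces:
        # Notifications carry no "id"; params.request_id ties them to the request.
        yield {
            "jsonrpc": "2.0",
            "method": "llm.chat.delta",
            "params": {"request_id": request_id,
                       "chunk": {"type": "text_delta", "delta": piece}},
        }
    # The final reply reuses the request id; text is empty because it was streamed.
    yield {
        "jsonrpc": "2.0",
        "id": request_id,
        "result": {
            "content": {"type": "text", "text": ""},
            "usage": usage,
            "finish_reason": {"kind": "stop"},
        },
    }


def reassemble(frames):
    """Host-side view: concatenate text_delta chunks in arrival order."""
    return "".join(
        f["params"]["chunk"]["delta"]
        for f in frames
        if f.get("method") == "llm.chat.delta"
        and f["params"]["chunk"]["type"] == "text_delta"
    )
```

Feeding the frames for `["Hello", " world"]` through `reassemble` recovers the full text, which matches the two-delta example above.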
LLM-specific error codes
In addition to the JSON-RPC standard codes (§7), llm.chat
MAY return:
| Code | Meaning |
|---|---|
| -33101 | llm.connection_failed |
| -33102 | llm.authentication_failed |
| -33103 | llm.rate_limited (data.retry_after_secs) |
| -33104 | llm.model_not_found |
| -33105 | llm.context_overflow |
Error example:
{
"jsonrpc": "2.0",
"id": 50,
"error": {
"code": -33103,
"message": "rate limited",
"data": { "retry_after_secs": 30 }
}
}
The host surfaces these as anyhow::Error with messages
operators can grep ("rate limited", "authentication failed",
etc.). Structured retry-after info lands in the message string
for v1; future contract bumps may add a typed
LlmProviderError enum.
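To make the "message string" idea concrete, here is one way a host-side shim could flatten an error object into a single line that is easy to grep. The exact wording the daemon produces is not specified here, so the format and the `describe_llm_error` name are assumptions for illustration.

```python
# Codes from the table above; the message strings are illustrative.
LLM_ERROR_MESSAGES = {
    -33101: "connection failed",
    -33102: "authentication failed",
    -33103: "rate limited",
    -33104: "model not found",
    -33105: "context overflow",
}


def describe_llm_error(error):
    """Turn a JSON-RPC error object into one grep-friendly line."""
    code = error.get("code")
    # Unknown codes fall back to whatever message the child sent.
    text = LLM_ERROR_MESSAGES.get(code, error.get("message", "unknown error"))
    retry = (error.get("data") or {}).get("retry_after_secs")
    if code == -33103 and retry is not None:
        # Retry-after info is folded into the string rather than typed.
        text += f"; retry after {retry}s"
    return f"llm error {code}: {text}"
```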
5.z Hook methods (Phase 81.27)
Subprocess plugins that contribute hook handlers (declared in
[plugin.extends].hooks, §2.1) implement one host-initiated
request method.
hook.on_hook
// host → child
{
"jsonrpc": "2.0",
"id": 60,
"method": "hook.on_hook",
"params": {
"plugin_id": "compliance_plugin",
"hook_name": "before_message",
"event": {
"sender": "alice",
"body": "ping"
}
}
}
// child → host (block)
{
"jsonrpc": "2.0",
"id": 60,
"result": {
"abort": true,
"reason": "PII detected",
"decision": "block"
}
}
// child → host (transform — rewrite payload)
{
"jsonrpc": "2.0",
"id": 60,
"result": {
"abort": false,
"decision": "transform",
"transformed_body": "[REDACTED]"
}
}
// child → host (allow / no-op)
{
"jsonrpc": "2.0",
"id": 60,
"result": {}
}
The result shape is HookResponse (defined in
crates/extensions/src/runtime/mod.rs). Fields:
- abort: bool (legacy block signal, Phase 11.6)
- reason: Option<String> (operator-readable explanation)
- override: Option<JsonValue> (key-by-key event mutation; non-object values logged + ignored)
- decision: Option<"allow" | "block" | "transform"> (Phase 83.3, richer audit signal)
- transformed_body: Option<String> (only meaningful with decision: "transform")
- do_not_reply_again: bool (anti-loop signal; host suppresses pending auto-replies for the conversation)
Default host-side timeout: 5 seconds (lower than channel
30s and LLM 60s — hooks fire on the message hot path; long
timeouts block agent flow). Operator override via
NEXO_PLUGIN_HOOK_TIMEOUT_MS env.
Continue-on-error semantic
Every dispatch failure (transport closed, subprocess crash,
timeout, JSON-RPC error, malformed reply) returns
HookResponse::default() (Continue) on the host side. The
HookRegistry::fire loop continues iterating remaining
handlers and the agent flow does NOT break on subprocess
misbehavior.
This is the explicit philosophy from hook_registry.rs:
"extension misbehavior must not take down agent flow."
Operators see the failures via tracing::warn! (target
plugins.init and the handler's own dispatch logs).
-32601 method_not_found from a child means the plugin declared
extends.hooks = [...] but did not implement the wire method;
the host treats this as Continue (no hard failure).
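The continue-on-error behavior can be modeled in a few lines. The sketch below is a Python approximation of the idea, not the Rust `HookRegistry`: the `HookResponse` dataclass carries only a subset of the fields, handlers are plain callables instead of subprocess dispatches, and stopping at the first explicit abort is an assumption made for the example.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Optional

log = logging.getLogger("plugins.hooks")  # illustrative logger name


@dataclass
class HookResponse:
    # Defaults mean "continue": no abort, no decision, nothing transformed.
    abort: bool = False
    reason: Optional[str] = None
    decision: Optional[str] = None
    transformed_body: Optional[str] = None


def dispatch(handler: Callable[[dict], dict], event: dict) -> HookResponse:
    """Call one handler; any failure degrades to the default response."""
    try:
        raw = handler(event)
        known = {k: raw[k] for k in HookResponse.__dataclass_fields__ if k in raw}
        return HookResponse(**known)
    except Exception as exc:  # crash, timeout, malformed reply, ...
        log.warning("hook handler failed, continuing: %s", exc)
        return HookResponse()


def fire(handlers, event: dict) -> HookResponse:
    """Run handlers in order; only an explicit abort ends the loop early."""
    for handler in handlers:
        response = dispatch(handler, event)
        if response.abort:
            return response
    return HookResponse()
```

A handler that raises is logged and skipped, so a later handler can still block the message.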
5.w Memory backend methods (Phase 81.26)
Subprocess plugins that contribute vector store backends
(declared in [plugin.extends].memory_backends, §2.1)
implement three host-initiated request methods. The host's
RemoteVectorBackend wraps each VectorBackend trait method
into a JSON-RPC request; the child replies with the
corresponding result or a typed error.
Every payload carries backend so a single subprocess
advertising multiple backends (extends.memory_backends = ["pinecone", "qdrant"]) can dispatch via one request handler.
v1 ships the wire surface + registry only — operator-side
consumer wiring (LongTermMemory.recall_vector reading from
wire.vector_backend_registry) lands in 81.26.b. Operators
can audit registered backends today via
wire.vector_backend_registry.names().
memory.vector_upsert
// host → child
{
"jsonrpc": "2.0",
"id": 70,
"method": "memory.vector_upsert",
"params": {
"backend": "pinecone",
"collection": "kb",
"records": [
{
"id": "r1",
"content": "hello",
"embedding": [0.1, 0.2, 0.3],
"metadata": {"source": "kb"}
}
]
}
}
// child → host
{ "jsonrpc": "2.0", "id": 70, "result": { "count": 1 } }
embedding is a pre-computed dense vector (host-side embedder
or LLM provider produces it; backend stores). metadata is
opaque JSON the backend may filter against. Default host-side
timeout 30 seconds.
memory.vector_search
// host → child
{
"jsonrpc": "2.0",
"id": 71,
"method": "memory.vector_search",
"params": {
"backend": "pinecone",
"collection": "kb",
"query": {
"embedding": [0.1, 0.2, 0.3],
"limit": 10,
"filter": {"namespace": "tenant-1"}
}
}
}
// child → host
{
"jsonrpc": "2.0",
"id": 71,
"result": {
"matches": [
{
"id": "r1",
"content": "hello",
"score": 0.97,
"metadata": {"source": "kb"}
}
]
}
}
filter is opaque — backend interprets per its native
convention (Pinecone metadata filter, Qdrant filter
expression, Weaviate where, etc.). The host does NOT
validate or rewrite. score uses the backend's native scale
(cosine vs dot-product vs distance). Default host-side
timeout 10 seconds (search is hot-path).
memory.vector_delete
// host → child
{
"jsonrpc": "2.0",
"id": 72,
"method": "memory.vector_delete",
"params": {
"backend": "pinecone",
"collection": "kb",
"ids": ["r1", "r2"]
}
}
// child → host
{ "jsonrpc": "2.0", "id": 72, "result": { "count": 2 } }
Default host-side timeout 30 seconds. Operator override via
NEXO_PLUGIN_MEMORY_TIMEOUT_MS env (single value applied to
all 3 methods).
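A toy in-memory backend shows how the three methods fit together on the child side. Everything here is a sketch under assumptions: `InMemoryBackend` and `WireError` are hypothetical names, scores are cosine similarity, `filter` is ignored, and `delete` counts only records that actually existed. The error codes come from the memory-specific table in the next subsection.

```python
import math


class WireError(Exception):
    """Carries a JSON-RPC error code plus optional structured data."""

    def __init__(self, code, message, data=None):
        super().__init__(message)
        self.code, self.message, self.data = code, message, data


def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryBackend:
    def __init__(self, dimension):
        self.dimension = dimension
        self.collections = {}  # collection name -> {record id -> record}

    def upsert(self, collection, records):
        # Validate every record first so a bad batch writes nothing.
        for rec in records:
            got = len(rec["embedding"])
            if got != self.dimension:
                raise WireError(-33302, "dimension mismatch",
                                {"expected": self.dimension, "got": got})
        store = self.collections.setdefault(collection, {})
        for rec in records:
            store[rec["id"]] = rec
        return {"count": len(records)}

    def search(self, collection, query):
        if collection not in self.collections:
            raise WireError(-33301, "collection not found", {"collection": collection})
        scored = [
            {"id": r["id"], "content": r["content"],
             "score": _cosine(query["embedding"], r["embedding"]),
             "metadata": r.get("metadata", {})}
            for r in self.collections[collection].values()
        ]
        scored.sort(key=lambda m: m["score"], reverse=True)
        return {"matches": scored[: query.get("limit", 10)]}

    def delete(self, collection, ids):
        store = self.collections.get(collection, {})
        removed = sum(1 for i in ids if store.pop(i, None) is not None)
        return {"count": removed}
```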
Memory-specific error codes
In addition to the JSON-RPC standard codes (§7), memory.*
methods MAY return:
| Code | Meaning | data fields |
|---|---|---|
| -33301 | memory.collection_not_found | collection |
| -33302 | memory.dimension_mismatch | expected, got |
| -33303 | memory.rate_limited | retry_after_secs |
| -33304 | memory.write_failed | (message) |
Error example:
{
"jsonrpc": "2.0",
"id": 70,
"error": {
"code": -33302,
"message": "dimension mismatch",
"data": { "expected": 768, "got": 2 }
}
}
The host surfaces these as anyhow::Error with messages
operators can grep ("dimension mismatch: expected 768, got 2",
"rate limited; retry after 60s", etc.).
5.t Tool methods (Phase 81.29)
Plugins declaring [plugin.extends].tools = [...] get a
host-initiated tool.invoke request per agent-loop tool call.
The daemon's LLM picks a tool name from the cached
function-calling spec (built from initialize-reply's tools array, see
§4.1.1), the agent's tool registry routes the call to a
RemoteToolHandler, and the handler serializes the call into
a tool.invoke JSON-RPC frame over the existing stdio bridge.
Default timeout: 60 s. Operator override via
NEXO_PLUGIN_TOOL_TIMEOUT_MS.
tool.invoke
Host → child request:
{
"jsonrpc": "2.0",
"id": 80,
"method": "tool.invoke",
"params": {
"plugin_id": "browser",
"tool_name": "browser_navigate",
"args": { "url": "https://example.com" },
"agent_id": "shopper"
}
}
Child → host reply on success — the result body is whatever
JSON shape the daemon's ToolHandler::call returns to the
agent loop. The conventional shape mirrors the in-tree
ToolResponse:
{
"jsonrpc": "2.0",
"id": 80,
"result": {
"content": [
{ "type": "text", "text": "Navigated to https://example.com" }
],
"is_error": false
}
}
Plugins MAY return any JSON Value — the daemon does not
validate the result shape beyond the JSON-RPC envelope. Tool
authors using the Rust SDK return ToolResponse directly.
Tool-specific error codes
| Code | Variant | Semantic |
|---|---|---|
| -33401 | ToolNotFound | Plugin doesn't actually implement the declared tool name (drift between manifest and implementation) |
| -33402 | ToolArgumentInvalid | Args failed plugin-side schema validation; surface details: <Value> for the offending fields |
| -33403 | ToolExecutionFailed | Tool executed but raised a typed error (network failure, CDP hung, etc.) |
| -33404 | ToolUnavailable | Resource exhausted, rate-limited, or otherwise transient. Optional data: { retry_after_ms: <u64> } |
| -33405 | ToolDenied | Plugin's per-tenant authorization rejected the call (auth-style) |
| -32601 | MethodNotFound | Plugin does not implement tool.invoke: the manifest declared extends.tools but the child doesn't handle the method |
The host surfaces these as anyhow::Error with messages
operators can grep ("tool not found", "argument invalid",
"unavailable; retry after 5s", etc.). The agent loop receives
the error and decides what to do (LLM retry, abort tool plan,
escalate).
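Without an SDK, a child could route `tool.invoke` frames with a plain dictionary dispatch. The following is a sketch under assumptions: `ToolError`, `TOOLS`, and `handle_tool_invoke` are hypothetical names, and the `weather` tool is a placeholder. The codes and the `data.details` field follow the table above.

```python
class ToolError(Exception):
    """Typed failure that maps onto one of the -33401..-33405 codes."""

    def __init__(self, code, message, data=None):
        super().__init__(message)
        self.code, self.message, self.data = code, message, data


def weather(args):
    city = args.get("city")
    if not isinstance(city, str) or not city:
        raise ToolError(-33402, "argument invalid",
                        {"details": {"city": "required string"}})
    return {"content": [{"type": "text", "text": f"Sunny in {city}"}],
            "is_error": False}


TOOLS = {"myplugin_weather": weather}  # tool_name -> handler


def handle_tool_invoke(frame):
    """Route one tool.invoke request and wrap the outcome in a reply frame."""
    params = frame["params"]
    reply = {"jsonrpc": "2.0", "id": frame["id"]}
    tool = TOOLS.get(params["tool_name"])
    try:
        if tool is None:
            raise ToolError(-33401, "tool not found")
        reply["result"] = tool(params.get("args") or {})
    except ToolError as err:
        reply["error"] = {"code": err.code, "message": err.message}
        if err.data is not None:
            reply["error"]["data"] = err.data
    except Exception as exc:
        # Anything uncaught is reported as ToolExecutionFailed.
        reply["error"] = {"code": -33403, "message": f"tool execution failed: {exc}"}
    return reply
```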
6. Topic allowlist
The host derives subscribe + publish patterns from the
manifest's [[plugin.channels.register]] entries.
For each entry with kind = K:
| Direction | Patterns |
|---|---|
| Outbound (daemon → child) | plugin.outbound.K, plugin.outbound.K.> |
| Inbound (child → daemon) | plugin.inbound.K, plugin.inbound.K.> |
Wildcard semantics follow nexo_broker::topic::topic_matches:
- * matches exactly one path segment.
- > matches one or more trailing segments (must have ≥1).
- Plain segments match literally.
So plugin.inbound.slack.> matches plugin.inbound.slack.team_a
and plugin.inbound.slack.team_a.thread_42 but not
plugin.inbound.slack (no trailing segments). That's why both
exact and wildcard patterns are in the allowlist for each kind.
A child publish to a topic that does not match any pattern
in the allowlist is dropped — this is the host's primary defense
against a plugin attempting to hijack core nexo topics like
agent.route.* or command.*.
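The matching rules and the derived allowlist are small enough to model directly. This Python sketch mirrors the semantics described above; it is not the `nexo_broker::topic::topic_matches` source, and `allowlist_for` / `publish_allowed` are hypothetical helper names.

```python
def topic_matches(pattern: str, topic: str) -> bool:
    """`*` matches exactly one segment, `>` one or more trailing segments."""
    p_parts, t_parts = pattern.split("."), topic.split(".")
    for i, part in enumerate(p_parts):
        if part == ">":
            # Must be the last pattern segment and consume at least one topic segment.
            return i == len(p_parts) - 1 and len(t_parts) > i
        if i >= len(t_parts):
            return False
        if part != "*" and part != t_parts[i]:
            return False
    return len(p_parts) == len(t_parts)


def allowlist_for(kind: str, direction: str) -> list:
    """direction is "inbound" (child to daemon) or "outbound" (daemon to child)."""
    base = f"plugin.{direction}.{kind}"
    # Both forms are needed because `>` never matches zero segments.
    return [base, f"{base}.>"]


def publish_allowed(kinds: list, topic: str) -> bool:
    """True when a child publish to `topic` matches some inbound pattern."""
    patterns = [p for kind in kinds for p in allowlist_for(kind, "inbound")]
    return any(topic_matches(p, topic) for p in patterns)
```

With `kinds = ["slack"]`, a publish to `plugin.inbound.slack` passes through the exact pattern, while `agent.route.kate` matches nothing and would be dropped.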
7. Error codes
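JSON-RPC 2.0 reserves -32768..-32000; within that range,
-32000..-32099 is implementation defined and carries nexo's own
errors. The -31xxx range is set aside for nexo extensions (none
used yet). The capability-specific bands (-33001..-33005 channel,
-33101..-33105 LLM, -33301..-33304 memory, -33401..-33405 tool)
are listed in §5.x, §5.y, §5.w, and §5.t.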
| Code | Meaning |
|---|---|
| -32700 | Parse error: line is not valid JSON |
| -32600 | Invalid request: well-formed JSON but not JSON-RPC 2.0 |
| -32601 | Method not found |
| -32602 | Invalid params |
| -32603 | Internal error |
| -32000 | nexo: shutdown handler returned an error |
| -32001..-32099 | Reserved for future nexo error variants |
The host translates each of these into a structured
PluginInitError or PluginShutdownError variant for operator
diagnostics.
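For log triage it can help to bucket a code by band. The sketch below assumes the bands documented in this contract (the standard codes above plus the §5 capability bands); `classify` is a hypothetical helper, not part of the host.

```python
STANDARD = {
    -32700: "parse error",
    -32600: "invalid request",
    -32601: "method not found",
    -32602: "invalid params",
    -32603: "internal error",
}

# (low, high, label), inclusive on both ends.
BANDS = [
    (-33005, -33001, "channel"),
    (-33105, -33101, "llm"),
    (-33304, -33301, "memory"),
    (-33405, -33401, "tool"),
    (-32099, -32000, "nexo server error"),
]


def classify(code: int) -> str:
    """Return a short label for a JSON-RPC error code."""
    if code in STANDARD:
        return STANDARD[code]
    for low, high, label in BANDS:
        if low <= code <= high:
            return label
    return "unknown"
```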
8. Backpressure
The host's stdin writer feeds the child via a bounded mpsc
channel of depth 64. When the channel is full (the child is
processing more slowly than the broker is delivering events to
it), new broker.event notifications are dropped with a
warn-level log rather than blocking the daemon's broker.
This matches the at-most-once delivery semantics the broker itself promises: no plugin should rely on every event arriving. Plugins that need durable delivery should subscribe to a NATS JetStream stream out-of-band, which is outside the scope of this contract.
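The drop-instead-of-block policy can be approximated with a bounded `asyncio.Queue`. This is a Python model of the behavior, not the host's Rust mpsc code; `offer`, `demo`, and the logger name are illustrative.

```python
import asyncio
import logging

log = logging.getLogger("plugin.stdin")  # illustrative logger name
QUEUE_DEPTH = 64  # depth stated in this section


def offer(queue: asyncio.Queue, event: dict) -> bool:
    """Enqueue without blocking; drop and warn when the child is behind."""
    try:
        queue.put_nowait(event)
        return True
    except asyncio.QueueFull:
        log.warning("stdin queue full, dropping event on %s", event.get("topic"))
        return False


async def demo():
    """Offer 70 events to a queue nobody drains; return (accepted, dropped)."""
    queue = asyncio.Queue(maxsize=QUEUE_DEPTH)
    results = [offer(queue, {"topic": "plugin.outbound.slack", "seq": i})
               for i in range(70)]
    return results.count(True), results.count(False)
```

Because `put_nowait` never awaits, a stalled child costs the producer one log line per dropped event rather than a blocked task.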
9. Examples
9.1 Rust
Using the nexo-microapp-sdk crate with the plugin feature
(Phase 81.15.a):
use nexo_microapp_sdk::plugin::{PluginAdapter, BrokerSender};
use nexo_broker::Event;

const MANIFEST: &str = include_str!("../nexo-plugin.toml");

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    PluginAdapter::new(MANIFEST)?
        .on_broker_event(|topic: String, event: Event, broker: BrokerSender| async move {
            // Outbound: deliver to the external service.
            // (Pseudocode; replace with your channel client.)
            let payload = event.payload.clone();
            let text = payload.get("text").and_then(|v| v.as_str()).unwrap_or("");
            send_to_slack(text).await;
            // Inbound: relay any reply back via the broker.
            let reply = Event::new(
                "plugin.inbound.slack",
                "slack",
                serde_json::json!({"echo": text}),
            );
            let _ = broker.publish("plugin.inbound.slack", reply).await;
        })
        .on_shutdown(|| async { Ok(()) })
        .run_stdio()
        .await?;
    Ok(())
}

async fn send_to_slack(_text: &str) {}
9.2 Python — nexoai
pip install nexoai (the nexo-plugin-sdk name was taken on PyPI; the
importable module stays nexo_plugin_sdk). Source:
nexo-plugin-sdks/python/.
import asyncio
from nexo_plugin_sdk import PluginAdapter, Event

MANIFEST = open("nexo-plugin.toml").read()

async def on_event(topic: str, event: Event, broker) -> None:
    # call back into the host (memory.recall §5.2 / llm.complete §5.3):
    entries = await broker.memory_recall(agent_id="my_agent", query="user prefers concise answers", limit=5)
    result = await broker.llm_complete(provider="minimax", model="minimax-m2.5",
        messages=[{"role": "user", "content": "summarize: ..."}])
    await broker.publish("plugin.inbound.slack",
        Event.new("plugin.inbound.slack", "slack", {"summary": result.content}))

async def main() -> None:
    await PluginAdapter(manifest_toml=MANIFEST, on_event=on_event).run()

asyncio.run(main())
9.3 TypeScript / Node — nexo-plugin-sdk
npm install nexo-plugin-sdk. Source:
nexo-plugin-sdks/typescript/.
import { readFileSync } from "node:fs";
import { PluginAdapter, Event } from "nexo-plugin-sdk";
const adapter = new PluginAdapter({
manifestToml: readFileSync("nexo-plugin.toml", "utf-8"),
onEvent: async (topic, event, broker) => {
const entries = await broker.memoryRecall({ agentId: "my_agent", query: "user prefers concise answers", limit: 5 });
const result = await broker.llmComplete({ provider: "minimax", model: "minimax-m2.5",
messages: [{ role: "user", content: "summarize: ..." }] });
await broker.publish("plugin.inbound.slack",
Event.new("plugin.inbound.slack", "slack", { summary: result.content }));
},
});
await adapter.run();
9.4 PHP — nexo/plugin-sdk
composer require nexo/plugin-sdk (PHP ≥ 8.1 — uses Fibers). Source:
nexo-plugin-sdks/php/
(mirrored to nexo-plugin-sdk-php for Packagist).
<?php declare(strict_types=1);
require __DIR__ . '/vendor/autoload.php';
use Nexo\Plugin\Sdk\{PluginAdapter, BrokerSender, Event};
$adapter = new PluginAdapter([
'manifestToml' => file_get_contents(__DIR__ . '/nexo-plugin.toml'),
'onEvent' => function (string $topic, Event $event, BrokerSender $broker): void {
$entries = $broker->memoryRecall(['agentId' => 'my_agent', 'query' => 'user prefers concise answers', 'limit' => 5]);
$r = $broker->llmComplete(['provider' => 'minimax', 'model' => 'minimax-m2.5',
'messages' => [['role' => 'user', 'content' => 'summarize: ...']]]);
$broker->publish('plugin.inbound.slack', Event::new('plugin.inbound.slack', 'slack', ['summary' => $r->content]));
// streaming: $broker->llmCompleteStream($opts, fn(string $chunk) => /* ... */);
},
]);
$adapter->run();
9.5 Tools — host-initiated tool.invoke (Phase 81.29)
A plugin that declares [plugin.extends].tools = ["myplugin_weather"]
in its manifest advertises a tool catalog at handshake (the
initialize reply's tools array, §4.1.1) and handles one
tool.invoke request per agent-loop tool call (§5.t). All four SDKs
expose the same surface: a catalog of tool definitions, one dispatch
handler (optionally with a context giving broker access mid-invocation),
and a typed -33401..-33405 error band.
Rust — crates/microapp-sdk with feature plugin:
use nexo_microapp_sdk::plugin::{PluginAdapter, ToolDef, ToolInvocation, ToolInvocationError};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    PluginAdapter::new(include_str!("../nexo-plugin.toml"))?
        .declare_tools(vec![ToolDef {
            name: "myplugin_weather".into(),
            description: "Current weather for a city".into(),
            input_schema: serde_json::json!({
                "type": "object",
                "properties": { "city": { "type": "string" } },
                "required": ["city"]
            }),
        }])
        .on_tool(|inv: ToolInvocation| async move {
            let city = inv.args.get("city").and_then(|v| v.as_str())
                .ok_or_else(|| ToolInvocationError::ArgumentInvalid("missing `city`".into()))?;
            Ok(serde_json::json!({
                "content": [{ "type": "text", "text": format!("Sunny in {city}") }],
                "is_error": false
            }))
        })
        .run_stdio().await?;
    Ok(())
}
Python — nexoai:
import asyncio
from nexo_plugin_sdk import PluginAdapter, ToolDef, ToolInvocation, ToolArgumentInvalid, text_result

MANIFEST = open("nexo-plugin.toml").read()

async def on_tool(inv: ToolInvocation):
    city = (inv.args or {}).get("city")
    if not city:
        raise ToolArgumentInvalid("missing `city`")
    return text_result(f"Sunny in {city}")

async def main() -> None:
    await PluginAdapter(
        manifest_toml=MANIFEST,
        tools=[ToolDef("myplugin_weather", "Current weather for a city",
            {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]})],
        on_tool=on_tool,  # or on_tool_with_context=fn(inv, ctx) — ctx.broker = the on_event broker handle
    ).run()

asyncio.run(main())
TypeScript — nexo-plugin-sdk:
import { readFileSync } from "node:fs";
import { PluginAdapter, ToolArgumentInvalidError, textResult } from "nexo-plugin-sdk";
const adapter = new PluginAdapter({
manifestToml: readFileSync("nexo-plugin.toml", "utf-8"),
tools: [{ name: "myplugin_weather", description: "Current weather for a city",
inputSchema: { type: "object", properties: { city: { type: "string" } }, required: ["city"] } }],
onTool: (inv) => {
const city = (inv.args as { city?: string } | null)?.city;
if (!city) throw new ToolArgumentInvalidError("missing `city`");
return textResult(`Sunny in ${city}`);
},
// or onToolWithContext: (inv, ctx) => { ... ctx.broker.memoryRecall(...) ... }
});
await adapter.run();
PHP — nexo/plugin-sdk:
<?php declare(strict_types=1);
require __DIR__ . '/vendor/autoload.php';
use Nexo\Plugin\Sdk\{PluginAdapter, Tool, ToolArgumentInvalid, ToolDef, ToolInvocation};
$adapter = new PluginAdapter([
'manifestToml' => file_get_contents(__DIR__ . '/nexo-plugin.toml'),
'tools' => [new ToolDef('myplugin_weather', 'Current weather for a city',
['type' => 'object', 'properties' => ['city' => ['type' => 'string']], 'required' => ['city']])],
'onTool' => function (ToolInvocation $inv) {
$city = $inv->args['city'] ?? null;
if (!$city) { throw new ToolArgumentInvalid('missing `city`'); }
return Tool::text("Sunny in {$city}");
},
// or 'onToolWithContext' => fn(ToolInvocation $inv, ToolContext $ctx) => /* $ctx->broker->memoryRecall(...) */
]);
$adapter->run();
Throwing one of the typed errors maps to the matching -33401..-33405
code: ToolNotFound (-33401), ToolArgumentInvalid (-33402, carries
details), ToolExecutionFailed (-33403 — also the catch-all for an
uncaught generic exception), ToolUnavailable (-33404, carries
retry_after_ms), ToolDenied (-33405). A tool.invoke arriving when
no handler is registered replies -32601. Declaring a tool whose name
is not in the manifest's [plugin.extends].tools is a hard failure at
construction (the daemon would otherwise kill the plugin — see §4.1.1).
10. Versioning + compatibility
This contract uses semver. The current version is 1.0.0.
| Change kind | Semver bump |
|---|---|
| Add a new optional manifest field | minor |
| Add a new optional method (host or child) | minor |
| Add a new optional notification | minor |
| Add a new error code in -32000..-32099 | minor |
| Remove or rename a method / notification / field | major |
| Change the JSON shape of a method's params or result | major |
| Tighten validation (e.g. rejecting previously-allowed input) | major |
Plugins should declare the contract version they target via the
manifest's min_nexo_version field plus a future
contract_version field (Phase 81.16 follow-up). The host
rejects plugins targeting a major version it does not support.
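A compatibility gate following the table above might look like the sketch below. Only the major-version rejection is stated in this contract; treating a plugin that targets a newer minor than the host as incompatible is an assumption added here, and `parse_semver` / `host_accepts` are hypothetical names.

```python
def parse_semver(version: str):
    """Split "MAJOR.MINOR.PATCH" into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def host_accepts(host_version: str, plugin_targets: str) -> bool:
    """Accept when majors match and the host is at least as new as the target."""
    host, target = parse_semver(host_version), parse_semver(plugin_targets)
    if host[0] != target[0]:
        return False  # unsupported major: reject outright
    # Assumption: minor bumps are additive, so an older host may lack
    # methods that a plugin built against a newer minor expects.
    return host[1:] >= target[1:]
```

Comparing integer tuples rather than strings matters here: as text, "1.10.0" sorts before "1.4.0".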
11. Reference implementations
- Host adapter: crates/core/src/agent/nexo_plugin_registry/subprocess.rs (SubprocessNexoPlugin), Phase 81.14 + 81.14.b.
- Rust child SDK: crates/microapp-sdk/src/plugin.rs (PluginAdapter, feature plugin), Phase 81.15.a.
- Python / TypeScript / PHP child SDKs: github.com/lordmacu/nexo-plugin-sdks, with python/ (PyPI nexoai), typescript/ (npm nexo-plugin-sdk), php/ (Packagist nexo/plugin-sdk, via the nexo-plugin-sdk-php mirror). All implement initialize / broker.event / shutdown / broker.publish, the child→host calls memory.recall / llm.complete (+ llm.complete.delta streaming), and the host→child tool.invoke request + the initialize-reply tool catalog (§4.1.1 / §5.t; Python nexoai ≥ 0.4.0, TypeScript / PHP ≥ 0.3.0).
- Go SDK: not yet planned.
11.1 Conformance kit
nexo-plugin-sdks/conformance/
is a cross-language conformance kit: one Python mock-host
(conformance/mock_host.py), a set of declarative scenarios
(conformance/scenarios/*.json — the expect* steps are the golden),
and one config-driven fixture per SDK. An SDK is conformant iff
python conformance/run.py --lang <lang> passes for it — the kit drives
the fixture through every exchange this contract defines (initialize /
shutdown / broker.event / broker.publish / memory.recall /
llm.complete (+ streaming) / tool.invoke + the §4.1.1 catalog) and
diffs the frames structurally (methods, ids, result/error shapes,
error codes, error.data shape, key presence/absence — not message
text). The nexo-plugin-sdks CI runs the {python, typescript, php}
matrix; the Rust SDK runs the same kit in this repo's CI via a shallow
clone (--lang rust --fixture <built-binary> --check-contract-version docs/src/plugins/contract.md — the version check ties §13's top entry
to the kit's SCENARIOS_TARGET, so a contract bump that lands without
updating the kit fails CI). The kit does not replace the per-SDK test
suites, which cover lang-specific robustness (async readers, Fiber
scheduling, the stdout guard, signal handling). Added in Phase 31.12;
the Rust-fixture wiring + reconciling the divergences the kit surfaces
(the Rust child SDK does not yet emit error.data.details /
error.data.retry_after_ms, and its nexo_broker::Event serializes
with extra id / timestamp / session_id fields the scripting SDKs
omit) is follow-up 31.12.b.
12. Out of contract scope
The following are part of the broader plugin platform but are deliberately out of THIS document's scope:
- memory.recall / llm.complete / tool.dispatch RPC bridges (Phase 81.20): let the child invoke daemon-mediated framework services.
- Supervisor + respawn + resource limits (Phase 81.21).
- Sandbox (network + filesystem allowlist via manifest, Phase 81.22).
- Stdio → tracing bridge (Phase 81.23).
- Plugin marketplace + signing (Phase 31).
Each of these will either extend this contract additively (in
which case contract_version bumps minor) or live in a separate
contract document.
13. Changelog
| Version | Date | Changes |
|---|---|---|
1.0.0 | 2026-05-01 | Initial publication. Lifecycle (initialize / shutdown) + broker bridge (broker.event / broker.publish) + manifest [plugin.entrypoint] section. Host adapter shipped in Phase 81.14 + 81.14.b; Rust child SDK in Phase 81.15.a. |
1.1.0 | 2026-05-01 | Phase 81.20.a — memory.recall request-response added. Additive; existing 1.0.0 plugins continue to work unchanged. Manifest [plugin.supervisor] section (Phase 81.21.b) — additive. Host-side activation: Phase 81.17.b boot wire. Phase 81.21 supervisor + 81.21.b stderr tail capture. |
1.2.0 | 2026-05-01 | Phase 81.20.b — llm.complete request-response added. Additive. MVP supports text responses only; tool-call responses surface as -32601 not_implemented. Streaming (llm.complete.delta notifications) on roadmap as 81.20.b.b. Host-side runtime threading deferred to 81.20.b.b — daemon today returns -32603 "llm not configured" until main.rs threads LlmServices into the subprocess pipeline. |
1.3.0 | 2026-05-01 | Phase 81.20.b.b runtime threading shipped (memory + llm both flow end-to-end through production daemon path). Phase 81.20.b.c streaming added — llm.complete accepts stream: true opt-in; chunks emit as llm.complete.delta { request_id, chunk } notifications, final reply omits content. Additive — non-streaming requests unchanged. |
1.4.0 | 2026-05-04 | Phase 81.28 — [plugin.extends] manifest section added (channels / llm_providers / memory_backends / hooks lists). Schema-only this revision: parser + validator ship; daemon dispatch wiring per registry lands in Phase 81.24 (channels), 81.25 (LLM providers), 81.26 (memory backends), 81.27 (hooks). Additive — manifests without [plugin.extends] parse and validate unchanged. |
1.5.0 | 2026-05-04 | Phase 81.24 — channel.start / channel.stop / channel.send_outbound host-initiated request methods added (§5.x). Subprocess plugins declaring [plugin.extends].channels = [...] get one RemoteChannelAdapter per kind registered into the daemon's ChannelAdapterRegistry. Channel-specific error codes -33001 through -33005 map onto typed ChannelAdapterError variants. Default host-side timeouts: 30 s for start/stop, 60 s for send_outbound; NEXO_PLUGIN_CHANNEL_TIMEOUT_MS overrides all three. Additive — plugins not declaring channels are unaffected. |
1.6.0 | 2026-05-04 | Phase 81.25 — llm.chat host-initiated request method (sync + streaming via params.stream flag) + llm.chat.delta streaming notifications added (§5.y). Subprocess plugins declaring [plugin.extends].llm_providers = [...] get one RemoteLlmFactory per provider name registered into the daemon's LlmRegistry. LLM-specific error codes -33101 through -33105. Default timeouts: 60 s sync chat, 300 s streaming; NEXO_PLUGIN_LLM_TIMEOUT_MS overrides both. Additive — plugins not declaring llm_providers are unaffected. |
1.7.0 | 2026-05-04 | Phase 81.27 — hook.on_hook host-initiated request method added (§5.z). Subprocess plugins declaring [plugin.extends].hooks = [...] get one RemoteHookHandler per hook name registered into the daemon's HookRegistry. Reply shape is the existing HookResponse (already serde-derived); reused directly as wire type. Continue-on-error semantic: every dispatch failure (transport, timeout, JSON-RPC err, decode) returns HookResponse::default() so HookRegistry::fire keeps iterating + agent flow doesn't break. Default 5s timeout (lower than channels/LLM); NEXO_PLUGIN_HOOK_TIMEOUT_MS env override. Additive — plugins not declaring hooks are unaffected. |
1.8.0 | 2026-05-04 | Phase 81.26 — memory.vector_upsert / memory.vector_search / memory.vector_delete host-initiated request methods added (§5.w). Subprocess plugins declaring [plugin.extends].memory_backends = [...] get one RemoteVectorBackend per name registered into the daemon's VectorBackendRegistry. Memory-specific error codes -33301..=-33304. Default timeouts: 30s upsert/delete, 10s search; NEXO_PLUGIN_MEMORY_TIMEOUT_MS env override. v1 ships wire + registry only — consumer-side wiring (LongTermMemory.recall_vector reading from registry) lands in 81.26.b. Vector-only scope: short/long-term memory keep SQLite; plugin replaces ONLY the vector index. Additive — plugins not declaring memory_backends are unaffected. |
1.9.0 | 2026-05-04 | Phase 81.22 — [plugin.sandbox] manifest section added (§2.2). Linux-only bubblewrap-based isolation: 5 fields (enabled, network, fs_read_paths, fs_write_paths, drop_user). Hard denylist enforced via SANDBOX_DENYLIST_HOST_PATHS const — operator-supplied allowlists that cover or equal denylisted paths are rejected at validate time. Two operator capability env knobs: NEXO_PLUGIN_SANDBOX_REQUIRE (strict-mode rejection of sandbox-disabled plugins) + NEXO_PLUGIN_SANDBOX_HOST_NET_ALLOW (gate for network = "host"). macOS no-op + warn (native sandbox-exec deferred to 81.22.macos). Default off — every existing manifest parses and runs unchanged. Additive — plugins without [plugin.sandbox] are unaffected. |
1.10.0 | 2026-05-04 | Phase 81.29 — tool.invoke host-initiated request method added (§5.t) + initialize-reply tools array extension (§4.1.1). Subprocess plugins declaring [plugin.extends].tools = [...] advertise a tool catalog (name/description/input_schema) at handshake; daemon caches the schemas + builds typed function-calling defs for the LLM without per-call round-trip. Each agent-loop tool call becomes a single tool.invoke { plugin_id, tool_name, args, agent_id } request. Tool-specific error codes -33401..=-33405 map onto typed failures (ToolNotFound / ToolArgumentInvalid / ToolExecutionFailed / ToolUnavailable / ToolDenied). Default timeout 60 s; NEXO_PLUGIN_TOOL_TIMEOUT_MS env override. Subset check: advertised tools MUST be subset of extends.tools (drift detection). New extends.tools field is the 5th list in [plugin.extends] (joining channels/llm_providers/memory_backends/hooks). Tool name MUST satisfy per-plugin namespace policy from 81.3 (<plugin_id>_* or ext_<plugin_id>_*). Completes the 5-wrapper subprocess fleet (channels 81.24 + LLM 81.25 + hooks 81.27 + memory 81.26 + tools 81.29) — subprocess plugins can now contribute every category of host-side capability. Additive — plugins not declaring extends.tools are unaffected. |
See also
- Authoring overview
- Rust SDK, Python SDK, TypeScript SDK, PHP SDK
- Publishing a plugin, Signing & publishing
Plugin patterns
Common shapes for nexo subprocess plugins. Each pattern is a template you adapt — pick the closest match to what you're building, copy the skeleton, modify.
All patterns work in any of the 4 SDK languages (Rust / Python / TypeScript / PHP). Examples below use the language that's clearest for the pattern.
Pattern 1 · Echo channel
When to use · You're learning the SDK or wiring a brand-new channel and want a smoke-test before adding logic.
The plugin echoes every inbound broker.event back as
broker.publish on a mirrored topic. Useful for verifying the
wire format end-to-end before you write business logic.
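import asyncio
from nexo_plugin_sdk import PluginAdapter, Event

MANIFEST = open("nexo-plugin.toml").read()

async def on_event(topic, event, broker):
    # Mirror the outbound topic onto the inbound side and echo the payload.
    out_topic = topic.replace("plugin.outbound.", "plugin.inbound.")
    await broker.publish(out_topic, Event.new(out_topic, "my_plugin", event.payload))

async def main():
    await PluginAdapter(manifest_toml=MANIFEST, on_event=on_event).run()

asyncio.run(main())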
→ Used in every template (extensions/template-plugin-{rust,python,typescript,php}/)
Pattern 2 · Webhook receiver
When to use · An external service POSTs JSON; you want the daemon to see it as a
plugin.inbound.<kind> event.
Plugin runs an HTTP server (or listens on a Unix socket) for
inbound POST requests. Each request becomes a broker publish.
Plugin's manifest declares an http_server capability so the
daemon's reverse-proxy / port-allocator wires the route.
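use std::sync::Arc;

use axum::{extract::State, routing::post, Json, Router};
use nexo_microapp_sdk::plugin::{BrokerSender, Event, PluginAdapter};
use serde_json::Value;

const MANIFEST: &str = include_str!("../nexo-plugin.toml");

async fn webhook(State(broker): State<Arc<BrokerSender>>, Json(body): Json<Value>) {
    // Each POST body becomes one inbound broker event.
    let event = Event::new("plugin.inbound.webhook", "my_plugin", body);
    let _ = broker.publish("plugin.inbound.webhook", event).await;
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let adapter = PluginAdapter::new(MANIFEST)?;
    let broker = adapter.broker();
    // The bind address is illustrative; use the port the daemon allocates.
    let listener = tokio::net::TcpListener::bind("127.0.0.1:8080").await?;
    tokio::spawn(async move {
        let app = Router::new().route("/webhook", post(webhook)).with_state(broker);
        axum::serve(listener, app).await
    });
    adapter.run().await?;
    Ok(())
}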
Manifest declares the inbound topic the plugin will publish to:
[[plugin.channels.register]]
kind = "webhook"
adapter = "WebhookAdapter"
Pattern 3 · RPC bridge to an external API
When to use · You're exposing a third-party service (Stripe, Twilio, internal CRM) as a tool the agent can call.
Plugin doesn't deal with channels — it registers as a tool
provider. The agent sends a tool.call request; the plugin
forwards to the external API and replies.
import { PluginAdapter, Event } from "nexo-plugin-sdk";

const adapter = new PluginAdapter({
  manifestToml: MANIFEST,
  onEvent: async (topic, event, broker) => {
    if (topic === "plugin.tool.stripe.create_invoice") {
      const inv = await stripeClient.invoices.create(event.payload);
      await broker.publish("plugin.tool.stripe.create_invoice.reply",
        Event.new("plugin.tool.reply", "stripe-bridge", { result: inv }));
    }
  },
});
Manifest contributes the tool:
[[plugin.tools.expose]]
name = "stripe.create_invoice"
schema_path = "./tools/create_invoice.json"
Pattern 4 · Scheduled poller
When to use · You need to poll an external feed every N minutes and publish only changes (deltas) to the broker.
Plugin holds local state (last-seen IDs / etag / cursor),
re-polls on a timer, dedupes against state, publishes new items.
Persist state to <state_dir>/<plugin_id>/state.json so restarts
don't re-emit historical items.
import asyncio, json, aiohttp
from pathlib import Path
from nexo_plugin_sdk import PluginAdapter, Event

STATE = Path(".nexo-state/poller.json")
seen_ids: set[str] = set(json.loads(STATE.read_text())) if STATE.exists() else set()

async def poll_loop(broker):
    while True:
        async with aiohttp.ClientSession() as s:
            items = await (await s.get("https://example.com/feed.json")).json()
        for item in items:
            if item["id"] in seen_ids:
                continue
            seen_ids.add(item["id"])
            await broker.publish("plugin.inbound.feed",
                                 Event.new("plugin.inbound.feed", "feed_poller", item))
        STATE.parent.mkdir(parents=True, exist_ok=True)  # first run: state dir may not exist yet
        STATE.write_text(json.dumps(list(seen_ids)))
        await asyncio.sleep(300)  # 5 min

adapter = PluginAdapter(manifest_toml=MANIFEST)
asyncio.create_task(poll_loop(adapter.broker))
await adapter.run()
→ See Build a poller module for the YAML-only path that doesn't need a plugin at all.
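Because the poller rewrites its state file on every cycle, a crash mid-write could leave a truncated file behind. One way to avoid that is to write to a temporary file and rename it into place. The sketch below assumes the state is a JSON list of IDs; `load_seen` and `save_seen` are hypothetical helper names for illustration, not SDK functions.

```python
import json
import os
import tempfile
from pathlib import Path


def load_seen(path: Path) -> set[str]:
    """Return the persisted ID set, or an empty set on first run."""
    if not path.exists():
        return set()
    return set(json.loads(path.read_text()))


def save_seen(path: Path, seen: set[str]) -> None:
    """Write the ID set to a temp file, then rename it over the old state."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(sorted(seen), f)
        os.replace(tmp, path)  # the rename is atomic on POSIX
    except BaseException:
        os.unlink(tmp)  # do not leave a stray temp file behind on failure
        raise
```

Readers of the state file then see either the previous complete list or the new one, never a partial write.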
Pattern 5 · Long-running connection (websocket / SSE)
When to use · The external service is push-based (Slack RTM, Discord gateway, MQTT broker, custom WebSocket).
Plugin opens the persistent connection at startup. Inbound
messages from the external side become broker.publish events.
On disconnect, the plugin reconnects with exponential backoff.
use futures_util::StreamExt; // brings `split` and `next` into scope
use tokio_tungstenite::connect_async;

let (ws, _) = connect_async("wss://gateway.discord.gg/").await?;
let (write, mut read) = ws.split();
// Auth handshake omitted...
while let Some(msg) = read.next().await {
    let evt = parse_discord(msg?)?;
    broker.publish("plugin.inbound.discord", evt).await?;
}
// On disconnect: reconnect with backoff.
The SDK's signal handling (default-on) lets the daemon shut the plugin down cleanly even mid-connection.
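The reconnect schedule itself is left to the plugin author. A minimal sketch of one common choice, capped exponential backoff with full jitter, is shown below; `backoff_delays` and its defaults are illustrative assumptions, not part of any SDK.

```python
import random


def backoff_delays(base: float = 1.0, cap: float = 60.0, attempts: int = 8, jitter: bool = True):
    """Yield one sleep duration per reconnect attempt, doubling up to `cap`."""
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        # Full jitter: pick uniformly in [0, ceiling] so many plugin
        # instances do not all reconnect at the same moment.
        yield random.uniform(0, ceiling) if jitter else ceiling
```

A reconnect loop would sleep for each yielded delay between attempts and start a fresh generator once a connection succeeds.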
Pattern 6 · Stateful conversation glue
When to use · The external channel sends fragments (audio chunks, typing indicators, partial messages) and you want to assemble them before the agent sees a complete event.
Plugin maintains a per-conversation buffer; only emits a
broker.publish when the message is "complete" (final chunk,
silence timeout, or explicit done marker).
import asyncio

buffer: dict[str, list[str]] = {}
timers: dict[str, asyncio.Task] = {}

async def on_chunk(conv_id, fragment, broker):
    buffer.setdefault(conv_id, []).append(fragment)
    if conv_id in timers:
        timers[conv_id].cancel()  # a new fragment restarts the silence timeout
    timers[conv_id] = asyncio.create_task(flush_after(conv_id, broker, delay=2.0))

async def flush_after(conv_id, broker, delay):
    await asyncio.sleep(delay)
    timers.pop(conv_id, None)  # drop the finished timer so the dict does not grow
    text = "".join(buffer.pop(conv_id, []))
    await broker.publish("plugin.inbound.assembled",
                         Event.new("plugin.inbound.assembled", "voice_glue", {"text": text}))
Pattern 7 · Outbound-only adapter
When to use · The plugin only sends (Twilio SMS sender, push notification dispatcher, Slack outbound webhook).
Plugin subscribes to plugin.outbound.<kind> events from the
daemon, calls the external API, and publishes a delivery_status
event back so the agent knows whether it landed.
const adapter = new PluginAdapter({
  manifestToml: MANIFEST,
  onEvent: async (topic, event, broker) => {
    if (topic.startsWith("plugin.outbound.sms")) {
      const result = await twilio.messages.create({
        to: event.payload.to,
        body: event.payload.body,
        from: TWILIO_FROM,
      });
      await broker.publish("plugin.delivery.sms",
        Event.new("plugin.delivery.sms", "twilio-out",
          { sid: result.sid, status: result.status }));
    }
  },
});
Pattern 8 · Provider abstraction (multi-instance)
When to use · Operator wants 3 different Telegram bots, each isolated. Or 5 WhatsApp accounts.
Plugin manifest declares instance support. Operator's config
spawns N copies, each with a distinct instance label. Topics
become plugin.inbound.<kind>.<instance> so agent bindings can
target a specific one.
# operator's pollers.yaml
plugins:
  telegram:
    - instance: support-bot
      bot_token_env: TG_SUPPORT_TOKEN
    - instance: sales-bot
      bot_token_env: TG_SALES_TOKEN
The plugin reads instance from args or env at startup and
publishes to plugin.inbound.telegram.<instance>.
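The topic derivation can be sketched as below. The environment variable name `PLUGIN_INSTANCE` and the label character rule are assumptions made for this example (the text above only says the instance comes from args or env), and `inbound_topic` is a hypothetical helper.

```python
import os
import re

# Hypothetical variable name, chosen for this sketch only.
INSTANCE_ENV = "PLUGIN_INSTANCE"
# Assumed rule: no dots, since dots already separate topic segments.
_LABEL = re.compile(r"[A-Za-z0-9_-]+")


def inbound_topic(kind, instance=None, env=os.environ):
    """Build plugin.inbound.<kind>[.<instance>] from an explicit arg or the environment."""
    label = instance or env.get(INSTANCE_ENV)
    if not label:
        return f"plugin.inbound.{kind}"
    if not _LABEL.fullmatch(label):
        raise ValueError(f"instance label {label!r} would break topic segmentation")
    return f"plugin.inbound.{kind}.{label}"
```

With the config above, the two processes would publish to plugin.inbound.telegram.support-bot and plugin.inbound.telegram.sales-bot respectively.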
Choosing between patterns
| If you... | Use |
|---|---|
| Are wiring a brand-new channel for the first time | Echo (pattern 1) |
| Need to receive HTTP from an external service | Webhook receiver (2) |
| Are exposing an external API as a tool | RPC bridge (3) |
| Need to poll something on a timer | Scheduled poller (4) |
| Have a push-based external service | Long-running connection (5) |
| Receive fragmented inputs (chunks, partials) | Stateful glue (6) |
| Only need to send (no receive) | Outbound-only (7) |
| Want N copies of the same plugin | Provider abstraction (8) |
See also
- Plugin contract — the wire format every plugin implements.
- Publishing a plugin — CI workflow + asset convention.
- Python SDK, TypeScript SDK, PHP SDK — language-specific guides.
Rust plugin SDK
Phase 31.9. Author plugins in Rust that the daemon spawns as subprocesses, talking the same JSON-RPC 2.0 wire format used by the Python / TypeScript / PHP SDKs.
The SDK lives in
crates/microapp-sdk/
behind the plugin Cargo feature; the reference plugin template
is at
extensions/template-plugin-rust/.
Use nexo plugin new <id> --lang rust to scaffold a fresh
out-of-tree project from that template.
Read this when
- You picked Rust from the language picker in Plugin authoring overview and want the SDK reference.
- You are porting an in-tree plugin (crates/plugins/<id>) into an out-of-tree subprocess and need the wire-API mapping.
- You want the canonical Rust handler signature for broker.event notifications.
Why subprocess + Rust
Running Rust plugins as separate processes — instead of crates linked into the daemon — gives you:
- Isolation — a panic in your plugin terminates one process, not the daemon.
- One contract, every language — the daemon treats your binary the same way it treats Python or TypeScript plugins. Switching languages later is an SDK choice, not a daemon recompile.
- No link-time coupling — your plugin can use any Rust toolchain or tokio version that compiles; the daemon does not care.
- Single static binary — cargo build --release produces one file the publish workflow uploads as the per-target tarball.
Daemon-side spawn code in
crates/core/src/agent/nexo_plugin_registry/subprocess.rs
treats the plugin as an opaque executable; Rust plugins re-use
that path without modification.
Architecture
Operator host Plugin process
┌──────────────────┐ stdin ┌─────────────────────────────┐
│ daemon (Rust) │──JSON-RPC──▶│ target/release/<id> │
│ subprocess host │ │ tokio::main async runtime │
│ │◀──JSON-RPC──│ PluginAdapter.run_stdio() │
└──────────────────┘ stdout └─────────────────────────────┘
The daemon writes newline-delimited JSON-RPC requests to your
binary's stdin; you write replies + outbound broker.publish
notifications back on stdout. stderr is collected by the
operator's tracing pipeline (Phase 81.23 fold pending) — use it
freely for plugin-side logs.
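Newline-delimited framing means each JSON-RPC message occupies exactly one line. The sketch below shows the framing idea only, in Python for brevity; `encode_frame` and `decode_frames` are hypothetical names, and the SDK's real reader and writer may differ.

```python
import json


def encode_frame(message: dict) -> bytes:
    """Serialize one JSON-RPC message as a single newline-terminated line."""
    # json.dumps escapes newlines inside strings, so a frame never spans two lines.
    return json.dumps(message, separators=(",", ":")).encode("utf-8") + b"\n"


def decode_frames(buffer: bytes) -> tuple[list[dict], bytes]:
    """Split a byte buffer into complete frames plus the unterminated remainder."""
    *lines, rest = buffer.split(b"\n")
    return [json.loads(line) for line in lines if line.strip()], rest
```

A reader keeps the returned remainder and prepends it to the next chunk read from the pipe, so a frame split across two reads is still parsed whole.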
Public API
use nexo_broker::Event;
use nexo_microapp_sdk::plugin::{BrokerSender, PluginAdapter};
PluginAdapter builder methods:
| Method | Required | Description |
|---|---|---|
| PluginAdapter::new(manifest_toml: &str) | ✅ | Body of nexo-plugin.toml. Read once at startup; the SDK validates plugin.id + plugin.version and surfaces ManifestError on parse failure. |
| .on_broker_event(handler) | ⬜ | async fn(topic: String, event: Event, broker: BrokerSender). Invoked for every broker.event notification. Each handler call is spawned on the runtime; the dispatch loop continues reading stdin without blocking. |
| .on_shutdown(handler) | ⬜ | async fn() -> Result<(), Box<dyn Error + Send + Sync>>. Awaited before the SDK replies {ok: true} to the host's shutdown request. In-flight on_broker_event tasks are awaited too. |
| .run_stdio().await | ✅ | Single-shot — calling it twice returns PluginError::AlreadyRunning. Drives the JSON-RPC loop until stdin closes or the host sends shutdown. |
Event (re-exported from nexo-broker) carries topic,
source, payload: serde_json::Value, and optional correlation_id
+ metadata. Construct with Event::new(topic, source, payload), which stamps a fresh UUID + RFC3339 timestamp.
BrokerSender::publish(topic: &str, event: Event) -> Result<(), WireError> serializes a broker.publish notification to stdout
under an internal write lock. The daemon's bridge re-checks the
topic against the manifest's [[plugin.channels.register]]
allowlist before forwarding to the broker.
Manifest example
[plugin]
id = "my_plugin"
version = "0.1.0"
name = "My Plugin"
description = "Forwards inbound events to a third-party API."
min_nexo_version = ">=0.1.0"
[plugin.requires]
nexo_capabilities = ["broker"]
[[plugin.channels.register]]
kind = "my_plugin_inbound"
description = "Inbound events the plugin emits onto the broker."
plugin.id MUST match ^[a-z][a-z0-9_]{0,31}$. Cargo's
[[bin]] name MUST equal plugin.id so the publish workflow's
pack-tarball.sh finds the artifact at
target/<target>/release/<id>.
See Plugin contract for the full manifest schema and the JSON-RPC envelope every method exchanges.
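The id rule is easy to check before tagging a release. A small sketch, in Python for brevity, that applies the same regex quoted above; `is_valid_plugin_id` is a hypothetical helper name.

```python
import re

# Regex copied from the manifest rule above: a lowercase letter,
# then up to 31 lowercase letters, digits, or underscores.
PLUGIN_ID_RE = re.compile(r"^[a-z][a-z0-9_]{0,31}$")


def is_valid_plugin_id(plugin_id: str) -> bool:
    """Return True when the id satisfies the manifest slug rule."""
    # fullmatch also rejects a trailing newline, which `$` alone would tolerate.
    return PLUGIN_ID_RE.fullmatch(plugin_id) is not None
```

Under this rule my_plugin passes, while my-plugin (hyphen), My_plugin (uppercase), and any id longer than 32 characters are rejected.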
Quickstart
Scaffold + build + run, copy-paste:
nexo plugin new my_plugin --lang rust --owner alice
cd my_plugin
cargo build
nexo plugin run .
nexo plugin run boots the daemon with your plugin injected at
the head of plugins.discovery.search_paths, bypassing the
install pipeline. See Local dev loop for the
inner-loop conventions and --no-daemon-config.
The handler in the scaffolded src/main.rs echoes every
inbound event back on plugin.inbound.<id>_echo:
use nexo_broker::Event;
use nexo_microapp_sdk::plugin::{print_manifest_if_requested, BrokerSender, PluginAdapter};

const MANIFEST: &str = include_str!("../nexo-plugin.toml");

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // First line — honour the daemon's plugin auto-discovery probe.
    // When invoked with `--print-manifest`, dump the embedded TOML
    // to stdout and exit 0 before constructing any runtime state.
    print_manifest_if_requested(MANIFEST);

    tracing_subscriber::fmt()
        .with_writer(std::io::stderr)
        .init();

    PluginAdapter::new(MANIFEST)?
        .on_broker_event(handle_event)
        .on_shutdown(|| async {
            tracing::info!("plugin shutdown handler invoked");
            Ok(())
        })
        .run_stdio()
        .await?;
    Ok(())
}

async fn handle_event(topic: String, event: Event, broker: BrokerSender) {
    let echo = Event::new(
        "plugin.inbound.my_plugin_echo",
        "my_plugin",
        serde_json::json!({
            "echoed_from": topic,
            "echoed_payload": event.payload,
        }),
    );
    let _ = broker
        .publish("plugin.inbound.my_plugin_echo", echo)
        .await;
}
Replace the body of handle_event with your channel's real
outbound logic (forward to a third-party API, persist to disk,
trigger a downstream agent, etc.) and re-publish the API's
reply back through broker so agents can observe it.
Smoke test
Hand-run the binary against a synthetic JSON-RPC frame to confirm the handshake is well-formed:
echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}' \
| ./target/debug/my_plugin
The plugin should print one JSON-RPC response containing your
manifest's id, version, name, and the SDK's
server_version. If you see anything other than a single line
of valid JSON on stdout, check that you have not added stray
println!s in the handler — every byte on stdout must be a
JSON-RPC frame. Use eprintln! / tracing::* for logs.
Auto-discovery probe
The daemon's discovery walker checks each candidate binary with
--print-manifest (Phase 81.33 Stage 8). Verify your plugin
answers it correctly:
./target/debug/my_plugin --print-manifest
The expected output is the verbatim contents of
nexo-plugin.toml followed by exit 0. The
print_manifest_if_requested(MANIFEST) call in main() handles
this for you — if the smoke test prints anything else (logs,
empty stdout, a JSON-RPC frame) the helper is missing from your
entry point.
Per-target tarball convention
Operators install Rust plugins via the same nexo plugin install <owner>/<repo>[@<tag>] CLI. The resolver expects
per-target tarballs:
<id>-<version>-<target>.tar.gz
├── nexo-plugin.toml
└── bin/<id> # static binary, mode 0755
Targets follow Rust's standard target triples
(x86_64-unknown-linux-gnu, aarch64-apple-darwin,
x86_64-unknown-linux-musl, etc.). The shipped CI workflow
in
extensions/template-plugin-rust/.github/workflows/release.yml
covers Linux musl + macOS by default; add additional matrix
entries to support more.
CI publish workflow
The shipped workflow has 4 jobs: validate-tag →
build (matrix) → optional sign (cosign keyless,
gated by repo variable COSIGN_ENABLED) → release (uploads
all tarballs + sha256 sidecars + signing material + a copy of
nexo-plugin.toml). See
Publishing a plugin for the full asset
naming convention and
Signing & publishing for the
end-to-end signed-release tutorial.
Local validation
Before pushing a tag, dry-run the pack step:
cargo build --release --target x86_64-unknown-linux-gnu
bash scripts/pack-tarball.sh x86_64-unknown-linux-gnu
ls dist/
# my_plugin-0.1.0-x86_64-unknown-linux-gnu.tar.gz
# my_plugin-0.1.0-x86_64-unknown-linux-gnu.tar.gz.sha256
The Rust integration test
extensions/template-plugin-rust/tests/pack_tarball.rs
covers this end-to-end against a synthetic binary; copy it when
you fork the template to keep the convention regression-tested.
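To double-check a packed tarball against its sidecar by hand, something like the following works. It assumes the sidecar follows the common sha256sum layout (hex digest first, then the file name); that layout is an assumption here, so check what pack-tarball.sh actually writes. `sha256_of` and `sidecar_matches` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large tarballs are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sidecar_matches(tarball: Path) -> bool:
    """Compare the tarball's digest with the first token of <tarball>.sha256."""
    sidecar = tarball.with_name(tarball.name + ".sha256")
    expected = sidecar.read_text().split()[0].lower()  # assumed sha256sum-style layout
    return sha256_of(tarball) == expected
```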
SDK tests
cargo test -p nexo-microapp-sdk --features plugin
Covers handshake, manifest validation, dispatch (including non-blocking reader proof), shutdown lifecycle, unknown-method handling, oversized-frame rejection.
See also
- Plugin authoring overview — start here if you have not picked a language yet.
- Plugin contract — full wire spec every SDK implements.
- Patterns (8 common shapes) — channel / poller / hybrid plugin shapes.
- Publishing a plugin — CI workflow shape and asset naming convention.
- Signing & publishing — cosign keyless tutorial.
- Plugin trust (trusted_keys.toml) — operator-side verification policy.
Python plugin SDK
Author plugins in Python that the daemon spawns as subprocesses,
talking the same JSON-RPC 2.0 wire format used by the Rust SDK in
crates/microapp-sdk/.
The robustness defaults (stdout guard, frame cap, signal handling)
match the TypeScript and PHP
SDKs (sub-phase 31.4.c).
Reference template:
extensions/template-plugin-python/
(or run nexo plugin new --lang python). The SDK package lives in the
nexo-plugin-sdks repo
(python/ subdir) and ships on PyPI as
nexoai — pip install nexoai
(the nexo-plugin-sdk name was taken; the importable module is still
nexo_plugin_sdk).
Why subprocess + Python instead of an embedded interpreter
Running Python plugins as separate processes:
- Keeps the daemon language-agnostic; one wire contract, many SDK languages.
- Isolates plugin failures (a runaway Python plugin cannot panic the daemon).
- Sidesteps GIL coordination + PyO3 link-time complexity.
Daemon-side spawn code in
crates/core/src/agent/nexo_plugin_registry/subprocess.rs
treats the plugin as an opaque executable; Python plugins
re-use it without modification.
Architecture summary
Operator host Plugin process
┌──────────────────┐ stdin ┌──────────────────────────┐
│ daemon (Rust) │──JSON-RPC──▶│ bin/<id> (bash launcher) │
│ subprocess host │ │ exec python3 main.py │
│ │◀──JSON-RPC──│ PluginAdapter.run() │
└──────────────────┘ stdout └──────────────────────────┘
The bash launcher in bin/<id> sets PYTHONPATH=lib/ and
exec's the vendored Python runtime so the plugin's deps come
from lib/ only — no site-packages interference.
Public API
from nexo_plugin_sdk import (
PluginAdapter,
BrokerSender,
Event, EventHandler, ShutdownHandler,
PluginError, ManifestError, WireError,
read_manifest,
install_stdout_guard, uninstall_stdout_guard, is_stdout_guard_installed,
STDOUT_GUARD_MARKER,
MAX_FRAME_BYTES, JSONRPC_VERSION,
serialize_frame, build_response, build_error_response, build_notification,
)
PluginAdapter constructor (all keyword-only):
| Parameter | Default | Description |
|---|---|---|
| manifest_toml: str | required | Body of nexo-plugin.toml. Parsed + validated once at construction; the SDK checks plugin.id (incl. the ^[a-z][a-z0-9_]{0,31}$ slug regex the host enforces) and plugin.version. A failed construction leaves no stdout guard installed. |
| server_version: str | "0.1.0" | Returned in the initialize reply alongside the manifest. |
| on_event | None | async (topic, Event, BrokerSender) -> None. Invoked for every broker.event notification. Handler runs in a detached task; the dispatch loop continues reading stdin without blocking. |
| on_shutdown | None | async () -> None. Awaited before the SDK replies {ok: true} to the host's shutdown request. In-flight on_event (and tool.invoke) tasks are also awaited before returning. |
| tools | None | list[ToolDef] — the tool catalog advertised in the initialize reply's tools array (contract §4.1.1). Also settable post-construction via .declare_tools([...]). Every name must appear in the manifest's [plugin.extends].tools — a name that doesn't raises ManifestError at construction (mirrors the host's hard-failure). Omit → no tools array in the reply. |
| on_tool | None | (ToolInvocation) -> Any, sync or async. Dispatch handler for tool.invoke (contract §5.t). Runs on a detached task tracked by the shutdown drain. Mutually exclusive with on_tool_with_context. |
| on_tool_with_context | None | (ToolInvocation, ToolContext) -> Any, sync or async. Like on_tool, but ctx.broker is the same BrokerSender on_event gets — a tool body can memory_recall / llm_complete mid-invocation. Wins over on_tool when both are set. |
| enable_stdout_guard: bool | True | Replace sys.stdout with a line-buffering proxy that diverts non-JSON lines (a stray print) to stderr tagged [stdout-guard]. Blessed replies / broker.publish frames write through the captured original stdout, bypassing the guard. |
| max_frame_bytes: int | MAX_FRAME_BYTES (1 MiB) | Inbound JSON-RPC frames larger than this are rejected with a WireError log; dispatch continues. |
| handle_process_signals: bool | True | SIGTERM / SIGINT → graceful shutdown: drain in-flight handlers, then exit 0. loop.add_signal_handler is the primary path, falling back to signal.signal where unavailable (Windows ProactorEventLoop / non-main-thread). |
Calling run() twice raises PluginError. The stdin reader is fully
async (loop.connect_read_pipe + asyncio.StreamReader) — no
threadpool worker.
Event is a dataclass with topic, source, payload, optional
correlation_id + metadata. BrokerSender.publish(topic, event)
serializes a JSON-RPC notification to the captured original stdout
under an asyncio write lock.
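The write lock matters because several handler tasks can publish at once. The sketch below is not the SDK's code: `LineWriter` is a hypothetical class that deliberately yields to the event loop in the middle of a write, to show that an asyncio.Lock keeps each frame on its own line.

```python
import asyncio
import io
import json


class LineWriter:
    """Serializes whole-line writes to a shared stream behind an asyncio.Lock."""

    def __init__(self, stream):
        self._stream = stream
        self._lock = asyncio.Lock()

    async def write_frame(self, message: dict) -> None:
        line = json.dumps(message) + "\n"
        async with self._lock:
            # Split the write and yield between the halves: without the lock,
            # another task could interleave its own bytes at this point.
            half = len(line) // 2
            self._stream.write(line[:half])
            await asyncio.sleep(0)
            self._stream.write(line[half:])
            self._stream.flush()


async def demo() -> str:
    """Run twenty concurrent writers against one in-memory stream."""
    out = io.StringIO()
    writer = LineWriter(out)
    await asyncio.gather(*(writer.write_frame({"n": i}) for i in range(20)))
    return out.getvalue()
```

Running asyncio.run(demo()) yields twenty lines, each of which parses as JSON on its own.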
Stdout guard limitation
The guard only intercepts the text-stream API (print,
sys.stdout.write). A C extension or subprocess that writes to file
descriptor 1 directly bypasses it. Plugin authors who need stdout
output should use print() / sys.stdout.write().
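The text-stream interception described above can be pictured with a simplified proxy. This is a sketch of the idea only, not the SDK's implementation; `GuardedStdout` is a hypothetical class, and the marker string is the one quoted in the constructor table.

```python
import io
import json

MARKER = "[stdout-guard]"


class GuardedStdout(io.TextIOBase):
    """Line-buffering proxy: JSON lines pass through, anything else goes to `err`."""

    def __init__(self, real_out, err):
        super().__init__()
        self._real_out = real_out
        self._err = err
        self._pending = ""

    def writable(self) -> bool:
        return True

    def write(self, text: str) -> int:
        self._pending += text
        # Only complete lines are routed; a partial line waits for its newline.
        while "\n" in self._pending:
            line, self._pending = self._pending.split("\n", 1)
            self._route(line)
        return len(text)

    def _route(self, line: str) -> None:
        try:
            json.loads(line)
        except ValueError:
            self._err.write(f"{MARKER} {line}\n")  # stray print: divert and tag
        else:
            self._real_out.write(line + "\n")  # valid JSON: forward unchanged
```

Because the proxy only sees calls made through the Python stream object, a write issued directly against file descriptor 1 never reaches `write()`, which is the limitation this section describes.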
Tool dispatch (tool.invoke, contract §4.1.1 + §5.t)
A plugin that declares [plugin.extends].tools = ["myplugin_weather"]
advertises a catalog of ToolDef(name, description, input_schema) and
handles one tool.invoke request per agent-loop tool call:
from nexo_plugin_sdk import (
PluginAdapter, ToolDef, ToolInvocation, ToolContext,
ToolNotFound, ToolArgumentInvalid, ToolExecutionFailed, ToolUnavailable, ToolDenied,
text_result,
)
async def on_tool(inv: ToolInvocation, ctx: ToolContext):
    if inv.tool_name != "myplugin_weather":
        raise ToolNotFound(inv.tool_name)
    city = (inv.args or {}).get("city")
    if not city:
        raise ToolArgumentInvalid("missing `city`", details={"field": "city"})
    # ctx.broker is the on_event broker handle — host calls work mid-invocation:
    # _ = await ctx.broker.memory_recall(agent_id=inv.agent_id or "", query=city)
    return text_result(f"Sunny in {city}")  # any JSON value is fine; this is the conventional shape

await PluginAdapter(
    manifest_toml=MANIFEST,
    tools=[ToolDef("myplugin_weather", "Current weather for a city",
                   {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]})],
    on_tool_with_context=on_tool,  # or on_tool=fn(inv) when you don't need the broker
).run()
The handler's return value becomes the JSON-RPC result verbatim
(non-JSON-serializable → -33403). Raising one of the ToolInvocationError
subclasses maps to the matching -33401..-33405 code (with
error.data.details / error.data.retry_after_ms when set); an uncaught
generic exception maps to -33403; a tool.invoke with no handler
registered replies -32601. (PyPI nexoai ≥ 0.4.0.)
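The fallback rules in that paragraph can be sketched as a plain function. Only the two codes stated explicitly above are used; the typed ToolInvocationError subclasses are left out because their individual codes are not listed here. `reply_for` is a hypothetical, synchronous stand-in for the SDK's dispatch, not its real API.

```python
import json

NO_HANDLER = -32601        # stated above: tool.invoke with no handler registered
GENERIC_FAILURE = -33403   # stated above: uncaught exception or non-serializable result


def _error(request_id, code, message):
    """Build a JSON-RPC 2.0 error reply."""
    return {"jsonrpc": "2.0", "id": request_id, "error": {"code": code, "message": message}}


def reply_for(request_id, handler, invocation):
    """Build the reply dict for one tool.invoke request."""
    if handler is None:
        return _error(request_id, NO_HANDLER, "no tool handler registered")
    try:
        result = handler(invocation)
        json.dumps(result)  # a result that cannot be serialized counts as a failure
    except Exception as exc:
        return _error(request_id, GENERIC_FAILURE, str(exc))
    return {"jsonrpc": "2.0", "id": request_id, "result": result}
```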
Tarball convention (noarch)
Operators install Python plugins via the same
nexo plugin install <owner>/<repo>[@<tag>] CLI. The resolver
in nexo-ext-installer falls back to noarch when no
per-target tarball matches the daemon's host triple:
<id>-<version>-noarch.tar.gz
├── nexo-plugin.toml
├── bin/<id> # bash launcher, mode 0755
└── lib/
    ├── plugin/main.py
    └── nexo_plugin_sdk/
        └── ...
The launcher (~5 LOC) reads:
#!/usr/bin/env bash
DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
exec env PYTHONPATH="$DIR/lib" python3 "$DIR/lib/plugin/main.py" "$@"
Pure-Python deps constraint
noarch requires that vendored deps work on every operator's
CPU. Native extensions (*.so, *.pyd, *.dylib) invalidate
the claim. The publish workflow's audit step runs
scripts/verify-pure-python.sh post-vendor and rejects any
tree containing those suffixes.
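The same check is easy to run locally before pushing a tag. The sketch below mirrors the suffix rule described above in Python; it is not the contents of verify-pure-python.sh, and `find_native_files` is a hypothetical helper name.

```python
from pathlib import Path

# Suffixes named in the constraint above.
NATIVE_SUFFIXES = {".so", ".pyd", ".dylib"}


def find_native_files(root: Path) -> list[Path]:
    """Return every file under `root` whose suffix marks a native extension."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix in NATIVE_SUFFIXES
    )
```

An empty result for the vendored lib/ tree means the noarch claim holds as far as file suffixes can tell.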
If your plugin needs a native dep, per-target Python tarballs
(<id>-<version>-py312-x86_64-linux.tar.gz etc.) are tracked
as Phase 31.4.b and not yet shipped.
CI publish workflow
The shipped workflow in
extensions/template-plugin-python/.github/workflows/release.yml
has the same 4-job shape as the Rust template (see
Publishing a plugin) but:
- Build matrix has a single noarch entry.
- Build step uses actions/setup-python@v5 + pip install --target lib/ instead of cargo zigbuild.
- Vendor audit step calls scripts/verify-pure-python.sh to enforce the pure-Python constraint.
Sign + release jobs are identical to the Rust template; cosign
keyless OIDC ships .sig + .pem + .bundle per asset when
the COSIGN_ENABLED repo variable is "true".
Operator install flow (no changes for Python)
nexo plugin install your-handle/your-plugin@v0.2.0
Identical pipeline to the Rust install path:
- Resolve release JSON.
- Try <id>-0.2.0-<host-triple>.tar.gz (miss for noarch plugins).
- Fall back to <id>-0.2.0-noarch.tar.gz (Phase 31.4 addition).
- Verify sha256.
- Cosign verify per trusted_keys.toml (Phase 31.3).
- Extract under <dest_root>/<id>-0.2.0/.
- Daemon picks it up at next boot or hot-reload; spawns bin/<id> which exec's python3 lib/plugin/main.py.
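The asset selection in the install steps (host triple first, then noarch) can be sketched as follows. `pick_asset` is a hypothetical function that illustrates the order only; it is not the resolver in nexo-ext-installer.

```python
def pick_asset(asset_names, plugin_id, version, host_triple):
    """Return the first matching tarball name, preferring the host-specific build."""
    for target in (host_triple, "noarch"):
        candidate = f"{plugin_id}-{version}-{target}.tar.gz"
        if candidate in asset_names:
            return candidate
    return None  # neither a per-target nor a noarch tarball was published
```

For a Python plugin that ships only a noarch tarball, the first candidate misses and the second is returned, which matches the "miss, then fall back" steps in the list.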
Local smoke test
echo '{"jsonrpc":"2.0","id":1,"method":"initialize"}' \
| python3 src/main.py
Should print one JSON-RPC response with your manifest +
server_version.
End-to-end test for the pack pipeline:
python3 -m unittest extensions/template-plugin-python/tests/test_pack_tarball.py -v
SDK tests
In a clone of nexo-plugin-sdks:
cd python
PYTHONPATH=. python3 -m unittest discover -v tests/
21 tests: handshake (incl. unknown-method -32601), manifest
validation (missing id, invalid TOML, id-regex violation), dispatch
(incl. non-blocking reader proof + oversized frame rejected with
continued dispatch), stdout guard (idempotent install, divert vs
passthrough, handler-print diverted while the blessed frame stays
clean), broker.publish back channel, lifecycle (double run()
rejected, SIGTERM exits 0, SIGTERM drains an in-flight handler).
See also
- Publishing a plugin (CI workflow) — Rust counterpart of the publish workflow this template is modeled after.
- Plugin trust (trusted_keys.toml) — operator-side cosign verification policy that applies to Python plugins too.
- Plugin contract — wire format both SDKs implement.
TypeScript plugin SDK
Author plugins in TypeScript (or plain JavaScript) that the daemon
spawns as subprocesses, talking the same JSON-RPC 2.0 wire format used
by the Rust SDK in
crates/microapp-sdk/
and the Python / PHP SDKs.
Reference template:
extensions/template-plugin-typescript/
(or run nexo plugin new --lang typescript). The SDK package lives in
the nexo-plugin-sdks
repo (typescript/ subdir) and ships on npm as
nexo-plugin-sdk —
npm install nexo-plugin-sdk.
Why subprocess + Node instead of an embedded runtime
Running TypeScript plugins as separate Node processes:
- Keeps the daemon language-agnostic; one wire contract, four shipped SDK languages (Rust, Python, TypeScript, PHP).
- Isolates plugin failures (a runaway plugin cannot crash the daemon).
- Sidesteps V8 embedding complexity.
Daemon-side spawn code in
crates/core/src/agent/nexo_plugin_registry/subprocess.rs
treats the plugin as an opaque executable; TypeScript plugins
re-use it without modification.
Architecture summary
Operator host Plugin process
┌──────────────────┐ stdin ┌──────────────────────────┐
│ daemon (Rust) │──JSON-RPC──▶│ bin/<id> (bash launcher) │
│ subprocess host │ │ exec node main.js │
│ │◀──JSON-RPC──│ PluginAdapter.run() │
└──────────────────┘ stdout └──────────────────────────┘
The bash launcher in bin/<id> sets
NODE_PATH=lib/node_modules and exec's the vendored Node
runtime so the plugin's deps come from lib/ only — no global
node_modules interference.
Public API
import {
PluginAdapter,
BrokerSender,
Event,
PluginError, ManifestError, WireError,
installStdoutGuard, parseManifest,
STDOUT_GUARD_MARKER,
} from "nexo-plugin-sdk";
PluginAdapter constructor options:
| Option | Required | Description |
|---|---|---|
| manifestToml: string | ✅ | Body of nexo-plugin.toml. Read once at startup; the SDK validates plugin.id (regex /^[a-z][a-z0-9_]{0,31}$/), plugin.version, plugin.name, plugin.description. |
| serverVersion?: string | ⬜ | Returned in the initialize reply. Default "0.1.0". |
| onEvent?: EventHandler | ⬜ | async (topic, Event, BrokerSender) => Promise<void>. Invoked for every broker.event notification. Handler runs in a detached task; the dispatch loop continues reading stdin without blocking. |
| onShutdown?: ShutdownHandler | ⬜ | async () => Promise<void>. Awaited before {ok: true} reply to the host's shutdown request. In-flight onEvent (and tool.invoke) tasks are also awaited before returning. |
| tools?: ToolDef[] | ⬜ | { name, description, inputSchema }[] — the tool catalog advertised in the initialize reply's tools array (contract §4.1.1; serialized with the wire key input_schema). Every name must appear in the manifest's [plugin.extends].tools — otherwise the constructor throws ManifestError. |
| onTool?: (inv) => unknown \| Promise<unknown> | ⬜ | Dispatch handler for tool.invoke (contract §5.t). Runs as a detached task tracked by the shutdown drain. Mutually exclusive with onToolWithContext. |
| onToolWithContext?: (inv, ctx) => unknown \| Promise<unknown> | ⬜ | Like onTool, but ctx.broker is the same BrokerSender onEvent gets — a tool body can memoryRecall / llmComplete mid-invocation. Wins over onTool when both are set. |
| enableStdoutGuard?: boolean | ⬜ default true | Patches process.stdout.write so any stray console.log from your handler (or a chatty transitive dep) is diverted to stderr tagged with STDOUT_GUARD_MARKER instead of corrupting the JSON-RPC frame stream. |
| maxFrameBytes?: number | ⬜ default 1 MiB | Reject inbound frames larger than this with a WireError log; dispatch continues. |
| handleProcessSignals?: boolean | ⬜ default true | Listen for SIGTERM + SIGINT and trigger graceful shutdown (drain in-flight, exit 0). |
Event is a value object with topic, source, payload,
optional correlation_id + metadata.
BrokerSender.publish(topic, event) serializes a JSON-RPC
notification to stdout under a Promise-chain write lock so
concurrent handler tasks never interleave half-written frames.
Tool dispatch (tool.invoke, contract §4.1.1 + §5.t)
import { readFileSync } from "node:fs";
import { PluginAdapter, ToolNotFoundError, ToolArgumentInvalidError, textResult } from "nexo-plugin-sdk";

const adapter = new PluginAdapter({
  manifestToml: readFileSync("nexo-plugin.toml", "utf-8"),
  tools: [{ name: "myplugin_weather", description: "Current weather for a city",
            inputSchema: { type: "object", properties: { city: { type: "string" } }, required: ["city"] } }],
  onToolWithContext: async (inv, ctx) => {
    if (inv.toolName !== "myplugin_weather") throw new ToolNotFoundError(inv.toolName);
    const city = (inv.args as { city?: string } | null)?.city;
    if (!city) throw new ToolArgumentInvalidError("missing `city`", { field: "city" });
    // ctx.broker is the onEvent broker handle — e.g. await ctx.broker.memoryRecall({ agentId: inv.agentId ?? "", query: city });
    return textResult(`Sunny in ${city}`); // any JSON value is fine; this is the conventional shape
  },
  // or onTool: (inv) => ... when you don't need the broker
});
await adapter.run();
The handler's return value becomes the JSON-RPC result verbatim
(non-serializable → -33403). Throwing ToolNotFoundError /
ToolArgumentInvalidError (.details) / ToolExecutionFailedError /
ToolUnavailableError (.retryAfterMs) / ToolDeniedError maps to the
matching -33401..-33405 code; an uncaught throw maps to -33403; a
tool.invoke with no handler registered replies -32601.
(npm nexo-plugin-sdk ≥ 0.3.0.)
Tarball convention (noarch)
Operators install TypeScript plugins via the same
nexo plugin install <owner>/<repo>[@<tag>] CLI. The resolver
in nexo-ext-installer falls back to noarch when no
per-target tarball matches the daemon's host triple (Phase
31.4):
<id>-<version>-noarch.tar.gz
├── nexo-plugin.toml
├── bin/<id> # bash launcher, mode 0755
└── lib/
    ├── plugin/main.js    # compiled from src/main.ts via tsc
    └── node_modules/
        ├── nexo-plugin-sdk/dist/...
        └── ...           # pure-JS production deps
The launcher (~5 LOC) reads:
#!/usr/bin/env bash
set -euo pipefail
DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
exec env NODE_PATH="$DIR/lib/node_modules" node "$DIR/lib/plugin/main.js" "$@"
Pure-JS deps constraint
noarch requires that vendored deps work on every operator's
CPU. Native node addons (*.node, *.so, *.dylib, *.dll)
invalidate the claim. The publish workflow's audit step runs
scripts/verify-pure-js.sh post-vendor and rejects any tree
containing those suffixes.
If your plugin needs a native dep, per-target TypeScript
tarballs (<id>-<version>-node20-x86_64-linux.tar.gz etc.) are
tracked as Phase 31.5.b and not yet shipped.
Stdout guard — the robustness multiplier
Plugin authors invariably console.log("debug") at some point,
or import a chatty dep (dotenv banners, transitive logging
libs). Without protection, the very first non-JSON line on
stdout corrupts the daemon's JSON-RPC parser mid-stream — no
recovery path, the host disconnects.
The default-on stdout guard wraps process.stdout.write and:
- Buffers writes until a newline arrives.
- Each complete line is JSON.parse-tested.
- Lines that parse → forwarded to the real stdout.
- Lines that don't parse → diverted to stderr tagged with [stdout-guard] <line>.
The blessed write path (BrokerSender and the SDK's own
response helpers) always emits valid JSON so frames pass through
unchanged. Operator log scraping picks up the [stdout-guard]
marker so debug output stays visible without breaking the wire
format.
Set enableStdoutGuard: false only if you have another guard
layer (e.g. process-level isolation). Leaving the guard on is the
single strongest recommendation in the SDK.
CI publish workflow
The shipped workflow in
extensions/template-plugin-typescript/.github/workflows/release.yml
has the same 4-job shape as the Rust + Python templates but:
- Build matrix has a single noarch entry.
- Build step uses actions/setup-node@v4 + npm ci + npm run typecheck + npm run build (tsc to dist/).
- Pre-vendor: npm prune --omit=dev strips dev deps so only runtime deps land in the tarball.
- Vendor audit step calls scripts/verify-pure-js.sh .audit/lib/node_modules to enforce pure-JS.
Sign + release jobs are identical to the Rust + Python templates;
cosign keyless OIDC ships .sig + .pem + .bundle per asset
when the COSIGN_ENABLED repo variable is "true".
Operator install flow (no changes for TypeScript)
nexo plugin install your-handle/your-plugin@v0.2.0
Identical pipeline to the Rust + Python install paths:
- Resolve release JSON.
- Try <id>-0.2.0-<host-triple>.tar.gz (miss for noarch plugins).
- Fall back to <id>-0.2.0-noarch.tar.gz (Phase 31.4 addition).
- Verify sha256.
- Cosign verify per trusted_keys.toml (Phase 31.3).
- Extract under <dest_root>/<id>-0.2.0/.
- Daemon picks it up at next boot or hot-reload; spawns bin/<id>, which execs node lib/plugin/main.js with NODE_PATH=lib/node_modules.
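The per-target lookup with noarch fallback from the steps above can be sketched as follows. This is an illustration in Python, not the resolver in nexo-ext-installer; the function name `pick_tarball` and its parameters are hypothetical.

```python
def pick_tarball(asset_names, plugin_id, version, host_triple):
    """Prefer the per-target tarball; fall back to noarch; None if neither exists."""
    names = set(asset_names)
    for target in (host_triple, "noarch"):
        candidate = f"{plugin_id}-{version}-{target}.tar.gz"
        if candidate in names:
            return candidate
    return None
```

A release that ships only the noarch tarball resolves to it on every host, which is why the operator command is the same for TypeScript plugins as for Rust ones.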
Local smoke test
echo '{"jsonrpc":"2.0","id":1,"method":"initialize"}' \
| node dist/main.js
Should print one JSON-RPC response with your manifest +
server_version.
End-to-end test for the pack pipeline:
node --test tests/pack-tarball.test.mjs
SDK tests
In a clone of nexo-plugin-sdks:
cd typescript
npm install
npm run build
npm test
13 tests across handshake, manifest validation, dispatch,
stdout-guard, wire, lifecycle. All run via stdlib node:test
so there is zero install friction beyond the SDK's runtime
dep on smol-toml.
See also
- Publishing a plugin (CI workflow) — Rust counterpart of the publish workflow this template is modeled after.
- Python plugin SDK — sibling SDK in Python.
- Plugin trust (trusted_keys.toml) — operator-side cosign verification policy that applies to TypeScript plugins too.
- Plugin contract — wire format all SDKs implement.
PHP plugin SDK
Author plugins in PHP 8.1+ that the daemon spawns as subprocesses, talking the same JSON-RPC 2.0 wire format used by the Rust + Python + TypeScript SDKs.
Reference template:
extensions/template-plugin-php/
(or run nexo plugin new --lang php). The SDK package lives in the
nexo-plugin-sdks repo
(php/ subdir, mirrored to
nexo-plugin-sdk-php
for Packagist) and ships on Packagist as
nexo/plugin-sdk —
composer require nexo/plugin-sdk.
Why PHP 8.1+
The SDK uses Fibers (introduced in PHP 8.1) to run each
broker.event handler as a cooperative coroutine. Without
Fibers the dispatch loop would block on slow handlers,
breaking the "reader does not block on handler" invariant that
the TS + Python SDKs proved necessary.
Architecture summary
Operator host Plugin process
┌──────────────────┐ stdin ┌────────────────────────────┐
│ daemon (Rust) │──JSON-RPC──▶│ bin/<id> (bash launcher) │
│ subprocess host │ │ exec php main.php │
│ │◀──JSON-RPC──│ PluginAdapter::run() │
└──────────────────┘ stdout │ Fiber scheduler ticks │
│ between stdin polls │
└────────────────────────────┘
The bash launcher in bin/<id> runs:
exec env php -d display_errors=stderr -d log_errors=0 \
"$DIR/lib/plugin/main.php" "$@"
-d display_errors=stderr is critical — without it, PHP's
default behavior writes errors to stdout, which would corrupt
the JSON-RPC frame stream.
Daemon-side spawn code in
crates/core/src/agent/nexo_plugin_registry/subprocess.rs
treats the plugin as an opaque executable; PHP plugins re-use
it without modification.
Public API
use Nexo\Plugin\Sdk\PluginAdapter; // async dispatch loop
use Nexo\Plugin\Sdk\BrokerSender; // write-only broker handle
use Nexo\Plugin\Sdk\Event; // value object
use Nexo\Plugin\Sdk\Manifest; // standalone TOML parser
use Nexo\Plugin\Sdk\StdoutGuard; // defensive guard
use Nexo\Plugin\Sdk\Wire; // JSON-RPC frame helpers + MAX_FRAME_BYTES
use Nexo\Plugin\Sdk\PluginError; // base exception
use Nexo\Plugin\Sdk\ManifestError; // raised when manifest malformed
use Nexo\Plugin\Sdk\WireError; // raised on malformed/oversized frames
PluginAdapter constructor options:
| Option | Required | Description |
|---|---|---|
| manifestToml: string | ✅ | Body of nexo-plugin.toml. Read once at startup; the SDK validates plugin.id (regex /^[a-z][a-z0-9_]{0,31}$/), plugin.version, plugin.name, plugin.description. |
| serverVersion?: string | ⬜ | Returned in the initialize reply. Default "0.1.0". |
| onEvent?: callable(string, Event, BrokerSender): void | ⬜ | Invoked for every broker.event notification. Runs in a Fiber so the dispatch loop continues. |
| onShutdown?: callable(): void | ⬜ | Awaited before {ok: true} reply to the host's shutdown request. In-flight Fibers (onEvent + tool.invoke) also drained first. |
| tools?: ToolDef[] | ⬜ | new ToolDef($name, $description, $inputSchema)[] — the tool catalog advertised in the initialize reply's tools array (contract §4.1.1; serialized with the wire key input_schema). Every $name must appear in the manifest's [plugin.extends].tools — otherwise the constructor throws ManifestError. |
| onTool?: callable(ToolInvocation): mixed | ⬜ | Dispatch handler for tool.invoke (contract §5.t). Runs in a Fiber tracked by the scheduler's drain set. Mutually exclusive with onToolWithContext. |
| onToolWithContext?: callable(ToolInvocation, ToolContext): mixed | ⬜ | Like onTool, but $ctx->broker is the same BrokerSender onEvent gets — a tool body can memoryRecall / llmComplete mid-invocation. Wins over onTool when both are set. |
| enableStdoutGuard?: bool | ⬜ default true | Installs an ob_start callback that diverts non-JSON echo/print/printf/var_dump output to stderr tagged with [stdout-guard]. |
| maxFrameBytes?: int | ⬜ default 1048576 | Reject inbound frames larger than this with WireError; dispatch continues. |
| handleProcessSignals?: bool | ⬜ default true | Listen for SIGTERM + SIGINT via pcntl_async_signals and trigger graceful shutdown (drain in-flight, exit 0). |
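To make the plugin.id rule from the manifestToml row concrete, here is a small Python check using the same character classes. It is a sketch, not the SDK's PHP validator; `is_valid_plugin_id` is a hypothetical name, and `re.fullmatch` is used so the `^` and `$` anchors of the documented regex are implied.

```python
import re

# Same character classes as the documented regex /^[a-z][a-z0-9_]{0,31}$/.
PLUGIN_ID = re.compile(r"[a-z][a-z0-9_]{0,31}")


def is_valid_plugin_id(plugin_id: str) -> bool:
    """True for 1 to 32 chars: a lowercase letter, then lowercase letters, digits or '_'."""
    return PLUGIN_ID.fullmatch(plugin_id) is not None
```

Note that hyphens are not in the allowed set, so an id such as `web-search` would have to be spelled `web_search`.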
Tool dispatch (tool.invoke, contract §4.1.1 + §5.t)
The tool classes live in src/Tool.php (loaded via the files autoload
entry alongside src/Host.php):
use Nexo\Plugin\Sdk\{PluginAdapter, Tool, ToolDef, ToolInvocation, ToolContext,
ToolNotFound, ToolArgumentInvalid, ToolExecutionFailed, ToolUnavailable, ToolDenied};
$adapter = new PluginAdapter([
'manifestToml' => file_get_contents(__DIR__ . '/nexo-plugin.toml'),
'tools' => [new ToolDef('myplugin_weather', 'Current weather for a city',
['type' => 'object', 'properties' => ['city' => ['type' => 'string']], 'required' => ['city']])],
'onToolWithContext' => function (ToolInvocation $inv, ToolContext $ctx): mixed {
if ($inv->toolName !== 'myplugin_weather') { throw new ToolNotFound($inv->toolName); }
$city = $inv->args['city'] ?? null;
if (!$city) { throw new ToolArgumentInvalid('missing `city`', ['field' => 'city']); }
// $ctx->broker is the onEvent broker handle — e.g. $ctx->broker->memoryRecall(['agentId' => $inv->agentId ?? '', 'query' => $city]);
return Tool::text("Sunny in {$city}"); // any JSON value is fine; this is the conventional shape
},
// or 'onTool' => fn(ToolInvocation $inv) => ... when you don't need the broker
]);
$adapter->run();
The handler's return value becomes the JSON-RPC result verbatim
(non-encodable → -33403). Throwing ToolNotFound /
ToolArgumentInvalid ($details) / ToolExecutionFailed /
ToolUnavailable ($retryAfterMs) / ToolDenied maps to the matching
-33401..-33405 code (the code is carried via
parent::__construct($msg, $code) like RpcServerError — read it with
getCode()); an uncaught \Throwable maps to -33403; a tool.invoke
with no handler registered replies -32601.
(Packagist nexo/plugin-sdk ≥ 0.3.0.)
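The result and error mapping described above can be modelled in Python as a rough sketch. The names `ToolError` and `dispatch` are hypothetical stand-ins, not SDK API; only the codes -33403 (uncaught exception or non-encodable result) and -32601 (no handler) are taken from the text, and the sketch deliberately does not say which typed exception carries which code in the -33401..-33405 range.

```python
import json

UNCAUGHT = -33403          # uncaught exception or non-encodable result (from the text)
METHOD_NOT_FOUND = -32601  # tool.invoke with no handler registered (from the text)


class ToolError(Exception):
    """Hypothetical stand-in for the SDK's typed tool exceptions; carries its own code."""

    def __init__(self, message, code):
        super().__init__(message)
        self.code = code


def dispatch(handler, invocation):
    """Turn a handler call into a JSON-RPC-style result or error dict."""
    if handler is None:
        return {"error": {"code": METHOD_NOT_FOUND, "message": "no tool handler"}}
    try:
        result = handler(invocation)
        json.dumps(result)  # a result that cannot be encoded counts as a failure
    except ToolError as e:
        return {"error": {"code": e.code, "message": str(e)}}
    except Exception as e:
        return {"error": {"code": UNCAUGHT, "message": str(e)}}
    return {"result": result}
```

The order of the `except` clauses mirrors the text: a typed error keeps the code it was constructed with, and everything else collapses to the generic failure code.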
Tarball convention (noarch)
Operators install PHP plugins via the same
nexo plugin install <owner>/<repo>[@<tag>] CLI. The resolver
in nexo-ext-installer falls back to noarch when no
per-target tarball matches the daemon's host triple (Phase
31.4):
<id>-<version>-noarch.tar.gz
├── nexo-plugin.toml
├── bin/<id> # bash launcher mode 0755
└── lib/
├── plugin/main.php
└── vendor/ # composer install --no-dev output
├── autoload.php
├── nexo/plugin-sdk/...
├── yosymfony/toml/...
└── composer/...
Composer integration
Templates consume the in-tree SDK via a path repository:
"repositories": [
{
"type": "path",
"url": "../sdk-php",
"options": { "symlink": false }
}
]
symlink: false is critical — without it Composer creates a
symlink in vendor/nexo/plugin-sdk/ pointing at the path repo.
When the tarball is packed, that symlink would break on the
operator host. With symlink: false Composer copies the SDK
files physically — the tarball stays self-contained.
The publish workflow runs:
composer install --no-dev --optimize-autoloader --classmap-authoritative
This produces a deterministic, minimal vendor tree. The
operator host does NOT need Composer installed — the
vendor/autoload.php shipped in the tarball is plain PHP and
works with just php-cli.
composer.lock is checked in for the template (reproducibility
analogous to Cargo.lock for binary projects). The SDK itself
omits the lockfile so consumers resolve fresh against their own
constraints.
Pure-PHP deps constraint
noarch requires that vendored deps work on every operator's
CPU. Native PHP extensions (*.so, *.dylib, *.dll) are
normally loaded via php.ini from /usr/lib/php/<version>/,
NOT vendored. If a Composer dep smuggles in a native build
artifact under vendor/, the publish workflow's
scripts/verify-pure-php.sh audit step rejects the tarball.
If your plugin needs a native dep, per-target tarballs are tracked as Phase 31.5.c.b and not yet shipped.
Stdout guard — what's guarded vs not
| API | Behavior |
|---|---|
| echo $x; | ✅ Guarded — non-JSON lines diverted to stderr. |
| print $x; | ✅ Guarded. |
| printf("%s", $x); | ✅ Guarded. |
| var_dump($x); | ✅ Guarded. |
| fwrite(STDOUT, $x); | ❌ NOT guarded — bypasses ob_start. The SDK's own BrokerSender::publish() uses this deliberately so blessed JSON frames always reach the host. |
Plugin authors who need stdout output should use echo /
print / printf — those are guarded. Calling
fwrite(STDOUT, ...) directly from author code is undefined
behavior; the operator's daemon will see the raw bytes and
disconnect on parser failure.
CI publish workflow
The shipped workflow in
extensions/template-plugin-php/.github/workflows/release.yml
has the same 4-job shape as the Rust + Python + TS templates
but:
- Build matrix has a single noarch entry.
- Build step uses shivammathur/setup-php@v2 with php-version: "8.3" + tools: composer:v2. composer validate --strict gates the build. composer install --no-dev --optimize-autoloader --classmap-authoritative produces the vendor tree.
- Pack step calls scripts/pack-tarball-php.sh with SKIP_COMPOSER=1 (composer ran already).
- Vendor audit step calls scripts/verify-pure-php.sh .audit/lib/vendor to enforce pure-PHP.
Sign + release jobs are identical to the other templates;
cosign keyless OIDC ships .sig + .pem + .bundle per asset
when the COSIGN_ENABLED repo variable is "true".
Operator install flow (no changes for PHP)
nexo plugin install your-handle/your-plugin@v0.2.0
Identical pipeline to the Rust + Python + TS install paths:
- Resolve release JSON.
- Try <id>-0.2.0-<host-triple>.tar.gz (miss for noarch plugins).
- Fall back to <id>-0.2.0-noarch.tar.gz (Phase 31.4 addition).
- Verify sha256.
- Cosign verify per trusted_keys.toml (Phase 31.3).
- Extract under <dest_root>/<id>-0.2.0/.
- Daemon picks it up at next boot or hot-reload; spawns bin/<id>, which execs php lib/plugin/main.php.
Local smoke test
echo '{"jsonrpc":"2.0","id":1,"method":"initialize"}' \
| php src/main.php
Should print one JSON-RPC response with your manifest +
server_version.
End-to-end test for the pack pipeline:
php tests/test_pack_tarball.php
SDK tests
In a clone of nexo-plugin-sdks:
cd php
composer install
php tests/run-all.php
14 test cases across handshake, manifest validation, dispatch
(incl. Fiber-based slow-handler proof + drain), stdout-guard,
wire-format hardening, lifecycle, event round-trip. All run via
plain PHP scripts using proc_open — zero PHPUnit / Pest dep,
mirroring the TS SDK's node:test choice and the Python SDK's
unittest choice.
Plugin author constraint: cooperative scheduling
The Fiber scheduler preserves the "reader does not block on handler" invariant only at SDK boundaries. If your handler calls a synchronous blocking I/O function:
$result = file_get_contents("https://example.com/slow"); // blocks
…the dispatch loop blocks for the duration of the call. Cooperative scheduling cannot interrupt blocking I/O. Two mitigations:
- Keep handlers fast — typical channel plugins do work in <10ms.
- For long external calls, periodically Fiber::suspend() to yield. The SDK doesn't auto-suspend; that's an explicit author decision.
This matches the Python and TypeScript SDKs' contract — long blocking work is the author's responsibility to break up.
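The effect of yielding can be shown with a toy round-robin scheduler. The sketch below uses Python generators as an analogue for Fibers; it is not the SDK's scheduler, and `run_round_robin` and `handler` are hypothetical names. Each `yield` plays the role of an explicit Fiber::suspend().

```python
from collections import deque


def run_round_robin(tasks):
    """Run generator-based tasks cooperatively; return the order in which steps ran."""
    order = []
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            order.append(next(task))  # run until the task's next yield
        except StopIteration:
            continue                  # task finished; drop it
        queue.append(task)            # task yielded; let the others run first
    return order


def handler(name, steps):
    """A handler that yields after each unit of work."""
    for i in range(steps):
        yield f"{name}{i}"
```

A handler that does all of its work between two yields holds the scheduler for that whole stretch, which is the blocking case the section warns about.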
See also
- Publishing a plugin (CI workflow) — Rust counterpart of the publish workflow this template is modeled after.
- TypeScript plugin SDK — sibling SDK with similar robustness defaults.
- Python plugin SDK — sibling SDK with the closest async model match.
- Plugin trust (trusted_keys.toml) — operator-side cosign verification policy that applies to PHP plugins too.
- Plugin contract — wire format all SDKs implement.
Publishing a plugin (CI workflow)
Phase 31.2. Operators install plugins via:
nexo plugin install <owner>/<repo>[@<tag>]
The CLI hits the GitHub Releases API of <owner>/<repo> and
expects a fixed asset naming convention. This page documents the
convention so plugin authors can publish releases that the
operator-side install path consumes without translation.
The reference Rust plugin template
extensions/template-plugin-rust/
ships a drop-in workflow plus helper scripts. Copy them to your
own plugin repo and you are done.
Asset naming convention
For every release tag v<semver> (e.g. v0.2.0) the workflow
uploads the following assets to the GitHub Release:
| Asset | Required | Contents |
|---|---|---|
| nexo-plugin.toml | ✅ | The plugin manifest. Operator's CLI fetches first to learn plugin.id. |
| <id>-<version>-<target>.tar.gz | ✅ | One per supported target. Layout: bin/<id> + nexo-plugin.toml at the root, no top-level wrapping dir. Binary mode 0755 on Unix. |
| <id>-<version>-<target>.tar.gz.sha256 | ✅ | Single line of lowercase hex (64 chars). |
| <id>-<version>-<target>.tar.gz.sig | ⬜ | Cosign keyless signature blob. |
| <id>-<version>-<target>.tar.gz.pem | ⬜ | Cosign certificate. |
| <id>-<version>-<target>.tar.gz.bundle | ⬜ | Cosign Sigstore bundle. |
Targets follow Rust's standard target triple notation
(x86_64-unknown-linux-gnu, aarch64-apple-darwin, etc.).
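As a quick way to check a release against the convention, the expected asset names can be generated from the plugin id, version and target list. This is a hedged Python sketch; `release_assets` is a hypothetical helper, not part of the shipped scripts.

```python
def release_assets(plugin_id, version, targets, signed=False):
    """List the asset names one release is expected to carry."""
    assets = ["nexo-plugin.toml"]
    for target in targets:
        tarball = f"{plugin_id}-{version}-{target}.tar.gz"
        assets += [tarball, f"{tarball}.sha256"]
        if signed:
            # Optional cosign material, one set per tarball.
            assets += [f"{tarball}.sig", f"{tarball}.pem", f"{tarball}.bundle"]
    return assets
```

Comparing this list with the names shown on the GitHub Release page is a cheap sanity check before announcing a version.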
Publish workflow shape
The shipped workflow has four jobs:
- validate-tag — checks tag format ^v[0-9]+\.[0-9]+\.[0-9]+(-[a-zA-Z0-9.]+)?$, asserts the tag matches the version declared in nexo-plugin.toml. Hard fails on mismatch (no partial release).
- build — matrix over targets. For each:
  - cargo zigbuild --release --target <target> for linux musl entries (cross-compiled from ubuntu-latest).
  - cargo build --release --target <target> for darwin entries (run on macos-latest).
  - bash scripts/pack-tarball.sh <target> produces the tarball + sha256 sidecar following the convention above.
- sign (optional, gated on repo variable COSIGN_ENABLED == "true") — keyless cosign signs each tarball using the workflow's OIDC token, producing .sig / .pem / .bundle per asset.
- release — creates the GitHub Release if missing, uploads all artifacts including nexo-plugin.toml. Uses --clobber so re-runs of the same tag overwrite stale assets.
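The validate-tag checks can be tried locally before pushing. The sketch below applies the documented pattern in Python and compares the tag with the manifest version; `tag_matches_manifest` is a hypothetical helper, not the workflow's script. Note that the pattern on its own accepts v01.0.0, which the Troubleshooting section lists as rejected, so the workflow presumably also applies a stricter semver parse that is not modelled here.

```python
import re

# Pattern quoted from the validate-tag job description.
TAG_RE = re.compile(r"^v[0-9]+\.[0-9]+\.[0-9]+(-[a-zA-Z0-9.]+)?$")


def tag_matches_manifest(tag: str, manifest_version: str) -> bool:
    """True when the tag has the expected shape and equals 'v' + manifest version."""
    return TAG_RE.fullmatch(tag) is not None and tag[1:] == manifest_version
```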
Required permissions
permissions:
contents: write # gh release upload
id-token: write # cosign keyless OIDC
GITHUB_TOKEN is auto-provided. No additional secrets required
for the unsigned path. Cosign keyless does not need any secret
either — it uses Sigstore/Fulcio with the workflow's OIDC token.
Enabling cosign signing
gh variable set COSIGN_ENABLED --body true
After signing is enabled, every tag push produces signing
material that operators with config/extensions/trusted_keys.toml
(Phase 31.3) can verify against your GitHub identity.
Constraint: cargo bin name = plugin id
Cargo's [[bin]] name MUST equal nexo-plugin.toml [plugin] id.
The convention is bin/<id> inside the tarball, and
pack-tarball.sh looks for the binary at
target/<target>/release/<id>. Mismatch fails the pack step
(built binary missing at target/...).
Local validation
Before pushing a tag, dry-run the pack step:
cargo build --release --target x86_64-unknown-linux-gnu
bash scripts/pack-tarball.sh x86_64-unknown-linux-gnu
ls dist/
# my_plugin-0.2.0-x86_64-unknown-linux-gnu.tar.gz
# my_plugin-0.2.0-x86_64-unknown-linux-gnu.tar.gz.sha256
The Rust integration test tests/pack_tarball.rs covers this
end to end against a synthetic binary; copy it when you fork the
template to keep the convention regression-tested.
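The sha256 sidecar produced by the pack step can also be checked by hand. The following Python sketch assumes the sidecar holds exactly one line of 64 lowercase hex characters, as the asset table states, and tolerates a trailing newline (an assumption); `sidecar_ok` is a hypothetical helper.

```python
import hashlib
import re


def sha256_hex(data: bytes) -> str:
    """Lowercase hex digest, 64 characters."""
    return hashlib.sha256(data).hexdigest()


def sidecar_ok(tarball_bytes: bytes, sidecar_text: str) -> bool:
    """Check a .sha256 sidecar: one line of 64 lowercase hex chars matching the data."""
    line = sidecar_text.strip()
    if not re.fullmatch(r"[0-9a-f]{64}", line):
        return False
    return line == sha256_hex(tarball_bytes)
```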
Troubleshooting
- tag 'X' does not match v<semver> — the workflow rejects any tag that does not start with v and parse as semver. Examples: v0.2.0, v1.0.0-beta.3. Reject: 0.2.0 (missing v), v0.2, v01.0.0 (leading zero).
- nexo-plugin.toml version <X> != tag <Y> — the workflow enforces that the tag and the manifest version match. Update one before retagging.
- built binary missing at target/... — cargo produced a binary at a path other than what pack-tarball.sh expected. Check [[bin]] name in Cargo.toml matches [plugin] id in nexo-plugin.toml.
- Operator hits TargetNotFound — your matrix did not build for the operator's target triple. Re-enable the matrix entry and re-run; operator can also pass --target to override.
See also
- Plugin contract (out-of-tree) — the wire format the binary speaks once the operator runs it.
- crates/ext-installer/README.md — operator-side install pipeline that consumes these assets.
Signing & publishing your plugin
Phase 31.9. End-to-end tutorial: take a freshly scaffolded
plugin from nexo plugin new, ship it as a public GitHub
release that operators can install signed, and confirm an
operator with --require-signature accepts it.
This page is the how-to. For reference material:
- Publishing a plugin — asset naming convention + workflow-job shape.
- Plugin trust (trusted_keys.toml) — operator-side verification policy and troubleshooting.
Read this when
- You finished a plugin and want to publish your first release.
- You want operators on --require-signature to trust your releases via cosign keyless signing.
- You want a concrete checklist before tagging v0.1.0.
Prerequisites
- A GitHub repo containing the plugin scaffolded by nexo plugin new <id> --lang <lang>. Repo must use the shipped .github/workflows/release.yml from the matching extensions/template-plugin-<lang>/ template (the scaffolder copies it for you).
- gh CLI authenticated against the repo (gh auth status).
- git configured to push tags to origin.
- For signing, nothing extra: cosign is not required on your host, because keyless cosign runs inside GitHub Actions using the workflow's OIDC token.
1. Publish your first release (unsigned)
The shortest path. Tag, push, watch CI.
# Pick a semver tag matching plugin.version in nexo-plugin.toml.
# The validate-tag job will reject any mismatch.
git tag v0.1.0
git push origin v0.1.0
The shipped workflow runs three jobs by default (validate-tag → build → release; sign is gated and stays inactive until you opt in):
gh run watch # tail the latest run
gh release view v0.1.0 # confirm assets uploaded
Expected assets per <target>:
nexo-plugin.toml
my_plugin-0.1.0-x86_64-unknown-linux-gnu.tar.gz
my_plugin-0.1.0-x86_64-unknown-linux-gnu.tar.gz.sha256
Operators can already install at this point with default trust
mode (warn):
nexo plugin install your-handle/my_plugin@v0.1.0
The CLI prints ! No signature in release; trust mode is 'warn' — proceeding unverified. and extracts the plugin.
2. Add cosign keyless signing
Cosign keyless does not need any secret on your end — it uses Sigstore + Fulcio with the GitHub Actions OIDC token. Enable it with one command:
gh variable set COSIGN_ENABLED --body true
Re-tag (or move the existing tag) and re-run the workflow:
git tag -d v0.1.0
git tag v0.1.0
git push --force origin v0.1.0
The sign job now runs and produces three extra assets per
tarball:
my_plugin-0.1.0-x86_64-unknown-linux-gnu.tar.gz.sig
my_plugin-0.1.0-x86_64-unknown-linux-gnu.tar.gz.pem
my_plugin-0.1.0-x86_64-unknown-linux-gnu.tar.gz.bundle
The certificate's Subject Alternative Name (SAN) encodes the workflow URL plus the ref:
https://github.com/your-handle/my_plugin/.github/workflows/release.yml@refs/tags/v0.1.0
Operators with --require-signature will allowlist this SAN
shape via a regex — that's what step 3 is about.
3. Operator-side trust setup
Operators who want to enforce signatures add an [[authors]]
entry to <config_dir>/extensions/trusted_keys.toml:
schema_version = "1.0"
default = "warn"
[[authors]]
owner = "your-handle"
identity_regexp = "^https://github\\.com/your-handle/[^/]+/\\.github/workflows/release\\.yml@.*$"
oidc_issuer = "https://token.actions.githubusercontent.com"
mode = "require"
Notes for the operator (link this paragraph from your plugin's README):
- owner matches the <owner> segment of nexo plugin install <owner>/<repo> invocations.
- identity_regexp should be specific to your owner and loose on tag so it survives release-tag bumps. The example above accepts every repo under your-handle/ that ships release.yml from its default workflow path.
- Anchored ^…$ is intentional — leaving anchors off makes the regex match substrings of unrelated SANs.
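The effect of the anchors can be checked with any regex engine. The sketch below uses Python's `re` as a stand-in (the engine the verifier actually uses is not stated here) and an unanchored search, so the only thing pinning the match to the whole SAN is the `^…$` in the pattern. `san_allowed` and the `evil` URL are hypothetical. In TOML the string `\\.` unescapes to the regex token `\.`.

```python
import re

ANCHORED = r"^https://github\.com/your-handle/[^/]+/\.github/workflows/release\.yml@.*$"
UNANCHORED = ANCHORED[1:-1]  # same pattern with the ^ and $ removed


def san_allowed(san: str, pattern: str) -> bool:
    """Unanchored search: the pattern itself must supply ^ and $ to pin the match."""
    return re.search(pattern, san) is not None


good = "https://github.com/your-handle/my_plugin/.github/workflows/release.yml@refs/tags/v0.1.0"
evil = "https://evil.example/?u=https://github.com/your-handle/x/.github/workflows/release.yml@refs/tags/v1"
```

With the anchored pattern `good` matches and `evil` does not; with the unanchored variant both match, which is the substring problem the note describes.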
The full sample with comments lives at
config/extensions/trusted_keys.toml.example in the nexo-rs
repo.
4. Verify the round trip
On a host with cosign installed, an operator runs:
nexo plugin install your-handle/my_plugin@v0.1.0 --require-signature
Expected human output:
→ Resolving your-handle/my_plugin@v0.1.0 (target: x86_64-unknown-linux-gnu)
✓ Found release v0.1.0 (x86_64-unknown-linux-gnu, 4.1 MB, sha256 ab12cd34ef56…)
→ Downloading
✓ sha256 verified
→ Verifying signature against trusted_keys.toml
✓ Signature verified (identity: https://github.com/your-handle/my_plugin/.github/workflows/release.yml@refs/tags/v0.1.0)
→ Extracting to /var/lib/nexo/plugins
✓ Plugin installed at /var/lib/nexo/plugins/my_plugin-0.1.0
✓ Lifecycle event emitted (broker)
JSON output (--json) carries the full report including
signature_verified, signature_identity, signature_issuer,
trust_mode, and trust_policy_matched:
nexo plugin install your-handle/my_plugin@v0.1.0 --require-signature --json
{
"ok": true,
"id": "my_plugin",
"version": "0.1.0",
"target": "x86_64-unknown-linux-gnu",
"plugin_dir": "/var/lib/nexo/plugins/my_plugin-0.1.0",
"binary_path": "/var/lib/nexo/plugins/my_plugin-0.1.0/bin/my_plugin",
"sha256": "ab12cd34ef56...",
"size_bytes": 4194304,
"was_already_present": false,
"lifecycle_event_emitted": true,
"signature_verified": true,
"signature_identity": "https://github.com/your-handle/my_plugin/.github/workflows/release.yml@refs/tags/v0.1.0",
"signature_issuer": "https://token.actions.githubusercontent.com",
"trust_mode": "require",
"trust_policy_matched": "your-handle"
}
5. Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| CosignNotFound | Operator host lacks cosign binary. | Install via brew install cosign, apt install cosign, or download from https://github.com/sigstore/cosign/releases. |
| PolicyRequiresSig | Trust mode is require but release has no .sig / .cert. | Re-run the workflow after gh variable set COSIGN_ENABLED --body true. |
| CosignFailed | Cert SAN does not match identity_regexp. | Compare the SAN reported in the error against the regex. Common cause: regex too tight on tag (v0\.1\.0 instead of .*). |
| Sha256Mismatch | Tarball corrupted in transit or rebuilt out-of-band. | Re-tag and re-run; uploads are reproducible from the same commit. |
| TargetNotFound | Operator's host triple has no matching tarball. | Add the missing entry to the build matrix in release.yml and re-tag. |
For full operator-side troubleshooting (cosign discovery
fallbacks, identity_regexp examples, manual cosign verify-blob invocation), see
Plugin trust.
See also
- Plugin authoring overview — picks a language and gets you to a running plugin in 5 minutes.
- Publishing a plugin — asset naming reference and the 4-job CI workflow anatomy.
- Plugin trust (trusted_keys.toml) — operator-side cosign verification policy.
- Plugin contract — wire format the binary speaks once installed.
- Verifying releases — same Sigstore keyless flow used for nexo-rs's own release signing.
Plugin supervisor (auto-respawn)
Subprocess plugins are isolated child processes. When one crashes, the daemon supervisor can either pause + log (default) or auto-respawn it with exponential backoff up to a bounded number of attempts. This page documents the manifest knobs that control that behaviour, the broker lifecycle events the supervisor publishes, and the edge cases operators should plan for.
Manifest knobs
[plugin.supervisor]
respawn = false # opt-in. Default: false (Phase 81.21.b semantics)
max_attempts = 3 # cap on respawns before "gave_up". Default: 3
backoff_ms = 1000 # initial backoff; doubles per attempt, capped 60s. Default: 1000
stderr_tail_lines = 32 # ring buffer per running child for crash forensics. Default: 32
respawn is opt-in — community-tier plugins should not
silently keep restarting if they're broken. Operators that trust
their plugin (in-house adapters, well-tested community plugins)
flip the toggle on; everything else stays paused-on-crash.
max_attempts is the hard ceiling. After that many consecutive
respawn attempts the supervisor publishes gave_up and stops.
The operator must restart the daemon (or fix the plugin + redeploy)
to recover.
backoff_ms is the initial wait before the first retry. Each
subsequent attempt doubles the wait, capped at 60 seconds.
Example with backoff_ms = 1000:
| Attempt | Wait |
|---|---|
| 1 | 1s |
| 2 | 2s |
| 3 | 4s |
| 4 | 8s |
| 5 | 16s |
| 6 | 32s |
| 7+ | 60s (capped) |
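The table follows from a one-line formula: the initial wait doubled once per earlier attempt, then clamped to 60 seconds. A minimal Python sketch, assuming attempts are 1-indexed as in the table; `backoff_for_attempt` is a hypothetical name, not the supervisor's Rust function.

```python
BACKOFF_CAP_MS = 60_000  # 60 s cap stated in the text


def backoff_for_attempt(attempt: int, backoff_ms: int = 1000) -> int:
    """Wait before the given 1-indexed attempt: initial value doubled each time, capped."""
    if attempt < 1:
        raise ValueError("attempt is 1-indexed")
    return min(backoff_ms * 2 ** (attempt - 1), BACKOFF_CAP_MS)
```

With the default max_attempts = 3 only the first three rows are ever reached; the cap matters once max_attempts is raised to 7 or more.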
stderr_tail_lines is the per-running-plugin ring buffer of
recent stderr lines. On crash the supervisor drains it into the
stderr_tail field of the lifecycle events for forensic context.
Hard-capped at 512 by manifest validation.
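A bounded ring buffer of this kind keeps only the newest lines and drops the oldest as new ones arrive. The Python sketch below shows the idea with `collections.deque`; it is an illustration, not the supervisor's Rust data structure, and `make_tail` is a hypothetical helper.

```python
from collections import deque

MAX_TAIL_LINES = 512  # hard cap stated in the text


def make_tail(stderr_tail_lines: int = 32) -> deque:
    """Create a ring buffer that keeps only the most recent stderr lines."""
    if stderr_tail_lines > MAX_TAIL_LINES:
        raise ValueError("stderr_tail_lines exceeds the 512-line cap")
    return deque(maxlen=stderr_tail_lines)
```

Iterating the buffer yields the surviving lines oldest first.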
Lifecycle events (broker)
Every transition publishes a best-effort event on the daemon's broker (NATS-style topic). Subscribers can stream these into audit logs, dashboards, or alerts.
| Topic | When | Payload |
|---|---|---|
| plugin.lifecycle.<id>.crashed | Child exit detected (non-zero) | {plugin_id, exit_code, stderr_tail: Vec<String>} |
| plugin.lifecycle.<id>.respawning | Before each backoff sleep | {plugin_id, attempt: u32 (1-indexed), backoff_ms: u64} |
| plugin.lifecycle.<id>.respawned | After successful re-handshake | {plugin_id, attempt, total_uptime_ms} |
| plugin.lifecycle.<id>.gave_up | After attempts >= max_attempts | {plugin_id, attempts, last_exit_code, stderr_tail} |
| plugin.lifecycle.<id>.restarted_manually | After force_restart completes | {plugin_id, previous_uptime_ms: u64, restarted_at_ms: i64, new_pid?: u32} |
source field on every event = "plugin.supervisor".
stderr_tail is chronological (oldest first), capped at the
manifest's stderr_tail_lines.
respawned.total_uptime_ms carries the previous Inner's uptime
in milliseconds (Phase 90 audit fix — was always 0). Subscribers
diffing crashed→respawned timestamps can now consume the field
directly.
gave_up.last_exit_code = -1 (sentinel) indicates a spawn
failure — the supervisor never reached the handshake. A real
child exit code (e.g. 1, 127, 139) means the child started but
crashed; the per-attempt stderr_tail carries forensics. Spawn-
failure paths emit an empty stderr_tail because there was no
process to read from.
restarted_manually is published only by operator-initiated
nexo/admin/plugins/restart calls. Auto-respawn cycles emit
crashed+respawning+respawned/gave_up instead.
new_pid is Some when Tokio could read the freshly spawned
child's PID (almost always the case); None for pathological
spawns where Child::id() returned None.
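A subscriber typically needs two small pieces of logic: splitting the topic into plugin id and event name, and telling a spawn failure from a crash on gave_up. The Python sketch below assumes plugin ids contain no dots and that payloads arrive as dicts with the field names from the table; `parse_topic` and `is_spawn_failure` are hypothetical helpers.

```python
def parse_topic(topic: str):
    """Split 'plugin.lifecycle.<id>.<event>' into (plugin_id, event), or None."""
    parts = topic.split(".")
    if len(parts) != 4 or parts[:2] != ["plugin", "lifecycle"]:
        return None
    return parts[2], parts[3]


def is_spawn_failure(gave_up_payload: dict) -> bool:
    """True when last_exit_code is the -1 sentinel (supervisor never reached the handshake)."""
    return gave_up_payload["last_exit_code"] == -1
```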
Auto-respawn flow
Initial init() — spawn_one_attempt + handshake
│
▼
(child running)
│ ───── NormalExit (clean shutdown) ──── return
│
▼ Crashed
publish "crashed" event
│
│ ┌── respawn=false ──── return (Phase 81.21.b semantics)
│ │
│ ▼ respawn=true
maybe reset attempt counter (heuristic)
│
│ ┌── attempt >= max_attempts ──── publish "gave_up" + return
│ │
│ ▼
publish "respawning {attempt+1, backoff_ms}"
│
sleep next_backoff(attempt) (or shutdown short-circuit)
│
drain pending oneshots with "plugin restarted; retry"
│
spawn_one_attempt + handshake
│
│ ┌── Err ──── attempt += 1; loop continues
│ │
│ ▼ Ok
check shutdown_signaled (kill child if shutdown raced the spawn)
│
install new Inner; publish "respawned"
│
▼
attempt += 1; loop continues
Reset attempt counter heuristic
If the most recent child survived ≥ backoff_ms × max_attempts × 2
milliseconds after a respawn, the supervisor treats the next crash
as a transient blip rather than a continuation of a respawn loop —
the attempt counter resets to 0. This permits recovery from network
blips / OAuth token refreshes / occasional segfaults without
masking real crash loops.
The window is hard-capped at 10 × 60s = 600s so an over-tuned
manifest can't disable the heuristic entirely.
The window is not an operator knob; it derives from
backoff_ms and max_attempts. Operators that want a longer
window bump backoff_ms (which also slows down respawns) — that
trade-off is intentional. A future follow-up may expose
restart_window_secs as an explicit field if real-world demand
emerges.
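The window arithmetic is easy to check by hand. This Python sketch computes it from the two manifest knobs and applies the 600 s cap; the function names are hypothetical, and the comparison uses the "≥" stated above.

```python
WINDOW_CAP_MS = 10 * 60_000  # 10 x 60 s = 600 s


def reset_window_ms(backoff_ms: int, max_attempts: int) -> int:
    """Uptime after which the next crash resets the attempt counter."""
    return min(backoff_ms * max_attempts * 2, WINDOW_CAP_MS)


def should_reset(uptime_ms: int, backoff_ms: int = 1000, max_attempts: int = 3) -> bool:
    """True when the child survived long enough to count as recovered."""
    return uptime_ms >= reset_window_ms(backoff_ms, max_attempts)
```

With the defaults (backoff_ms = 1000, max_attempts = 3) the window is 6 seconds, so a child that runs for a minute and then crashes starts again from attempt 0.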
Shutdown semantics
- shutdown() flips a per-plugin atomic flag and notifies the supervisor immediately. A supervisor parked in backoff sleep wakes within milliseconds (no waiting up to 60s for the natural deadline).
- A shutdown that races a respawn handshake will kill the just-spawned child if shutdown fires between spawn_one_attempt returning Ok and the new Inner installation. No orphaned processes.
- The daemon-wide ctx_shutdown cancellation token is also observed. Either source makes the supervisor return cleanly.
Manual restart
Operators can force-restart any subprocess plugin from the admin
UI without restarting the daemon. Useful after a gave_up event
(auto-respawn loop exhausted) or to apply config changes that
only take effect at boot.
| Topic | Capability | Behaviour |
|---|---|---|
| nexo/admin/plugins/restart { plugin_id } | plugin_restart | Force-kill + fresh spawn + new respawn_loop |
The restart is distinct from auto-respawn:
- Publishes plugin.lifecycle.<id>.restarted_manually (NOT crashed+respawned) — operator dashboards can distinguish intentional restarts from crash recovery.
- Capability plugin_restart is separate from plugin_doctor (read-only). Security review can grant write+destructive separately from read access.
- Bypasses respawn=false — even with auto-respawn disabled, the manual restart spawns a fresh child + respawn_loop. After manual restart, the new respawn_loop respects the manifest's respawn setting again.
Flow
operator clicks "Restart" in plugin admin UI
↓
RPC nexo/admin/plugins/restart { plugin_id }
↓
LivePluginRestarter.restart() — lookup + downcast + force_restart()
↓
SubprocessNexoPlugin::force_restart()
├─ capture previous_uptime_ms (Inner.spawned_at.elapsed())
├─ drain pending oneshots with "plugin restarted by operator"
├─ cancel.cancel() (cascade tears down writer/reader/forwarders/supervisor)
├─ wait up to 2s for supervisor task to drain
├─ force-kill child if still alive
├─ tokio::time::timeout(60s, spawn_one_attempt(...))
├─ capture new_pid from child.id()
├─ install new Inner
├─ spawn fresh respawn_loop
├─ publish "restarted_manually" event
└─ return PluginsRestartResponse { plugin_id, previous_uptime_ms,
restarted_at_ms, new_pid }
Errors
| Error | Maps to | Operator action |
|---|---|---|
| plugin {id} not found | InvalidParams | Refresh admin UI; plugin removed from manifest |
| plugin {id} is in-tree | InvalidParams | Use daemon restart for in-tree plugins |
| restart timed out (60s) | Internal | Plugin in degraded state; inspect logs + fix manifest |
| plugin handles not yet populated; daemon still booting | Internal | Retry after 1-2s; daemon finishing wire_plugin_registry |
Limitations
- Subprocess plugins only: in-tree plugins (assistant, dispatch-tools) cannot be hot-restarted. The operator restarts the daemon instead.
- Manifest unchanged: force_restart uses the cached manifest; an operator-edited manifest.entrypoint.command won't take effect until daemon restart. Manifest hot-reload is a deferred follow-up.
- No coalesce: concurrent restart calls (two operators clicking simultaneously) execute sequentially via self.inner.lock(). This works, but the plugin passes through an odd intermediate state for ~1s. Add explicit coalescing only if abuse is seen.
- No restart cooldown / rate-limiting: the capability gate is the only gate. Add a cooldown only if abuse is seen.
Limitations + open follow-ups
- No Prometheus counter: nexo_plugin_respawn_total{plugin_id, outcome} is pending the general metrics pipeline.
- No multi-recipient encryption for stderr_tail: it is captured as plaintext only. A plugin that prints secrets to stderr will leak them via lifecycle events.
- Per-attempt timeout is the same NEXO_PLUGIN_INIT_TIMEOUT_MS used by the initial spawn. A respawn handshake that hangs beyond the timeout counts as a failed attempt.
Operator checklist
- Decide respawn per-plugin. Default false is safer; flip on for plugins you trust.
- Tune backoff_ms to your plugin's recovery character. OAuth refresh blips: 1-5s. Network outages: 5-30s. Heavy boot plugins: 5s+ to avoid wasting CPU on tight retry loops.
- Subscribe to plugin.lifecycle.> from a downstream system (audit log, alerting). The gave_up topic is the operator's clearest signal that human action is needed.
- Read stderr_tail on crashed events for a quick crash triage before tailing log files manually.
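The plugin.lifecycle.> subscription relies on NATS subject wildcards: `*` matches exactly one token and `>` matches one or more trailing tokens. A minimal sketch of that matching rule, for a downstream consumer that wants to route events itself. The function names are hypothetical, and the `plugin.lifecycle.<id>.gave_up` layout is assumed from the topic names mentioned in this section rather than taken from a schema.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if a NATS-style pattern matches a concrete subject.

    `*` matches exactly one token; `>` matches one or more trailing tokens.
    """
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, tok in enumerate(p_tokens):
        if tok == ">":
            # `>` must be the last pattern token and needs at least one token left.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if tok != "*" and tok != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


def needs_human(subject: str) -> bool:
    # Assumed topic layout: plugin.lifecycle.<id>.<event>
    return subject_matches("plugin.lifecycle.*.gave_up", subject)
```

An alerting hook could call `needs_human(msg_subject)` on every message received from the wildcard subscription and page only on a match.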
Web Search plugin
Multi-provider web search (Brave / Tavily / DuckDuckGo / Perplexity) for Nexo agents. Subprocess binary; the daemon discovers + spawns it via [plugin.entrypoint].
Phase 95: extracted from crates/web-search/ to the standalone subprocess plugin nexo-rs-plugin-web-search v0.1.0. The daemon's web_search_router field on AgentContext / AgentRuntime was removed (nexo-core 0.2.0 breaking).
Install
cargo install nexo-plugin-web-search
The binary lands at $HOME/.cargo/bin/nexo-plugin-web-search.
Discovery walker probes it with --print-manifest and
auto-registers.
Operator config
<config_dir>/plugins/web-search.yaml:
instances:
- id: default # required, unique
# agent_id omitted → shared across all agents
providers:
brave:
api_key_path: ./secrets/brave_api_key.txt
timeout_ms: 8000
tavily:
api_key_path: ./secrets/tavily_api_key.txt
timeout_ms: 10000
duckduckgo:
timeout_ms: 12000 # no API key required
cache:
enabled: true
path: ./data/web_search_cache.db
ttl_secs: 3600
default_order: [brave, tavily, duckduckgo]
Multi-instance × multi-agent
Power-users with several agents each wanting their own search
profile declare multiple instances: entries. Optional
agent_id per instance scopes it to that single agent:
instances:
- id: default # shared baseline
providers: { duckduckgo: {} }
default_order: [duckduckgo]
- id: research # private for ana
agent_id: ana
providers:
perplexity:
api_key_path: ./secrets/ana_perplexity.txt
cache: { path: ./data/ana_research.db }
default_order: [perplexity]
- id: news # another private for ana
agent_id: ana
providers:
brave:
api_key_path: ./secrets/ana_brave.txt
cache: { enabled: false }
default_order: [brave]
Resolution per agent's web_search call:
1. args.instance if operator-supplied.
2. Agent's first private instance from the by_agent map.
3. First shared instance (no agent_id).
4. Error if none.
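The resolution order above can be sketched as a small lookup function. This is an illustration under stated assumptions, not the plugin's code: instances are modelled as plain dicts in file order, `resolve_instance` is a hypothetical name, and the sketch does not check whether an explicitly requested instance is scoped to a different agent, since the text does not say how that case is handled.

```python
from typing import Optional


def resolve_instance(instances: list[dict], agent_id: str,
                     requested: Optional[str] = None) -> dict:
    """Pick the search instance for one web_search call."""
    if requested is not None:
        # Step 1: an explicit args.instance wins.
        for inst in instances:
            if inst["id"] == requested:
                return inst
        raise LookupError(f"unknown instance: {requested}")
    # Step 2: first private instance declared for this agent, in file order.
    for inst in instances:
        if inst.get("agent_id") == agent_id:
            return inst
    # Step 3: otherwise the first shared instance (no agent_id).
    for inst in instances:
        if "agent_id" not in inst:
            return inst
    # Step 4: nothing usable.
    raise LookupError(f"no web_search instance available for agent {agent_id}")
```

With the three-instance config shown earlier, agent ana resolves to research by default, while any other agent falls back to default.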
Tool surface
web_search arguments:
| Field | Required | Description |
|---|---|---|
| query | yes | Search query string. |
| count | no | 1-10; defaults from per-binding policy. |
| instance | no | Search profile id. Absent → agent's default. |
| provider | no | Provider override: brave/tavily/duckduckgo/perplexity. |
| freshness | no | Time window: day/week/month/year. |
| country | no | ISO-3166 alpha-2. |
| language | no | ISO-639-1. |
| expand | no | v0.1.0 no-op; v0.2.0 follow-up. |
Per-binding policy fields (agents.yaml::inbound_bindings[].web_search):
| Field | Default | Effect |
|---|---|---|
| enabled | false | Gate. False blocks all web_search calls on this binding (returns Denied). |
| provider | "auto" | Default provider override. args.provider wins. |
| default_count | 5 | Default count when LLM omits it. |
| cache_ttl_secs | 600 | Per-router cache TTL hint. |
| expand_default | false | Default expand arg. |
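To make the precedence between the two tables concrete, here is a hedged sketch of how policy defaults and call arguments could combine. `effective_args` is a hypothetical helper; clamping `count` into the documented 1-10 range is an assumption (the plugin may reject out-of-range values instead).

```python
def effective_args(policy: dict, args: dict) -> dict:
    """Combine per-binding policy defaults with the LLM-supplied arguments."""
    if not policy.get("enabled", False):
        # The gate defaults to false, so an empty policy denies the call.
        raise PermissionError("Denied: web_search is disabled on this binding")
    count = args.get("count", policy.get("default_count", 5))
    provider = args.get("provider", policy.get("provider", "auto"))
    return {
        "query": args["query"],
        "count": max(1, min(10, count)),  # assumed clamp to the documented 1-10 range
        "provider": provider,             # args.provider wins over the policy default
        "expand": args.get("expand", policy.get("expand_default", False)),
    }
```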
Admin RPCs
| Method | Params | Reply |
|---|---|---|
| nexo/admin/web_search/bot_info | {} | plugin metadata + instance counts |
| nexo/admin/web_search/cache_stats | {instance?} | per-instance status |
| nexo/admin/web_search/cache_clear | {instance?} | placeholder (v0.2.0) |
| nexo/admin/web_search/provider_status | {} | per-instance configured providers |
| nexo/admin/web_search/list_instances | {} | full instances + by_agent + shared map |
Metrics
Metrics are served in Prometheus exposition format via the broker scrape topic plugin.web_search.metrics.scrape. The daemon's /metrics aggregator appends the plugin's output to its own.
Source
github.com/lordmacu/nexo-rs-plugin-web-search
— crates.io: nexo-plugin-web-search 0.1.0.
Installing personas — nexo persona install
A persona pack bundles an out-of-tree agent definition (system prompt + plugin bindings + workspace seed + secrets templates) that operators install into their nexo daemon. Distinct from a plugin (plugins register CODE; personas register CONFIG that consumes that code). Authored as a v2 manifest pack and published as a GitHub Release; the daemon resolves + downloads + verifies + extracts under the operator's configured search path.
v1 vs v2. The legacy install.sh-driven flow (v1 manifest) stays supported for airgapped hosts + CI. nexo persona install only consumes v2 manifests (manifest_version = 2); a v1 pack errors with a clear migration hint pointing at install.sh.
Quickstart
# Install the latest release of a persona from GitHub:
nexo persona install lordmacu/nexo-persona-cody
# Pin to a specific release tag:
nexo persona install lordmacu/nexo-persona-cody@v0.2.0
# JSON output for CI:
nexo persona install lordmacu/nexo-persona-cody --json
# List every installed persona:
nexo persona list
# Remove (with confirmation gate):
nexo persona remove cody # prints what WOULD be removed
nexo persona remove cody --yes # actually removes
Subcommands
| Command | Purpose |
|---|---|
| nexo persona install <owner>/<repo>[@<tag>] | Resolve + verify + extract a v2 persona pack. |
| nexo persona list | Walk every configured search path, render every installed persona. |
| nexo persona remove <id> [--yes] | Atomic removal of the install dir for <id>. |
| nexo persona get <id> | Print the full manifest + computed contributes paths for <id>. |
| nexo persona upgrade <id> | Re-resolve the installed persona's source repo at latest + install if newer. Refuses to downgrade. |
| nexo persona run <path> | Inner-loop dev: validate a local persona pack + boot the daemon with its parent dir prepended to personas.discovery.search_paths. Mirror of nexo plugin run. |
| nexo persona help | Print the help text inline. |
Flags
install
| Flag | Default | Effect |
|---|---|---|
| --dest <dir> | cfg.personas.discovery.search_paths[0] (or <state_dir>/personas/) | Override the install root. Must be absolute. |
| --target <triple> | Daemon's host triple (NEXO_INSTALL_TARGET env wins) | Asset-matching target. Persona packs typically publish noarch only; the resolver falls back automatically. |
| --json | off | Emit a JSON envelope instead of human-readable lines. CI-friendly. |
list
| Flag | Effect |
|---|---|
| --json | JSON array under { "personas": [...] }. |
remove
| Flag | Effect |
|---|---|
| --yes | Required; without it the command prints what it WOULD remove and exits 0. |
| --json | Same JSON envelope as install. |
get
Prints id / version / description / homepage / install_root + every
contributes.agent_configs and contributes.plugin_configs_partial
path resolved to absolute. JSON variant returns the full manifest
sections (requires, meta) too — CI can grep specific fields
without re-parsing the on-disk TOML.
| Flag | Effect |
|---|---|
| --json | Emit the typed manifest payload as JSON instead of human lines. |
upgrade
Inspects cfg.personas.discovery.search_paths, finds the installed
persona by id, extracts its source GitHub repo from
manifest.persona.homepage, hits the GitHub Releases API at
/releases/latest, and re-runs the install pipeline if the resolved
version is strictly newer than the on-disk one. Refuses to
downgrade (use nexo persona install <coords>@<tag> to pin if
intentional).
| Flag | Effect |
|---|---|
| --json | Same JSON envelope as install. |
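The "strictly newer, never downgrade" rule comes down to comparing parsed version tuples rather than strings. A minimal sketch, assuming plain X.Y.Z versions with an optional leading v; pre-release and build suffixes are deliberately out of scope, and both function names are hypothetical.

```python
def parse_version(tag: str) -> tuple[int, int, int]:
    """Parse 'v1.2.3' or '1.2.3' into a comparable tuple (no pre-release support)."""
    major, minor, patch = tag.removeprefix("v").split(".")
    return int(major), int(minor), int(patch)


def should_upgrade(installed: str, latest: str) -> bool:
    """True only when the latest release is strictly newer than the on-disk one."""
    return parse_version(latest) > parse_version(installed)
```

Comparing tuples of integers avoids the string-ordering trap where "0.10.0" sorts before "0.9.0".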
run
Inner-loop dev — point the daemon at a local persona pack without
going through the install + verify pipeline. Validates the path's
persona.toml, prepends the pack's parent dir to
cfg.personas.discovery.search_paths (so the boot-time F5
discovery picks it up as <parent>/<id>-<version>/), then falls
through to the daemon boot path.
# Develop a persona locally:
mkdir -p /tmp/dev/cody-0.99.0
$EDITOR /tmp/dev/cody-0.99.0/persona.toml
nexo persona run /tmp/dev/cody-0.99.0
| Flag | Effect |
|---|---|
| --json | Emit the override payload as JSON before daemon boot starts streaming logs. |
Configuration — personas/discovery.yaml
Lives at <config_dir>/personas/discovery.yaml. Optional —
absent file means no scan happens (the daemon boots with an
empty persona catalog).
discovery:
search_paths:
- /var/lib/nexo/personas # default for system installs
- /home/operator/.nexo/personas # default for user installs
disabled: [] # ids skipped even when found
allowlist: [] # empty = accept any; non-empty = whitelist
The CLI consumes the same config: nexo persona list walks
search_paths and applies the disabled / allowlist filters.
Layout on disk
After a successful install, the pack lives under:
<install_root>/
<id>-<version>/
persona.toml
agents.d/
<agent>.yaml
plugins/
<plugin>.partial.yaml
secrets/
<secret>.txt.template
data/
workspace/...
The <id>-<version> shape mirrors the plugin install layout (Phase
31.1.b) so operators familiar with one immediately read the other.
Re-installing the same id+version short-circuits via the
idempotency check (no re-download, returns was_already_present: true).
Boot-time discovery
When the daemon starts, after plugins.start_all it walks
cfg.personas.discovery.search_paths, parses + validates every
<id>-<version>/persona.toml, applies the disabled / allowlist
filters, and registers each survivor in an in-memory persona
catalog. Discovery is best-effort: malformed / unparseable packs are
logged at WARN and skipped rather than aborting boot.
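The walk-and-filter behaviour described above can be approximated in a few lines. This sketch only inspects directory names and the presence of persona.toml; the real daemon also parses and validates the manifest. `discover` is a hypothetical name, and splitting `<id>-<version>` on the last hyphen assumes the version itself contains no hyphen.

```python
from pathlib import Path


def discover(search_paths, disabled=(), allowlist=()):
    """Return (id, version, path) for each pack that passes the filters."""
    found = []
    for root in map(Path, search_paths):
        if not root.is_dir():
            continue  # a missing search path is not an error
        for pack in sorted(root.iterdir()):
            if not (pack / "persona.toml").is_file():
                continue
            # Directory name is <id>-<version>; split on the last hyphen
            # (assumes the version has no hyphen of its own).
            pack_id, sep, version = pack.name.rpartition("-")
            if not sep:
                continue  # malformed name: skip instead of aborting
            if pack_id in disabled:
                continue
            if allowlist and pack_id not in allowlist:
                continue  # a non-empty allowlist acts as a whitelist
            found.append((pack_id, version, pack))
    return found
```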
Kill switch — NEXO_DISABLE_BUNDLED_PERSONAS
Set to 1 / true / on to skip discovery entirely, regardless of
cfg.personas.discovery.search_paths. The daemon's in-memory
catalog stays empty; the CLI still works against the on-disk dirs
(it re-runs discovery itself).
export NEXO_DISABLE_BUNDLED_PERSONAS=1
nexo daemon
Useful for hardened deployments that want to refuse all out-of-tree
persona packs at the daemon level even when the search paths config
still references dirs. Surfaces in nexo doctor capabilities as a
Medium risk toggle (Phase F7 of cody-cli-install).
Wire shape — release JSON conventions
A v2 persona release on GitHub must publish these assets at the release tag:
| Asset | Required | Purpose |
|---|---|---|
| persona.toml | yes | The v2 manifest. |
| <id>-<version>-<target>.tar.gz OR <id>-<version>-noarch.tar.gz | yes (one of) | The pack tarball. noarch is the fallback when no per-target asset exists. |
| <tarball>.sha256 | yes | Single line of lowercase hex (64 chars). |
| <tarball>.sig + <tarball>.cert | optional | Cosign material; when both are present, the resolver records them in the resolved entry (verification gates land in a follow-up wave). |
The naming convention mirrors nexo plugin install (Phase 31.1.c)
so a single CI workflow can publish both flavors with the same
tooling (cargo dist, gh release upload).
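For pack authors who want to check a release locally before publishing, the .sha256 convention is easy to verify with the standard library. A minimal sketch; `verify_sha256` is a hypothetical helper and tolerating a trailing newline in the sidecar file is an assumption.

```python
import hashlib
import re


def verify_sha256(tarball: bytes, sidecar_text: str) -> bool:
    """Check tarball bytes against the contents of a <tarball>.sha256 file."""
    expected = sidecar_text.strip()
    # The convention is a single line of 64 lowercase hex characters.
    if not re.fullmatch(r"[0-9a-f]{64}", expected):
        raise ValueError("sha256 sidecar must be 64 lowercase hex chars")
    return hashlib.sha256(tarball).hexdigest() == expected
```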
Errors
| Symptom | Cause | Fix |
|---|---|---|
| release tag does not parse as semver | Tag uses release-1.2.3 or another non-semver shape. | Re-tag as vX.Y.Z. |
| release is missing required asset persona.toml | The release JSON has no persona.toml asset. | Upload the manifest as a release asset matching the convention. |
| persona id violates id regex | The manifest's [persona] id has uppercase / spaces / etc. | Rename to ^[a-z0-9][a-z0-9-]{2,63}$. |
| v1 packs install via the persona's install.sh | Manifest declares manifest_version = 1. | Bump to 2 (no field-shape changes); same TOML re-parses. |
| tar entry path contains ..; rejected for safety | Malicious / malformed tarball. | Re-pack ensuring every entry path is relative + traversal-free. |
| persona install root must be an absolute path | --dest <relative>. | Pass an absolute path. |
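Two of these errors can be caught in a pack's own CI before a release is cut. The sketch below reuses the id regex from the errors table and applies a simple relative, traversal-free test to tar entry names; the helper names are hypothetical and the installer's actual checks may be stricter.

```python
import re
from pathlib import PurePosixPath

PERSONA_ID = re.compile(r"^[a-z0-9][a-z0-9-]{2,63}$")  # regex from the errors table


def is_valid_persona_id(persona_id: str) -> bool:
    return PERSONA_ID.fullmatch(persona_id) is not None


def is_safe_entry(name: str) -> bool:
    """Accept only relative tar entry paths with no `..` component."""
    path = PurePosixPath(name)
    return not path.is_absolute() and ".." not in path.parts
```

In practice `is_safe_entry` would be applied to every member name returned by `tarfile.TarFile.getnames()` before uploading the tarball.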
Related
- Persona pack manifest schema (persona.toml): see the Cody pack README for the v2 manifest shape (a dedicated docs page is a TBD follow-up).
- Plugin install (nexo plugin install): sister CLI surface; the persona installer reuses ~60 % of the resolve + download + sha256-verify plumbing.
- Broker shapes: local vs. NATS vs. embedded (orthogonal, but referenced by personas declaring [persona.requires] features).
Manifest (plugin.toml)
Every extension ships a plugin.toml at its root. It declares
identity, transport, capabilities, runtime requirements, and any
bundled MCP servers. The runtime parses and validates the manifest
before spawning anything.
Source: crates/extensions/src/manifest.rs.
Minimal example
[plugin]
id = "weather"
version = "0.1.0"
name = "Weather"
description = "Fetch weather by city name."
min_agent_version = "0.1.0"
priority = 0
[capabilities]
tools = ["get_weather"]
hooks = []
[transport]
type = "stdio"
command = "./weather"
args = []
[requires]
bins = ["curl"]
env = ["WEATHER_API_KEY"]
[context]
passthrough = false
[meta]
author = "you"
license = "MIT OR Apache-2.0"
Sections
[plugin]
| Field | Required | Purpose |
|---|---|---|
| id | ✅ | Unique id. Regex ^[a-z][a-z0-9_-]*$, ≤ 64 chars. Must not be a reserved id (see below). |
| version | ✅ | Semver. |
| name | — | Human-readable label. |
| description | — | ≤ 512 UTF-8 chars. |
| min_agent_version | — | Semver. Checked against the running agent version at load time. |
| priority | — | i32, default 0. Lower fires first in hook chains. |
Reserved ids: agent, browser, core, email, heartbeat,
memory, telegram, whatsapp. The host may register more via
register_reserved_ids().
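The three id rules (regex, length, reserved list) can be checked together so an author sees every problem at once. A sketch only: `check_plugin_id` is a hypothetical helper, the reserved set is the default list above, and ids the host adds via register_reserved_ids() are not modelled.

```python
import re

ID_RE = re.compile(r"[a-z][a-z0-9_-]*")
RESERVED = {"agent", "browser", "core", "email", "heartbeat",
            "memory", "telegram", "whatsapp"}


def check_plugin_id(plugin_id: str) -> list[str]:
    """Return a list of problems; an empty list means the id is acceptable."""
    problems = []
    if not ID_RE.fullmatch(plugin_id):
        problems.append("id must match ^[a-z][a-z0-9_-]*$")
    if len(plugin_id) > 64:
        problems.append("id must be at most 64 chars")
    if plugin_id in RESERVED:
        problems.append("id is reserved")
    return problems
```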
[capabilities]
[capabilities]
tools = ["get_weather", "get_forecast"]
hooks = ["before_message", "after_tool_call"]
channels = []
providers = []
At least one capability list must be non-empty. Names match
^[a-z][a-z0-9_]*$, ≤ 64 chars, no duplicates.
[transport]
One of three forms:
# stdio — spawn a child process
[transport]
type = "stdio"
command = "./my-extension"
args = ["--verbose"]
# nats — talk over a NATS subject prefix
[transport]
type = "nats"
subject_prefix = "ext.myext"
# http — call over HTTP
[transport]
type = "http"
url = "https://localhost:8080"
Validation: command, subject_prefix, url non-empty; url must
be http(s)://.
[requires]
[requires]
bins = ["ffmpeg", "imagemagick"]
env = ["OPENAI_API_KEY"]
Declarative preconditions used for gating: when the runtime
discovers the extension, it calls Requires::missing(). If any
bins is not on $PATH or any env is unset, the extension is
skipped (warn, not fail) and its tools are not registered.
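The gating check is simple enough to reproduce in a pre-deploy script. The following is a Python approximation of what Requires::missing() is described as doing, not a port of it: `missing_requirements` is a hypothetical name, and treating an empty-but-present variable as set is an assumption.

```python
import os
import shutil


def missing_requirements(bins, env, environ=None):
    """Return the declared bins not on PATH and the env vars that are unset."""
    environ = os.environ if environ is None else environ
    missing_bins = [b for b in bins
                    if shutil.which(b, path=environ.get("PATH")) is None]
    missing_env = [name for name in env if name not in environ]
    return missing_bins, missing_env
```

Calling `missing_requirements(["curl"], ["WEATHER_API_KEY"])` on the target host shows in advance whether the weather example would be skipped at discovery.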
[context]
[context]
passthrough = true
When true, every tool call sent to this extension has
_meta = { agent_id, session_id } injected into the JSON args. Lets
the extension tell calls apart per-agent without the runtime having
to encode the split into every tool signature.
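On the extension side, a handler can peel `_meta` off the arguments and key its state by agent and session. A minimal sketch assuming `_meta` arrives as a top-level key of the args object, as described above; `split_meta` and `count_calls` are hypothetical names.

```python
def split_meta(args: dict) -> tuple[dict, dict]:
    """Separate the injected `_meta` object from the tool's own arguments."""
    tool_args = {k: v for k, v in args.items() if k != "_meta"}
    return tool_args, args.get("_meta", {})


_counters: dict[tuple[str, str], int] = {}


def count_calls(args: dict) -> int:
    """Example handler: keep one call counter per (agent, session) pair."""
    _, meta = split_meta(args)
    key = (meta.get("agent_id", ""), meta.get("session_id", ""))
    _counters[key] = _counters.get(key, 0) + 1
    return _counters[key]
```

With passthrough = false the `_meta` key is absent, so the sketch falls back to a single shared counter under the empty key.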
[mcp_servers] (phase 12.7)
Inline MCP server declarations bundled with the extension:
[mcp_servers.gmail]
type = "stdio"
command = "./gmail-mcp"
args = []
[mcp_servers.calendar]
type = "streamable_http"
url = "https://mcp.example.com/calendar"
Each server name must match ^[a-z][a-z0-9_-]*$, ≤ 32 chars. Alternatively, drop a sidecar .mcp.json next to plugin.toml if the
manifest has no [mcp_servers] section.
Validation at a glance
flowchart TD
READ[read plugin.toml] --> PARSE[parse TOML]
PARSE --> ID{id valid?<br/>regex + length<br/>+ not reserved}
ID --> VER{version<br/>valid semver?}
VER --> MIN{min_agent_version<br/>satisfied?}
MIN --> CAPS{at least one<br/>capability declared?}
CAPS --> NAMES{capability names<br/>valid + unique?}
NAMES --> TRANS{transport<br/>non-empty +<br/>http scheme valid?}
TRANS --> MCP{mcp_server names<br/>valid?}
MCP --> OK([Manifest accepted])
ID --> FAIL([Diagnostic: Error])
VER --> FAIL
MIN --> FAIL
CAPS --> FAIL
NAMES --> FAIL
TRANS --> FAIL
MCP --> FAIL
Any failure produces a DiagnosticLevel::Error in the discovery
report — the candidate is dropped but scanning continues so an
operator sees every broken manifest at once.
Agent-version gating
[plugin]
min_agent_version = "0.2.0"
On load the runtime compares against the agent build version. A
mismatch logs a diagnostic and drops the candidate. Useful for
shipping a manifest that relies on a newer host API without
crash-looping older deployments. The host can override the reported
version for tests via set_agent_version().
Next
- Discovery and NATS runtime: how the manifest drives spawn
- CLI: agent ext validate <path> checks a manifest without touching the registry
- Templates: prebuilt skeletons to copy
Extension patterns
Common shapes for nexo extensions. An extension is a self-contained
directory with a manifest.toml that declares contributed tools,
advisors, skills, MCP servers, channel adapters, and config schemas.
Operators install with nexo ext install ./your-extension.
Pick the closest match; copy the skeleton; modify.
Pattern 1 · Tool bundle
When to use · You have 3-10 related tools (e.g. CRM ops: crm_lookup, crm_create_contact, crm_update_deal, crm_close_deal) and you want to ship them as a unit.
A tool bundle is the simplest extension. Each tool gets its own
JSON schema + handler binary (or in-process Rust function). The
manifest enumerates them; the daemon registers all on
nexo ext install.
[extension]
id = "crm-tools"
version = "0.2.0"
description = "Salesforce-style CRM operations"
[[tools]]
name = "crm_lookup"
schema_path = "tools/crm_lookup.json"
binary = "./bin/crm-tools"
[[tools]]
name = "crm_create_contact"
schema_path = "tools/crm_create_contact.json"
binary = "./bin/crm-tools"
[[tools]]
name = "crm_close_deal"
schema_path = "tools/crm_close_deal.json"
binary = "./bin/crm-tools"
The binary is a single executable that dispatches by tool name.
Operators add the tool names to agents.yaml once installed.
Pattern 2 · Advisor pack
When to use · You're shipping domain-specific personas (sales, legal-review, customer-support escalation) that other operators can drop into their agents.
Each advisor is a markdown system-prompt file the agent prepends to its base persona when handling specific topics. Bundle 3-8 together for a vertical.
[extension]
id = "sales-advisor-pack"
version = "0.1.0"
description = "BANT-style qualification + handoff prompts"
[[advisors]]
id = "bant-qualifier"
prompt_path = "advisors/bant_qualifier.md"
[[advisors]]
id = "objection-handler"
prompt_path = "advisors/objection_handler.md"
[[advisors]]
id = "demo-booker"
prompt_path = "advisors/demo_booker.md"
advisors/bant_qualifier.md:
You are a BANT-trained sales qualifier. For every inbound message,
internally score:
- Budget: ...
- Authority: ...
- Need: ...
- Timeline: ...
Only progress to demo-booker advisor when score >= 70.
Pattern 3 · Skill bundle
When to use · You have multi-step workflows (send-quote, escalate-to-human, handoff-to-team) that aren't single LLM turns; they need scripted sequences with branching.
Skills are YAML-defined workflows the agent can invoke. Multi-step with conditionals and tool calls. The extension ships YAML + referenced templates.
[extension]
id = "support-skills"
version = "0.3.1"
[[skills]]
id = "escalate-to-human"
yaml_path = "skills/escalate.yaml"
[[skills]]
id = "schedule-followup"
yaml_path = "skills/followup.yaml"
skills/escalate.yaml:
id: escalate-to-human
description: "Hand off to a human on a Telegram channel"
steps:
- tool: format_transcript
args: { last_n: 10 }
- tool: telegram_post
args:
channel: ${ESCALATION_CHANNEL}
message: |
⚠ Escalation request from ${user_id}
Summary: ${summary}
Transcript: ${transcript_url}
- reply: "Te conecto con un agente humano. Te responderá pronto."
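The `${...}` placeholders in the skill file suggest a substitution pass over each step before it runs. How the runtime actually resolves them (environment, conversation context, or both) is not stated here, so the sketch below only illustrates the general technique with Python's string.Template; `render` is a hypothetical name.

```python
from string import Template


def render(value, variables: dict):
    """Recursively substitute ${name} placeholders in strings, lists, and dicts."""
    if isinstance(value, str):
        # substitute() raises KeyError on a missing variable instead of emitting a blank.
        return Template(value).substitute(variables)
    if isinstance(value, list):
        return [render(v, variables) for v in value]
    if isinstance(value, dict):
        return {k: render(v, variables) for k, v in value.items()}
    return value  # numbers, booleans, None pass through unchanged
```

Failing loudly on a missing variable is a deliberate choice in this sketch: a blank escalation channel is harder to notice than an error.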
Pattern 4 · MCP server bundle
When to use · You're wrapping an external service as an MCP server so multiple agents can use it.
The extension ships a binary that speaks MCP (stdio or HTTP+SSE). Operators register the MCP server via the manifest; agents see its tools as native ones.
[extension]
id = "github-mcp"
version = "1.0.0"
[[mcp_servers]]
id = "github"
command = "./bin/github-mcp"
transport = "stdio"
env_passthrough = ["GITHUB_TOKEN"]
→ See Building an MCP server extension for the full walkthrough.
Pattern 5 · Multi-tenant SaaS extension
When to use · You're building a vertical SaaS (sales / support / marketing) where each tenant gets the same toolkit but isolated state, scoped credentials, per-tenant audit logs.
The extension declares multi_tenant.isolated_state = true. The
framework partitions tool state, credentials, and skill output
per tenant_id. Agents bound to a tenant only see that tenant's
data.
[extension]
id = "sales-saas"
version = "1.2.0"
[[tools]]
name = "crm_lookup"
schema_path = "tools/crm_lookup.json"
[[advisors]]
id = "bant-qualifier"
prompt_path = "advisors/bant.md"
[multi_tenant]
isolated_state = true # state stored under tenant scope
per_tenant_secrets = true # secrets resolved per tenant
audit_per_tenant = true # audit log scoped per tenant
[multi_tenant.quotas]
default = { llm_tokens_month = 1_000_000, agents = 3 }
The microapp layer (above) provisions tenants + assigns this extension to them via admin RPC.
Pattern 6 · Channel adapter pack
When to use · You're contributing a new channel kind that's not a subprocess plugin (e.g. a stdlib-friendly one that fits inline as a daemon module).
The extension declares a channel adapter implementation. The
framework registers it with the channel registry; agents reference
it via channels: [<kind>:<instance>] in agents.yaml.
[extension]
id = "discord-channel"
version = "0.1.0"
[[channel_adapters]]
kind = "discord"
adapter_module = "discord_adapter" # rust crate path or shared lib
config_schema_path = "discord_config.json"
Most channels ship as plugins (subprocess), not extensions. Use this pattern only when the adapter must run in-process for performance or to share daemon state directly.
Pattern 7 · Config schema extension
When to use · You want to expose a new YAML config block that operators set in agents.yaml or a new file under config/.
The extension declares a JSON Schema for the new config; the
daemon merges it into nexo doctor config validation and
nexo agent doctor reports.
[extension]
id = "billing-config"
version = "0.1.0"
[[config_schemas]]
section = "billing"
schema_path = "billing.schema.json"
yaml_files = ["billing.yaml"]
Operator's config/billing.yaml:
billing:
provider: stripe
webhook_secret: ${STRIPE_WEBHOOK_SECRET}
default_plan: pro
nexo doctor config will validate the file against your schema.
Pattern 8 · Knowledge-base loader
When to use · You're shipping a curated KB (FAQs, runbooks, playbooks) that should land in the operator's vector store.
The extension ships markdown / JSON documents + a kb_loader
hook that imports them into the configured vector store on
install.
[extension]
id = "support-kb"
version = "1.0.0"
[[kb_collections]]
id = "support-faqs"
loader = "./bin/load-faqs" # binary that reads docs/ and emits chunks
docs_dir = "docs/"
embedding_model = "minimax-embed"
The loader runs once at install time + re-runs whenever the
operator updates the extension version. Output lands in
<state_dir>/<tenant_id>/vector/support-faqs/.
Choosing between patterns
| If you... | Use |
|---|---|
| Have related tools to ship together | Tool bundle (1) |
| Have domain-specific persona prompts | Advisor pack (2) |
| Have multi-step scripted workflows | Skill bundle (3) |
| Wrap an external service as MCP | MCP server bundle (4) |
| Build a vertical SaaS | Multi-tenant SaaS (5) |
| Add a new in-process channel kind | Channel adapter (6) |
| Add a new config section | Config schema (7) |
| Ship a curated knowledge base | KB loader (8) |
Plugin vs Extension — quick decision
If you find yourself between Plugin and Extension:
- Choose Plugin when: the work is a separate process, runs in a non-Rust language, or interacts with an external service that has its own connection lifecycle (WebSocket, gateway, push).
- Choose Extension when: the work is in-process Rust, ships with curated assets (advisors / skills / KBs), or needs tight multi-tenant state isolation.
See also
- Manifest reference: full TOML schema.
- Templates: copy-and-modify starters.
- CLI: nexo ext install/list/doctor.
- Multi-tenant SaaS guide: full walkthrough of pattern 5.
Templates
The repo ships two extension templates as starting points. Copy one, rename it, fill in the tools, done.
Location: extensions/template-rust/ and extensions/template-python/.
What's shared
Both templates follow the same wire protocol and directory shape:
<your-ext>/
├── plugin.toml # manifest (see ./manifest.md)
├── README.md # what the extension does
├── <binary or script> # stdio-RPC entry point
└── ... # build files specific to the language
The agent talks to both in the same JSON-RPC 2.0 shape:
- initialize: handshake; returns {server_version, tools, hooks}
- tools/<name>: tool invocation; returns the tool's result
- hooks/<name>: hook invocation (when any hook is declared)
Line-delimited JSON over stdin/stdout. stderr is forwarded to the agent's tracing output — that's your debug log.
Rust template (extensions/template-rust/)
Standalone Cargo project outside the agent workspace — its own
Cargo.toml, own Cargo.lock, own target/. Keeps your extension's
deps independent of the agent's.
template-rust/
├── Cargo.toml
├── Cargo.lock
├── plugin.toml
├── README.md
├── src/
│ └── main.rs # JSON-RPC loop
└── target/ # (gitignore)
src/main.rs implements:
// pseudocode
loop {
    let line = read_line_from_stdin();
    let req: JsonRpcRequest = parse(line);
    let result = match req.method.as_str() {
        "initialize" => handshake_info(),
        "tools/ping" => ping(req.params),
        "tools/add" => add(req.params),
        "hooks/before_message" => pass(),
        _ => method_not_found(),
    };
    // Echo req.id so the agent can correlate the reply.
    write_line_to_stdout(json!({ "jsonrpc": "2.0", "id": req.id, "result": result }));
}
Build with cargo build --release; the release binary at
./target/release/template-rust is what plugin.toml::transport.command
points at.
Python template (extensions/template-python/)
template-python/
├── plugin.toml
├── main.py # #!/usr/bin/env python3
└── README.md
stdlib only (no pip install). Same JSON-RPC loop over stdin/stdout.
Logs to stderr via print(..., file=sys.stderr).
Good for quick extensions where Python's startup time and throughput are acceptable (batch workloads, cron-ish tasks, one-off scripting). The interpreter is started once per extension, not once per tool call.
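For orientation, a stdlib-only loop in the spirit of the Python template might look like the sketch below. It is not the shipped main.py: the ping and add tool names are borrowed from the Rust pseudocode, while the parameter shapes, the version string, and the exact form of the initialize payload (here a list of tool names) are assumptions.

```python
import json
import sys

TOOLS = {
    "ping": lambda params: {"pong": True},
    "add": lambda params: {"sum": params["a"] + params["b"]},
}


def handle_line(line: str) -> str:
    """Turn one JSON-RPC request line into one response line."""
    req = json.loads(line)
    method, req_id = req.get("method", ""), req.get("id")
    if method == "initialize":
        result = {"server_version": "0.1.0", "tools": sorted(TOOLS), "hooks": []}
    elif method.startswith("tools/") and method[6:] in TOOLS:
        result = TOOLS[method[6:]](req.get("params") or {})
    else:
        # -32601 is the JSON-RPC 2.0 "Method not found" code.
        return json.dumps({"jsonrpc": "2.0", "id": req_id,
                           "error": {"code": -32601, "message": "Method not found"}})
    # Always echo the request id so the agent can correlate the reply.
    return json.dumps({"jsonrpc": "2.0", "id": req_id, "result": result})


def serve(stdin=sys.stdin, stdout=sys.stdout) -> None:
    """Answer requests until stdin closes; never exit after a single call."""
    for line in stdin:
        if line.strip():
            stdout.write(handle_line(line) + "\n")
            stdout.flush()  # the agent reads line by line, so flush each reply
```

A real entry point would end with a call to `serve()`; diagnostics belong on stderr so they never corrupt the JSON stream on stdout.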
Promoting a template to your own extension
flowchart LR
COPY[copy template-rust<br/>to my-extension] --> EDIT[edit plugin.toml<br/>id, version, tools]
EDIT --> CODE[implement tools/...]
CODE --> BUILD[cargo build --release]
BUILD --> VAL[agent ext validate<br/>./my-extension/plugin.toml]
VAL --> INSTALL[agent ext install<br/>./my-extension --link --enable]
INSTALL --> DOCTOR[agent ext doctor<br/>--runtime]
Conventions in the shipped templates
- plugin.toml declares the minimum required capabilities: no phantom hooks or tools
- requires.bins / requires.env left empty; add your own
- [context] passthrough = false: opt in explicitly when you need per-agent / per-session state
- License left blank; pick one and add it to [meta]
Gotchas
- Rust template builds in its own workspace. Don't cargo add from the repo root; that edits the agent workspace, not the extension.
- Python template spawns a new interpreter per extension, not per tool call. Stdin/stdout stay open for the life of the process. Don't exit after one tool call.
- JSON-RPC ids must echo back. If your handler drops the id field, the agent can't correlate the reply.
CLI (agent ext)
Operator-facing commands for discovering, installing, validating, and
toggling extensions. Every subcommand accepts --json for scripting.
Source: crates/extensions/src/cli/.
Subcommands
agent ext list [--json]
agent ext info <id> [--json]
agent ext enable <id>
agent ext disable <id>
agent ext validate <path>
agent ext doctor [--runtime] [--json]
agent ext install <path> [--update] [--enable] [--dry-run] [--link] [--json]
agent ext uninstall <id> --yes [--json]
list — discovered extensions
Walks the configured search_paths, prints each candidate, its
transport, and its enabled/disabled state.
info <id> — manifest + status
Prints the full parsed manifest, the runtime state if the agent is currently running, and any diagnostics attached to the candidate.
enable / disable — toggle in extensions.yaml
Rewrites the disabled list in config/extensions.yaml:
extensions:
disabled: [weather]
No runtime side effect; operator must restart the agent to apply.
validate <path> — manifest check without registering
Parses and validates a plugin.toml at <path>. Good for CI checks
on an extension's manifest before shipping.
doctor — preflight checks
Runs the same Requires::missing() logic as discovery, plus
transport-specific checks:
flowchart TB
START([agent ext doctor]) --> DISC[discover candidates]
DISC --> REQ[check requires.bins + requires.env]
REQ --> RUNT{--runtime?}
RUNT -->|yes| SPAWN[spawn each stdio extension<br/>and handshake]
RUNT -->|no| DONE([report table])
SPAWN --> DONE
--runtime actually spawns each stdio extension and runs the
handshake — useful to catch a broken binary before production
boot.
install <path> — copy or symlink
Adds an extension to the active search_paths:
agent ext install ./extensions/weather
agent ext install /abs/path/to/my-ext --link --enable
- --update replaces an existing extension with the same id
- --enable adds it to extensions.yaml enabled (default: disabled until you enable)
- --dry-run prints what would happen without writing
- --link creates a symlink instead of copying; requires an absolute source path. Good for dev loops.
uninstall <id> --yes
Removes the extension's directory from the active search path (or the
symlink, in --link installs). --yes is mandatory — no accidental
destruction.
Exit codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Extension not found / --update target missing |
| 2 | Invalid manifest / invalid source / --link needs absolute path |
| 3 | Config write failed |
| 4 | Invalid id (reserved or empty) |
| 5 | Target exists (use --update) |
| 6 | Id collision across roots |
| 7 | uninstall missing --yes confirmation |
| 8 | Copy / atomic swap failed |
| 9 | Runtime check(s) failed (doctor --runtime) |
Non-zero codes are stable for scripting.
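Because the codes are stable, a deployment script can turn them into readable failures. A small sketch that mirrors the table above; `describe_exit` and `run_ext` are hypothetical helpers and the `agent` binary is assumed to be on PATH.

```python
import subprocess

EXIT_MEANINGS = {
    0: "success",
    1: "extension not found / --update target missing",
    2: "invalid manifest / invalid source / --link needs absolute path",
    3: "config write failed",
    4: "invalid id (reserved or empty)",
    5: "target exists (use --update)",
    6: "id collision across roots",
    7: "uninstall missing --yes confirmation",
    8: "copy / atomic swap failed",
    9: "runtime check(s) failed (doctor --runtime)",
}


def describe_exit(code: int) -> str:
    """Map an `agent ext` exit code to the meaning documented above."""
    return EXIT_MEANINGS.get(code, f"unknown exit code {code}")


def run_ext(*args: str) -> tuple[int, str]:
    """Run `agent ext <args>` and return (exit code, meaning)."""
    proc = subprocess.run(["agent", "ext", *args])
    return proc.returncode, describe_exit(proc.returncode)
```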
JSON mode
Every subcommand that produces human output also supports --json
for machine consumption. Fields are stable per code-phase; schema is
not officially frozen yet — pin to a specific agent version in CI.
Common ops flows
Ship an extension to staging
agent ext validate ./my-ext/plugin.toml
agent ext install ./my-ext --link --enable
agent ext doctor --runtime
Disable a flapping extension without redeploying
agent ext disable weather # writes to extensions.yaml
systemctl reload agent # or restart, depending on deployment
CI gate
# .github/workflows/extension.yml
- run: cargo build --release
- run: agent ext validate ./plugin.toml
Building a multi-tenant SaaS microapp (Phase 82 walkthrough)
This page connects the dots across Phase 82's primitives so a microapp author can ship a multi-tenant SaaS extension without re-deriving the architecture from each sub-phase doc. Every section maps directly to a primitive that's already built; the work is wiring them together for your specific shape.
What you get from Phase 82
| Primitive | Doc |
|---|---|
| BindingContext propagation (per-call agent + binding identity) | agents.md |
| Webhook receiver (single HTTP entry, YAML-routed to NATS) | ops/webhook-receiver.md |
| Outbound dispatch from extension (nexo/dispatch) | extensions/stdio.md |
| NATS event subject → agent turn binding | config/agents.md |
| Per-binding tool rate-limit | ops/per-binding-rate-limits.md |
| Per-extension state directory | extensions/state-management.md |
| Multi-tenant audit log filter (Phase 82.8) | inline below |
| Admin RPC (CRUD agents/credentials/pairing/llm/channels) | microapps/admin-rpc.md |
| Agent events firehose | microapps/admin-rpc.md |
| HTTP server capability | microapps/admin-rpc.md |
| Operator chat takeover | microapps/admin-rpc.md |
| Agent escalation | microapps/admin-rpc.md |
Reference scaffold
agent-creator is the reference SaaS-shaped microapp (out-of-tree
repo: see your operator's microapp registry for the URL). It uses
every primitive in this list and is the recommended starting
point for clone-and-adapt. The rest of this page assumes you've
checked it out alongside the daemon source.
Tenant onboarding flow
- Operator creates a row in your microapp's tenants table (see migrations/0001_tenants.sql). Each tenant carries an account_id: TEXT PRIMARY KEY that becomes the cross-cutting identifier through:
  - BindingContext.account_id on every inbound + tool call
  - goal_turns.account_id for audit isolation (Phase 82.8)
  - ProcessingScope::Conversation { account_id, … } for pause/resume (Phase 82.13)
  - EscalationEntry { agent_id, scope, … } where scope carries the account_id (Phase 82.14)
- The microapp creates per-tenant artifacts under state_dir_for(extension_id)/tenants/<account_id>/:
    ~/.nexo/extensions/agent-creator/state/tenants/acme/
    ├── leads.sqlite
    ├── opt_outs.sqlite
    └── credentials.json # encrypted at rest
- Operator binds the tenant to a channel via nexo/admin/credentials/register (Phase 82.10.d). The same bearer token gets both the channel's outbound write capability AND the per-tenant audit scope.
Channel binding
agents.yaml.<id>.inbound_bindings lists which channels the
agent answers. Each binding inherits the tenant's account_id
via the channel plugin's inbound shape (Phase 82.5
InboundMessageMeta). Provider plugins (whatsapp, telegram,
email, slack-mcp) are responsible for stamping account_id
onto the inbound — this is what threads tenancy through to
the audit log + rate-limit buckets + escalation scopes.
Credential vault pattern
Credentials are filesystem-backed (Phase 82.10.h.3
FilesystemCredentialStore):
secrets/<channel>/<instance>/payload.json
For multi-tenant, use <instance> = <account_id> so the
operator UI can rotate one tenant's bearer without touching
others. The Phase 82.12 token_hash helper lets the daemon
notify a microapp of rotation without putting the cleartext
old token on the wire.
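The layout above can be computed with a few lines of path handling. This is a sketch under assumptions: `credential_path` is a hypothetical helper (not the FilesystemCredentialStore API), the `root` parameter stands in for wherever the daemon keeps its secrets tree, and the component checks are an illustrative guard against path traversal.

```rust
use std::path::{Path, PathBuf};

/// Build `<root>/secrets/<channel>/<instance>/payload.json`.
/// For multi-tenant deployments, pass the tenant's account_id as `instance`.
/// Returns `None` when a component is empty or could escape the tree.
pub fn credential_path(root: &Path, channel: &str, instance: &str) -> Option<PathBuf> {
    // Hypothetical safety rule: a component must be a single plain path segment.
    let safe = |s: &str| {
        !s.is_empty() && s != "." && s != ".." && !s.contains('/') && !s.contains('\\')
    };
    if !safe(channel) || !safe(instance) {
        return None;
    }
    Some(
        root.join("secrets")
            .join(channel)
            .join(instance)
            .join("payload.json"),
    )
}
```

Keeping one directory per account_id is what makes a single tenant's rotation a single-file operation.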
Drip scheduler (or whatever cron-like flow you need)
Phase 82.4 + 82.4.b ships the NATS event subscriber runtime — extensions subscribe to a NATS subject and the daemon binds each event to an agent turn. For a per-tenant drip:
- Microapp publishes marketing.drip.fire.<account_id> on NATS at the cron tick.
- agents.yaml.<agent_id>.event_subscribers includes marketing.drip.fire.* (glob).
- Per-binding rate-limit (Phase 82.7, tool_rate_limits.<binding_id>.send_drip = 10/min) caps the per-tenant outbound velocity so a runaway tenant doesn't starve the others.
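To make the subject pattern concrete, here is a sketch of token-wise matching using standard NATS wildcard rules (`*` matches exactly one token, `>` matches one or more trailing tokens). Whether the daemon's glob follows exactly these rules is an assumption; `subject_matches` and `drip_account_id` are hypothetical helper names.

```rust
/// NATS-style subject match: `*` matches exactly one token,
/// `>` (only valid as the last token) matches one or more trailing tokens.
pub fn subject_matches(pattern: &str, subject: &str) -> bool {
    let mut pat = pattern.split('.');
    let mut sub = subject.split('.');
    loop {
        match (pat.next(), sub.next()) {
            // `>` swallows the rest of the subject, but must end the pattern.
            (Some(">"), Some(_)) => return pat.next().is_none(),
            // `*` accepts any single non-empty token.
            (Some("*"), Some(t)) if !t.is_empty() => continue,
            // Literal tokens must be equal.
            (Some(p), Some(t)) if p == t && !t.is_empty() => continue,
            // Both exhausted at the same time: full match.
            (None, None) => return true,
            _ => return false,
        }
    }
}

/// Extract the tenant from `marketing.drip.fire.<account_id>`.
pub fn drip_account_id(subject: &str) -> Option<&str> {
    let rest = subject.strip_prefix("marketing.drip.fire.")?;
    if rest.is_empty() || rest.contains('.') {
        None
    } else {
        Some(rest)
    }
}
```

Because `*` covers a single token, a subject with an extra segment after the account id would not match the subscriber pattern under these rules.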
Compliance hooks
- Redactor (Phase 10.4) runs inside TranscriptWriter::append_entry BEFORE persistence. Body bytes that hit disk are already redacted; the firehose emits the same redacted body. Microapps don't have to implement their own redaction: operator config in transcripts.yaml is the single point of control.
- Audit retention (Phase 82.10.h.1): operators set NEXO_MICROAPP_ADMIN_AUDIT_RETENTION_DAYS / NEXO_MICROAPP_ADMIN_AUDIT_MAX_ROWS. Boot sweep enforces both.
- Operator takeover (Phase 82.13): pause a single conversation with nexo/admin/processing/pause; agent goes silent while operator types a manual reply via nexo/admin/processing/intervention. Compliance teams use this for high-risk tenants.
Audit queries
For per-tenant billing / support, query the audit log scoped to one tenant:
use chrono::{Duration, Utc};
use nexo_agent_registry::SqliteTurnLogStore;

let rows = store
    .tail_for_account("acme", Utc::now() - Duration::days(30), 500)
    .await?;
The store filters strictly by account_id and excludes legacy
NULL rows. Cross-tenant probes return an empty list (not an
error) — defense in depth against existence oracles. Operator-
scoped tools (tail, tail_since) keep returning every row
including legacy NULL.
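The filtering rules above can be modelled in memory. This sketch is not the SqliteTurnLogStore implementation: `TurnRow`, the integer timestamps, and the newest-first ordering are assumptions made so the example stays dependency-free.

```rust
/// Simplified audit row. `account_id == None` models a legacy NULL row.
#[derive(Debug, Clone, PartialEq)]
pub struct TurnRow {
    pub account_id: Option<String>,
    pub at: u64, // unix seconds (assumed representation)
    pub summary: String,
}

/// Tenant-scoped tail: strict match on account_id, legacy rows excluded,
/// unknown tenants yield an empty list rather than an error.
pub fn tail_for_account<'a>(
    rows: &'a [TurnRow],
    account: &str,
    since: u64,
    limit: usize,
) -> Vec<&'a TurnRow> {
    let mut hits: Vec<&TurnRow> = rows
        .iter()
        .filter(|r| r.account_id.as_deref() == Some(account) && r.at >= since)
        .collect();
    // Newest first is an assumption about the ordering of a "tail".
    hits.sort_by(|a, b| b.at.cmp(&a.at));
    hits.truncate(limit);
    hits
}
```

Returning an empty list for an unknown tenant keeps the response shape identical to that of a known tenant with no activity, which is the existence-oracle defence described above.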
For admin RPC audit (Phase 82.10.h SQLite writer):
nexo microapp admin audit tail \
--microapp-id agent-creator \
--since-mins 60 \
--format json | jq '.[] | select(.method | startswith("nexo/admin/agents/"))'
Live event firehose
Microapps that need a real-time UI (chat, dashboard) hold the
transcripts_subscribe capability and receive
nexo/notify/agent_event notifications on their stdio. The
boot subscriber loop (Phase 82.11) handles fan-out, lag
recovery, and per-microapp filtering — the microapp just
reads JSON-RPC frames as they arrive. See
microapps/admin-rpc.md
for the wire shape.
Going to production
- Ship the microapp binary alongside its plugin.toml.
- Operator drops it into extensions/<id>/ and runs nexo ext install <path>.
- Operator grants capabilities in extensions.yaml.entries.<id>.capabilities_grant. Common shape for a multi-tenant chat SaaS:
    extensions:
      entries:
        agent-creator:
          capabilities_grant:
            - agents_crud
            - credentials_crud
            - pairing_initiate
            - llm_keys_crud
            - transcripts_read
            - transcripts_subscribe
            - operator_intervention
            - escalations_read
            - escalations_resolve
- Operator runs nexo doctor capabilities to confirm every INVENTORY toggle is on.
- Boot: the daemon validates the grants, spawns the microapp, threads the admin RPC dispatcher into the extension's stdio, and starts the firehose subscribe tasks for every microapp that holds the capability.
What's NOT in v0
These are framework-supported but not wired in main.rs yet
(see FOLLOWUPS.md
under the 82.x sections):
- Pairing notifier wire: microapps poll pairing/status instead of receiving live pairing_status_changed frames.
- EventForwarder thread account_id from BindingContext on live writes (audit reader is correct; the writer always emits None today).
- escalate_to_human built-in tool registration in ToolRegistry: microapps that want escalations today have to call the admin RPC directly.
- processing_state_changed / escalation_requested / escalation_resolved event variants on the firehose.
All of these are framework-level deferreds, not microapp-level work. They land in the same boot-order refactor that's tracked across the FOLLOWUPS entries.
Per-extension state directory (Phase 82.6)
Extensions need a stable place to put SQLite databases, vault files, and per-tenant artifacts. Phase 82.6 formalises the convention and ships a CLI helper so authors and operators agree on the path layout.
Canonical path
$NEXO_HOME/extensions/<extension-id>/state/
NEXO_HOME falls back to $HOME/.nexo when unset, then to
the current working directory if even $HOME is missing
(rare; covers minimal CI containers).
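The fallback chain can be written as a pure function. This is a sketch, not the `state_dir_for` source: `resolve_state_dir` is a hypothetical name, environment values are passed in explicitly so the logic is testable, and treating an empty variable as unset is an assumption.

```rust
use std::path::{Path, PathBuf};

/// Resolve `<base>/extensions/<extension_id>/state`, where `<base>` is
/// NEXO_HOME if set, else `$HOME/.nexo`, else the current working directory.
pub fn resolve_state_dir(
    nexo_home: Option<&str>,
    home: Option<&str>,
    cwd: &Path,
    extension_id: &str,
) -> PathBuf {
    let base = match (nexo_home, home) {
        // Explicit NEXO_HOME wins.
        (Some(h), _) if !h.is_empty() => PathBuf::from(h),
        // Otherwise fall back to $HOME/.nexo.
        (_, Some(h)) if !h.is_empty() => Path::new(h).join(".nexo"),
        // Last resort: the working directory itself acts as NEXO_HOME.
        _ => cwd.to_path_buf(),
    };
    base.join("extensions").join(extension_id).join("state")
}
```

In a real caller the first two arguments would come from `std::env::var("NEXO_HOME").ok()` and `std::env::var("HOME").ok()`.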
For an extension agent-creator on a typical install:
~/.nexo/extensions/agent-creator/state/
CLI
# Print the path (no filesystem touch).
nexo ext state-dir agent-creator
# /home/operator/.nexo/extensions/agent-creator/state
# Create the directory if missing (idempotent).
nexo ext state-dir agent-creator --ensure
Operators pipe the output into cd, sqlite3 .backup, etc.
The base form is pure path resolution — useful in scripts that
want to compute paths without side effects. --ensure is the
moral equivalent of mkdir -p.
Programmatic access
nexo-extensions exposes:
use nexo_extensions::{ensure_state_dir, state_dir_for};

// Compute the path without touching disk.
let path = state_dir_for("agent-creator");

// Materialise it (idempotent).
let path = ensure_state_dir("agent-creator")?;
The daemon calls ensure_state_dir at extension first spawn so
microapps can rely on the directory existing by the time their
initialize handshake runs. The path is also exposed via the
NEXO_EXTENSION_STATE_ROOT env var injected into the
extension's process environment (constant
EXTENSION_STATE_ROOT_ENV in the same module).
Backup procedure
The state dir is a regular filesystem location — operators back it up with the same tooling they use for other on-disk state:
# Whole-extension snapshot.
tar czf agent-creator-state-$(date +%F).tgz \
-C "$(nexo ext state-dir agent-creator)" .
# SQLite-aware online backup (preferred for live DBs).
sqlite3 "$(nexo ext state-dir agent-creator)/db.sqlite" \
".backup '/var/backups/agent-creator-$(date +%F).db'"
Isolation
Each extension owns its own subtree. nexo does not enforce
namespacing inside state/ — that's the extension's
responsibility. v1 microapps that store per-tenant artifacts
typically sub-divide as state/tenants/<tenant-id>/…. The
framework treats the whole subtree as opaque.
Getting started: build a microapp in 1 hour
This walks the first hour of building a nexo microapp end to end. Goal: by the end of this page you have a working hello-world microapp running against a local nexo daemon, with one tool the LLM can call.
For the language-agnostic protocol spec, see
contract.md. For the full Rust SDK reference,
see rust.md. For a complete, shipping example —
React UI + HTTP backend over the admin RPC + firehose SSE,
consuming the @lordmacu/nexo-microapp-ui-react theme preset —
see lordmacu/agent-creator-microapp
and its write-up in the agent-creator reference microapp.
Prerequisites
✅ Rust 1.80+ (`rustup default stable`)
✅ The `template-microapp-rust/` directory (from a `git clone` of
nexo-rs, or copied out — it depends on `nexo-microapp-sdk` from
crates.io, so the copy builds standalone)
✅ A configured nexo daemon (one agent, one channel binding)
You don't need crates.io publish keys, npm, or a CI pipeline. Local files only.
Step 1 — copy the template (5 min)
# From your work directory (a `git clone` of nexo-rs gives you the
# template under extensions/):
cp -r /path/to/nexo-rs/extensions/template-microapp-rust ./mi-microapp
cd ./mi-microapp
# Rename inside Cargo.toml + plugin.toml + src/main.rs:
sed -i 's/template-microapp-rust/mi-microapp/g' Cargo.toml plugin.toml src/main.rs
git init && git add -A && git commit -m "scaffold from nexo template"
# Sanity-check it builds (no path-dep surgery needed — the SDK
# resolves from crates.io):
cargo build
Now you have:
mi-microapp/
├── Cargo.toml # depends on nexo-microapp-sdk = "0.1" (crates.io)
├── plugin.toml # capabilities + transport declaration
├── README.md # rename checklist + porting guide
└── src/main.rs # ~100 LOC including comments
Step 2 — write your first tool (15 min)
Open src/main.rs. Replace the greet_tool body with your
domain logic:
async fn buscar_cliente(args: Value, ctx: ToolCtx) -> Result<ToolReply, ToolError> {
    let phone = args
        .get("phone")
        .and_then(|v| v.as_str())
        .ok_or_else(|| ToolError::wire("phone required"))?;

    // BindingContext threads the agent + channel + account
    // (Phase 82.1) through every call.
    let agent = ctx.binding().map(|b| b.agent_id.clone()).unwrap_or_default();

    Ok(ToolReply::ok_json(json!({
        "agent": agent,
        "phone": phone,
        "found": false,
        "lead_id": null,
    })))
}
Register it in main():
let app = Microapp::new("mi-microapp", env!("CARGO_PKG_VERSION"))
    .with_tool("mi_microapp_buscar_cliente", buscar_cliente);
Build:
cargo build --release
The binary lands in ./target/release/mi-microapp.
Step 3 — smoke test the wire (5 min)
The microapp speaks line-delimited JSON-RPC over stdio. You can exercise it without the daemon:
echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}' \
| ./target/release/mi-microapp
Expected output (one JSON line on the wire; wrapped here for readability):
{"jsonrpc":"2.0","id":1,"result":{
"tools":["mi_microapp_buscar_cliente"],
"hooks":["before_message"],
"server_info":{"name":"mi-microapp","version":"0.1.0"}
}}
tools/call works the same way:
printf '%s\n%s\n' \
'{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}' \
'{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"mi_microapp_buscar_cliente","arguments":{"phone":"+57311"}}}' \
| ./target/release/mi-microapp
If both calls return clean JSON, your microapp speaks the contract.
Step 4 — install into the daemon (15 min)
Copy the build artifact + plugin.toml into the daemon's
extensions/ directory:
mkdir -p ~/.nexo/extensions/mi-microapp
cp target/release/mi-microapp ~/.nexo/extensions/mi-microapp/
cp plugin.toml ~/.nexo/extensions/mi-microapp/
Reference the microapp from ~/.nexo/config/extensions.yaml:
extensions:
entries:
mi-microapp:
enabled: true
capabilities_grant:
- dispatch_outbound # if your tools call nexo/dispatch
# add more as your microapp needs them
Reference its tool from ~/.nexo/config/agents.yaml:
agents:
- id: ana
extensions: [mi-microapp]
allowed_tools:
- mi_microapp_buscar_cliente # appears in the LLM tool catalogue
Restart the daemon:
nexo daemon restart
# or for dev: kill the process and re-run `nexo daemon start`
Step 5 — verify the LLM sees your tool (10 min)
Send a test message through your bound channel. The LLM should
see mi_microapp_buscar_cliente in its tool catalogue and call
it on relevant prompts.
Check the daemon logs:
nexo logs --tail | grep mi-microapp
You should see:
- extensions: spawned mi-microapp pid=...
- extensions: mi-microapp -> initialize ok
- tools/call mi_microapp_buscar_cliente {"phone": "..."}
If the tool is being called but the LLM doesn't surface it
correctly, the prompt may not have descriptions rich enough —
add a description to your tool registration.
Step 6 — add per-agent config (10 min)
Different agents may need different microapp behaviour. Use
Phase 83.1 (see proyecto/PHASES.md) extensions_config:
agents:
- id: ana
extensions: [mi-microapp]
extensions_config:
mi-microapp:
regional: bogota
api_token_env: ANA_ETB_TOKEN
- id: maria
extensions: [mi-microapp]
extensions_config:
mi-microapp:
regional: cali
api_token_env: MARIA_ETB_TOKEN
In your handler, the BindingContext.agent_id lets you key
into a per-agent config map you build at initialize time.
Until 83.1.b ships the JSON-RPC propagation, the operator can
also pass the config via env vars and your microapp reads them
on boot.
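One possible shape for the per-agent map described above, as a sketch: `AgentConfig` and `ConfigMap` are hypothetical types (not SDK types), and the field names simply mirror the YAML keys in the example.

```rust
use std::collections::HashMap;

/// Per-agent settings mirroring the `extensions_config` keys shown above.
#[derive(Debug, Clone, PartialEq)]
pub struct AgentConfig {
    pub regional: String,
    pub api_token_env: String,
}

/// Lookup table built once at `initialize` time, keyed by agent id.
pub struct ConfigMap {
    by_agent: HashMap<String, AgentConfig>,
}

impl ConfigMap {
    pub fn new(entries: impl IntoIterator<Item = (String, AgentConfig)>) -> Self {
        Self {
            by_agent: entries.into_iter().collect(),
        }
    }

    /// Resolve the config for the agent named in `BindingContext.agent_id`.
    /// `None` means the agent has no entry; the handler picks the fallback.
    pub fn for_agent(&self, agent_id: &str) -> Option<&AgentConfig> {
        self.by_agent.get(agent_id)
    }
}
```

A tool handler would call `for_agent` with the agent id from its binding and then read the token from the environment variable named by `api_token_env`.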
Common patterns
Multi-tenant SaaS
You're shipping a single microapp binary that serves multiple
tenants. See extensions/multi-tenant-saas.md.
Key idea: every tool call carries BindingContext.account_id
(Phase 82.1) — key your per-tenant SQLite tables on it.
Compliance enforcement
Drop in nexo-compliance-primitives
to anti-loop / anti-manipulation / opt-out / PII-redact / rate
limit / consent track. Wire each primitive into a Phase 83.3
hook that votes Block or Transform before the LLM sees
the inbound.
Outbound dispatch
Need your microapp to send a WhatsApp / Telegram / email reply?
Use the nexo-microapp-sdk outbound feature:
[dependencies]
nexo-microapp-sdk = { path = "...", features = ["outbound"] }
Then ctx.outbound().dispatch(...) from inside any tool
handler. See extensions/stdio.md.
Troubleshooting
| Symptom | Fix |
|---|---|
| extensions: mi-microapp -> initialize timed out | Microapp didn't reply within 30 s. Check stderr; missing tokio runtime is the most common cause. |
| tool 'mi_microapp_x' not in catalogue | Tool name missing the <extension_id>_ prefix. Daemon enforces the namespacing. |
| capability denied: dispatch_outbound | Operator forgot to add the capability to extensions.yaml.entries.<id>.capabilities_grant. |
| 404 unknown method: hooks/before_message | The hook name in your with_hook(...) call doesn't match a daemon-emitted hook. Check crates/extensions/src/runtime/mod.rs::HOOK_NAMES. |
| Build fails: nexo-microapp-sdk = "0.1" not found | SDK isn't on crates.io yet (Phase 83.14). Use path = "..." against your nexo-rs checkout. |
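The namespacing row is easy to check before shipping. The sketch below assumes the prefix is the extension id with hyphens turned into underscores plus a trailing underscore; that rule is inferred from the `mi-microapp` / `mi_microapp_buscar_cliente` pair in this guide, not from the daemon source. `tool_prefix` and `is_namespaced` are hypothetical helpers.

```rust
/// Derive the expected tool-name prefix for an extension id.
/// Assumption: hyphens map to underscores, then `_` is appended.
pub fn tool_prefix(extension_id: &str) -> String {
    let mut prefix: String = extension_id
        .chars()
        .map(|c| if c == '-' { '_' } else { c })
        .collect();
    prefix.push('_');
    prefix
}

/// True when `tool_name` carries the prefix and has a non-empty remainder.
pub fn is_namespaced(extension_id: &str, tool_name: &str) -> bool {
    let prefix = tool_prefix(extension_id);
    tool_name.len() > prefix.len() && tool_name.starts_with(&prefix)
}
```

Running this over every name passed to `with_tool` in a unit test catches a missing prefix before the daemon rejects it at runtime.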
Next steps
You have a working microapp. Now:
- Read contract.md end-to-end — the wire spec is short, and every detail matters for compat.
- Read rust.md for the full SDK reference.
- For multi-tenant SaaS: extensions/multi-tenant-saas.md.
- For compliance gating: pull in nexo-compliance-primitives and wire its primitives into your Phase 83.3 hooks.
Microapp patterns
Common shapes for nexo microapps. A microapp is a complete product that consumes nexo-rs as its agent runtime — your microapp owns the UI, the multi-tenant story, the billing; the framework runs out of view.
Microapps talk to the framework over admin RPC over NATS — provision tenants, configure agents, manage knowledge bases, rotate API keys.
Pattern 1 · Single-tenant deploy
When to use · You're building an internal tool for one team or one company. Multi-tenancy is overkill.
Microapp configures one tenant at boot, never creates more. Used for: an internal sales bot, a personal AI assistant, a single-org customer-support system.
use nexo_microapp_sdk::admin::{AdminClient, TenantSpec, AgentSpec};

let admin = AdminClient::connect("nats://localhost:4222").await?;

// Bootstrap on first run; idempotent on subsequent boots.
admin.ensure_tenant(TenantSpec {
    id: "default".into(),
    plan: "internal".into(),
    quotas: Quotas::unlimited(),
}).await?;

admin.ensure_agent("default", AgentSpec {
    id: "ana".into(),
    persona_path: "./personas/ana.md".into(),
    channels: vec!["whatsapp:internal".into()],
    llm: "minimax-m2.5".into(),
}).await?;
Microapp's UI is a thin admin panel. Most config lives in YAML; microapp tweaks runtime knobs.
Pattern 2 · Multi-tenant SaaS
When to use · You're selling to multiple customers. Each gets isolated state, their own agents, their own KB.
Microapp creates a tenant per signup. The framework partitions
state per tenant_id. Microapp owns the auth / billing / UI;
framework runs the agent loop.
async fn handle_signup(req: SignupRequest, admin: &AdminClient) -> Result<TenantId> {
    let tenant_id = format!("client-{}", uuid::Uuid::new_v4());

    // Compute quotas before `req.plan` is moved into the spec.
    let quotas = quotas_for_plan(&req.plan);
    admin.create_tenant(TenantSpec {
        id: tenant_id.clone(),
        plan: req.plan,
        quotas,
    }).await?;

    // Provision the customer's first agent.
    admin.create_agent(&tenant_id, AgentSpec {
        id: "default-agent".into(),
        persona_path: req.persona.unwrap_or_else(default_persona),
        channels: vec![], // customer pairs channels via UI later
        llm: "minimax-m2.5".into(),
    }).await?;

    Ok(tenant_id)
}
agent-creator-microapp (the reference implementation) is built
exactly this way — every signup gets a tenant, end-users build
their own WhatsApp agents through a WhatsApp-Web-style UI.
→ agent-creator reference
→ Multi-tenant SaaS guide
Pattern 3 · BYO-UI
When to use · You're building a SaaS but want full control over the user-facing interface (custom React app, mobile app, Tauri desktop).
Microapp exposes its own HTTP / GraphQL / gRPC API. The frontend calls the microapp; the microapp calls the framework via admin RPC. The framework never serves UI directly.
// React frontend
async function pairWhatsApp(agentId: string): Promise<{ qr: string }> {
return fetch(`/api/agents/${agentId}/whatsapp/pair`, { method: "POST" })
.then(r => r.json());
}
// Microapp backend (Rust + Axum)
async fn pair_whatsapp(
    State(admin): State<AdminClient>,
    Path(agent_id): Path<String>,
    auth: AuthSession, // resolves tenant_id
) -> Json<PairQrResponse> {
    let qr = admin
        .pair_channel(&auth.tenant_id, &agent_id, ChannelKind::Whatsapp)
        .await
        .unwrap();
    Json(PairQrResponse { qr })
}
The microapp can be in any language — Rust, Python, TypeScript, PHP, Go — as long as it speaks NATS to the framework.
Pattern 4 · Knowledge-as-a-Service
When to use · Customers upload documents (PDFs, MD, URLs); your microapp ingests them into a per-tenant vector store; agents answer from the KB.
Microapp owns the upload UI + ingestion pipeline. Framework exposes vector-store admin RPC; microapp uses it to populate each tenant's KB.
async fn ingest_document(
    tenant_id: &str,
    doc: UploadedDoc,
    admin: &AdminClient,
) -> Result<()> {
    let chunks = chunk_document(&doc.content);
    for chunk in chunks {
        let embedding = embed(&chunk).await?;
        admin.vector_upsert(tenant_id, VectorRecord {
            id: uuid::Uuid::new_v4().to_string(),
            collection: "kb".into(),
            content: chunk.text,
            embedding,
            metadata: doc.metadata.clone(),
        }).await?;
    }
    Ok(())
}
Agents in that tenant query via a search_kb tool that the
framework wires automatically when vector_collections: [kb] is
declared in their agents.yaml.
Pattern 5 · Webhook-driven SaaS
When to use · External services (Stripe, GitHub, Shopify) push events to your SaaS; you trigger agent workflows from those events.
Microapp accepts webhooks at POST /webhook/<provider>. Each
webhook becomes a RemoteTrigger published to the framework,
which routes to the right agent based on tenant + provider.
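async fn stripe_webhook(
    State((admin, secret)): State<(AdminClient, String)>,
    headers: HeaderMap,
    // The body extractor must be the last argument of an Axum handler.
    body: Bytes,
) -> Result<StatusCode, StatusCode> {
    // `?` needs a Result return type; the status codes below are illustrative.
    let event = stripe::verify_webhook(&body, &headers, &secret)
        .map_err(|_| StatusCode::BAD_REQUEST)?;
    let tenant_id = lookup_tenant_by_stripe_customer(&event.customer)
        .await
        .map_err(|_| StatusCode::NOT_FOUND)?;
    let payload = serde_json::to_value(&event)
        .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;

    admin.publish_remote_trigger(&tenant_id, RemoteTrigger {
        kind: "stripe.charge.failed".into(),
        target_agent: "billing-bot".into(),
        payload,
    })
    .await
    .map_err(|_| StatusCode::BAD_GATEWAY)?;

    Ok(StatusCode::OK)
}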
→ RemoteTrigger outbound publisher
Pattern 6 · Background workers + scheduled jobs
When to use · Microapp needs to run periodic tasks (digest emails, lead nurturing campaigns, billing reconciliation) that don't fit naturally into the agent loop.
Microapp uses its own job runner (Sidekiq / Celery / cron). When a job fires, it talks to the framework via admin RPC to dispatch the agent task.
# Microapp's celery worker
@celery.task
def daily_digest(tenant_id: str):
admin = AdminClient.connect("nats://...")
leads = fetch_new_leads(tenant_id)
if not leads:
return
admin.dispatch_agent_task(
tenant_id=tenant_id,
agent_id="digest-bot",
prompt=f"Build a 3-line summary of {len(leads)} new leads",
context={"leads": leads},
)
The framework ships cron_schedule tools too — but microapp-side
jobs can do anything the framework can't (DB queries, third-party
API calls, multi-step orchestration).
Pattern 7 · White-label deploy
When to use · You're selling the same microapp to multiple customers, each with their own branding / domain.
Microapp reads its branding (logo, name, primary color) from the
tenant's config. Each tenant's domain points to the same
microapp deploy with a header (X-Tenant-Slug: acme) that
resolves to the right tenant.
async fn extract_tenant(headers: &HeaderMap) -> Result<TenantId> {
    let slug = headers
        .get("X-Tenant-Slug")
        .and_then(|v| v.to_str().ok())
        .ok_or(BadRequest)?;
    Ok(tenant_id_for_slug(slug).await?)
}
The framework's per-tenant secrets + audit logs handle the isolation; microapp handles the branding.
Pattern 8 · Hybrid (your stack + framework)
When to use · You have an existing product (Rails / Django / Laravel SaaS) and want to add agent capability without rebuilding.
Microapp keeps its existing UI / DB / auth. It only delegates the agent loop to nexo-rs. The integration is one admin RPC client in your existing backend.
// Existing Laravel SaaS adds an agent endpoint
class AgentController extends Controller
{
public function ask(Request $req): JsonResponse
{
$admin = app(AdminClient::class);
$reply = $admin->dispatchAgentTask(
tenantId: auth()->user()->tenant_id,
agentId: 'support-copilot',
prompt: $req->input('message'),
);
return response()->json(['reply' => $reply]);
}
}
Your existing app stays as-is; nexo-rs becomes a backend service your code calls when it needs an agent.
Choosing between patterns
| If you... | Use |
|---|---|
| Build for one team / one company | Single-tenant deploy (1) |
| Sell to multiple customers | Multi-tenant SaaS (2) |
| Want a custom UI (React / mobile / Tauri) | BYO-UI (3) |
| Customers upload docs to query | Knowledge-as-a-Service (4) |
| External services push events to you | Webhook-driven (5) |
| Need scheduled tasks beyond cron tools | Background workers (6) |
| Sell to multiple resellers | White-label (7) |
| Have an existing SaaS to augment | Hybrid (8) |
Microapp vs Extension — quick decision
If you're between Microapp and Extension:
- Choose Microapp when: you own the UI, the auth, the billing, and the framework runs out of view. End-users never see nexo.
- Choose Extension when: you're contributing functionality
into the framework that operators install with
nexo ext install. End-users may see your tool / advisor / skill output but not your code's UI.
A SaaS often combines both: a multi-tenant microapp + one or two custom extensions for the vertical.
See also
- Microapps · getting started: 1-hour walkthrough of pattern 1.
- agent-creator reference: full pattern 2 implementation.
- Admin RPC: every endpoint you'll call.
- Building microapps in Rust: language-specific guide.
Admin RPC
Phase 82.10 ships a bidirectional JSON-RPC layer that lets
microapps perform admin operations on the daemon without leaving
the existing stdio transport. Today the daemon → microapp
direction is tools/call + hooks/<name>; the inverse is
nexo/admin/<domain>/<method>.
A microapp with an operator UI (e.g. agent-creator-microapp)
uses this surface to:
- CRUD agents (agents.yaml.<id>)
- Register / revoke channel credentials (many-to-many)
- Initiate WhatsApp QR pairing flows
- Manage LLM provider entries (llm.yaml.providers.* global, llm.yaml.tenants.<id>.providers.* per-tenant; Phase 83.8.12.5)
- Approve / revoke MCP-channel servers per agent
- CRUD tenants (config/tenants.yaml) for SaaS deployments hosting N companies / workspaces from one daemon (Phase 83.8.12: nexo/admin/tenants/{list,get,upsert,delete})
- Force a hot-reload after batch mutations
Layered grant model
Admin RPC uses two layers of opt-in:
- plugin.toml [capabilities.admin]: what the microapp needs:
    [capabilities.admin]
    required = ["agents_crud", "credentials_crud", "pairing_initiate"]
    optional = ["llm_keys_crud", "channels_crud"]
  - required: boot fails if operator did not grant.
  - optional: boot OK; runtime calls return -32004 capability_not_granted until granted.
- extensions.yaml.entries.<id>.capabilities_grant: what the operator allows:
    extensions:
      entries:
        agent-creator:
          capabilities_grant:
            - agents_crud
            - credentials_crud
            - pairing_initiate
            # llm_keys_crud not granted → calls return -32004
Boot diff produces a CapabilityBootReport:
| Diff outcome | Severity | Behaviour |
|---|---|---|
| Required not granted | error | Boot fails |
| Optional not granted | warn | Runtime returns -32004 |
| Granted but not declared | warn | Allowed (forward-compat) |
| All matched | ok | No log |
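The diff table can be expressed as a small set comparison. This is a sketch of the rules only: `boot_diff`, `Finding`, and `Severity` are hypothetical names and do not reflect the real CapabilityBootReport type.

```rust
use std::collections::BTreeSet;

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Severity {
    Warn,
    Error,
}

#[derive(Debug, PartialEq, Eq)]
pub struct Finding {
    pub capability: String,
    pub severity: Severity,
    pub outcome: &'static str,
}

/// Compare what plugin.toml declares against what the operator granted.
/// An empty result corresponds to the "All matched" row.
pub fn boot_diff(required: &[&str], optional: &[&str], granted: &[&str]) -> Vec<Finding> {
    let granted_set: BTreeSet<&str> = granted.iter().copied().collect();
    let declared: BTreeSet<&str> = required.iter().chain(optional).copied().collect();
    let mut findings = Vec::new();
    for &cap in required {
        if !granted_set.contains(cap) {
            findings.push(Finding {
                capability: cap.to_string(),
                severity: Severity::Error,
                outcome: "required not granted",
            });
        }
    }
    for &cap in optional {
        if !granted_set.contains(cap) {
            findings.push(Finding {
                capability: cap.to_string(),
                severity: Severity::Warn,
                outcome: "optional not granted",
            });
        }
    }
    for &cap in &granted_set {
        if !declared.contains(cap) {
            findings.push(Finding {
                capability: cap.to_string(),
                severity: Severity::Warn,
                outcome: "granted but not declared",
            });
        }
    }
    findings
}

/// Boot proceeds unless at least one finding is an error.
pub fn boot_ok(findings: &[Finding]) -> bool {
    findings.iter().all(|f| f.severity != Severity::Error)
}
```

Only a missing required capability blocks boot; the two warning cases leave the microapp running with reduced or surplus grants.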
Wire shape
Microapp → daemon request (over the existing stdio):
{
"jsonrpc": "2.0",
"id": "app:01HXXX...",
"method": "nexo/admin/agents/list",
"params": { "active_only": true }
}
Daemon → microapp response:
{
"jsonrpc": "2.0",
"id": "app:01HXXX...",
"result": {
"agents": [
{ "id": "ana", "active": true, "model_provider": "minimax", "bindings_count": 2 }
]
}
}
ID prefix app: distinguishes microapp-initiated requests from
daemon-initiated tools/call. Daemon-initiated IDs use random
UUIDs without that prefix; the runtime asserts the invariant at
boot.
Capability denial
When the capability gate refuses a call:
{
"jsonrpc": "2.0",
"id": "app:01HXXX...",
"error": {
"code": -32004,
"message": "capability_not_granted",
"data": {
"capability": "agents_crud",
"microapp_id": "agent-creator",
"method": "nexo/admin/agents/upsert"
}
}
}
SDK side maps this to AdminError::CapabilityNotGranted { capability, method }.
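A sketch of how a client might perform that mapping. The `AdminError` enum below is a local stand-in, not the SDK's type: only the CapabilityNotGranted variant is confirmed by this page, the other variants and the `map_admin_error` function are assumptions, and the error fields are taken as already extracted from the JSON frame.

```rust
/// Local stand-in for the SDK error type (hypothetical beyond
/// the `CapabilityNotGranted` variant named in the text).
#[derive(Debug, PartialEq, Eq)]
pub enum AdminError {
    CapabilityNotGranted { capability: String, method: String },
    InvalidParams(String),
    Other { code: i64, message: String },
}

/// Translate a JSON-RPC error object into a typed error.
/// `capability` and `method` come from the error's `data` member.
pub fn map_admin_error(
    code: i64,
    message: &str,
    capability: Option<&str>,
    method: Option<&str>,
) -> AdminError {
    match (code, capability, method) {
        // -32004 with full data: the capability gate refused the call.
        (-32004, Some(c), Some(m)) => AdminError::CapabilityNotGranted {
            capability: c.to_string(),
            method: m.to_string(),
        },
        // -32602 is the standard JSON-RPC "invalid params" code.
        (-32602, _, _) => AdminError::InvalidParams(message.to_string()),
        // Anything else, including -32004 without data, stays generic.
        _ => AdminError::Other {
            code,
            message: message.to_string(),
        },
    }
}
```

Carrying the capability name in the typed error lets an operator UI show exactly which grant is missing.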
Domains + methods
| Method | Capability | Domain | Wraps |
|---|---|---|---|
| nexo/admin/agents/list | agents_crud | agents | yaml read |
| nexo/admin/agents/get | agents_crud | agents | yaml read |
| nexo/admin/agents/upsert | agents_crud | agents | yaml mutate + reload |
| nexo/admin/agents/delete | agents_crud | agents | yaml remove + reload |
| nexo/admin/credentials/list | credentials_crud | credentials | filesystem + yaml join |
| nexo/admin/credentials/register | credentials_crud | credentials | filesystem write + yaml mutate (many-to-many) |
| nexo/admin/credentials/revoke | credentials_crud | credentials | filesystem unlink + yaml mutate |
| nexo/admin/pairing/start | pairing_initiate | pairing | session_store insert + plugin trigger |
| nexo/admin/pairing/status | pairing_initiate | pairing | session_store read |
| nexo/admin/pairing/cancel | pairing_initiate | pairing | session_store mutate + notification |
| nexo/admin/llm_providers/list | llm_keys_crud | llm_providers | llm.yaml read |
| nexo/admin/llm_providers/upsert | llm_keys_crud | llm_providers | env var validation + llm.yaml mutate |
| nexo/admin/llm_providers/delete | llm_keys_crud | llm_providers | refuse if agent uses + llm.yaml remove |
| nexo/admin/channels/list | channels_crud | channels | yaml read |
| nexo/admin/channels/approve | channels_crud | channels | yaml mutate (idempotent) |
| nexo/admin/channels/revoke | channels_crud | channels | yaml mutate |
| nexo/admin/channels/doctor | channels_crud | channels | static yaml verdicts |
| nexo/admin/reload | agents_crud | meta | force Phase 18 hot-reload |
| nexo/admin/llm/complete | llm_complete | llm | one-shot completion (admin debugger) |
| nexo/admin/agent_events/list | transcripts_read | agent_events | transcript pagination |
| nexo/admin/agent_events/read | transcripts_read | agent_events | single transcript fetch |
| nexo/admin/agent_events/search | transcripts_read | agent_events | full-text search |
| nexo/admin/microapp_audit/tail | audit_read | audit | per-microapp audit log tail |
| nexo/admin/processing/pause | operator_intervention | processing | pause autonomous loop |
| nexo/admin/processing/resume | operator_intervention | processing | resume after pause |
| nexo/admin/processing/intervention | operator_intervention | processing | inject operator turn |
| nexo/admin/processing/state | operator_intervention | processing | read pause/intervention state |
| nexo/admin/escalations/list | escalations_read | escalations | pending escalation queue |
| nexo/admin/escalations/resolve | escalations_resolve | escalations | mark escalation handled |
| nexo/admin/skills/list | skills_crud | skills | filesystem walk + manifest read |
| nexo/admin/skills/get | skills_crud | skills | single skill manifest |
| nexo/admin/skills/upsert | skills_crud | skills | filesystem write + reload |
| nexo/admin/skills/delete | skills_crud | skills | filesystem unlink + reload |
| nexo/admin/tenants/list | tenants_crud | tenants | tenants.yaml read |
| nexo/admin/tenants/get | tenants_crud | tenants | tenants.yaml lookup |
| nexo/admin/tenants/upsert | tenants_crud | tenants | tenants.yaml mutate + reload |
| nexo/admin/tenants/delete | tenants_crud | tenants | tenants.yaml remove + reload |
| nexo/admin/mcp/list | mcp_crud | mcp | mcp.yaml read |
| nexo/admin/mcp/get | mcp_crud | mcp | mcp.yaml lookup |
| nexo/admin/mcp/upsert | mcp_crud | mcp | mcp.yaml mutate + reload |
| nexo/admin/mcp/delete | mcp_crud | mcp | mcp.yaml remove + reload |
| nexo/admin/plugins/doctor | plugin_doctor | plugins | discovery snapshot (manifests + capabilities) |
| nexo/admin/plugins/restart | plugin_restart | plugins | force-restart subprocess plugin (Phase 81.21.b.b) |
| nexo/admin/memory/query | memory_query | memory | LongTermMemory recall |
| nexo/admin/memory/list_snapshots | memory_snapshot | memory | snapshot bundle list |
| nexo/admin/memory/delete_snapshot | memory_snapshot | memory | snapshot bundle delete (idempotent) |
| nexo/admin/memory/create_snapshot | memory_snapshot | memory | capture bundle (server forces redact_secrets+admin-ui provenance) |
| nexo/admin/memory/restore_snapshot | memory_snapshot | memory | restore from snapshot_id (server resolves bundle path; auto_pre_snapshot=true) |
| nexo/admin/secrets/write | secrets_write | secrets | per-microapp secret store mutate |
| nexo/admin/auth/rotate_token | auth_rotate | auth | bearer + cookie HMAC rotation |
| nexo/admin/whatsapp/bot/list | channels_crud | bot enumeration | |
| nexo/admin/whatsapp/bot/send | channels_crud | one-off send | |
Live methods: 57 across 17 capabilities. Phase 81.21.b.b added
plugin_restart (write+destructive, distinct from read-only
plugin_doctor). Phase 90.x.memory-snapshot.create-restore added
memory_snapshot covering all four CRUD verbs on snapshot bundles.
Many-to-many credentials
A single channel credential can serve N agents simultaneously:
# agents.yaml — both agents bind to the shared credential
agents:
  - id: ana
    inbound_bindings:
      - { plugin: whatsapp, instance: shared }
  - id: carlos
    inbound_bindings:
      - { plugin: whatsapp, instance: shared }
Operators rebind from either side:
- Credential side: nexo/admin/credentials/register {channel, instance, agent_ids: ["ana","carlos"], payload: {...}} writes the credential file and appends {plugin: channel, instance} to each agent's inbound_bindings (skipping duplicates).
- Agent side: nexo/admin/agents/upsert {id, inbound_bindings: [...]} replaces the binding list directly.
nexo/admin/credentials/revoke {channel, instance} removes the
binding from every agent that was using it AND deletes the
credential file.
Framework is channel-agnostic; v1 microapp UIs scope to WhatsApp only.
Channel credential persisters (Phase 82.10.n)
credentials/register does NOT only write the opaque credential
blob: it also bridges into the per-channel plugin's runtime state
(yaml accounts list, secret file, in-memory store) via the
ChannelCredentialPersister trait. Channel plugins register a
persister at boot; the dispatcher routes per input.channel.
Lifecycle on register (when a persister is registered):
1. validate_shape(payload, metadata): synchronous, network-free shape check. Bad shape → -32602 invalid_params.
2. Opaque blob write (CredentialStore::write_credential).
3. persist(instance, payload, metadata).await: writes the per-channel runtime state. Failure leaves the opaque blob on disk so the operator can retry.
4. Agent bindings + reload signal (existing).
5. probe(instance, payload, metadata).await: best-effort connectivity check. Errors NEVER abort register; outcome is surfaced to the caller as validation.
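The ordering of the lifecycle steps above can be sketched as follows. This is a simplified, synchronous illustration, not the framework's code: the `Persister` trait, `RegisterError`, `Validation`, `DemoPersister`, and the `blobs` vector standing in for the credential store are all hypothetical names, and the real trait is async. The `healthy: false` value on the opt-out path is also an assumption.

```rust
/// Hypothetical error type: shape failures map to invalid params,
/// persist failures are reported separately.
#[derive(Debug, PartialEq)]
pub enum RegisterError {
    InvalidParams(String),
    PersistFailed(String),
}

/// Hypothetical, trimmed-down version of the `validation` response object.
#[derive(Debug, PartialEq)]
pub struct Validation {
    pub probed: bool,
    pub healthy: bool,
    pub reason_code: String,
}

/// Simplified synchronous stand-in for a channel persister.
pub trait Persister {
    fn validate_shape(&self, payload: &str) -> Result<(), String>;
    fn persist(&mut self, instance: &str, payload: &str) -> Result<(), String>;
    /// Default implementation opts out of probing.
    fn probe(&self, _instance: &str, _payload: &str) -> Validation {
        Validation { probed: false, healthy: false, reason_code: "not_probed".to_string() }
    }
}

/// Runs the steps in order. `blobs` stands in for the opaque credential store.
pub fn register(
    persister: &mut dyn Persister,
    blobs: &mut Vec<String>,
    instance: &str,
    payload: &str,
) -> Result<Validation, RegisterError> {
    // Step 1: shape check; nothing is written when it fails.
    persister.validate_shape(payload).map_err(RegisterError::InvalidParams)?;
    // Step 2: opaque blob write.
    blobs.push(format!("{instance}:{payload}"));
    // Step 3: per-channel state; on failure the blob from step 2 stays in place.
    persister.persist(instance, payload).map_err(RegisterError::PersistFailed)?;
    // Step 4 (agent bindings + reload signal) is omitted from this sketch.
    // Step 5: the probe outcome is returned, never turned into an error.
    Ok(persister.probe(instance, payload))
}

/// Demo persister used only to exercise the pipeline.
pub struct DemoPersister {
    pub fail_persist: bool,
}

impl Persister for DemoPersister {
    fn validate_shape(&self, payload: &str) -> Result<(), String> {
        if payload.is_empty() { Err("empty payload".to_string()) } else { Ok(()) }
    }
    fn persist(&mut self, _instance: &str, _payload: &str) -> Result<(), String> {
        if self.fail_persist { Err("disk full".to_string()) } else { Ok(()) }
    }
}
```

The point of the ordering is that a failed `persist` still leaves the blob written in step 2, so a retry does not need the operator to resubmit anything the store already holds.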
Response shape:
{
"summary": { "channel": "telegram", "instance": "kate",
"agent_ids": ["kate"] },
"validation": {
"probed": true,
"healthy": true,
"detail": "authenticated as @kate_bot",
"reason_code": "ok"
}
}
validation is null when no persister is registered for the
channel (back-compat: pre-82.10.n callers see only summary-
shaped data inside the wrapper).
Stable reason codes
reason_code mirrors the pattern in
research/docs/auth-credential-semantics.md:
| Code | Meaning |
|---|---|
| ok | Probe completed; channel reachable + authenticated |
| unsupported_channel | No persister registered for the channel |
| invalid_payload | Persister rejected payload shape |
| invalid_metadata | Persister rejected metadata shape |
| connectivity_failed | Network failure (DNS, TCP, timeout) |
| auth_failed | Provider rejected credentials (401, IMAP NO) |
| tls_failed | TLS handshake failed |
| not_probed | Persister opted out of probing (whatsapp default) |
Bundled persisters
| Channel | Yaml file | Secret layout | Probe |
|---|---|---|---|
| telegram | <config_dir>/plugins/telegram.yaml | <secrets>/telegram_<instance>_token.txt (mode 0600) | GET https://api.telegram.org/bot<TOKEN>/getMe (5s timeout) |
| email | <config_dir>/plugins/email.yaml | <secrets>/email/<instance>.toml (mode 0600) | TCP connect + TLS handshake to IMAP host (5s timeout) |
| whatsapp | n/a (pairing flow owns it) | n/a | not_probed (pairing has its own probe surface) |
Telegram persister metadata fields (all optional, defaults applied):
{
"polling": { "enabled": true, "interval_ms": 1000 },
"allow_agents": ["kate"],
"allowed_chat_ids": [123, 456]
}
Email persister payload + metadata shape (all required unless noted):
{
"channel": "email",
"instance": "ops",
"agent_ids": ["ana"],
"payload": {
"address": "ops@example.com",
"password": "..." // OR "xoauth2_token", exactly one
},
"metadata": {
"imap": { "host": "imap.example.com", "port": 993, "tls": "implicit_tls" },
"smtp": { "host": "smtp.example.com", "port": 587, "tls": "starttls" },
"provider": "gmail" // optional
}
}
Audit redaction
payload.token, payload.password, payload.xoauth2_token are
replaced with "<redacted>" before the audit row's args_hash
is computed. Defense-in-depth: any token / password /
xoauth2_token / api_key / secret key inside metadata.*
(including nested objects) is also redacted.
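The redaction rule can be illustrated with a small recursive walk. This is a sketch under stated assumptions: `Json` is a hypothetical stand-in for a real JSON value type, `obj` is a helper invented for the example, and the sketch applies the key list to the whole params tree rather than distinguishing `payload` from `metadata`.

```rust
use std::collections::BTreeMap;

/// Minimal JSON-like value (hypothetical stand-in for a real JSON type).
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    Arr(Vec<Json>),
    Obj(BTreeMap<String, Json>),
}

/// Key names treated as sensitive, taken from the list in the text above.
const SENSITIVE_KEYS: [&str; 5] = ["token", "password", "xoauth2_token", "api_key", "secret"];

/// Recursively replaces the value of every sensitive key with "<redacted>".
pub fn redact(value: &mut Json) {
    match value {
        Json::Obj(map) => {
            for (key, child) in map.iter_mut() {
                if SENSITIVE_KEYS.contains(&key.as_str()) {
                    *child = Json::Str("<redacted>".to_string());
                } else {
                    redact(child);
                }
            }
        }
        Json::Arr(items) => items.iter_mut().for_each(redact),
        // Scalars carry no keys, so there is nothing to redact.
        _ => {}
    }
}

/// Helper for building objects in examples.
pub fn obj(pairs: Vec<(&str, Json)>) -> Json {
    Json::Obj(pairs.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}
```

Redacting before hashing means two calls that differ only in the secret value produce the same args_hash, which is what lets the hash be stored without leaking anything about the credential.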
Adding a new channel persister
1. Implement ChannelCredentialPersister in crates/setup/src/persisters/<channel>.rs.
2. Add to nexo_setup::persisters re-exports.
3. Push into AdminBootstrapInputs.persisters in src/main.rs.
4. Document the payload + metadata schema + reason codes here.
The trait + dispatcher registry lives in nexo-core; the
trait is #[async_trait] and probe has a default implementation
returning not_probed so a persister can opt out.
Async pairing flow
Microapp Daemon
|--- pairing/start (agent_id, channel) ---->|
|<-- {challenge_id, expires_at_ms, ...} ----|
| |
| (out-of-band: channel plugin starts QR) |
| |
|<-- nexo/notify/pairing_status_changed -----|
| {challenge_id, state: "qr_ready", data: {qr_ascii, qr_png_base64}}
| |
| (operator scans QR on phone) |
| |
|<-- nexo/notify/pairing_status_changed -----|
| {challenge_id, state: "linked", data: {device_jid}}
| |
| (microapp calls credentials/register to |
| complete the binding) |
Notification topic: nexo/notify/pairing_status_changed (no id
field — server-pushed).
States: pending → qr_ready → awaiting_user → linked |
expired | cancelled. Microapp may also poll
nexo/admin/pairing/status or cancel via
nexo/admin/pairing/cancel.
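A microapp that tracks pairing progress may find it useful to model the states as a small state machine. The sketch below is illustrative only: the enum and method names are hypothetical, and the transition rules are assumptions read off the state line above (a strictly sequential happy path, with expired and cancelled reachable from any non-terminal state).

```rust
/// Pairing states as listed in the text; the Rust names are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PairingState {
    Pending,
    QrReady,
    AwaitingUser,
    Linked,
    Expired,
    Cancelled,
}

impl PairingState {
    /// Terminal states never change again.
    pub fn is_terminal(self) -> bool {
        matches!(self, Self::Linked | Self::Expired | Self::Cancelled)
    }

    /// Assumed rules: the happy path advances one step at a time, and any
    /// non-terminal state may expire or be cancelled.
    pub fn can_transition_to(self, next: Self) -> bool {
        use PairingState::*;
        match (self, next) {
            (Pending, QrReady) | (QrReady, AwaitingUser) | (AwaitingUser, Linked) => true,
            (from, Expired | Cancelled) => !from.is_terminal(),
            _ => false,
        }
    }
}
```

A client can use `is_terminal` to decide when to stop polling nexo/admin/pairing/status.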
Audit log
Every dispatched call appends one row regardless of outcome
(ok / error / denied):
struct AdminAuditRow {
    microapp_id: String,
    method: String,
    capability: String,
    args_hash: String, // SHA-256 of canonicalized params
    started_at_ms: u64,
    result: AdminAuditResult,
    error_code: Option<i32>,
    duration_ms: u64,
}
args_hash lets operator audit pipelines detect repeated
identical calls (potential abuse) without storing PII payloads.
Two writer implementations:
- InMemoryAuditWriter: default, used in tests and as a fallback when no on-disk path is configured. Resets on restart.
- SqliteAdminAuditWriter (Phase 82.10.h.1): writes the microapp_admin_audit table (idempotent CREATE TABLE IF NOT EXISTS + WAL + 3 indices on microapp_id, method, and tenant_id). sweep_retention(retention_days, max_rows) runs at boot to enforce age + cap limits via the NEXO_MICROAPP_ADMIN_AUDIT_RETENTION_DAYS / _MAX_ROWS toggles. Library-level tail(&AuditTailFilter) query (Phase 82.10.h.2) backs the nexo microapp admin audit tail CLI; format_rows_as_table and format_rows_as_json helpers ship in the same module.
Phase 83.8.12.6.runtime + .b — skills resolution chain + migration
The runtime SkillLoader resolves a skill name in this order:
1. <root>/<tenant_id>/<name>/SKILL.md (when the agent has tenant_id set)
2. <root>/__global__/<name>/SKILL.md
3. <root>/<name>/SKILL.md (legacy pre-83.8.12.6 layout; logs a deprecation warning when used)
Per-tenant skills override the global namespace, and the global namespace fills in for tenants that don't have their own copy. The legacy fallback keeps existing deployments working without any migration; the deprecation log nudges operators toward the new layout.
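The lookup order can be written down as a short function. This is a minimal sketch of the documented order, not the actual SkillLoader: the function names are hypothetical, and the deprecation warning for the legacy path is left out.

```rust
use std::path::{Path, PathBuf};

/// Candidate SKILL.md paths in resolution order:
/// tenant-scoped (if any), then global, then the legacy flat layout.
pub fn skill_candidates(root: &Path, tenant_id: Option<&str>, name: &str) -> Vec<PathBuf> {
    let mut out = Vec::with_capacity(3);
    if let Some(tenant) = tenant_id {
        out.push(root.join(tenant).join(name).join("SKILL.md"));
    }
    out.push(root.join("__global__").join(name).join("SKILL.md"));
    out.push(root.join(name).join("SKILL.md"));
    out
}

/// First candidate that exists on disk, if any.
pub fn resolve_skill(root: &Path, tenant_id: Option<&str>, name: &str) -> Option<PathBuf> {
    skill_candidates(root, tenant_id, name)
        .into_iter()
        .find(|path| path.is_file())
}
```

Because the tenant path is only pushed when a tenant_id is present, an agent without one goes straight to the global namespace.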
For a clean cutover, nexo_setup::skills_migrate::migrate_legacy_skills_to_global
moves every legacy <root>/<name>/SKILL.md into
<root>/__global__/<name>/SKILL.md. Idempotent, leaves
tenant-scoped layouts untouched, reports filename conflicts.
Phase 83.8.12.4.b — per-tenant event firehose + escalations filter
AgentEventKind::TranscriptAppended events carry the agent's
tenant_id whenever the runtime knows it (agent.tenant_id
from agents.yaml). The framework writer
(TranscriptWriter::with_tenant_id) and reader
(TranscriptReaderFs::with_tenant_id) both stamp the field
on emit; firehose subscribers can filter per-tenant without
a per-event lookup against agents.yaml. Untagged
deployments (single-tenant) emit tenant_id: null — back
compat preserved.
agent_events/list and escalations/list honour
filter.tenant_id as defense-in-depth: cross-tenant queries
return an empty list (no leak of existence). Agents lacking a
tenant_id field in agents.yaml are excluded from any
non-null tenant filter.
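The filter semantics can be sketched in a few lines. The `EventRow` type and `filter_by_tenant` function are hypothetical, and treating an absent filter as "return everything" is an assumption of this sketch.

```rust
/// Hypothetical row shape: only the fields the filter needs.
#[derive(Debug, Clone, PartialEq)]
pub struct EventRow {
    pub agent_id: String,
    pub tenant_id: Option<String>,
}

/// Keeps rows whose tenant matches the filter exactly. Rows without a
/// tenant_id never match a non-null filter, and a tenant that does not
/// exist simply yields an empty list.
pub fn filter_by_tenant<'a>(rows: &'a [EventRow], filter_tenant: Option<&str>) -> Vec<&'a EventRow> {
    rows.iter()
        .filter(|row| match filter_tenant {
            // Assumption: no filter means no restriction.
            None => true,
            Some(wanted) => row.tenant_id.as_deref() == Some(wanted),
        })
        .collect()
}
```

Returning an empty list for an unknown tenant, rather than an error, is what keeps a caller from learning whether that tenant exists.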
Phase 83.8.12.7 — per-tenant audit scope
Every audit row carries an Option<String> tenant_id that the
dispatcher sniffs from params.tenant_id (string-typed only —
non-string values yield None defensively). Calls that lack a
tenant scope (echo, pairing/*, credentials/*) leave the
column NULL so existing pre-83.8.12.7 deployments keep
working. Operators can filter the tail by tenant for SaaS
billing or compliance reviews:
# CLI — restrict to one tenant scope
nexo microapp admin audit tail --tenant acme --limit 100
# combine with other filters
nexo microapp admin audit tail --tenant acme --result denied --since-mins 60
# library-side convenience: tail_for_tenant(tenant, since_ms?, limit)
let rows = writer.tail_for_tenant("acme", None, 50).await?;
Schema migrates forward-only on open(): the inline
CREATE TABLE IF NOT EXISTS adds tenant_id for fresh DBs, and
ALTER TABLE ... ADD COLUMN tenant_id TEXT runs on legacy DBs.
The ALTER is idempotent because a duplicate-column-name error
is treated as success (the column already exists). Existing
audit rows keep NULL and are excluded from any tenant-scoped tail.
INVENTORY env toggles
Per-domain global kill switches in
crates/setup/src/capabilities.rs::INVENTORY:
| Env var | Default | Disable effect |
|---|---|---|
| NEXO_MICROAPP_ADMIN_AGENTS_ENABLED | 1 | All agents/* return -32601 |
| NEXO_MICROAPP_ADMIN_CREDENTIALS_ENABLED | 1 | All credentials/* return -32601 |
| NEXO_MICROAPP_ADMIN_PAIRING_ENABLED | 1 | All pairing/* return -32601 |
| NEXO_MICROAPP_ADMIN_LLM_KEYS_ENABLED | 1 | All llm_providers/* return -32601 |
| NEXO_MICROAPP_ADMIN_CHANNELS_ENABLED | 1 | All channels/* return -32601 |
Capability grants are the per-microapp check; INVENTORY is the operator-global kill switch (e.g. enterprise op disables pairing entirely while keeping agents CRUD).
SDK side
Microapp Rust code uses the SDK's AdminClient (gated by the
admin cargo feature):
[dependencies]
nexo-microapp-sdk = { version = "0.1", features = ["admin"] }
use nexo_microapp_sdk::admin::{AdminClient, AdminError};
use nexo_tool_meta::admin::agents::AgentsListFilter;

async fn list_active_agents(client: &AdminClient) -> Result<usize, AdminError> {
    let response: nexo_tool_meta::admin::agents::AgentsListResponse = client
        .call(
            "nexo/admin/agents/list",
            AgentsListFilter { active_only: true, plugin_filter: None },
        )
        .await?;
    Ok(response.agents.len())
}
Each call generates a fresh app:<uuid-v7> request id, registers
a oneshot receiver, writes the JSON-RPC frame, and awaits the
response (default 30 s timeout). Capability denial maps to the
typed AdminError::CapabilityNotGranted { capability, method }.
Operator identity stamping (Phase 82.10.m)
A handful of admin methods carry an operator_token_hash: String
field in their wire shape — processing/{pause, resume, intervention} and escalations/resolve. The canonical list lives
at
nexo_tool_meta::admin::operator_stamping::OPERATOR_STAMPED_METHODS.
Microapps register a closure-based source via
AdminClient::set_operator_token_hash; the SDK then transparently
stamps the field on every outbound stamped call. The override is
unconditional (defense-in-depth): any caller-supplied value is
replaced with the value the closure returns.
use std::sync::Arc;
use arc_swap::ArcSwap;
use nexo_microapp_sdk::admin::AdminClient;

// Hot-swappable identity source: rotation updates the ArcSwap
// in place; the next stamped call re-reads it.
let live_hash = Arc::new(ArcSwap::from_pointee(
    "deadbeef0123cafe".to_string(),
));

fn install(client: &AdminClient, source: Arc<ArcSwap<String>>) {
    client.set_operator_token_hash(move || (*source.load_full()).clone());
}
The closure is invoked once per outbound stamped call, so a
post-rotation pause request lands the new identity without any
re-registration. Non-stamped methods (agents/list,
escalations/list, etc.) pass through untouched.
This pattern replaces the legacy "HTTP middleware injection"
approach where each microapp duplicated the method list locally.
Single source of truth lives in nexo-tool-meta.
Production wiring
Three production adapters ship in nexo_setup::admin_adapters
(Phase 82.10.h.3) — they close the cycle between core (which
declares the traits) and setup (which holds the concrete
yaml_patch + filesystem code):
use nexo_setup::admin_adapters::{
    AgentsYamlPatcher, FilesystemCredentialStore, LlmYamlPatcherFs,
};

let agents = AgentsYamlPatcher::new(config_dir.join("agents.yaml"));
let llm = LlmYamlPatcherFs::new(config_dir.join("llm.yaml"));
let creds = FilesystemCredentialStore::new(secrets_root);
let audit = SqliteAdminAuditWriter::open(state_dir.join("admin_audit.db")).await?;

let dispatcher = AdminRpcDispatcher::new()
    .with_capabilities(capability_set)
    .with_audit_writer(audit)
    .with_agents_domain(agents.clone(), reload_signal.clone())
    .with_credentials_domain(agents, creds)
    .with_llm_providers_domain(llm);
AgentsYamlPatcher is Clone and feeds both the agents and the
credentials domain (the latter mutates inbound_bindings on each
agent). serde_yaml::Value ↔ serde_json::Value conversion
happens inside the adapter, so trait callers stay JSON-typed
(matching what microapps see on the wire).
Bootstrap helper (Phase 82.10.h.b.5)
nexo_setup::admin_bootstrap::AdminRpcBootstrap::build wraps the
full wire path so operators don't hand-thread every adapter into
the dispatcher:
use nexo_setup::admin_bootstrap::{AdminBootstrapInputs, AdminRpcBootstrap};

let bootstrap = AdminRpcBootstrap::build(AdminBootstrapInputs {
    config_dir: &config_dir,
    secrets_root: &secrets_root,
    audit_db: std::env::var_os("NEXO_MICROAPP_ADMIN_AUDIT_DB")
        .as_ref()
        .map(std::path::Path::new),
    extensions_cfg: &extensions_cfg,
    admin_capabilities: &per_extension_admin_caps,
    reload_signal,
})
.await?;
build returns Ok(None) when no microapp declares
[capabilities.admin] so the daemon pays zero overhead in the
common case. When it returns Some(bootstrap), the spawn loop
threads the per-microapp AdminRouter through
StdioSpawnOptions::admin_router and post-spawn binds the live
outbound writer:
let opts = bootstrap
    .spawn_options_for(&extension_id, default_opts)
    .unwrap_or(default_opts);
let runtime = StdioRuntime::spawn_with(&manifest, opts).await?;
bootstrap.bind_writer(&extension_id, runtime.outbox_sender());
A periodic 30 s task prunes the in-memory pairing store.
In-memory pairing challenge store (Phase 82.10.h.b.1)
InMemoryPairingChallengeStore is a DashMap<Uuid, …> + TTL
adapter — same pattern as OpenClaw's activeLogins map.
read_challenge lazily flips entries past their TTL to
PairingState::Expired with an operator-readable
data.error, so polls converge to the terminal state without
waiting for the prune cadence. Daemon restart drops in-flight
challenges (the WhatsApp QR client-side expires in ~30 s
anyway, so a SQLite-backed store would be wasted work).
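The lazy-expiry behaviour can be sketched with plain standard-library types. This is an illustration only: it uses a `HashMap<u64, _>` in place of the `DashMap<Uuid, …>`, takes the clock as an explicit `now_ms` argument, and every type and method name except read_challenge is hypothetical. The sketch also assumes the periodic prune removes entries past their TTL.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum ChallengeState {
    Pending,
    /// Carries an operator-readable error, as described above.
    Expired { error: String },
}

#[derive(Debug, Clone)]
pub struct Challenge {
    pub state: ChallengeState,
    pub expires_at_ms: u64,
}

#[derive(Default)]
pub struct ChallengeStore {
    entries: HashMap<u64, Challenge>,
}

impl ChallengeStore {
    pub fn insert(&mut self, id: u64, expires_at_ms: u64) {
        self.entries.insert(id, Challenge { state: ChallengeState::Pending, expires_at_ms });
    }

    /// Reads a challenge, flipping it to Expired first when its TTL has passed.
    pub fn read_challenge(&mut self, id: u64, now_ms: u64) -> Option<ChallengeState> {
        let entry = self.entries.get_mut(&id)?;
        if entry.state == ChallengeState::Pending && now_ms >= entry.expires_at_ms {
            entry.state = ChallengeState::Expired {
                error: "challenge expired before pairing completed".to_string(),
            };
        }
        Some(entry.state.clone())
    }

    /// Periodic cleanup: drops entries past their TTL, returns how many were removed.
    pub fn prune(&mut self, now_ms: u64) -> usize {
        let before = self.entries.len();
        self.entries.retain(|_, challenge| now_ms < challenge.expires_at_ms);
        before - self.entries.len()
    }
}
```

Because the read path performs the flip, a poller sees the terminal state as soon as the TTL passes, regardless of when the prune task next runs.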
Pairing notifier (deferred)
StdioPairingNotifier ships as a building block but is not
yet wired into AdminRpcBootstrap. Microapps fall back to
polling pairing/status until a follow-up exposes a separate
notification queue independent of the response writer.
Agent events firehose (Phase 82.11)
agent_events is the cross-app surface microapps use to stream
and query agent activity. v0 emits one variant —
TranscriptAppended — but the wire shape is a
discriminated #[non_exhaustive] enum so future kinds (batch
job completion, image-gen output, custom) land non-breaking.
Backfill RPC (nexo/admin/agent_events/*)
- nexo/admin/agent_events/list { agent_id, kind?, since_ms?, limit? }: newest-first window query, default since_ms = now - 30d, limit = 500 clamped to 1000.
- nexo/admin/agent_events/read { agent_id, session_id, since_seq?, limit? }: one-scope ascending tail, exclusive since_seq (a microapp that received seq=4 live re-issues read with since_seq=4 and gets seq=5,6,7,…). Unknown scope returns events: [], NOT -32601.
- nexo/admin/agent_events/search { agent_id, query, kind?, limit? }: FTS5 query over the redacted body. Backed by the existing transcripts_fts virtual table.
All three require capability transcripts_read.
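The defaulting rules for the list window are simple enough to spell out. The following is a sketch, assuming the limit is clamped after the default is applied; the function and constant names are hypothetical.

```rust
/// 30 days in milliseconds.
const DEFAULT_WINDOW_MS: u64 = 30 * 24 * 60 * 60 * 1000;
const DEFAULT_LIMIT: usize = 500;
const MAX_LIMIT: usize = 1000;

/// Resolves the effective (since_ms, limit) pair for a list call.
pub fn resolve_list_window(now_ms: u64, since_ms: Option<u64>, limit: Option<usize>) -> (u64, usize) {
    // saturating_sub keeps the default window valid for small clock values.
    let since = since_ms.unwrap_or_else(|| now_ms.saturating_sub(DEFAULT_WINDOW_MS));
    let limit = limit.unwrap_or(DEFAULT_LIMIT).min(MAX_LIMIT);
    (since, limit)
}
```

A caller that asks for more than the maximum gets the maximum rather than an error, which matches the "clamped" wording above.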
Live notifications (nexo/notify/agent_event)
JSON-RPC notification frame, no id:
{"jsonrpc":"2.0","method":"nexo/notify/agent_event",
"params":{"kind":"transcript_appended","agent_id":"ana",
"session_id":"…","seq":7,"role":"user",
"body":"[REDACTED:phone] hola","sent_at_ms":…,
"sender_id":"wa.55","source_plugin":"whatsapp"}}
Body is always already-redacted at emit time — the hook
fires inside TranscriptWriter::append_entry AFTER the
redactor (Phase 10.4) replaces secrets with
[REDACTED:label]. Defense-in-depth: a microapp without
transcripts_read cannot recover the raw body either.
Subscribe semantics
There is no explicit subscribe RPC — AdminRpcBootstrap
inspects the operator's grant matrix at boot:
- Microapp granted transcripts_subscribe → receives every TranscriptAppended frame.
- Microapp granted agent_events_subscribe_all → receives every kind. Reserved for audit / compliance microapps that need full visibility (v0 emits only TranscriptAppended so the two caps are equivalent today; the slot future-proofs for batch / output kinds).
- Microapp without either cap → receives no frames; backfill RPC still gated on transcripts_read.
seq discipline: per-session_id monotonic counter that
advances by 1 per TranscriptAppended frame. Live + backfill
agree on seq values, so a microapp that misses live frames
(broadcast lag, transient stdin block) re-issues
agent_events/read with since_seq = last_seen to resync.
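The resync rule can be illustrated from both sides: the exclusive since_seq read on the server and gap detection on the client. The `Frame` type and both function names are hypothetical, and the sketch assumes the log is already ordered by seq.

```rust
/// One transcript frame; only the fields needed for resync are modelled.
#[derive(Debug, Clone, PartialEq)]
pub struct Frame {
    pub seq: u64,
    pub body: String,
}

/// Backfill read: ascending, with an exclusive `since_seq` lower bound.
/// Assumes `log` is already sorted by `seq`.
pub fn read_since(log: &[Frame], since_seq: Option<u64>, limit: usize) -> Vec<Frame> {
    log.iter()
        .filter(|frame| since_seq.map_or(true, |since| frame.seq > since))
        .take(limit)
        .cloned()
        .collect()
}

/// True when a live frame skips past `last_seen + 1`, meaning frames were missed.
pub fn needs_resync(last_seen: u64, incoming_seq: u64) -> bool {
    incoming_seq > last_seen + 1
}
```

A client that last saw seq 4 and then receives seq 7 would detect the gap and re-read with since_seq = 4, receiving 5, 6, and 7 in order.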
INVENTORY toggle
NEXO_MICROAPP_AGENT_EVENTS_ENABLED (default 1). Off →
broadcast emitter is replaced with a no-op AND no subscribe
tasks spawn. Backfill RPC continues to work (so a microapp
with transcripts_read keeps querying past sessions). Useful
for hardened deployments that want only on-demand history.
Lag handling
tokio::sync::broadcast channel with default capacity 256.
Subscribers that fall behind get RecvError::Lagged(n) —
boot wires this as a single warn log and the receiver
re-syncs to the next surviving frame. Microapps that need
gap-free history call agent_events/read from
last_seen_seq.
HTTP server capability (Phase 82.12)
Microapps that ship their own HTTP UI / API (meta-microapp,
dashboard, settings panel) declare it in plugin.toml:
[capabilities.http_server]
port = 9001
bind = "127.0.0.1" # default — loopback only
token_env = "AGENT_CREATOR_TOKEN"
health_path = "/healthz" # default
Boot supervisor
HttpServerSupervisor::probe(decl) polls
GET <bind>:<port><health_path> every 250 ms until 200 OK or
the 30 s ready timeout. Typed errors:
- Timeout { url }: no listener after 30 s.
- BadStatus { url, status }: listener responds non-200.
Once probed, spawn_monitor_loop(decl) polls every 60 s.
Failures log at warn and flip a watch::Receiver<bool> so
nexo extension status / admin-ui can surface the live
health state. Monitor handle aborts on drop.
Bind policy
bind defaults to 127.0.0.1. Anything else (0.0.0.0,
public IP, …) requires the operator to flip
extensions.yaml.<id>.allow_external_bind = true. The
AdminRpcBootstrap::build validator checks this BEFORE
spawning the extension; mismatches surface as
AdminBootstrapError::ExternalBindNotAllowed { microapp_id, bind }. Defense in depth against accidentally world-exposed
services.
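The bind check reduces to a small predicate. The sketch below is hypothetical in its names (`BindError`, `check_bind`) and assumes the literal string 127.0.0.1 is the only value accepted without the opt-in, following the "anything else" wording above.

```rust
/// Hypothetical error mirroring the ExternalBindNotAllowed variant named above.
#[derive(Debug, PartialEq)]
pub enum BindError {
    ExternalBindNotAllowed { microapp_id: String, bind: String },
}

/// Returns the effective bind address, or an error when a non-loopback
/// bind is requested without the operator's opt-in.
pub fn check_bind(
    microapp_id: &str,
    bind: Option<&str>,
    allow_external_bind: bool,
) -> Result<String, BindError> {
    // An absent `bind` falls back to loopback.
    let bind = bind.unwrap_or("127.0.0.1");
    if bind == "127.0.0.1" || allow_external_bind {
        Ok(bind.to_string())
    } else {
        Err(BindError::ExternalBindNotAllowed {
            microapp_id: microapp_id.to_string(),
            bind: bind.to_string(),
        })
    }
}
```

Running the check before the extension is spawned means a misconfigured microapp never gets the chance to open the socket.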
Shared bearer token
The microapp reads <token_env> at boot (the daemon passes
it through via the initialize env block). All inbound HTTP
requests must include Authorization: Bearer <token> or
X-Nexo-Token: <token>. Token rotation arrives as a JSON-RPC
notification — the daemon emits
nexo/notify/token_rotated { old_hash, new } after the
operator changes the env + reloads. Microapps compare
old_hash against token_hash(<their current token>)
(sha256-hex truncated to 16 chars) before swapping, so a
stale notification hitting an already-restarted microapp is
ignored.
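The compare-before-swap step on the microapp side can be sketched as below. The function name is hypothetical, and the hash function is passed in as a parameter so the example stays dependency-free; per the text, the real hash is sha256-hex truncated to 16 characters.

```rust
/// Applies a rotation notification only when `old_hash` matches the hash of
/// the token currently held. Returns true when the swap happened.
pub fn apply_rotation<F>(current: &mut String, old_hash: &str, new_token: &str, token_hash: F) -> bool
where
    F: Fn(&str) -> String,
{
    if token_hash(current.as_str()) != old_hash {
        // Stale notification: the held token is not the one being rotated.
        return false;
    }
    *current = new_token.to_string();
    true
}
```

A microapp that restarted with the new token already in its environment would see a mismatch and keep what it has.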
INVENTORY toggle
NEXO_MICROAPP_HTTP_SERVERS_ENABLED (default 1). Off →
boot supervisor skips the probe + monitor loop entirely.
Microapps still spawn; the daemon just doesn't gate ready
on the HTTP endpoint. Useful for hardened deployments that
ban embedded HTTP servers or run them out-of-band.
Operator processing pause + intervention (Phase 82.13)
Operators sometimes need to suspend agent autonomy on a specific scope and step in manually. v0 ships chat-takeover (per-conversation pause + manual reply); the wire shape is generalised across every agent shape so future variants (batch override, event injection, image-gen output edit) plug in without breaking the surface.
Wire shapes
#[non_exhaustive]
enum ProcessingScope {
    Conversation { agent_id, channel, account_id, contact_id, mcp_channel_source? },
    AgentBinding { ... }, // reserved
    Agent { ... },        // reserved
    EventStream { ... },  // reserved
    BatchQueue { ... },   // reserved
    Custom { ... },       // forward-compat
}

#[non_exhaustive]
enum InterventionAction {
    Reply { channel, account_id, to, body, msg_kind, attachments?, reply_to_msg_id? },
    SkipItem { ... },       // reserved
    OverrideOutput { ... }, // reserved
    InjectInput { ... },    // reserved
    Custom { ... },         // forward-compat
}

#[non_exhaustive]
enum ProcessingControlState {
    AgentActive,
    PausedByOperator { scope, paused_at_ms, operator_token_hash, reason? },
}
operator_token_hash is the Phase 82.12 token_hash shape
(sha256-hex truncated to 16 chars) — audits correlate without
storing the cleartext bearer.
Methods
- nexo/admin/processing/pause { scope, reason?, operator_token_hash } → ProcessingAck { changed, correlation_id }. Idempotent.
- nexo/admin/processing/resume { scope, operator_token_hash } → ack.
- nexo/admin/processing/intervention { scope, action, operator_token_hash } → ack. Rejects calls on a non-paused scope (-32004 not_paused) so operators never double-respond.
- nexo/admin/processing/state { scope } → ProcessingStateResponse { state }.
All four gated on the operator_intervention capability.
Per-scope sub-gates (operator_intervention_conversation,
_batch, …) are a future-proofing slot.
v0 surface
Only the Conversation + Reply combination routes
end-to-end. Non-v0 scopes / actions surface as -32601 not_implemented so callers can probe the wire shape today
without the daemon pretending to support unimplemented
shapes.
Notification (Phase 82.13.b.firehose)
Pause and resume transitions are emitted on the agent event
firehose (nexo/notify/agent_event) as
AgentEventKind::ProcessingStateChanged. Operator UIs render
the pause indicator in real time without polling
processing/state. The constant
PROCESSING_STATE_CHANGED_NOTIFY_METHOD is reserved for any
future dedicated subject; today the variant rides on the same
firehose channel as every other agent event.
{
"jsonrpc": "2.0",
"method": "nexo/notify/agent_event",
"params": {
"kind": "processing_state_changed",
"agent_id": "ana",
"scope": { "kind": "conversation", ... },
"prev_state": { "state": "agent_active" },
"new_state": {
"state": "paused_by_operator",
"scope": { "kind": "conversation", ... },
"paused_at_ms": 1700000000000,
"operator_token_hash": "abcdef0123456789",
"reason": "investigando"
},
"at_ms": 1700000000000
}
}
Idempotent retries (a second pause on an already-paused
scope, a resume on agent_active) skip the emit so
subscribers do not see phantom transitions. Reply
intervention does NOT emit ProcessingStateChanged — state
stays paused; the TranscriptAppended emit on the operator
stamp signals operator activity instead.
Transcript stamping (Phase 82.13.b.1)
When the operator dispatches a reply via
nexo/admin/processing/intervention, the daemon optionally
stamps the reply onto the agent transcript so the agent sees
it on its next turn (after resume). To opt in, the
microapp passes the active session_id in the params:
{
"method": "nexo/admin/processing/intervention",
"params": {
"scope": { "kind": "conversation", "agent_id": "ana", ... },
"action": {
"kind": "reply",
"channel": "whatsapp",
"account_id": "wa.0",
"to": "wa.55",
"body": "ya te resuelvo, dame 1 minuto",
"msg_kind": "text"
},
"operator_token_hash": "abcdef0123456789",
"session_id": "33333333-3333-4333-8333-333333333333"
}
}
After the channel send acks, the daemon appends one entry to the session transcript:
| Field | Value |
|---|---|
| role | Assistant (so the agent reads it as natural continuity on its next turn) |
| content | The reply body, run through the standard redactor |
| source_plugin | intervention:<channel> (e.g. intervention:whatsapp); distinguishes operator stand-in from native LLM output |
| sender_id | operator:<token_hash>; identifies the operator without exposing PII |
| message_id | Channel-side provider id when the plugin acked one |
The same redactor + FTS index + Phase 82.11 firehose
pipeline as native agent appends — subscribers of
nexo/notify/agent_event see the operator's reply with
the discriminator above.
The ack includes a transcript_stamped hint:
| Value | Meaning |
|---|---|
| Some(true) | Reply persisted on transcript. Agent will see it on next turn. |
| Some(false) | Channel send happened, transcript was NOT modified. Either no session_id in params, no transcript appender wired in boot, or persistence failed (logged). |
| None (omitted) | Field not applicable (e.g. for non-Reply interventions). |
When transcript_stamped: false and the operator UI knows
the active session, prompt the operator to reopen the
conversation and retry; otherwise the agent resumes
"blind", without seeing what was said during takeover.
The SDK helper threads this through fluently:
use nexo_microapp_sdk::admin::{HumanTakeover, SendReplyArgs};

let takeover = HumanTakeover::engage(&admin, scope, token_hash, None).await?;
takeover
    .send_reply(
        "whatsapp",
        "wa.0",
        "wa.55",
        SendReplyArgs::text("ya te resuelvo")
            .with_session(active_session_id),
    )
    .await?;
takeover.release(None).await?;
Operator summary on resume (Phase 82.13.b.2)
The operator can hand the agent a free-text summary of what
happened during takeover. The daemon stamps it as a System
transcript entry just after the resume flip, so the agent
reads it as a system directive on its next turn:
{
"method": "nexo/admin/processing/resume",
"params": {
"scope": { "kind": "conversation", "agent_id": "ana", ... },
"operator_token_hash": "abcdef0123456789",
"session_id": "33333333-3333-4333-8333-333333333333",
"summary_for_agent": "cliente confirmó dirección, IA puede continuar con confirmación de envío"
}
}
The stamped entry shape:
| Field | Value |
|---|---|
| role | System |
| content | [operator_summary] <body> (body trimmed; prefix added server-side) |
| source_plugin | intervention:summary |
| sender_id | operator:<token_hash> |
| message_id | None |
Validation (handler-side, all -32602 invalid_params):
| Code | When |
|---|---|
| session_id_required_with_summary | summary_for_agent set but session_id missing |
| empty_summary | summary trims to zero length |
| summary_too_long | summary > 4096 chars (matches TranscriptsIndex FTS5 doc cap) |
Validation runs BEFORE the state flip, so a rejected call
keeps the scope paused. Stamping itself is best-effort —
appender errors leave the scope AgentActive (resume still
succeeds) and surface only via ack.transcript_stamped: Some(false).
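The three validation rules can be sketched as one function. The names below are hypothetical, and two details are assumptions of the sketch rather than documented behaviour: the checks run in the order the table lists them, and the length cap is measured in characters after trimming.

```rust
pub const MAX_SUMMARY_CHARS: usize = 4096;

#[derive(Debug, PartialEq, Eq)]
pub enum SummaryError {
    SessionIdRequiredWithSummary,
    EmptySummary,
    SummaryTooLong,
}

impl SummaryError {
    /// Code string from the validation table above.
    pub fn code(&self) -> &'static str {
        match self {
            Self::SessionIdRequiredWithSummary => "session_id_required_with_summary",
            Self::EmptySummary => "empty_summary",
            Self::SummaryTooLong => "summary_too_long",
        }
    }
}

/// Validates the optional summary and returns the transcript content to
/// stamp, or None when no summary was supplied.
pub fn validate_summary(
    session_id: Option<&str>,
    summary: Option<&str>,
) -> Result<Option<String>, SummaryError> {
    let Some(raw) = summary else { return Ok(None) };
    if session_id.is_none() {
        return Err(SummaryError::SessionIdRequiredWithSummary);
    }
    let trimmed = raw.trim();
    if trimmed.is_empty() {
        return Err(SummaryError::EmptySummary);
    }
    if trimmed.chars().count() > MAX_SUMMARY_CHARS {
        return Err(SummaryError::SummaryTooLong);
    }
    // The prefix is added server-side, as in the stamped entry shape.
    Ok(Some(format!("[operator_summary] {trimmed}")))
}
```

Running this before the state flip is what guarantees that a rejected call leaves the scope paused.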
The SDK helper takes the summary on release() after pinning
the session via with_session():
let takeover = HumanTakeover::engage(&admin, scope, token_hash, None)
    .await?
    .with_session(active_session_id);

// ... operator types replies via takeover.send_reply ...

takeover
    .release(Some(
        "cliente confirmó dirección, IA puede continuar con envío".into(),
    ))
    .await?;
The pinned session is reused by both send_reply (transcript
stamping) and release (summary injection) — set once,
forget. Per-call SendReplyArgs.with_session() overrides the
pinned one when both are present.
Pending inbounds during pause (Phase 82.13.b.3)
While a scope is PausedByOperator, inbound user messages
arriving on the channel are buffered server-side instead of
firing an agent turn. On resume, the buffer is drained and
each inbound is stamped on the agent transcript as a User
entry with its ORIGINAL timestamp — so the agent reads real
chronology of what the customer said during takeover.
| Field | Value |
|---|---|
| role | User |
| content | Original (already-redacted) inbound body |
| source_plugin | Channel that produced the inbound (whatsapp, etc.) |
| sender_id | Counterparty id (e.g. WA jid) |
| message_id | Channel-side provider id when present |
The cap is configured via NEXO_PROCESSING_PENDING_QUEUE_CAP
(default 50, set to 0 to disable buffering entirely).
When the cap is exceeded, the OLDEST entry is evicted FIFO
and an AgentEventKind::PendingInboundsDropped firehose
event fires so operator UIs can surface the drop.
// Firehose frame on cap-exceeded eviction:
{
"jsonrpc": "2.0",
"method": "nexo/notify/agent_event",
"params": {
"kind": "pending_inbounds_dropped",
"agent_id": "ana",
"scope": { "kind": "conversation", "agent_id": "ana", ... },
"dropped": 1,
"at_ms": 1700000000000
}
}
ProcessingAck.drained_pending: Some(N) on the resume call
reports how many entries were drained — None when the
queue was empty (no field on the wire). Operator UIs render
"replay: 3 messages" so the operator knows what the agent
will see on its next turn.
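The buffering behaviour described in this section (a capped queue, oldest-first eviction, drain on resume) can be sketched with a `VecDeque`. The `PendingQueue` type is hypothetical, and with a cap of 0 the sketch simply stores nothing; what the daemon does with inbounds in that mode is not specified here.

```rust
use std::collections::VecDeque;

/// Hypothetical per-scope buffer for inbounds received while paused.
pub struct PendingQueue {
    cap: usize,
    items: VecDeque<String>,
}

impl PendingQueue {
    pub fn new(cap: usize) -> Self {
        Self { cap, items: VecDeque::new() }
    }

    /// Buffers one inbound and returns how many entries were evicted (0 or 1).
    pub fn push(&mut self, inbound: String) -> usize {
        if self.cap == 0 {
            // Buffering disabled: nothing is stored.
            return 0;
        }
        let mut dropped = 0;
        if self.items.len() == self.cap {
            // Cap reached: evict the oldest entry first.
            self.items.pop_front();
            dropped = 1;
        }
        self.items.push_back(inbound);
        dropped
    }

    /// Empties the queue in arrival order, as on resume.
    pub fn drain(&mut self) -> Vec<String> {
        self.items.drain(..).collect()
    }
}
```

A non-zero return from `push` is the point at which a caller would emit the pending_inbounds_dropped event, and the length of the drained vector corresponds to drained_pending.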
Round-trip end-to-end (Phase 82.13.c, 2026-05-02):
the inbound dispatcher push hook now lives in
runtime.rs, gated on a shared
Arc<dyn ProcessingControlStore> boot wires to BOTH the
admin RPC dispatcher AND every AgentRuntime. When the
operator pauses via nexo/admin/processing/pause, the very
next inbound channel message is buffered onto the
per-scope queue (cap = NEXO_PROCESSING_PENDING_QUEUE_CAP,
default 50, FIFO eviction). Body is redacted at push time
so the queue never holds raw PII. Resume drains the queue
onto the transcript as User entries with original
timestamps, so the agent resumes coherently with full
chronology.
Smoke recipe (manual end-to-end):
# 1. Pause a conversation via admin RPC.
curl -X POST localhost:.../admin -d '{
"method": "nexo/admin/processing/pause",
"params": {
"scope": { "kind": "conversation", "agent_id": "ana",
"channel": "whatsapp", "account_id": "wa.0",
"contact_id": "wa.55" },
"operator_token_hash": "..."
}
}'
# 2. Send 3 WhatsApp inbounds while paused.
# The agent does NOT reply (intake hook buffers them).
# 3. Resume with optional summary.
curl -X POST localhost:.../admin -d '{
"method": "nexo/admin/processing/resume",
"params": {
"scope": { ... },
"session_id": "...",
"summary_for_agent": "cliente confirmó dirección",
"operator_token_hash": "..."
}
}'
# 4. Verify the transcript JSONL contains 3 fresh `User`
# entries with their ORIGINAL timestamps (not now()),
# plus a `[operator_summary] cliente confirmó dirección`
# System entry just after the resume.
# 5. Send 1 more WhatsApp inbound → agent replies normally,
# seeing all 3 buffered + 1 fresh user messages on its
# next turn.
Boot activation still depends on src/main.rs building
the AdminRpcBootstrap (deferred follow-up — same
boot-order refactor that gates the rest of the admin RPC
surface). Until then, the pause check + buffer infra exist
but are dormant in production. Once that lands, this
round-trip works without any further changes.
Agent escalations (Phase 82.14)
Cross-app primitive for the "I need help here" channel:
agents flag work items they cannot complete autonomously,
operators see a list and dismiss / take over. v0 ships the
admin RPC surface (read + resolve) plus the auto-resolve
hook on processing/pause; the escalate_to_human
built-in tool that raises new escalations is deferred to
82.14.b.
Wire shapes
enum EscalationReason {
    OutOfScope, MissingData, NeedsHumanJudgment, Complaint,
    Error, Ambiguity, PolicyViolation, Other,
}
enum EscalationUrgency { Low, Normal, High }
#[non_exhaustive]
enum ResolvedBy {
    OperatorTakeover,
    OperatorDismissed { reason: String },
    AgentResolved,
}
#[non_exhaustive]
enum EscalationState {
    None,
    Pending {
        scope: ProcessingScope, // 82.13 enum
        summary, reason, urgency,
        context: BTreeMap<String, Value>,
        requested_at_ms,
    },
    Resolved { scope, resolved_at_ms, by },
}
context is free-form per agent shape: chat agents emit
{"question": …, "customer_phone": …}, batch agents emit
{"job_id": …, "invalid_rows": 47}, image-gen emits
{"prompt": …, "policy": "nudity"}. Keeps the schema
stable while letting each agent surface meaningful detail.
Methods
- nexo/admin/escalations/list { filter (default pending), agent_id?, scope_kind?, limit } → EscalationsListResponse { entries }. Newest-first by requested_at_ms / resolved_at_ms; default cap 100, max 1000.
- nexo/admin/escalations/resolve { scope, by, dismiss_reason?, operator_token_hash } → EscalationsResolveResponse { changed, correlation_id }. by = "dismissed" requires a dismiss_reason; by = "takeover" is the same outcome the auto-resolve hook produces.
Two granular capabilities:
- escalations_read — gates list. Read-only dashboards hold this.
- escalations_resolve — gates resolve. Strictly stronger grant for operator UIs that act on escalations.
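As an illustration of the resolve method's parameter rule, a microapp-side helper might look like the sketch below. The function name `build_resolve_request` is hypothetical; only the method name, the parameter names, and the "dismissed requires a dismiss_reason" rule come from the text above.

```python
import json

def build_resolve_request(req_id, scope, by, operator_token_hash, dismiss_reason=None):
    """Build a nexo/admin/escalations/resolve frame (hypothetical helper)."""
    # Rule from the method description: a dismissal must carry a reason.
    if by == "dismissed" and not dismiss_reason:
        raise ValueError("by='dismissed' requires a dismiss_reason")
    params = {"scope": scope, "by": by, "operator_token_hash": operator_token_hash}
    if dismiss_reason is not None:
        params["dismiss_reason"] = dismiss_reason
    frame = {"jsonrpc": "2.0", "id": req_id,
             "method": "nexo/admin/escalations/resolve", "params": params}
    return json.dumps(frame)
```

Checking the rule before sending saves a round trip, although the daemon remains the authority on what it accepts.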
Auto-resolve on pause
When nexo/admin/processing/pause fires on a scope with a
matching Pending escalation AND both the processing +
escalation stores are wired, the dispatcher
auto-flips the escalation to Resolved { OperatorTakeover } BEFORE applying the pause. Failures
in the auto-resolve path log at warn and never block the
pause itself — operator intent (pause) takes priority over
side-effects.
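The ordering and failure handling described here can be sketched as follows. This is an illustrative Python outline under stated assumptions: the `escalations` and `processing` store objects and their method names (`is_pending`, `resolve`, `pause`) are hypothetical stand-ins for the daemon's Rust stores.

```python
import logging

log = logging.getLogger("pause_sketch")

def pause_with_auto_resolve(scope, processing, escalations=None):
    """Resolve a matching pending escalation first, then always apply the pause.

    `escalations` may be None when the escalation store is not wired.
    """
    if escalations is not None:
        try:
            if escalations.is_pending(scope):
                escalations.resolve(scope, by="OperatorTakeover")
        except Exception as exc:
            # A side-effect failure is logged at warn level and never blocks the pause.
            log.warning("auto-resolve failed for %s: %s", scope, exc)
    processing.pause(scope)
```

The pause call sits outside the try block on purpose: operator intent is applied whether or not the auto-resolve step succeeded.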
Notification literals
escalation_requested and escalation_resolved are pinned
as pub const in the wire crate; the emit site lands in
82.14.b alongside the escalate_to_human built-in tool +
the BindingContext→scope derivation.
Limitations
- Bidirectional flow over single stdio: app:ID prefix disambiguates microapp-initiated requests from daemon-initiated ones. Daemon must not use app: prefix for its own request IDs.
- Audit log writer choice: InMemoryAuditWriter resets on daemon restart; pick SqliteAdminAuditWriter::open(path) for durable retention + the boot-time sweep_retention() sweeper.
- channels/doctor static-only: live MCP probe stays in nexo channel doctor --runtime CLI.
- Live operator approval: every grant is yaml-static. v1 has no ask interactive flow (deferred to 82.10.i).
See also
- Building microapps in Rust — SDK + helper crate surface (where AdminClient lives behind the admin feature).
- Capability toggles — operator-global INVENTORY kill switches.
- Pairing protocol — Phase 26 underlying pairing infrastructure.
- Config hot-reload — Phase 18 reload trigger that admin RPC mutations hook into.
Microapp contract (Phase 83.6)
This page is the language-agnostic specification for what makes a program a nexo microapp. Every microapp — whether built with the Rust SDK, hand-written in Python, or shipped as a Go binary — implements the wire protocol below. If your code passes this contract, the daemon will load it.
Companion pages:
- Building microapps in Rust — the Rust SDK shortcut that hides the wire details when you don't need them.
- Admin RPC — the operator surface for managing agents/credentials/pairing/transcripts from inside a microapp.
Wire protocol overview
A microapp is a child process the daemon launches once at boot and keeps alive across multiple agent turns. Communication is line-delimited JSON-RPC 2.0 over stdio:
- stdin (daemon → microapp): one JSON-RPC frame per line, UTF-8.
- stdout (microapp → daemon): same shape; mixed responses + notifications + outbound requests.
- stderr: free-form log lines forwarded to the daemon's tracing subscriber. Microapps SHOULD prefix log lines with [INFO], [WARN], [ERROR] so the daemon can map them.
Every JSON-RPC frame is exactly one line (no embedded
newlines in the JSON). The daemon's reader splits on \n. A
microapp MUST flush stdout after every frame.
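A minimal Python sketch of a writer that satisfies both rules (one line per frame, flush after every frame) is shown below. The helper name `write_frame` is an illustration, not part of any SDK.

```python
import io
import json
import sys

def write_frame(frame: dict, out=sys.stdout) -> None:
    # json.dumps without `indent` escapes newlines inside strings, so the
    # serialized frame always fits on a single line.
    line = json.dumps(frame, ensure_ascii=False)
    out.write(line + "\n")
    out.flush()  # flush after every frame so the daemon's reader is not starved

# Usage: a string value containing a newline still yields exactly one line.
buf = io.StringIO()
write_frame({"jsonrpc": "2.0", "id": 1, "result": {"text": "a\nb"}}, out=buf)
```

Pretty-printing (for example `indent=2`) would break the framing, which is why the serializer is called with its compact defaults.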
Framing rules
| Direction | Shape | Notes |
|---|---|---|
| Daemon → microapp request | {"jsonrpc":"2.0","id":<int>,"method":...,"params":...} | Numeric id (incrementing). |
| Microapp → daemon response | {"jsonrpc":"2.0","id":<int>,"result":...} or {...,"error":{"code":...,"message":...}} | id MUST echo the request's. |
| Microapp → daemon outbound request | {"jsonrpc":"2.0","id":"app:<uuid>","method":...,"params":...} | id MUST start with "app:" to disambiguate from daemon-initiated. |
| Daemon → microapp response to outbound | {"jsonrpc":"2.0","id":"app:<uuid>","result":...} | Echoes the microapp's id. |
| Either direction notification | {"jsonrpc":"2.0","method":...,"params":...} (no id) | Fire-and-forget; never gets a response. |
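The table can be read as a small decision procedure on two properties of a frame: whether it has a method, and what its id looks like. The sketch below encodes that reading in Python; the function name and the returned labels are illustrative only.

```python
import json

def classify_frame(line: str) -> str:
    """Classify one JSON-RPC line according to the framing rules table."""
    frame = json.loads(line)
    has_method = "method" in frame
    frame_id = frame.get("id")
    if has_method and frame_id is None:
        return "notification"  # no id: fire-and-forget
    is_app_id = isinstance(frame_id, str) and frame_id.startswith("app:")
    if has_method:
        return "outbound_request" if is_app_id else "daemon_request"
    # No method: a response. The id shape tells whose request it answers.
    return "response_to_outbound" if is_app_id else "response_to_daemon"
```

A microapp's read loop only needs three of these cases on stdin: daemon requests, responses to its own outbound requests, and notifications.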
Methods (daemon → microapp)
These are the methods the daemon will call on your microapp.
Implement them all. Methods not in this list are reserved for
future versions; respond with error code -32601 (method not
found) for forward-compat.
initialize
Called once per microapp lifetime, immediately after spawn. Returns the microapp's tool catalogue + declared capabilities.
{"method":"initialize","params":{
"extension_id":"agent-creator",
"state_dir":"/path/to/.nexo/extensions/agent-creator/state",
"config":{"...microapp-specific config from extensions.yaml..."}
}}
Result:
{
"tools":[
{"name":"agent_creator_create","description":"...","input_schema":{...}}
],
"version":"0.1.0"
}
tools/list
Re-queried on every binding refresh. Same return shape as
initialize.tools. Microapps SHOULD return identical bytes
across calls so the daemon's tool-cache prefix matcher stays
warm.
tools/call
The core agent-loop entry point. Carries the effective
BindingContext (the agent / channel /
account triple) and the LLM's tool-call args.
{"method":"tools/call","params":{
"tool":"agent_creator_create",
"args":{"name":"alice"},
"binding_context":{...},
"inbound":{...}
}}
Result {"output":<JSON>} (success) or
{"error":"description"} (microapp-side failure — distinct from
JSON-RPC error which signals a protocol-level fault).
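The distinction between a tool-level failure and a protocol-level fault can be sketched as below. This is an assumption-laden illustration: the `handlers` registry is hypothetical, and treating an unknown tool as the standard "invalid params" code (-32602) is a choice made for this sketch, not something the contract states.

```python
def handle_tools_call(params: dict, handlers: dict) -> dict:
    """Return the response body for a tools/call request.

    Tool success and tool-level failure both travel inside "result";
    only protocol-level faults use the JSON-RPC "error" member.
    """
    tool = params.get("tool")
    handler = handlers.get(tool)
    if handler is None:
        # Protocol-level fault (assumed mapping to "invalid params").
        return {"error": {"code": -32602, "message": f"unknown tool: {tool}"}}
    try:
        return {"result": {"output": handler(params.get("args", {}))}}
    except Exception as exc:
        # Microapp-side failure: reported in the result, not as a JSON-RPC error.
        return {"result": {"error": str(exc)}}
```

The caller merges the returned dict with the jsonrpc and id fields before writing the frame.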
agents/updated
Notification (no id). Fired when the daemon's agents.yaml
hot-reload picked up a change that affects this microapp's
binding surface. Payload includes the new agent IDs visible to
this microapp.
hooks/<name>
Called when the daemon dispatches a hook the microapp registered
during initialize (Phase 83.3). Reply with a
HookDecision.
shutdown
Called once before the daemon SIGTERMs the process. Microapps
should flush state and reply with {"ok":true} within 5 s. The
daemon will SIGKILL after 10 s regardless.
Methods (microapp → daemon)
Outbound calls — capability-gated. The operator's
extensions.yaml lists which capabilities this microapp may use.
nexo/dispatch
Phase 82.3. Send an outbound message via a channel plugin (e.g.
WhatsApp). Requires dispatch_outbound capability.
{"id":"app:<uuid>","method":"nexo/dispatch","params":{
"to":"+573000000000",
"channel":"whatsapp",
"body":"Hello"
}}
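For hand-rolled microapps, building this request might look like the following sketch. The helper name `make_dispatch_request` is hypothetical; the method name, parameter names, and the "app:" id prefix come from this page.

```python
import json
import uuid

def make_dispatch_request(to: str, channel: str, body: str) -> str:
    """Serialize a nexo/dispatch outbound request as one JSON line (no trailing newline)."""
    frame = {
        "jsonrpc": "2.0",
        # The "app:" prefix marks the request as microapp-initiated.
        "id": f"app:{uuid.uuid4()}",
        "method": "nexo/dispatch",
        "params": {"to": to, "channel": channel, "body": body},
    }
    return json.dumps(frame)
```

The microapp should remember the generated id so it can match the daemon's response, which echoes the same id.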
nexo/admin/*
Phase 82.10. Operator-surface admin RPC: agents CRUD, credentials,
pairing, LLM keys, channels. Each method is gated by a separate
capability (agents_crud, credentials_crud, pairing_initiate,
llm_keys_crud, channels_crud). See
admin-rpc.md for the full surface.
Notifications (daemon → microapp)
Fire-and-forget messages the daemon pushes when an event lands. Microapps subscribe by holding the matching capability.
| Method | Capability | Payload | Phase |
|---|---|---|---|
| nexo/notify/transcript_appended | transcripts_subscribe | {session_id, role, body, ts_ms} | 82.11 |
| nexo/notify/pairing_status_changed | pairing_initiate | {channel, instance, status} | 82.10 |
| nexo/notify/token_rotated | credentials_crud | {old_hash, new} | 82.12 |
| nexo/notify/agent_event | transcripts_subscribe | {kind, agent_id, payload} | 82.11 |
Shapes
Binding context
Phase 82.1. Every tools/call carries this triple so the
microapp knows which agent / channel / account fired the tool.
{
"binding_context":{
"agent_id":"ana",
"channel":"whatsapp",
"account_id":"acme",
"binding_id":"whatsapp:acme",
"binding_index":0
}
}
account_id is the multi-tenant key. Multi-tenant SaaS microapps
key their per-tenant SQLite tables on this field. See
multi-tenant SaaS walkthrough.
Inbound message reference
Phase 82.5. Carries the original inbound message metadata (sender, timestamp, kind) so a tool handler can correlate to the trigger.
{
"inbound":{
"kind":"whatsapp_message",
"from":"+573000000000",
"ts_ms":1735689600000,
"session_id":"..."
}
}
Extension config
Loaded from extensions.yaml.entries.<id>.config and threaded
through initialize.params.config. Opaque to the daemon —
microapps validate their own schema (Phase 83.17 will add
boot-time schema validation as opt-in).
Hook decision
Phase 83.3. The microapp's vote on whether a hook should proceed.
{"vote":"allow|deny|abstain","reason":"...","metadata":{...}}
abstain is the default — microapps that don't know about a
particular hook should abstain rather than vote.
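A dispatcher that follows the abstain-by-default rule could be sketched like this. The `known_hooks` registry and the function name are hypothetical; the three vote values come from the shape above.

```python
VALID_VOTES = ("allow", "deny", "abstain")

def decide(hook_name: str, params: dict, known_hooks: dict) -> dict:
    """Return a HookDecision, abstaining for hooks this microapp does not know."""
    handler = known_hooks.get(hook_name)
    if handler is None:
        return {"vote": "abstain", "reason": f"no handler for {hook_name}", "metadata": {}}
    decision = handler(params)
    if decision.get("vote") not in VALID_VOTES:
        raise ValueError(f"invalid vote: {decision.get('vote')!r}")
    return decision
```

Abstaining rather than voting allow keeps an unaware microapp from overriding the decision of one that does understand the hook.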
Tool call request / response
Already shown above under tools/call. The output field on
success is opaque JSON; the LLM sees its stringified form.
Error envelope
JSON-RPC error field follows the standard:
{"code":-32000,"message":"...","data":{"...optional structured info..."}}
The range -32000 to -32099 is reserved for nexo. Codes
below -32099 and standard JSON-RPC codes (-32700 parse error,
-32600 invalid request, -32601 method not found, -32602
invalid params, -32603 internal error) keep their JSON-RPC 2.0 meaning.
Conventions
Tool name namespacing
Tools MUST be prefixed with the extension id followed by an
underscore: <extension_id>_<tool>. Examples:
- ✅ agent_creator_create
- ✅ acme_billing_charge
- ❌ create (unprefixed)
- ❌ agent-creator/create (wrong separator)
The daemon validates the prefix on every initialize /
tools/list and rejects unprefixed tools so the LLM never sees
two microapps' send tools competing.
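A microapp can run the same check on its own catalogue before shipping. The sketch below is an approximation of the rule, not the daemon's validator: it assumes hyphens in the extension id map to underscores in the prefix, which is inferred from the agent-creator → agent_creator_create example rather than stated outright.

```python
def tool_prefix(extension_id: str) -> str:
    # Assumption: hyphens in the id become underscores in the tool prefix.
    return extension_id.replace("-", "_") + "_"

def unprefixed_tools(extension_id: str, tool_names: list) -> list:
    """Return tool names that lack the required prefix or have nothing after it."""
    prefix = tool_prefix(extension_id)
    return [n for n in tool_names
            if not (n.startswith(prefix) and len(n) > len(prefix))]

# Usage: an empty list means every tool name passes the check.
bad = unprefixed_tools("agent-creator",
                       ["agent_creator_create", "create", "agent-creator/create"])
```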
Reserved JSON-RPC error codes
-32000 to -32099 are reserved. Common codes microapps SHOULD
emit:
| Code | Meaning |
|---|---|
| -32000 | Capability not granted |
| -32001 | Tool input failed schema validation |
| -32002 | Backend service unavailable |
| -32003 | Rate limit (the microapp's own per-tool limit) |
| -32004 | Auth error talking to the microapp's external service |
| -32099 | Microapp internal error (catchall) |
Timeouts
The daemon's default per-call timeout is 30 seconds.
extensions.yaml.entries.<id>.timeout_secs overrides per
microapp. A timeout closes the in-flight call but leaves the
process alive; the daemon will retry the next call normally.
Backward compatibility
The contract evolves under these rules:
- Additive fields always. New fields on existing shapes appear behind #[serde(default)] (Rust) / "missing key is default" (other langs). Microapps MUST NOT reject unknown fields.
- Deprecation requires N + N+1. To remove a method or field, the daemon emits a tracing::warn! + admin-ui notice in release N. The actual removal lands in N+1.
- Capability matrix grows monotonically. New capabilities default to false for existing microapps; old capabilities never silently change semantics.
- Wire format MUST stay UTF-8 line-JSON. A switch to length-prefixed framing or binary protocol would be a breaking change requiring an explicit major-version bump coordinated with all SDK languages.
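For hand-rolled microapps, the first rule translates into tolerant parsing: ignore keys you do not know and default the ones that are missing. The sketch below applies that to the binding context shape. Which fields are optional is an assumption made for illustration; the field names come from the Binding context section.

```python
from dataclasses import dataclass, fields

@dataclass
class BindingContext:
    agent_id: str
    channel: str
    # Defaults stand in for "missing key is default" (assumed optional here).
    account_id: str = ""
    binding_id: str = ""
    binding_index: int = 0

def parse_binding_context(raw: dict) -> BindingContext:
    known = {f.name for f in fields(BindingContext)}
    # Drop keys this version does not know instead of rejecting the frame.
    return BindingContext(**{k: v for k, v in raw.items() if k in known})
```

A strict parser that raises on unknown keys would break the first time the daemon adds a field, which is exactly what the additive-fields rule is meant to prevent.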
Worked example: Python hello-world
A volunteer should be able to ship a working microapp in Python using only this doc and the standard library:
#!/usr/bin/env python3
import json
import sys

# Declared once so initialize and tools/list return the same catalogue.
TOOLS = [{
    "name": "hello_world_greet",
    "description": "Echo a greeting",
    "input_schema": {"type": "object", "properties": {
        "name": {"type": "string"}
    }, "required": ["name"]}
}]

def send(frame):
    # One frame per line, flushed immediately.
    sys.stdout.write(json.dumps(frame) + "\n")
    sys.stdout.flush()

def respond(req_id, result):
    send({"jsonrpc": "2.0", "id": req_id, "result": result})

for line in sys.stdin:
    req = json.loads(line)
    rid = req.get("id")
    method = req["method"]
    if rid is None:
        continue  # notification (e.g. agents/updated): never answered
    if method == "initialize":
        respond(rid, {"tools": TOOLS, "version": "0.1.0"})
    elif method == "tools/call":
        name = req["params"]["args"]["name"]
        respond(rid, {"output": {"greeting": f"hello, {name}"}})
    elif method == "tools/list":
        respond(rid, {"tools": TOOLS})  # same as initialize
    elif method == "shutdown":
        respond(rid, {"ok": True})
        break
    else:
        send({
            "jsonrpc": "2.0", "id": rid,
            "error": {"code": -32601, "message": f"unknown method: {method}"}
        })
Drop this in extensions/hello/main.py, mark executable, add
extensions.yaml.entries.hello: { path: "extensions/hello/main.py" },
and run nexo ext install ./extensions/hello. The daemon will load
it and the LLM will see hello_world_greet in its tool catalogue.
Worked example: Go skeleton
Same protocol, idiomatic Go I/O:
package main
import (
"bufio"
"encoding/json"
"fmt"
"os"
)
type RPC struct {
JSONRPC string `json:"jsonrpc"`
ID interface{} `json:"id,omitempty"`
Method string `json:"method,omitempty"`
Params json.RawMessage `json:"params,omitempty"`
Result interface{} `json:"result,omitempty"`
Error *RPCError `json:"error,omitempty"`
}
type RPCError struct {
Code int `json:"code"`
Message string `json:"message"`
}
func main() {
scanner := bufio.NewScanner(os.Stdin)
enc := json.NewEncoder(os.Stdout)
for scanner.Scan() {
var req RPC
if err := json.Unmarshal(scanner.Bytes(), &req); err != nil {
continue // skip lines that are not valid JSON
}
switch req.Method {
case "initialize":
enc.Encode(RPC{JSONRPC: "2.0", ID: req.ID, Result: map[string]interface{}{
"tools": []map[string]interface{}{{
"name": "hello_go_greet",
"description": "Echo a greeting",
"input_schema": map[string]interface{}{"type": "object"},
}},
"version": "0.1.0",
}})
// tools/call, tools/list, shutdown … same pattern
default:
enc.Encode(RPC{JSONRPC: "2.0", ID: req.ID, Error: &RPCError{
Code: -32601, Message: fmt.Sprintf("unknown: %s", req.Method),
}})
}
}
}
Worked example: TypeScript / Node skeleton
import * as readline from 'readline';
const rl = readline.createInterface({ input: process.stdin });
function respond(id: any, result: any) {
process.stdout.write(JSON.stringify({ jsonrpc: '2.0', id, result }) + '\n');
}
rl.on('line', (line) => {
const req = JSON.parse(line);
switch (req.method) {
case 'initialize':
respond(req.id, {
tools: [{
name: 'hello_ts_greet',
description: 'Echo a greeting',
input_schema: { type: 'object' }
}],
version: '0.1.0'
});
break;
// tools/call, tools/list, shutdown — same pattern
default:
process.stdout.write(JSON.stringify({
jsonrpc: '2.0', id: req.id,
error: { code: -32601, message: `unknown: ${req.method}` }
}) + '\n');
}
});
Reference: Rust SDK shortcut
For Rust microapps, the nexo-microapp-sdk crate (Phase 83.4)
hides the wire details. See Building microapps in
Rust for the high-level API. The SDK implements this
contract verbatim — anything you can do via the SDK you can do by
hand, but the SDK is the recommended path because it stays in
lockstep with the daemon's contract version.
agent-creator — SaaS meta-microapp (Phase 83.8)
A reference microapp that drives the framework as a multi-tenant SaaS meta-creator of WhatsApp agents. Operators (the SaaS owner) provision one daemon per company; clients (tenants) CRUD their own agents, skills, LLM keys, and conversation views through the microapp.
Lives out of the workspace at
/home/familia/chat/agent-creator-microapp/. Pulls
nexo-microapp-sdk + nexo-tool-meta + nexo-compliance-primitives
via path deps during dev; switch to crates.io once published.
Tool surface (22 tools)
Agents — Phase 83.8.8
| Tool | Backed by |
|---|---|
| agent_list | nexo/admin/agents/list |
| agent_get | nexo/admin/agents/get |
| agent_upsert | nexo/admin/agents/upsert |
| agent_delete | nexo/admin/agents/delete |
Skills — Phase 83.8.8
| Tool | Backed by |
|---|---|
| skill_list | nexo/admin/skills/list |
| skill_get | nexo/admin/skills/get |
| skill_upsert | nexo/admin/skills/upsert |
| skill_delete | nexo/admin/skills/delete |
The skill body lands at <root>/<name>/SKILL.md — the runtime
SkillLoader reads it on every agent turn, so a CRUD round-trip
shows up in the agent's prompt without a daemon restart.
LLM providers — Phase 83.8.8
| Tool | Backed by |
|---|---|
| llm_provider_list | nexo/admin/llm_providers/list |
| llm_provider_upsert | nexo/admin/llm_providers/upsert |
| llm_provider_delete | nexo/admin/llm_providers/delete |
Pairing — Phase 83.8.9
| Tool | Backed by |
|---|---|
| whatsapp_pair_start | nexo/admin/pairing/start |
| whatsapp_pair_status | nexo/admin/pairing/status |
| whatsapp_pair_cancel | nexo/admin/pairing/cancel |
Conversations — Phase 83.8.9
| Tool | Backed by |
|---|---|
| conversation_list | nexo/admin/agent_events/list |
| conversation_read | nexo/admin/agent_events/read |
| conversation_search | nexo/admin/agent_events/search |
The live firehose (nexo/notify/agent_event) is consumed by the
SDK TranscriptStream::filter_by_agent helper — multi-tenant
defense-in-depth drops events whose agent_id is not in the
tenant's allowed set before the microapp ever sees the frame.
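The filtering idea is simple enough to sketch in a few lines. The Python generator below illustrates the concept only; it is not the SDK's `TranscriptStream::filter_by_agent` implementation, and the event dicts are assumed to carry the `agent_id` field shown in the `nexo/notify/agent_event` payload.

```python
def filter_by_agent(events, allowed_agent_ids):
    """Yield only events whose agent_id is in the tenant's allowed set."""
    allowed = frozenset(allowed_agent_ids)
    for event in events:
        # Events without an agent_id are dropped as well (fail closed).
        if event.get("agent_id") in allowed:
            yield event

# Usage: only the event for "ana" survives.
visible = list(filter_by_agent(
    [{"kind": "msg", "agent_id": "ana"}, {"kind": "msg", "agent_id": "kate"}, {"kind": "msg"}],
    ["ana"],
))
```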
Operator takeover — Phase 83.8.9
| Tool | Backed by |
|---|---|
| takeover_engage | SDK HumanTakeover::engage → nexo/admin/processing/pause |
| takeover_send | HumanTakeover::send_reply → nexo/admin/processing/intervention |
| takeover_release | HumanTakeover::release → nexo/admin/processing/resume |
takeover_send flows operator-typed replies through the
ChannelOutboundDispatcher trait wired in Phase 83.8.4.a — Phase
83.8.4.b ships the production
BrokerOutboundDispatcher (nexo_setup::admin_adapters) that
publishes to the per-channel
plugin.outbound.<channel>[.<account>] topic each plugin's
existing dispatcher already listens on. WhatsApp translator
ships in v1; Telegram + Email translators are TBD per-channel
follow-ups (83.8.4.b.tg / 83.8.4.b.em).
Escalations — Phase 83.8.9
| Tool | Backed by |
|---|---|
| escalation_list | nexo/admin/escalations/list |
| escalation_resolve | nexo/admin/escalations/resolve |
EscalationReason::UnknownQuery (Phase 83.8.5) covers the
"agent doesn't know" UI notification path.
Compliance hook — Phase 83.8.10
The before_message hook chains:
1. OptOutMatcher (Spanish + English keywords) → Abort.
2. AntiLoopDetector (3 repetitions in 60 s) → Abort.
3. PiiRedactor (cards / phones / emails) → log redaction stats.
Defaults-on. Per-agent override propagation through
extensions_config.compliance is logged in FOLLOWUPS.md as a
framework follow-up — needs the wire shape on BindingContext.
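To make the first two stages of the chain concrete, here is a rough Python sketch. It is not the `nexo-compliance-primitives` implementation: the keyword list is invented for illustration, "3 repetitions in 60 s" is assumed to mean that the third identical message inside a 60-second window triggers the abort, and the PII redaction stage is omitted.

```python
import re
from collections import deque

# Hypothetical keyword list; the real matcher's vocabulary is not given here.
OPT_OUT = re.compile(r"\b(stop|unsubscribe|baja)\b", re.IGNORECASE)

class AntiLoopDetector:
    def __init__(self, max_repeats: int = 3, window_s: float = 60.0):
        self.max_repeats = max_repeats
        self.window_s = window_s
        self._seen = {}  # message body -> deque of arrival times (seconds)

    def is_loop(self, body: str, now_s: float) -> bool:
        times = self._seen.setdefault(body, deque())
        times.append(now_s)
        # Forget arrivals that fell out of the sliding window.
        while times and now_s - times[0] > self.window_s:
            times.popleft()
        return len(times) >= self.max_repeats

def before_message(body: str, now_s: float, detector: AntiLoopDetector) -> str:
    if OPT_OUT.search(body):
        return "abort"
    if detector.is_loop(body, now_s):
        return "abort"
    return "continue"
```

The stages run in order and the first abort wins, which mirrors the chaining described above.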
Capabilities (plugin.toml)
[capabilities.admin]
required = [
"agents_crud", "skills_crud", "llm_keys_crud",
"pairing_initiate", "transcripts_read",
"operator_intervention",
"escalations_read", "escalations_resolve",
]
optional = ["credentials_crud", "channels_crud"]
The operator grants these in
extensions.yaml.<id>.capabilities_grant. Missing required →
boot-time fail-fast; missing optional → handler-time -32004.
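The required-versus-optional behavior can be sketched as a boot-time check. The function below is a hypothetical illustration of the rule, not the daemon's code: it raises when a required capability is missing and returns the optional ones that were not granted, which would only fail later when a handler tries to use them.

```python
def check_grants(required, optional, granted):
    """Fail fast on missing required grants; report ungranted optional ones."""
    granted = set(granted)
    missing_required = sorted(set(required) - granted)
    if missing_required:
        raise RuntimeError(f"missing required capabilities: {missing_required}")
    # These do not block boot; calls that need them fail at handler time.
    return sorted(set(optional) - granted)
```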
SDK opt-in
Microapp::new(APP_NAME, env!("CARGO_PKG_VERSION"))
    .with_admin() // Phase 83.8.8.a
    .with_hook("before_message", hooks::compliance::before_message)
    .with_tool("agent_list", tools::agents::agent_list)
    // … 21 more tools
    .run_stdio()
    .await
with_admin() wires the SDK AdminClient through the same
stdout writer the daemon-reply path uses, intercepts inbound
app: correlation IDs, and exposes the client through
ToolCtx::admin() / HookCtx::admin(). Tool handlers do no
hand-rolled JSON-RPC plumbing — every admin call is one
ctx.admin()?.call("nexo/admin/<method>", &params).await.
Stress-test methodology
This microapp exists to stress-test the framework. Friction encountered during construction triggers a framework fix (agnostic + reusable by other microapps), not a microapp-side workaround. Five gaps closed during the v1 build:
1. nexo/admin/skills/* CRUD missing → end-to-end shipped.
2. processing.intervention did not dispatch outbound → ChannelOutboundDispatcher trait + handler wire.
3. SDK AdminClient had no runtime integration → Microapp::with_admin() builder + ToolCtx accessor.
4. Operator UI needed EscalationReason::UnknownQuery → variant added.
5. SDK lacked HumanTakeover + TranscriptStream::filter_by_agent helpers → both shipped.
See FOLLOWUPS.md (workspace root) for the active
deferred-follow-up list.
Templates — language-by-language reference
This page lists the starting points for authoring a nexo microapp in each supported language.
The contract (contract.md) is the source of
truth — line-delimited JSON-RPC over stdio. Every template
below ships a working initialize → tools/list → tools/call → shutdown loop against that contract. They differ only in
ergonomics and per-language idioms.
Rust (recommended) — nexo-microapp-sdk
Where: extensions/template-microapp-rust/ in the
nexo-rs repo.
Why use the SDK: the daemon's contract version evolves under
N+N+1 deprecation rules. The Rust SDK lives in lockstep with
the daemon, so an additive field on the wire becomes an
additive field on ToolCtx / HookCtx automatically. Hand-
rolled parsers risk silent drift.
Quick start:
cp -r /path/to/nexo-rs/extensions/template-microapp-rust ./mi-microapp
cd ./mi-microapp
# rename in Cargo.toml + plugin.toml + src/main.rs
cargo build --release
See rust.md for the full SDK reference and getting-started.md for the 1-hour walkthrough.
SDK feature flags:
| Feature | Adds |
|---|---|
| (default) | Microapp builder + tool/hook handlers |
| outbound | OutboundDispatcher for nexo/dispatch outbound calls |
| admin | AdminClient for nexo/admin/* calls (capability-gated) |
| test-harness | MicroappTestHarness + MockBindingContext for unit tests |
Python — hand-rolled (stdlib only)
No SDK ships today. Authors implement the wire protocol
directly using sys.stdin / sys.stdout / json. The
contract doc has a full worked example.
Skeleton:
#!/usr/bin/env python3
import json
import sys

# Declared once so initialize and tools/list return the same catalogue.
TOOLS = [{
    "name": "myapp_greet",
    "description": "Echo a greeting",
    "input_schema": {"type": "object", "properties": {
        "name": {"type": "string"}
    }, "required": ["name"]}
}]

def send(frame):
    # One frame per line, flushed immediately.
    sys.stdout.write(json.dumps(frame) + "\n")
    sys.stdout.flush()

def respond(req_id, result):
    send({"jsonrpc": "2.0", "id": req_id, "result": result})

for line in sys.stdin:
    req = json.loads(line)
    rid = req.get("id")
    method = req["method"]
    if rid is None:
        continue  # notification (e.g. agents/updated): never answered
    if method == "initialize":
        respond(rid, {"tools": TOOLS, "version": "0.1.0"})
    elif method == "tools/call":
        name = req["params"]["args"]["name"]
        respond(rid, {"output": {"greeting": f"hello, {name}"}})
    elif method == "tools/list":
        respond(rid, {"tools": TOOLS})  # same as initialize
    elif method == "shutdown":
        respond(rid, {"ok": True})
        break
    else:
        send({
            "jsonrpc": "2.0", "id": rid,
            "error": {"code": -32601, "message": f"unknown method: {method}"}
        })
plugin.toml:
[plugin]
id = "my-python-microapp"
version = "0.1.0"
name = "My Python Microapp"
[capabilities]
tools = ["myapp_greet"]
[transport]
kind = "stdio"
command = "python3"
args = ["./main.py"]
Library tips:
- pydantic for the JSON-RPC envelopes if you want typed parsing.
- anyio if you need async tool handlers.
- For tests, run the binary as a subprocess and pipe JSON-RPC frames in/out.
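The subprocess testing tip can be sketched with the standard library alone. In the example below, `ECHO_APP` is a tiny stand-in microapp defined inline so the snippet is self-contained, and `roundtrip` is a hypothetical helper; point `CMD` at your own microapp command to exercise real code.

```python
import json
import subprocess
import sys

# Stand-in microapp: answers every request with {"ok": true}.
ECHO_APP = (
    "import sys, json\n"
    "for line in sys.stdin:\n"
    "    req = json.loads(line)\n"
    "    print(json.dumps({'jsonrpc': '2.0', 'id': req['id'],"
    " 'result': {'ok': True}}), flush=True)\n"
)
CMD = [sys.executable, "-c", ECHO_APP]

def roundtrip(cmd, frames):
    """Send each frame on stdin and collect one response line per frame."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, text=True)
    responses = []
    try:
        for frame in frames:
            proc.stdin.write(json.dumps(frame) + "\n")
            proc.stdin.flush()
            responses.append(json.loads(proc.stdout.readline()))
    finally:
        proc.stdin.close()   # EOF on stdin ends the child's read loop
        proc.stdout.close()
        proc.wait(timeout=5)
    return responses
```

This helper assumes every frame sent is a request that gets exactly one response line; notifications would need a variant that does not wait for a reply.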
TypeScript / Node — hand-rolled
Same shape as Python; Node's readline does the line-splitting.
Skeleton:
import * as readline from 'readline';
const rl = readline.createInterface({ input: process.stdin });
function respond(id: any, result: any) {
process.stdout.write(JSON.stringify({ jsonrpc: '2.0', id, result }) + '\n');
}
rl.on('line', (line) => {
const req = JSON.parse(line);
switch (req.method) {
case 'initialize':
respond(req.id, {
tools: [{
name: 'myapp_greet',
description: 'Echo a greeting',
input_schema: { type: 'object' }
}],
version: '0.1.0'
});
break;
case 'tools/call':
respond(req.id, { output: { greeting: `hello, ${req.params.args.name}` } });
break;
case 'shutdown':
respond(req.id, { ok: true });
process.exit(0);
default:
process.stdout.write(JSON.stringify({
jsonrpc: '2.0', id: req.id,
error: { code: -32601, message: `unknown: ${req.method}` }
}) + '\n');
}
});
plugin.toml:
[plugin]
id = "my-ts-microapp"
[transport]
kind = "stdio"
command = "node"
args = ["./dist/main.js"]
Library tips:
- @types/node for stdio types.
- zod for tool input schema validation server-side.
- bun works as a drop-in for node and gives faster startup.
Go — hand-rolled
Same shape; bufio.Scanner for line reading.
Skeleton:
package main
import (
"bufio"
"encoding/json"
"fmt"
"os"
)
type RPC struct {
JSONRPC string `json:"jsonrpc"`
ID interface{} `json:"id,omitempty"`
Method string `json:"method,omitempty"`
Params json.RawMessage `json:"params,omitempty"`
Result interface{} `json:"result,omitempty"`
Error *RPCError `json:"error,omitempty"`
}
type RPCError struct {
Code int `json:"code"`
Message string `json:"message"`
}
func main() {
scanner := bufio.NewScanner(os.Stdin)
enc := json.NewEncoder(os.Stdout)
for scanner.Scan() {
var req RPC
if err := json.Unmarshal(scanner.Bytes(), &req); err != nil {
continue // skip lines that are not valid JSON
}
switch req.Method {
case "initialize":
enc.Encode(RPC{JSONRPC: "2.0", ID: req.ID, Result: map[string]interface{}{
"tools": []map[string]interface{}{{
"name": "myapp_greet",
"description": "Echo a greeting",
"input_schema": map[string]interface{}{"type": "object"},
}},
"version": "0.1.0",
}})
case "shutdown":
enc.Encode(RPC{JSONRPC: "2.0", ID: req.ID, Result: map[string]bool{"ok": true}})
return
default:
enc.Encode(RPC{JSONRPC: "2.0", ID: req.ID, Error: &RPCError{
Code: -32601, Message: fmt.Sprintf("unknown: %s", req.Method),
}})
}
}
}
plugin.toml:
[transport]
kind = "stdio"
command = "./my-go-microapp" # the compiled binary
Choosing a language
| Use case | Recommended stack |
|---|---|
| Multi-tenant SaaS, performance-sensitive | Rust + SDK |
| Quick prototype / glue to existing Python data pipeline | Python + stdlib |
| TypeScript shop, integration with web ecosystem | TypeScript + stdlib |
| Single-binary distribution to ops, no runtime dep | Go + stdlib |
Rule of thumb: if your microapp is the product, use Rust + SDK so contract evolution is automatic. If your microapp glues to another runtime you already maintain, use the host language and pin the contract version explicitly in your code.
Contract version pinning
Whichever language you pick, your microapp MUST be aware of the
contract version it was tested against. The Rust SDK pins it
via Cargo.toml = "0.1"; hand-rolled microapps MUST embed a
constant + assert at boot.
NEXO_CONTRACT_VERSION = "0.1"
# Future: read daemon's `initialize` response for a contract_version
# field and warn if it disagrees.
The contract doc's backward compat rules apply: additive fields always, deprecation N + N+1, wire format frozen.
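A boot-time check for a hand-rolled Python microapp might look like the sketch below. The `contract_version` field is hypothetical: the daemon does not send it today according to this page, so the check passes silently when the field is absent, and reading it from the `initialize` params is an assumption about where it would appear.

```python
import warnings

NEXO_CONTRACT_VERSION = "0.1"

def check_contract_version(initialize_params: dict) -> bool:
    """Warn when the daemon reports a contract version other than the pinned one."""
    reported = initialize_params.get("contract_version")  # hypothetical field
    if reported is None:
        return True  # nothing reported yet, nothing to compare
    if reported != NEXO_CONTRACT_VERSION:
        warnings.warn(
            f"contract mismatch: daemon={reported} microapp={NEXO_CONTRACT_VERSION}"
        )
        return False
    return True
```

Warning rather than exiting matches the additive-fields rule: a newer daemon is expected to keep working with an older microapp.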
See also
- contract.md — language-agnostic spec
- rust.md — Rust SDK reference
- getting-started.md — 1-hour walkthrough
- compliance-primitives.md — when to use which compliance helper (Rust today; spec is portable)
CLI reference
Single source of truth for every agent subcommand, flag, exit code,
and env var. agent is the one binary you'll ever run in production
— this is everything it can do.
Source: src/main.rs (Mode enum + parse_args),
crates/extensions/src/cli/, crates/setup/src/.
Invocation
agent [--config <dir>] [<subcommand> ...]
- Arg parser: hand-rolled, not
clap.--help/-hwork;-cis not an alias for--config(case-sensitive exact match). - No subcommand → run the daemon (default).
- Global flag:
--config <dir>(default./config).
Global environment variables
| Variable | Values | Purpose |
|---|---|---|
| RUST_LOG | tracing-subscriber filter | Log level (e.g. info,agent=debug). Default info. |
| AGENT_LOG_FORMAT | pretty, compact, or json | Log format. Default pretty. |
| AGENT_ENV | production (or prod) | Triggers JSON logs unless AGENT_LOG_FORMAT overrides. |
| TASKFLOW_DB_PATH | file path | Flow CLI DB (default ./data/taskflow.db). |
| CONFIG_SECRETS_DIR | dir path | Whitelists an extra root for ${file:...} YAML refs. |
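The interaction between AGENT_LOG_FORMAT and AGENT_ENV in the table can be read as a precedence rule. The Python sketch below encodes that reading; it is an interpretation of the table rather than the binary's actual logic, and it assumes an unrecognized AGENT_LOG_FORMAT value is ignored.

```python
def resolve_log_format(env: dict) -> str:
    """Pick the log format: explicit setting first, then environment, then default."""
    explicit = env.get("AGENT_LOG_FORMAT")
    if explicit in ("pretty", "compact", "json"):
        return explicit  # an explicit format always wins
    if env.get("AGENT_ENV") in ("production", "prod"):
        return "json"
    return "pretty"
```

Passing `dict(os.environ)` as `env` would apply the same rule to the real process environment.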
Exit codes (generic)
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | General failure (not found, config invalid, connection refused) |
| 2 | Warnings-only outcome (currently only --check-config non-strict) |
The ext subcommand has its own, richer exit-code table (see the ext section below).
Subcommand index
| Subcommand | Purpose |
|---|---|
| (default) | Run the agent daemon |
| init | Scaffold sample YAMLs (Phase 95) |
| set-broker | Switch broker.yaml between local and nats (Phase 92.9) |
| setup | Interactive credential wizard |
| status | Query running agent instances |
| dlq | Dead-letter queue inspection |
| ext | Extension management |
| flow | TaskFlow operations |
| mcp-server | Run as MCP stdio server |
| admin | Run the web admin UI behind a Cloudflare quick tunnel |
| reload | Trigger config hot-reload on a running daemon |
| --check-config | Pre-flight config validation |
| --dry-run | Load config and print the plan |
Daemon (default)
agent [--config ./config]
Boots every configured agent runtime, connects to the broker (NATS or
local fallback), starts metrics (:9090), health (:8080), and admin
(:9091 loopback) servers.
Exit codes:
- 0 — clean shutdown via SIGTERM / Ctrl+C
- 1 — config load failed, broker unreachable at startup, plugin failed to initialize
Logs to: stderr. See Logging.
init
Scaffold sample YAMLs into the config dir. Templates are baked
into the binary at compile time (include_str!), so this works
on a fresh install with zero network access.
agent init # all 19 templates → ${XDG_CONFIG_HOME:-~/.config}/nexo
agent init --output /etc/nexo-rs # custom dir
agent init --yaml broker,llm # shorthand: only those two
agent init --yaml plugins/whatsapp # plugin subdir templates
agent init --force # overwrite existing files
agent init --stdout --yaml broker # print one template to stdout (no file write)
Yaml filter shorthand: bare names (broker, agents, llm,
memory, extensions, mcp, mcp_server, runtime, pollers,
taskflow, transcripts, pairing, webhook_receiver) resolve
to top-level YAMLs. Plugin subdir templates: plugins/whatsapp,
plugins/telegram, plugins/email, plugins/browser,
plugins/discovery. Persona templates: personas/discovery.
Exit codes: 0 on write, 1 on filter mismatch, 2 if --force
not passed and target exists.
Postinst scripts in the .deb / .rpm / Termux packages call
agent init --output <CONFIG_DIR> automatically on first install
so a fresh-from-package operator never starts from a blank dir.
set-broker
Switch the broker mode without editing broker.yaml by hand.
Rewrites broker.yaml to the requested kind, then (by default)
sends SIGHUP to every running daemon that loaded this config
dir — the daemon respawns with the new broker (~3s blackout for
in-flight messages, drained from the persistence layer).
agent set-broker local # stdio bridge (no NATS server)
agent set-broker nats --url nats://localhost:4222 # multi-host mode
agent set-broker local --no-signal # edit YAML only, daemon stays on old broker until restart
agent --config /etc/nexo-rs set-broker nats --url nats://10.0.0.5:4222
local mode uses the daemon-derived stdio_bridge transport for
subprocess plugins — no NATS server required. nats mode requires
a reachable NATS server at --url; subprocess plugins inherit
NEXO_BROKER_URL and connect via async-nats.
See broker shapes for the full architectural picture and zero-config quickstart for the typical operator flow.
Exit codes: 0 on success, 1 if nats requested without --url
or YAML write failed, 2 if no daemon matched (YAML still updated;
user must start the daemon manually).
setup
Interactive credential wizard. Launches a prompt-driven flow for every service you want to enable — LLM keys, WhatsApp QR, Telegram bot token, Google OAuth, etc.
agent setup # full interactive wizard
agent setup list # list installable service ids
agent setup <service> # configure one service (e.g. minimax, whatsapp)
agent setup doctor # validate every credential / token (also runs the Phase 70.6 pairing-store audit)
agent setup telegram-link # print Telegram bot link-to-chat URL
Exit codes: 0 on completion; 1 on error.
See Setup wizard for the step-by-step.
status
Query the running daemon via the loopback admin console.
agent status # every agent, table
agent status ana # one agent, table
agent status --json # raw JSON
agent status --endpoint http://remote:9091 # override endpoint
Table output columns: ID | MODEL | BINDINGS | DELEGATES | DESCRIPTION
Exit codes:
- 0: query succeeded
- 1: endpoint unreachable or agent id not found
dlq
Dead-letter queue inspection. See DLQ operations for the full picture.
agent dlq list # plain-text table, up to 1000 entries
agent dlq replay <id> # move back to pending_events for retry
agent dlq purge # drop every entry (destructive)
Exit codes: 0 success; 1 failure (entry not found, DB error).
list columns: id | topic | failed_at | reason.
ext
Extension management. See Extensions — CLI for details and workflows.
agent ext list [--json]
agent ext info <id> [--json]
agent ext enable <id>
agent ext disable <id>
agent ext validate <path>
agent ext doctor [--runtime] [--json]
agent ext install <path> [--update] [--enable] [--dry-run] [--link] [--json]
agent ext uninstall <id> --yes [--json]
Flags:
| Flag | Where | Purpose |
|---|---|---|
| --json | list / info / doctor / install / uninstall | Machine-readable output |
| --runtime | doctor | Also spawn stdio extensions to verify handshake |
| --update | install | Overwrite if already installed |
| --enable | install | Flip to enabled: true in extensions.yaml |
| --link | install | Symlink source (absolute path required) instead of copy |
| --dry-run | install | Validate without writing |
| --yes | uninstall | Required confirmation |
Exit codes (extension-specific):
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Extension not found / --update target missing |
| 2 | Invalid manifest / invalid source / --link needs absolute path |
| 3 | Config write failed |
| 4 | Invalid id (reserved or empty) |
| 5 | Target exists (use --update) |
| 6 | Id collision across roots |
| 7 | uninstall missing --yes confirmation |
| 8 | Copy / atomic swap failed |
| 9 | Runtime check(s) failed (doctor --runtime) |
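For wrapper scripts, the table above can be mirrored as a small lookup. This is a minimal sketch: the meanings are copied from the table, while the names EXT_EXIT_CODES, describe_ext_exit, and needs_update_flag are hypothetical helpers and not part of nexo-rs.

```python
# Meanings copied from the exit-code table above.
# All names here are hypothetical helpers for wrapper scripts.
EXT_EXIT_CODES = {
    0: "success",
    1: "extension not found / --update target missing",
    2: "invalid manifest / invalid source / --link needs absolute path",
    3: "config write failed",
    4: "invalid id (reserved or empty)",
    5: "target exists (use --update)",
    6: "id collision across roots",
    7: "uninstall missing --yes confirmation",
    8: "copy / atomic swap failed",
    9: "runtime check(s) failed (doctor --runtime)",
}


def describe_ext_exit(code: int) -> str:
    """Return a readable meaning for an `agent ext` exit code."""
    return EXT_EXIT_CODES.get(code, f"unknown exit code {code}")


def needs_update_flag(code: int) -> bool:
    """True when the install failed only because the target already exists."""
    return code == 5
```

A deployment script could call describe_ext_exit on the return code of agent ext install and retry with --update only when needs_update_flag is true.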
flow
TaskFlow operations. See TaskFlow — FlowManager.
agent flow list [--json]
agent flow show <id> [--json]
agent flow cancel <id>
agent flow resume <id>
Env var: TASKFLOW_DB_PATH (default ./data/taskflow.db).
Exit codes: 0 success; 1 on error (flow not found, wrong
state, DB inaccessible).
list sorts by updated_at DESC; show includes every recorded
step; resume only works on Manual or ExternalEvent waits.
mcp-server
Run the agent as an MCP stdio server so MCP clients (Claude Desktop, Cursor, Zed) can consume its tools.
agent mcp-server
- Reads JSON-RPC from stdin, writes responses to stdout
- Does not boot a daemon or broker
- Requires config/mcp_server.yaml with enabled: true
Exit codes: 0 on clean exit; 1 if mcp_server.yaml disabled.
See MCP — Agent as MCP server for deployment recipes (Claude Desktop config, allowlist, auth token).
admin
Run the web admin UI behind a fresh Cloudflare quick tunnel. A new ephemeral trycloudflare.com URL is minted on every launch — no account, no DNS, no TLS setup.
agent admin # listen on 127.0.0.1:9099 (default)
agent admin --port 9199 # pick a different loopback port
agent admin --port=9199 # same thing, equals form
What happens on launch:
- Install cloudflared if missing. The tunnel crate detects the host OS/arch and downloads the matching cloudflared binary into the platform data dir. Subsequent launches reuse the cached copy.
- Mint a fresh random password. 24 URL-safe characters from the OS RNG. Printed once to stdout; copy it now, because there is no recovery short of relaunching agent admin.
- Start a loopback HTTP server. Listens on 127.0.0.1:<port> and serves the React bundle embedded at Rust compile time (see admin-ui/) behind HTTP Basic Auth. A bundle-missing fallback page is served if admin-ui/dist/ was empty when cargo build ran.
- Open a quick tunnel. cloudflared tunnel --url http://127.0.0.1:<port> returns an ephemeral https://…trycloudflare.com URL, which the command prints to stdout alongside the username (admin) and the freshly minted password.
- Wait for Ctrl+C / SIGTERM. Graceful shutdown kills the cloudflared child and stops the HTTP listener.
Exit codes:
- 0: clean shutdown
- 1: cloudflared install failed, port already bound, or tunnel negotiation failed
Notes:
- URL is re-generated every launch. If you need a stable URL, switch to a named Cloudflare tunnel (requires an account and wrangler config — out of scope for this command).
- Auth is HTTP Basic for now; the browser prompts for admin/<password> on first load. Username is fixed; password is fresh every launch. Keep the shell scrollback if you need to re-paste it.
- The password is never persisted. Losing it means stopping agent admin and starting again (which also rotates the tunnel URL).
reload
Triggers a config hot-reload on a running daemon. Publishes
control.reload on the broker the daemon is listening to (resolved
from broker.yaml), subscribes-before-publish to
control.reload.ack, waits up to 5 s, and prints the outcome.
agent reload # human-readable summary
agent reload --json # serialized ReloadOutcome
Example output:
$ agent reload
reload v7: applied=2 rejected=0 elapsed=18ms
✓ ana
✓ bob
Exit codes:
- 0: at least one agent reloaded
- 1: no ack within 5 s (daemon not running)
- 2: every agent rejected
Full semantics — what's reloaded, apply-on-next-message, failure modes — in Config hot-reload.
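When only the human-readable output is available, the summary line can be parsed with a regular expression. This is a sketch under a loud assumption: the line format is inferred solely from the single example output above and may change between releases. The names ReloadSummary and parse_reload_summary are hypothetical. Where possible, agent reload --json is likely the more robust input, but its ReloadOutcome schema is not shown here, so it is not modelled.

```python
import re
from typing import NamedTuple


class ReloadSummary(NamedTuple):
    version: int
    applied: int
    rejected: int
    elapsed_ms: int


# Pattern inferred from the example line
# "reload v7: applied=2 rejected=0 elapsed=18ms" (an assumption, not a spec).
_SUMMARY = re.compile(
    r"^reload v(\d+): applied=(\d+) rejected=(\d+) elapsed=(\d+)ms$"
)


def parse_reload_summary(line: str) -> ReloadSummary:
    """Parse the first line of `agent reload` output into integers."""
    match = _SUMMARY.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised reload summary: {line!r}")
    return ReloadSummary(*(int(group) for group in match.groups()))
```

Scripts that only need a pass/fail signal can skip parsing entirely and branch on the exit code listed above.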
--check-config
Pre-flight validation. Loads every YAML file, resolves env vars, checks schema, validates credentials. No broker, no daemon. Meant for CI.
agent --check-config # warnings-only mode
agent --check-config --strict # warnings become errors
Exit codes:
- 0: all clear
- 1: hard errors (missing required creds, invalid schema)
- 2: warnings only (non-strict mode)
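In CI, the three exit codes usually collapse into a single pass/fail decision. The sketch below shows one way to do that; the function name ci_should_fail and the tolerate_warnings switch are hypothetical, and treating any unlisted exit code as a failure is an assumption.

```python
def ci_should_fail(exit_code: int, tolerate_warnings: bool = True) -> bool:
    """Decide whether a CI job should fail for a --check-config exit code.

    0 = all clear, 1 = hard errors, 2 = warnings only (non-strict mode).
    Any other code is treated as a failure (assumption).
    """
    if exit_code == 0:
        return False
    if exit_code == 2:
        return not tolerate_warnings
    return True
```

Passing --strict to the CLI is the alternative: warnings then become errors, so the script no longer needs to special-case exit code 2.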
--dry-run
Load the config and print a plan. Doesn't connect to the broker or start any runtime task.
agent --dry-run
agent --dry-run --json
Output (plain text):
- Config directory
- Broker kind (nats | local)
- Plugin list
- Agent directory table (id, model, bindings, delegates, description)
Exit codes: 0 valid; 1 on error.
Daemon admin endpoints
Reference for status --endpoint and anyone wiring a custom
dashboard:
| Endpoint | Method | Bind | Purpose |
|---|---|---|---|
| /admin/agents | GET | 127.0.0.1:9091 | List every agent (JSON) |
| /admin/agents/<id> | GET | 127.0.0.1:9091 | Single agent (JSON) |
| /admin/tool-policy | GET | 127.0.0.1:9091 | Tool policy queries |
| /admin/credentials/reload | POST | 127.0.0.1:9091 | Phase 17: re-read agents/plugins YAML and atomically swap the credential resolver. Returns ReloadOutcome JSON. See config/credentials.md. |
| /health | GET | 0.0.0.0:8080 | Liveness probe |
| /ready | GET | 0.0.0.0:8080 | Readiness probe |
| /metrics | GET | 0.0.0.0:9090 | Prometheus |
| /whatsapp/pair* | GET | 0.0.0.0:8080 | WhatsApp pairing QR (first instance) |
| /whatsapp/<instance>/pair* | GET | 0.0.0.0:8080 | Multi-instance WhatsApp pairing |
Gotchas
- Hand-rolled parser. Unexpected flag ordering can produce "unknown argument" errors that are less forgiving than clap-based CLIs. Stick to the form shown in each subcommand.
- Global --config must come before the subcommand. agent --config ./x ext list works; agent ext list --config ./x does not.
- Admin console is loopback-only. status --endpoint against a remote host requires a tunnel; it won't listen publicly.
Background agents — agent run --bg / agent ps / agent attach / agent discover
Operator-side CLI for spawning, listing, and inspecting goals that
should outlive the spawning shell. Pairs with assistant mode:
the agent runs in the background while you go do something else;
you check in via agent ps / agent attach whenever convenient.
SessionKind
Every goal handle carries a SessionKind enum identifying how it
was spawned and how it should survive a daemon restart:
| Kind | Meaning | Survives restart |
|---|---|---|
| Interactive | User-driven REPL turn or chat-channel inbound (default) | No (Phase 71 reattach flips Running → LostOnRestart) |
| Bg | Operator spawned a detached goal via agent run --bg | Yes (keeps Running) |
| Daemon | Persistent supervised goal (e.g. assistant_mode binding's always-on agent loop) | Yes |
| DaemonWorker | Worker spawned by a Daemon goal; short-lived sub-agent | Yes (treated like Bg for reattach) |
Schema migration v5 adds the kind column to the agent_handles SQLite
table; pre-80.10 rows default to Interactive automatically.
agent run [--bg] <prompt>
Spawns a new goal handle. With --bg, sets kind = Bg + phase_id = "cli-bg" + returns the goal_id immediately so the operator can
detach. Without --bg, sets kind = Interactive + phase_id = "cli-run".
$ nexo agent run --bg "review the latest commits and post a summary"
[agent-run] goal_id=a9f62654-688b-4e41-95c9-1ec2a1a39f6d
[agent-run] kind=bg
[agent-run] status=running (queued for daemon pickup)
[agent-run] prompt: review the latest commits and post a summary
[agent-run] detached — re-attach later with `nexo agent attach a9f62654-...`
Validates that the prompt is non-empty. JSON output via --json.
Note: the slim MVP inserts the row into agent_handles, but the daemon-side pickup of queued goals is deferred to 80.10.g. For now, the row sits Running until manually transitioned via agent attach (Phase 80.16) or a future supervisor.
agent ps [--all] [--kind=...] [--db=<path>] [--json]
Reads agent_handles.db read-only and renders a markdown table.
$ nexo agent ps
# Agent runs (db: /home/.../state/agent_handles.db)
| ID | Kind | Status | Phase | Started | Ended |
|----------|------|---------|----------|---------------------|-------|
| a9f62654 | bg | Running | cli-bg | 2026-04-30T19:03:01 | - |
1 rows shown.
Default filter: only Running goals. Use --all to include
terminal rows. Use --kind=bg (or interactive / daemon /
daemon_worker) to narrow.
JSON output for scripting:
$ nexo agent ps --json | jq '.[] | select(.kind == "bg")'
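The same filter can be written in Python when jq is not installed. This sketch assumes only what the jq expression above implies: the output is a JSON array of objects with a kind field. The function name select_kind and the id field in the sample data are hypothetical.

```python
import json


def select_kind(ps_json: str, kind: str) -> list:
    """Return the rows of `agent ps --json` output whose kind matches."""
    rows = json.loads(ps_json)
    return [row for row in rows if row.get("kind") == kind]


if __name__ == "__main__":
    # Hypothetical sample; real rows may carry more or different fields.
    sample = '[{"id": "a9f62654", "kind": "bg"}, {"id": "0000", "kind": "interactive"}]'
    print(select_kind(sample, "bg"))
```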
Empty / missing-DB case prints a friendly message and exits 0:
(no agent runs recorded yet — db not found at /home/.../agent_handles.db)
agent discover [--include-interactive] [--db=<path>] [--json]
Operator's "what is running detached?" view. Filtered to Bg /
Daemon / DaemonWorker kinds by default. Pass
--include-interactive to broaden.
$ nexo agent discover
# Discoverable goals (db: /home/.../state/agent_handles.db)
| ID | Kind | Phase | Started | Last activity |
|----------|------|--------|---------------------|----------------------|
| a9f62654 | bg | cli-bg | 2026-04-30T19:03:01 | 2026-04-30T19:25:42 |
1 goal(s).
Sort: started_at descending (newest first).
Empty result includes a hint:
(no detached / daemon goals running; pass --include-interactive to broaden)
agent attach <goal_id> [--db=<path>] [--json]
Read-only viewer of a goal's latest persisted snapshot. Live
event streaming via NATS lands in 80.16.b — for now, the command
shows the most recent AgentSnapshot from the registry.
$ nexo agent attach a9f62654-688b-4e41-95c9-1ec2a1a39f6d
# Agent Goal a9f62654-688b-4e41-95c9-1ec2a1a39f6d
- **kind**: bg
- **status**: Running
- **phase_id**: cli-bg
- **started_at**: 2026-04-30 19:03:01 UTC
## Last progress
Reviewed last 12 commits, drafted summary, awaiting outbound channel hookup.
- **turn_index**: 4/30
- **last_event_at**: 2026-04-30 19:25:42 UTC
[attach] Live event stream requires daemon connection — re-run with NATS available (Phase 80.16.b follow-up).
For terminal goals, the hint changes:
[attach] Goal is in terminal state Done; no further updates expected.
Validates the UUID upfront (exit 1 with "is not a valid UUID"); exits 1 with "no agent handle found" when the row is absent.
Database path resolution
All four commands resolve the agent_handles.db path the same
3-tier way (mirrors agent dream from Phase 80.1.d):
1. --db <path> (explicit override, beats everything)
2. NEXO_STATE_ROOT env → <state_root>/agent_handles.db
3. XDG default ~/.local/share/nexo/state/agent_handles.db
The YAML tier is intentionally absent — agents.state_root is
not a config field today; state_root flows into BootDeps
directly per Phase 80.1.b.b.b documentation. Set
NEXO_STATE_ROOT to align the CLI with your daemon's actual
data dir.
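The lookup order described above can be sketched as a pure function. This illustrates the documented precedence and is not the actual Rust implementation: the name resolve_db_path is hypothetical, an empty NEXO_STATE_ROOT is treated as unset (assumption), and XDG_DATA_HOME handling is not modelled because the text only names the literal default path.

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_db_path(
    cli_db: Optional[str], env: Mapping[str, str], home: Path
) -> Path:
    """Resolve agent_handles.db using the three documented tiers."""
    # Tier 1: explicit --db override beats everything.
    if cli_db:
        return Path(cli_db)
    # Tier 2: NEXO_STATE_ROOT env var (empty value treated as unset).
    state_root = env.get("NEXO_STATE_ROOT")
    if state_root:
        return Path(state_root) / "agent_handles.db"
    # Tier 3: the documented default under the user's home directory.
    return home / ".local" / "share" / "nexo" / "state" / "agent_handles.db"
```

Passing the environment and home directory as arguments keeps the function easy to test; a real caller would pass os.environ and Path.home().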
Reattach kind-aware
Phase 71 reattach (boot-time recovery) is now SessionKind-aware
since 80.10:
- kind == Interactive + Running pre-restart → flip to LostOnRestart (the user is gone; no caller waiting)
- kind ∈ {Bg, Daemon, DaemonWorker} + Running pre-restart → keep Running (the operator expects them to survive)
Use reattach_running_kind_aware() from
SqliteAgentRegistryStore. The legacy non-kind-aware
reattach_running stays for backward callers.
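The kind-aware rule can be modelled in a few lines. This is an illustrative sketch of the decision only, not the code behind reattach_running_kind_aware(); the function name reattach_status and the use of plain strings for kinds and statuses are assumptions, as is leaving non-Running rows untouched.

```python
SURVIVING_KINDS = {"Bg", "Daemon", "DaemonWorker"}


def reattach_status(kind: str, status: str) -> str:
    """Return the status a goal should have after a daemon restart."""
    if status != "Running":
        # Assumption: only Running rows are touched by reattach.
        return status
    if kind == "Interactive":
        # The user is gone; no caller is waiting for the result.
        return "LostOnRestart"
    if kind in SURVIVING_KINDS:
        # The operator expects detached goals to survive.
        return "Running"
    raise ValueError(f"unknown SessionKind: {kind!r}")
```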
Deferred follow-ups
- 80.10.b = Phase 80.16: nexo agent attach TTY re-attach (already shipped in DB-only viewer mode)
- 80.10.c: Daemon supervisor process for Daemon / DaemonWorker kinds (separate process lifecycle distinct from the interactive daemon)
- 80.10.d: nexo agent kill <goal_id> graceful abort signal
- 80.10.e: nexo agent logs <goal_id> re-stream goal output without attaching
- 80.10.f: Phase 77.17 schema-migration system integration
- 80.10.g: Daemon-side pickup of queued goals. Today the CLI inserts the row but no daemon worker consumes it automatically
- 80.16.b: Live event streaming via NATS subscribe (agent.registry.snapshot.<goal_id> + agent.driver.> filtered by goal_id payload)
- 80.16.c: User input piping via agent.inbox.<goal_id> (already wired in Phase 80.11; multi-agent coordination uses the same channel, and CLI input piping is the user-facing consumer)
Migrations CLI
Versioned YAML schema migrations are now available for operator config
files under config/.
Commands
- nexo setup migrate --dry-run (default behavior): reports pending file migrations and target schema version without writing files.
- nexo setup migrate --apply: applies pending migrations in place.
Each migrated file carries a top-level schema_version marker. The
loader tolerates this metadata field and strips it before strict typed
deserialization.
Boot and hot-reload behavior
runtime.yaml accepts:
migrations:
auto_apply: true
- auto_apply: true makes boot + Phase 18 hot-reload apply pending config schema migrations before loading the runtime snapshot.
- auto_apply: false (default) leaves files untouched and prints a pending-migrations warning with file/version pairs.
Notes
- The migration functions are idempotent and versioned.
- setup migrate --apply is the safest path for explicit review-driven upgrades in production environments.
Docker
Production deployment as a compose stack: nats broker + nexo
runtime, Docker secrets for credentials, persistent volumes for SQLite
data and the disk queue.
Source: docker-compose.yml, Dockerfile, config/docker/.
Pre-built image at GHCR
Every push to main and every v* tag publishes a multi-arch image
(linux/amd64 + linux/arm64) at:
ghcr.io/lordmacu/nexo-rs:latest # latest tagged release
ghcr.io/lordmacu/nexo-rs:v0.1.1 # exact version
ghcr.io/lordmacu/nexo-rs:edge # latest main commit
ghcr.io/lordmacu/nexo-rs:main-<sha> # pinned to a specific commit
Pull and run:
docker pull ghcr.io/lordmacu/nexo-rs:latest
docker run --rm \
-v $(pwd)/config:/app/config:ro \
-v $(pwd)/data:/app/data \
-p 8080:8080 -p 9090:9090 \
ghcr.io/lordmacu/nexo-rs:latest
Build pipeline: .github/workflows/docker.yml. Tags and labels follow
the OCI image spec and are generated by docker/metadata-action. The
image carries SBOM and SLSA provenance attestations (verify with
docker buildx imagetools inspect).
Compose layout
flowchart LR
subgraph STACK[docker-compose]
NATS[nats:2.10<br/>:4222 client<br/>:8222 monitoring]
AG[nexo<br/>:8080 health<br/>:9090 metrics]
end
AG --> NATS
VOL1[(./config RO)] --> AG
VOL2[(./data RW)] --> AG
VOL3[(./extensions RO)] --> AG
SEC[/run/secrets/...] --> AG
IDE[MCP clients] -.->|port 8080| AG
PROM[Prometheus] -.->|port 9090| AG
docker-compose.yml
Two services, healthchecks on both, shared volumes:
- nats (nats:2.10-alpine): exposes :4222 for agent clients and :8222 for monitoring (healthcheck hits :8222/healthz)
- nexo: the main runtime
  - Ports: :8080 (health), :9090 (metrics)
  - Environment: RUST_LOG=info, AGENT_ENV=production
  - shm_size: 1gb (required for Chrome processes used by the browser plugin)
  - Bind mounts: ./config:/app/config:ro, ./data:/app/data:rw, ./extensions:/app/extensions:ro
  - depends_on: { nats: { condition: service_healthy } }
Dockerfile
Multi-stage:
- Builder: Rust cargo build --release --locked
- Runtime: debian:bookworm-slim with operational tools baked in:
  - ca-certificates, libsqlite3-0
  - Python + ffmpeg + tmux + yt-dlp + tesseract (for skills that need them)
  - Google Chrome on amd64 (OAuth + Widevine work); falls back to Chromium on arm64
  - cloudflared (downloaded per TARGETARCH at build time)
  - dumb-init as PID 1
Entry point: /usr/local/bin/nexo --config /app/config.
Exposed ports: 8080, 9090.
Config overrides — config/docker/
Mirrors the main config layout. The compose service mounts the production overrides path:
command: ["nexo", "--config", "/app/config/docker"]
Key differences in the docker overrides:
- broker.yaml: NATS URL points at the Docker service name (nats://nats:4222); persistence at /app/data/queue/broker.db
- llm.yaml: reads API keys from /run/secrets/<name>
- Other files (agents.yaml, memory.yaml, extensions.yaml) override defaults for container paths
Secrets
The compose file declares Docker secrets and the config overrides reference them:
services:
nexo:
secrets:
- minimax_api_key
- minimax_group_id
- google_client_id
- google_client_secret
secrets:
minimax_api_key:
file: ./secrets/minimax_api_key.txt
minimax_group_id:
file: ./secrets/minimax_group_id.txt
...
Config reads them via the ${file:/run/secrets/...} syntax. Secrets
appear as mode-0400 files inside the container — nothing ever touches
env vars.
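To make the placeholder syntax concrete, the sketch below expands ${file:...} references in a config string by reading the referenced file. It illustrates the idea only and is not the nexo-rs loader: the name expand_file_refs is hypothetical, and stripping surrounding whitespace from the file contents is an assumption about how trailing newlines are handled.

```python
import re
from pathlib import Path

# Matches ${file:/some/path}; the path may not contain a closing brace.
_FILE_REF = re.compile(r"\$\{file:([^}]+)\}")


def expand_file_refs(value: str) -> str:
    """Replace each ${file:PATH} with the contents of PATH."""

    def read_secret(match: "re.Match[str]") -> str:
        # Assumption: surrounding whitespace (e.g. a trailing newline) is dropped.
        return Path(match.group(1)).read_text().strip()

    return _FILE_REF.sub(read_secret, value)
```

Because the value is read from a file at load time, the secret never needs to be exported as an environment variable, which matches the behaviour described above.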
Operating the stack
docker compose up -d # start
docker compose logs -f nexo # follow logs
docker compose exec nexo nexo ext list
docker compose exec nexo nexo dlq list
docker compose restart nexo # rolling reload (SIGTERM → 5 s grace)
docker compose down # stop (preserves volumes)
Scaling
- Horizontal scaling needs an external NATS cluster. Running the compose with two agent replicas pointed at a single NATS server works for isolated workloads, but the compose itself does not prevent duplicate delivery across agents on the same topic, and the single-instance lockfile (see Fault tolerance) assumes one agent process per data directory.
- For real scale: one NATS cluster + N agent processes, each with its own ./data/ volume.
Health checks for orchestration
services:
nexo:
healthcheck:
test: ["CMD", "curl", "-f", "http://127.0.0.1:8080/ready"]
interval: 10s
timeout: 3s
retries: 3
start_period: 30s
Readiness gate is /ready (covered in metrics + health).
start_period needs to cover first-boot extension discovery + all
agent runtimes attaching to their topics.
Gotchas
- Volume ownership. Don't mount ./data as root-owned if your container runs as non-root. The runtime will fail to write the SQLite files and you'll only see cryptic readonly database errors.
- Chrome needs /dev/shm space. The shm_size: 1gb is not optional when the browser plugin is active; Chrome processes silently corrupt their state if starved.
- config/docker/ is committed, secrets are not. ./secrets/ is gitignored. Populate it before the first compose up.
Slim daemon builds (Cargo feature-gates)
Phase 93.12.a (2026-05-15) introduced Cargo feature-gates for canonical plugin crates so operators targeting embedded or mobile (Android Flutter FFI, slim Docker images) can ship a daemon binary without the optional plugin crates in its compile graph.
Available features
| Feature | Default | Drops crate |
|---|---|---|
| plugin-telegram | ✅ on | nexo-plugin-telegram |
| plugin-whatsapp | ✅ on | nexo-plugin-whatsapp |
| plugin-browser | off | (no-op placeholder; browser already has no Cargo dep) |
email is NOT feature-gated — structurally in-process by
design (Phase 93.11 audit, bucket D). Autonomous worker +
EmailToolContext + /metrics rendering all hold
Arc<EmailPlugin> in-process. No subprocess driver today.
whatsapp gate (93.12.c.1 + 93.12.c.2, shipped)
Both halves shipped — slim daemon binary can be built
without nexo-plugin-whatsapp in its compile graph:
cargo build --release --bin nexo --no-default-features
cargo tree --no-default-features -i nexo-plugin-whatsapp
# expected: error: package ID specification ... did not match any packages
Gated sites:
| Crate | Site | Detail |
|---|---|---|
| src/main.rs | RuntimeHealth.wa_pairing | typed BTreeMap<String, SharedPairingState> field |
| src/main.rs | spawn_whatsapp_pairing_state_subscriber | broker subscriber fn |
| src/main.rs | spawn_whatsapp_typing_presence_subscriber | typing-presence broker bridge fn |
| src/main.rs | build_known_pairing_registry | WhatsappPairingAdapter::new |
| src/main.rs | admin pairing trigger map | WhatsappPairingTrigger::from_configs |
| src/main.rs | instance loop | wa_pairing + wa_tunnel_cfg population |
| src/main.rs | subscriber spawn block | spawn_whatsapp_pairing_state_subscriber call |
| src/main.rs | tunnel auto-open | /whatsapp/pair Cloudflare quick tunnel |
| src/main.rs | tool fallback (boot) | register_whatsapp_tools |
| src/main.rs | tool fallback (hot-spawn) | register_whatsapp_tools |
| src/main.rs | HTTP handler | /whatsapp/* route dispatcher |
| crates/setup/src/writer.rs | pairing flow | session::pair_once + helpers + dual-shape wipe_channel_session |
| crates/setup/src/admin_bootstrap.rs | admin RPC | with_wa_bot_handle + outbound translator |
| crates/setup/src/admin_adapters.rs | outbound translator | WhatsAppTranslator struct + impl + tests |
| crates/setup/tests/channel_outbound_end_to_end.rs | e2e test | file-level #![cfg] |
Runtime impact when --no-default-features:
- Admin RPC /whatsapp/* returns channel-unavailable.
- HTTP /whatsapp/* route returns 404 (handler block absent).
- Auto-open Cloudflare quick tunnel for pairing is skipped.
- Pairing trigger map has no whatsapp entry; admin pairing/start returns "channel not supported".
- Outbound dispatcher rejects whatsapp routes with typed TranslationError::UnsupportedChannel.
WhatsApp still runs as a discovered subprocess if its
manifest sits in plugins.discovery.search_paths and the
binary is installed — the gate removes only compile-time
imports. Subprocess broker path is unaffected.
Building a telegram-less daemon
cargo build --release --bin nexo --no-default-features
Verify the crate dropped from the dep graph:
cargo tree --no-default-features -i nexo-plugin-telegram
# expected: error: package ID specification `nexo-plugin-telegram` did not match any packages
cargo tree -i nexo-plugin-telegram (without
--no-default-features) prints the canonical nexo-rs v0.1.x
parent — proving the gate is the only thing keeping
telegram in.
Runtime behaviour
A feature-gated build still runs telegram as a discovered
subprocess if its manifest sits in
plugins.discovery.search_paths and the
nexo-plugin-telegram binary is installed (via
cargo install nexo-plugin-telegram or release tarball).
The gate removes only the daemon's compile-time imports
(pairing adapter constructor + outbound-tool fallback
registration). The subprocess path uses broker JSON-RPC,
not direct Rust imports, so it is unaffected.
Tradeoff: the feature-disabled daemon loses the daemon-side
fallback that registers telegram_* outbound tools into
the agent's ToolRegistry if the plugin manifest does not
yet declare [[plugin.tools.outbound]]. Standalone telegram
v0.3.0+ ships the manifest section, so the fallback is dead
weight for any operator running a current plugin binary.
CI matrix
The release workflow validates both shapes:
cargo build --bin nexo # default (telegram in)
cargo build --bin nexo --no-default-features # slim (telegram out)
Both targets must compile clean for release-fast and release profiles before the binary ships.
When to add a new feature-gate
Add plugin-<id> = ["dep:nexo-plugin-<id>"] if all of the following hold:
1. The plugin has a non-trivial Cargo dep with transitive cost (binary size, link time, native dep like OpenSSL).
2. The plugin is genuinely optional for the target audience (Android, embedded, slim Docker).
3. The compile-time integration points are localised: no cross-crate admin-RPC entanglement that would force the gate to bubble through crates/setup or crates/core.
If any of (1)-(3) fail, prefer subprocess discovery over a feature-gate — manifest-driven runtime decoupling avoids the conditional-compilation noise.
Metrics & health
Prometheus metrics on :9090/metrics, health/readiness on :8080,
admin console on 127.0.0.1:9091. Everything an operator or
orchestrator needs to decide "is the agent healthy?" without reading
logs.
Source: crates/core/src/telemetry.rs, src/main.rs.
Ports at a glance
| Port | Binding | Purpose |
|---|---|---|
| :9090 | 0.0.0.0 | Prometheus /metrics scrape |
| :8080 | 0.0.0.0 | Health /health, readiness /ready, WhatsApp pairing pages |
| :9091 | 127.0.0.1 | Admin console (loopback only) |
Ports are not configurable yet — if you need to remap, port-forward outside the agent (Docker, k8s service).
/metrics (Prometheus)
Exposed metrics:
| Name | Type | Labels | What |
|---|---|---|---|
| llm_requests_total | counter | agent, provider, model | Every LLM completion request |
| llm_latency_ms | histogram | agent, provider, model | Buckets 50, 100, 250, 500, 1000, 2500, 5000, 10000 ms |
| messages_processed_total | counter | agent | Inbound messages that reached an agent |
| nexo_extensions_discovered | counter | status={ok,disabled,invalid} | Emitted on every discovery sweep |
| nexo_tool_calls_total | counter | agent, outcome={ok,error,blocked,unknown}, tool | Tool invocations |
| nexo_tool_cache_events_total | counter | agent, event={hit,miss,put,evict}, tool | Tool-level memoization |
| nexo_tool_latency_ms | histogram | agent, tool | Per-tool latency |
| circuit_breaker_state | gauge | breaker | 0 = Closed, 1 = Open; always includes nats |
| credentials_accounts_total | gauge | channel | Per-channel labelled instance count (Phase 17) |
| credentials_bindings_total | gauge | agent, channel | 1 when the agent has a credential bound, 0 otherwise |
| channel_account_usage_total | counter | agent, channel, direction={inbound,outbound}, instance | Every credential use |
| channel_acl_denied_total | counter | agent, channel, instance | Outbound calls rejected by allow_agents |
| credentials_resolve_errors_total | counter | channel, reason | Resolver failures (unbound, not_found, not_permitted) |
| credentials_breaker_state | gauge | channel, instance | 0=closed, 1=half-open, 2=open. Per-(channel, instance) circuit breaker: a 429 from one number cannot trip the breaker for a sibling account. |
| credentials_boot_validation_errors_total | counter | kind | Gauntlet errors by kind at boot |
| credentials_insecure_paths_total | gauge | - | Credential files with lax permissions at boot |
| credentials_google_token_refresh_total | counter | account_fp, outcome={ok,err} | Google OAuth refresh attempts (fp = sha256[..8], not raw email) |
| pairing_inbound_challenged_total | counter | channel, result={delivered_via_adapter,delivered_via_broker,publish_failed,no_adapter_no_broker_topic} | DM-challenge dispatch attempts (Phase 26.x) |
| pairing_approvals_total | counter | channel, result={ok,expired,not_found} | nexo pair approve outcomes (Phase 26.y) |
| pairing_codes_expired_total | counter | - | Setup codes pruned past TTL or rejected as expired on approve |
| pairing_bootstrap_tokens_issued_total | counter | profile | Bootstrap tokens minted by BootstrapTokenIssuer::issue |
| pairing_requests_pending | gauge | channel | Pending pairing requests (push-tracked; PairingStore::refresh_pending_gauge exposed for drift recovery after a daemon restart) |
Circuit-breaker state for the nats breaker is sampled at scrape
time from broker readiness, so a stalled publish path shows up in
the next scrape without needing an eager push.
The credentials_* and channel_* series are documented with full
schema examples in config/credentials.md.
account_fp is always an 8-byte sha256 fingerprint of the account id,
never the raw JID or email, so scraped metrics stay safe to share.
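When reasoning about SLOs against the llm_latency_ms buckets listed in the metrics table, it helps to see which bucket a given observation lands in. The sketch below uses the standard Prometheus convention that a bucket's le bound is inclusive; the bucket values come from the table, while the name le_bucket is hypothetical.

```python
from bisect import bisect_left

# Upper bounds (ms) copied from the llm_latency_ms row of the metrics table.
LLM_LATENCY_BUCKETS_MS = [50, 100, 250, 500, 1000, 2500, 5000, 10000]


def le_bucket(latency_ms: float) -> str:
    """Return the `le` label of the smallest bucket containing latency_ms."""
    # bisect_left finds the first bound >= latency_ms (bounds are inclusive).
    index = bisect_left(LLM_LATENCY_BUCKETS_MS, latency_ms)
    if index == len(LLM_LATENCY_BUCKETS_MS):
        return "+Inf"
    return str(LLM_LATENCY_BUCKETS_MS[index])
```

Every observation at or below 50 ms falls into the first bucket, which is why latencies under 50 ms cannot be distinguished with the compiled-in boundaries.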
Useful alerts
LLM provider flapping
- alert: LlmError5xxHigh
expr: sum(rate(llm_requests_total{outcome="error"}[5m])) by (provider) > 0.1
for: 5m
NATS circuit open
- alert: NatsBreakerOpen
expr: circuit_breaker_state{breaker="nats"} == 1
for: 1m
Tool call failures
- alert: ToolErrorSpike
expr: |
sum(rate(nexo_tool_calls_total{outcome="error"}[5m])) by (tool) > 0.5
for: 10m
Health endpoints
flowchart LR
GET1[GET /health] --> OK[200 OK<br/>always<br/>{status:ok}]
GET2[GET /ready] --> CHK{broker ready<br/>AND agents > 0?}
CHK -->|yes| RDY[200 OK<br/>{status:ready,<br/>agents_running:N}]
CHK -->|no| NOT[503 Service Unavailable<br/>{status:not_ready,<br/>broker_ready,<br/>agents_running}]
- GET /health: liveness probe. Returns 200 as long as the process is accepting connections. Don't use this as a traffic gate.
- GET /ready: readiness probe. Returns 200 only when the broker is ready and at least one agent runtime is attached to inbound topics. Returns 503 during boot, shutdown, or broker outage.
- GET /whatsapp/*: QR pairing pages and the /whatsapp/pair tunnel endpoint; see WhatsApp plugin.
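The readiness decision in the flowchart above reduces to a single condition. The sketch below models it as a pure function; status codes and body fields are taken from the flowchart, and the name ready_response is hypothetical (the real handler lives in the Rust daemon).

```python
from typing import Any, Dict, Tuple


def ready_response(
    broker_ready: bool, agents_running: int
) -> Tuple[int, Dict[str, Any]]:
    """Model the /ready decision: (HTTP status, JSON body)."""
    if broker_ready and agents_running > 0:
        return 200, {"status": "ready", "agents_running": agents_running}
    return 503, {
        "status": "not_ready",
        "broker_ready": broker_ready,
        "agents_running": agents_running,
    }
```

Note that a healthy broker with zero attached agents still yields 503, so a zero-agent boot never passes this gate.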
Kubernetes probes
livenessProbe:
httpGet: { path: /health, port: 8080 }
initialDelaySeconds: 10
periodSeconds: 10
readinessProbe:
httpGet: { path: /ready, port: 8080 }
initialDelaySeconds: 30
periodSeconds: 5
initialDelaySeconds: 30 for readiness covers extension discovery
and every agent runtime attaching its subscriptions.
Admin console (:9091)
Loopback-only. Exposes:
| Path | Purpose |
|---|---|
| /admin/agents | Agent directory with live status, session counts |
| /admin/tool-policy | Query the tool-policy registry |
The agent status [--endpoint URL] [--agent-id ID] [--json] CLI
subcommand hits this endpoint and prints a table or JSON; good for
scripting ops without grepping logs.
Remote access requires an explicit tunnel — the port is never exposed publicly by default.
Scrape config sample
# prometheus.yml
scrape_configs:
- job_name: nexo-rs
scrape_interval: 15s
static_configs:
- targets: ['agent:9090']
For Docker compose: the service name is agent. For k8s: use the
service DNS.
Gotchas
- circuit_breaker_state only labels per-breaker, not per-provider. Multiple LLM providers each have their own breaker instance, but they surface as distinct breaker label values. If you expected {provider="anthropic"} you'll need a label rename in your Prometheus relabel config.
- Histograms are non-configurable. Buckets are compiled in. If your SLO requires fine-grained buckets below 50 ms, it is worth opening an issue.
- /ready 503 during shutdown is expected. Don't alert on 5 s of 503 bursts; alert only when 503s persist for more than 30 s.
Logging
tracing under the hood. Human-readable in dev, JSON in production,
always to stderr (stdout is reserved for wire protocols like MCP
JSON-RPC).
Source: src/main.rs::init_tracing.
Quick reference
| Env var | Default | Meaning |
|---|---|---|
| RUST_LOG | info | EnvFilter syntax (nexo_core=debug,async_nats=warn,info) |
| AGENT_LOG_FORMAT | pretty (json in AGENT_ENV=production) | One of pretty, compact, json |
| AGENT_ENV | unset | Set to production to default to JSON logs |
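The interaction between the two format variables can be sketched as follows. This is an interpretation of the table, not the code in init_tracing: the name resolve_log_format is hypothetical, and ignoring an unrecognised AGENT_LOG_FORMAT value in favour of the default is an assumption.

```python
from typing import Mapping

KNOWN_FORMATS = ("pretty", "compact", "json")


def resolve_log_format(env: Mapping[str, str]) -> str:
    """Pick the log format from AGENT_LOG_FORMAT and AGENT_ENV."""
    explicit = env.get("AGENT_LOG_FORMAT")
    if explicit in KNOWN_FORMATS:
        # An explicit, recognised format always wins.
        return explicit
    # Otherwise the default depends on the environment.
    if env.get("AGENT_ENV") == "production":
        return "json"
    return "pretty"
```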
Levels
Pick the lowest verbosity that still surfaces the signal you care about:
| Level | Use |
|---|---|
| error | Unrecoverable; operator action needed |
| warn | Degraded but running (circuit open, retry budget burning) |
| info | Lifecycle (startup, shutdown, reconnects) |
| debug | Per-turn detail (tool invoked, session created) |
| trace | Per-event firehose; only when chasing a bug |
Log formats
pretty (dev default)
Coloured, multi-line. Good at the terminal, bad in log pipelines.
2026-04-24T17:22:13Z INFO agent::runtime: agent runtime ready
at src/main.rs:1243
in agent_boot with agent="ana"
compact
One line per event. Middle ground.
2026-04-24T17:22:13Z INFO agent="ana" agent runtime ready
json
Structured. One JSON object per line. Default when AGENT_ENV=production.
{"ts_unix_ms":1714000000000,"level":"INFO","target":"agent::runtime","thread_id":"ThreadId(3)","file":"src/main.rs","line":1243,"spans":[{"name":"agent_boot","agent":"ana"}],"message":"agent runtime ready"}
Every entry carries:
- ts_unix_ms: milliseconds since epoch (stable for ingestion)
- level, target
- thread_id, file, line: for pinpointing
- spans: span hierarchy with attached fields
- Any structured fields passed via tracing::info!(agent = %id, ...)
Correlating across agents
Cross-agent work lands on agent.route.<target_id> with a
correlation_id. In logs, the correlation id shows up as a field on
every event that happened inside a delegation span.
flowchart LR
A[agent A<br/>info: tool_call agent.route.ops] --> MSG[NATS message<br/>correlation_id=req-123]
MSG --> B[agent B<br/>info: handling agent.route with correlation_id=req-123]
B --> REPLY[reply on agent.route.A<br/>correlation_id=req-123]
REPLY --> A2[agent A<br/>info: delegation returned correlation_id=req-123]
Grep logs by correlation_id to see the whole fan-out+in as a single
thread.
Structured-field conventions
Convention for fields that show up across the codebase:
| Field | Where |
|---|---|
| agent | Any log tied to a specific agent runtime |
| session | Any log inside a session context (usually UUID) |
| extension (or ext) | Any log from extension runtimes |
| tool | Any tool invocation log |
| provider, model | LLM client logs |
| correlation_id | Delegation-related logs |
| topic | Broker publish/subscribe logs |
When adding new code, reuse these names — log pipelines can count on them.
Where stdout goes
stdout is reserved for:
- MCP server mode (agent mcp serve): JSON-RPC traffic
- CLI subcommands that return data (agent ext list --json, agent flow show --json, agent dlq list)
Everything else, including normal log output, goes to stderr.
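Don't pipe agent … 2>&1 | jq: merging stderr into stdout mixes log
lines into the data stream, and jq stops at the first line that is not
JSON. Keep the two streams separate unless you know everything the
subcommand writes is JSON.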
Practical setups
Local dev
export RUST_LOG=agent=debug,nexo_core=debug,info
cargo run --bin agent -- --config ./config
Production (Docker)
services:
  agent:
    environment:
      AGENT_ENV: production
      RUST_LOG: info,async_nats=warn
Everything lands on stderr → container runtime picks it up → your log pipeline ingests JSON directly.
Chasing a specific agent
export RUST_LOG=agent=info
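# then filter by span field (needs JSON logs; --no-log-prefix drops the compose service prefix)
docker compose logs --no-log-prefix agent | jq -c 'select(any(.spans[]?; .agent == "ana"))'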
Gotchas
- tracing is compile-time filtered. If you grep logs for a debug-level event and see nothing, verify RUST_LOG covers the module.
- JSON mode drops ANSI colors. Rightly so, but don't pipe it through a TTY colorizer and then be confused by escape sequences.
- stderr ordering isn't guaranteed against stdout. Never assume a log line printed right after a println! happens in log order; pipes buffer independently.
Dead-letter queue operations
The DLQ is where events end up when they exhaust their retry budget or fail to deserialize at all. The runtime never silently drops an event — if it can't be delivered, it lands here for an operator to inspect or replay.
Source: crates/broker/src/disk_queue.rs, src/main.rs
(agent dlq ... subcommands).
When items land there
flowchart LR
PUB[publish event] --> NATS{NATS up?}
NATS -->|yes| OK[delivered]
NATS -->|no| DQ[pending_events]
DQ --> DRAIN[disk queue drain]
DRAIN -->|attempts < 3| DQ
DRAIN -->|attempts >= 3| DLQ[dead_letters]
DQ -.->|deserialization error| DLQ
- 3 attempts (DEFAULT_MAX_ATTEMPTS) without success → row moves to dead_letters
- Unparseable payload → moves immediately (a poison pill is not worth retrying)
- Circuit-breaker-open on publish counts as an attempt; if the breaker stays open, the queue will eventually flush into DLQ
See Fault tolerance for the full retry flow.
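The three rules above can be read as one decision function. The sketch below is a model of that logic rather than the code in crates/broker/src/disk_queue.rs; only DEFAULT_MAX_ATTEMPTS comes from the text, while Attempt, Placement, and after_attempt are illustrative names.

```rust
/// Retry budget named in the text above.
const DEFAULT_MAX_ATTEMPTS: u32 = 3;

/// Outcome of one delivery attempt (illustrative names).
#[derive(Debug, Clone, Copy)]
enum Attempt {
    Delivered,
    /// Transient failure, including "circuit breaker open".
    Failed,
    /// Payload could not be parsed at all.
    Unparseable,
}

/// Where the row lives after the attempt.
#[derive(Debug, PartialEq)]
enum Placement {
    Delivered,
    Pending { attempts: u32 },
    DeadLetter,
}

/// `attempts` is the number of failed attempts recorded before this one.
fn after_attempt(attempts: u32, outcome: Attempt) -> Placement {
    match outcome {
        Attempt::Delivered => Placement::Delivered,
        // Poison pill: skip the retry budget entirely.
        Attempt::Unparseable => Placement::DeadLetter,
        Attempt::Failed => {
            let attempts = attempts + 1;
            if attempts >= DEFAULT_MAX_ATTEMPTS {
                Placement::DeadLetter
            } else {
                Placement::Pending { attempts }
            }
        }
    }
}

fn main() {
    // A breaker that stays open burns the whole budget.
    let mut attempts = 0;
    loop {
        match after_attempt(attempts, Attempt::Failed) {
            Placement::Pending { attempts: n } => attempts = n,
            other => {
                println!("final placement: {:?}", other);
                break;
            }
        }
    }
}
```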
The DeadLetter row
struct DeadLetter {
    id: String,      // UUID
    topic: String,   // NATS subject
    payload: String, // JSON event body
    failed_at: i64,  // unix timestamp (ms)
    reason: String,  // error text
}
Storage: SQLite table dead_letters in the broker DB (typically
./data/queue/broker.db).
CLI
agent dlq list # list up to 1000 entries
agent dlq replay <id> # move one entry back to pending_events
agent dlq purge # delete every entry
list output
Columns: id | topic | failed_at | reason. Plain text, one entry per
line, suitable for grep / awk piping.
2f9c2e4a-... plugin.inbound.whatsapp 2026-04-24T17:22:13Z circuit breaker open
b1a3a9f5-... plugin.outbound.telegram 2026-04-24T17:23:01Z deserialization error: unexpected field `...`
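For scripts that outgrow awk, the four columns can be split in a few lines. This sketch assumes the columns are whitespace-separated and that only the reason column may contain spaces, which matches the sample output above but is not a documented guarantee; DlqRow and parse_dlq_line are hypothetical names.

```rust
/// One row of `agent dlq list`, split into the four documented columns.
#[derive(Debug, PartialEq)]
struct DlqRow<'a> {
    id: &'a str,
    topic: &'a str,
    failed_at: &'a str,
    reason: &'a str,
}

/// Assumes whitespace-separated columns where only `reason` may
/// itself contain spaces. Returns None for rows with fewer columns.
fn parse_dlq_line(line: &str) -> Option<DlqRow<'_>> {
    let mut rest = line.trim();
    let mut cols = [""; 3];
    for col in cols.iter_mut() {
        let end = rest.find(char::is_whitespace)?;
        *col = &rest[..end];
        rest = rest[end..].trim_start();
    }
    if rest.is_empty() {
        return None;
    }
    Some(DlqRow { id: cols[0], topic: cols[1], failed_at: cols[2], reason: rest })
}

fn main() {
    let line = "2f9c2e4a-... plugin.inbound.whatsapp 2026-04-24T17:22:13Z circuit breaker open";
    if let Some(row) = parse_dlq_line(line) {
        println!("{} failed on {} at {}: {}", row.id, row.topic, row.failed_at, row.reason);
    }
}
```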
replay
Moves the row back to pending_events with attempts = 0:
$ agent dlq replay 2f9c2e4a-...
replayed 2f9c2e4a-... → pending_events (next daemon drain will retry it)
The retry happens on the next drain() cycle of the running agent —
replay itself does not attempt delivery. That way a running agent
in a different shell picks it up; a stopped agent leaves the event
safely in pending_events for its next startup.
purge
Destructive. Drops every row in dead_letters:
$ agent dlq purge
purged 42 dead-letter entries
Use with care — there is no per-topic filter. If you need a scoped
purge, inspect with list, selectively replay what you want to
keep, then purge the rest.
Exit codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Failure (event not found for replay, DB access error, etc.) |
Common workflows
Post-outage triage
# See what piled up during the NATS outage
agent dlq list | wc -l
# Spot-check
agent dlq list | head
agent dlq list | awk '{print $2}' | sort | uniq -c
# If reasons look transient (circuit open, timeouts):
agent dlq list | awk '{print $1}' | while read id; do
agent dlq replay "$id"
done
Poison-pill cleanup
If reason mentions deserialization errors, the payload is malformed
— no amount of retry will help. Collect the offenders, fix the
producer side, then:
agent dlq list | grep deserialization | awk '{print $1}' > /tmp/poison.txt
# ... verify they're truly poison ...
# purge is global: replay any non-poison entries you want to keep first
agent dlq purge
Preview without modifying
The CLI has no --dry-run flag today. Use agent dlq list to preview
first; the DB rows are stable until you explicitly replay or
purge.
Monitoring
There is no dedicated DLQ metric yet. Approximations:
- A rise in the time spent with circuit_breaker_state{breaker="nats"} == 1 strongly predicts DLQ growth; alert on it.
- Consider wrapping agent dlq list | wc -l in a cron job that pushes the count to Prometheus via the textfile collector if you want a direct gauge.
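A minimal sketch of the textfile-collector idea: read the list output on stdin and print one gauge in the Prometheus text exposition format. The metric name nexo_dlq_entries is hypothetical (the runtime exports no such metric), and counting non-empty lines only approximates wc -l.

```rust
/// Render a single gauge in the Prometheus text exposition format.
/// `nexo_dlq_entries` is a hypothetical metric name.
fn render_dlq_gauge(count: u64) -> String {
    format!(
        "# HELP nexo_dlq_entries Rows currently in dead_letters.\n\
         # TYPE nexo_dlq_entries gauge\n\
         nexo_dlq_entries {}\n",
        count
    )
}

/// Count non-empty lines of `agent dlq list` output.
fn count_entries(list_output: &str) -> u64 {
    list_output.lines().filter(|l| !l.trim().is_empty()).count() as u64
}

fn main() {
    use std::io::Read;
    // Reads the list output from stdin and prints the gauge to stdout.
    let mut input = String::new();
    std::io::stdin().read_to_string(&mut input).expect("read stdin");
    print!("{}", render_dlq_gauge(count_entries(&input)));
}
```

When wiring this into cron, write the output to a temporary file and rename it into the collector directory so the exporter never reads a half-written file.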
Gotchas
- replay doesn't wake a stopped agent. If no agent is running against the same data directory, the row just moves back to pending_events and waits for the next startup drain.
- No replay deduplication. Replaying an event that was already successfully delivered will deliver it again. If your consumer isn't idempotent, spot-check downstream state before replaying.
- purge is global. If you need to preserve a subset, replay those entries (picked from list) before purging.
Config hot-reload
Operators rotate per-agent knobs (allowlists, model strings, prompts,
rate limits, delegation gates) without restarting the daemon. Sessions
currently handling a message finish their turn on the old snapshot;
the next event picks up the new one (apply-on-next-message). Plugin
configs (whatsapp.yaml, telegram.yaml, …) are not hot-reloadable
yet — see limitations.
What triggers a reload
| Trigger | Source |
|---|---|
| File save under config/ | notify-based watcher, debounced 500 ms |
| agent reload CLI | Publishes control.reload on the broker |
| Direct broker publish | Any integration can emit control.reload |
What's reloaded
Files watched by default (paths relative to the config dir):
- agents.yaml
- agents.d/ (recursive)
- llm.yaml
- runtime.yaml
Extra paths listed under runtime.reload.extra_watch_paths are
appended to the list.
The fields that apply live without a restart:
| Field | Location | Effect |
|---|---|---|
| allowed_tools (agent + binding) | agents.d/*.yaml | Tool list visible to the LLM + per-call guard |
| outbound_allowlist | same | Defense-in-depth in whatsapp_send_* / telegram_send_* |
| skills | same | Skill blocks rendered into the system prompt |
| model.model (binding-level) | same | LLM model string on next turn |
| system_prompt + system_prompt_extra | same | System block composition |
| sender_rate_limit | same | Per-binding token bucket |
| allowed_delegates | same | Delegation ACL |
| providers.<name>.api_key | llm.yaml | Rotated via a fresh LlmClient on next turn |
| lsp.languages, lsp.idle_teardown_secs, lsp.prewarm (agent + binding) | agents.d/*.yaml | LSP tool reads policy per call (C2) |
| team.max_members, team.max_concurrent, team.idle_timeout_secs, team.worktree_per_member (agent + binding) | same | Team* tools read policy per call (C2) |
| config_tool.allowed_paths, config_tool.approval_timeout_secs | same | Read on the next ConfigTool call (M11 follow-up promotes the rest) |
| repl.allowed_runtimes (agent + binding) | same | ReplTool gates spawn on the per-call allowlist (C2) |
| remote_triggers (agent + binding) | same | RemoteTriggerTool reads allowlist per call |
| cron_* model fields | same | CronCreateTool reads effective.model per call |
| proactive.tick_interval_secs, proactive.jitter_pct, proactive.max_idle_secs | same | Proactive driver reads on the next tick |
| All Phase 16 binding overrides (allowed_tools, outbound_allowlist, skills, model.model, system_prompt_extra, sender_rate_limit, allowed_delegates, language, link_understanding, web_search, pairing_policy, dispatch_policy, remote_triggers, proactive, repl, lsp, team, config_tool) | agents.d/*.yaml, inbound_bindings[].<field> | Resolved fresh per snapshot build; consumed at handler entry via ctx.effective_policy() |
Fields that require a restart (logged as warn during reload):
- id, plugins, workspace, skills_dir, transcripts_dir
- heartbeat.enabled, heartbeat.interval
- config.debounce_ms, config.queue_cap
- model.provider (binding-level provider must match agent provider; the LlmClient is wired once per agent)
- broker.yaml, memory.yaml, mcp.yaml, extensions.yaml
- Boolean enable flips: lsp.enabled, team.enabled, repl.enabled, config_tool.self_edit, proactive.enabled (any per-binding override of these). Flipping false → true requires registering the tool in the per-agent tool_base (immutable post-boot, Arc<ToolRegistry>); flipping true → false would leave a registered-but-refused tool that the LLM still sees in its catalogue. The handler refuses with a <feature>Disabled error in the second case, but operators should restart for clean semantics.
- Subsystem actor lifecycle: LspManager child processes, ReplRegistry subprocess pool, TeamMessageRouter broker subscriptions stay alive across reloads. Operator restart is required to recycle child processes (e.g. after a toolchain update for rust-analyzer).
The "boolean enable flips" and "subsystem actor lifecycle"
limitations match prior art: the upstream agent CLI's
useManageMCPConnections.ts:624 does invalidate-and-refetch
without killing the MCP child stdio process, and OpenClaw's
research/src/plugins/services.ts:33-78 boots plugin services
once per process and keeps them resident across config changes.
Adding or removing an agent also requires a restart in this release; see limitations.
Configuration
config/runtime.yaml is optional. Defaults:
reload:
  enabled: true          # master switch
  debounce_ms: 500       # notify-debouncer-full window
  extra_watch_paths: []  # appended to the built-in list
cron:
  one_shot_retry:
    max_retries: 3
    base_backoff_secs: 30
    max_backoff_secs: 1800
Set enabled: false to turn off the file watcher + the
control.reload subscriber. The CLI agent reload still works — the
daemon never opens a privileged socket, it just listens on the shared
broker.
The reload pipeline
file save / CLI / broker
│
▼
debouncer (500 ms)
│
▼
AppConfig::load (YAML + env resolution)
│
▼
validate_agents_with_providers ──fail──▶ log warn, bump
│ config_reload_rejected_total,
▼ keep old snapshot
RuntimeSnapshot::build (per agent)
│
▼
ArcSwap::store (atomic per agent)
│
▼
events.runtime.config.reloaded
Validation failure never swaps. The daemon always serves a snapshot that passed its boot gauntlet.
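The validate-then-swap rule, together with the way an in-flight turn keeps its captured snapshot, can be modelled with the standard library alone. In this sketch RwLock<Arc<_>> stands in for ArcSwap, and Snapshot, AgentSlot, and the toy validation rule are illustrative rather than the runtime's own types.

```rust
use std::sync::{Arc, RwLock};

/// Stand-in for the per-agent runtime snapshot.
#[derive(Debug)]
struct Snapshot {
    version: u64,
    allowed_tools: Vec<String>,
}

/// Holds the current snapshot; `RwLock<Arc<_>>` stands in for ArcSwap.
struct AgentSlot {
    current: RwLock<Arc<Snapshot>>,
}

impl AgentSlot {
    fn new(initial: Snapshot) -> Self {
        Self { current: RwLock::new(Arc::new(initial)) }
    }

    /// What a turn captures at intake: a cheap clone of the Arc.
    fn load(&self) -> Arc<Snapshot> {
        Arc::clone(&self.current.read().unwrap())
    }

    /// Validate first, swap only on success; a rejected candidate
    /// leaves the old snapshot in place.
    fn reload(&self, candidate: Snapshot) -> Result<u64, String> {
        validate(&candidate)?;
        let version = candidate.version;
        *self.current.write().unwrap() = Arc::new(candidate);
        Ok(version)
    }
}

/// Toy validation rule, purely illustrative.
fn validate(s: &Snapshot) -> Result<(), String> {
    if s.allowed_tools.iter().any(|t| t.is_empty()) {
        return Err("empty tool name in allowed_tools".to_string());
    }
    Ok(())
}

fn main() {
    let slot = AgentSlot::new(Snapshot { version: 1, allowed_tools: vec!["tool_a".into()] });
    let in_flight = slot.load(); // captured by a turn in progress

    let bad = Snapshot { version: 2, allowed_tools: vec![String::new()] };
    println!("bad reload: {:?}", slot.reload(bad));

    let good = Snapshot { version: 3, allowed_tools: vec![] };
    println!("good reload: {:?}", slot.reload(good));

    // The in-flight turn still sees v1; the next load sees v3.
    println!("in-flight v{}, next event v{}", in_flight.version, slot.load().version);
}
```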
CLI
# Human-readable output
$ agent reload
reload v7: applied=2 rejected=0 elapsed=18ms
✓ ana
✓ bob
# Machine-readable
$ agent reload --json
{
"version": 7,
"applied": ["ana", "bob"],
"rejected": [],
"elapsed_ms": 18
}
Exit codes:
- 0: at least one agent reloaded.
- 1: no control.reload.ack within 5 s (daemon not running).
- 2: every agent rejected (partial-fail signal for CI).
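For CI wrappers, the exit-code rule can be expressed as a function of the acknowledgement. This is a sketch under assumptions: the struct keeps only the two lists, rejected entries are reduced to agent ids, and an outcome with nothing applied is treated as "every agent rejected", which the text does not spell out for an empty fleet.

```rust
/// Subset of ReloadOutcome needed to pick an exit code (simplified).
struct ReloadOutcome {
    applied: Vec<String>,
    rejected: Vec<String>,
}

/// `None` models "no control.reload.ack within 5 s".
fn exit_code(ack: Option<&ReloadOutcome>) -> i32 {
    match ack {
        None => 1,
        Some(o) if !o.applied.is_empty() => 0,
        // Nothing applied: treated here as "every agent rejected".
        Some(_) => 2,
    }
}

fn main() {
    let outcome = ReloadOutcome {
        applied: vec!["ana".into()],
        rejected: vec!["bob".into()],
    };
    println!(
        "applied={} rejected={} exit={}",
        outcome.applied.len(),
        outcome.rejected.len(),
        exit_code(Some(&outcome))
    );
}
```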
Broker contract
| Topic | Direction | Payload |
|---|---|---|
| control.reload | → daemon | {requested_by: string} |
| control.reload.ack | ← daemon | serialized ReloadOutcome |
ReloadOutcome JSON shape:
{
  "version": 7,
  "applied": ["bob"],
  "rejected": [
    {"agent_id": "ana", "reason": "snapshot build: ..."}
  ],
  "elapsed_ms": 18
}
Telemetry
| Metric | Type | Labels |
|---|---|---|
| config_reload_applied_total | counter | none |
| config_reload_rejected_total | counter | none |
| config_reload_latency_ms | histogram | none |
| runtime_config_version | gauge | agent_id |
Scrape via the metrics endpoint (ops/metrics).
Apply-on-next-message semantics
A reload does not interrupt sessions that are currently handling a message. Specifically:
- The LLM turn in flight keeps its captured Arc<RuntimeSnapshot> for the life of the turn; tool calls inside that turn all see the same policy, even if several reloads land during the turn.
- The next event delivered to the agent reads the latest snapshot via snapshot.load() on the intake hot path.
If you need a "force-apply now" semantic (terminate in-flight sessions,
respawn), that is the planned agent reload --kick-sessions flag. It is
not implemented yet (tracked in Phase 19); until then, restart the daemon.
Security model
- control.reload topic has no application-level auth. Anyone with broker publish rights can trigger a reload. In production with NATS, restrict the control.> subject pattern via NATS account permissions; see NATS with TLS + auth. The local-broker fallback is in-process only, with no remote attack surface.
- File-watcher trust = filesystem write. Whoever can edit config/agents.d/*.yaml can change capability surface. Treat the config dir as a privileged resource: 0600 on YAML files, 0700 on the directory.
- events.runtime.config.reloaded payload includes agent ids and rejection reasons. Subscribers see them. Single-process deployments are fine; in multi-tenant setups, gate the events.runtime.> pattern in NATS auth.
- Outbound allowlist scope. The Phase 16 outbound allowlist governs WhatsApp + Telegram tools only. Google tools are gated by the OAuth scopes granted at credential creation (see Per-agent credentials); there is no per-recipient list for Google.
- Apply-on-next-message and tightening reloads. A reload that narrows an allowlist for security reasons does not affect in-flight sessions until they next receive an event. If you need the change to take effect immediately, restart the daemon (or wait for the upcoming agent reload --kick-sessions flag in Phase 19).
Failure modes
- Bad YAML: AppConfig::load fails. Old snapshot keeps serving. config_reload_rejected_total bumps. The warn log names the file + line.
- Validation errors: aggregate; every problem across every agent shows in one warn block. Fix them in one edit instead of restart-and-repeat.
- Unknown provider: rejected at boot + at reload by KnownProviders check. Boot validation lists what's registered.
- Missing tool in binding's allowed_tools: caught by the post-registry validation pass during reload.
- Agent added / removed: Phase 18 rejects these with a clear message; restart the daemon to reshape the fleet.
Limitations
Intentional scope gaps for Phase 18, tracked for Phase 19:
- Add / remove agent at runtime. The coordinator rejects new ids and left-over registered handles with an actionable message. Restart needed.
- Plugin config hot-reload (whatsapp.yaml, telegram.yaml, browser.yaml, email.yaml). Plugin daemons own I/O (QR pairing, long-polling). Reshaping them live requires a dedicated lifecycle refactor.
- config_reloaded hook for extensions to react. Pending.
- SIGHUP trigger as an extra UX path. Deferred; use the broker topic or the CLI.
See also
- Layout — where these files live
- agents.yaml — the per-agent surface
- llm.yaml — provider credentials
- Metrics (Prometheus)
Plugin trust (cosign + trusted_keys.toml)
Phase 31.3. Operators control which plugin authors are trusted by
maintaining <config_dir>/extensions/trusted_keys.toml. The
nexo plugin install CLI reads this file before extracting any
tarball; cosign verification of .sig + .cert (+ optional
.bundle) assets gates the install.
The framework's own release signing precedent — see Verifying releases — uses the same Sigstore keyless flow. Plugin trust applies that flow per author, with operator-side allowlisting.
Trust modes
| Mode | What happens |
|---|---|
| ignore | Skip cosign verification entirely. Useful for dev / CI / installing a plugin you built locally. |
| warn (default) | Verify when .sig + .cert are present in the release; if absent, log a stderr warning and proceed unverified. |
| require | Reject any install whose tarball does not produce a valid allowlisted signature. |
Mode resolution precedence on each install:
- CLI flag (--require-signature / --skip-signature-verify).
- Per-author [[authors]] mode field, when the install's owner matches.
- Global default field.
- Built-in fallback (warn).
The flags --require-signature and --skip-signature-verify are mutually
exclusive; passing both fails the install at parse time.
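The precedence chain above, including the parse-time conflict, fits in one function. This is a sketch of the documented order, not the installer's code; TrustMode, CliFlags, and resolve_mode are illustrative names.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum TrustMode {
    Ignore,
    Warn,
    Require,
}

/// The two mutually exclusive CLI flags.
#[derive(Debug, Clone, Copy, Default)]
struct CliFlags {
    require_signature: bool,
    skip_signature_verify: bool,
}

/// Precedence: CLI flag, then the matching per-author mode, then the
/// global default, then the built-in `warn`.
fn resolve_mode(
    flags: CliFlags,
    author_mode: Option<TrustMode>,
    global_default: Option<TrustMode>,
) -> Result<TrustMode, String> {
    match (flags.require_signature, flags.skip_signature_verify) {
        (true, true) => Err("--require-signature conflicts with --skip-signature-verify".into()),
        (true, false) => Ok(TrustMode::Require),
        (false, true) => Ok(TrustMode::Ignore),
        (false, false) => Ok(author_mode.or(global_default).unwrap_or(TrustMode::Warn)),
    }
}

fn main() {
    // No flags, author entry says require, global default says warn.
    let mode = resolve_mode(CliFlags::default(), Some(TrustMode::Require), Some(TrustMode::Warn));
    println!("effective mode: {:?}", mode);
}
```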
Sample trusted_keys.toml
schema_version = "1.0"
default = "warn"
# Optional override; falls back to $PATH walk + well-known
# locations (/usr/local/bin/cosign, /opt/homebrew/bin/cosign,
# ~/go/bin/cosign).
# cosign_binary = "/usr/local/bin/cosign"
[[authors]]
owner = "lordmacu"
identity_regexp = "^https://github\\.com/lordmacu/[^/]+/\\.github/workflows/release\\.yml@.*$"
oidc_issuer = "https://token.actions.githubusercontent.com"
mode = "require"
A copy with comments lives at
config/extensions/trusted_keys.toml.example in the repo root.
How identity_regexp is matched
Every cosign keyless signature carries a Subject Alternative Name (SAN) on its certificate. In the GitHub Actions flow, the SAN encodes the workflow URL plus the ref:
https://github.com/<owner>/<repo>/.github/workflows/release.yml@refs/tags/v0.2.0
The operator regex must match that string. Make it specific enough to lock in the workflow path but loose enough to tolerate ref / repo additions. Examples:
| Goal | Regex |
|---|---|
| Trust everything from this owner via release.yml | ^https://github\.com/lordmacu/[^/]+/\.github/workflows/release\.yml@.*$ |
| Trust a specific repo only | ^https://github\.com/lordmacu/nexo-plugin-slack/\.github/workflows/release\.yml@.*$ |
| Trust any owner-prefix workflow path | ^https://github\.com/lordmacu/.*$ |
Required prerequisite: cosign on the host
The verifier shells out to cosign verify-blob. Install before
using any non-ignore trust mode:
brew install cosign # macOS
sudo apt install cosign # Debian/Ubuntu
sudo dnf install cosign # Fedora/RHEL
The framework pins to cosign 2.4.1 (matching its own release-signing workflow). Any ≥ 2.4 should work; older versions predate the keyless argv shape used here.
CLI flags
# Use the trusted_keys.toml default for this install:
nexo plugin install lordmacu/nexo-plugin-slack@v0.2.0
# Force `Require` for this call regardless of config:
nexo plugin install lordmacu/nexo-plugin-slack@v0.2.0 --require-signature
# Force `Ignore` (skip verification) for this call:
nexo plugin install lordmacu/nexo-plugin-slack@v0.2.0 --skip-signature-verify
JSON output additions
Every install report (--json) now includes:
| Field | Value |
|---|---|
| signature_verified | true when cosign verification succeeded. |
| signature_identity | SAN string parsed from cosign output (Subject: line). Omitted when verification was skipped. |
| signature_issuer | OIDC issuer the cert was minted by. |
| trust_mode | "ignore" / "warn" / "require": the effective mode used. |
| trust_policy_matched | Repo owner that matched a [[authors]] entry, or omitted. |
The error report (PluginInstallErrorReport) gains seven new
kind values: CosignNotFound, CosignFailed, VerifyIo,
PolicyRequiresSig, AssetIncomplete, TrustedKeysParse,
IdentityRegexpInvalid. Plus the parse-time conflict
FlagsConflict (mutually-exclusive flags).
Troubleshooting
- cosign binary not found: install cosign, set cosign_binary in your trust file, or pass --skip-signature-verify for a one-off install of trusted bytes you already vetted.
- trust policy requires signature for <owner>: your mode = "require" rejected an unsigned plugin. Ask the author to enable COSIGN_ENABLED=true on their publish workflow (see Publishing a plugin), or relax the per-author mode to warn.
- cosign verify-blob exited non-zero: usually the cert SAN did not match your identity_regexp. Check the publisher's workflow URL (it appears in their release's actions log) and update the regex. Capture the full cosign stderr from the error message for the exact mismatch.
- identity_regexp ... invalid: your regex did not compile. Common cause: forgetting to escape . or /. See the Rust regex crate's syntax docs.
See also
- Publishing a plugin — author side of the cosign signing chain.
- Verifying releases — same Sigstore flow, applied to the framework's own release artifacts.
Capability toggles
Several bundled extensions ship with dangerous capabilities off by default — write paths, secret reveal, cache purges. Each capability is gated by a single environment variable. The operator flips it on by exporting the var in the agent process's environment.
agent doctor capabilities enumerates every known toggle, its
current state, and a hint for enabling it.
$ agent doctor capabilities
Capability toggles
──────────────────────────────────────────────────────────────────
EXT ENV VAR STATE RISK EFFECT
onepassword OP_ALLOW_REVEAL disabled HIGH Reveal raw secret values…
onepassword OP_INJECT_COMMAND_ALLOWLIST disabled HIGH Allow `inject_template` to pipe…
cloudflare CLOUDFLARE_ALLOW_WRITES disabled HIGH Create / update / delete DNS…
cloudflare CLOUDFLARE_ALLOW_PURGE disabled CRITICAL Purge zone cache…
docker-api DOCKER_API_ALLOW_WRITE disabled HIGH Start / stop / restart…
proxmox PROXMOX_ALLOW_WRITE disabled CRITICAL VM / container lifecycle…
ssh-exec SSH_EXEC_ALLOWED_HOSTS disabled HIGH Allow `ssh_run` against…
ssh-exec SSH_EXEC_ALLOW_WRITES disabled CRITICAL Allow `scp_upload`…
Pass --json for machine-readable output (admin UI, dashboards):
agent doctor capabilities --json
Toggle reference
| Env var | Extension | Kind | Risk | Effect |
|---|---|---|---|---|
| OP_ALLOW_REVEAL | onepassword | bool | high | Returns secret values verbatim instead of fingerprints |
| OP_INJECT_COMMAND_ALLOWLIST | onepassword | allowlist | high | Enables inject_template exec mode for the listed commands |
| CLOUDFLARE_ALLOW_WRITES | cloudflare | bool | high | Authorizes create_dns_record, update_dns_record, delete_dns_record |
| CLOUDFLARE_ALLOW_PURGE | cloudflare | bool | critical | Authorizes purge_cache |
| DOCKER_API_ALLOW_WRITE | docker-api | bool | high | Authorizes start_container, stop_container, restart_container |
| PROXMOX_ALLOW_WRITE | proxmox | bool | critical | Authorizes VM/container lifecycle actions |
| SSH_EXEC_ALLOWED_HOSTS | ssh-exec | allowlist | high | Hosts the agent may target with ssh_run |
| SSH_EXEC_ALLOW_WRITES | ssh-exec | bool | critical | Authorizes scp_upload |
Boolean kinds accept true, 1, or yes (case-insensitive).
Anything else — including unset — counts as disabled.
Allowlist kinds are comma-separated. Empty / whitespace-only inputs count as disabled. The agent never falls back to "anything goes" when the variable is unset.
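The two parsing rules are strict enough to write down directly. The sketch below follows the text: booleans enable only on true, 1, or yes, and allowlists are comma-separated with empty input meaning disabled. The function names are hypothetical, and trimming whitespace around allowlist items is an assumption.

```rust
/// Boolean toggles: only true / 1 / yes (any case) enable.
/// Anything else, including unset, counts as disabled.
fn bool_toggle_enabled(value: Option<&str>) -> bool {
    match value {
        Some(v) => matches!(v.to_ascii_lowercase().as_str(), "true" | "1" | "yes"),
        None => false,
    }
}

/// Allowlist toggles: comma-separated; an empty result means disabled.
/// Trimming around each item is an assumption of this sketch.
fn parse_allowlist(value: Option<&str>) -> Vec<String> {
    value
        .unwrap_or("")
        .split(',')
        .map(str::trim)
        .filter(|item| !item.is_empty())
        .map(String::from)
        .collect()
}

fn main() {
    let writes = std::env::var("CLOUDFLARE_ALLOW_WRITES").ok();
    let hosts = std::env::var("SSH_EXEC_ALLOWED_HOSTS").ok();
    println!("cloudflare writes enabled: {}", bool_toggle_enabled(writes.as_deref()));
    println!("ssh hosts: {:?}", parse_allowlist(hosts.as_deref()));
}
```

Reading the variable inside the function that guards each call, rather than caching it at startup, is what keeps the default closed when the variable disappears.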
When to enable
The default is off because every toggle moves the agent from "informational" to "consequential" — failures are no longer just a bad reply, they can mutate real systems or leak secrets.
Enable a toggle only when:
- The agent will provably need that capability for the next session.
- The operator (you) is present and the session is observed.
- There is a way to revert quickly — a wrapper script, a per-shell
.envrc, or a systemd unit drop-in you can comment out.
Avoid enabling toggles globally in ~/.profile. Scope them to the
specific shell or systemd unit that runs the agent.
How to revoke
- Boolean: unset CLOUDFLARE_ALLOW_WRITES (or restart the shell / service).
- Allowlist: unset OP_INJECT_COMMAND_ALLOWLIST to disable, or export OP_INJECT_COMMAND_ALLOWLIST= (empty string) to keep the intent visible while still treating the feature as disabled.
The agent reads these on each call (no caching), so revocation takes
effect without a restart for most paths. OP_INJECT_COMMAND_ALLOWLIST,
which might look like an exception, is also read at tool-call time
rather than extension-spawn time, so it picks up changes live as well.
Adding a new toggle
When a future extension introduces a new write/reveal env var, add a
matching CapabilityToggle to
crates/setup/src/capabilities.rs::INVENTORY. Without that entry,
agent doctor capabilities is silently incomplete — the inventory
is the operator-facing source of truth.
Backup + restore
Nexo state lives under NEXO_HOME (default ~/.nexo/ for native
installs, /var/lib/nexo-rs/ for the systemd package, /app/data/
in the Docker image). Backing it up + restoring it is the operator's
responsibility today; a proper nexo backup / nexo restore
subcommand is tracked under Phase 36.
Quickest path — scripts/nexo-backup.sh
The repo ships a shell script that does the right thing without stopping the daemon:
# Single-shot, output to ./
NEXO_HOME=/var/lib/nexo-rs sudo -E scripts/nexo-backup.sh
# Custom output dir, exclude secrets (default)
scripts/nexo-backup.sh --out /backups/
# Include secret/ for full recovery (encrypt the archive yourself)
scripts/nexo-backup.sh --include-secrets
What it does:
- Hot snapshot every SQLite DB via sqlite3 .backup, the official online-backup mechanism. Captures a consistent point-in-time image even with concurrent writers; no daemon stop required.
- rsync non-DB state: JSONL transcripts, the agent workspace-git dir if Phase 10.9 is enabled, any operator files dropped under NEXO_HOME. Skips *.tmp, *.lock, and the queue/ disk-queue dir (replays on next boot from NATS, no need to back up).
- secret/ excluded by default. Re-run with --include-secrets to include them; encrypt the resulting tarball before transit (use age, gpg, or push to an encrypted bucket).
- sha256 manifest at MANIFEST.sha256 inside the archive so restore can verify integrity.
- zstd-19 compression: typical 10× ratio over raw SQLite.
- Sidecar <archive>.sha256 with the archive's outer hash so backup pipelines can detect transit corruption.
Restore
# Pull the archive locally first
scp ops@host:/backups/nexo-backup-20260426T121500Z.tar.zst .
# Extract
zstd -dc nexo-backup-20260426T121500Z.tar.zst | tar -xf -
# Verify the manifest
cd nexo-backup-20260426T121500Z
sha256sum -c MANIFEST.sha256
# Stop the daemon (state must not be mid-write)
sudo systemctl stop nexo-rs
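# Replace state. --delete removes anything missing from the archive,
# including secret/ if the backup was taken without --include-secrets;
# add --exclude='secret/' in that case to keep the live copy.
sudo rsync -a --delete --chown=nexo:nexo \
./ /var/lib/nexo-rs/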
# Start
sudo systemctl start nexo-rs
sudo journalctl -u nexo-rs -f
The daemon must be stopped during the rsync — SQLite WAL files do not survive a parallel-write replacement.
Cron schedule
Drop in /etc/cron.daily/nexo-backup:
#!/bin/sh
set -eu
ARCHIVE_DIR=/backups/nexo
mkdir -p "$ARCHIVE_DIR"
# Snapshot, retain locally
NEXO_HOME=/var/lib/nexo-rs \
/opt/nexo-rs/scripts/nexo-backup.sh --out "$ARCHIVE_DIR"
# Push to remote (Backblaze, S3, Wasabi, etc.)
rclone copy --include '*.tar.zst*' "$ARCHIVE_DIR" remote:nexo-backups/
# Retain 30 days locally + 90 days remote
find "$ARCHIVE_DIR" -name 'nexo-backup-*.tar.zst*' -mtime +30 -delete
rclone delete --min-age 90d remote:nexo-backups/
Make it executable with chmod +x /etc/cron.daily/nexo-backup.
Single-host operators get a tested daily backup pipeline in 6 lines.
What survives a backup
| Component | In backup | Notes |
|---|---|---|
| Long-term memory (vector + relational) | ✅ | memory.db |
| Transcripts | ✅ | transcripts/ JSONL + transcripts.db FTS |
| TaskFlow state | ✅ | taskflow.db |
| Pairing store + setup-code key | ⚠️ | DB included; key only with --include-secrets |
| LLM credentials | ⚠️ | secret/ only with --include-secrets |
| Per-agent SOUL.md + MEMORY.md | ✅ | rsync from workspace |
| Agent workspace git | ✅ | full .git dir included if Phase 10.9 is on |
| Disk-queue (NATS replay buffer) | ❌ | regenerates from NATS on boot |
| Process logs | ❌ | journalctl handles those separately |
Migrations
Schema migrations across Nexo versions are still ad-hoc — ALTER TABLE … .ok() patterns inside the runtime. Phase 36 adds:
- nexo migrate status: show the applied vs available migration set
- nexo migrate up [target]: apply pending migrations forward
- nexo migrate down [target]: roll back if a release ships reversible migrations
- A migrations/ dir with versioned, checksummed SQL files
Until then, pin to a specific Nexo version per deployment and test upgrades on a copy of the backup before applying to production.
Status
Tracked as Phase 36 — Backup, restore, migrations.
| Sub-phase | Status |
|---|---|
| scripts/nexo-backup.sh shell bridge | ✅ shipped |
| Operator doc (this page) | ✅ shipped |
| nexo backup --out <dir> subcommand | ⬜ deferred |
| nexo restore --from <archive> subcommand | ⬜ deferred |
| nexo migrate up/down/status versioned migrations | ⬜ deferred |
| Encrypted archive output (age / gpg) | ⬜ deferred |
| CI test that backup → restore round-trips on a fixture | ⬜ deferred |
The shell script + this doc are the bridge. Once the runtime subcommands ship, this page will be rewritten to point at them and the script will be retired.
Agent memory snapshots
Atomic point-in-time snapshots of an agent's full memory state, packaged as a single verifiable bundle. Built for rollback after a corrupt dream, forensic audit ("what did the agent know at T?"), portable export between hosts, and pre-restore safety nets in autonomous mode.
What goes in a bundle
| Layer | Source | In-bundle path |
|---|---|---|
| Memory git repo | <memdir>/.git/ | git/** |
| Operator-curated files | <memdir>/MEMORY.md + topic files | memory_files/** |
| Long-term SQLite | <sqlite>/long_term.sqlite | sqlite/long_term.sqlite |
| Vector SQLite | <sqlite>/vector.sqlite | sqlite/vector.sqlite |
| Concepts | <sqlite>/concepts.sqlite | sqlite/concepts.sqlite |
| Compactions | <sqlite>/compactions.sqlite | sqlite/compactions.sqlite |
| Extractor cursor | runtime state provider | state/extract_cursor.json |
| Last dream run row | agent registry | state/dream_run.json |
| Manifest | seal | manifest.json |
Bundle layout on disk
<state_root>/tenants/<tenant>/snapshots/<agent_id>/
├── <id>.tar.zst # bundle body (or .tar.zst.age when encrypted)
└── <id>.tar.zst.sha256 # whole-file SHA-256 sibling
Two independent integrity checks ride together:
- Manifest seal: manifest.bundle_sha256 = SHA-256 of every per-artifact hex digest concatenated in declared order. Verifiable from the manifest alone, no recursion on the tar bytes.
- File-level seal: sibling .sha256 text file = SHA-256 of the bundle file as it lives on disk (post-encryption when encrypted). Detects bit-flips during transit / cold storage even when the body is age-wrapped.
Both must pass for verify to report ok.
CLI
nexo memory snapshot --agent <id> [--tenant <t>] [--label <s>]
[--redact-secrets] [--encrypt age:<recipient>]
nexo memory restore --agent <id> [--tenant <t>] --from <bundle>
[--dry-run] [--no-auto-pre-snapshot]
[--decrypt-identity <path>]
nexo memory list --agent <id> [--tenant <t>] [--json]
nexo memory diff --agent <id> [--tenant <t>] <id-a> <id-b>
nexo memory export --agent <id> [--tenant <t>] --id <snapshot-id> --to <path>
nexo memory verify --bundle <path>
nexo memory delete --agent <id> [--tenant <t>] --id <snapshot-id>
--tenant defaults to default for single-tenant deployments. Multi-
tenant SaaS deployments require explicit values aligned with the
canonicalized identifier rules described in
capabilities.
nexo memory restore is gated on NEXO_MEMORY_RESTORE_ALLOW=true (see
capabilities). Without the flag the subcommand
refuses, even with --yes.
Configuration
Lives in config/memory.yaml under memory.snapshot:
memory:
  snapshot:
    enabled: true
    root: ${NEXO_HOME}/state
    auto_pre_dream: false          # opt-in safety net before autoDream
    auto_pre_restore: true         # always snapshot before restore
    auto_pre_mutating_tool: false  # opt-in: pre-Plan-mode mutating tool
    lock_timeout_secs: 60
    redact_secrets_default: true
    encryption:
      enabled: false
      recipients: []               # age public keys (age1...)
      identity_path: ${NEXO_HOME}/secret/snapshot-identity.txt
    retention:
      keep_count: 30
      max_age_days: 90
      gc_interval_secs: 3600
    events:
      mutation_subject_prefix: "nexo.memory.mutated"
      lifecycle_subject_prefix: "nexo.memory.snapshot"
      mutation_publish_enabled: true
Hot-reload via the standard ConfigReloadCoordinator path: edit YAML
and the retention worker picks up the new policy at the next tick.
Lifecycle events (NATS)
Best-effort published when a broker is wired. Subjects are formed
from EventsSection.lifecycle_subject_prefix (default
nexo.memory.snapshot) — operators that override the prefix in
YAML get the override on every event topic.
LifecycleEvent is serde(tag = "kind", rename_all = "snake_case"),
so every payload below carries an extra "kind": "<verb>"
discriminator field flattened alongside the documented fields:
| Subject | Trigger | Payload (after serde(flatten)) |
|---|---|---|
| <prefix>.<agent_id>.created | snapshot success | {kind:"created", ...SnapshotMeta}; flattened: id, agent_id, tenant, label?, created_at_ms, bundle_path, bundle_size_bytes, bundle_sha256, git_oid?, schema_versions, encrypted, redactions_applied |
| <prefix>.<agent_id>.restored | restore success | {kind:"restored", ...RestoreReport}; flattened: agent_id, from, pre_snapshot?, git_reset_oid?, sqlite_restored_dbs[], state_files_restored[], workers_restarted, dry_run |
| <prefix>.<agent_id>.deleted | delete success | {kind:"deleted", agent_id, tenant, snapshot_id, ts_ms} |
| <prefix>._all.gc | retention sweep | {kind:"gc", ts_ms, report:{bundles_deleted, orphan_staging_dirs_removed, agents_visited, errors}} |
The _all segment in the gc subject is a sentinel — gc events are
cross-agent and have no single agent_id to fan-out on. Subscribers
filtering with nexo.memory.snapshot.<agent>.> therefore miss gc;
use nexo.memory.snapshot.> (or the configured equivalent) to catch
both.
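The wildcard behaviour can be checked with a small matcher. The sketch below re-implements NATS subject matching for illustration only ('*' matches exactly one token, '>' matches one or more trailing tokens); it is not the broker's code, and the agent id ana is just an example.

```python
def subject_matches(pattern, subject):
    """NATS-style match: '*' is one token, '>' is one or more trailing tokens."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' must be the last pattern token and needs at least one token left
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


PREFIX = "nexo.memory.snapshot"  # default lifecycle prefix
GC_SUBJECT = f"{PREFIX}._all.gc"
CREATED_SUBJECT = f"{PREFIX}.ana.created"
```

With these definitions, the pattern nexo.memory.snapshot.ana.> matches CREATED_SUBJECT but not GC_SUBJECT, while nexo.memory.snapshot.> matches both.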
Mutation events (one per memory write) flow to
<events.mutation_subject_prefix>.<agent_id> (default prefix
nexo.memory.mutated) when
memory.snapshot.events.mutation_publish_enabled = true. Subscribers
can stream them into an audit log without forking memory writes.
Encryption
Optional, behind the snapshot-encryption Cargo feature:
cargo build --features snapshot-encryption
nexo memory snapshot --agent ana --encrypt age:age1xyz...
nexo memory restore --agent ana --from <bundle>.tar.zst.age \
--decrypt-identity ~/.nexo/secret/snapshot-identity.txt
The body is wrapped in an age stream; the manifest stays plaintext
inside the encrypted payload but the per-artifact hashes commit to it.
The sibling .sha256 file always covers the bytes that land on disk
(post-encryption), so transit integrity stays verifiable without the
identity.
Multi-recipient encryption (admin UI)
Phase 90 follow-up — when the snapshot is captured via the admin
UI (nexo/admin/memory/create_snapshot { encrypt: true }), the
daemon wraps the bundle for every recipient listed under
memory.snapshot.encryption.recipients, not just the first. Each
operator with a matching identity file can independently restore
the bundle.
memory:
  snapshot:
    encryption:
      enabled: true
      recipients:
        - "age1backupadmin..." # backup operator's age public key
        - "age1dradmin..." # disaster-recovery operator's key
      identity_path: ${NEXO_HOME}/secret/snapshot-identity.txt
Both recipients above receive a header section in every admin-UI snapshot. Either operator's identity file can decrypt it. Duplicate recipient strings (operator paste-twice typo) are silently deduplicated.
The CLI's single-recipient --encrypt age:age1xyz... flag is
unchanged — it remains the power-user / scripted path. To capture
a multi-recipient bundle from the CLI today, use the admin RPC
via nexo/admin/memory/create_snapshot.
Boot-time validation: at daemon startup the runtime parses every
recipient string. A typo (e.g. age1xyz truncated by accident)
fails the daemon boot with a clear recipients[N] failed to parse
error so operators discover the issue before relying on the
encryption.
Threat model
- Loss of identity → encrypted bundle is unrecoverable. Mirror identity files into your operator-credential store with the same retention as your other long-lived secrets.
- Sibling .sha256 missing → verify reports bundle_sha256_ok = false but does not error. Operators must treat this as a hard fail before restore.
- Bundle smaller than the live state → expected: restore overwrites whatever was there, including untracked files in the memdir. Use --dry-run first.
- Cross-tenant restore → blocked at path validation. A bundle whose tenant string does not match the request errors with CrossTenantError before any disk mutation.
- Last snapshot deletion → delete refuses to drop the agent's only remaining bundle. Retention sweeps obey the same floor.
- Auto-pre-snapshot during restore → on by default. Disable with --no-auto-pre-snapshot only when the rollback anchor is unwanted (e.g. you are restoring into a fresh agent with no prior state).
- Encrypted bundles + verify → without the identity the per-artifact hashes inside the body cannot be checked; the report's manifest_ok and per_artifact_ok are reported as true by convention while age_protected is set. Operators who must verify the manifest of an encrypted bundle should run verify after a decrypt + restore round-trip.
Retention
A background worker sweeps every gc_interval_secs:
- Orphan staging cleanup: any .staging-<id>/ or .restore-staging-<id>/ directory left behind by a process kill is deleted at startup and at every tick.
- Per-agent count + age: bundles older than max_age_days or exceeding keep_count are deleted oldest-first via the same delete() path the CLI uses, so the "never delete the last snapshot" floor is respected.
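The selection rule can be sketched as below. The Bundle type and function name are hypothetical, and treating the newest bundle as the one protected by the floor is an assumption; the text only says the last remaining snapshot is never deleted.

```python
from dataclasses import dataclass

DAY_MS = 24 * 60 * 60 * 1000


@dataclass
class Bundle:
    id: str
    created_at_ms: int


def select_for_deletion(bundles, now_ms, keep_count=30, max_age_days=90):
    """Return bundle ids to delete, oldest first, never dropping the last bundle."""
    newest_first = sorted(bundles, key=lambda b: b.created_at_ms, reverse=True)
    cutoff = now_ms - max_age_days * DAY_MS
    doomed = [
        b for rank, b in enumerate(newest_first)
        if rank >= keep_count or b.created_at_ms < cutoff
    ]
    # Floor (assumed): the newest bundle survives even when it is past max_age_days.
    doomed = [b for b in doomed if b is not newest_first[0]]
    return [b.id for b in sorted(doomed, key=lambda b: b.created_at_ms)]
```

A bundle is removed when either limit is exceeded, so a quiet agent can end up with fewer than keep_count bundles but never with zero.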
Restore mechanics
The full sequence for a real (non---dry-run) restore:
- verify the bundle. Schema-too-new and checksum mismatch fail here without touching live state.
- auto_pre_snapshot (default on): take a snapshot labelled auto:pre-restore-<orig_id> so the operation is reversible.
- Acquire the per-agent lock. Concurrent snapshot/restore for the same agent will fail with Concurrent.
- Unpack to .restore-staging-<uuid>/.
- Tag the live HEAD with pre-restore-<id> so prior state stays reachable via git reflog show pre-restore-<id>.
- SQLite swap: each live DB is renamed to <name>.sqlite.pre-restore.bak and the staging copy moves into place. The .bak files survive the restore for manual recovery.
- Memdir replace: live memdir is renamed to <memdir>-pre-restore-<id>/ and the staging contents are written on top. Failures roll the rename back.
- State provider replay: extractor cursor + last dream-run row.
- Drop staging dir + lock.
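The SQLite swap step can be illustrated with two renames. This is a sketch only: the function name is hypothetical, and rolling the backup back when the second rename fails is an assumption borrowed from the memdir step, not documented behaviour of the SQLite step.

```python
import os


def swap_sqlite(live_path, staged_path):
    """Rename the live DB to <name>.pre-restore.bak, then move the staged copy in.

    Returns the backup path, which is left on disk for manual recovery.
    """
    backup = live_path + ".pre-restore.bak"
    os.replace(live_path, backup)
    try:
        os.replace(staged_path, live_path)
    except OSError:
        # Assumed rollback: put the original DB back so live state is unchanged.
        os.replace(backup, live_path)
        raise
    return backup
```

Renames within one filesystem are atomic on POSIX systems, which is why the staging directory sits next to the live state rather than in a temporary directory elsewhere.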
Admin RPC surface (Phase 90.x.memory-snapshot + .create-restore)
The nexo-plugin-admin SPA at /m/memory drives four admin RPCs that
mirror the CLI's list, delete, snapshot, and restore verbs.
All four are gated by the memory_snapshot capability — operators
that already grant the read-only pair (list_snapshots +
delete_snapshot) automatically get write access via the same trust
boundary.
| Method | Capability | Behaviour |
|---|---|---|
| nexo/admin/memory/list_snapshots | memory_snapshot | Newest-first list + encryption_available flag |
| nexo/admin/memory/delete_snapshot | memory_snapshot | Idempotent removal by snapshot_id |
| nexo/admin/memory/create_snapshot | memory_snapshot | Capture fresh bundle (label?, encrypt?) |
| nexo/admin/memory/restore_snapshot | memory_snapshot | Restore by snapshot_id (dry_run?) |
Defaults forced server-side
Unlike the CLI, the admin path forces a fixed contract so operator mistakes via the SPA don't leak secrets or skip the safety net:
- redact_secrets = true: UI-driven snapshots always run the secret-guard scanner. The CLI keeps --no-redact for power users who want raw bundles.
- auto_pre_snapshot = true: every UI restore captures a pre-restore bundle so the operation is reversible. The CLI keeps --no-auto-pre-snapshot for fresh-agent restores.
- created_by = "admin-ui": provenance trace lands in the bundle manifest's created_by column for audit reads.
Restore by snapshot_id, not bundle_path
The wire never carries a filesystem path. The daemon resolves
snapshot_id → bundle_path via its own list() lookup before
opening the bundle. This forecloses on accidentally turning the
admin endpoint into an arbitrary-file-read primitive.
Defensive tenant validation
restore_snapshot requires tenant in the params. The adapter
reads the bundle manifest's recorded tenant and refuses if they
disagree, with both tenants quoted in the error. Operator typos
that would have crossed staging ↔ prod accidentally are
caught before any disk mutation.
Encryption recipient resolution
When create_snapshot is invoked with encrypt: true, the daemon
resolves the actual age recipient from
memory.snapshot.encryption.recipients[0] — the wire never carries
the recipient string, and operators rotate recipients via YAML +
restart. The same EncryptionSection clone surfaces
encryption_available on every list response so the SPA can grey
out the encrypt toggle when no recipients are configured.
For restore of an encrypted bundle the adapter resolves
identity_path from the same EncryptionSection. Missing
identity_path with an encrypted bundle errors with "encrypted
but no identity_path configured; restore via CLI".
Dry-run UX
restore_snapshot { dry_run: true } runs the full validation
pipeline (tenant check + bundle resolution + identity resolution)
but stops short of mutating live state. The returned
RestoreReportWire { dry_run: true } carries the
sqlite_restored_dbs[] and state_files_restored[] the SPA
renders as a preview table — the operator inspects the diff
before flipping the toggle and re-issuing destructively.
Lock semantics
Restore takes the same per-agent AgentLockMap lock the CLI uses.
A restore against an agent already holding the lock (concurrent
snapshot, retention sweep, second restore) will time out with
Concurrent after lock_timeout_secs. The handler bubbles the
error through; the SPA renders it as a retryable warning.
See also
- Backup + restore — operator backup script (Phase 36.1)
- Memdir scanner — secret-guard configuration
- Capabilities: NEXO_MEMORY_RESTORE_ALLOW
Health checks
Three layers of health probes for a Nexo deployment, each tuned for a different consumer:
- /health: liveness. Cheap (atomic flag check). HTTP 200 means the process is up; it doesn't guarantee the process can serve work.
- /ready: readiness. Expensive (verifies broker connection, agents loaded, snapshot warm). HTTP 200 means the runtime can accept inbound traffic. Use this for load-balancer health checks.
- scripts/nexo-health.sh: operator + monitoring. JSON summary with counter snapshots. Bridge until nexo doctor health (Phase 44) ships.
Liveness — /health
Returns HTTP 200 + ok body when the agent process is alive.
The runtime sets a RUNNING flag at startup and clears it on
graceful shutdown. Does not verify any subsystem — useful
for "is the daemon there at all" probes.
curl -fsSL http://127.0.0.1:8080/health
# ok
Kubernetes liveness probe:
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 3
  failureThreshold: 3
A failing liveness probe should restart the container. Be
generous on initialDelaySeconds — first-boot extension
discovery + memory open + agent runtime spin-up can take 15-25s.
Readiness — /ready
Returns 200 only when all of:
- Broker (NATS or local) is reachable
- Every configured agent has loaded its tool registry
- The hot-reload snapshot has been warmed (Phase 18)
- Pairing store is open (if pairing_policy.auto_challenge is on)
Returns 503 with a JSON body listing the failing subsystem otherwise:
{
"ready": false,
"reasons": [
{"subsystem": "broker", "detail": "nats://localhost:4222: connection refused"}
]
}
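A monitoring client can turn that body into a list of failing subsystems. The helper below is a hypothetical sketch that assumes only the JSON shape shown above (a reasons array of subsystem/detail pairs).

```python
import json


def failing_subsystems(status_code, body):
    """Return (subsystem, detail) pairs from a /ready response; empty when ready."""
    if status_code == 200:
        return []
    payload = json.loads(body)
    return [(r["subsystem"], r["detail"]) for r in payload.get("reasons", [])]


SAMPLE_503 = json.dumps({
    "ready": False,
    "reasons": [
        {"subsystem": "broker",
         "detail": "nats://localhost:4222: connection refused"},
    ],
})
```

Calling failing_subsystems(503, SAMPLE_503) yields one pair naming the broker, which is enough to route an alert to the right owner.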
Use this for load-balancer / service-mesh routing decisions.
A node that's live but not ready should not receive
traffic.
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 5
  timeoutSeconds: 2
  failureThreshold: 1
Operator one-shot — scripts/nexo-health.sh
Single-shot JSON summary intended for watch -n 5 nexo-health.sh during ops, cron health-mailers, and uptime
monitors that want one structured payload covering everything.
# Default — pretty human output
scripts/nexo-health.sh
# JSON only (cron, monitoring scrapers)
scripts/nexo-health.sh --json
# Custom hosts (e.g., probing through a service mesh)
scripts/nexo-health.sh --host nexo.internal:8080 \
--metrics-host nexo.internal:9090
# Strict mode — open circuit breaker counts as unhealthy.
# Default mode tolerates breaker-open (degraded-but-up).
scripts/nexo-health.sh --strict
Pretty output:
============================================================
nexo-rs health · 2026-04-26T15:30:00Z
============================================================
overall: ok
admin: 127.0.0.1:8080
metrics: 127.0.0.1:9090
probes:
✓ live ok
✓ ready ok
✓ metrics ok
counters:
tool_calls_total 4711
llm_stream_chunks_total 28391
web_search_breaker_open_total 0
JSON shape (for monitoring scrapers):
{
"overall": "ok",
"timestamp": "2026-04-26T15:30:00Z",
"endpoints": { "admin": "127.0.0.1:8080", "metrics": "127.0.0.1:9090" },
"probes": [
{"name": "live", "status": "ok", "detail": "ok"},
{"name": "ready", "status": "ok", "detail": "{...}"},
{"name": "metrics", "status": "ok", "detail": "# HELP nexo_..."}
],
"counters": {
"tool_calls_total": 4711,
"llm_stream_chunks_total": 28391,
"web_search_breaker_open_total": 0
}
}
Exit codes:
- 0: overall healthy
- 1: at least one probe failed (or --strict and a breaker is open)
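The exit-code rule can be expressed as a small function over the JSON shape shown earlier. This is a sketch, not the script's logic: the function name is hypothetical, and reading breaker state from a non-zero web_search_breaker_open_total counter is an assumption.

```python
def exit_code(probes, counters, strict=False):
    """0 when every probe is ok; 1 on a failed probe or, in strict mode, an open breaker."""
    if any(p["status"] != "ok" for p in probes):
        return 1
    # Assumption: a non-zero counter means a breaker is (or has been) open.
    breaker_open = counters.get("web_search_breaker_open_total", 0) > 0
    return 1 if strict and breaker_open else 0
```

In default mode an open breaker leaves the result at 0, matching the degraded-but-up tolerance described for the non-strict invocation.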
Cron health mailer
# /etc/cron.d/nexo-health
*/5 * * * * nexo /opt/nexo-rs/scripts/nexo-health.sh --json --strict \
>> /var/log/nexo-rs/health.jsonl 2>&1 \
|| (tail -1 /var/log/nexo-rs/health.jsonl | mail -s "nexo unhealthy" ops@yourorg)
Five-minute resolution, one line of JSONL per check, mail on failure.
Uptime monitor integration
UptimeRobot / BetterStack / Pingdom:
URL: https://nexo.example.com/ready
Interval: 60s
Timeout: 5s
Expected: HTTP 200
That's all most monitors need. The JSON body of /ready
explains the failure when the alert fires.
What nexo-health.sh adds beyond /ready
| Signal | /ready | nexo-health.sh |
|---|---|---|
| Process up + accepting traffic | ✅ | ✅ |
| Counter snapshot (tool calls, LLM chunks) | ❌ | ✅ |
| Web-search breaker state | ❌ | ✅ |
| Single JSON payload | ❌ (HTTP 200/503) | ✅ |
| Suitable for HTTP probe | ✅ | ❌ (shells out) |
Use /ready for the orchestrator. Use nexo-health.sh for the
operator's eyeballs and the alerting pipeline.
Status
Tracked as Phase 44 — Auxiliary observability surfaces.
| Capability | Status |
|---|---|
| /health liveness endpoint | ✅ shipped (Phase 9) |
| /ready readiness endpoint | ✅ shipped (Phase 9) |
| scripts/nexo-health.sh operator one-shot | ✅ shipped |
| Operator runbook (this page) | ✅ shipped |
| nexo doctor health aggregating subcommand | ⬜ deferred |
| nexo inspect <session_id> state-transition pretty-print | ⬜ deferred |
| Per-session structured event log under data/events/ | ⬜ deferred |
Cost & quota controls
Operator runbook for tracking + capping LLM spend. Today the
runtime emits enough Prometheus metrics for an operator to build
their own picture; the proper nexo costs subcommand + budget
caps land in Phase 45.
Estimating spend — scripts/nexo-cost-report.sh
Aggregates nexo_llm_stream_chunks_total by provider, multiplies
by a price table, prints (or emits JSON) per-provider rolling
totals.
# Human-readable report against the local /metrics endpoint
scripts/nexo-cost-report.sh
# JSON for monitoring / dashboards
scripts/nexo-cost-report.sh --json
# Custom price table (your negotiated enterprise rates)
scripts/nexo-cost-report.sh --prices ~/our-enterprise-rates.tsv
# Probe a remote daemon
scripts/nexo-cost-report.sh --metrics-host nexo.internal:9090
Pretty output:
============================================================
nexo-rs cost report · 2026-04-26T15:30:00Z
============================================================
PROVIDER CHUNKS EST_TOKENS EST_USD
anthropic 28391 85173 $0.7666
minimax 4711 14133 $0.0042
ollama 1208 3624 $0.0000
total estimated: $0.7708
disclaimer: heuristic estimate. Calibrate
NEXO_TOKENS_PER_CHUNK once you have a measured baseline.
Calibration
The default tokens-per-chunk = 3 is a heuristic. To get an
accurate number for your deployment:
- Find a typical conversation in transcripts (session_logs tool output).
- Sum the usage.total_tokens from the chat.completion end event(s).
- Divide by the total chunk count emitted during that conversation (visible in nexo_llm_stream_chunks_total{provider="...",kind="text_delta"}).
- Set the NEXO_TOKENS_PER_CHUNK env var to the result.
Example:
# Anthropic typical: 4-token granularity per delta
NEXO_TOKENS_PER_CHUNK=4 scripts/nexo-cost-report.sh
# OpenAI typical: 1 token per delta on streaming
NEXO_TOKENS_PER_CHUNK=1 scripts/nexo-cost-report.sh
When the runtime ships nexo_llm_tokens_total{provider,model,direction}
(Phase 45 deliverable), the heuristic is replaced by direct token
counts and the calibration step disappears.
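The calibration arithmetic and the resulting estimate can be worked through as follows. The function names are hypothetical, and the blended rate of $9 per 1M tokens is an assumption chosen because it reproduces the anthropic row of the sample report; how the script blends input and output prices is not stated here.

```python
def tokens_per_chunk(total_tokens, chunk_count):
    """Measured baseline: tokens reported by the provider divided by chunks seen."""
    return total_tokens / chunk_count


def estimate_usd(chunks, per_chunk, usd_per_million):
    """Heuristic estimate: chunks -> estimated tokens -> dollars."""
    est_tokens = chunks * per_chunk
    return est_tokens, est_tokens * usd_per_million / 1_000_000


# Default heuristic of 3 tokens per chunk, assumed blended rate of $9 / 1M tokens.
est_tokens, est_usd = estimate_usd(28391, 3, 9.0)
```

Under those assumptions est_tokens is 85173 and est_usd rounds to 0.7666. A conversation that used 1200 tokens over 300 chunks would calibrate NEXO_TOKENS_PER_CHUNK to 4.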
Built-in price table
| Provider | Model | $/1M in | $/1M out |
|---|---|---|---|
| anthropic | claude-opus-4 | 15.00 | 75.00 |
| anthropic | claude-sonnet-4 | 3.00 | 15.00 |
| anthropic | claude-haiku-4 | 0.80 | 4.00 |
| openai | gpt-4o | 2.50 | 10.00 |
| openai | gpt-4o-mini | 0.15 | 0.60 |
| minimax | abab6.5s | 0.20 | 0.60 |
| minimax | M2.5 | 0.30 | 1.50 |
| gemini | gemini-1.5-pro | 1.25 | 5.00 |
| gemini | gemini-1.5-flash | 0.075 | 0.30 |
| deepseek | deepseek-chat | 0.14 | 0.28 |
| ollama | * | 0.00 | 0.00 |
These are public list prices as of 2026-04. Operators with
enterprise contracts override via --prices:
provider model in_per_1m out_per_1m
anthropic claude-sonnet-4 2.40 12.00
openai gpt-4o 2.00 8.00
(One row per provider×model. * model = applies to any model
from that provider.)
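The override file format and the wildcard rule can be sketched like this. The parser is illustrative only: the function names are hypothetical, and giving an exact provider+model row precedence over the provider's * row is an assumption.

```python
def parse_prices(tsv_text):
    """Parse 'provider model in_per_1m out_per_1m' rows (tab- or space-separated)."""
    table = {}
    for line in tsv_text.splitlines():
        fields = line.split()
        if len(fields) != 4 or fields[0] == "provider":
            continue  # skip blank lines, malformed rows, and the header row
        provider, model, in_price, out_price = fields
        table[(provider, model)] = (float(in_price), float(out_price))
    return table


def lookup(table, provider, model):
    """Exact row first (assumed precedence), then the provider's '*' row, else None."""
    if (provider, model) in table:
        return table[(provider, model)]
    return table.get((provider, "*"))
```

A lookup that returns None signals a provider with no price row at all, which a caller could treat as a configuration error instead of silently pricing it at zero.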
Daily budget alerts via cron
Snapshot every 24h, mail the operator if estimated spend > cap:
# /etc/cron.daily/nexo-cost-alert
#!/bin/sh
set -eu
CAP=10.00 # $/day soft cap
REPORT=$(/opt/nexo-rs/scripts/nexo-cost-report.sh --json)
TOTAL=$(echo "$REPORT" | jq -r '.total_estimated_usd')
if awk -v t="$TOTAL" -v c="$CAP" 'BEGIN { exit !(t > c) }'; then
echo "$REPORT" | mail -s "nexo daily spend over \$$CAP: \$$TOTAL" \
ops@yourorg.com
fi
This is alerting only, not enforcement — the runtime keeps serving traffic. For hard caps, wait for Phase 45.
Hard quota caps (deferred)
Phase 45 ships per-agent monthly budget caps:
# config/agents.yaml — once 45.x lands
agents:
  - id: kate
    cost_cap_usd:
      monthly: 50.00
      daily: 5.00
      action: refuse_new_turns # or: warn_only, throttle
      warn_topic: alerts.kate.budget
When hit:
- refuse_new_turns: agent returns a fixed response ("I've reached my budget for the period; please ask the operator to extend.") to every new inbound. Existing in-flight turns finish.
- warn_only: log + telemetry but keep serving.
- throttle: switch to a cheaper model variant (claude-haiku-4 instead of claude-opus-4) for the rest of the period.
Per-binding token rate limits (e.g. "WhatsApp sales binding
capped at 5k tokens/hour") layer on top of the existing
sender_rate_limit. Phase 45.x.
Inspecting the metrics directly
If the script is too coarse:
# Top providers by total chunks (cumulative counter values, not a rate)
curl -sS http://127.0.0.1:9090/metrics | \
awk '/^nexo_llm_stream_chunks_total/{gsub(/.*provider="/, "", $1); gsub(/".*/, "", $1); n[$1]+=$2} END{for (p in n) print n[p], p}' | \
sort -rn
# TTFT p95 by provider (needs promtool; the URL must be a Prometheus server, not a raw /metrics endpoint):
promtool query instant http://127.0.0.1:9090 \
'histogram_quantile(0.95, sum by (provider, le) (rate(nexo_llm_stream_ttft_seconds_bucket[5m])))'
The full metric inventory lives in
Grafana dashboards → metric coverage
(in repo as ops/grafana/README.md).
Status
Tracked as Phase 45 — Cost & quota controls.
| Capability | Status |
|---|---|
| scripts/nexo-cost-report.sh heuristic estimator | ✅ shipped |
| Operator runbook (this page) | ✅ shipped |
| nexo_llm_tokens_total{provider,model,direction} metric | ⬜ deferred |
| Per-agent monthly budget cap (config + enforcement) | ⬜ deferred |
| agents.<id>.cost_cap_usd schema | ⬜ deferred |
| Per-binding token rate limit | ⬜ deferred |
| Pre-flight token-count predictor in agent prompt | ⬜ deferred |
| nexo costs CLI rolling 24h/7d/30d aggregator | ⬜ deferred |
| /api/costs admin endpoint | ⬜ deferred |
Privacy toolkit
GDPR-style operator workflows for handling user data requests until
the proper nexo forget / nexo export-user subcommands ship
(tracked under Phase 50).
Right to be forgotten
scripts/nexo-forget-user.sh does cascading delete across every
SQLite DB and JSONL transcript under NEXO_HOME, then VACUUMs
the databases so the deleted rows don't survive in free pages.
# Stop the daemon first — SQLite WAL doesn't survive parallel writes
sudo systemctl stop nexo-rs
# DRY RUN — shows what would be deleted, doesn't change anything
NEXO_HOME=/var/lib/nexo-rs sudo -E scripts/nexo-forget-user.sh \
--id "+5491155556666"
# When the dry-run looks right, re-run with --apply
NEXO_HOME=/var/lib/nexo-rs sudo -E scripts/nexo-forget-user.sh \
--id "+5491155556666" \
--apply
# Restart
sudo systemctl start nexo-rs
What gets deleted (cascading across all DBs):
| Table column | Match | Source DB |
|---|---|---|
| user_id | exact | every DB |
| sender_id | exact | every DB (used in pairing, transcripts) |
| account_id | exact | every DB (used in WA / TG plugins) |
| contact_id | exact | memory + transcripts |
| peer_id | exact | agent-to-agent routing |
Plus JSONL transcript lines where any of those keys equals the target id.
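The JSONL rule (drop a line when any identity key equals the target id) can be sketched as below. This illustrates the matching rule only and is not the script's implementation; the function name is hypothetical, and the key list is taken from the table above.

```python
import json

ID_KEYS = ("user_id", "sender_id", "account_id", "contact_id", "peer_id")


def drop_user_lines(lines, target_id):
    """Return (kept_lines, deleted_count) for one JSONL transcript."""
    kept, deleted = [], 0
    for line in lines:
        record = json.loads(line)
        # Exact match on any identity key removes the whole line.
        if any(record.get(key) == target_id for key in ID_KEYS):
            deleted += 1
        else:
            kept.append(line)
    return kept, deleted
```

Returning the count alongside the surviving lines is what lets a caller record exact deletion numbers for the audit manifest.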
The script emits forget-user-<id>-<timestamp>.json with the
exact deletion counts. This is the operator's GDPR audit
trail; ship it back to the requester as proof of compliance.
--keep-audit flag
Strict GDPR says even the admin-audit row recording the deletion
should be removed (the user has the right to no trace). But that
breaks operator audit chains. Use --keep-audit to opt out of
that single specific erasure:
nexo-forget-user.sh --id "<id>" --apply --keep-audit
The script keeps the admin_audit table row showing that the
deletion happened (without the user-id field, which is hashed).
Other tables fully wiped either way.
Right to data export
Until nexo export-user --id <id> ships, manual SQL works:
USER_ID="+5491155556666"
OUT_DIR="export-${USER_ID}-$(date -u +%Y%m%dT%H%M%SZ)"
mkdir -p "$OUT_DIR"
# Stop the daemon for a consistent point-in-time export
sudo systemctl stop nexo-rs
# Per-DB extraction
ID_SQL=$(printf '%s' "$USER_ID" | sed "s/'/''/g")  # escape ' for SQL literals
for db in /var/lib/nexo-rs/*.db; do
  name=$(basename "$db" .db)
  # Tables that carry user_id, sender_id, or account_id
  tables=$(sqlite3 "$db" "
    SELECT DISTINCT m.name
    FROM sqlite_master m, pragma_table_info(m.name) p
    WHERE m.type = 'table'
      AND p.name IN ('user_id','sender_id','account_id');")
  for table in $tables; do
    # Identity columns present in this table, as a quoted column list
    cols=$(sqlite3 "$db" "
      SELECT group_concat('\"' || name || '\"', ', ')
      FROM pragma_table_info('$table')
      WHERE name IN ('user_id','sender_id','account_id');")
    # One JSON file per DB and table, holding only the requester's rows
    sqlite3 "$db" ".mode json" \
      "SELECT * FROM \"$table\" WHERE '$ID_SQL' IN ($cols);" \
      > "$OUT_DIR/${name}.${table}.json"
  done
done
# Per-JSONL extraction
for f in /var/lib/nexo-rs/transcripts/*.jsonl; do
name=$(basename "$f")
jq -c \
--arg id "$USER_ID" \
'select(.user_id == $id or .sender_id == $id or .account_id == $id)' \
"$f" > "$OUT_DIR/$name"
done
# Restart
sudo systemctl start nexo-rs
# Tar + zstd, optionally encrypt
tar -C "$(dirname "$OUT_DIR")" -cf - "$(basename "$OUT_DIR")" | \
zstd -19 -T0 > "${OUT_DIR}.tar.zst"
# (Recommended) age-encrypt before transit
age -r age1xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx \
-o "${OUT_DIR}.tar.zst.age" \
"${OUT_DIR}.tar.zst"
shred -u "${OUT_DIR}.tar.zst"
The result is a tarball the operator hands to the requester —
JSON files per DB + filtered transcript JSONLs — encrypted with
the requester's age public key.
When nexo export-user --id <id> ships, this whole shell pipeline
collapses into one command with built-in encryption.
Retention policy
Operator-defined per deployment. Recommended defaults:
| Surface | Retention | Why |
|---|---|---|
| Transcripts | 90 days | Enough for ops debugging + agent recall |
| Memory (long-term) | indefinite | Agent's working memory; pruned by recall signals |
| TaskFlow finished flows | 30 days | Audit trail for completed work |
| TaskFlow failed flows | 365 days | Forensics |
| Admin audit log | 365 days | Compliance |
| Disk-queue (NATS replay) | 7 days | Disaster recovery |
| Pairing pending requests | 60 min | TTL-enforced by the store |
Apply via cron (until nexo retention apply ships):
# /etc/cron.daily/nexo-retention
#!/bin/sh
set -eu
DB=/var/lib/nexo-rs/transcripts.db
# 90-day rolling window on transcripts
sqlite3 "$DB" "DELETE FROM transcripts
WHERE timestamp < strftime('%s', 'now', '-90 days');"
sqlite3 "$DB" 'VACUUM;'
# Same for taskflow finished + failed
DB=/var/lib/nexo-rs/taskflow.db
sqlite3 "$DB" "DELETE FROM flows
WHERE status='Finished'
AND finished_at < datetime('now', '-30 days');"
sqlite3 "$DB" "DELETE FROM flows
WHERE status='Failed'
AND finished_at < datetime('now', '-365 days');"
PII detection (deferred)
Phase 50 plans inbound PII flagging — separate from the existing outbound redactor. The rough shape:
- Regex pre-screen for SSN-shape, credit-card-shape (Luhn-checked), phone-number-shape per locale.
- Optional LLM-backed second-pass via the future Phase 68 local tier (gemma3-270m).
- Hits land in data/pii-flags.jsonl for operator review; agent dialog continues unimpeded.
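The "Luhn-checked" part of the credit-card pre-screen refers to the standard Luhn checksum. A minimal sketch of that check follows; it shows the public algorithm, not code from the planned Phase 50 detector, and the function name is hypothetical.

```python
def luhn_valid(number):
    """Return True when the digits in `number` pass the Luhn checksum.

    Non-digit characters (spaces, dashes) are ignored.
    """
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
```

Running the checksum after the regex match cuts false positives: a random 16-digit order number has roughly a one-in-ten chance of passing, while a real card number always does.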
Today: nothing automated. The outbound redactor in
crates/core/src/redaction.rs (regex-based) catches the obvious
shapes before they reach long-term memory or the LLM, but doesn't
emit a queue for operator review.
Encryption at rest
Two roads, both deferred to Phase 50.x:
- Application-level: sqlcipher build of libsqlite3-sys with a key fed from secrets/. Every page encrypted; backups need the same key to restore.
- Filesystem-level: dm-crypt / LUKS on the volume hosting NEXO_HOME. Operator does it once at provision, no Nexo changes required.
The native install + Hetzner / Fly recipes assume filesystem-level
crypto handled by the host (LUKS on Hetzner, encrypted EBS on AWS,
Fly volumes are encrypted at rest by default). When sqlcipher is
ready we'll document switching tiers.
Status
| Capability | Status |
|---|---|
| scripts/nexo-forget-user.sh cascading delete | ✅ shipped |
| Operator data-export shell pipeline (above) | ✅ documented |
| Retention policy + cron template | ✅ documented |
| nexo forget --user <id> subcommand | ⬜ deferred |
| nexo export-user --id <id> subcommand | ⬜ deferred |
| Inbound PII detection + review queue | ⬜ deferred |
| sqlcipher encryption at rest | ⬜ deferred |
| Admin-action audit log (separate from this script's manifest) | ⬜ deferred |
Tracked as Phase 50 — Privacy toolkit.
Anonymous telemetry (opt-in)
Nexo can emit a weekly heartbeat with anonymous, aggregated deployment shape so the project knows what configurations are actually in production. The heartbeat is disabled by default — nothing leaves your host until you explicitly opt in.
This page documents exactly what's sent, what isn't, and how to inspect the payload before enabling it.
What is sent
Every 7 days (drift-resistant — 7d ± 1h jitter), if telemetry is
enabled, Nexo POSTs a single JSON document to
https://telemetry.lordmacu.dev/nexo over HTTPS:
{
"schema_version": 1,
"instance_id": "0fa3...",
"version": "0.1.1",
"rust_version": "1.80.1",
"os": "linux",
"arch": "aarch64",
"uptime_days": 14,
"agents": {
"total": 3,
"active_24h": 2
},
"channels": {
"whatsapp": 1,
"telegram": 1,
"email": 0,
"browser": 1
},
"llm_providers": [
"minimax",
"anthropic"
],
"memory_backend": "sqlite-vec",
"sessions": {
"average_per_agent_24h": 12,
"p95_per_agent_24h": 28
},
"extensions_loaded": 4,
"broker_kind": "nats"
}
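The "7d ± 1h jitter" schedule mentioned above can be sketched as a delay computation. The constant and function names are hypothetical, and drawing the jitter from a uniform distribution is an assumption; the text states only the bounds.

```python
import random

WEEK_S = 7 * 24 * 3600
JITTER_S = 3600


def next_heartbeat_delay(rng=random):
    """Seconds until the next heartbeat: 7 days plus jitter in [-1h, +1h].

    Assumption: jitter is uniform; only the bounds come from the text.
    """
    return WEEK_S + rng.uniform(-JITTER_S, JITTER_S)
```

Re-drawing the jitter on every tick keeps send times from settling on a fixed hour of the day.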
What is not sent
- ❌ Message content. Not a single byte of any conversation, prompt, response, or tool call ever leaves the host.
- ❌ Identifiers. No phone numbers, email addresses, contact names, agent names, channel handles. The instance_id is a random UUID generated on first opt-in and stored in ~/.nexo/telemetry-id; it can't be tied to anything except a rerun of the same install.
- ❌ API keys / tokens / secrets. None. The provider list is the literal string "minimax", never the key.
- ❌ IP addresses. The receiving server (telemetry.lordmacu.dev) drops the source IP at ingress before the payload hits any database. The HTTP access log retains only the country code derived from a one-way hash of the IP, used solely to plot the geographic distribution gauge on the public dashboard.
- ❌ Hostname. Not in the payload. Not derived from anything in the payload.
- ❌ Time of day. The heartbeat is jittered so the timestamp doesn't reveal a pattern.
Why opt in
It's the only honest signal the project has about what's actually deployed. Without it, every roadmap discussion is guessing. With it, prioritization improves: if 80% of opt-in deployments use Anthropic + WhatsApp, then a regression on that combo gets a hot-fix; a niche feature goes to maintenance mode.
The aggregate dashboard at
https://lordmacu.github.io/nexo-rs/usage/ (published once
Phase 41 fully ships) shows everyone what everyone else is doing
in aggregate — same data the maintainers see.
Enable / disable
# Show current state + what would be sent right now
nexo telemetry status
# Enable (writes to /etc/nexo-rs/telemetry.yaml or ~/.nexo/telemetry.yaml)
nexo telemetry enable
# Inspect exactly what tomorrow's heartbeat will contain
nexo telemetry preview
# Disable + remove the instance_id file
nexo telemetry disable
Hot-reload aware (Phase 18) — toggling doesn't require a daemon restart. The runtime watches the telemetry config; the next heartbeat tick respects whatever is currently on disk.
First-launch banner
On first nexo boot in a fresh install, the daemon prints once
to the journal:
========================================================================
nexo telemetry is DISABLED.
Enabling it sends an anonymous, aggregated weekly heartbeat
describing your deployment shape (channel mix, LLM provider mix,
agent count). No message content, no identifiers, no API keys.
Inspect the payload: nexo telemetry preview
Enable: nexo telemetry enable
Read the full spec: https://lordmacu.github.io/nexo-rs/ops/telemetry.html
========================================================================
Subsequent boots stay silent. Toggling on or off prints a one-line confirmation.
Server-side guarantees
The receiving endpoint at telemetry.lordmacu.dev:
- Drops the source IP at the load balancer, before the request reaches any application code or log aggregator.
- Stores the JSON document verbatim with no enrichment.
- Aggregates documents per instance_id only to compute the active_install_count cardinality on the public dashboard.
- Retains raw documents for 90 days, then aggregates and deletes the originals.
- Does not correlate documents across instance_id rotations; if you run nexo telemetry disable && nexo telemetry enable, you become a fresh install in the dataset.
The server source code lives at
https://github.com/lordmacu/nexo-telemetry-server (deferred —
opens once Phase 41 finishes server side). Reproducible build,
verifiable signatures.
Inspecting in transit
The HTTP request is plain HTTPS POST with the JSON payload above as the body. Easy to mitm in a corp environment:
mitmproxy -p 8888 -s drop_telemetry.py &
NEXO_TELEMETRY_PROXY=http://127.0.0.1:8888 nexo telemetry preview
The runtime respects HTTPS_PROXY / HTTP_PROXY / standard
proxy env vars for the heartbeat HTTP client (it goes through
the same reqwest client every other Nexo egress uses).
Disabling at the firewall
If you just want to make sure no telemetry can leave even if it gets accidentally enabled:
sudo iptables -A OUTPUT -d telemetry.lordmacu.dev -j REJECT
The runtime will see a network error in its logs every 7 days (rate-limited to once-per-week to not flood). It does not retry-forever — one attempt per scheduled tick.
Compliance notes
- GDPR: anonymous aggregate data with no identifiers and no PII falls outside Article 4(1) "personal data". The instance_id is technical metadata, not a pseudonym: it can't be re-tied to a natural person via any data the project holds.
- HIPAA: no PHI is collected; the field set is infrastructure metadata only.
- Corporate sec teams: the receiving endpoint speaks only HTTPS, no fallback to HTTP. The server cert is publicly pinnable. The payload schema is documented + versioned; new fields require bumping schema_version and a documented changelog entry below.
Schema changelog
| Version | Released | What changed |
|---|---|---|
| 1 | TBD when Phase 41 ships | Initial schema as documented above |
Future schema changes append a row here. Old clients are not
forced to upgrade — the server accepts every advertised
schema_version indefinitely (rolled-up dashboard panels
include only the fields a given schema carries).
Out of scope
- Per-agent / per-binding metrics: that's the Prometheus /metrics endpoint, scraped locally by your own Prometheus (see Grafana dashboards). The telemetry heartbeat is deployment-shape only.
- Crash reports: Nexo emits anyhow backtraces to the local journal but never sends them off-host.
- Real-time analytics: heartbeat is once weekly. There's no call-home for live metrics, ever.
Recipes
End-to-end walkthroughs that wire multiple subsystems together. Each recipe runs against a clean checkout of nexo-rs — prerequisites are at the top.
| Recipe | What you build |
|---|---|
| WhatsApp sales agent | A drop-in agent that greets WhatsApp leads, asks qualifying questions, and notifies a human on hot leads. |
| Agent-to-agent delegation | Route work from one agent to another using agent.route.* with correlation ids. |
| Python extension | Write a stdlib-only extension that adds a custom tool to any agent. |
| MCP server from Claude Desktop | Expose the agent's tools to the Anthropic desktop client. |
| NATS with TLS + auth | Harden the broker for a multi-node deployment. |
| Rotating config without downtime | Three Phase 18 hot-reload scenarios: API key rotation, A/B prompt swap, narrowing an outbound allowlist mid-incident. |
| Future marketing plugin (multi-client) | Prepare multi-client autonomous marketing agents with strict instance/model isolation before plugin implementation. |
If a recipe drifts from reality, open an issue — it means the docs didn't get updated alongside a code change.
WhatsApp sales agent
Build a drop-in agent that handles a sales line on WhatsApp:
- Greets the lead with the right operator (ETB / Claro / generic)
- Qualifies via a short scripted flow (address, package, budget)
- Notifies a human on hot leads, with the tool surface narrowed so the LLM only ever sees the lead-notification tool
This is the production shape of the shipped ana agent.
Prerequisites
- agent built (cargo build --release)
- NATS running (docker run -p 4222:4222 nats:2.10-alpine)
- A MiniMax M2.5 key
- A phone with WhatsApp ready to scan a QR
1. Provide the LLM key
export MINIMAX_API_KEY=...
export MINIMAX_GROUP_ID=...
2. Create a gitignored agent file
config/agents.d/ana.yaml is gitignored; put the business-sensitive
content there.
agents:
  - id: ana
    model:
      provider: minimax
      model: MiniMax-M2.5
    plugins: [whatsapp]
    inbound_bindings:
      - plugin: whatsapp
    allowed_tools:
      - notify_lead           # only this tool is visible
    outbound_allowlist:
      whatsapp:
        - "573000000000@s.whatsapp.net"   # human advisor's WA
    workspace: ./data/workspace/ana
    workspace_git:
      enabled: true
    heartbeat:
      enabled: false
    system_prompt: |
      You are Ana, a sales advisor for ETB and Claro. Help customers
      choose the best internet, TV, and phone package.
      On the first incoming message:
      - If it contains "etb" -> route directly to the ETB flow.
      - If it contains "claro" -> route directly to the Claro flow.
      - Otherwise, ask which operator they prefer.
      Capture: name, address, socioeconomic stratum, preferred package
      (internet only / internet+TV / triple play).
      When the lead is ready, invoke `notify_lead` with JSON containing:
      {name, phone, address, operator, package, notes}. Do not call any
      other tool — this is your only tool.
3. Pair WhatsApp for this agent
./target/release/agent setup whatsapp
The wizard creates ./data/workspace/ana/whatsapp/default/, flips
config/plugins/whatsapp.yaml::whatsapp.session_dir to point at it,
and renders a QR. Scan from the WhatsApp app.
4. Ship the notify_lead tool as an extension
Copy the Rust template and rename:
cp -r extensions/template-rust extensions/notify-lead
cd extensions/notify-lead
Edit plugin.toml:
[plugin]
id = "notify-lead"
version = "0.1.0"
[capabilities]
tools = ["notify_lead"]
[transport]
type = "stdio"
command = "./target/release/notify-lead"
Implement tools/notify_lead in src/main.rs. It should publish
to plugin.outbound.whatsapp.default with the recipient set to the human
advisor number you listed in outbound_allowlist.
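The handler logic can be sketched independently of the Rust template. The Python below is only an illustration of what the tool has to produce: the topic comes from the text above, while the {"to", "text"} payload shape, the constant names, and the function name are assumptions, not the documented wire format.

```python
import json

# Same value as outbound_allowlist.whatsapp in ana.yaml.
ADVISOR_JID = "573000000000@s.whatsapp.net"
OUTBOUND_TOPIC = "plugin.outbound.whatsapp.default"

# Fields the system prompt asks the LLM to pass to notify_lead.
REQUIRED_FIELDS = ("name", "phone", "address", "operator", "package", "notes")


def build_notify_lead_message(lead):
    """Return (topic, payload_bytes) for a qualified lead.

    The {"to", "text"} payload shape is assumed for illustration.
    """
    missing = [field for field in REQUIRED_FIELDS if field not in lead]
    if missing:
        raise ValueError("lead is missing fields: " + ", ".join(missing))
    text = (
        f"New lead: {lead['name']}, {lead['phone']}, {lead['package']} "
        f"({lead['operator']}), {lead['address']}. Notes: {lead['notes']}"
    )
    payload = {"to": ADVISOR_JID, "text": text}
    return OUTBOUND_TOPIC, json.dumps(payload, ensure_ascii=False).encode("utf-8")
```

Rejecting incomplete leads before publishing keeps half-qualified conversations from paging the advisor.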
Build and install:
cargo build --release
cd ../..
./target/release/agent ext install ./extensions/notify-lead --link --enable
./target/release/agent ext doctor --runtime
5. Run
./target/release/agent --config ./config
Flow diagram
sequenceDiagram
participant U as Lead
participant WA as WhatsApp
participant N as NATS
participant A as Ana
participant H as Human advisor
U->>WA: "Hi, I want internet service"
WA->>N: plugin.inbound.whatsapp
N->>A: deliver
A->>A: qualify (address, package)
A->>A: invoke notify_lead(json)
A->>N: plugin.outbound.whatsapp (advisor number)
N->>WA: deliver
WA->>H: "🚨 New lead — Luis, 573111111111, triple play"
Why this shape works
- allowed_tools: [notify_lead] prevents the LLM from hallucinating other actions; the model literally cannot see other tools.
- outbound_allowlist.whatsapp is defense-in-depth: even if the LLM crafts a send to an unexpected number, the runtime rejects it.
- workspace_git.enabled: true lets you audit what Ana remembered over time via memory_history, which is useful for reviewing tough calls.
- Gitignored agents.d/ana.yaml keeps price lists and business content out of the public repo.
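The two gates are independent checks, which is what makes the second one defense-in-depth. A minimal sketch of that layering, assuming exact-name matching for allowed_tools and treating a channel with no allowlist entry as "reject everything" (both are modelling assumptions, and the function names are hypothetical):

```python
def visible_tools(registered, allowed_tools):
    """Gate 1: the LLM is only shown tools named in allowed_tools."""
    allowed = set(allowed_tools)
    return [tool for tool in registered if tool in allowed]


def may_send(channel, recipient, outbound_allowlist):
    """Gate 2: a send is rejected unless the recipient is allowlisted.

    Treating a missing channel entry as an empty list is an assumption.
    """
    return recipient in outbound_allowlist.get(channel, [])
```

Even if gate 1 were misconfigured and exposed a generic send tool, gate 2 would still refuse any recipient other than the advisor.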
Testing
- Open WhatsApp on a second phone and send "hi, ETB"
- Watch agent status ana for session activity
- Watch docker compose logs agent | jq 'select(.agent == "ana")' for turn-by-turn reasoning
Agent-to-agent delegation
Route work from one agent to another using agent.route.<target_id>
with a correlation id. Typical shapes:
- Kate delegates research to ops and waits for the reply
- Ana fans out lead data to crm-bot, ticket-bot, and logger
- A supervisor agent orchestrates specialist subagents
Prerequisites
- Two agents configured in config/agents.yaml (and/or agents.d/)
- NATS running
- Either agent can be the caller or callee; the topology is symmetric
Agent config
agents:
  - id: kate
    model: { provider: minimax, model: MiniMax-M2.5 }
    plugins: [telegram]
    inbound_bindings: [{ plugin: telegram }]
    allowed_delegates: [ops, crm-bot]
    description: "Personal assistant; delegates research to ops."
  - id: ops
    model: { provider: minimax, model: MiniMax-M2.5 }
    accept_delegates_from: [kate]
    description: "Operations agent; answers factual questions about systems."
Key fields:
- allowed_delegates (on the caller): globs of peer ids this agent may route to. Empty = no restriction.
- accept_delegates_from (on the callee): inverse gate. Empty = no restriction.
- description: injected into both sides' # PEERS block so the LLM knows who can do what.
Both gates are glob lists and can be set on either side or both.
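How the two gates combine can be sketched in a few lines. This is an illustration rather than the runtime's code: it assumes shell-style glob matching (Python's fnmatch) and uses hypothetical helper names; the self-delegation check mirrors the recipe's guardrails.

```python
from fnmatch import fnmatchcase


def _gate_allows(globs, peer_id):
    # An empty list means "no restriction", as described above.
    return not globs or any(fnmatchcase(peer_id, glob) for glob in globs)


def may_delegate(caller, callee):
    """Both the caller-side and callee-side gates must pass."""
    if caller["id"] == callee["id"]:
        return False  # self-delegation is rejected
    return (
        _gate_allows(caller.get("allowed_delegates", []), callee["id"])
        and _gate_allows(callee.get("accept_delegates_from", []), caller["id"])
    )
```

With the config above, kate may route to ops, while a third agent that ops does not list in accept_delegates_from is refused on the callee side.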
Wire shape
sequenceDiagram
participant K as Kate
participant B as NATS
participant O as Ops
Note over K: LLM decides to delegate
K->>B: publish agent.route.ops<br/>{correlation_id: "req-abc", body: "what's the latest DB migration status?"}
B->>O: deliver
O->>O: on_message + LLM turn
O->>B: publish agent.route.kate<br/>{correlation_id: "req-abc", body: "migration 0042 is running..."}
B->>K: deliver
K->>K: correlate reply by req-abc
Correlation ids are caller-chosen strings. The callee echoes the id back on the reply; the caller uses it to match replies to requests (especially for fan-out + reassemble patterns).
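The caller-side bookkeeping amounts to a map from correlation id to the pending request. A minimal sketch, with a hypothetical PendingReplies class and a uuid-based id format chosen only for illustration:

```python
import uuid


class PendingReplies:
    """Caller-side table that matches replies to requests."""

    def __init__(self):
        self._pending = {}

    def dispatch(self, target, body):
        # The id format is caller-chosen; a uuid suffix is one option.
        correlation_id = f"req-{uuid.uuid4().hex[:8]}"
        self._pending[correlation_id] = target
        return {
            "topic": f"agent.route.{target}",
            "correlation_id": correlation_id,
            "body": body,
        }

    def on_reply(self, message):
        """Return (target, body) for a known id, or None otherwise."""
        target = self._pending.pop(message.get("correlation_id"), None)
        if target is None:
            return None  # unknown, late, or duplicate reply
        return target, message["body"]
```

Popping the entry on first match means a duplicated reply is ignored instead of being delivered to the LLM twice.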
Using the delegate tool
The runtime exposes a delegate tool whenever allowed_delegates is
non-empty. LLM call shape:
{
"name": "delegate",
"args": {
"to": "ops",
"body": "what's the latest DB migration status?"
}
}
The runtime:
- Generates a fresh correlation_id
- Publishes to agent.route.ops with that id
- Waits (bounded) for the reply on agent.route.kate
- Returns the body as the tool result
Timeouts and retry policy match the broker defaults — the circuit breaker on the target topic protects against an unreachable callee.
Fan-out
To fan out to multiple peers, the LLM can issue several delegate
calls in one turn. The runtime issues each with a unique
correlation_id and gathers the replies in parallel.
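The fan-out pattern can be illustrated with asyncio. This is a sketch under assumptions: send is a hypothetical coroutine standing in for "publish and await the correlated reply", and the 5-second timeout is a placeholder, not the broker default.

```python
import asyncio
import uuid


async def delegate(send, target, body, timeout=5.0):
    """Issue one delegation with its own correlation id and a bounded wait."""
    correlation_id = uuid.uuid4().hex
    try:
        reply = await asyncio.wait_for(send(target, correlation_id, body), timeout)
        return target, {"ok": True, "body": reply}
    except asyncio.TimeoutError:
        return target, {"ok": False, "error": "timeout"}


async def fan_out(send, targets, body, timeout=5.0):
    """Run all delegations concurrently and collect results per target."""
    pairs = await asyncio.gather(
        *(delegate(send, target, body, timeout) for target in targets)
    )
    return dict(pairs)
```

Because each call carries its own timeout, one slow peer shows up as an error entry without blocking the replies from the others.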
Guardrails
- Self-delegation is rejected at the manager level.
- Unknown target id → tool returns an error result, no broker traffic.
- allowed_delegates empty + no constraint means the agent can delegate to any peer; prefer an explicit list in production.
Observability
Every delegation emits two log lines (dispatch + reply) with structured fields:
{"agent": "kate", "target": "ops", "correlation_id": "...", "event": "delegate_dispatch"}
{"agent": "kate", "target": "ops", "correlation_id": "...", "event": "delegate_reply", "latency_ms": 1342}
Filter on correlation_id to trace a single delegation end to end.
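If jq is not at hand, the same filter is a few lines of Python. The sketch assumes one JSON object per log line with the fields shown above; trace_delegation is a hypothetical helper name.

```python
import json


def trace_delegation(lines, correlation_id):
    """Return the structured log records for one delegation, in input order."""
    events = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines mixed into the log stream
        if isinstance(record, dict) and record.get("correlation_id") == correlation_id:
            events.append(record)
    return events
```

A dispatch record with no matching reply record for the same id points at a callee that never answered.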
Python extension
Ship a custom tool written in Python — no dependencies beyond stdlib. The agent spawns your script, handshakes with it over stdin/stdout, and exposes your tool to the LLM.
Prerequisites
- python3 on the host $PATH
- A running nexo-rs install with extensions.enabled: true
1. Copy the template
cp -r extensions/template-python extensions/word-count
cd extensions/word-count
2. Edit plugin.toml
[plugin]
id = "word-count"
version = "0.1.0"
description = "Count words in a piece of text."
priority = 0
[capabilities]
tools = ["count_words"]
[transport]
type = "stdio"
command = "python3"
args = ["./main.py"]
[requires]
bins = ["python3"]
[meta]
license = "MIT OR Apache-2.0"
[requires] bins = ["python3"] gates the extension: if Python
isn't on $PATH, the runtime skips the extension with a warn log
instead of crash-looping.
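The gating idea is a plain $PATH lookup before spawn. A rough sketch of that check, assuming the manifest has already been parsed into a dict; the function names are hypothetical and the runtime's actual check may differ:

```python
import shutil


def missing_bins(bins):
    """Return the required binaries that cannot be found on $PATH."""
    return [name for name in bins if shutil.which(name) is None]


def should_load(manifest):
    """Return (ok, missing): skip the extension when any binary is absent."""
    required = manifest.get("requires", {}).get("bins", [])
    missing = missing_bins(required)
    return (not missing, missing)
```

Running the same check in a deploy script catches a host without Python before the agent ever logs the warning.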
3. Write main.py
#!/usr/bin/env python3
import sys, json

def reply(id, result=None, error=None):
    msg = {"jsonrpc": "2.0", "id": id}
    if error is None:
        msg["result"] = result
    else:
        msg["error"] = error
    sys.stdout.write(json.dumps(msg) + "\n")
    sys.stdout.flush()

def log(*args):
    print(*args, file=sys.stderr, flush=True)

HANDSHAKE = {
    "server_version": "0.1.0",
    "tools": [{
        "name": "count_words",
        "description": "Count whitespace-separated words in a string.",
        "input_schema": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"]
        }
    }],
    "hooks": []
}

def main():
    log("word-count starting")
    for line in sys.stdin:
        try:
            req = json.loads(line)
        except json.JSONDecodeError:
            continue
        method = req.get("method", "")
        rid = req.get("id")
        if method == "initialize":
            reply(rid, HANDSHAKE)
        elif method == "tools/count_words":
            params = req.get("params", {}) or {}
            text = params.get("text", "")
            count = len(text.split())
            reply(rid, {"count": count})
        else:
            reply(rid, error={"code": -32601, "message": f"unknown method: {method}"})

if __name__ == "__main__":
    main()
Make it executable:
chmod +x main.py
4. Validate and install
cd ../..
./target/release/agent ext validate ./extensions/word-count/plugin.toml
./target/release/agent ext install ./extensions/word-count --link --enable
./target/release/agent ext doctor --runtime
--link creates a symlink instead of a copy — good for the
edit-test loop. doctor --runtime actually spawns the extension
and runs the handshake, so a Python error that kills the interpreter
during init surfaces here rather than in production logs.
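For a quick check outside the agent, the same handshake can be driven by hand. The sketch below pipes newline-delimited JSON-RPC requests into the script and parses the replies; it is a stand-in for illustration, not what doctor --runtime does internally, and DEFAULT_CMD assumes you run it from the extension directory.

```python
import json
import subprocess
import sys

# Assumed location of the script from step 3; adjust to your checkout.
DEFAULT_CMD = [sys.executable, "./main.py"]


def call_extension(requests, cmd=DEFAULT_CMD, timeout=10):
    """Send JSON-RPC requests over stdin and return the parsed replies."""
    stdin_data = "".join(json.dumps(request) + "\n" for request in requests)
    proc = subprocess.run(
        cmd, input=stdin_data, capture_output=True, text=True, timeout=timeout
    )
    # One JSON document per stdout line; stderr carries the extension's logs.
    return [json.loads(line) for line in proc.stdout.splitlines() if line.strip()]
```

Calling call_extension with an initialize request followed by a tools/count_words request should yield two replies carrying the matching ids; an empty list usually means the interpreter died before answering.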
5. Allow the tool per agent
The registered tool name is ext_word-count_count_words. Add it to
the right agent's allowed_tools (or use a glob):
agents:
  - id: kate
    allowed_tools:
      - ext_word-count_*
    # ...
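The name in this step suggests an ext_<plugin id>_<tool> convention, and the glob then selects every tool of one extension. A small sketch of both, where the naming rule is inferred from the single example above and the glob semantics are assumed to be shell-style (fnmatch):

```python
from fnmatch import fnmatchcase


def registered_tool_name(plugin_id, tool):
    """Compose the registered name; convention inferred from the example above."""
    return f"ext_{plugin_id}_{tool}"


def is_allowed(tool_name, allowed_tools):
    """True when any allowed_tools pattern matches the registered name."""
    return any(fnmatchcase(tool_name, pattern) for pattern in allowed_tools)
```

A per-extension glob keeps the agent config stable when the extension later gains a second tool.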
6. Run
./target/release/agent --config ./config
Send a message that would prompt the LLM to use the tool; watch
the logs for tools/count_words on stderr.
Debugging
- stderr of the Python process is forwarded to the agent's log pipeline. print(..., file=sys.stderr) lines show up in the agent's tracing output with the extension=word-count field.
- Handshake failures are visible in ext doctor --runtime and prevent the tool from being registered at all.
- Per-tool latency shows up in the nexo_tool_latency_ms{tool="ext_word-count_count_words"} Prometheus histogram.
Productionizing
- Pin command to an absolute path or a virtualenv-local interpreter; python3 on $PATH may vary across hosts.
- Pick your dependency strategy carefully: the template is stdlib only. If you need requests or similar, ship a requirements.txt + bootstrap script, or switch to the Rust template.
- If the extension holds a connection to a remote service, add a heartbeat loop so you can detect liveness.
- For long-running tool calls, print status events to stderr; they become structured log entries and help debug hung tools.
Build a poller module (V1 — deprecated)
⚠ Deprecated since Phase 96 (nexo-poller 0.2.0). The in-tree builtins this page documents (gmail, rss, google_calendar) have been extracted to standalone subprocess plugin repos. New pollers should follow Build a poller plugin (V2). The OutboundDelivery / TickOutcome types referenced below are replaced by PollerHost::broker_publish + TickAck as of Phase 96. Treat this page as historical reference.
Three steps. No main.rs edit, no scheduler, no breaker, no SQLite
work. The runner gives you all of that — your code only describes
what to fetch, what to dispatch, and (optionally) what kind-specific
LLM tools to expose.
Reference (post-Phase-96): crates/poller/src/builtins/ for the two
remaining in-tree examples (webhook_poll.rs + agent_turn.rs).
Phase 96 extractions live in standalone repos: nexo-rs-poller-rss,
nexo-rs-poller-google-calendar, nexo-rs-poller-gmail.
Step 1 — implement the trait
// crates/poller/src/builtins/jira.rs
use std::sync::Arc;

use nexo_poller::{
    OutboundDelivery, PollContext, Poller, PollerError, TickOutcome,
};
use async_trait::async_trait;
use serde::Deserialize;
use serde_json::{json, Value};

#[derive(Debug, Deserialize, Clone)]
#[serde(deny_unknown_fields)]
struct JiraConfig {
    base_url: String,
    project_key: String,
    deliver: nexo_poller::builtins::gmail::DeliverCfg,
}

pub struct JiraPoller;

#[async_trait]
impl Poller for JiraPoller {
    fn kind(&self) -> &'static str { "jira" }

    fn description(&self) -> &'static str {
        "Polls Jira for newly assigned issues in a project."
    }

    fn validate(&self, config: &Value) -> Result<(), PollerError> {
        serde_json::from_value::<JiraConfig>(config.clone())
            .map(drop)
            .map_err(|e| PollerError::Config {
                job: "<jira>".into(),
                reason: e.to_string(),
            })
    }

    async fn tick(&self, ctx: &PollContext) -> Result<TickOutcome, PollerError> {
        let cfg: JiraConfig = serde_json::from_value(ctx.config.clone())
            .map_err(|e| PollerError::Config {
                job: ctx.job_id.clone(),
                reason: e.to_string(),
            })?;

        // 1. Pull data. Use ctx.cursor for incremental fetches.
        // 2. Decide what to dispatch.
        // 3. Build OutboundDelivery items; the runner publishes them
        //    via Phase 17 credentials so you never touch the broker.
        let payload = json!({ "text": "(jira tick — replace with real fetch)" });

        Ok(TickOutcome {
            items_seen: 0,
            items_dispatched: 1,
            deliver: vec![OutboundDelivery {
                channel: nexo_auth::handle::TELEGRAM,
                recipient: cfg.deliver.to.clone(),
                payload,
            }],
            next_cursor: None,
            next_interval_hint: None,
        })
    }
}
Anything Poller::validate returns Err(PollerError::Config { … })
fails this job at boot — siblings keep going.
Poller::tick returns:
- Ok(TickOutcome): the runner persists next_cursor, increments counters, dispatches every OutboundDelivery via the agent's Phase 17 binding, and sleeps until next slot.
- Err(PollerError::Transient(…)): counts toward the breaker; next tick retries with backoff.
- Err(PollerError::Permanent(…)): auto-pauses the job and fires the failure_to alert.
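The three outcomes map onto a small state machine on the runner side. The Python sketch below models it for illustration only: the doubling factor, the one-hour cap, keeping the old cursor when next_cursor is absent, and every name in it are assumptions rather than the runner's real parameters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobState:
    consecutive_failures: int = 0
    paused: bool = False
    cursor: Optional[str] = None


def apply_tick_result(state, result, base_secs=300, max_backoff_secs=3600):
    """Update job state; return seconds until the next tick, or None if paused."""
    kind = result["kind"]
    if kind == "ok":
        state.consecutive_failures = 0
        if result.get("next_cursor") is not None:
            state.cursor = result["next_cursor"]  # becomes the next ctx.cursor
        return base_secs
    if kind == "transient":
        state.consecutive_failures += 1
        # Exponential backoff, capped so a long outage does not stall forever.
        return min(base_secs * 2 ** state.consecutive_failures, max_backoff_secs)
    if kind == "permanent":
        state.paused = True  # stays paused until an operator resumes the job
        return None
    raise ValueError(f"unknown tick result kind: {kind}")
```

This also shows why tick should return Transient instead of sleeping: the delay lives in the runner's state, where a reset or reload can discard it.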
PollContext.stores exposes the credential stores when your module
needs paths (e.g., Gmail / Calendar built-ins read
client_id_path from there). Plain ctx.credentials.resolve(…) is
enough when you only need a CredentialHandle.
Step 2 — register
// crates/poller/src/builtins/mod.rs
pub mod gmail;
pub mod google_calendar;
pub mod jira; // ← new
pub mod rss;
pub mod webhook_poll;

pub fn register_all(runner: &PollerRunner) {
    runner.register(Arc::new(gmail::GmailPoller::new()));
    runner.register(Arc::new(rss::RssPoller::new()));
    runner.register(Arc::new(webhook_poll::WebhookPoller::new()));
    runner.register(Arc::new(google_calendar::GoogleCalendarPoller::new()));
    runner.register(Arc::new(jira::JiraPoller)); // ← new
}
That is the only place wiring is touched. main.rs already calls
register_all.
Step 3 — declare a job
# config/pollers.yaml
pollers:
  jobs:
    - id: ana_jira_assigned
      kind: jira
      agent: ana
      schedule: { every_secs: 300 }
      config:
        base_url: https://company.atlassian.net
        project_key: ENG
        deliver:
          channel: telegram
          to: "1194292426"
Run the daemon. Verify with:
agent pollers list # ana_jira_assigned shows up
agent pollers run ana_jira_assigned # tick on demand
Add per-kind LLM tools
Your module can ship its own tools alongside the generic
pollers_* ones. Override Poller::custom_tools:
fn custom_tools(&self) -> Vec<nexo_poller::CustomToolSpec> {
    use nexo_llm::ToolDef;
    use nexo_poller::{CustomToolHandler, CustomToolSpec, PollerRunner};
    use async_trait::async_trait;

    struct JiraSearch;

    #[async_trait]
    impl CustomToolHandler for JiraSearch {
        async fn call(
            &self,
            runner: Arc<PollerRunner>,
            args: Value,
        ) -> anyhow::Result<Value> {
            // Use `runner` to inspect / mutate jobs the same way
            // built-in `pollers_*` tools do: list_jobs, run_once,
            // set_paused, reset_cursor are all available.
            let id = args["id"]
                .as_str()
                .ok_or_else(|| anyhow::anyhow!("`id` required"))?;
            let outcome = runner.run_once(id).await?;
            Ok(json!({ "matching": outcome.items_seen }))
        }
    }

    vec![CustomToolSpec {
        def: ToolDef {
            name: "jira_search".into(),
            description: "Run the Jira poll job once without persisting state.".into(),
            parameters: json!({
                "type": "object",
                "properties": { "id": { "type": "string" } },
                "required": ["id"]
            }),
        },
        handler: Arc::new(JiraSearch),
    }]
}
The agent then sees jira_search automatically — no extra
registration step. The adapter in
nexo-poller-tools::register_all walks every registered Poller's
custom_tools() and wires each spec into the per-agent
ToolRegistry.
What the runner gives you for free
- Per-job tokio task with every | cron | at schedule + jitter.
- Cross-process atomic lease in SQLite (lease takeover after TTL expiry; daemon crash mid-tick is recoverable).
- Cursor persistence: your next_cursor is the next tick's ctx.cursor. Survives restarts. agent pollers reset <id> clears it.
- Exponential backoff on Transient, auto-pause on Permanent.
- Per-job circuit breaker keyed on ("poller", job_id).
- Outbound dispatch via Phase 17: OutboundDelivery lands at plugin.outbound.<channel>.<instance> resolved from the agent's binding. You never touch the broker.
- 7 Prometheus series labelled by kind, agent, job_id, status. Audit log under target=credentials.audit.
- Admin endpoints + CLI subcommands (agent pollers …).
- Six generic LLM tools (pollers_list, pollers_show, pollers_run, pollers_pause, pollers_resume, pollers_reset).
- Hot-reload via POST /admin/pollers/reload: add | replace | remove | keep plan applied atomically.
Tests pattern
#[tokio::test]
async fn validate_accepts_minimal() {
    let p = JiraPoller;
    let cfg = json!({
        "base_url": "https://x.atlassian.net",
        "project_key": "ENG",
        "deliver": { "channel": "telegram", "to": "1" },
    });
    p.validate(&cfg).unwrap();
}

#[tokio::test]
async fn validate_rejects_unknown_field() {
    let p = JiraPoller;
    let cfg = json!({ "wat": true, "deliver": { "channel": "x", "to": "1" }});
    assert!(p.validate(&cfg).is_err());
}
Cursor / dispatch tests follow the same pattern as the in-tree
built-ins (gmail.rs, rss.rs, webhook_poll.rs).
Anti-patterns
- Don't publish to the broker directly from tick. Return OutboundDelivery so the runner uses Phase 17 + audit log.
- Don't share global state across modules. Use cursors for per-job state; use DashMap inside your struct for per-account caches (gmail does this for GoogleAuthClient).
- Don't sleep inside tick for backoff. Return PollerError::Transient and let the runner own the backoff schedule; that way agent pollers reset and hot-reload still cancel cleanly.
- Don't auto-create jobs from inside an LLM tool. The runner intentionally exposes only read + control on existing jobs. Operators own pollers.yaml.
Build a poller plugin (V2 — out-of-tree subprocess)
Phase 96 introduced the [plugin.poller] manifest section. Out-of-tree
poller plugins ship as standalone Cargo crates publishing to crates.io,
spawned as subprocesses by the daemon, and communicating with the
runtime via broker JSON-RPC. The daemon's nexo-poller runtime stays
provider-agnostic — pollers reach the world through a single egress
trait (PollerHost) for outbound, credentials, logs, metrics, and LLM
invocations.
If you maintained an in-tree builtin under crates/poller/src/builtins/
before Phase 96, migrate to this recipe. The legacy
nexo-poller-ext StdioRuntime bridge is deprecated since v0.2.0 and
slated for deletion two release cycles after Phase 96 ships.
Three steps
- Scaffold a new Cargo crate that depends on nexo-microapp-sdk with the poller feature.
- Implement PollerHandler::tick: fetch, parse, dispatch via host.broker_publish, return a TickAck.
- Write a nexo-plugin.toml declaring [plugin.poller] plus the broker topics your plugin needs to subscribe / publish on.
Step 1 — Cargo.toml
[package]
name = "nexo-poller-jira"
version = "0.1.0"
edition = "2021"
[[bin]]
name = "nexo-poller-jira"
path = "src/main.rs"
[lib]
name = "nexo_poller_jira"
path = "src/lib.rs"
[dependencies]
nexo-microapp-sdk = { version = "0.2", features = ["plugin", "poller"] }
nexo-poller = "0.2"
nexo-broker = "0.1"
nexo-config = "0.1"
tokio = { version = "1", features = ["macros", "rt-multi-thread", "sync", "time", "io-util", "io-std"] }
async-trait = "0.1"
serde = { version = "1", features = ["derive"] }
serde_json = "1"
anyhow = "1"
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
reqwest = { version = "0.12", default-features = false, features = ["rustls-tls"] }
Step 2 — PollerHandler implementation
// src/lib.rs
use std::sync::Arc;

use async_trait::async_trait;
use serde::Deserialize;
use serde_json::json;

use nexo_microapp_sdk::poller::{PollerHandler, TickRequest};
use nexo_poller::{PollerError, PollerHost, TickAck, TickMetrics};

#[derive(Debug, Deserialize, Clone)]
#[serde(deny_unknown_fields)]
pub struct JiraJobConfig {
    pub base_url: String,
    pub project_key: String,
    pub deliver: DeliverCfg,
}

#[derive(Debug, Deserialize, Clone)]
#[serde(deny_unknown_fields)]
pub struct DeliverCfg {
    pub channel: String,
    #[serde(alias = "recipient")]
    pub to: String,
}

pub struct JiraHandler {
    http: reqwest::Client,
}

impl JiraHandler {
    pub fn new() -> Self {
        Self {
            http: reqwest::Client::builder()
                .timeout(std::time::Duration::from_secs(30))
                .build()
                .expect("reqwest"),
        }
    }
}

#[async_trait]
impl PollerHandler for JiraHandler {
    async fn tick(
        &self,
        req: TickRequest,
        host: Arc<dyn PollerHost>,
    ) -> Result<TickAck, PollerError> {
        let cfg: JiraJobConfig = serde_json::from_value(req.config.clone())
            .map_err(|e| PollerError::Config {
                job: req.job_id.clone(),
                reason: e.to_string(),
            })?;

        // Fetch from Jira (replace with real API call).
        let resp = self.http
            .get(&format!("{}/rest/api/3/search?jql=project={}", cfg.base_url, cfg.project_key))
            .send().await
            .map_err(|e| PollerError::Transient(anyhow::Error::from(e)))?;
        if !resp.status().is_success() {
            return Err(PollerError::Transient(anyhow::anyhow!("HTTP {}", resp.status())));
        }
        let _body: serde_json::Value = resp.json().await
            .map_err(|e| PollerError::Transient(anyhow::Error::from(e)))?;

        // Resolve the outbound channel's account_id via reverse-RPC.
        let cred = host.credentials_get(cfg.deliver.channel.clone()).await
            .map_err(|e| PollerError::Permanent(anyhow::anyhow!("credentials_get: {e}")))?;
        let account_id = cred.get("account_id")
            .and_then(|v| v.as_str())
            .ok_or_else(|| PollerError::Permanent(anyhow::anyhow!("no account_id")))?
            .to_string();
        let topic = format!("plugin.outbound.{}.{}", cfg.deliver.channel, account_id);

        // Dispatch one message per new issue.
        let payload = json!({ "to": cfg.deliver.to, "text": "new Jira issue" });
        let payload_bytes = serde_json::to_vec(&payload)
            .map_err(|e| PollerError::Transient(anyhow::Error::from(e)))?;
        host.broker_publish(topic, payload_bytes).await
            .map_err(|e| PollerError::Transient(anyhow::anyhow!("broker_publish: {e}")))?;

        Ok(TickAck {
            next_cursor: None,
            next_interval_hint: None,
            metrics: Some(TickMetrics { items_seen: 1, items_dispatched: 1 }),
        })
    }
}
Step 3 — nexo-plugin.toml
manifest_version = 2
[plugin]
id = "jira"
version = "0.1.0"
name = "Jira Poller"
description = "Jira issues poller — fetches new issues, dispatches via deliver channel."
min_nexo_version = ">=0.2.0"
[plugin.entrypoint]
command = "nexo-poller-jira"
[plugin.requires]
nexo_capabilities = ["broker"]
[plugin.capabilities.broker]
subscribe = [
"plugin.poller.jira.tick",
"_inbox.>",
]
publish = [
"daemon.rpc.jira",
"plugin.outbound.whatsapp.>",
"plugin.outbound.telegram.>",
"_inbox.>",
]
[plugin.poller]
kinds = ["jira"]
broker_topic_prefix = "plugin.poller.jira"
lifecycle = "long_lived"
max_concurrent_ticks = 1
tick_timeout_secs = 60
Operator config
# pollers.yaml
jobs:
  - id: backend_jira
    kind: jira
    agent: ana
    schedule: { every: 15m }
    config:
      base_url: "https://acme.atlassian.net"
      project_key: "ENG"
      deliver:
        channel: telegram
        to: "-1001234567"
Install + boot
cargo install nexo-poller-jira
agent run
The daemon discovers the plugin via its [plugin.entrypoint] line,
registers the jira kind in the PluginPollerRouter, and routes
matching jobs through broker JSON-RPC. The plugin's broker subscriber
receives ticks on plugin.poller.jira.tick, dispatches to your
PollerHandler::tick, encodes the TickAck into the wire reply, and
publishes back on the message's reply_to topic.
What PollerHost exposes
The poller reaches the runtime through one trait. Five methods:
| Method | Use case |
|---|---|
| broker_publish(topic, payload) | Outbound, direct to broker (Phase 92 path) |
| credentials_get(channel) | Resolve { account_id, … } for the outbound channel |
| log(level, message, fields) | Structured log forwarded to daemon tracing |
| metric_inc(name, labels) | Counter increment forwarded to daemon Prometheus |
| llm_invoke(request) | LLM completion through daemon's LlmRegistry |
No OutboundDelivery, no Channel enum, no credential bundle types
in your code — your plugin owns its own outbound logic and topic
construction.
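Since the plugin now builds its own topics, it is worth checking them against the publish list in nexo-plugin.toml before shipping. The sketch below assumes the manifest patterns follow NATS subject wildcards (* for one token, > for one or more trailing tokens), which the _inbox.> entries suggest but this page does not state; the helper names are hypothetical.

```python
def subject_matches(pattern, subject):
    """NATS-style match: '*' is one token, '>' is one or more trailing tokens."""
    pattern_tokens = pattern.split(".")
    subject_tokens = subject.split(".")
    for index, token in enumerate(pattern_tokens):
        if token == ">":
            # '>' must be the last pattern token and needs something to match.
            return index == len(pattern_tokens) - 1 and len(subject_tokens) > index
        if index >= len(subject_tokens):
            return False
        if token != "*" and token != subject_tokens[index]:
            return False
    return len(pattern_tokens) == len(subject_tokens)


def outbound_topic(channel, account_id):
    """Topic shape used by the handler in Step 2."""
    return f"plugin.outbound.{channel}.{account_id}"


def may_publish(topic, publish_patterns):
    """True when at least one declared publish pattern covers the topic."""
    return any(subject_matches(pattern, topic) for pattern in publish_patterns)
```

With the manifest above, a job whose deliver.channel is telegram or whatsapp produces a covered topic, while a channel such as email would need its own plugin.outbound.email.> entry.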
Migrating from V1 (in-tree builtin)
If you maintained a builtin under crates/poller/src/builtins/:
- Create the standalone repo from the recipe above.
- Copy your Poller::tick body into PollerHandler::tick. Three rename rules:
  - ctx.credentials.resolve(agent, channel) → host.credentials_get(channel).await
  - OutboundDelivery { channel, recipient, payload } push → build the topic yourself (plugin.outbound.<channel>.<account_id>) and call host.broker_publish(topic, payload_bytes)
  - TickOutcome { items_seen, items_dispatched, deliver, next_cursor, next_interval_hint } → TickAck { next_cursor, next_interval_hint, metrics: Some(TickMetrics { items_seen, items_dispatched }) }
- Drop your entry from crates/poller/src/builtins/mod.rs::register_all.
- Publish your new crate to crates.io. The daemon's [plugin.poller] manifest discovery picks it up at boot.
The reference Phase 96 extractions live at
nexo-rs-poller-rss, nexo-rs-poller-google-calendar, and
nexo-rs-poller-gmail — see those repos for end-to-end examples
with broker subscriber boot, reverse-RPC credential refresh, and
serde-driven config parsing.
Deploy on Hetzner Cloud (CX22)
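A concrete recipe for a single-VPS production deploy. CX22 is the Hetzner sweet spot: €3.79/mo, 2 vCPU, 4 GB RAM, 40 GB SSD, 20 TB transfer included. It runs the Nexo daemon + an internal NATS broker comfortably with headroom for the browser plugin (Chrome). The steps below download arm64 packages; note that Hetzner's CX types are x86_64 and its Arm64 types are the CAX line (CAX11 has the same 2 vCPU / 4 GB / 40 GB shape), so match the package architecture to the server type you actually provision.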
This recipe targets a single-tenant personal-agent deploy. For multi-tenant or multi-process see Phase 32.
What you end up with
- Nexo daemon under systemd, auto-start on boot
- NATS broker on the same host (nats-server from the official Debian package), auto-start
- Cloudflare Tunnel for inbound HTTPS without opening ports
- UFW firewall: only outbound + cloudflared
- Unattended security upgrades
- TLS handled by Cloudflare; no Let's Encrypt cert renewal to babysit
Estimated cost: ~€4/month (CX22 only; Cloudflare Tunnel is free).
0. Prerequisites
- Hetzner Cloud account with API token
- Cloudflare account with a domain pointed at it
- SSH key uploaded to Hetzner (hcloud ssh-key create --name ops --public-key-from-file ~/.ssh/id_ed25519.pub)
1. Provision the VPS
Via Hetzner Cloud console: New Server → Location: any close to
your users → Image: Debian 12 → Type: CX22 (ARM64, shared
vCPU). Add your SSH key. Name it nexo-1.
CLI alternative:
hcloud server create \
--name nexo-1 \
--type cx22 \
--image debian-12 \
--ssh-key ops \
--location nbg1
Wait ~30s, grab the IPv4 from the dashboard.
2. Initial hardening (one-time)
SSH in as root, then drop privileges to a sudo user:
ssh root@<ip>
adduser ops
usermod -aG sudo ops
rsync --archive --chown=ops:ops ~/.ssh /home/ops
exit
ssh ops@<ip>
sudo apt update && sudo apt full-upgrade -y
sudo apt install -y unattended-upgrades ufw fail2ban
sudo dpkg-reconfigure -p low unattended-upgrades
# Firewall: deny inbound, allow outbound + ssh from your IP only
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow from <your-home-ip> to any port 22 proto tcp
sudo ufw enable
# Disable root SSH + password auth
sudo sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin no/' /etc/ssh/sshd_config
sudo sed -i 's/^#\?PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
sudo systemctl restart ssh
3. Install Nexo from the .deb
Once Phase 27.4 ships and a release exists with an arm64 .deb:
curl -LO https://github.com/lordmacu/nexo-rs/releases/latest/download/nexo-rs_arm64.deb
# Verify the signature first (Phase 27.3):
curl -LO https://github.com/lordmacu/nexo-rs/releases/latest/download/nexo-rs_arm64.deb.bundle
cosign verify-blob \
--bundle nexo-rs_arm64.deb.bundle \
--certificate-identity-regexp 'https://github.com/lordmacu/nexo-rs/.*' \
--certificate-oidc-issuer https://token.actions.githubusercontent.com \
nexo-rs_arm64.deb \
|| { echo "REFUSING TO INSTALL UNSIGNED PACKAGE"; exit 1; }
sudo apt install ./nexo-rs_arm64.deb
The post-install scaffolds the nexo user, owns
/var/lib/nexo-rs/, and prints next steps. Does not auto-start
the service — that comes after we wire config.
4. Install + enable NATS
# Hetzner Debian repo doesn't ship nats-server; use the upstream .deb
NATS_VERSION=2.10.20
curl -LO "https://github.com/nats-io/nats-server/releases/download/v${NATS_VERSION}/nats-server-v${NATS_VERSION}-linux-arm64.deb"
sudo apt install ./nats-server-v${NATS_VERSION}-linux-arm64.deb
sudo systemctl enable --now nats-server
NATS now listens on 127.0.0.1:4222 (loopback only) — exactly
what we want; only Nexo running on the same host should reach it.
5. Wire Nexo config
sudo -u nexo nexo setup
The wizard asks for:
- LLM provider keys (Anthropic / MiniMax / etc.): paste them; they land in /var/lib/nexo-rs/secret/ mode 0600 owned by nexo:nexo
- WhatsApp / Telegram pairing: defer if not needed yet
- Memory backend: pick sqlite-vec (default for single-host)
The wizard writes /etc/nexo-rs/{agents,broker,llm,memory}.yaml.
Verify broker.yaml points at nats://127.0.0.1:4222.
6. Cloudflare Tunnel for HTTPS
The Nexo admin port (8080) shouldn't be exposed directly. Use a tunnel:
# Install cloudflared
curl -LO https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-arm64.deb
sudo apt install ./cloudflared-linux-arm64.deb
# Authenticate (opens a browser link — visit it on your laptop)
cloudflared tunnel login
# Create tunnel
cloudflared tunnel create nexo-1
# Route a hostname
cloudflared tunnel route dns nexo-1 nexo.yourdomain.com
# Config
sudo mkdir -p /etc/cloudflared
sudo tee /etc/cloudflared/config.yml >/dev/null <<EOF
tunnel: nexo-1
credentials-file: /home/ops/.cloudflared/<UUID>.json
ingress:
- hostname: nexo.yourdomain.com
service: http://127.0.0.1:8080
- service: http_status:404
EOF
# Run as a service
sudo cloudflared service install
sudo systemctl enable --now cloudflared
Now https://nexo.yourdomain.com reaches the Nexo admin via
Cloudflare's edge — TLS terminated at Cloudflare, no cert renewal,
DDoS protection bundled.
7. Start Nexo
sudo systemctl enable --now nexo-rs
sudo journalctl -u nexo-rs -f
You should see the boot sequence: config validated → broker connected → agents loaded → ready.
8. Verify
# Local health check (over the loopback)
curl -fsSL http://127.0.0.1:8080/health
# External via the tunnel
curl -fsSL https://nexo.yourdomain.com/health
# Metrics endpoint
curl -fsSL http://127.0.0.1:9090/metrics | head -20
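Right after systemctl enable --now, the daemon may still be working through its boot sequence, so a single curl can fail spuriously. A small retry wrapper helps when these checks run from a script. This is only a sketch; wait_until is a hypothetical helper, not part of Nexo.

```shell
# Hypothetical helper: run CMD up to TRIES times, sleeping DELAY seconds
# between attempts; succeed as soon as CMD succeeds.
wait_until() {
  tries="$1"; delay="$2"; shift 2
  i=1
  while [ "$i" -le "$tries" ]; do
    "$@" && return 0
    # No point sleeping after the final failed attempt.
    [ "$i" -lt "$tries" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

For example, wait_until 30 2 curl -fsS http://127.0.0.1:8080/health gives the daemon up to about a minute to report healthy.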
9. Backups
The state lives in /var/lib/nexo-rs/. Daily snapshot to S3 /
Backblaze:
#!/bin/sh
# /etc/cron.daily/nexo-backup (the shebang must be the first line of the file)
set -eu
TIMESTAMP=$(date -u +%Y%m%dT%H%M%SZ)
BACKUP="/tmp/nexo-${TIMESTAMP}.tar.zst"
# Pause the runtime briefly so SQLite isn't mid-write.
systemctl stop nexo-rs
tar -I 'zstd -19 -T0' \
-cf "$BACKUP" \
-C /var/lib/nexo-rs \
--exclude='./queue/*.tmp' \
.
systemctl start nexo-rs
# Upload — adjust to your storage backend
rclone copy "$BACKUP" remote:nexo-backups/
rm "$BACKUP"
# Delete remote backups older than 30 days
rclone delete --min-age 30d remote:nexo-backups/
Make the script executable with chmod +x /etc/cron.daily/nexo-backup.
For a backup that does not stop the service at all, use SQLite's
VACUUM INTO-based hot backup; track Phase 36 (backup, restore,
migrations) for the upcoming nexo backup subcommand.
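One caveat with the cron script: under set -eu, a failing tar aborts the script before systemctl start runs, which would leave nexo-rs stopped until someone notices. A possible guard is to wrap the stop/run/start sequence in a function that always restarts. The sketch below is an illustration only; with_service_stopped and the SERVICE_CTL override are hypothetical names.

```shell
# Command used to control the service; overridable for dry runs and tests.
SERVICE_CTL="${SERVICE_CTL:-systemctl}"

# Hypothetical helper: stop SERVICE, run CMD, then restart SERVICE whether
# or not CMD succeeded. Returns CMD's exit status.
with_service_stopped() {
  service="$1"; shift
  "$SERVICE_CTL" stop "$service" || return 1
  rc=0
  "$@" || rc=$?
  "$SERVICE_CTL" start "$service" || return 1
  return "$rc"
}
```

In the backup script this would replace the explicit stop/tar/start lines, for example with_service_stopped nexo-rs tar -I 'zstd -19 -T0' -cf "$BACKUP" -C /var/lib/nexo-rs .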
10. Updates
# Pull the latest .deb
curl -LO https://github.com/lordmacu/nexo-rs/releases/latest/download/nexo-rs_arm64.deb
# Verify (always)
cosign verify-blob ...
# Install (apt restarts the service automatically)
sudo apt install ./nexo-rs_arm64.deb
Or wire the apt repo (Phase 27.4 follow-up) and run
apt upgrade nexo-rs like any other system package.
Limits + escape hatches
- Browser plugin uses ~300 MB RAM per Chrome process. CX22 has 4 GB; budget 2 instances tops. Bump to CX32 (€7/mo, 4 vCPU, 8 GB) when you start hitting OOM.
- NATS on the same host is fine for single-tenant; for multi-host, run NATS on its own VM (CX12, €3.29/mo).
- TLS terminates at Cloudflare, so the last hop (cloudflared to Nexo on 127.0.0.1:8080) is plain HTTP. The tunnel itself is encrypted at the transport layer (QUIC + mTLS to Cloudflare), so this is fine; if you want defense-in-depth, terminate TLS again locally with caddy or nginx.
Troubleshooting
- Tunnel disconnects after reboot: check systemctl status cloudflared. The credentials file moved if you reinstalled cloudflared with a different service install. Re-run cloudflared service install after cloudflared tunnel login.
- NATS refuses connections: the upstream .deb binds 0.0.0.0:4222 by default. Edit /etc/nats-server/nats-server.conf to set host: 127.0.0.1 and systemctl restart nats-server.
- Nexo can't write to /var/lib/nexo-rs/: run sudo chown -R nexo:nexo /var/lib/nexo-rs && sudo chmod 0750 /var/lib/nexo-rs.
Related
- Docker compose: single-machine but containerized (vs systemd-native here)
- Native install: the underlying mechanics of step 3 if you skip the .deb
- Phase 27.4 (Debian / RPM packages): source of the .deb this recipe consumes
Deploy on Fly.io
Recipe for a single-region Fly.io deploy. Fly's strengths fit Nexo well: persistent volumes (for the SQLite state), health checks, free TLS, easy multi-region scale-out, and a generous free tier (up to 3 shared-1x VMs free) that covers a personal agent.
What you end up with
- Nexo daemon + bundled local NATS broker on a single Fly machine
- Persistent volume mounted at /app/data (the destination set in fly.toml below)
- Free TLS via fly.io subdomain (custom domain optional)
- Auto-redeploy on every git push to main (via Fly GitHub Action)
- Fly's built-in metrics + log streaming
Estimated cost: $0–$5/mo (free tier covers shared-1x VM + small volume; bigger Chrome workloads = $5-15/mo on a performance-1x).
0. Prerequisites
# Install flyctl
curl -L https://fly.io/install.sh | sh
fly auth signup # first time only
fly auth login # if you already have an account
# Confirm:
fly version
1. Initialize the app
From the repo root:
fly launch \
--name nexo-yourname \
--region <closest-region> \
--vm-cpu-kind shared \
--vm-cpus 1 \
--vm-memory 1024 \
--no-deploy
--no-deploy lets us tweak the generated fly.toml before the
first build.
2. fly.toml
Replace the auto-generated fly.toml with this:
app = "nexo-yourname"
primary_region = "ams" # or whichever closest
# Use the published GHCR image instead of building per-deploy.
[build]
image = "ghcr.io/lordmacu/nexo-rs:latest"
# Persistent state — Fly volumes survive restarts and are
# mounted into the VM. SQLite + transcripts + secret/ live here.
[mounts]
source = "nexo_data"
destination = "/app/data"
# Point the runtime at the volume so config + state align with the
# fly volume layout. With NEXO_HOME set to /app/data,
# everything writable lands on the volume.
[env]
RUST_LOG = "info"
NEXO_HOME = "/app/data"
# `services` block tells Fly which container ports to expose.
[[services]]
internal_port = 8080
protocol = "tcp"
auto_stop_machines = false # keep the agent running 24/7
auto_start_machines = true
min_machines_running = 1
[[services.ports]]
port = 80
handlers = ["http"]
force_https = true
[[services.ports]]
port = 443
handlers = ["tls", "http"]
[services.concurrency]
type = "connections"
soft_limit = 200
hard_limit = 250
[[services.tcp_checks]]
interval = "15s"
timeout = "2s"
grace_period = "30s"
# Metrics endpoint — Fly scrapes Prometheus-style automatically.
[metrics]
port = 9090
path = "/metrics"
# VM sizing — bump to performance-1x when the browser plugin is on.
[[vm]]
cpu_kind = "shared"
cpus = 1
memory_mb = 1024
3. Create the volume
fly volumes create nexo_data --region ams --size 3
3 GB covers SQLite + a few months of transcripts. Bump as needed.
4. Set secrets
Fly's secret store injects them as env vars at runtime. Reference
them from config/llm.yaml via ${ENV_VAR} placeholders:
fly secrets set ANTHROPIC_API_KEY=sk-ant-...
fly secrets set MINIMAX_API_KEY=...
fly secrets set MINIMAX_GROUP_ID=...
# Anything else your llm.yaml references via ${...}
The Nexo config loader resolves ${ANTHROPIC_API_KEY} placeholders
from the process env — works the same whether the env vars come
from /run/secrets/, ~/.bashrc, or Fly secrets.
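Before deploying, it can be useful to check that every placeholder a config file references is actually set. The sketch below is a hypothetical pre-flight helper (missing_env_refs is not a Nexo command): it scans a file for ${NAME} placeholders and prints the names that are unset in the current shell. It deliberately ignores other forms such as ${file:...}.

```shell
# Hypothetical helper: print every ${NAME} placeholder in FILE whose
# variable is unset in the current shell (exported or not).
missing_env_refs() {
  grep -o '\${[A-Za-z_][A-Za-z0-9_]*}' "$1" | tr -d '${}' | sort -u |
  while read -r name; do
    # ${NAME+set} expands to "set" only when NAME is defined. The name is
    # restricted to identifier characters by the grep above, so eval is safe.
    eval "val=\${$name+set}"
    [ -n "$val" ] || echo "$name"
  done
}
```

Run it as missing_env_refs config/llm.yaml in an environment that mirrors the secrets you set with fly secrets set; empty output means every placeholder resolves.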
5. Pre-bake the config
Fly mounts /app/data from the volume but /app/config lives
inside the image. Two options:
Option A — bake config into a custom image (recommended). Wrap the GHCR image in a tiny Dockerfile:
# Dockerfile.fly
FROM ghcr.io/lordmacu/nexo-rs:latest
# Copy your operator config tree into the image. Adjust to
# whatever your setup needs — just don't ship secrets here, use
# fly secrets for those.
COPY ./config/fly /app/config
# fly.toml's CMD already passes `--config /app/config`.
Then change fly.toml:
[build]
dockerfile = "Dockerfile.fly"
Option B — write config to the volume on first boot. Use a
Fly machine init script that runs nexo setup --non-interactive --from-env once, then exits.
6. Deploy
fly deploy
First deploy spins up the volume + machine. Subsequent deploys hot-swap the image with zero-downtime rolling restart.
7. Verify
# Health
fly status
curl https://nexo-yourname.fly.dev/health
# Metrics (over the Fly internal network)
fly proxy 9090:9090 -a nexo-yourname &
curl http://127.0.0.1:9090/metrics | head -20
# Logs
fly logs
# SSH in if something looks off
fly ssh console
8. Custom domain
fly certs add nexo.yourdomain.com
# Add the CNAME to your DNS as instructed
fly certs check nexo.yourdomain.com
9. Continuous deploy on push
Drop this into .github/workflows/fly-deploy.yml:
name: fly-deploy
on:
push:
branches: [main]
permissions:
contents: read
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: superfly/flyctl-actions/setup-flyctl@master
- run: flyctl deploy --remote-only
env:
FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}
Get a token: fly tokens create deploy -x 999999h. Drop in repo
secrets as FLY_API_TOKEN.
10. Backups
# Manual snapshot
fly volumes snapshots create nexo_data
fly volumes snapshots list nexo_data
# Restore (creates a new volume from the snapshot)
fly volumes create nexo_data_restored \
--snapshot-id vs_xxxxxxxxxxxx \
--region ams
For automated backups, set up a daily Fly cron machine that runs
fly volumes snapshots create against the data volume.
Limits + escape hatches
- Free tier shared-1x has 1 vCPU + 256 MB RAM, which is too small for the browser plugin. Disable Chrome (plugins.browser.enabled: false) on shared-1x, or bump to performance-1x ($15/mo, 1 vCPU + 2 GB).
- Single-region by default. Fly has a multi-region story but the broker (NATS) doesn't speak Fly's distributed primitives. For multi-region, run NATS on a dedicated VM with NatsBroker cluster mode and pin Nexo machines to the same region as their broker.
- Volume snapshots cost $0.15/GB/month: small, but it adds up if you keep many. Auto-prune via the snapshot cron.
Troubleshooting
- Volume mount fails on machine start: fly volumes list must show the volume in the same region as the machine. On a mismatch, create the volume in the right region or move the machine.
- Out of memory + machine cycles: most likely the browser plugin loaded Chrome on a shared-1x. Check fly logs for OOM killer messages; bump VM size or disable the browser plugin.
- Secrets not picked up after deploy: Fly redacts them in logs but they're in the env. SSH in (fly ssh console) and run printenv | grep ANTHROPIC to verify.
Related
- Docker GHCR — same image Fly pulls
- Hetzner deploy — bare-VM alternative if you outgrow Fly's free tier or want full control
- Phase 27.5 (Docker GHCR) — source of the image this recipe pulls
Deploy on AWS (EC2)
Recipe for a single-AZ AWS deploy on t4g.small (ARM Graviton).
Fits a personal-agent or small team; production multi-AZ scale-out
needs Phase 32 multi-host orchestration.
What you end up with
- Nexo daemon under systemd on EC2 + EBS gp3 for state
- Nginx + Let's Encrypt cert (issued via certbot) for TLS termination (free)
- Route53 hostname pointing at the instance
- IAM role granting only SES send + S3 backup-bucket access (no console / no read of other AWS resources)
- Daily snapshot of the EBS volume + lifecycle policy retaining 30
- CloudWatch agent shipping /var/log/nexo-rs/*.log + metrics
Estimated cost (us-east-1, on-demand):
- t4g.small instance: ~$13.43/mo
- gp3 16 GB EBS: ~$1.28/mo
- Route53 hosted zone: $0.50/mo
- TLS cert (Let's Encrypt via certbot): free
- SES outbound (5k emails/mo on free tier first 12 months): free then $0.10/1k
- Total: ~$15-20/mo
Cheaper alternative for personal-agent budgets: use Hetzner's CX22 at €4/mo if you don't need AWS-specific integrations.
0. Prerequisites
- AWS account with billing alarms set
- Route53 hosted zone for your domain
- AWS CLI installed and aws configure'd locally
- Terraform 1.5+ if you want infra-as-code (recommended)
1. Provision via Terraform (recommended)
The repo will eventually ship deploy/terraform/aws/ (Phase 40
follow-up). Until then, here's a minimal main.tf:
terraform {
required_providers {
aws = { source = "hashicorp/aws", version = "~> 5.0" }
}
}
provider "aws" {
region = "us-east-1"
}
# --- VPC + subnet -----------------------------------------------------
resource "aws_vpc" "nexo" {
cidr_block = "10.0.0.0/16"
enable_dns_support = true
enable_dns_hostnames = true
tags = { Name = "nexo" }
}
resource "aws_subnet" "nexo_public" {
vpc_id = aws_vpc.nexo.id
cidr_block = "10.0.1.0/24"
availability_zone = "us-east-1a"
map_public_ip_on_launch = true
}
resource "aws_internet_gateway" "nexo" {
vpc_id = aws_vpc.nexo.id
}
resource "aws_route_table" "nexo_public" {
vpc_id = aws_vpc.nexo.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.nexo.id
}
}
resource "aws_route_table_association" "nexo_public" {
subnet_id = aws_subnet.nexo_public.id
route_table_id = aws_route_table.nexo_public.id
}
# --- security group ----------------------------------------------------
resource "aws_security_group" "nexo" {
name = "nexo"
vpc_id = aws_vpc.nexo.id
# SSH only from your home IP — replace 1.2.3.4/32 with yours.
ingress {
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["1.2.3.4/32"]
}
# 443 open to the world, terminated at nginx on the instance.
ingress {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
# 80 only to redirect to https.
ingress {
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
# --- IAM role: SES + S3 backups, nothing else --------------------------
resource "aws_iam_role" "nexo" {
name = "nexo-instance"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = { Service = "ec2.amazonaws.com" }
}]
})
}
resource "aws_iam_role_policy" "nexo" {
name = "nexo-instance-policy"
role = aws_iam_role.nexo.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{ Effect = "Allow", Action = ["ses:SendEmail","ses:SendRawEmail"], Resource = "*" },
{ Effect = "Allow", Action = ["s3:PutObject","s3:GetObject","s3:DeleteObject","s3:ListBucket"], Resource = ["arn:aws:s3:::your-nexo-backups","arn:aws:s3:::your-nexo-backups/*"] }
]
})
}
resource "aws_iam_instance_profile" "nexo" {
name = "nexo-instance"
role = aws_iam_role.nexo.name
}
# --- AMI lookup: latest Debian 12 arm64 -------------------------------
data "aws_ami" "debian" {
most_recent = true
owners = ["136693071363"] # Debian official
filter {
name = "name"
values = ["debian-12-arm64-*"]
}
}
# --- instance ----------------------------------------------------------
resource "aws_instance" "nexo" {
ami = data.aws_ami.debian.id
instance_type = "t4g.small"
subnet_id = aws_subnet.nexo_public.id
vpc_security_group_ids = [aws_security_group.nexo.id]
iam_instance_profile = aws_iam_instance_profile.nexo.name
key_name = "your-existing-aws-keypair-name"
root_block_device {
volume_size = 16
volume_type = "gp3"
encrypted = true
}
tags = {
Name = "nexo-1"
}
}
# --- Route53 DNS -------------------------------------------------------
data "aws_route53_zone" "main" {
name = "yourdomain.com."
}
resource "aws_route53_record" "nexo" {
zone_id = data.aws_route53_zone.main.zone_id
name = "nexo.yourdomain.com"
type = "A"
ttl = 300
records = [aws_instance.nexo.public_ip]
}
output "nexo_ip" {
value = aws_instance.nexo.public_ip
}
Then:
terraform init
terraform apply
# review the plan; type 'yes'
2. Hardening + install (post-provision)
SSH in:
ssh admin@nexo.yourdomain.com
sudo apt update && sudo apt full-upgrade -y
sudo apt install -y unattended-upgrades ufw fail2ban nginx certbot python3-certbot-nginx
sudo dpkg-reconfigure -p low unattended-upgrades
# UFW — defense in depth on top of the security group
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow 22/tcp
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw enable
# Disable root SSH + password auth
sudo sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin no/' /etc/ssh/sshd_config
sudo sed -i 's/^#\?PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
sudo systemctl restart ssh
Install Nexo (when 27.4 .deb is available):
curl -LO https://github.com/lordmacu/nexo-rs/releases/latest/download/nexo-rs_arm64.deb
# Verify Cosign signature first (Phase 27.3) — see verify.md
sudo apt install ./nexo-rs_arm64.deb
NATS:
NATS_VERSION=2.10.20
curl -LO "https://github.com/nats-io/nats-server/releases/download/v${NATS_VERSION}/nats-server-v${NATS_VERSION}-linux-arm64.deb"
sudo apt install ./nats-server-v${NATS_VERSION}-linux-arm64.deb
sudo systemctl enable --now nats-server
3. nginx + Let's Encrypt via certbot
sudo tee /etc/nginx/sites-available/nexo >/dev/null <<'EOF'
server {
listen 80;
server_name nexo.yourdomain.com;
return 301 https://$server_name$request_uri;
}
server {
listen 443 ssl http2;
server_name nexo.yourdomain.com;
# Cert paths populated after `certbot --nginx`
ssl_certificate /etc/letsencrypt/live/nexo.yourdomain.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/nexo.yourdomain.com/privkey.pem;
ssl_protocols TLSv1.2 TLSv1.3;
# Health check — proxied through to the daemon
location /health { proxy_pass http://127.0.0.1:8080; access_log off; }
location /ready { proxy_pass http://127.0.0.1:8080; access_log off; }
# Admin surface (auth via the daemon's session token)
location /api/ { proxy_pass http://127.0.0.1:8080; }
location /admin/ { proxy_pass http://127.0.0.1:8080; }
# Block /metrics from public — scrape internally only
location /metrics { return 403; }
}
EOF
sudo ln -s /etc/nginx/sites-available/nexo /etc/nginx/sites-enabled/nexo
sudo nginx -t
# Issue cert (ACME via Let's Encrypt; this is not an ACM certificate)
sudo certbot --nginx -d nexo.yourdomain.com --non-interactive --agree-tos -m ops@yourdomain.com
sudo systemctl reload nginx
If you want AWS ACM specifically (instead of Let's Encrypt), front the EC2 with an ALB and attach an ACM cert there — adds ~$18/mo for the ALB. Most personal deploys don't need it.
4. Wire SES for outbound email
The IAM role grants ses:SendEmail. Configure in config/llm.yaml:
plugins:
email:
provider: ses
aws_region: us-east-1
# Credentials come from the EC2 instance profile — no keys
# in the YAML.
sender: "agent@nexo.yourdomain.com"
Verify the sender domain in SES first:
aws ses verify-domain-identity --domain yourdomain.com
# Add the printed TXT record to Route53
aws ses set-identity-mail-from-domain --identity yourdomain.com \
--mail-from-domain mail.yourdomain.com
If your SES account is still in sandbox, request production access via the SES console — required to send to non-verified recipients.
5. EBS snapshots + lifecycle
# Daily snapshot via DLM (Data Lifecycle Manager) — set up once
# in Terraform or via the console:
aws dlm create-lifecycle-policy \
--description "nexo daily snapshots, retain 30" \
--state ENABLED \
--execution-role-arn arn:aws:iam::ACCT:role/AWSDataLifecycleManagerDefaultRole \
--policy-details '{...}' # see DLM docs
Or the cheap way: cron + aws ec2 create-snapshot on the
instance itself, retaining 30 days locally.
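For the cron route, the retention half is mostly a date comparison. The sketch below is an illustration under stated assumptions: expired_snapshots is a hypothetical helper, and it expects "snapshot-id, start-time" pairs with UTC ISO-8601 timestamps, which sort correctly as plain strings. Note that the instance role defined earlier in this recipe grants only SES and S3 access, so the EC2 snapshot calls would need additional permissions.

```shell
# Hypothetical helper: reads "<snapshot-id> <ISO-8601 UTC start time>" lines
# on stdin and prints the ids whose start time is earlier than CUTOFF.
# ISO-8601 timestamps in the same timezone compare correctly as strings.
expired_snapshots() {
  cutoff="$1"
  awk -v cutoff="$cutoff" '$2 < cutoff { print $1 }'
}

# Possible wiring (not executed here; requires the AWS CLI and GNU date):
#   cutoff=$(date -u -d '30 days ago' +%Y-%m-%dT%H:%M:%S)
#   aws ec2 describe-snapshots --owner-ids self \
#     --query 'Snapshots[].[SnapshotId,StartTime]' --output text \
#     | expired_snapshots "$cutoff"
```

Each printed id can then be passed to aws ec2 delete-snapshot --snapshot-id after you have reviewed the list.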
6. CloudWatch logs + metrics
sudo apt install -y amazon-cloudwatch-agent
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard
# Point at /var/log/nexo-rs/*.log + 9090/metrics scrape
The Prometheus metrics endpoint can be pulled by CloudWatch Container Insights via the EMF agent if you go in that direction. For most personal deploys, journalctl + a Grafana Cloud free-tier scrape is cheaper.
Limits + escape hatches
- t4g.small RAM (2 GB) is tight if the browser plugin is on. Bump to t4g.medium (4 GB, ~$26/mo) before turning on Chrome.
- Single AZ. AZ outage = full downtime. Multi-AZ needs Phase 32 + an external NATS cluster. Acceptable for personal agents; not for SLAs.
- SES sandbox limit (200 emails/day) until you request production. Plan for this if email channel is primary.
- EIP not allocated. Stop/start the instance and the public IP changes. Allocate an Elastic IP if the Route53 record can't auto-update; AWS bills public IPv4 addresses hourly whether or not they are attached, so budget for it.
Troubleshooting
- Nexo can't send email: aws sts get-caller-identity from the instance must show the nexo-instance role. If empty, the instance profile is missing.
- certbot --nginx fails: DNS hasn't propagated yet. Wait 5-10 min after the Route53 record creation.
- /health returns 503: broker not ready. Run systemctl status nats-server; if good, check journalctl -u nexo-rs for credential errors (instance profile didn't propagate, or config/llm.yaml references a key the instance can't reach).
Related
- Hetzner Cloud — bare-VM, cheaper
- Fly.io — easier scaling, less AWS lock-in
- Phase 27.4 (Debian package) — source of the .deb this recipe consumes
- Phase 27.3 (Cosign) — signature verification before install
NATS with TLS + auth
Harden the broker for a multi-node deployment: mTLS on the client connection, NKey-based authentication, and a separate NATS server process (not the throwaway Docker-compose one).
Prerequisites
- A NATS server ≥ 2.10
- nsc CLI for generating NKeys
- The agent binary deployed where it will run
1. Generate NKeys
nsc add operator --generate-signing-key nexo-ops
nsc add account --name nexo-prod
nsc add user --name agent-kate --account nexo-prod
nsc generate creds --account nexo-prod --name agent-kate > secrets/agent-kate.nkey
secrets/agent-kate.nkey is a single-file credential that contains
both the NKey seed and the signed JWT. Treat it like any other
secret — gitignored, Docker-secret, k8s-secret.
2. Configure the NATS server
nats-server.conf:
listen: 0.0.0.0:4222
http: 0.0.0.0:8222
tls {
cert_file: "/etc/nats/tls/server.crt"
key_file: "/etc/nats/tls/server.key"
ca_file: "/etc/nats/tls/ca.crt"
verify: true # require client certs too (mTLS)
}
authorization {
operator = "/etc/nats/nsc/operator.jwt"
resolver = MEMORY
accounts = [
{ name: nexo-prod, jwt: "/etc/nats/nsc/nexo-prod.jwt" }
]
}
Start the server:
nats-server -c nats-server.conf
3. Configure the agent
config/broker.yaml:
broker:
type: nats
url: tls://nats.example.com:4222
auth:
enabled: true
nkey_file: ./secrets/agent-kate.nkey
persistence:
enabled: true
path: ./data/queue
fallback:
mode: local_queue
drain_on_reconnect: true
The agent reads nkey_file at startup and presents it on every
connection.
4. Verify the client
Before starting the full agent, smoke-test the credentials with the
nats CLI:
nats --creds ./secrets/agent-kate.nkey \
--tlsca /etc/nats/tls/ca.crt \
-s tls://nats.example.com:4222 \
pub test.topic "hello"
If this works, the agent will too.
5. Deploy
Start the agent as usual:
agent --config ./config
On boot the agent:
- Opens a TLS connection to the broker
- Presents its NKey + JWT
- Server validates against the operator/account JWT
- Subscribes only to subjects its account is allowed to access
6. Multi-agent isolation
Give each agent its own NKey and an export/import declaration in the NSC account so agents can talk to each other on specific subjects only. Example policy:
# allow kate to publish agent.route.ops
# deny kate from publishing plugin.outbound.* (only the WA plugin should)
The agent does not enforce NATS auth itself — it just presents credentials. The broker enforces. That's the point: you can revoke a compromised agent without touching the agent's code or config.
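When drafting allow/deny rules like the ones above, it helps to reason about which concrete subjects a pattern covers. NATS subjects are dot-separated tokens, where * matches exactly one token and > matches one or more trailing tokens. The function below is a small sketch of that matching rule for checking a policy on paper; subject_matches is a hypothetical helper, not part of NATS or nexo-rs, and the broker remains the only enforcement point.

```shell
# Hypothetical helper: succeed if SUBJECT is covered by PATTERN under NATS
# wildcard rules ("*" = exactly one token, ">" = one or more trailing tokens).
subject_matches() {
  pat="$1"; sub="$2"
  while [ -n "$pat" ] && [ -n "$sub" ]; do
    p="${pat%%.*}"; s="${sub%%.*}"
    [ "$p" = ">" ] && return 0
    [ "$p" = "*" ] || [ "$p" = "$s" ] || return 1
    # Drop the token just compared; empty string when it was the last one.
    case "$pat" in *.*) pat="${pat#*.}" ;; *) pat="" ;; esac
    case "$sub" in *.*) sub="${sub#*.}" ;; *) sub="" ;; esac
  done
  # Both sides must be exhausted at the same time for a match.
  [ -z "$pat" ] && [ -z "$sub" ]
}
```

For instance, subject_matches 'plugin.outbound.*' plugin.outbound.whatsapp succeeds, so a deny rule on plugin.outbound.* would block kate from publishing there, while agent.route.ops is untouched by it.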
Observability
- circuit_breaker_state{breaker="nats"} flips to 1 if the broker rejects the credentials on startup or after a refresh
- disk queue buffers every publish while the circuit is open (see Event bus, disk queue)
- nats --trace on the server side logs every auth failure with the rejected subject
Gotchas
- verify: true (mTLS) requires client certs and NKey auth. Picking one or the other is a policy choice; don't half-configure.
- JWT expiry. Account JWTs expire; NSC's push command renews them against the resolver.
- Disk queue on client side. Even with auth misconfigured, the agent keeps running on the local fallback; operators may miss the outage without alerting on circuit_breaker_state.
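A simple way to alert on that gauge is to poll the metrics endpoint from cron or a monitoring agent. The sketch below is hypothetical (breaker_open is not a nexo-rs command) and assumes the metric line format shown under Observability; it treats any non-zero value as an open circuit, which is an assumption about the gauge's encoding.

```shell
# Hypothetical helper: reads Prometheus text-format metrics on stdin and
# succeeds if the named circuit breaker reports a non-zero state.
breaker_open() {
  name="$1"
  awk -v key="circuit_breaker_state{breaker=\"$name\"}" '
    $1 == key { seen = 1; if ($2 + 0 != 0) open = 1 }
    END { exit (seen && open) ? 0 : 1 }
  '
}
```

Usage could look like curl -s localhost:9090/metrics | breaker_open nats && echo "NATS circuit open" >&2. A missing metric is reported as "not open" here, so a production check should probably also alert when the series is absent.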
Rotating config without downtime
Three practical hot-reload scenarios. Each shows the YAML edit, how to trigger the swap, and what the operator should see in the logs and on the metrics endpoint. Reference: Config hot-reload.
Prerequisites
- A running daemon (agent in another terminal or under systemd).
- Broker reachable from the same host (broker.yaml).
- Phase 16 + Phase 18 features enabled (default since 0.x of nexo-rs).
A quick sanity check:
$ agent reload
reload v1: applied=1 rejected=0 elapsed=14ms
✓ ana
If you get exit 1 with "no control.reload.ack received within 5s",
the daemon isn't running or runtime.reload.enabled is false —
fix that first.
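If reloads are triggered from a deploy script, it is worth failing the script when any agent rejects the new config. The sketch below parses the summary line format shown above; reload_ok is a hypothetical helper, and it assumes the "reload vN: applied=... rejected=..." line stays stable across versions.

```shell
# Hypothetical helper: reads `agent reload` output on stdin and succeeds
# only if a summary line is present and it reports rejected=0.
reload_ok() {
  awk '
    /^reload v[0-9]+:/ {
      seen = 1
      for (i = 1; i <= NF; i++)
        if ($i ~ /^rejected=/) {
          split($i, kv, "=")
          if (kv[2] + 0 > 0) bad = 1
        }
    }
    END { exit (seen && !bad) ? 0 : 1 }
  '
}
```

A script would call it as agent reload | reload_ok || exit 1. Output without a summary line, such as the ack timeout error, also counts as a failure.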
1. Rotate an LLM API key
The Anthropic key on production rotates every 90 days. Old key still valid for an hour after the rotation.
Edit
config/llm.yaml:
providers:
anthropic:
- api_key: ${file:./secrets/anthropic_old.txt}
+ api_key: ${file:./secrets/anthropic_new.txt}
base_url: https://api.anthropic.com
Apply
# Drop the new key first, THEN trigger the reload — the file watcher
# would also do it 500 ms after the save, the CLI is just explicit.
$ printf '%s' "sk-ant-..." > secrets/anthropic_new.txt
$ chmod 600 secrets/anthropic_new.txt
$ agent reload
reload v2: applied=2 rejected=0 elapsed=22ms
✓ ana
✓ bob
Verify
# The aggregate counter bumped:
$ curl -s localhost:9090/metrics | grep config_reload_applied_total
config_reload_applied_total 2
# Per-agent versions advanced:
$ curl -s localhost:9090/metrics | grep runtime_config_version
runtime_config_version{agent_id="ana"} 2
runtime_config_version{agent_id="bob"} 2
# Watch one agent's next turn — the new key is used by the LlmClient
# rebuilt inside RuntimeSnapshot::build:
$ tail -f agent.log | grep "llm request"
In-flight LLM calls keep using the old client (the in-flight Arc<dyn LlmClient> is captured per-turn). They land in <30 s; the old key is
still valid for the hour the auth team gave you.
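Before revoking the old key, you may want to confirm that no agent is still on an earlier snapshot. The sketch below filters the runtime_config_version series shown in the Verify step; agents_behind is a hypothetical helper and relies on the exact label layout printed there.

```shell
# Hypothetical helper: reads Prometheus text-format metrics on stdin and
# prints the agent_id of every agent whose runtime_config_version is
# lower than WANT.
agents_behind() {
  want="$1"
  sed -n 's/^runtime_config_version{agent_id="\([^"]*\)"} \([0-9][0-9]*\)$/\1 \2/p' |
    awk -v want="$want" '$2 + 0 < want + 0 { print $1 }'
}
```

For example, curl -s localhost:9090/metrics | agents_behind 2 prints nothing once every agent has reached version 2.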
2. A/B test a system prompt
You want to roll out a friendlier sales pitch on Ana's WhatsApp binding without touching the Telegram one (which has a longer support persona).
Edit
config/agents.d/ana.yaml:
inbound_bindings:
- plugin: whatsapp
allowed_tools: [whatsapp_send_message]
outbound_allowlist:
whatsapp: ["573115728852"]
- system_prompt_extra: |
- Channel: WhatsApp sales. Follow the ETB/Claro lead-capture flow.
+ system_prompt_extra: |
+ Channel: WhatsApp sales (variant B — warmer tone).
+ Follow the ETB/Claro lead-capture flow but lead with a personal
+ greeting and use first names.
- plugin: telegram
instance: ana_tg
allowed_tools: ["*"]
...
Apply
The file watcher picks the save up automatically:
$ tail -f agent.log
INFO config reload applied version=3 applied=["ana"] rejected_count=0 elapsed_ms=18
Or trigger manually:
$ agent reload
reload v3: applied=1 rejected=0 elapsed=18ms
✓ ana
Verify
Send one message on each channel and tail the LLM request log to see which prompt block went to the model.
$ grep "snapshot_version=3" agent.log
INFO inbound matched binding agent_id=ana plugin=whatsapp \
binding_index=0 snapshot_version=3
Telegram binding's system_prompt_extra is unchanged; only the WA binding picks up variant B.
Roll back
If variant B underperforms, git revert the YAML and agent reload.
Sessions in flight finish their turn on B; the next inbound is back
on A.
3. Tighten an outbound allowlist after an incident
A jailbroken prompt almost made Ana send WhatsApp messages to arbitrary numbers (Phase 16's defense-in-depth caught it). Until you investigate, narrow the allowlist to the on-call advisor only.
Edit
config/agents.d/ana.yaml:
inbound_bindings:
- plugin: whatsapp
allowed_tools: [whatsapp_send_message]
outbound_allowlist:
whatsapp:
- - "573115728852"
- - "573215555555"
- - "573009999999"
+ - "573115728852" # incident-only: on-call advisor
Apply
$ agent reload
reload v4: applied=1 rejected=0 elapsed=15ms
✓ ana
Verify
Try the previously-allowed-but-now-blocked number from a test message. The LLM will try; the tool will reject:
ERROR tool_call rejected reason="recipient 573215555555 is not in \
this agent's whatsapp outbound allowlist"
The session's Arc<RuntimeSnapshot> is captured at the start of each
turn, so even mid-conversation the next user reply re-loads from the
new snapshot and the allowlist update takes effect immediately.
What you cannot reload (yet)
- Adding or removing agents: restart the daemon. Phase 19.
- Plugin instances (whatsapp.yaml, telegram.yaml instance blocks): restart the daemon. Plugin sessions own QR pairing / long-polling state that needs lifecycle plumbing. Phase 19.
- broker.yaml, memory.yaml: restart the daemon. Long-lived connections + storage handles aren't safe to swap mid-flight.
- workspace, skills_dir, transcripts_dir on an agent: restart that agent.
The daemon logs every restart-required field that changed during a
reload as warn so you don't have to remember which knob lives where.
See also
- Config hot-reload: full behaviour reference
- agents.yaml: per-binding override surface
- Per-agent credentials: credential rotation has its own POST /admin/credentials/reload endpoint
- Metrics: config_reload_* series
Architecture overview
nexo-rs is a single-process multi-agent runtime. One binary (agent)
hosts every agent, every channel plugin, every extension, and the
persistence layer. Coordination between components happens over NATS
(with a local tokio-mpsc fallback when NATS is offline).
Why single-process: shared in-memory caches, zero IPC overhead between agent and tool invocations, simpler ops. The broker and disk queue give us the durability a multi-process layout would provide, without the coordination cost.
High-level layout
flowchart TB
subgraph PROC[agent process]
direction TB
subgraph PLUGINS[Channel plugins]
WA[WhatsApp]
TG[Telegram]
MAIL[Email / Gmail poller]
BR[Browser CDP]
GOOG[Google APIs]
end
subgraph BUS[Event bus]
NATS[(NATS)]
LOCAL[(Local mpsc fallback)]
DQ[(Disk queue + DLQ)]
end
subgraph AGENTS[Agent runtimes]
A1[Agent: ana]
A2[Agent: kate]
A3[Agent: ops]
end
subgraph STORE[Persistence]
STM[(Short-term sessions<br/>in-memory)]
LTM[(Long-term memory<br/>SQLite + sqlite-vec)]
WS[(Workspace-git<br/>per agent)]
end
subgraph TOOLS[Tools & integrations]
EXT[Extensions<br/>stdio / NATS]
MCP[MCP client / server]
LLM[LLM providers]
end
PLUGINS --> BUS
BUS --> AGENTS
AGENTS --> BUS
AGENTS --> STORE
AGENTS --> TOOLS
TOOLS --> LLM
end
USERS[End users] <--> PLUGINS
Workspace crates
The Cargo.toml workspace defines these member crates:
| Crate | Responsibility |
|---|---|
| crates/core | Agent runtime, trait, SessionManager, HookRegistry, heartbeat, tool registry |
| crates/broker | NATS client, local fallback, disk queue, DLQ, backpressure |
| crates/llm | LLM clients (MiniMax, Anthropic, OpenAI-compat, Gemini), retry, rate limiter |
| crates/memory | Short-term sessions, long-term SQLite, vector search via sqlite-vec |
| crates/config | YAML parsing, env-var resolution, secrets loading |
| crates/extensions | Manifest parser, discovery, stdio + NATS runtimes, watcher, CLI |
| crates/mcp | MCP client (stdio + HTTP), server mode, tool catalog, hot-reload |
| crates/taskflow | Durable flow state machine with wait/resume |
| crates/resilience | CircuitBreaker three-state machine |
| crates/setup | Interactive wizard, YAML patcher, pairing flows |
| crates/tunnel | Public HTTPS tunnel for pairing / webhooks |
| crates/plugins/browser | Chrome DevTools Protocol client |
| crates/plugins/whatsapp | Wrapper over whatsapp-rs (Signal Protocol) |
| crates/plugins/telegram | Bot API client |
| crates/plugins/email | IMAP / SMTP |
| crates/plugins/gmail-poller | Cron-style Gmail → broker bridge |
| crates/plugins/google | Gmail / Calendar / Drive / Sheets tools |
Binaries
Defined in Cargo.toml:
| Binary | Entry | Purpose |
|---|---|---|
| agent | src/main.rs | Main daemon; also exposes setup, dlq, ext, flow, status subcommands |
| browser-test | src/browser_test.rs | CDP integration smoke test |
| integration-browser-check | src/integration_browser_check.rs | End-to-end browser flow validation |
| llm_smoke | src/bin/llm_smoke.rs | LLM provider smoke test |
Runtime topology
agent runs a single tokio multi-thread runtime. Work is split into
independent tasks:
flowchart LR
MAIN[main tokio runtime]
MAIN --> PA[Per-agent runtime task]
MAIN --> PI[Plugin intake loops]
MAIN --> HB[Heartbeat scheduler]
MAIN --> MCP[MCP runtime manager]
MAIN --> EXT[Extension stdio runtimes]
MAIN --> MET[Metrics server :9090]
MAIN --> HEALTH[Health server :8080]
MAIN --> ADMIN[Admin console :9091]
MAIN --> LOCK[Single-instance lock watcher]
Each agent runtime owns its own subscription to inbound topics, its own
session manager view, its own LLM-loop state. Agents do not share
mutable in-memory state — coordination between agents happens over the
event bus (agent.route.<target_id>).
What lives where — quick mental model
- A message arrives → lands on plugin.inbound.<channel> (NATS)
- Agent runtime consumes it → SessionManager attaches or creates a session, HookRegistry fires before_message
- LLM loop runs → tools invoked through the registry, which calls into extensions / MCP / built-ins, each wrapped by CircuitBreaker
- Tool result flows back → after_tool_call hooks fire, LLM decides next turn
- Agent emits reply → publishes to plugin.outbound.<channel>
- Channel plugin delivers → physical message goes to the user
Details per subsystem:
Agent runtime
The agent runtime is the per-agent machinery that consumes inbound
events, drives the LLM loop, invokes tools, and emits outbound events.
One AgentRuntime is instantiated per configured agent at boot; each
runs as its own async task.
Source: crates/core/src/agent/ (behavior.rs, agent.rs,
runtime.rs, hook_registry.rs), boot in src/main.rs.
AgentBehavior trait
Every agent implements AgentBehavior (crates/core/src/agent/behavior.rs).
The trait is intentionally small — default no-ops let built-in types
(like LlmAgentBehavior) override only what they need.
| Method | Fires on | Default |
|---|---|---|
| on_message(ctx, msg) | Inbound message from a plugin | no-op |
| on_event(ctx, event) | Any event on a subscribed topic | no-op |
| on_heartbeat(ctx) | Periodic tick (if heartbeat enabled) | no-op |
| decide(ctx, msg) | LLM-reasoning hook (stub for custom flows) | empty string |
The shipped LlmAgentBehavior implements the full chat-completion
loop with tool calls, streaming, rate-limited retry, and hook fan-out.
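The "small trait with no-op defaults" idea can be sketched in a few lines. This is a simplified, synchronous illustration only: the real trait in crates/core is presumably async, and `Ctx`, `Message`, `Event`, and `EchoCounter` are placeholder names invented for this example.

```rust
// Placeholder types standing in for the runtime's real context and payloads.
pub struct Ctx {
    pub agent_id: String,
}
pub struct Message {
    pub text: String,
}
pub struct Event {
    pub topic: String,
}

/// Synchronous sketch of a behavior trait whose methods all default to no-ops.
pub trait AgentBehavior {
    fn on_message(&mut self, _ctx: &Ctx, _msg: &Message) {}
    fn on_event(&mut self, _ctx: &Ctx, _event: &Event) {}
    fn on_heartbeat(&mut self, _ctx: &Ctx) {}
    /// Default mirrors the table above: an empty string.
    fn decide(&mut self, _ctx: &Ctx, _msg: &Message) -> String {
        String::new()
    }
}

/// Hypothetical behavior that only cares about inbound messages.
#[derive(Default)]
pub struct EchoCounter {
    pub seen: usize,
}

impl AgentBehavior for EchoCounter {
    // Only one method is overridden; the other three keep their defaults.
    fn on_message(&mut self, _ctx: &Ctx, _msg: &Message) {
        self.seen += 1;
    }
}
```

A type like `EchoCounter` overrides a single method and inherits the rest, which is the property that lets built-in behaviors stay short.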
Boot sequence
sequenceDiagram
participant Main as src/main.rs
participant Cfg as AppConfig
participant Disc as Extension discovery
participant SM as SessionManager
participant TR as ToolRegistry
participant LLM as LLM client
participant AR as AgentRuntime
participant Bus as Broker
Main->>Cfg: load(config_dir)
Main->>Disc: run_extension_discovery()
Main->>SM: with_cap(ttl, max_sessions)
Main->>TR: register built-ins + extensions + MCP
Main->>LLM: build per provider (w/ CircuitBreaker)
loop per agent in config
Main->>AR: new(agent_id, behavior, tools, sm, llm, broker)
AR->>Bus: subscribe plugin.inbound.<channel>+
AR->>Bus: subscribe agent.route.<agent_id>
AR-->>Main: ready
end
Main->>Main: install signal handlers
Main->>Main: serve forever
Request/response lifecycle
A single inbound message drives the following flow inside one agent runtime:
sequenceDiagram
participant Bus as NATS
participant AR as AgentRuntime
participant SM as SessionManager
participant HR as HookRegistry
participant LLM as LLM
participant TR as ToolRegistry
participant Ext as Extension / MCP / built-in
Bus->>AR: plugin.inbound.<ch>
AR->>SM: get_or_create(session_key)
AR->>HR: fire("before_message")
loop LLM turn loop
AR->>LLM: completion(messages, tools)
LLM-->>AR: assistant turn (text or tool_calls)
alt tool_calls present
AR->>HR: fire("before_tool_call", name, args)
AR->>TR: invoke(tool_name, args)
TR->>Ext: call
Ext-->>TR: result
TR-->>AR: result
AR->>HR: fire("after_tool_call", name, result)
else text only
AR->>Bus: publish plugin.outbound.<ch>
end
end
AR->>HR: fire("after_message")
SessionManager
Defined in crates/core/src/session/manager.rs. Tracks per-user
conversational state in memory.
- Key: SessionKey derived from (agent_id, channel, sender_id); group chats get one session per group
- Storage: DashMap<SessionKey, Session>, a sharded concurrent map
- TTL: configured via memory.short_term.session_ttl (default 30 min); each access updates last_access
- Cap: soft limit DEFAULT_MAX_SESSIONS = 10,000; on overflow the oldest-idle session is evicted before insert
- Sweeper: background task scans every 1 s, removes expired entries
- Callbacks: on_expire() fires via tokio::spawn when a session is dropped; used by the MCP runtime to tear down per-session children
stateDiagram-v2
[*] --> Active: first message
Active --> Active: on_message / on_event<br/>(last_access updated)
Active --> Expired: idle > TTL
Active --> Evicted: cap exceeded,<br/>oldest-idle chosen
Expired --> [*]: sweeper removes
Evicted --> [*]: on_expire() fires
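The TTL and cap rules can be illustrated with a single-threaded sketch. It assumes a plain `HashMap` in place of DashMap and takes `now` as a parameter instead of running a background sweeper; the `Sessions` type and its method names are illustrative, not the crate's API.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct SessionKey {
    pub agent_id: String,
    pub channel: String,
    pub sender_id: String,
}

pub struct Session {
    pub last_access: Instant,
}

pub struct Sessions {
    ttl: Duration,
    max_sessions: usize,
    map: HashMap<SessionKey, Session>,
}

impl Sessions {
    pub fn with_cap(ttl: Duration, max_sessions: usize) -> Self {
        Self { ttl, max_sessions, map: HashMap::new() }
    }

    /// Touch an existing session or create one. When the cap is reached, the
    /// oldest-idle session is evicted before the insert; its key is returned.
    pub fn get_or_create(&mut self, key: SessionKey, now: Instant) -> Option<SessionKey> {
        if let Some(session) = self.map.get_mut(&key) {
            session.last_access = now;
            return None;
        }
        let mut evicted = None;
        if self.map.len() >= self.max_sessions {
            let oldest = self
                .map
                .iter()
                .min_by_key(|(_, s)| s.last_access)
                .map(|(k, _)| k.clone());
            if let Some(k) = oldest {
                self.map.remove(&k);
                evicted = Some(k);
            }
        }
        self.map.insert(key, Session { last_access: now });
        evicted
    }

    /// One sweeper pass: drop sessions idle for longer than the TTL.
    /// Returns how many entries were removed.
    pub fn sweep(&mut self, now: Instant) -> usize {
        let ttl = self.ttl;
        let before = self.map.len();
        self.map.retain(|_, s| now.duration_since(s.last_access) <= ttl);
        before - self.map.len()
    }

    pub fn len(&self) -> usize {
        self.map.len()
    }
}
```

Passing `now` explicitly keeps the sketch deterministic; a real sweeper would call something like `sweep(Instant::now())` on a timer and fire the expiry callback for each removed key.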
HookRegistry
Defined in crates/core/src/agent/hook_registry.rs. Lets extensions
inject behavior at well-known points in the lifecycle without patching
the runtime.
- Hook names: arbitrary strings. In practice the runtime fires: before_message, after_message, before_tool_call, after_tool_call, on_session_start, on_session_end
- Fan-out: sequential by priority (lower first), insertion order breaks ties
- Cap: 128 handlers per hook name, a defensive guard against a buggy extension re-registering on every reload
- Errors: logged, treated as Continue, so one misbehaving hook does not cascade into the rest
- Override: a hook may return Override(new_args) to mutate what the next hook (or the runtime itself) sees
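The fan-out rules above (priority order, per-name cap, errors treated as Continue, Override rewriting the arguments) fit in a short sketch. It assumes string arguments and synchronous handlers for brevity; `HookOutcome`, `MAX_HANDLERS_PER_HOOK`, and the method signatures are illustrative and may differ from hook_registry.rs.

```rust
use std::collections::HashMap;

/// Per-hook-name cap described above (constant name is an assumption).
pub const MAX_HANDLERS_PER_HOOK: usize = 128;

pub enum HookOutcome {
    Continue,
    Override(String),
}

type Handler = Box<dyn Fn(&str) -> Result<HookOutcome, String>>;

#[derive(Default)]
pub struct HookRegistry {
    hooks: HashMap<String, Vec<(i32, Handler)>>,
}

impl HookRegistry {
    pub fn register(&mut self, name: &str, priority: i32, handler: Handler) -> Result<(), String> {
        let list = self.hooks.entry(name.to_string()).or_default();
        if list.len() >= MAX_HANDLERS_PER_HOOK {
            return Err(format!(
                "hook '{}' already has {} handlers",
                name, MAX_HANDLERS_PER_HOOK
            ));
        }
        list.push((priority, handler));
        // Stable sort: lower priority runs first, ties keep insertion order.
        list.sort_by_key(|(p, _)| *p);
        Ok(())
    }

    /// Runs every handler in order and returns the (possibly overridden) args.
    pub fn fire(&self, name: &str, args: &str) -> String {
        let mut current = args.to_string();
        if let Some(list) = self.hooks.get(name) {
            for (_, handler) in list {
                match handler(current.as_str()) {
                    Ok(HookOutcome::Continue) => {}
                    Ok(HookOutcome::Override(new_args)) => current = new_args,
                    // A failing hook is logged and treated as Continue.
                    Err(err) => eprintln!("hook {} failed: {}", name, err),
                }
            }
        }
        current
    }
}
```

Because an error only logs and moves on, a handler registered at priority 5 that fails does not stop the priority-10 handler from seeing the arguments produced at priority 1.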
Heartbeat
# per-agent config
heartbeat:
  enabled: true
  interval: 30s
- Scheduled per agent if heartbeat.enabled: true
- Interval parsed via humantime; any humantime duration works
- Each tick:
  - Fires AgentBehavior::on_heartbeat(ctx)
  - Publishes agent.events.<agent_id>.heartbeat
- Typical uses: proactive messages ("good morning"), reminders, external state syncs (pull Gmail, scan calendar), liveness pings
Graceful shutdown
src/main.rs installs SIGTERM / Ctrl+C handlers. On signal, the process
tears down in a specific order so in-flight work finishes cleanly:
flowchart TD
SIG[SIGTERM / Ctrl+C] --> C1[Cancel dream-sweep loops<br/>5 s grace]
C1 --> C2[Mark /ready = false<br/>stop new traffic]
C2 --> C3[Stop plugin intake<br/>no new inbound]
C3 --> C4[Shutdown MCP runtime manager<br/>5 s clean close]
C4 --> C5[Shutdown extensions<br/>5 s grace then kill_on_drop]
C5 --> C6[Stop agent runtimes<br/>drain buffered messages]
C6 --> C7[Abort metrics + health tasks]
C7 --> EXIT([exit 0])
This order is enforced in src/main.rs around lines 1389–1458.
Extensions get a full 5 s grace period before kill_on_drop because stdio
children can be mid-tool-call; the disk queue absorbs any events that the
plugins couldn't finish publishing.
Why this shape
- One tokio runtime, many tasks: lets you run 10 agents on one CPU core when idle, saturates cores under load. No thread-per-agent bloat.
- No shared mutable state across agents: each agent holds its own registry views, its own session map. Cross-agent communication goes over the bus → visible, replayable, testable.
- Hooks instead of inheritance: extensions customize behavior without recompiling the core. Every insertion point is named, sequenced, and capped.
Event bus (NATS)
Every piece of communication between plugins, agents, and the broker
layer itself flows over NATS (async-nats = 0.35). When NATS is
offline, a local tokio::mpsc bus takes over and a SQLite-backed disk
queue holds events until reconnection. Events are not lost unless the
disk queue reaches its hard cap (see Backpressure).
Source: crates/broker/ (nats.rs, local.rs, disk_queue.rs,
topic.rs).
Why NATS
- Subject-based routing fits the "N plugins × M agents" fan-out naturally (plugin.inbound.* wildcards)
- Low-latency pub/sub with no broker-side state to manage
- Cluster-ready without rewriting the data plane
- async-nats is mature and has JetStream if we ever need it
The design doc discusses the alternatives (RabbitMQ, Redis streams)
that were rejected; see proyecto/design-agent-framework.md.
Subject namespace
| Pattern | Direction | Example | Who publishes | Who subscribes |
|---|---|---|---|---|
| plugin.inbound.<plugin> | plugin → agent | plugin.inbound.whatsapp | Channel plugins | Agent runtimes |
| plugin.inbound.<plugin>.<instance> | plugin → agent | plugin.inbound.telegram.sales_bot | Multi-instance plugins (WA, TG) | Agent runtimes |
| plugin.outbound.<plugin> | agent → plugin | plugin.outbound.whatsapp | Agent tools (send, reply…) | Channel plugins |
| plugin.outbound.<plugin>.<instance> | agent → plugin | plugin.outbound.whatsapp.ana | Agent tools | Specific plugin instance |
| plugin.health.<plugin> | plugin → runtime | plugin.health.browser | Plugins | Health server |
| agent.events.<agent_id> | internal | agent.events.ana | Runtime internals | Dashboards, tests |
| agent.events.<agent_id>.heartbeat | scheduler → agent | agent.events.kate.heartbeat | Heartbeat scheduler | That agent |
| agent.route.<target_id> | agent → agent | agent.route.ops | Sending agent's delegate tool | Target agent runtime |
| taskflow.resume | external → flow | taskflow.resume | Anything (other agents, services, ops) | TaskFlow resume bridge |
Multi-instance plugins append an .<instance> suffix so two WhatsApp
accounts (e.g. Ana's line and Kate's line) can run side by side without
subject collisions.
Agent-to-agent routing
sequenceDiagram
participant Ana
participant Bus as NATS
participant Ops
Ana->>Ana: LLM decides to delegate
Ana->>Bus: publish agent.route.ops<br/>(correlation_id=X)
Bus->>Ops: deliver
Ops->>Ops: on_message handler runs
Ops->>Bus: publish agent.route.ana<br/>(correlation_id=X)
Bus->>Ana: deliver
Ana->>Ana: correlate reply by ID
The sender always includes a correlation_id in the event envelope;
the receiver echoes it on the reply. That's how one agent can fan out
to several agents and reassemble results.
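One way to reassemble a fan-out by correlation ID is a pending map keyed by that ID. The sketch below is an assumption about how a sender might track replies; `Correlator`, `start`, and `on_reply` are hypothetical names, and reply bodies are plain strings for brevity.

```rust
use std::collections::HashMap;

struct Pending {
    waiting_on: usize,
    replies: Vec<String>,
}

/// Tracks outstanding delegations by correlation ID (illustrative only).
#[derive(Default)]
pub struct Correlator {
    pending: HashMap<String, Pending>,
}

impl Correlator {
    /// Record that `targets` replies are expected for `correlation_id`.
    pub fn start(&mut self, correlation_id: &str, targets: usize) {
        self.pending.insert(
            correlation_id.to_string(),
            Pending { waiting_on: targets, replies: Vec::new() },
        );
    }

    /// Feed one reply. Returns every collected reply once the last expected
    /// one has arrived; replies with an unknown ID are ignored.
    pub fn on_reply(&mut self, correlation_id: &str, body: &str) -> Option<Vec<String>> {
        let entry = self.pending.get_mut(correlation_id)?;
        entry.replies.push(body.to_string());
        if entry.replies.len() < entry.waiting_on {
            return None;
        }
        self.pending.remove(correlation_id).map(|p| p.replies)
    }
}
```

Removing the entry on completion means a late or duplicate reply for the same ID is dropped rather than re-triggering the caller.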
Broker abstraction
crates/broker exposes a Broker trait implemented by two backends:
- NatsBroker: real NATS connection wrapped in a CircuitBreaker
- LocalBroker: in-process tokio::mpsc for tests and offline mode
Switching between them is driven by config. The local broker matches
NATS subject semantics (including . segments and > wildcards),
which keeps the test surface identical to production.
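For reference, NATS subject matching splits on `.`, lets `*` stand for exactly one token, and lets a trailing `>` stand for one or more tokens. A minimal matcher following those rules might look like this; it is a sketch of the semantics, not the code in crates/broker/src/topic.rs, and `subject_matches` is a hypothetical name.

```rust
/// Returns true when `subject` matches `pattern` under NATS-style rules:
/// tokens are separated by `.`, `*` matches exactly one token, and `>`
/// matches one or more trailing tokens (valid only as the last token).
pub fn subject_matches(pattern: &str, subject: &str) -> bool {
    let mut pat = pattern.split('.');
    let mut sub = subject.split('.');
    loop {
        match (pat.next(), sub.next()) {
            // `>` must be the final pattern token and needs at least one token.
            (Some(">"), Some(_)) => return pat.next().is_none(),
            (Some("*"), Some(tok)) => {
                if tok.is_empty() {
                    return false;
                }
            }
            (Some(p), Some(s)) => {
                if p != s {
                    return false;
                }
            }
            // Both exhausted at the same time: full match.
            (None, None) => return true,
            // One side ran out before the other.
            _ => return false,
        }
    }
}
```

Under these rules `plugin.inbound.*` matches `plugin.inbound.whatsapp` but not `plugin.inbound.telegram.sales_bot`; a subscriber that wants both single-instance and multi-instance subjects would use `plugin.inbound.>`.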
Disk queue
When a publish to NATS fails — circuit breaker open, connection lost, transient 5xx — the event is persisted to the disk queue instead of being dropped.
| Property | Value |
|---|---|
| Storage | SQLite |
| Default path | ./data/ (configurable via broker.persistence.path) |
| Tables | pending_events, dead_letters |
| Event format | JSON serialization of Event { id, topic, payload, enqueued_at, attempts } |
| Drain order | FIFO by enqueued_at |
| Batch size | up to 100 per drain() call |
| Max attempts before DLQ | 3 (DEFAULT_MAX_ATTEMPTS) |
flowchart LR
PUB[publish] --> OK{NATS up?}
OK -->|yes| NATS[(NATS)]
OK -->|no| ENQ[disk_queue.enqueue]
ENQ --> SQLITE[(pending_events)]
RECON[NATS reconnect] --> DRAIN[disk_queue.drain]
SQLITE --> DRAIN
DRAIN --> NATS
DRAIN -.->|3 attempts failed| DLQ[(dead_letters)]
DRAIN -.->|deserialization error| DLQ
Drain on reconnect
When NatsBroker detects reconnection, it calls disk_queue.drain():
- Read up to 100 oldest events from pending_events
- Republish each to NATS
- On success: delete row
- On failure: increment attempts, leave row in place
- Once attempts >= 3: move to dead_letters
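The drain steps can be modelled in memory to make the retry accounting concrete. This sketch swaps SQLite for a `VecDeque` and takes the publish step as a closure; `DiskQueue`, `BATCH_SIZE`, and the field names are stand-ins chosen for the example.

```rust
use std::collections::VecDeque;

pub const BATCH_SIZE: usize = 100;
pub const DEFAULT_MAX_ATTEMPTS: u32 = 3;

#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub id: u64,
    pub topic: String,
    pub attempts: u32,
}

/// In-memory stand-in for the pending_events and dead_letters tables.
#[derive(Default)]
pub struct DiskQueue {
    pub pending: VecDeque<Event>,
    pub dead_letters: Vec<Event>,
}

impl DiskQueue {
    /// One drain pass over at most BATCH_SIZE of the oldest events.
    /// `publish` returns true on success. Returns how many were delivered.
    pub fn drain<F: FnMut(&Event) -> bool>(&mut self, mut publish: F) -> usize {
        let batch = self.pending.len().min(BATCH_SIZE);
        let mut delivered = 0;
        let mut retained = Vec::new();
        for _ in 0..batch {
            let mut ev = match self.pending.pop_front() {
                Some(ev) => ev,
                None => break,
            };
            if publish(&ev) {
                delivered += 1; // success: the row is gone
                continue;
            }
            ev.attempts += 1;
            if ev.attempts >= DEFAULT_MAX_ATTEMPTS {
                self.dead_letters.push(ev); // retries exhausted
            } else {
                retained.push(ev); // stays pending for the next drain
            }
        }
        // Put failed events back at the head so FIFO order is preserved.
        for ev in retained.into_iter().rev() {
            self.pending.push_front(ev);
        }
        delivered
    }
}
```

With a publisher that always rejects one event, that event survives two drain passes with `attempts` at 1 and 2, then moves to `dead_letters` on the third.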
Dead-letter queue (DLQ)
Events that exhaust retries, or fail to deserialize at all, land in
dead_letters. They're not silently discarded — CLI lets you inspect
and replay them.
agent dlq list # show all dead events
agent dlq replay <event_id> # move one back to pending_events
agent dlq purge # drop the table (destructive!)
Replay moves the entry back to pending_events; the next drain cycle
retries it with attempts reset.
Backpressure
Two independent mechanisms:
- Local broker channels are 256-capacity tokio::mpsc per subscriber. If a subscriber is slow, dropped events log a slow consumer warning but the subscription stays alive.
- Disk queue applies proportional sleep at >50% capacity (scaled from 0 ms up to MAX_BACKPRESSURE_MS = 500 ms). At the hard cap it additionally drops the oldest event and sleeps 500 ms: an intentional "shed load, don't block the producer forever" stance.
The disk queue's backpressure only matters when NATS is down for a long time and the producer is faster than real time. In normal operation the disk queue stays near-empty.
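The proportional sleep can be read as a ramp from 0 ms at 50% fill to 500 ms at 100%. The text only says "proportional", so the linear shape below is an assumption, and `backpressure_ms` is a hypothetical helper rather than the crate's function.

```rust
pub const MAX_BACKPRESSURE_MS: u64 = 500;

/// Sleep in milliseconds for a queue holding `len` of `capacity` events:
/// 0 at or below 50% fill, rising linearly to MAX_BACKPRESSURE_MS at 100%.
/// The linear ramp is an assumption made for this sketch.
pub fn backpressure_ms(len: usize, capacity: usize) -> u64 {
    if capacity == 0 {
        return MAX_BACKPRESSURE_MS;
    }
    let len = len.min(capacity) as u64;
    let cap = capacity as u64;
    if len * 2 <= cap {
        return 0; // at or below half full: no delay
    }
    // How far past the halfway mark we are, expressed in units of cap / 2.
    let over_half = len * 2 - cap;
    MAX_BACKPRESSURE_MS * over_half / cap
}
```

For a capacity of 100, a queue at 75 entries would sleep 250 ms and a full queue would sleep the full 500 ms.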
Local fallback
When NATS is unreachable or the circuit breaker on the publish path is Open, the runtime degrades gracefully:
- Inbound events from local plugins (e.g. a Telegram webhook fielded in-process) go through LocalBroker and reach agents immediately
- Outbound events that target a plugin hosted in the same process (which is every shipped plugin) also go through LocalBroker
- Anything that would have crossed a real NATS hop sits in the disk queue until reconnection
In practice, single-machine deployments keep working even with no NATS at all — the disk queue and the local broker together are sufficient for one process. NATS starts earning its keep the moment you scale to multiple processes, machines, or regions.
Broker deployment shapes
The nexo daemon supports three broker deployment shapes: single-host
servers and dev machines, multi-host production clusters, and embedded
mobile builds. Moving between them takes a single broker.yaml switch
plus, in some cases, a different build mode. Phase 92's stdio-bridge
closed the gap that forced operators to install NATS for any
single-host deployment.
Picking a shape
┌────────────────────────────────────────────────────────────┐
│ Multiple daemons across hosts? │
│ │
│ YES ──→ Shape 1: NATS │
│ broker.yaml type: nats │
│ Plugins: subprocess (extracted out-of-tree) │
│ Infra: NATS server / cluster on the network │
│ │
│ NO ──→ Single host. Mobile / embedded? │
│ │
│ YES (Android / iOS / WASM / Flutter FFI) │
│ ──→ Shape 3: Embedded │
│ broker.yaml type: local │
│ Plugins: lib-linked Rust crates (no spawn) │
│ Infra: none │
│ │
│ NO (laptop dev / server deb / desktop app) │
│ ──→ Shape 2: Server single-host │
│ broker.yaml type: local │
│ Plugins: subprocess (extracted) │
│ Infra: none — stdio-bridge handles xprocess│
│ │
└────────────────────────────────────────────────────────────┘
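The decision tree above reduces to two questions. As a sketch (the `Shape` enum and `pick_shape` are illustrative names, not part of the daemon), it could be encoded like this:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Shape {
    /// Shape 1: multiple daemons across hosts, NATS on the network.
    Nats,
    /// Shape 2: single host, subprocess plugins over the stdio bridge.
    ServerSingleHost,
    /// Shape 3: plugins linked into one binary, no subprocesses.
    Embedded,
}

/// Encodes the decision tree: multi-host wins first, then embedded targets.
pub fn pick_shape(multi_host: bool, embedded_target: bool) -> Shape {
    if multi_host {
        Shape::Nats
    } else if embedded_target {
        Shape::Embedded
    } else {
        Shape::ServerSingleHost
    }
}

impl Shape {
    /// Value of `type:` in broker.yaml for this shape.
    pub fn broker_type(self) -> &'static str {
        match self {
            Shape::Nats => "nats",
            Shape::ServerSingleHost | Shape::Embedded => "local",
        }
    }

    /// Whether plugins run as subprocesses in this shape.
    pub fn spawns_subprocess_plugins(self) -> bool {
        !matches!(self, Shape::Embedded)
    }
}
```

Shapes 2 and 3 share `type: local`; what separates them is whether plugins are spawned or linked, which is why the build mode matters in addition to the YAML switch.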
Shape 1 — Server multi-host (NATS)
The classical setup. NATS server runs on the network; every daemon in the cluster connects to it. Subprocess plugins connect directly to the same NATS, addressed by URL.
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ daemon A │ │ daemon B │ │ daemon C │
│ (host 1) │ │ (host 2) │ │ (host 3) │
└──────┬───────┘ └──────┬───────┘ └──────┬───────┘
│ │ │
▼ ▼ ▼
┌─────────────────────────────────────────────────┐
│ NATS cluster (TLS-mTLS) │
└─────────────────────────────────────────────────┘
▲ ▲ ▲
│ │ │
┌──────┴───────┐ ┌──────┴───────┐ ┌──────┴───────┐
│ whatsapp │ │ marketing │ │ telegram │
│ subprocess │ │ subprocess │ │ subprocess │
│ (host 1) │ │ (host 2) │ │ (host 3) │
└──────────────┘ └──────────────┘ └──────────────┘
Configuration:
# broker.yaml
broker:
  type: nats
  url: "nats://nats.example.com:4222"
  auth:
    enabled: true
    nkey_file: /etc/nexo/nats.nkey
Daemon stamps subprocess plugins with NEXO_BROKER_KIND=nats +
NEXO_BROKER_URL=<url>, so each plugin connects to the same NATS
server independently. Cross-host fanout is NATS's job.
Shape 2 — Server single-host (stdio bridge)
Daemon and plugins share one host. No NATS server installed; the
daemon's in-process Local broker (tokio::mpsc) handles every
event. Subprocess plugins reach the local broker through a JSON-RPC
stdio bridge piggybacking on the channel the daemon already opens
for tool.invoke.
┌─────────────────────────────────────────────────────────────────┐
│ daemon process (single host) │
│ │
│ ┌─────────────────────────────────┐ │
│ │ LocalBroker (tokio::mpsc) │ │
│ └──┬───────────────────────────┬──┘ │
│ │ │ │
│ │ broker.publish │ broker.subscribe forwarder │
│ │ broker.event │ pre-subscribed from │
│ │ │ manifest at boot │
│ ▼ ▼ │
│ ┌─────────────────────────────────┐ │
│ │ subprocess.rs JSON-RPC dispatch │ │
│ │ - tool.invoke (existing) │ │
│ │ - broker.publish (81.14.b) │ │
│ │ - broker.event (81.14.b) │ │
│ └──┬─────────────────────────────┬─┘ │
└─────┼─────────────────────────────┼─────────────────────────────┘
│ stdin ◀──┐ ┌──▶ stdout
▼ │ │
┌─────────────┐ │ │ ┌─────────────┐
│ whatsapp │──┤ ├──│ marketing │
│ subprocess │ │ │ │ subprocess │
│ │ │ │ │ │
│ Stdio- │ │ │ │ SDK's │
│ Bridge- │ │ │ │ Broker- │
│ Broker │ │ │ │ Sender + │
│ (Phase 92) │ │ │ │ on_broker_ │
│ │ │ │ │ event │
└─────────────┘ │ │ └─────────────┘
│ │
▼ ▼
(each plugin's StdioBridgeBroker holds an
mpsc::Sender<Value> the SDK's PluginAdapter
drains onto its single async stdout writer)
Configuration:
# broker.yaml
broker:
  type: local
  url: ""
Daemon stamps subprocess plugins with NEXO_BROKER_KIND=stdio_bridge
and omits NEXO_BROKER_URL (no network endpoint). Each plugin's
main.rs reads the env, calls
PluginAdapter::with_stdio_bridge_broker(), and wraps the returned
StdioBridgeBroker in AnyBroker::stdio_bridge. From there the
plugin's existing BrokerHandle::publish / .subscribe code
keeps working unchanged.
Marketing uses the SDK's BrokerSender + on_broker_event hooks
directly (it never went through AnyBroker::from_config); those
hooks already route through the same stdout writer, so marketing
needs zero migration to participate in this shape.
Shape 3 — Embedded (Android / iOS / WASM)
Daemon and plugins linked into a single binary as Rust crates.
No subprocess spawns at all; the LocalBroker works directly
because every plugin shares the same process memory.
┌─────────────────── APK / Bundle / WASM module ──────────────────┐
│ │
│ Single process: │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ daemon core │ │ WhatsappPlugin│ │ MarketingPlugin│ │
│ │ │ │ (crate) │ │ (crate) │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ └──────────────────┴──────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────┐ │
│ │ LocalBroker │ │
│ │ (tokio::mpsc │ │
│ │ in-process) │ │
│ └──────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Configuration:
# broker.yaml
broker:
  type: local
No env vars stamped — there's no subprocess. The host (Android JNI,
Flutter FFI shim, WASM glue) injects an Arc<AnyBroker::Local>
directly into each plugin factory. Use of stdio-bridge is impossible
in this shape because there's nothing on the other end of stdin.
Plugin crates already expose the surface for this; the build flip
is a feature flag on the daemon and a different main entrypoint
in the host shim. See Phase 90 (Android embed) for the concrete
build pipeline.
Daemon env vars stamped on each subprocess plugin
| Env var | When stamped | Plugin reads when |
|---|---|---|
| NEXO_BROKER_KIND | always for whatsapp + telegram instance factories (Phase 92.4) | constructing the broker in main.rs |
| NEXO_BROKER_URL | only when KIND is nats (Phase 92.4 omits for stdio_bridge) | constructing the NATS BrokerInner |
For non-instance plugins (marketing today, plugins discovered via
plugins.discovery.search_paths without a per-instance factory)
the env clear path isn't applied; they inherit NEXO_BROKER_KIND
from the daemon's own env if the operator exported it before
launching the daemon. The Phase 92 dev-daemon.sh template seeds
NEXO_BROKER_KIND in the daemon's environment for this reason.
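On the plugin side, choosing a broker from these two variables is a small match. The sketch below takes the values as parameters so it stays testable; `PluginBroker` and `broker_from_env` are hypothetical names, and the real plugins go through PluginAdapter and AnyBroker rather than this enum.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum PluginBroker {
    Nats { url: String },
    StdioBridge,
}

/// Maps the stamped env values onto a broker choice.
/// `kind` and `url` are the values of NEXO_BROKER_KIND and NEXO_BROKER_URL.
pub fn broker_from_env(kind: Option<&str>, url: Option<&str>) -> Result<PluginBroker, String> {
    match kind {
        Some("nats") => match url {
            Some(u) if !u.is_empty() => Ok(PluginBroker::Nats { url: u.to_string() }),
            _ => Err("NEXO_BROKER_KIND=nats requires NEXO_BROKER_URL".to_string()),
        },
        // No URL is expected here: the bridge rides on the plugin's stdio.
        Some("stdio_bridge") => Ok(PluginBroker::StdioBridge),
        Some(other) => Err(format!("unknown NEXO_BROKER_KIND: {}", other)),
        None => Err("NEXO_BROKER_KIND is not set".to_string()),
    }
}
```

A plugin's main could call it as `broker_from_env(std::env::var("NEXO_BROKER_KIND").ok().as_deref(), std::env::var("NEXO_BROKER_URL").ok().as_deref())`. Failing loudly when the variable is missing makes the non-instance-plugin case described above visible at startup.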
Migration path from a pre-92 deployment
A single-host operator that installed NATS only because subprocess plugins forced it (the pre-92 default behaviour) follows this sequence after pulling a 92-or-later release:
- Upgrade the daemon binary to one that contains Phase 92.
- Upgrade each subprocess plugin binary (whatsapp, telegram) to one that contains the matching 92.6 migration.
- Stop NATS: sudo systemctl stop nats-server.
- Switch broker.yaml back to type: local.
- Restart the daemon. Verify plugins.discovery: plugin registry wire complete loaded=N invalid=… init_failed_total=0 in the log.
- Exercise an end-to-end flow (e.g. send a WhatsApp message, expect the bot reply). The whole pipeline now runs without any external broker.
Cluster operators (Shape 1) are unaffected — type: nats continues
to work identically.
Source map
| Concern | File |
|---|---|
| BrokerKind::StdioBridge enum | crates/config/src/types/broker.rs |
| StdioBridgeBroker impl | crates/broker/src/stdio_bridge.rs |
| AnyBroker::StdioBridge variant | crates/broker/src/any.rs |
| Daemon-side broker.publish handler | crates/core/src/agent/nexo_plugin_registry/subprocess.rs (Phase 81.14.b) |
| Daemon-side broker.event forwarder | same file (auto-subscribe from manifest) |
| seed_*_subprocess_env_for helpers | proyecto/src/main.rs |
| SDK with_stdio_bridge_broker helper | crates/microapp-sdk/src/plugin.rs |
| Plugin migrations (consumers) | nexo-rs-plugin-whatsapp/src/main.rs, nexo-rs-plugin-telegram/src/main.rs |
Phase 92 (the stdio-bridge broker) shipped in v0.1.6 — see the release notes. Remaining sub-phases (an end-to-end integration test, Prometheus metrics for the bridge) are tracked as follow-ups.
Phase 93.11 — Compile-Time Plugin Decoupling Audit
Status: closed 2026-05-16. All 47 anchor sites cleared.
Phase 95 close-out — 6/N plugin extraction milestone
Status: closed 2026-05-17. Web-search joins the canonical subprocess plugin set: browser (81.17.c) → telegram (81.18) → whatsapp (81.19.a) → email (81.19.b) → google (94) → web-search (95).
nexo-rs-plugin-web-search lives in a standalone repo and ships
as nexo-plugin-web-search 0.1.0. The daemon's nexo-core 0.2.0
breaking release removes web_search_router from
AgentContext / AgentRuntime / AgentSpawnConfig /
McpServerBootContext; the WebSearchTool in-process
ToolHandler (crates/core/src/agent/web_search_tool.rs) is deleted
entirely. crates/web-search/ survives as a workspace member
for direct consumers (microapp embeds, future MCP standalone
tools); the daemon's compile graph no longer pulls it.
Phase 95 also adds the agnostic tool.invoke.params.policy
framework contract (microapp-sdk 0.1.19 + nexo-core 0.2.0): the
daemon's RemoteToolHandler stamps the per-binding
EffectivePolicy::for_tool(tool_name) slice onto every JSON-RPC
envelope. Future subprocess tools needing per-binding gating
(lsp, dream, fork) reuse the same envelope without daemon-side
changes.
Phase 94 close-out — 5/5 plugin extraction milestone
Status: closed 2026-05-16. The Phase 81 plugin-extraction lineage is complete: browser (81.17.c) → telegram (81.18) → whatsapp (81.19.a) → email (81.19.b) → google (94).
nexo-rs-plugin-google lives in a standalone repo and ships as
nexo-plugin-google 0.2.0 (subprocess binary). The daemon
(nexo-rs) no longer imports nexo_plugin_google::* from main.rs
nor from crates/setup/. crates/plugins/google/ survives as the
lib dep for nexo-poller's google_calendar + gmail builtins
(in-process callers); future cleanup migrates the poller to the
published lib crate so the in-tree dir can be deleted.
Result
cargo tree -i nexo-plugin-{whatsapp,telegram,email,browser}
returns "did not match any packages" against either the
default daemon build or --no-default-features. The daemon
binary compiles with zero direct or transitive dependency on
any canonical channel-plugin crate; pairing, outbound
dispatch, HTTP routes, admin RPC, metrics scrape, dashboard
sources, and pairing triggers all flow through
manifest-declared broker contracts:
- [plugin.pairing.adapter] (Phase 81.33.b.real Stage 1)
- [plugin.http] (Stage 2)
- [plugin.admin] (Stage 4)
- [plugin.metrics] (Stage 5)
- [plugin.dashboard] (Stage 6)
- [plugin.pairing.trigger] (Phase 81.20.x Stage 7 Phase 2)
- [plugin.public_tunnel] (Phase 81.20.x Stage 7 Phase 2)
Operators wanting an embedded in-process build link the canonical plugin crate directly from a custom binary — the daemon's published default ships with zero hardcoded plugin imports.
Historical TL;DR (pre-close)
Daemon binary (nexo-daemon) cannot build without the four
canonical plugin crates as Cargo dependencies. Phase 93's
opaque-config + PluginsConfig.entries work eliminated
runtime YAML coupling but did not touch the compile-time
import graph. 47 distinct import sites (43 in code, 4 in
Cargo) anchor the daemon to nexo-plugin-whatsapp,
nexo-plugin-telegram, nexo-plugin-email,
nexo-plugin-browser.
Recommendation (closed via Phase 81.20.x Stage 7 Phase 2):
Hybrid path adopted. Cargo feature-gates landed first
(whatsapp/telegram/browser ~Stage 7 Phase 1) and then
the gates themselves were deleted once each plugin shipped
its manifest sections — cargo tree returns unmatched for
all four canonical plugins as of 2026-05-16.
Inventory: 47 sites
Cargo dependency anchors (4)
| File | Dep |
|---|---|
| Cargo.toml:376 | nexo-plugin-whatsapp = "0.1.3" (root workspace dep) |
| Cargo.toml:377 | nexo-plugin-email = "0.1.3" (root workspace dep) |
| Cargo.toml:410 | nexo-plugin-whatsapp = { workspace = true } (daemon bin) |
| Cargo.toml:415 | nexo-plugin-telegram = "0.1.1" (daemon bin) |
| Cargo.toml:416 | nexo-plugin-email = { workspace = true } (daemon bin) |
| crates/setup/Cargo.toml:25 | nexo-plugin-email = { workspace = true } |
| crates/setup/Cargo.toml:69 | nexo-plugin-whatsapp = { workspace = true } |
nexo-plugin-browser is NOT a Cargo dep: it is only coupled through
env-var seeding via a copy-pasted env_config reference (no
compile-time coupling). Already decoupled in practice.
Code import sites (43) — by classification bucket
Bucket A — dies after 81.32.c7.c (full-parity tools registry extract).
The register_<channel>_tools(&tools) blocks at agent boot and on the
hot-spawn path. Plugins already expose
NexoPlugin::register_outbound_tools(&self, &ToolRegistry) (trait method
exists). Daemon currently hardcodes per-plugin calls as fallback.
Migration: drop the hardcoded calls; loop over plugin_handles and
dispatch register_outbound_tools.
src/main.rs:5128 register_whatsapp_tools (boot path)
src/main.rs:5132 register_telegram_tools (boot path)
src/main.rs:5145 filter_from_allowed_patterns
src/main.rs:5146 register_email_tools_filtered
src/main.rs:5150 EMAIL_TOOL_NAMES.len()
src/main.rs:6917 register_whatsapp_tools (hot-spawn path)
src/main.rs:6920 register_telegram_tools (hot-spawn path)
src/main.rs:7025 filter_from_allowed_patterns
src/main.rs:7026 register_email_tools_filtered
9 sites. Already tracked by Phase 81.32.c7.c, not new debt. Effort: ~4h.
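The Bucket A migration replaces per-plugin calls with one loop over trait objects. A reduced sketch of that shape follows; the trait method name and `&ToolRegistry` parameter come from the text above, while the registry internals, the `Whatsapp` / `Telegram` structs, the tool names, and `register_all` are invented for illustration.

```rust
use std::collections::BTreeSet;
use std::sync::Mutex;

/// Minimal registry stand-in; interior mutability lets plugins register
/// through a shared reference, matching the `&ToolRegistry` signature.
#[derive(Default)]
pub struct ToolRegistry {
    names: Mutex<BTreeSet<String>>,
}

impl ToolRegistry {
    pub fn register(&self, name: &str) {
        self.names.lock().unwrap().insert(name.to_string());
    }
    pub fn names(&self) -> Vec<String> {
        self.names.lock().unwrap().iter().cloned().collect()
    }
}

pub trait NexoPlugin {
    fn register_outbound_tools(&self, tools: &ToolRegistry);
}

// Hypothetical plugins with made-up tool names.
pub struct Whatsapp;
pub struct Telegram;

impl NexoPlugin for Whatsapp {
    fn register_outbound_tools(&self, tools: &ToolRegistry) {
        tools.register("whatsapp_send");
    }
}

impl NexoPlugin for Telegram {
    fn register_outbound_tools(&self, tools: &ToolRegistry) {
        tools.register("telegram_send");
    }
}

/// The daemon names no concrete plugin here; it only walks the handles.
pub fn register_all(plugin_handles: &[Box<dyn NexoPlugin>], tools: &ToolRegistry) {
    for plugin in plugin_handles {
        plugin.register_outbound_tools(tools);
    }
}
```

Once registration runs through `register_all`, the boot path and the hot-spawn path can share it, and the daemon no longer needs an import per channel crate for tool wiring.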
Bucket B — dies after 81.33.b.real (manifest-driven pairing-adapter).
Daemon hardcodes XxxPairingAdapter::new(broker) because
SubprocessNexoPlugin::build_pairing_adapter() still
defaults to None — manifest schema for
[plugin.pairing.adapter] not yet finalised (see
crates/core/src/agent/plugin_host.rs:113-119).
src/main.rs:1654 WhatsappPairingAdapter::new(broker)
src/main.rs:1657 TelegramPairingAdapter::new(broker)
2 sites. New follow-up surfaced: Phase 81.33.b.real
manifest schema + GenericBrokerPairingAdapter.
Effort: ~5h.
Bucket C — dies after 93.5.d (WhatsApp orchestration generalisation). Daemon-owned pairing state map + tunnel config + HTTP UI dispatcher.
src/main.rs:710 SharedPairingState (fn sig)
src/main.rs:1038 SharedPairingState (fn sig)
src/main.rs:1126 QrSnapshot
src/main.rs:2492 pairing_trigger::CHANNEL_ID
src/main.rs:2494 WhatsappPairingTrigger::from_configs
src/main.rs:3226 SharedPairingState
src/main.rs:3244 PairingState::new
src/main.rs:15167 dispatch_route + WhatsappRoute
src/main.rs:15174 PAIR_PAGE_HTML
crates/setup/src/admin_adapters.rs:3728 dispatch::TOPIC_OUTBOUND
crates/setup/src/admin_bootstrap.rs:713 WhatsappBotHandle
crates/setup/src/writer.rs:789 session::pair_once
12 sites. Already tracked as 93.5.d DEFERRED-strict. Trigger: 2nd pairing-based channel with daemon-owned tunnel + admin RPC. Effort when triggered: ~5h.
Bucket D — email in-process integration (autonomous_worker + tool ctx + metrics + wizard validators).
Email is the only canonical plugin that has NOT undergone
subprocess flip. Daemon holds Arc<EmailPlugin> to expose:
- dispatcher_handle() for outbound tool routing
- bounce_store_handle() for delivery receipts
- attachments_dir() for MCP server attachment paths
- health_map() for /metrics Prometheus rendering
- WorkerState enum for /health HTML rendering
- MCP autonomous_worker mode embeds EmailToolContext in-process to share Arc<HealthMap>
src/main.rs:715,3289,3299 EmailPlugin construction (boot)
src/main.rs:3698,3766,3767 EmailToolContext (boot)
src/main.rs:14129,14153,14154,14174 EmailPlugin (autonomous_worker)
src/main.rs:15117,15273,15286-15289 metrics + render_email_health + WorkerState
crates/setup/src/services/email.rs:28-31 ImapConnection / provider_hint /
SmtpClient / spf_dkim (wizard validators)
15 sites. Email is structurally in-process by design. Subprocess-flip would require a Phase 81.20.x plan with:
- Broker RPC for dispatcher_handle outbound (latency impact)
- Cross-process Arc<HealthMap> sync (state-replication design)
- Subprocess-side MCP server merged into autonomous_worker mode (or autonomous_worker stays in-tree as "core service")
- Setup wizard validators (ImapConnection / SmtpClient probe) invocable via wizard-only crate or broker RPC
Estimated effort: 20-30h. No active trigger today — email works in-process, autonomous_worker depends on it.
Imports already dead-after-something (zero today, classify-only)
None. Every import site has a live consumer. Cat C audit 93.5.c (2026-05-15) confirmed zero zombies in the related typed-config sites; same conclusion holds for plugin imports.
Decision matrix
Option 1 — Cargo feature-gates
[features]
default = ["plugin-whatsapp", "plugin-telegram", "plugin-email", "plugin-browser"]
plugin-whatsapp = ["dep:nexo-plugin-whatsapp"]
plugin-telegram = ["dep:nexo-plugin-telegram"]
plugin-email = ["dep:nexo-plugin-email"]
plugin-browser = [] # already no Cargo dep; flag gates env seeding code
Each import site wrapped with #[cfg(feature = "plugin-X")].
Build slim binary with cargo build --no-default-features --features plugin-whatsapp.
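What a gated site looks like can be shown with a stand-in. In the daemon the gated statements would be the plugin imports and registration calls; here they are replaced by string pushes so the sketch compiles without the plugin crates, and `compiled_in_plugins` is a hypothetical helper.

```rust
/// Lists the channel plugins compiled into this build. Each gated statement
/// stands in for a real `use nexo_plugin_x::...` plus registration call.
pub fn compiled_in_plugins() -> Vec<&'static str> {
    // `mut` is unused when every feature is disabled.
    #[allow(unused_mut)]
    let mut plugins = Vec::new();

    #[cfg(feature = "plugin-whatsapp")]
    plugins.push("whatsapp");

    #[cfg(feature = "plugin-telegram")]
    plugins.push("telegram");

    #[cfg(feature = "plugin-email")]
    plugins.push("email");

    plugins
}
```

Built with no features enabled, the function returns an empty list and none of the gated statements are compiled, which is the property the --no-default-features build is meant to prove for the real import sites.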
Pros.
- Cheap: ~6h for the three subprocess-flipped plugins (whatsapp + telegram + browser). 9-12 cfg wrap sites (the bucket-A code is already feature-shaped by the if-plugin-present check).
- Compile-time enforcement: cargo build --no-default-features --features plugin-whatsapp proves "daemon compiles without telegram crate".
- Unlocks Android/embedded slim builds today (whatsapp-only daemon is real demand per project_android_flutter_target memory).
- Composable with subprocess-flip: a feature-disabled plugin can still run as a discovered subprocess if its manifest is in search_paths/. The daemon never imports the crate; the subprocess does.
Cons.
- Doesn't help 3rd-party plugins (still need to be Cargo deps of daemon or run as subprocesses).
- Doesn't decouple email (in-process by design).
- Adds #[cfg] noise in main.rs (~9-12 blocks).
- Workspace default-features quirks already biting (per feedback_rustls_default_features_off); each new feature needs default-features = false discipline.
Effort. ~6h for whatsapp + telegram + browser gates +
test matrix (build --no-default-features per single-channel
combo). Email NOT gated (keeps current shape).
Option 2 — Full dynamic-loading via manifest + broker
Daemon drops all nexo_plugin_X:: imports. Every integration
becomes:
- Pairing: NexoPlugin::build_pairing_adapter() → GenericBrokerPairingAdapter (81.33.b.real manifest).
- Tools: NexoPlugin::register_outbound_tools(&tool_registry) (already a trait method, just drop hardcoded fallbacks; 81.32.c7.c).
- Pairing trigger map: NexoPlugin::pairing_trigger() → dyn PairingTrigger (new opt-in trait method).
- Tunnel + session_dir: NexoPlugin::orchestration_descriptor() → typed OrchestrationRequirements (new; covers tunnel port, pairing-state HTTP route, instance loop driver).
- Email in-process: broker RPC OR keep gated in-tree.
Pros.
- True self-describing: drop plugin in search_paths/, it works without daemon recompile.
- Android-friendly (no compile-time plugin baggage on lib target).
- Eliminates the 30+ import-site debt entirely.
- Forces clean trait surfaces (good architectural pressure).
Cons.
- Cost: ~30-50h across multiple sub-phases (manifest schema design, GenericBrokerPairingAdapter, OrchestrationDescriptor, email broker-RPC bridge OR feature-gate, wizard validator extraction).
- Speculative trait shapes without driver: 93.5.d locked DEFERRED-strict for exactly this reason. 93.10 channels_dashboard same shape.
- Email subprocess-flip is 20-30h on its own with real latency tradeoffs (in-process Arc<HealthMap> → broker cross-process sync).
- Migration risk: each trait method that mis-fits the real-world 2nd-channel driver = churn.
Effort. 30-50h. No discrete trigger today (no 3rd-party plugin, no email subprocess driver).
Option 3 — Hybrid (recommended)
- Ship feature-gates now (~6h) for whatsapp + telegram + browser. Email stays default-on. This delivers the Android/embedded slim-daemon win NOW.
- Defer dynamic-loading trait surface until trigger: 3rd-party plugin demand OR email subprocess driver lands.
- Land 81.32.c7.c (tool registry full-parity extract, already tracked, ~4h) — kills 9 bucket-A imports regardless of which long-term path wins.
- Land 81.33.b.real (manifest-driven pairing adapter
schema, already implied by L113-119 comment in
plugin_host.rs, ~5h) — kills 2 bucket-B imports.
Combined immediate work ~15h, drops ~11 of 43 code
imports + adds compile-time enforcement that
whatsapp|telegram|browser are optional. Leaves the
email-in-process bucket (15 sites) intact as documented
design choice, not silent debt.
Recommendation
Ship Option 3 Hybrid. Phase 93 closes with:
- 93.5.d DEFERRED-strict (already done)
- 93.10 DEFERRED until non-canon dashboard driver (already done)
- 93.11 = this memo (concrete data + decision)
- 93.12 (new) = ship feature-gates per above
- Bucket-B follow-up = 81.33.b.real (existing implied)
Phase 93 then status = shipped + audit-complete. Email in-process is acknowledged design (not bug). Daemon can ship slim builds for embedded targets immediately. Dynamic-loading becomes a Phase 100+ candidate triggered by real 3rd-party demand, not speculative.
Trigger watchlist (re-open this audit if)
- 3rd-party plugin (slack/discord/sms) requests to ship as pure subprocess without daemon recompile.
- Android Flutter integration finalised + daemon binary size becomes a release-blocker.
- Email subprocess-flip plan opens (would invalidate the "email stays in-process" recommendation).
- 2nd pairing-based channel lands and 93.5.d unblocks — forces revisit of trait shape consensus.
Plugin Auto-Discovery — Design Memo
Status: design memo (no code change). Produced 2026-05-15 to anchor the next 2-4 sessions of work toward the goal:
Adding a new plugin to nexo should be a drop-in operation. The operator places the plugin binary and its manifest in plugins.discovery.search_paths and the daemon picks up EVERY capability the plugin declares (outbound tools, credentials, HTTP routes, pairing flows, dashboard surface, per-instance orchestration) without any daemon-side code change.
Reference mining
OpenClaw (/home/familia/chat/research/):
- src/channels/plugins/types.plugin.ts:47-96 - ChannelPlugin declarative top: id, meta, capabilities, gatewayMethods, configSchema, reload.
- src/channels/plugins/types.adapters.ts:76-858 - imperative handler split (gateway.startAccount / pairing / auth.login / outbound.send* / messaging.normalizeTarget / directory.self / lifecycle.onAccountConfigChanged).
- src/channels/plugins/types.core.ts:100 - webhookPath as per-account declarative HTTP mount point.
- src/gateway/server-channels.ts:285-449 - daemon-managed per-account lifecycle loop (AbortController, exponential backoff 5s→5min, ≤10 retries, status snapshot).
- src/plugins/inspect-shape.ts:36-127 - runtime introspection classifies plugins as plain-capability / hybrid-capability / hook-only / non-capability by counting channelIds, providerIds, gatewayMethodCount, httpRouteCount.
- docs/channels/pairing.md:41-49 - pairing state lives in ~/.openclaw/credentials/<channel>-pairing.json + <channel>-allowFrom.json; the pairing adapter is the only per-channel custom logic surface.
claude-code-leak/ is absent from /home/familia/chat/. Mining absence declared explicitly.
Current Rust shape (crates/core/src/agent/plugin_host.rs):
- L66-199 - NexoPlugin trait. Already has the auto-discovery shape: manifest(), init(&ctx), shutdown(), build_pairing_adapter(broker), register_outbound_tools(&reg), configure(&yaml), credential_store(), as_any(). Defaults let new plugins opt in.
- PluginInitContext (L204-300+) - hands plugins tool_registry, advisor_registry, hook_registry, broker, llm_registry, reload_coord, sessions, long_term_memory, shutdown, channel_adapter_registry, plugin_config. Plenty of extension points already.
The trait + context is already mostly self-describing. What's
missing is daemon-side dispatch — code in src/main.rs that
iterates plugin_handles instead of hardcoding per-plugin
blocks.
Inventory of 12 capability layers
| Layer | Auto-discoverable today? | What blocks it |
|---|---|---|
| Config schema | ✅ done (Phase 93.1-93.4) | — |
| Manifest discovery | ✅ done | — |
| Subprocess lifecycle | ✅ done | — |
| Broker RPC integration | ✅ done | — |
| Credential store | ✅ done (Phase 93.6-93.9) | — |
| Outbound tools | ✅ partial | Phase 81.32.c7.c — daemon-side hardcoded fallbacks (register_whatsapp_tools etc.) coexist with trait method. |
| Pairing adapter | ❌ | Phase 81.33.b.real — trait method exists but no daemon dispatch; subprocess plugins can't supply Rust trait obj across process boundary. |
| HTTP routes | ❌ | Daemon hardcodes /whatsapp/pair. No trait method for plugins to declare routes. |
| Admin RPC commands | ❌ partial | Daemon hardcodes with_wa_bot_handle. No generic admin-RPC registration. |
| Channel dashboard | ✅ partial (Phase 93.10) | ChannelDashboardSource lives in nexo-setup, NOT exposed via NexoPlugin trait. Plugins can't auto-register a dashboard surface. |
| Metrics / health endpoints | ❌ partial | Daemon hardcodes /email/health, /metrics whatsapp-instances JSON. |
| Orchestration | ❌ | Phase 93.5.d — daemon hardcodes whatsapp instance loop, tunnel auto-open, pairing-state map. |
Seven layers need work to reach "drop plugin → daemon discovers everything".
Architectural principles (non-negotiable)
- Manifest is the single source of truth. Anything the daemon needs to know about a plugin is in nexo-plugin.toml. Daemon never inspects plugin Cargo features, plugin source code, or plugin runtime state to discover capabilities.
- Subprocess boundary is honoured. Rust trait objects do not cross process boundaries. Anywhere the daemon would need to call into the plugin per-message, the dispatch goes through broker JSON-RPC (with caches at hot paths).
- In-tree plugins use trait dispatch; subprocess plugins use broker dispatch. NexoPlugin trait methods stay valid for in-tree plugins (Phase 81.20 candidates). For subprocess plugins, SubprocessNexoPlugin translates trait calls into broker RPCs against a generic adapter constructed from manifest data.
- Per-channel custom logic stays in the plugin process. normalize_sender, auth_check, instance-discovery: every per-channel rule executes inside the subprocess, never in the daemon. Daemon stays generic.
- Hardcoded canonical-plugin paths are deprecation-tracked, not deleted opportunistically. Out-of-tree plugin crates ship on their own release cadence; daemon ships fallbacks until canonical plugins opt into the generic path via their own next manifest revision.
Patterns
Two patterns repeat across all 7 remaining layers. Pin them once in the framework; reuse for each layer.
Pattern A: broker-RPC dispatch with cache
For per-event hot paths that need plugin-side logic.
Manifest declares the broker topic shape:
[plugin.pairing.adapter]
channel_id = "whatsapp"
broker_topic_prefix = "plugin.whatsapp"
# daemon will call: <broker_topic_prefix>.pairing.normalize_sender
# <broker_topic_prefix>.pairing.send_reply
# <broker_topic_prefix>.pairing.send_qr_image
Daemon-side adapter:
pub struct GenericBrokerPairingAdapter {
    channel_id: &'static str,
    broker: AnyBroker,
    topic_prefix: String,
    // Cache: raw sender → normalized form. Pairing volume is
    // low; cache grows bounded by unique senders.
    normalize_cache: Arc<RwLock<HashMap<String, Option<String>>>>,
}
- normalize_sender(raw) checks cache, on miss does broker.request("<prefix>.pairing.normalize_sender", raw) with a short timeout, then caches result.
- send_reply / send_qr_image are already async: direct broker RPC.
Trade-off. First-sighting of every sender pays a broker round-trip (~1-5ms local). Subsequent lookups are O(1) cache. For pairing flows, this is acceptable because handshakes are rare. For high-throughput hot paths (every inbound message), upfront broadcast-of-known-normalizations would be required — design that into the manifest as a separate batch RPC if a layer needs it.
Sync trait → async broker. PairingChannelAdapter::normalize_sender
is a sync fn. The generic adapter either bridges with
tokio::runtime::Handle::block_on inside a helper that runs only on
cache miss, OR the trait gets migrated to async fn first (preferred
if downstream callers are already in async contexts, because
Handle::block_on panics when called from within an asynchronous
execution context).
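The read-through cache described for Pattern A can be sketched with the standard library alone. This is a minimal illustration, not the adapter's actual code: `NormalizeBackend`, `CachedNormalizer` and `CountingBackend` are hypothetical names, and the backend trait stands in for the broker request to `<prefix>.pairing.normalize_sender`.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::RwLock;

/// Hypothetical stand-in for the broker round-trip.
pub trait NormalizeBackend {
    fn normalize(&self, raw: &str) -> Option<String>;
}

/// Read-through cache: raw sender -> normalized form (None = rejected).
pub struct CachedNormalizer<B: NormalizeBackend> {
    backend: B,
    cache: RwLock<HashMap<String, Option<String>>>,
}

impl<B: NormalizeBackend> CachedNormalizer<B> {
    pub fn new(backend: B) -> Self {
        Self { backend, cache: RwLock::new(HashMap::new()) }
    }

    pub fn normalize_sender(&self, raw: &str) -> Option<String> {
        // Fast path: shared read lock, no backend call.
        if let Some(hit) = self.cache.read().unwrap().get(raw) {
            return hit.clone();
        }
        // Slow path: call the backend without holding a lock, then
        // store the answer. Negative results are cached too.
        let fresh = self.backend.normalize(raw);
        self.cache.write().unwrap().insert(raw.to_string(), fresh.clone());
        fresh
    }
}

/// Toy backend that keeps digits only and counts how often it is called.
pub struct CountingBackend(pub AtomicUsize);

impl NormalizeBackend for CountingBackend {
    fn normalize(&self, raw: &str) -> Option<String> {
        self.0.fetch_add(1, Ordering::SeqCst);
        let digits: String = raw.chars().filter(|c| c.is_ascii_digit()).collect();
        if digits.is_empty() { None } else { Some(digits) }
    }
}
```

Two concurrent misses for the same sender may both reach the backend; for low-volume pairing traffic that duplicate round-trip is harmless, and the second insert simply overwrites the first.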
Pattern B: declarative interpreter
For boot-time setup that needs plugin-side logic but only fires once per startup or per config-reload.
Manifest declares the data; daemon interprets:
[plugin.orchestration]
per_instance_state = true # daemon allocates a state-map keyed by instance
public_tunnel.enabled = true # daemon offers an auto-tunnel knob
public_tunnel.route = "/whatsapp/pair" # daemon mounts the tunneled prefix
inbound_state_topic = "plugin.inbound.whatsapp" # daemon subscribes here for state events
[plugin.http]
mount_prefix = "/whatsapp" # daemon mounts a proxy under this prefix
# requests get forwarded via broker as:
# plugin.<id>.http.<method>.<path-encoded>
Daemon iterates plugin_handles.iter().filter_map(|h| h.manifest().http.as_ref())
and mounts proxies generically.
Trade-off. The manifest schema enumerates known orchestration shapes, so adding a NEW shape (e.g. "websocket pairing" vs current "HTTP-poll-for-QR") requires extending the schema. That's a breaking change to the manifest contract, NOT to daemon code. Plugin authors get a deserialization error at manifest load time pointing at the missing field. Schema evolution is centralized + versioned.
Per-layer design
Layer 6 — outbound tools (already partially generic)
Today. NexoPlugin::register_outbound_tools(&registry) trait
method exists with default no-op. Plugins like
nexo-plugin-whatsapp override to call register_whatsapp_tools(&tools).
Daemon ALSO has hardcoded fallbacks (src/main.rs:5128, 6917)
gated on cfg.plugins.iter().any(|p| p == "whatsapp") — these
fire IN ADDITION TO the trait method, scoped by feature gate.
To close (Phase 81.32.c7.c). Remove hardcoded fallbacks once
canonical plugins ship a manifest declaring
[[plugin.tools.outbound]] per tool. Daemon iterates
plugin_handles[..].register_outbound_tools(&registry) only;
delete the fallback if plugin == "whatsapp" blocks.
Effort. ~3-4h. Touches main.rs boot loop + hot-spawn loop.
Plugin crates must publish a release with the manifest section
first — coordinate via release notes.
Layer 7 — pairing adapter (Phase 81.33.b.real)
Today. NexoPlugin::build_pairing_adapter(broker) trait method
exists with default None. Daemon hardcodes
build_known_pairing_registry() (src/main.rs:1651-1660) that
constructs whatsapp + telegram adapters by Rust type, both
cfg-gated.
To close. Pattern A. New manifest section
[plugin.pairing.adapter] (channel_id, broker_topic_prefix).
GenericBrokerPairingAdapter in nexo-pairing reads the manifest
and owns the cache.
SubprocessNexoPlugin::build_pairing_adapter() returns Some(Arc::new(GenericBrokerPairingAdapter::from_manifest(self.manifest(), broker))) when the manifest declares the section, else None.
Daemon build_known_pairing_registry becomes a loop:
for handle in &plugin_handles {
    if let Some(adapter) = handle.build_pairing_adapter(broker.clone()) {
        registry.register(adapter);
    }
}
Canonical plugins (whatsapp, telegram) ship next manifest revision adding the section + handle the broker RPCs in their subprocess. Until then daemon falls back to legacy hardcoded registrations (already cfg-gated).
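The opt-in behaviour of that loop rests on the trait method defaulting to None. The following is a reduced sketch under stated assumptions: the traits here are simplified stand-ins that reuse the memo's names (the real build_pairing_adapter takes a broker handle), and `GenericAdapter`, `PairingPlugin`, `ToolsOnlyPlugin` and `build_pairing_registry` are hypothetical.

```rust
use std::sync::Arc;

/// Simplified stand-in for the pairing adapter trait.
pub trait PairingChannelAdapter {
    fn channel_id(&self) -> &str;
}

/// Simplified stand-in for the plugin trait (broker argument omitted).
pub trait NexoPlugin {
    /// Default: the plugin declares no pairing surface.
    fn build_pairing_adapter(&self) -> Option<Arc<dyn PairingChannelAdapter>> {
        None
    }
}

struct GenericAdapter {
    channel_id: String,
}

impl PairingChannelAdapter for GenericAdapter {
    fn channel_id(&self) -> &str {
        &self.channel_id
    }
}

/// A plugin that opts in by overriding the default.
struct PairingPlugin(&'static str);

impl NexoPlugin for PairingPlugin {
    fn build_pairing_adapter(&self) -> Option<Arc<dyn PairingChannelAdapter>> {
        Some(Arc::new(GenericAdapter { channel_id: self.0.to_string() }))
    }
}

/// A plugin that keeps the default and is skipped by the loop.
struct ToolsOnlyPlugin;

impl NexoPlugin for ToolsOnlyPlugin {}

/// Generic registry build: no plugin names appear on the daemon side.
pub fn build_pairing_registry(
    handles: &[Box<dyn NexoPlugin>],
) -> Vec<Arc<dyn PairingChannelAdapter>> {
    handles.iter().filter_map(|h| h.build_pairing_adapter()).collect()
}
```

Plugins without a pairing flow need no code at all, which is what keeps the daemon loop free of per-plugin branches.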
Trade-off accepted. normalize_sender cache miss = one
broker round-trip per unique sender. Pairing flows are low
volume; cost is invisible in practice.
Effort. ~5h: manifest schema + adapter impl + subprocess plugin RPC handler stubs + integration test.
Layer 8 — HTTP routes
Today. Daemon run_health_server (src/main.rs:~15140+)
hardcodes /whatsapp/* route handler using
nexo_plugin_whatsapp::pairing::dispatch_route. Email and other
channels with HTTP needs would each add hardcoded blocks.
To close. Pattern B. New manifest section:
[plugin.http]
mount_prefix = "/whatsapp"
# daemon forwards every request under this prefix via broker
Daemon-side proxy: a single generic handle_plugin_http_route
function that matches request.path against registered
prefixes, then issues a broker RPC
plugin.<id>.http.request with serialized request bundle.
Plugin subprocess implements its own internal router under that
prefix.
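The prefix-matching step of such a proxy might look like the sketch below. `HttpMount`, `route` and `request_topic` are hypothetical names, and the longest-prefix, segment-boundary rule is an assumption; the memo only says the path is matched against registered prefixes and forwarded on plugin.<id>.http.request.

```rust
/// One [plugin.http] declaration, reduced to what routing needs.
pub struct HttpMount {
    pub plugin_id: String,
    pub mount_prefix: String,
}

/// Broker topic named in the memo for forwarded requests.
pub fn request_topic(plugin_id: &str) -> String {
    format!("plugin.{plugin_id}.http.request")
}

/// Pick the mount whose prefix matches `path` on a segment boundary,
/// preferring the longest prefix when several match.
pub fn route<'a>(mounts: &'a [HttpMount], path: &str) -> Option<&'a HttpMount> {
    mounts
        .iter()
        .filter(|m| {
            let p = m.mount_prefix.trim_end_matches('/');
            // Exact hit, or the remainder starts a new path segment.
            path == p || path.strip_prefix(p).map_or(false, |rest| rest.starts_with('/'))
        })
        .max_by_key(|m| m.mount_prefix.len())
}
```

Matching on a segment boundary keeps /whatsappx from being routed to a plugin mounted at /whatsapp.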
Trade-off. Every HTTP request to a plugin pays a broker
round-trip (~1-2ms local) + serialization. For human-facing
pages (pairing QR, OAuth callbacks) this is invisible. For
machine-to-machine high-throughput webhooks, consider whether
the plugin should listen on its own port directly (avoid the
proxy entirely) and only register a "I have a port" descriptor
for the dashboard. Add a mount_kind: "proxy" | "direct" knob
in the manifest section if needed.
Effort. ~6h: manifest schema + daemon proxy handler + broker RPC contract + subprocess router scaffolding + integration test (round-trip a pairing GET through the proxy).
Layer 9 — admin RPC commands
Today. Setup wizard's admin RPC dispatcher
(crates/setup/src/admin_bootstrap.rs:712) hardcodes
.with_wa_bot_handle(Arc::new(WhatsappBotHandle)). Only
whatsapp currently has plugin-specific admin commands but the
pattern extrapolates poorly.
To close. Pattern A (broker-RPC) for admin command dispatch. Manifest section:
[[plugin.admin.command]]
namespace = "whatsapp" # admin RPC method prefix
methods = ["pair_start", "pair_status", "pair_revoke", "bot_status"]
Daemon's admin dispatcher iterates registered plugin admin
namespaces; on admin.<namespace>.<method> call, forwards via
broker to plugin subprocess.
Removes WhatsappBotHandle typed integration entirely. Other
plugins (telegram bot-info, email account-info) auto-declare
their own admin namespaces.
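A namespace-based dispatcher along these lines could resolve admin.<namespace>.<method> calls generically. This is a sketch under assumptions: `AdminNamespaces` is a hypothetical type, and the forwarded topic shape plugin.<id>.admin.<method> is invented for illustration since the memo does not fix it.

```rust
use std::collections::HashMap;

/// namespace -> (owning plugin id, declared methods)
pub struct AdminNamespaces {
    by_namespace: HashMap<String, (String, Vec<String>)>,
}

impl AdminNamespaces {
    pub fn new() -> Self {
        Self { by_namespace: HashMap::new() }
    }

    /// Register one [[plugin.admin.command]] entry. A namespace can
    /// only be claimed once, so plugins cannot shadow each other.
    pub fn register(&mut self, plugin_id: &str, namespace: &str, methods: &[&str]) -> Result<(), String> {
        if self.by_namespace.contains_key(namespace) {
            return Err(format!("admin namespace '{namespace}' already registered"));
        }
        let methods = methods.iter().map(|m| m.to_string()).collect();
        self.by_namespace.insert(namespace.to_string(), (plugin_id.to_string(), methods));
        Ok(())
    }

    /// Resolve "admin.<namespace>.<method>" to a broker topic, or None
    /// when the namespace or method was never declared.
    /// The topic shape is hypothetical.
    pub fn resolve(&self, call: &str) -> Option<String> {
        let rest = call.strip_prefix("admin.")?;
        let (namespace, method) = rest.split_once('.')?;
        let (plugin_id, methods) = self.by_namespace.get(namespace)?;
        methods
            .iter()
            .any(|m| m.as_str() == method)
            .then(|| format!("plugin.{plugin_id}.admin.{method}"))
    }
}
```

Rejecting undeclared methods on the daemon side means a call never reaches the subprocess unless the manifest listed it.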
Effort. ~5h: manifest schema + admin dispatcher generic
routing + broker RPC contract + remove with_wa_bot_handle +
integration test.
Layer 10 — channel dashboard (Phase 93.10 polish)
Today. Phase 93.10 shipped ChannelDashboardSource trait
in nexo-setup with 3 hardcoded canonical impls. New canonical
channel = new impl in nexo-setup = framework code change.
To close. Pattern B. Move ChannelDashboardSource data
into manifest:
[plugin.dashboard]
auth_check_kind = "file_presence" # | "session_dir_with_files" | "broker_probe"
auth_check_args = { path = "telegram_bot_token.txt" }
multi_instance_layout = "single" # | "workspace_walk" | "broker_list"
Daemon-side generic interpreter reads the section + dispatches
to the matching auth-check / instance-discovery handler. For
shapes the interpreter doesn't recognise (rare), fall back to a
broker RPC plugin.<id>.dashboard.discover that the subprocess
implements.
Trade-off. Schema enumerates known auth-check + layout shapes. A 5th channel with a wholly new auth shape (e.g. OAuth-token-presence-with-refresh-due-check) requires extending the enumeration. This is the SAME trade-off as Pattern B elsewhere: schema evolution > framework code change.
Move the 3 canonical sources from nexo-setup to manifest
data on the canonical plugin crates (next release each).
Effort. ~4h: interpreter + manifest schema + migrate 3 canonical impls + integration test.
Layer 11 — metrics / health endpoints
Today. Daemon hardcodes /email/health, /metrics
whatsapp-instances JSON output, etc.
To close. Pattern B + Pattern A combined. Manifest declares which metrics surfaces a plugin owns:
[plugin.metrics]
prometheus = true # daemon scrapes plugin's broker RPC
health_endpoint = "/email/health" # exposed as proxy
/metrics aggregator on daemon already collects from registered
sources. Add a generic BrokerScrapeSource that issues
plugin.<id>.metrics.scrape per scrape interval, parses
Prometheus text response, merges into aggregate.
Trade-off. Per-scrape broker RPC cost (~1ms × number of plugins, ≤10ms total at typical scale). Cache-with-TTL if scrape is high-frequency.
Effort. ~4h.
Layer 12 — orchestration (Phase 93.5.d)
Today. Daemon hardcodes whatsapp orchestration in
src/main.rs (instance loop L3219+, tunnel auto-open L3833+,
pairing-state subscriber spawn L3608+).
To close. Pattern B with the orchestration schema:
[plugin.orchestration]
per_instance_state = true
inbound_state_topic = "plugin.inbound.whatsapp"
inbound_state_events = ["connected", "disconnected", "reconnecting", "qr"]
[plugin.orchestration.public_tunnel]
offer = true
mount_route = "/whatsapp/pair"
only_until_paired = true
Daemon iterates plugin_handles[..].manifest().orchestration and
runs the orchestration loop generically:
- Allocates per-instance state map (opaque Value indexed by instance label).
- Subscribes the broker bridge that mirrors inbound_state_events into the state map.
- Auto-opens public tunnel via nexo-tunnel-quick if offer = true and config allows.
State map is opaque from daemon's POV — it just stores JSON payloads keyed by instance. Plugin subprocess writes events with its own internal schema. HTTP layer (Layer 8) proxies queries into the state map.
Trade-off. State payloads are opaque JSON daemon-side. No
typed access; daemon can't enforce schema. Plugin contract is
"whatever you publish on inbound_state_topic is what callers
get back from /whatsapp/<inst>/status". Plugin authors test
the round-trip themselves.
This is the LARGEST single piece. Probably split:
- 12a — opaque state map + subscriber bridge (~5h)
- 12b — public tunnel auto-open generalised (~3h)
- 12c — remove whatsapp-specific blocks from daemon (~2h, after whatsapp ships orchestration manifest section)
Migration plan
Execution order matters because layers depend on each other:
- Stage 1: Layer 7 (pairing adapter). Closes Phase 81.33.b.real. Smallest deliverable. Validates Pattern A end-to-end with a real subprocess. ~5h. First.
- Stage 2: Layer 8 (HTTP routes). Unblocks the orchestration-tunnel work. The orchestration tunnel needs to know how plugins expose pairing pages; once HTTP-via-proxy is the contract, the tunnel just mounts the proxy prefix. ~6h.
- Stage 3: Layer 12a + 12b (orchestration core + tunnel). Closes Phase 93.5.d main mass. Depends on Layer 8. ~8h.
- Stage 4: Layer 9 (admin RPC). Orthogonal; can interleave with Stage 3. Removes with_wa_bot_handle typed path. ~5h.
- Stage 5: Layer 11 (metrics). Small, independent. Can ship anywhere. ~4h.
- Stage 6: Layer 10 (dashboard polish). Move sources from nexo-setup to manifest data. Last because plugin crates need 2 prior releases first (Pattern B precedent + Stage 1's manifest format). ~4h.
- Stage 7: Layer 6 cleanup + Layer 12c. Remove all remaining hardcoded plugin-name fallbacks from daemon (register_whatsapp_tools fallbacks, whatsapp orchestration block). Only after canonical plugin crates have shipped the manifest revisions for layers 1-6. ~3h.
Total: ~35h (~7 sessions of 5h each, more realistic than the optimistic earlier estimates).
Critical dependency. Each stage that needs a new manifest section blocks on a coordinated release of the 3 canonical plugin crates (whatsapp, telegram, email). The daemon ships fallbacks until the plugin manifest revisions are out. Plan plugin releases AHEAD of removing the daemon fallback.
Trade-offs we are explicitly accepting
| Layer | Trade-off |
|---|---|
| 7 — pairing | Broker RPC per unique sender. Cache after first sighting. ≤5ms one-time per pairing handshake. |
| 8 — HTTP | Broker round-trip per request. ≤2ms. Unacceptable for high-throughput webhooks — those keep direct ports. |
| 9 — admin RPC | Broker round-trip per admin command. ≤3ms. Admin commands are human-initiated, latency invisible. |
| 10 — dashboard | Schema enumerates auth-check + layout shapes. New shapes = schema extension, not framework code change. |
| 11 — metrics | Broker scrape per plugin per scrape interval. Cache-with-TTL if frequency is sub-second. |
| 12 — orchestration | State map daemon-side is opaque serde_json::Value. Plugin owns schema entirely. |
Open questions
- Trait async migration. Several trait methods are sync today (PairingChannelAdapter::normalize_sender, ChannelDashboardSource::discover). Generic broker-RPC dispatch needs async. Migrate trait to async or wrap with sync→async bridges? Lean: migrate to async, callers are already in async contexts.
- Plugin manifest schema version. Each new manifest section bumps an implicit schema version. Should we add an explicit nexo_manifest_version field that the daemon checks for forward compatibility? Lean: yes, add nexo_manifest_version = 2 in this design wave, daemon refuses to load v1 plugins after transition window.
- In-tree plugin migration. Email is still in-process (Phase 93.11 bucket D). Does it adopt the same manifest sections, or does in-process keep using direct trait dispatch? Lean: same manifest sections, but EmailPlugin overrides each build_pairing_adapter / mount_http / ... to return Rust impls directly. Subprocess plugins return generic adapters. Trait method is the unifying API.
- Hot-reload. OpenClaw supports plugin config hot-reload (reload.configPrefixes). Rust's static linking + subprocess model makes this harder. Lean: defer; each section's reload semantics get spec'd when the section ships. For now, config reload triggers subprocess restart of affected plugins.
- Plugin permission model. Once plugins can declare HTTP routes + admin commands + metrics endpoints, the daemon needs to enforce per-plugin permissions (a malicious plugin shouldn't register /admin/dangerous-thing). Lean: make a /plugins/<plugin_id>/ prefix mandatory on every plugin's declared routes. No plugin can mount at /admin or /health directly. Add the namespace constraint in this design wave.
Validation strategy
Each stage gets:
- Unit tests in the affected crate for the new types + interpreters.
- Integration test spinning up a real subprocess plugin declaring the new manifest section, exercising the round-trip via broker.
- Build matrix preservation: every stage keeps cargo build --no-default-features clean. Slim daemon does not need any plugin manifest section to compile.
- Documentation: docs/src/plugins/<section>.md per manifest section, covering what a plugin author needs to know.
Non-goals
- Hot-reload of compiled plugin binaries. Subprocess restart is the reload story.
- Wasm plugin runtime. Out of scope. If/when added, this manifest-driven contract is what Wasm modules speak.
- 3rd-party plugin distribution (registry, signing). Out of scope. Operator-managed paths only.
- Web UI auto-generation from manifest. Phase 83 microapp consumes the manifest for its own UI but the auto-discovery contract is daemon-side only.
Next session: brainstorm + spec + plan for Stage 1
Per the project's /forge flow, the actual execution begins
with /forge brainstorm 81.33.b.real → spec → plan → execute.
This memo is the architectural anchor that every brainstorm
must reference.
Update 2026-05-15 — Stages 1+2+4+5+6 + reference plugin shipped
Five of the seven pending stages closed in a single session:
- Stage 1 (pairing adapter) — PR #65.
- Stage 2 (HTTP routes) — PR #66.
- Stage 4 (admin RPC) — PR #67.
- Stage 5 (Prometheus metrics) — PR #68.
- Stage 6 (dashboard surface) — PR #69.
- Reference plugin demo + tests — PR #70.
Stage 3 (orchestration tunnel) skipped after re-evaluation: the
generic state-map / subscriber-bridge originally scoped became
redundant once Stage 2 routed HTTP through broker, and the
remaining tunnel auto-open is daemon-side polish that operators
can already trigger via nexo admin --tunnel. Stage 7 (cleanup
hardcoded fallbacks) deferred pending coordinated releases of
the 3 canonical plugin crates adopting the new manifest
sections — daemon-side legacy paths cannot be retired until
plugin-side migration ships.
Reference plugin. crates/test-fixtures/reference-plugin/
exercises every manifest section in one place. Pure-function
broker handlers (no I/O) so each contract is unit-testable
without spinning up a real subprocess. Operators / plugin
authors copy the crate as a starting template.
The user-visible auto-discovery goal is met today: any new plugin can declare the 5 manifest sections + ship broker handlers, and the daemon auto-discovers every capability with zero framework code change.
Cargo-install ergonomics (2026-05-16)
Stage 8 of auto-discovery: closing the last operator-side
friction. Before today, cargo install nexo-plugin-X deposited
a binary in ~/.cargo/bin/ but the daemon still required
the operator to edit config/plugins/discovery.yaml and add
the directory to search_paths. Out-of-the-box discovery was
empty.
The fix is two-part:
- PluginDiscoveryConfig::default() populates standard install paths. The defaults now expand to $HOME/.cargo/bin, $HOME/.local/share/nexo/plugins, and /usr/local/libexec/nexo/plugins. Missing dirs are tolerated (Warn diagnostic, walker continues) so a clean machine boots without errors. Operator-supplied paths append to the defaults rather than replacing them; supply an explicit empty search_paths: [] to opt out.
- Binary-mode discovery branch. When auto_detect_binaries is true (default), the walker also scans each search root's immediate children for executables whose filename matches nexo-plugin-<id> (.exe accepted on Windows). Each candidate is spawned with --print-manifest (2s timeout, killed on overshoot); stdout is parsed as TOML and treated as the plugin's manifest. The discovered binary path is stamped into manifest.plugin.entrypoint.command so the subprocess factory can spawn it directly; the manifest's own ./bin/<id> placeholder is ignored.
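The filename check in the binary-mode branch can be illustrated with a small helper. `plugin_id_from_filename` is a hypothetical function, and the accepted id character set (ASCII alphanumerics, '-' and '_') is an assumption that the text above does not specify.

```rust
/// Extract `<id>` from a `nexo-plugin-<id>` filename, or None when the
/// name does not follow the convention. `.exe` is stripped only when
/// `windows` is true.
pub fn plugin_id_from_filename(file_name: &str, windows: bool) -> Option<&str> {
    let stem = if windows {
        file_name.strip_suffix(".exe").unwrap_or(file_name)
    } else {
        file_name
    };
    let id = stem.strip_prefix("nexo-plugin-")?;
    // Assumed id alphabet; anything else (dots, spaces) is rejected.
    let valid = !id.is_empty()
        && id.chars().all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_');
    valid.then_some(id)
}
```

Only names that pass this check would be probed with --print-manifest, so unrelated executables in ~/.cargo/bin are never spawned.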
The SDK gains
nexo_microapp_sdk::plugin::print_manifest_if_requested.
Plugin authors call it as the first statement of main(); it
writes the bundled manifest to stdout and exits 0 when the flag
is present, otherwise returns normally. Two lines on the plugin
side, zero framework knowledge required.
Trust boundary. This opens the door to executing arbitrary
binaries during daemon boot. The trust root is whoever owns the
search-path directory (typically the operator's own
~/.cargo/bin). Operators in hardened environments can opt out
via discovery.auto_detect_binaries: false and pin discovery
back to filesystem-resident nexo-plugin.toml manifests only.
Limitations / deferred work.
- No probe-result cache. Every boot re-spawns each binary. With N=5 plugins and ~20ms-per-probe this is ~100ms total, under the noise floor of LLM-bound startup, so cache deferred. If cold-boot latency becomes a constraint, key by (path, mtime, size) and persist at <state_root>/plugin-discovery-cache.json.
- The nexo-plugin-<id> naming convention is the contract. Plugins that ship as awesome-channel (no prefix) will never be auto-detected. Documented in the plugin author guide.
- One probe failure (timeout / non-zero exit) does not block other plugins. The failed candidate is emitted as a ManifestParseError diagnostic and the walker continues.
Fault tolerance
Every external call goes through a CircuitBreaker. Every retryable
error has a bounded retry policy with jittered exponential backoff.
Every event survives a NATS outage. A second process cannot race the
first onto the same bus.
This page collects all of those guardrails in one place.
CircuitBreaker
Source: crates/resilience/src/lib.rs.
A three-state machine wrapped around any fallible external call. Once a dependency is failing, the breaker fails fast instead of piling up calls against a dead endpoint; periodic probes let it recover without human intervention.
stateDiagram-v2
[*] --> Closed
Closed --> Open: 5 consecutive failures
Open --> HalfOpen: backoff elapsed
HalfOpen --> Closed: 2 consecutive successes
HalfOpen --> Open: any failure<br/>(backoff × 2, capped)
Defaults
| Field | Default | Meaning |
|---|---|---|
| failure_threshold | 5 | consecutive failures before opening |
| success_threshold | 2 | consecutive successes in HalfOpen before closing |
| initial_backoff | 10 s | wait time on first open |
| max_backoff | 120 s | cap on exponential backoff |
Where it wraps
- LLM calls: one circuit per provider (MiniMax, Anthropic, OpenAI-compat, Gemini). A provider outage doesn't cascade to others.
- NATS publish: one circuit over the broker. When it opens the disk queue absorbs writes.
- CDP commands: one circuit per browser session. A dead Chrome doesn't freeze the agent loop.
- Extension stdio: implicit via the StdioRuntime lifecycle (crashed child → respawn, bounded).
Signals
CircuitBreaker exposes the usual methods (allow(), on_success(),
on_failure()) plus two explicit overrides:
- trip(): force Open from outside (e.g. a health check decided the dep is down before a call fails)
- reset(): force Closed (e.g. the operator just restored the dep and doesn't want to wait for the probe window)
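The state machine and defaults above can be modelled in a few dozen lines. This is an illustrative sketch, not the code in crates/resilience: `Breaker` is a hypothetical type, and passing `now` explicitly to allow(), on_failure() and trip() is an assumption made so the transitions can be tested without sleeping.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum State { Closed, Open, HalfOpen }

const FAILURE_THRESHOLD: u32 = 5;
const SUCCESS_THRESHOLD: u32 = 2;
const INITIAL_BACKOFF: Duration = Duration::from_secs(10);
const MAX_BACKOFF: Duration = Duration::from_secs(120);

pub struct Breaker {
    state: State,
    failures: u32,     // consecutive failures while Closed
    successes: u32,    // consecutive successes while HalfOpen
    backoff: Duration, // length of the current open window
    opened_at: Option<Instant>,
}

impl Breaker {
    pub fn new() -> Self {
        Self { state: State::Closed, failures: 0, successes: 0, backoff: INITIAL_BACKOFF, opened_at: None }
    }

    pub fn state(&self) -> State { self.state }

    /// May a call proceed? Moves Open -> HalfOpen once the backoff elapsed.
    pub fn allow(&mut self, now: Instant) -> bool {
        if let (State::Open, Some(t)) = (self.state, self.opened_at) {
            if now.duration_since(t) >= self.backoff {
                self.state = State::HalfOpen;
                self.successes = 0;
            }
        }
        self.state != State::Open
    }

    pub fn on_success(&mut self) {
        match self.state {
            State::Closed => self.failures = 0,
            State::HalfOpen => {
                self.successes += 1;
                if self.successes >= SUCCESS_THRESHOLD { self.reset(); }
            }
            State::Open => {}
        }
    }

    pub fn on_failure(&mut self, now: Instant) {
        match self.state {
            State::Closed => {
                self.failures += 1;
                if self.failures >= FAILURE_THRESHOLD { self.open(now, INITIAL_BACKOFF); }
            }
            // Any failure while probing re-opens with a doubled, capped window.
            State::HalfOpen => self.open(now, (self.backoff * 2).min(MAX_BACKOFF)),
            State::Open => {}
        }
    }

    /// Force Open from outside.
    pub fn trip(&mut self, now: Instant) { self.open(now, INITIAL_BACKOFF); }

    /// Force Closed and forget all counters.
    pub fn reset(&mut self) { *self = Self::new(); }

    fn open(&mut self, now: Instant, backoff: Duration) {
        self.state = State::Open;
        self.failures = 0;
        self.backoff = backoff;
        self.opened_at = Some(now);
    }
}
```

A caller would gate each external request on allow() and report the outcome with on_success() or on_failure(), so a dead dependency costs one rejected check instead of a timed-out call.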
Retry policies
Retries live at a layer above the circuit breaker — they handle transient failures (429, 5xx, network blips) that don't warrant flipping the breaker. Every retry policy uses jittered exponential backoff to avoid thundering-herd reconnection storms.
| Component | Max attempts | Backoff range |
|---|---|---|
| LLM 429 (rate limit) | 5 | 1 s → 60 s, jittered exponential |
| LLM 5xx (server error) | 3 | 1 s → 30 s, jittered exponential |
| NATS publish drain | 3 per event | disk queue drain cycle |
| CDP | via circuit only | backoff = circuit's open window |
These live in crates/llm/src/retry.rs (LLM) and
crates/broker/src/disk_queue.rs (NATS drain).
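As an illustration of jittered exponential backoff, the sketch below computes a capped exponential ceiling and scales it by a caller-supplied random fraction. The "full jitter" variant and the function names are assumptions; the text above does not say which jitter scheme crates/llm/src/retry.rs uses.

```rust
use std::time::Duration;

/// Exponential ceiling for a 0-based attempt: base * 2^attempt, capped.
pub fn backoff_ceiling(attempt: u32, base: Duration, cap: Duration) -> Duration {
    // Saturate instead of overflowing on very large attempt numbers.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).unwrap_or(cap).min(cap)
}

/// Full jitter: a delay in [0, ceiling]. `unit` is a random fraction in
/// [0.0, 1.0] supplied by the caller (kept as a parameter so the
/// function stays deterministic and dependency-free).
pub fn jittered(attempt: u32, base: Duration, cap: Duration, unit: f64) -> Duration {
    let unit = if unit.is_nan() { 0.0 } else { unit.clamp(0.0, 1.0) };
    backoff_ceiling(attempt, base, cap).mul_f64(unit)
}
```

With the LLM 429 row's range (1 s to 60 s), attempt 3 has an 8 s ceiling and attempt 6 onward is pinned at the 60 s cap; the random fraction spreads clients across that window so they do not retry in lockstep.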
Error classification
Retries only trigger on retryable errors. A 4xx other than 429 — missing key, invalid model, malformed request — fails fast. The rationale: retrying a misconfigured call wastes budget and still fails. Fail loudly, fix the config.
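The classification rule can be written as a single match on the HTTP status. `RetryClass` and `classify` are hypothetical names; the attempt counts are taken from the retry table, and the function is meant to be called only for error responses.

```rust
#[derive(Debug, PartialEq)]
pub enum RetryClass {
    /// 429: retry with the rate-limit policy.
    RateLimited { max_attempts: u32 },
    /// 5xx: retry with the server-error policy.
    ServerError { max_attempts: u32 },
    /// Any other error status: surface it immediately.
    FailFast,
}

/// Classify the status code of a failed LLM call.
pub fn classify(status: u16) -> RetryClass {
    match status {
        429 => RetryClass::RateLimited { max_attempts: 5 },
        500..=599 => RetryClass::ServerError { max_attempts: 3 },
        _ => RetryClass::FailFast,
    }
}
```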
No message drop
The broker layer guarantees at-least-once delivery for publishes that reach the runtime:
flowchart LR
P[publisher] --> TRY{NATS healthy?}
TRY -->|yes| NATS[(NATS)]
TRY -->|no| DQ[(disk queue)]
DQ --> WAIT{reconnect?}
WAIT -->|yes| DRAIN[drain FIFO]
DRAIN --> NATS
DQ -->|3 failed attempts| DLQ[(dead letters)]
DLQ --> CLI[agent dlq replay]
In the absolute worst case — NATS down forever, disk full — the disk queue starts shedding oldest events at its hard cap, but the producer never crashes and never silently drops.
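The queue behaviour in the diagram (FIFO drain, three attempts, dead letters, shedding at a hard cap) can be modelled in memory. This is a toy model under assumptions: `BoundedQueue` is a hypothetical type, the real queue in crates/broker/src/disk_queue.rs persists to disk, and counting one attempt per drain cycle is an interpretation of the retry table.

```rust
use std::collections::VecDeque;

const MAX_ATTEMPTS: u32 = 3;

pub struct BoundedQueue {
    cap: usize,
    pending: VecDeque<(String, u32)>, // (event, failed attempts so far)
    dead: Vec<String>,                // dead letters, kept for replay
    shed: u64,                        // events dropped at the hard cap
}

impl BoundedQueue {
    pub fn new(cap: usize) -> Self {
        assert!(cap > 0, "cap must be positive");
        Self { cap, pending: VecDeque::new(), dead: Vec::new(), shed: 0 }
    }

    /// Never fails: at the hard cap the oldest event is shed and counted.
    pub fn enqueue(&mut self, event: String) {
        if self.pending.len() >= self.cap && self.pending.pop_front().is_some() {
            self.shed += 1;
        }
        self.pending.push_back((event, 0));
    }

    /// One drain cycle in FIFO order. A failed head either stays at the
    /// front for the next cycle or, after MAX_ATTEMPTS failures, moves
    /// to the dead letters so it cannot block the events behind it.
    pub fn drain(&mut self, mut publish: impl FnMut(&str) -> bool) {
        while let Some((event, attempts)) = self.pending.pop_front() {
            if publish(&event) {
                continue;
            }
            if attempts + 1 >= MAX_ATTEMPTS {
                self.dead.push(event);
            } else {
                self.pending.push_front((event, attempts + 1));
                break;
            }
        }
    }
}
```

The shed counter is what makes the drop observable: an operator can alert on it even though the producer itself never sees an error.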
Single-instance lockfile
A second agent process pointed at the same data directory would
double-subscribe every topic, delivering every message twice. To
prevent that, boot acquires a lockfile and kicks out any stale or
racing instance.
Source: src/main.rs::acquire_single_instance_lock.
flowchart TD
START[agent boot] --> READ[read data/agent.lock]
READ --> EXIST{file exists?}
EXIST -->|no| WRITE[write our PID]
EXIST -->|yes| PID[parse PID]
PID --> ALIVE{/proc/PID/ exists?}
ALIVE -->|no| WRITE
ALIVE -->|yes| SIGTERM[send SIGTERM]
SIGTERM --> WAIT[wait up to 5 s<br/>50 × 100 ms polls]
WAIT --> DEAD{process gone?}
DEAD -->|yes| WRITE
DEAD -->|no| SIGKILL[send SIGKILL]
SIGKILL --> WRITE
WRITE --> LOCK[RAII handle alive]
The SingleInstanceLock RAII struct stores our own PID. On drop it
only removes the lockfile if the stored PID still matches the current
one — so a takeover by a third process doesn't let the original
owner wipe the lock on its way out.
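The drop-time ownership check might look like the following sketch. It reuses the SingleInstanceLock name from the text but is not the code in src/main.rs: the `acquire` signature is hypothetical, and the stale-PID probe and SIGTERM/SIGKILL takeover shown in the diagram are deliberately left out.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

pub struct SingleInstanceLock {
    path: PathBuf,
    pid: u32,
}

impl SingleInstanceLock {
    /// Write `pid` into the lockfile and return the RAII guard.
    /// (Liveness checks and takeover of an existing owner are omitted.)
    pub fn acquire(path: &Path, pid: u32) -> io::Result<Self> {
        fs::write(path, pid.to_string())?;
        Ok(Self { path: path.to_path_buf(), pid })
    }
}

impl Drop for SingleInstanceLock {
    fn drop(&mut self) {
        // Remove the file only if it still names us; after a takeover
        // the lock belongs to the newer process and must survive.
        let still_ours = fs::read_to_string(&self.path)
            .ok()
            .and_then(|s| s.trim().parse::<u32>().ok())
            == Some(self.pid);
        if still_ours {
            let _ = fs::remove_file(&self.path);
        }
    }
}
```

In the daemon the pid argument would be std::process::id(); it is a parameter here only so the takeover case can be exercised in a single process.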
Graceful shutdown
See Agent runtime — Graceful shutdown for the ordered teardown sequence. Key points from a fault-tolerance angle:
- Dream-sweep loops and MCP sessions get explicit grace windows so in-flight work doesn't produce partial state
- Plugin intake is stopped before agent runtimes — the runtimes drain anything already in their mailboxes before exiting
- If the disk queue has unflushed events on SIGTERM, they survive to the next boot
Operator guardrails
Beyond the automatic mechanisms:
- Skill gating: an extension declaring requires.env = ["FOO"] is skipped at discovery when FOO is unset, instead of being registered and failing on every invocation. See Extensions - manifest.
- Inbound filter: events with neither text nor media (receipts, typing indicators, reactions-only) are dropped before they reach the LLM, saving cost and avoiding noisy turns.
- Health endpoints: :8080/ready and :8080/live expose lifecycle state for k8s liveness / readiness probes.
- Metrics: :9090/metrics (Prometheus) exposes everything from inbound event counts to circuit breaker state; see Metrics.
Transcripts (FTS + redaction)
Per-session JSONL transcripts under agents.<id>.transcripts_dir are
the canonical record of every turn. Two optional layers wrap that
record:
- FTS5 index: a SQLite virtual table that mirrors transcript content for MATCH queries. Backs the session_logs tool's search action when present.
- Redaction: a regex pre-processor that rewrites entry content before it ever reaches disk. Patterns target common credentials and home-directory paths.
Source: crates/core/src/agent/transcripts_index.rs,
crates/core/src/agent/redaction.rs,
crates/core/src/agent/transcripts.rs.
Configuration
config/transcripts.yaml (optional; absent → defaults below):
fts:
enabled: true # default
db_path: ./data/transcripts.db # default
redaction:
enabled: false # default — opt in
use_builtins: true # only relevant if enabled
extra_patterns:
- { regex: "TENANT-[0-9]+", label: "tenant_id" }
JSONL is the source of truth. The FTS index is derivable; if the DB
is corrupted or deleted, agent transcripts reindex (planned) can
rebuild it from disk.
FTS schema
CREATE VIRTUAL TABLE transcripts_fts USING fts5(
content,
agent_id UNINDEXED,
session_id UNINDEXED,
timestamp_unix UNINDEXED,
role UNINDEXED,
source_plugin UNINDEXED,
tokenize = 'unicode61 remove_diacritics 2'
);
The DB is shared across agents; isolation is enforced at query time
by WHERE agent_id = ?. User queries are escaped as a single FTS5
phrase so operators (OR, NOT, :) in the user input never reach
the engine as syntax.
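Phrase escaping of this kind is short enough to show in full. The helper below is a sketch with a hypothetical name (fts5_phrase), not the function in transcripts_index.rs; it relies on the FTS5 rule that a double-quoted string is a phrase and that an embedded double quote is written by doubling it.

```rust
/// Wrap raw user input as a single FTS5 phrase.
/// Inside the quotes, OR / NOT / ':' are plain text, not query syntax.
pub fn fts5_phrase(user_query: &str) -> String {
    format!("\"{}\"", user_query.replace('"', "\"\""))
}
```

The result is then bound as the MATCH parameter alongside the agent_id filter, so the query text itself never contains user input.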
session_logs integration
When the index is available, the search action returns:
{
"ok": true,
"query": "reembolso",
"backend": "fts5",
"count": 3,
"hits": [
{
"session_id": "…",
"timestamp": "2026-04-25T18:00:00Z",
"role": "user",
"source_plugin": "wa",
"preview": "...quería un [reembolso] del pedido..."
}
]
}
If the index is None (FTS disabled or init failed), the action
falls back to the legacy substring scan over JSONL. The shape is the
same minus backend: "fts5".
Redaction patterns
| Label | Detects | Example match |
|---|---|---|
| bearer_jwt | Bearer eyJ… JWT triplets | Bearer eyJhbGc.eyJzdWI.dGVzdA |
| anthropic_key | Anthropic API keys | sk-ant-abcdef… |
| openai_key | sk- prefix API keys (OpenAI etc.) | sk-abc123… |
| aws_access_key | AWS access key id | AKIAIOSFODNN7EXAMPLE |
| hex_token_32 | Long hex strings | 5d41402abc4b2a76b9719d911017c592 |
| home_path | Linux/macOS home dirs | /home/familia, /Users/alice |
Each match is replaced with [REDACTED:<label>]. Patterns run in the
order above, so more specific shapes (Bearer JWT, Anthropic) win over
generic catch-alls below.
A 40-char base64 pattern targeting AWS secret keys was deliberately
omitted — it produces too many false positives on legitimate hashes
and opaque ids. Operators who need it can add it scoped via
extra_patterns.
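Why ordering matters is easiest to see with two overlapping rules. The sketch below is a simplification: the real redactor uses regexes, while this version matches whitespace-delimited tokens with plain predicates so it needs no external crate. Rule, builtin_rules, and redact are hypothetical names, and only three of the six built-ins are modeled.

```rust
/// One redaction rule: a label plus a predicate over a single token.
pub struct Rule {
    pub label: &'static str,
    pub matches: fn(&str) -> bool,
}

pub fn is_anthropic_key(tok: &str) -> bool {
    tok.starts_with("sk-ant-")
}

pub fn is_sk_key(tok: &str) -> bool {
    tok.starts_with("sk-")
}

pub fn is_aws_access_key(tok: &str) -> bool {
    tok.len() == 20
        && tok.starts_with("AKIA")
        && tok.chars().all(|c| c.is_ascii_uppercase() || c.is_ascii_digit())
}

/// Specific shapes first, generic catch-alls after.
pub fn builtin_rules() -> Vec<Rule> {
    vec![
        Rule { label: "anthropic_key", matches: is_anthropic_key },
        Rule { label: "openai_key", matches: is_sk_key },
        Rule { label: "aws_access_key", matches: is_aws_access_key },
    ]
}

/// Replace each token with the label of the first rule that matches it.
pub fn redact(text: &str, rules: &[Rule]) -> String {
    text.split(' ')
        .map(|tok| match rules.iter().find(|r| (r.matches)(tok)) {
            Some(rule) => format!("[REDACTED:{}]", rule.label),
            None => tok.to_string(),
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

With the rules reversed, an sk-ant- key would be labeled openai_key, because the generic sk- prefix also matches it.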
Custom patterns
redaction:
enabled: true
extra_patterns:
- { regex: "TENANT-[0-9]+", label: "tenant_id" }
- { regex: "internal\\.acme", label: "internal_host" }
Custom patterns run after built-ins. Invalid regex aborts boot with a message naming the offending index and label.
What redaction does not do
- It does not maintain a reverse map. Once content is redacted on disk the original is gone — by design. A reversible mapping would recreate the leak surface this feature is meant to close.
- It does not rewrite previously-written JSONL files. New entries redact going forward; historical content stays as-is.
- It does not redact tracing logs; that's a separate concern.
- The FTS index stores the redacted text, so search results never surface the original secrets either.
Operational notes
- The FTS index uses WAL journaling and a capped pool size of 4, the same idiom as the long-term memory DB.
- Insert is best-effort. If an FTS write fails (disk full, lock contention) the tool logs at warn and the JSONL append still succeeds. The source of truth is never compromised.
- Boot logs include "transcripts FTS index ready" (or the warn that it fell back) and "transcripts redaction active" when the redactor has any rule loaded.
nexo-rs vs OpenClaw
OpenClaw is the closest reference point in the multi-channel-agent-gateway space. nexo-rs mined OpenClaw's plugin SDK, channel boundaries, and skills layout for ideas, then rebuilt the runtime in Rust with stricter operational guarantees. This page lays out the differences honestly — including where OpenClaw still has the edge.
Substrate
| Dimension | OpenClaw | nexo-rs |
|---|---|---|
| Language | TypeScript | Rust |
| Runtime | Node 22+ | none — single statically-linked binary |
| Install footprint | pnpm install over ~42 runtime deps + 24 dev deps | one binary, ~90 MB built; ~15 MB to download (xz tarball), ~18 MB .deb, ~25 MB .rpm |
| Cold-start | node boot + module resolution | direct exec — sub-100ms to agent serve |
| Mobile target | feasible with Termux + Node | first-class on Termux, no root, no Docker |
| Memory safety | runtime errors | Rust ownership: data races, use-after-free, null deref refused at compile |
The single-binary shape is the reason nexo-rs runs comfortably on a
phone (Termux) and on a fresh VPS without a Node ecosystem
underneath. cargo build --release and ship target/release/agent
— that is the whole deliverable.
Process & messaging
| Dimension | OpenClaw | nexo-rs |
|---|---|---|
| Process model | single Node process | multi-process via NATS, in-process LocalBroker fallback when NATS is offline |
| Subject namespace | n/a (in-process buses) | plugin.inbound.<plugin>[.instance] / plugin.outbound.… / agent.route.<id> / taskflow.resume |
| Fault tolerance | best-effort | NatsBroker wraps every publish in a CircuitBreaker; failures spill to a SQLite-backed disk queue and drain on reconnect |
| At-least-once delivery | n/a | drain path documented as at-least-once; consumers dedupe by event.id |
| DLQ | n/a | failed events land in dead_letters after 3 attempts; agent dlq list/replay/purge from the CLI |
| Subscription survival | restart | NATS subscriptions auto-resubscribe on reconnect with backoff (250 ms → 10 s) |
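The resubscribe backoff in the last row (250 ms → 10 s) could follow a schedule like the one below. Only the two endpoints come from the table; the doubling factor and the function name resubscribe_delay are assumptions made for this sketch.

```rust
use std::time::Duration;

/// Delay before resubscribe attempt `attempt` (0-based).
/// Assumption: the delay doubles each attempt and is clamped at 10 s.
pub fn resubscribe_delay(attempt: u32) -> Duration {
    const BASE_MS: u64 = 250;
    const MAX_MS: u64 = 10_000;
    // checked_shl returns None for shifts of 64 or more, so large attempt
    // counts saturate instead of overflowing.
    let factor = 1u64.checked_shl(attempt).unwrap_or(u64::MAX);
    Duration::from_millis(BASE_MS.saturating_mul(factor).min(MAX_MS))
}
```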
Hot reload
| Dimension | OpenClaw | nexo-rs |
|---|---|---|
| Config change | restart | agent reload (or file-watcher trigger) swaps a RuntimeSnapshot via ArcSwap — in-flight turns finish on the old snapshot, the next event picks up the new one |
| Watched files | — | agents.yaml, agents.d/*.yaml, llm.yaml (extra paths via runtime.yaml) |
| Per-agent reload channel | — | mpsc to each AgentRuntime, the coordinator drains acks to confirm |
Per-agent capability sandbox
OpenClaw's plugin allowlist is global to the gateway. nexo-rs pushes the allowlist down to the agent and the binding (the inbound channel surface):
agents:
- id: kate
plugins: [whatsapp, telegram, browser, taskflow]
allowed_tools: ["whatsapp_*", "browser_navigate", "memory_*"]
outbound_allowlist:
whatsapp: ["+57…"]
telegram: [123456789]
skill_overrides:
ffmpeg-tools: warn
accept_delegates_from: ["ana"]
inbound_bindings:
- plugin: whatsapp
instance: kate_wa
# per-binding overrides for the same agent
allowed_tools: ["whatsapp_*"]
outbound_allowlist:
whatsapp: ["+57…"]
What that buys:
- An LLM running under kate cannot send messages to a number not in outbound_allowlist, even if a prompt injection asks it to.
- Two channels exposed to the same agent (sales WA, private TG) carry different capability surfaces: the sales binding doesn't get the private one's tool set.
- Skill modes (strict / warn / disable) are decided per agent, with explicit requires.bin_versions semver constraints (probed at boot, process-cached).
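To make the allowlist semantics concrete, here is a minimal sketch of how patterns such as whatsapp_* might be checked. It rests on two assumptions not confirmed by the config above: a trailing * is the only wildcard form, and a binding-level allowed_tools list replaces the agent-level list rather than merging with it. All function names are hypothetical.

```rust
/// True when `pattern` matches `tool`. A trailing '*' matches any suffix.
pub fn pattern_matches(pattern: &str, tool: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => tool.starts_with(prefix),
        None => pattern == tool,
    }
}

/// True when any pattern in the allowlist admits the tool.
pub fn is_tool_allowed(allowed: &[&str], tool: &str) -> bool {
    allowed.iter().any(|p| pattern_matches(p, tool))
}

/// Assumption: a binding override, when present, replaces the agent list.
pub fn effective_allowlist<'a>(
    agent: &'a [&'a str],
    binding: Option<&'a [&'a str]>,
) -> &'a [&'a str] {
    binding.unwrap_or(agent)
}
```

Under these assumptions the kate_wa binding from the example would admit whatsapp_* tools only, even though the agent as a whole also lists memory_* and browser_navigate.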
Secrets
| Dimension | OpenClaw | nexo-rs |
|---|---|---|
| Credential resolution | env vars | agents.<id>.credentials block per channel; resolver maps to per-channel stores (gauntlet validates at boot) |
| 1Password | n/a | op CLI extension + inject_template tool: render {{ op://Vault/Item/field }} and pipe to allowlisted commands without exposing the secret |
| Audit log | n/a | append-only JSONL at OP_AUDIT_LOG_PATH: every read_secret and inject_template records agent_id, session_id, fingerprint, reveal_allowed — never the value |
| Capability inventory | n/a | agent doctor capabilities [--json] enumerates every write/reveal env toggle (OP_ALLOW_REVEAL, CLOUDFLARE_*, DOCKER_API_*, PROXMOX_*, SSH_EXEC_*) with state + risk |
Transcripts
OpenClaw stores transcripts as JSONL and greps them. nexo-rs
keeps the JSONL (source of truth) and adds:
- SQLite FTS5 index (data/transcripts.db): write-through from TranscriptWriter::append_entry. The session_logs search agent tool uses MATCH queries with phrase-escaped user input so operator strings can't inject FTS operators.
- Pre-persistence redactor (opt-in): regex pass over content before write. 6 built-in patterns (Bearer JWT, sk-…, sk-ant-…, AWS access keys, long hex tokens, home paths) plus operator-defined extra_patterns. JSONL and FTS receive the same redacted text.
- Atomic header writes: OpenOptions::create_new(true) so 16 concurrent first-appends to the same session result in exactly one header line.
Durable workflows
OpenClaw doesn't ship a durable-flow primitive. nexo-rs has TaskFlow:
- taskflow LLM tool with actions start | status | advance | wait | finish | fail | cancel | list_mine.
- Three wait conditions: Timer { at }, ExternalEvent { topic, correlation_id }, Manual.
- Single global WaitEngine ticks every 5 s (configurable), resumes flows whose deadlines have passed.
- taskflow.resume NATS subject lets external services wake external_event flows: publish {flow_id, topic, correlation_id, payload} and the bridge calls try_resume_external.
- agent flow list/show/cancel/resume from the CLI.
- Guardrails: timer_max_horizon (default 30 days) blocks unbounded waits; non-empty topic + correlation_id required for external_event.
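The wait conditions and their guardrails map naturally onto an enum plus a validation function. The sketch below is an illustration under assumptions: the variant names follow the list above, but the field types (Unix seconds for at), the WaitError type, and the function names are hypothetical rather than the TaskFlow API.

```rust
use std::time::Duration;

#[derive(Debug, Clone, PartialEq)]
pub enum WaitCondition {
    /// Resume once `at` (seconds since the Unix epoch) has passed.
    Timer { at: u64 },
    ExternalEvent { topic: String, correlation_id: String },
    Manual,
}

#[derive(Debug, PartialEq)]
pub enum WaitError {
    HorizonExceeded,
    EmptyTopic,
    EmptyCorrelationId,
}

/// Default guardrail from the text: 30 days.
pub const TIMER_MAX_HORIZON: Duration = Duration::from_secs(30 * 24 * 60 * 60);

/// Reject waits that violate the guardrails before they are persisted.
pub fn validate_wait(
    cond: &WaitCondition,
    now: u64,
    max_horizon: Duration,
) -> Result<(), WaitError> {
    match cond {
        WaitCondition::Timer { at } => {
            if at.saturating_sub(now) > max_horizon.as_secs() {
                Err(WaitError::HorizonExceeded)
            } else {
                Ok(())
            }
        }
        WaitCondition::ExternalEvent { topic, correlation_id } => {
            if topic.is_empty() {
                Err(WaitError::EmptyTopic)
            } else if correlation_id.is_empty() {
                Err(WaitError::EmptyCorrelationId)
            } else {
                Ok(())
            }
        }
        WaitCondition::Manual => Ok(()),
    }
}

/// What a periodic tick would check for timer waits.
pub fn timer_is_due(cond: &WaitCondition, now: u64) -> bool {
    matches!(cond, WaitCondition::Timer { at } if *at <= now)
}
```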
LLM auth
| Dimension | OpenClaw | nexo-rs |
|---|---|---|
| Anthropic | API key | API key and claude_subscription OAuth PKCE flow — uses the operator's Claude Code subscription quota instead of API billing |
| MiniMax | API key | API key and Token Plan / Coding Plan OAuth bundle (api_flavor: anthropic_messages) |
| OpenAI-compat | API key | API key + DeepSeek wired out of the box (OpenAI-compat reuse) |
| Gemini | not in core | first-class client |
MCP
OpenClaw supports MCP as a client. nexo-rs is both:
- Client: stdio and HTTP transports, full tool / resource / prompt catalog, tools/list_changed hot-reload.
- Server: agent mcp-server exposes the agent's own tools (filtered by allowlist) over stdio for Claude Desktop / Cursor / any MCP-aware host. Proxy tools (ext_*, mcp_*) are unconditionally hidden so the agent doesn't become an open relay.
Build size
target/release/nexo ~90 MB (built binary)
nexo-rs-<target>.tar.xz ~12-16 MB (release download, xz -9)
nexo-rs_<ver>_<arch>.deb ~14-18 MB
nexo-rs-<ver>-1.<arch>.rpm ~20-25 MB
The binary has grown from ~34 MB at v0.1.0 as the feature surface
expanded (whisper STT, sqlite-vec, embedded config templates, CDP,
the driver subsystem, …). What you actually fetch is the compressed
artifact — ~15 MB for the musl tarball. For comparison, an OpenClaw
install (Node + node_modules after pnpm install) sits in the
hundreds of megabytes — most of it needed at runtime, not just
build-time.
Where OpenClaw is still ahead
Honest list:
- Installer & onboarding flow: OpenClaw's openclaw doctor family and the bundled installer give a smoother first-run UX than nexo-rs's agent setup wizard, especially for non-Rust developers.
- TS familiarity: the JS / TS audience for plugin authors is larger than the Rust audience; if your team writes mostly TypeScript, contributing back to OpenClaw is faster.
- Track record: OpenClaw has a longer release history, more maintainers, and more shipped extensions in the wild.
- Apps surface: OpenClaw ships iOS / Android / macOS companion apps; nexo-rs only ships the daemon and the loopback web admin (admin-ui Phase A0–A11 still in progress).
Summary
If you want operational guarantees (single binary, fault-tolerant broker, per-agent sandbox, durable workflows, secrets audit) and you're OK with Rust, nexo-rs.
If you want fast onboarding, a TS plugin ecosystem, and the OpenClaw apps, OpenClaw.
The two projects share enough vocabulary that moving an extension
between them is mostly a port, not a rewrite. The plugin SDK
shape (stdio-spoken JSON-RPC + a plugin.toml manifest) is
deliberately compatible.
Driver subsystem (Phase 67)
The driver subsystem turns the nexo-rs agent runtime into the
"human in the loop" for another agent — typically the Claude Code
CLI. It runs a goal-bound experiment: spawn the external CLI, watch
its tool-use stream, decide allow/deny on every action, feed back
acceptance failures, and stop only when the CLI claims "done" AND
objective verification passes.
This page describes the architectural shape; concrete impl details live with each sub-phase.
Why
Claude Code (or any other local CLI agent) is excellent at writing code, but it sometimes:
- over-claims completion — says "done" when tests are red;
- proposes destructive shell commands when stuck;
- forgets which approaches it already tried and failed.
A second agent — driven by nexo-rs, backed by a different LLM
(MiniMax M2.5), with persistent memory — closes those gaps.
Architecture
nexo-rs daemon
│
├─ "claude-driver" agent
│ ├─ LLM: MiniMax M2.5
│ ├─ memory: short_term + long_term + vector + transcripts
│ └─ skills: claude_cli, git_checkpoint, test_runner,
│ acceptance_eval, escalate
│
└─ MCP server (in-process)
└─ tool: permission_prompt(tool_name, input) → {allow|deny, message}
claude (subprocess, one per turn)
└─ claude --resume <id>
--output-format stream-json
--permission-prompt-tool mcp__nexo-driver__permission_prompt
--add-dir <worktree>
--allowedTools "Read,Grep,Glob,LS,WebFetch"
-p "<turn prompt>"
Termination model
Claude says "done" — driver does NOT trust it. Driver runs the goal's
acceptance criteria (cargo build, cargo test, cargo clippy,
PHASES marker, custom verifiers). Only when all pass is the goal
declared Done. Otherwise the failures are folded into the next
turn's prompt: "you said done, but here's what still fails — fix it".
The driver also stops on budget exhaustion: max turns, wall-time, tokens, or consecutive denies. On exhaustion the driver escalates to the operator (WhatsApp / Telegram via existing channel plugins) with a state dump.
Foundational types — nexo-driver-types
The contract — AgentHarness trait + Goal / Attempt / Decision
/ AcceptanceCriterion / BudgetGuards types — lives in the leaf
crate nexo-driver-types. Every value is serde-serializable so the
contract can travel through NATS, get re-imported by extensions, and
power admin-ui dashboards without dragging in the daemon.
How a turn flows (Phase 67.1)
use std::time::Duration;

use nexo_driver_claude::{spawn_turn, ClaudeCommand};
use nexo_driver_types::CancellationToken;

async fn doc(session_id: String) -> anyhow::Result<()> {
    // Build the CLI invocation for one turn, resuming an existing session.
    let cmd = ClaudeCommand::discover("Implement Phase 26.z")?
        .resume(session_id)
        .allowed_tools(["Read", "Grep", "Glob", "LS"])
        .permission_prompt_tool("mcp__nexo-driver__permission_prompt")
        .cwd("/tmp/claude-runs/26-z");
    let cancel = CancellationToken::new();
    let mut turn = spawn_turn(
        cmd,
        &cancel,
        Duration::from_secs(600),
        Duration::from_secs(1),
    )
    .await?;
    while let Some(ev) = turn.next_event().await? {
        // dispatch on ev (Assistant tool_use → permission_prompt; Result → done check)
        let _ = ev;
    }
    // Always shut down explicitly; Drop is only the panic safety net.
    let _exit = turn.shutdown().await?;
    Ok(())
}
next_event cooperatively races three signals via tokio::select!:
the cancel token, the per-turn deadline, and the JSONL stream. Errors
land as Cancelled, Timeout, ParseLine, etc. Cleanup is always
shutdown() — ChildHandle::Drop is the panic safety net.
Persistence (Phase 67.2)
SqliteBindingStore keeps (goal_id → claude session_id) plus
timestamps in a single claude_session_bindings table. Two filters
are applied on get:
- idle TTL: last_active_at must be within idle_ttl of now;
- max age: created_at + max_age must be in the future.
Either filter can be None (no filter); Duration::ZERO is treated as an alias for None.
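The two filters reduce to a small predicate. The sketch below is not SqliteBindingStore code: binding_is_live is a hypothetical name, timestamps are taken as Unix seconds for simplicity, and whether the idle boundary is inclusive is an assumption.

```rust
use std::time::Duration;

/// Treat Duration::ZERO the same as None: no filter.
fn normalize(d: Option<Duration>) -> Option<Duration> {
    d.filter(|d| !d.is_zero())
}

/// Decide whether a stored binding may still be returned by `get`.
/// All timestamps are seconds since the Unix epoch.
pub fn binding_is_live(
    created_at: u64,
    last_active_at: u64,
    now: u64,
    idle_ttl: Option<Duration>,
    max_age: Option<Duration>,
) -> bool {
    // idle TTL: last activity must be recent enough.
    let idle_ok = match normalize(idle_ttl) {
        Some(ttl) => now.saturating_sub(last_active_at) <= ttl.as_secs(),
        None => true,
    };
    // max age: created_at + max_age must still be in the future.
    let age_ok = match normalize(max_age) {
        Some(age) => created_at.saturating_add(age.as_secs()) > now,
        None => true,
    };
    idle_ok && age_ok
}
```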
Three soft-delete-friendly operations live alongside clear:
- mark_invalid(goal_id) flips last_session_invalid = 1 instead of deleting the row. Phase 67.8 (replay-policy) calls this when Claude rejects a session id mid-turn; the row stays for forensics.
- touch(goal_id) bumps last_active_at only. The driver loop calls it per observed event so the idle filter doesn't need a structural upsert per turn.
- purge_older_than(cutoff) reaps rows the operator no longer cares about. Phase 67.6 (worktree janitor) calls it nightly.
Schema migrations: PRAGMA user_version = 1 is the sentinel; every
open() runs CREATE TABLE/INDEX IF NOT EXISTS. Future v2 will
extend that helper.
Permission flow (Phase 67.3)
Every Claude tool call that isn't on the static allowlist
(Read,Grep,Glob,LS,WebFetch) goes through the MCP server before
execution:
Claude Code ─── tools/call mcp__nexo-driver__permission_prompt ───▶
│
stdio JSON-RPC
│
▼
nexo-driver-permission-mcp (child)
│
calls PermissionDecider
│
▼
{behavior: allow|deny, ...}
PermissionMcpServer exposes one tool, permission_prompt. The
in-process AllowSession cache keyed on (tool_name, hash(input))
short-circuits repeat calls (a Claude turn that re-reads the same
file pays the decider once).
Outcomes Claude receives are always one of two shapes:
{ "behavior": "allow" } // optional updatedInput
{ "behavior": "deny", "message": "..." }
Internally the driver tracks five outcomes — AllowOnce,
AllowSession{scope}, Deny, Unavailable, Cancelled — collapsing
the last three to deny on the wire. Unavailable (timeout) is
fail-closed by design.
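The collapse from five internal outcomes to two wire shapes can be written as a single match. This is a sketch: the variant names follow the text, but the payload fields and the deny messages used for Unavailable and Cancelled are assumptions, and WireResponse is a hypothetical type standing in for the JSON body.

```rust
/// Internal decision outcomes tracked by the driver.
#[derive(Debug, Clone, PartialEq)]
pub enum PermissionOutcome {
    AllowOnce,
    AllowSession { scope: String },
    Deny { message: String },
    Unavailable,
    Cancelled,
}

/// The only two shapes Claude ever sees.
#[derive(Debug, PartialEq)]
pub enum WireResponse {
    Allow,
    Deny { message: String },
}

/// Collapse internal outcomes to the wire. Anything that is not an
/// explicit allow becomes a deny, so a timeout fails closed.
pub fn to_wire(outcome: PermissionOutcome) -> WireResponse {
    match outcome {
        PermissionOutcome::AllowOnce | PermissionOutcome::AllowSession { .. } => {
            WireResponse::Allow
        }
        PermissionOutcome::Deny { message } => WireResponse::Deny { message },
        // Assumed messages; the real wording is not given in the text.
        PermissionOutcome::Unavailable => WireResponse::Deny {
            message: "permission decider unavailable".to_string(),
        },
        PermissionOutcome::Cancelled => WireResponse::Deny {
            message: "turn cancelled".to_string(),
        },
    }
}
```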
Phase 67.3 ships the bin in placeholder modes (--allow-all for dev,
--deny-all <reason> for shadow). Phase 67.4 will swap those flags
for --socket <path> so the bin asks the daemon's LlmDecider
(MiniMax + memory) for each decision.
Goal lifecycle (Phase 67.4)
nexo-driver run goal.yaml
│
▼
DriverOrchestrator::run_goal
│
├─ workspace_manager.ensure(&goal) ─┐
│ │
├─ write_mcp_config(workspace, ├─ side-effects in
│ bin_path, socket_path) │ <workspace>/
│ │
├─ DriverSocketServer (already running) ──┘
│ spawned by builder, owned via JoinHandle
│
└─ for each turn:
├─ budget.is_exhausted? → BudgetExhausted{axis}
├─ AttemptStarted event
├─ run_attempt(ctx, params)
│ spawn `claude --resume <id> ... --mcp-config ...`
│ event-loop on stream-json
│ binding_store.upsert(session_id)
│ acceptance.evaluate(criteria, workspace)
│ return AttemptResult { outcome }
├─ AttemptCompleted event
└─ match outcome:
Done → break, GoalCompleted{Done}
NeedsRetry{f} → next turn with prior_failures
Continue{...} → next turn (e.g. session-invalid retry)
Cancelled → break
BudgetExhausted → break
Escalate{r} → emit Escalate event, break
AttemptOutcome::Continue covers two cases the loop treats the same:
the stream ended without Result::Success (Claude crashed early),
and a session not found reply that triggered
binding_store.mark_invalid so the next turn starts fresh.
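Stripped of events, workspaces, and I/O, the per-turn loop is a small state machine. The sketch below models only that control flow: run_goal, GoalEnd, and the closure signature are hypothetical, the budget is reduced to a turn count, and carrying prior failures unchanged across a Continue turn is an assumption.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum AttemptOutcome {
    Done,
    NeedsRetry { failures: Vec<String> },
    Continue,
    Cancelled,
    BudgetExhausted,
    Escalate { reason: String },
}

#[derive(Debug, PartialEq)]
pub enum GoalEnd {
    Done,
    Cancelled,
    BudgetExhausted,
    Escalated(String),
}

/// Run attempts until a terminal outcome or the turn budget is spent.
/// `run_attempt` receives the turn index and the failures from the last
/// acceptance run. Returns the terminal state and the turns consumed.
pub fn run_goal(
    max_turns: u32,
    mut run_attempt: impl FnMut(u32, &[String]) -> AttemptOutcome,
) -> (GoalEnd, u32) {
    let mut prior_failures: Vec<String> = Vec::new();
    let mut turn = 0;
    loop {
        // Budget check happens before each attempt starts.
        if turn >= max_turns {
            return (GoalEnd::BudgetExhausted, turn);
        }
        let outcome = run_attempt(turn, &prior_failures);
        turn += 1;
        match outcome {
            AttemptOutcome::Done => return (GoalEnd::Done, turn),
            // Acceptance failed: fold the failures into the next prompt.
            AttemptOutcome::NeedsRetry { failures } => prior_failures = failures,
            // Early crash or invalid session: just try again.
            AttemptOutcome::Continue => {}
            AttemptOutcome::Cancelled => return (GoalEnd::Cancelled, turn),
            AttemptOutcome::BudgetExhausted => return (GoalEnd::BudgetExhausted, turn),
            AttemptOutcome::Escalate { reason } => return (GoalEnd::Escalated(reason), turn),
        }
    }
}
```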
NATS subjects emitted (when feature = "nats" and
emit_nats_events: true):
- agent.driver.goal.{started,completed}
- agent.driver.attempt.{started,completed}
- agent.driver.decision (Phase 67.7 will populate when LlmDecider records its rationale)
- agent.driver.acceptance
- agent.driver.budget.exhausted
- agent.driver.escalate
- agent.driver.replay (Phase 67.8: replay-policy verdict)
- agent.driver.compact (Phase 67.9: compact-policy scheduled a /compact <focus> turn)
Compact policy (Phase 67.9)
Long agentic runs let Claude's context grow without bound. The
orchestrator runs a CompactPolicy after every successful work turn:
when running tokens cross threshold * context_window, the next
iteration is rewritten as a /compact <focus> slash command turn so
Claude Code shrinks its own context before the next work turn.
Compact turns absorb token usage but do not bump the goal's turn
counter, so they don't burn the budget. min_turns_between_compacts
prevents back-to-back compacts. Set context_window: 0 (or
enabled: false) in compact_policy: to disable.
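The trigger condition can be expressed as one predicate. The field names below mirror the config keys mentioned above, but the struct layout, the numeric types, and treating "cross" as greater-than-or-equal are assumptions made for this sketch.

```rust
/// Hypothetical mirror of the compact_policy config block.
pub struct CompactPolicy {
    pub enabled: bool,
    pub context_window: u64,
    pub threshold: f64,
    pub min_turns_between_compacts: u32,
}

impl CompactPolicy {
    /// True when the next iteration should become a /compact turn.
    pub fn should_compact(&self, running_tokens: u64, turns_since_last_compact: u32) -> bool {
        // context_window: 0 and enabled: false both disable the policy.
        if !self.enabled || self.context_window == 0 {
            return false;
        }
        // Avoid back-to-back compacts.
        if turns_since_last_compact < self.min_turns_between_compacts {
            return false;
        }
        running_tokens as f64 >= self.threshold * self.context_window as f64
    }
}
```

For example, with a 200,000-token window and a threshold of 0.75, the policy fires once running tokens reach 150,000, provided enough work turns have passed since the last compact.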
Sub-phases
| Phase | What | Status |
|---|---|---|
| 67.0 | AgentHarness trait + types | ✅ |
| 67.1 | claude_cli skill (spawn + stream-json + resume) | ✅ |
| 67.2 | Session-binding store (SQLite) | ✅ |
| 67.3 | MCP permission_prompt in-process | ✅ |
| 67.4 | Driver agent loop + budget guards | ✅ |
| 67.5 | Acceptance evaluator | ✅ |
| 67.6 | Git worktree sandboxing + per-turn checkpoint | ✅ |
| 67.7 | Semantic memory of decisions | ✅ |
| 67.8 | Replay policy (resume after a mid-turn crash) | ✅ |
| 67.9 | Opportunistic compact | ✅ |
| 67.10 | Escalation to WhatsApp/Telegram | ⬜ |
| 67.11 | Shadow mode (calibration) | ⬜ |
| 67.12 | Parallel multi-goal | ⬜ |
| 67.13 | Cost dashboard + admin-ui A4 tile | ⬜ |
See also
- crates/driver-types/README.md: contract surface and layering
- proyecto/PHASES.md: Phase 67 sub-phase status of record
- OpenClaw reference: research/src/agents/harness/types.ts
- OpenClaw subprocess pattern: research/extensions/codex/src/app-server/transport-stdio.ts
Project tracker + multi-agent dispatch (Phase 67.A–H)
The project-tracker subsystem lets a nexo-rs agent answer "which phase
is development in?" through Telegram / WhatsApp / a shell, and lets it
dispatch async programmer agents that ship phases on its behalf.
The implementation is layered:
| Layer | Crate | Responsibility |
|---|---|---|
| Project files | nexo-project-tracker | Parse PHASES.md + FOLLOWUPS.md, watch for changes, expose read tools. |
| Multi-agent state | nexo-agent-registry | DashMap + SQLite store of every in-flight goal, cap + queue + reattach. |
| Goal control | nexo-driver-loop | spawn_goal / pause_goal / resume_goal / cancel_goal per-goal. |
| Tool surface | nexo-dispatch-tools | program_phase, dispatch_followup, hook system, agent control + query, admin. |
| Capability gate | nexo-config + nexo-core | DispatchPolicy per agent / binding, ToolRegistry filter. |
Project tracker (Phase 67.A)
FsProjectTracker reads <root>/PHASES.md (required) and
<root>/FOLLOWUPS.md (optional) at startup, caches parsed state
behind a parking-lot RwLock with a 60 s TTL, and starts a notify
watcher on the parent directory that invalidates the cache on
Modify | Create | Remove events.
Read tools register through nexo_dispatch_tools::READ_TOOL_NAMES
(project_status, project_phases_list, followup_detail,
git_log_for_phase).
Set ${NEXO_PROJECT_ROOT} to point at a workspace other than the
daemon's cwd.
Multi-agent registry (Phase 67.B)
AgentRegistry is the single source of truth for every goal the
driver has admitted. Each entry holds an ArcSwap<AgentSnapshot>
(turn N/M, last acceptance, last decision summary, diff_stat) so
list_agents / agent_status readers never block writers.
- admit(handle, enqueue) enforces the global cap. Beyond the cap, enqueue=true parks the goal as Queued; enqueue=false rejects.
- release(goal_id, terminal) returns the next-up queued goal so the orchestrator can promote it via promote_queued once the worktree / binding is ready.
- apply_attempt(AttemptResult) refreshes the live snapshot. Idempotent against out-of-order replay (lower turn_index ignored).
- Reattach (Phase 67.B.4) walks the SQLite store at boot and rehydrates Running rows. With resume_running=false they flip to LostOnRestart and surface to the operator.
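The cap, queue, and replay rules can be modeled without DashMap, ArcSwap, or SQLite. The sketch below is a single-threaded approximation: Registry and Admission are hypothetical names, goals are identified by plain strings, and the signatures are simplified relative to the methods listed above.

```rust
use std::collections::{HashMap, VecDeque};

#[derive(Debug, PartialEq)]
pub enum Admission {
    Running,
    Queued,
    Rejected,
}

pub struct Registry {
    cap: usize,
    running: Vec<String>,
    queued: VecDeque<String>,
    last_turn: HashMap<String, u32>,
}

impl Registry {
    pub fn new(cap: usize) -> Self {
        Self { cap, running: Vec::new(), queued: VecDeque::new(), last_turn: HashMap::new() }
    }

    /// Enforce the global cap; beyond it, queue or reject.
    pub fn admit(&mut self, goal_id: &str, enqueue: bool) -> Admission {
        if self.running.len() < self.cap {
            self.running.push(goal_id.to_string());
            Admission::Running
        } else if enqueue {
            self.queued.push_back(goal_id.to_string());
            Admission::Queued
        } else {
            Admission::Rejected
        }
    }

    /// Free a slot and report which queued goal is next in line.
    pub fn release(&mut self, goal_id: &str) -> Option<String> {
        self.running.retain(|g| g != goal_id);
        self.queued.front().cloned()
    }

    /// Move a queued goal into a free slot once its workspace is ready.
    pub fn promote_queued(&mut self, goal_id: &str) -> bool {
        if self.running.len() >= self.cap {
            return false;
        }
        match self.queued.iter().position(|g| g == goal_id) {
            Some(idx) => {
                let _ = self.queued.remove(idx);
                self.running.push(goal_id.to_string());
                true
            }
            None => false,
        }
    }

    /// Refresh the snapshot; a lower turn_index (stale replay) is ignored.
    pub fn apply_attempt(&mut self, goal_id: &str, turn_index: u32) -> bool {
        let stale = self.last_turn.get(goal_id).map_or(false, |&seen| turn_index < seen);
        if stale {
            return false;
        }
        self.last_turn.insert(goal_id.to_string(), turn_index);
        true
    }
}
```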
LogBuffer keeps a per-goal ring of recent driver events for the
agent_logs_tail tool — bounded so a chatty goal cannot OOM the
process.
Persistence wiring (Phase 71)
The bin reads agent_registry.store from
config/project-tracker/project_tracker.yaml and opens
SqliteAgentRegistryStore when the resolved path is non-empty.
Env placeholders (${NEXO_AGENT_REGISTRY_DB:-./data/agents.db})
are expanded before the open. Path open failures fall back to
MemoryAgentRegistryStore with a warn so a corrupt sqlite file
never bricks boot.
When the registry is sqlite-backed and reattach_on_boot: true,
the bin runs the reattach sweep with resume_running=false. Every
prior-run Running row flips to LostOnRestart, and any
notify_origin / notify_channel hook attached to that goal fires
once with an [abandoned] summary so the originating chat learns
the goal could not be resumed. Subprocess respawn is intentionally
not attempted — restoring a Claude Code worktree the daemon no
longer owns is unsafe to do silently and lives under Phase 67.C.1.
Shutdown drain (Phase 71.3)
On SIGTERM the bin runs nexo_dispatch_tools::drain_running_goals
before plugin teardown so notify_origin reaches WhatsApp /
Telegram while their adapters are still alive. Each Running goal's
Cancelled hooks fire with a [shutdown] summary; per-hook
dispatch is bounded by a 2 s timeout so a stuck publish cannot
hold shutdown hostage. The row then flips to LostOnRestart so
the next boot's reattach sweep does not re-fire the same
notification.
[shutdown] daemon stopping — goal `<id>` was running and has
been marked abandoned. Re-dispatch with `program_phase
phase_id=<phase>` if you still need it.
SIGKILL still bypasses this — the boot-time reattach sweep is the safety net for that case.
Turn-level audit log (Phase 72)
Live state (AgentSnapshot) only carries the latest decision /
diff / acceptance per goal. Once a turn rolls forward the previous
turn's data is gone. To answer "what did the agent actually do
across its 40 turns?" the runtime now writes a durable row per
turn into a goal_turns table on the same agents.db:
goal_turns(
goal_id TEXT,
turn_index INTEGER,
recorded_at INTEGER,
outcome TEXT, -- done | continue | needs_retry | …
decision TEXT, -- last Decision rendered as
-- "<tool> (allow|deny:msg|observe:note) — rationale"
summary TEXT, -- mirror of AgentSnapshot.last_progress_text
diff_stat TEXT,
error TEXT, -- pre-rendered for needs_retry / escalate / budget
raw_json TEXT, -- full AttemptResult payload
PRIMARY KEY (goal_id, turn_index)
);
EventForwarder writes a row on every AttemptResult event,
upsert-on-conflict so a replay can't dup history. The new chat tool
agent_turns_tail goal_id=<uuid> [n=20] returns a markdown table
of the last N rows (default 20, capped at 1000):
showing 20 of 40 turn(s) for `…`
| turn | outcome | decision | summary | error |
|---|---|---|---|---|
| 21 | continue | Edit (allow) — patch crate slack | wired Plugin trait | - |
| 22 | needs_retry | Bash (allow) — cargo build | … | E0432 in slack/src/lib.rs |
…
Best-effort writes: an append failure logs a warn but never blocks
the driver loop. When the registry isn't sqlite-backed (memory
fallback), the tool reports "set agent_registry.store in
project_tracker.yaml" rather than silently returning empty.
Async dispatch (Phase 67.C + 67.E)
DriverOrchestrator::spawn_goal(self: Arc<Self>, goal) returns a
tokio::task::JoinHandle so the calling tool returns the goal id
instantly without waiting for the run to finish. Per-goal pause /
cancel signals (watch<bool> and CancellationToken::child_token)
let pause_agent / cancel_agent target one goal without taking
down the rest of the orchestrator.
program_phase_dispatch is the heart of the dispatch surface: it
reads the sub-phase out of PHASES.md, runs DispatchGate::check,
constructs a Goal with the dispatcher / origin metadata, asks the
registry for a slot, and either spawns the goal or returns
Queued / Forbidden / NotFound. dispatch_followup is the
mirror that pulls the description from a FOLLOWUPS.md item.
Capability gate (Phase 67.D)
DispatchPolicy { mode, max_concurrent_per_dispatcher, allowed_phase_ids, forbidden_phase_ids } lives on AgentConfig
and (as Option<DispatchPolicy>) on InboundBinding. The
per-binding override fully replaces the agent-level value so an
operator can be precise per channel ("asistente is none
everywhere except this Telegram chat where it is full").
DispatchGate::check short-circuits in this order:
- capability None → CapabilityNone (every kind).
- ReadOnly capability + write kind → CapabilityReadOnly.
- write + require_trusted + !sender_trusted → SenderNotTrusted. Read tools bypass the trust gate so list_agents stays open for unpaired senders.
- forbidden_phase_ids match → PhaseForbidden.
- non-empty allowed_phase_ids + no match → PhaseNotAllowed.
- dispatcher / sender / global caps. Global cap with queue_when_full=true is admitted; the orchestrator queues it. Without queue → GlobalCapReached.
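The short-circuit order translates directly into a chain of early returns. The sketch below is an approximation of DispatchGate::check, not its source: the struct layouts are hypothetical, require_trusted and queue_when_full are modeled as policy fields, only the global cap is covered (dispatcher and sender caps are omitted), and applying the cap to write requests only is an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Capability {
    None,
    ReadOnly,
    Full,
}

#[derive(Debug, PartialEq)]
pub enum Denial {
    CapabilityNone,
    CapabilityReadOnly,
    SenderNotTrusted,
    PhaseForbidden,
    PhaseNotAllowed,
    GlobalCapReached,
}

#[derive(Debug, Clone, Copy)]
pub struct Policy<'a> {
    pub capability: Capability,
    pub require_trusted: bool,
    pub allowed_phase_ids: &'a [&'a str],
    pub forbidden_phase_ids: &'a [&'a str],
    pub queue_when_full: bool,
}

#[derive(Debug, Clone, Copy)]
pub struct Request<'a> {
    pub is_write: bool,
    pub phase_id: Option<&'a str>,
    pub sender_trusted: bool,
}

/// Return the first denial in the documented order, or Ok(()).
pub fn check(
    policy: &Policy<'_>,
    req: &Request<'_>,
    running: usize,
    global_cap: usize,
) -> Result<(), Denial> {
    if policy.capability == Capability::None {
        return Err(Denial::CapabilityNone);
    }
    if policy.capability == Capability::ReadOnly && req.is_write {
        return Err(Denial::CapabilityReadOnly);
    }
    // Read tools bypass the trust gate.
    if req.is_write && policy.require_trusted && !req.sender_trusted {
        return Err(Denial::SenderNotTrusted);
    }
    if let Some(phase) = req.phase_id {
        if policy.forbidden_phase_ids.contains(&phase) {
            return Err(Denial::PhaseForbidden);
        }
        if !policy.allowed_phase_ids.is_empty() && !policy.allowed_phase_ids.contains(&phase) {
            return Err(Denial::PhaseNotAllowed);
        }
    }
    // With queue_when_full the request is admitted and queued downstream.
    if req.is_write && running >= global_cap && !policy.queue_when_full {
        return Err(Denial::GlobalCapReached);
    }
    Ok(())
}
```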
ToolRegistry::apply_dispatch_capability(policy, is_admin) prunes
the registry of dispatch tool names not allowed by the resolved
policy. ToolRegistryCache::get_or_build_with_dispatch builds the
per-binding filtered registry that respects both allowed_tools
and dispatch_policy. Hot reload (Phase 18) constructs a fresh
ToolRegistryCache per snapshot, so a new dispatch_policy lands
on the next intake without restart; in-flight goals keep their
pre-reload tool surface so a hot reload never preempts.
Completion hooks (Phase 67.F)
Each hook is (on: HookTrigger, action: HookAction, id). Triggers
fire on Done | Failed | Cancelled | Progress { every_turns }.
Actions:
- notify_origin: publish a markdown summary to the chat that triggered the goal. No-op when origin.plugin == "console".
- notify_channel { plugin, instance, recipient }: publish to an explicit channel different from the origin (escalate to ops).
- dispatch_phase { phase_id, only_if }: chain another goal when only_if matches the firing transition. Implemented via a pluggable DispatchPhaseChainer so the runtime owns program_phase_dispatch plumbing.
- nats_publish { subject }: JSON payload to a custom subject.
- shell { cmd, timeout }: opt-in via allow_shell_hooks. Capability PROGRAM_PHASE_ALLOW_SHELL_HOOKS is registered with the setup inventory so agent doctor capabilities flags it the moment the operator exports the env var. Receives NEXO_HOOK_GOAL_ID / PHASE_ID / TRANSITION / PAYLOAD_JSON env vars.
HookIdempotencyStore (SQLite) keeps (goal_id, transition, action_kind, action_id) UNIQUE so at-least-once NATS replay or a
mid-hook restart cannot fire a hook twice.
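The uniqueness constraint amounts to "claim the key, then fire". The sketch below swaps the SQLite UNIQUE index for an in-memory HashSet to stay self-contained; HookKey, HookIdempotency, and fire_once are hypothetical names, and claiming before firing is an assumption about ordering.

```rust
use std::collections::HashSet;

/// The tuple the store keeps unique.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct HookKey {
    pub goal_id: String,
    pub transition: String,
    pub action_kind: String,
    pub action_id: String,
}

impl HookKey {
    pub fn new(goal_id: &str, transition: &str, action_kind: &str, action_id: &str) -> Self {
        Self {
            goal_id: goal_id.to_string(),
            transition: transition.to_string(),
            action_kind: action_kind.to_string(),
            action_id: action_id.to_string(),
        }
    }
}

/// In-memory stand-in for the SQLite-backed store.
#[derive(Default)]
pub struct HookIdempotency {
    seen: HashSet<HookKey>,
}

impl HookIdempotency {
    /// Returns true only the first time a key is claimed.
    pub fn try_claim(&mut self, key: HookKey) -> bool {
        self.seen.insert(key)
    }
}

/// Fire the hook only if this (goal, transition, action) was never claimed.
pub fn fire_once(store: &mut HookIdempotency, key: HookKey, fire: impl FnOnce()) -> bool {
    if store.try_claim(key) {
        fire();
        true
    } else {
        false
    }
}
```

Claiming first means a crash between the claim and the side effect loses that hook rather than repeating it, which matches the "cannot fire twice" guarantee.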
HookRegistry (in-memory DashMap<GoalId, Vec<CompletionHook>>)
backs add_hook / remove_hook / agent_hooks_list.
NATS subjects (Phase 67.H.2)
| Subject | Producer |
|---|---|
| agent.dispatch.spawned | program_phase_dispatch admitted |
| agent.dispatch.denied | DispatchGate::check denied |
| agent.tool.hook.dispatched | hook fired ok |
| agent.tool.hook.failed | hook attempt errored |
| agent.registry.snapshot.<goal_id> | per-goal periodic beacon |
| agent.driver.progress | every Nth completed work-turn |
Plus the existing Phase 67.0–67.9 subjects:
agent.driver.{goal,attempt}.{started,completed},
agent.driver.{decision,acceptance,budget.exhausted,escalate,replay,compact}.
CLI (Phase 67.H.1)
nexo-driver-tools mirrors the chat tool surface for shell use:
nexo-driver-tools status [--phase <id> | --followups]
nexo-driver-tools dispatch <phase_id>
nexo-driver-tools agents list [--filter running|queued|...]
nexo-driver-tools agents show <goal_id>
nexo-driver-tools agents cancel <goal_id> [--reason "…"]
The CLI sets origin.plugin = "console", so notify_origin is a no-op
(the operator sees stdout, not a chat reply).
Built-in registration (nexo daemon)
The default nexo agent binary registers every dispatch
tool definition at boot via
nexo_core::agent::dispatch_handlers::register_dispatch_tools_into.
The LLM sees program_phase, list_agents, agent_status,
etc. in its toolset; per-binding dispatch_capability
(config/agents.yaml) prunes the write tools for bindings that
opted out.
What's NOT bundled by default is the runtime
DispatchToolContext — the orchestrator + registry + tracker
references the handlers consult. Without it, a tool call
returns a clean "dispatch tools require AgentContext.dispatch to be set at boot" error instead of pretending success. Two
integration paths from there:
- In-process orchestrator: boot a DriverOrchestrator alongside the agents, share one AgentRegistry. See the next section for the wiring sample.
- NATS-based dispatch: agent bin publishes a message to agent.driver.dispatch.request that a separate nexo-driver daemon consumes. This is the topology to use when the Claude subprocess needs hardware (GPU box) the agent daemon doesn't have. The dispatch tool surface only changes in the registry it consults; operators can swap the in-process AgentRegistry for one that mirrors a NATS-backed registry without touching the handlers.
Boot wiring (B8)
The integrator's main.rs ties everything together. Minimal
shape:
use std::sync::Arc;

use nexo_agent_registry::{AgentRegistry, MemoryAgentRegistryStore, LogBuffer};
use nexo_core::agent::{
    dispatch_handlers::{register_dispatch_tools_into, DispatchToolContext},
    tool_registry::ToolRegistry,
};
use nexo_dispatch_tools::{
    event_forwarder::EventForwarder,
    hooks::{DefaultHookDispatcher, HookRegistry, NoopNatsHookPublisher},
    policy_gate::CapSnapshot,
    NoopTelemetry,
};
use nexo_pairing::PairingAdapterRegistry;
use nexo_project_tracker::FsProjectTracker;

// 1. Project tracker.
let tracker: Arc<dyn nexo_project_tracker::ProjectTracker> =
    Arc::new(FsProjectTracker::open(std::env::current_dir().unwrap())?);

// 2. Agent registry + log buffer.
let registry = Arc::new(AgentRegistry::new(
    Arc::new(MemoryAgentRegistryStore::default()),
    4,
));
let log_buffer = Arc::new(LogBuffer::new(200));
let hook_registry = Arc::new(HookRegistry::new());

// 3. Hook dispatcher with the channel adapters that Phase 26
//    registered (whatsapp / telegram).
let pairing = PairingAdapterRegistry::new();
// pairing.register(WhatsappPairingAdapter::new(...));
// pairing.register(TelegramPairingAdapter::new(...));
let hook_dispatcher = Arc::new(DefaultHookDispatcher::new(
    pairing,
    Arc::new(NoopNatsHookPublisher),
));

// 4. Orchestrator with EventForwarder so registry / log_buffer /
//    hooks see every driver event.
let inner_sink: Arc<dyn nexo_driver_loop::DriverEventSink> =
    Arc::new(nexo_driver_loop::NoopEventSink);
let event_sink: Arc<dyn nexo_driver_loop::DriverEventSink> =
    Arc::new(EventForwarder::new(
        registry.clone(),
        log_buffer.clone(),
        hook_registry.clone(),
        hook_dispatcher.clone(),
        inner_sink,
    ));
// (orchestrator builder consumes event_sink)

// 5. Bundle for AgentContext.dispatch.
let dispatch_ctx = Arc::new(DispatchToolContext {
    tracker,
    orchestrator: orch.clone(),
    registry,
    hooks: hook_registry,
    log_buffer,
    default_caps: CapSnapshot {
        queue_when_full: true,
        ..Default::default()
    },
    require_trusted: true,
    telemetry: Arc::new(NoopTelemetry),
});

// 6. Register the handlers into the base ToolRegistry. The
//    per-binding cache prunes write tools when capability=None
//    or read_only.
let base = ToolRegistry::new();
register_dispatch_tools_into(&base);

// 7. Per-session AgentContext.with_dispatch(dispatch_ctx)
//    + .with_sender_trusted(true) + .with_inbound_origin(plugin,
//    instance, sender).
Without step 6 the handlers exist but aren't reachable by the LLM. Without step 4 the registry / log_buffer / hooks stay inert. Without step 5 the handlers return MissingDispatchCtx.
See also
- proyecto/PHASES.md: Phase 67.A–H sub-phase status of record.
- architecture/driver-subsystem.md: Phase 67.0–67.9 driver loop + replay + compact policies.
Plan mode (Phase 79.1)
Plan mode is a per-goal toggle that puts the agent into a read-only
"exploration + design" phase. While active, every mutating tool call
is short-circuited at the dispatcher with a structured
PlanModeRefusal, and the model is expected to call
ExitPlanMode { final_plan } once it has a coherent plan. The
operator approves (or rejects) the plan via the pairing channel, and
plan mode flips back to off so the agent can implement.
The feature ports two prior agent CLI tools
(EnterPlanModeTool + ExitPlanModeV2Tool) with three deliberate
diffs from the upstream CLI:
| Decision | upstream | nexo-rs (Phase 79.1) |
|---|---|---|
| Approval channel | Local TUI dialog; KAIROS_CHANNELS flag DISABLES plan mode under chat channels | Pairing-friendly: every approval flows through the chat channel itself via [plan-mode] approve|reject plan_id=… |
| Refusal payload | Free-form string from validateInput | Structured PlanModeRefusal { tool_name, tool_kind, hint, entered_at, entered_reason } |
| Plan body | Read from disk via getPlanFilePath(agentId) | final_plan: String arg, capped at 8 KiB — disk fallback parked as a follow-up |
YAML knobs
agents:
  - id: cody
    plan_mode:
      enabled: true                      # tool registered + reachable
      auto_enter_on_destructive: false   # opt-in pairing with Phase 77.8
      default_active: ~                  # role-aware default (see below)
      approval_timeout_secs: 86400       # 24 h; goal stops with ApprovalTimeout if exceeded
      require_approval: false            # default safe rollout; flip to `true` in production
    inbound_bindings:
      - plugin: whatsapp
        instance: ops
        role: coordinator                # used by the role-aware `default_active`
        plan_mode:                       # full per-binding override
          require_approval: true
default_active is null by default. The runtime resolves it
through PlanModePolicy::compute_default_active(role):
| Binding role | Default default_active |
|---|---|
| coordinator | true (these bindings drive non-trivial work) |
| worker | false (workers receive sub-goals from a coordinator that already planned) |
| proactive | false (Phase 77.20 ticks would be disrupted by a blocking approval flow) |
| unset / unknown | false (safest opt-out) |
Operators can pin the value with default_active: true | false to
override the role-aware default.
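The resolution order (pinned value first, role-aware default otherwise) can be sketched as below. The real signature of PlanModePolicy::compute_default_active is not shown in this document, so the free functions and the `BindingRole` enum here are illustrative assumptions that only mirror the table.

```rust
/// Hypothetical role enum; `None` stands for an unset or unknown role.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum BindingRole {
    Coordinator,
    Worker,
    Proactive,
}

/// Role-aware default from the table: only coordinators start in plan mode.
fn compute_default_active(role: Option<BindingRole>) -> bool {
    matches!(role, Some(BindingRole::Coordinator))
}

/// An explicit `default_active: true | false` wins over the role default.
fn resolve_default_active(pinned: Option<bool>, role: Option<BindingRole>) -> bool {
    pinned.unwrap_or_else(|| compute_default_active(role))
}

fn main() {
    println!("{}", resolve_default_active(None, Some(BindingRole::Coordinator)));
    println!("{}", resolve_default_active(None, Some(BindingRole::Worker)));
    println!("{}", resolve_default_active(Some(true), Some(BindingRole::Proactive)));
}
```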
Tools
EnterPlanMode
Zero parameters except an optional reason: string. Returns
{
"entered_plan_mode": true,
"already_in_plan_mode": false,
"entered_at": 1700000000,
"reason": "explore auth flow",
"instructions": "..."
}
Hard guard: rejects with PermissionDenied when called from a
sub-agent / cron / poller / heartbeat-spawned goal. Lift from
upstream agent CLI,
refined with OpenClaw
research/src/acp/session-interaction-mode.ts:4-15 — only
chat-rooted goals qualify because only they have a path to deliver
the operator approval that ends plan mode.
ExitPlanMode { final_plan: String }
Submits the plan and (when require_approval: true) waits for
operator decision. Returns
{
"exited_plan_mode": true,
"unlocked_at": 1700000123,
"entered_at": 1700000000,
"plan_bytes": 412,
"plan_chars": 411,
"plan_id": "01J…",
"approval_required": true
}
plan_mode_resolve { plan_id, decision: approve|reject, reason? }
Operator-side resolver. The pairing parser (future wiring) calls
this when it sees [plan-mode] approve plan_id=… or
[plan-mode] reject plan_id=… reason=… on an inbound message.
Direct callable too — tests and CLI ops use it without the parser
layer.
reject requires reason (non-empty); the rejection causes the
awaiting ExitPlanMode to fail with the canonical follow-up prompt
(lift from
research/src/agents/bash-tools.exec-approval-followup.ts:27-40):
Plan rejected by operator. Reason: <reason>.
Do not call ExitPlanMode again with the same plan.
Adjust the plan based on the rejection reason and present a revised plan.
Plan-mode stays ON after a reject so the model must revise.
Mutating tools blocked while plan mode is on
Canonical list lives in
crates/core/src/plan_mode.rs::MUTATING_TOOLS. Adding a tool to the
runtime registry without classifying it (mutating OR read-only) makes
ToolRegistry::assert_plan_mode_classified() panic at boot in strict
deployments — ensures no tool silently bypasses the gate.
Currently classified mutating:
- Bash (gated when the next-shipped Phase 77.8 destructive classifier returns is_mutating: true; until 77.8 lands, every Bash call refuses fail-safe)
- FileWrite, FileEdit, NotebookEdit
- program_phase, delegate_to, dispatch_followup
- TeamCreate, TeamDelete
- ScheduleCron, RemoteTrigger
- Config (the apply op only; read and propose stay read-only)
- Plugin outbound names following the <channel>.<verb> convention (e.g. whatsapp.send, browser.click)
Classified read-only (always callable):
- FileRead, Glob, Grep, WebSearch, WebFetch
- ListMcpResources, ReadMcpResource, ToolSearch
- AskUserQuestion, Sleep
- EnterPlanMode, ExitPlanMode
- Memory + observability tools (memory_search, agent_query, agent_turns_tail, session_logs, what_do_i_know, who_am_i, my_stats)
Notify-line formats (frozen)
Every transition emits a canonical line via tracing::info!. The
formats are frozen — operator dashboards and parsers read them. Any
change here is a breaking change.
[plan-mode] entered at <RFC3339> — reason: <model[: <text>]|operator|auto-destructive: <check>>
[plan-mode] awaiting approval plan_id=<UUID> (resolve via plan_mode_resolve { plan_id, decision: approve|reject })
[plan-mode] approved plan_id=<UUID>
[plan-mode] rejected plan_id=<UUID> reason=<…>
[plan-mode] approval timed out plan_id=<UUID>
[plan-mode] exited — plan: <first 200 chars>… (full plan in turn log #<turn_idx>)
[plan-mode] refused tool=<name> kind=<bash|file_edit|outbound|delegate|dispatch|schedule|config|read_only>
Future work: pipe these to notify_origin so the pairing channel
sees them directly (today they live in stdout / structured logs).
Persistence
agent_registry.goals.plan_mode is a TEXT column carrying the
JSON-serialised PlanModeState. It survives daemon restart via the
Phase 71 reattach path: a goal that was in plan mode when the
daemon died comes back with the same state, and the per-turn
system-prompt hint resumes.
Follow-ups
Tracked in proyecto/FOLLOWUPS.md::Phase 79.1:
- Operator-approval scope check (port from OpenClaw roleScopesAllow pattern when 79.10 ships).
- final_plan_path variant for plans larger than 8 KiB.
- Acceptance retry policy for flaky test suites.
References
- PRIMARY: upstream agent CLI, upstream agent CLI, upstream agent CLI (prepareContextForPlanMode).
- SECONDARY: research/src/acp/session-interaction-mode.ts:4-15 (interactive vs background sessions), research/src/agents/bash-tools.exec-approval-followup.ts:27-40 (canonical reject follow-up prompt).
- Plan + spec: proyecto/PHASES.md::79.1.
TodoWrite (Phase 79.4)
TodoWrite is the model's intra-turn scratch list. Every call
replaces the entire list (full-replace semantics). The runtime wipes
the stored list to [] whenever every item is completed, so the
next planning cycle starts fresh.
The tool is always callable — including while plan mode is on, since
it never touches workspace, broker, or external state. Lift from
upstream agent CLI.
Diff vs Phase 14 TaskFlow
| Trait | TodoWrite (this) | TaskFlow (Phase 14) |
|---|---|---|
| Lifetime | Per goal, in-memory | Persistent, cross-session |
| Owner | Model | Operator + model + flows |
| Shape | Flat array | DAG with deps + waits |
| Semantics | Full-replace, wipe-on-all-completed | Partial mutations, manual close |
| Survives daemon restart | No | Yes |
| When to use | Coordinating sub-steps inside a long Phase 67 driver-loop turn without spawning sub-goals | Multi-day work programs, cross-session state |
Tool shape
{
"todos": [
{
"content": "Run cargo test",
"status": "pending",
"activeForm": "Running cargo test"
}
]
}
Every item must carry both content (imperative) and activeForm
(present continuous). As in the upstream CLI, activeForm is what dashboards
render while the item is in_progress, so they get a natural progress
string without grammar fixup. Snake-case active_form is also
accepted for consistency with the rest of nexo-rs.
status is one of pending | in_progress | completed.
Bounds
- Max 50 items per goal (defensive: the upstream CLI does not enforce one; nexo-rs adds the cap so a runaway model cannot grow the list unbounded).
- Max 200 UTF-8 bytes per content and active_form field.
- A bad write rejects without clobbering the existing list.
Response
{
"old_todos": [...],
"new_todos": [...],
"wiped_on_all_completed": false,
"in_progress_count": 1,
"instructions": "Todos updated. Keep exactly one item `in_progress` at a time. Mark completed IMMEDIATELY after finishing each task; do not batch completions."
}
old_todos echoes the previous list so the model sees the diff in
the same turn. wiped_on_all_completed: true flags that the runtime
just cleared the stored list.
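A minimal sketch of how full-replace, the bounds, and wipe-on-all-completed could fit together. The type and function names are hypothetical; two details are assumptions rather than documented behaviour: new_todos echoes the submitted list even when the store is wiped, and an empty write is not flagged as a wipe.

```rust
#[derive(Clone, Debug, PartialEq, Eq)]
enum Status {
    Pending,
    InProgress,
    Completed,
}

#[derive(Clone, Debug, PartialEq, Eq)]
struct Todo {
    content: String,
    active_form: String,
    status: Status,
}

const MAX_ITEMS: usize = 50;
const MAX_FIELD_BYTES: usize = 200;

#[derive(Debug)]
struct WriteOutcome {
    old_todos: Vec<Todo>,
    new_todos: Vec<Todo>,
    wiped_on_all_completed: bool,
    in_progress_count: usize,
}

/// Validate first, then replace: a rejected write never touches `stored`.
fn apply_todo_write(stored: &mut Vec<Todo>, incoming: Vec<Todo>) -> Result<WriteOutcome, String> {
    if incoming.len() > MAX_ITEMS {
        return Err(format!("too many items: {} > {}", incoming.len(), MAX_ITEMS));
    }
    for t in &incoming {
        // String::len() counts UTF-8 bytes, which is the unit the bound uses.
        if t.content.len() > MAX_FIELD_BYTES || t.active_form.len() > MAX_FIELD_BYTES {
            return Err(format!("field exceeds {} bytes", MAX_FIELD_BYTES));
        }
    }
    let old_todos = stored.clone();
    let all_done = !incoming.is_empty() && incoming.iter().all(|t| t.status == Status::Completed);
    let in_progress_count = incoming.iter().filter(|t| t.status == Status::InProgress).count();
    // Full-replace semantics; wipe to [] when every item is completed.
    *stored = if all_done { Vec::new() } else { incoming.clone() };
    Ok(WriteOutcome {
        old_todos,
        new_todos: incoming,
        wiped_on_all_completed: all_done,
        in_progress_count,
    })
}

fn main() {
    let mut stored = Vec::new();
    let item = Todo {
        content: "Run cargo test".into(),
        active_form: "Running cargo test".into(),
        status: Status::InProgress,
    };
    let out = apply_todo_write(&mut stored, vec![item]).unwrap();
    println!("in_progress_count = {}", out.in_progress_count);
}
```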
When the model should use it
The tool description ships the canonical "use proactively for 3+
step tasks" guidance lifted from
upstream agent CLI. In short:
multi-step coding tasks → seed a list, mark exactly one item
in_progress, mark it completed the moment it finishes (don't
batch), tear the list down once everything is done.
References
- SECONDARY: OpenClaw research/: no equivalent (grep -rln "todo" research/src/ returns only unrelated cron / delivery files).
ToolSearch (Phase 79.2 — MVP)
ToolSearch is the discovery surface for deferred tools — tools
whose full JSONSchema lives behind a single ToolSearch(...) lookup
instead of inline in the system prompt. The savings are real once the
tool surface gets wide (40+ tools after Phase 13 + 77 + 79); the MVP
shipped here lays the foundation.
Lift from
upstream agent CLI
(input schema, select: prefix, keyword search with +token
required prefix, scoring weights for name parts vs description vs
searchHint).
How a tool becomes "deferred"
The runtime does not infer this from the tool itself. Callers opt in when registering:
use nexo_core::agent::tool_registry::{ToolMeta, ToolRegistry};

registry.register_with_meta(
    MyTool::tool_def(),
    MyTool,
    ToolMeta::deferred_with_hint("send a slack message"),
);
ToolMeta has two fields:
| Field | Default | Effect |
|---|---|---|
| deferred: bool | false | When true, surfaces only via ToolSearch |
| search_hint: Option<String> | None | Curated phrase used by keyword ranking; beats raw description scoring |
Existing register(...) calls keep working unchanged — the side
channel is opt-in.
MVP caveat
The four LLM provider clients (anthropic, minimax, gemini,
openai_compat) still emit every registered tool's full schema in
the request body. The actual token-cost savings land when a follow-up
wires those four clients to consult
ToolRegistry::deferred_tools() and filter accordingly. Tracked in
FOLLOWUPS.md::Phase 79.2. Until then, ToolSearch is useful as a
discovery API the model can use today, and the registration path is
correct so the upgrade is a four-file change.
Query forms
Lifted verbatim from the upstream CLI:
select:Read,Edit,Grep → fetch these exact tools by name (comma-separated)
notebook jupyter → keyword search, up to max_results best matches
+slack send → require "slack" in name/desc/hint, rank by remaining terms
Tokens prefixed with + are required: tools that don't match all
required tokens are filtered out before scoring. Other tokens are
optional but still contribute to the score.
Scoring (mirror of the upstream CLI)
| Match site | Non-MCP score | MCP score |
|---|---|---|
| Exact name part | 10 | 12 |
| Substring within name part | 5 | 6 |
search_hint contains term | +4 | +4 |
description contains term | +2 | +2 |
Matches with score == 0 are dropped. Results sorted by score desc
then name asc; truncated to max_results (default 5, hard cap 25).
mcp__server__action-shaped names are tokenised by splitting on
__ and _; CamelCase names split on case boundaries; dotted
plugin names (whatsapp.send) split on .. So a query "slack"
matches mcp__slack__send_message on the exact server-name part.
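The tokenisation and scoring rules can be sketched as follows. This is an illustration under stated assumptions, not the shipped ranking code: `name_parts`, `score_term`, `score_query`, and `ToolEntry` are hypothetical names, MCP tools are detected by the `mcp__` prefix, the exact and substring name scores are treated as mutually exclusive, and per-term scores are summed.

```rust
/// Split a tool name into lowercase parts on `__`, `_`, `.`, and
/// CamelCase boundaries.
fn name_parts(name: &str) -> Vec<String> {
    let mut parts = Vec::new();
    let mut cur = String::new();
    let mut prev_lower = false;
    for ch in name.chars() {
        if ch == '_' || ch == '.' {
            if !cur.is_empty() {
                parts.push(std::mem::take(&mut cur));
            }
            prev_lower = false;
            continue;
        }
        // lower -> upper transition marks a CamelCase boundary.
        if ch.is_uppercase() && prev_lower && !cur.is_empty() {
            parts.push(std::mem::take(&mut cur));
        }
        prev_lower = ch.is_lowercase();
        cur.extend(ch.to_lowercase());
    }
    if !cur.is_empty() {
        parts.push(cur);
    }
    parts
}

struct ToolEntry<'a> {
    name: &'a str,
    description: &'a str,
    search_hint: Option<&'a str>,
}

/// Score one query term against one tool using the weights in the table.
fn score_term(tool: &ToolEntry, term: &str) -> u32 {
    let term = term.to_lowercase();
    let (exact, partial) = if tool.name.starts_with("mcp__") { (12, 6) } else { (10, 5) };
    let parts = name_parts(tool.name);
    let mut score = 0;
    if parts.iter().any(|p| *p == term) {
        score += exact;
    } else if parts.iter().any(|p| p.contains(&term)) {
        score += partial;
    }
    if tool.search_hint.map_or(false, |h| h.to_lowercase().contains(&term)) {
        score += 4;
    }
    if tool.description.to_lowercase().contains(&term) {
        score += 2;
    }
    score
}

/// Sum per-term scores; a `+term` that scores zero filters the tool out.
fn score_query(tool: &ToolEntry, query: &str) -> u32 {
    let mut total = 0;
    for raw in query.split_whitespace() {
        let (required, term) = match raw.strip_prefix('+') {
            Some(t) => (true, t),
            None => (false, raw),
        };
        let s = score_term(tool, term);
        if required && s == 0 {
            return 0;
        }
        total += s;
    }
    total
}

fn main() {
    let tool = ToolEntry {
        name: "mcp__slack__send_message",
        description: "Send a message",
        search_hint: None,
    };
    println!("{:?}", name_parts(tool.name));
    println!("score = {}", score_query(&tool, "+slack send"));
}
```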
Response shape
{
"query": "...",
"query_kind": "select" | "keyword",
"total_deferred_tools": 7,
"matches": [
{
"name": "FileEdit",
"score": 12, // omitted in `select` responses
"description": "...",
"parameters": { ... } // full JSONSchema
}
],
"missing": ["UnknownTool"] // only present in `select` responses
}
The model can read the schema directly out of parameters and call
the tool on the next turn. When the runtime starts filtering deferred
tools out of the request body (follow-up), the matched tool will
also be auto-injected into that turn's available tool list.
References
- PRIMARY: upstream agent CLI, upstream agent CLI, 62-108 (isDeferredTool).
- SECONDARY: OpenClaw research/: no equivalent. Single-process TS reference does not face the wide-surface MCP token cost that motivates this tool.
- Plan + spec: proyecto/PHASES.md::79.2.
SyntheticOutput (Phase 79.3)
SyntheticOutput forces a goal to terminate with a JSON value that
matches a caller-provided JSONSchema. Closes the gap between "model
produces free prose" and "downstream consumer needs a struct" —
direct input for Phase 19/20 pollers, Phase 51 eval harness, and any
future contract-shaped goal.
Lift from
upstream agent CLI.
Diff vs upstream
The upstream CLI builds one tool per schema via
createSyntheticOutputTool(jsonSchema) so the model's input is the
schema. Nexo-rs runs as a daemon — building a fresh tool per call
breaks tool-registry semantics. We ship a single tool whose input
carries BOTH the schema and the value:
{
"schema": { "type": "object", "properties": { "name": { "type": "string" } }, "required": ["name"] },
"value": { "name": "ana" }
}
Pollers and eval harnesses inject the schema via prompt template;
ad-hoc callers pass it inline. The terminal_schema follow-up
(tracked in FOLLOWUPS.md) lets the runtime carry the schema and only
the model's value flows on the wire — closer to the upstream
single-input shape.
Tool shape
| Arg | Type | Required | Notes |
|---|---|---|---|
| schema | object | yes | JSONSchema (Draft 7 / 2019-09 / 2020-12). Must be a JSON object. |
| value | any | yes | The value to validate. Object / array / scalar: any shape the schema permits. |
Response
{
"ok": true,
"structured_output": <value>,
"instructions": "Output validated. The goal can terminate now — do not call any other tool this turn unless the goal contract calls for it explicitly."
}
On failure the call returns an error whose body lists every violation with its JSONPath:
SyntheticOutput: value does not match schema (2 errors): /age: 30 is not of type "string"; /tags/0: "purple" is not one of ["red","green"]
Validation
Uses jsonschema = "0.20" — already an optional dep on
nexo-core (default-on via the schema-validation Cargo feature
that Phase 9.2 introduced). Builds without the feature compile, but
SyntheticOutput returns a clear "feature disabled" error rather
than silently passing through — synthesised output without
validation is worse than no synthesis.
Plan-mode classification
Classified ReadOnly in nexo_core::plan_mode::READ_ONLY_TOOLS.
The tool only validates and echoes; it never touches workspace,
broker, or external state. Safe to call while plan mode is on.
References
- PRIMARY: upstream agent CLI.
- SECONDARY: OpenClaw research/: no equivalent. Single-process TS reference shapes its outputs via Zod parsing inline; no separate "force structured output" tool.
- Plan + spec: proyecto/PHASES.md::79.3.
NotebookEdit (Phase 79.13)
Cell-level edits on Jupyter .ipynb notebooks. Pure-Rust round-trip
through serde_json::Value — no jupyter binary, no Python
dependency. Unknown top-level fields survive untouched (forward-compat
with newer nbformat).
Lift from
upstream agent CLI.
Tool shape
{
"notebook_path": "/abs/path/to/nb.ipynb",
"cell_id": "alpha", // UUID-style id, or `cell-N` numeric fallback
"new_source": "x = 1",
"cell_type": "code", // optional for replace, required for insert
"edit_mode": "replace" // replace | insert | delete (default: replace)
}
| Edit mode | Behaviour |
|---|---|
| replace | Overwrite cells[i].source. Code cells get execution_count: null and outputs: [] (the diff stays sane). Markdown cells preserve all metadata. |
| insert | Add a new cell AFTER the anchor cell_id. Empty cell_id inserts at position 0. cell_type required. nbformat ≥ 4.5 gets a fresh 12-char base-36 id. |
| delete | Remove the cell at cell_id. |
cell_id resolution: literal cells[i].id match first; falls back
to cell-N numeric index (matches the upstream parseCellId).
Failure lists up to 10 available ids in the error message.
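The two-step lookup can be sketched over a plain slice of optional ids. `resolve_cell_index` is a hypothetical helper, and treating the N in cell-N as a zero-based index is an assumption this document does not confirm.

```rust
/// `cell_ids[i]` is the `id` field of cell i, or None for notebooks
/// whose cells carry no ids.
fn resolve_cell_index(cell_ids: &[Option<&str>], cell_id: &str) -> Result<usize, String> {
    // 1. A literal id match wins.
    if let Some(i) = cell_ids.iter().position(|id| *id == Some(cell_id)) {
        return Ok(i);
    }
    // 2. Fall back to the `cell-N` numeric form (assumed zero-based).
    if let Some(n) = cell_id.strip_prefix("cell-").and_then(|s| s.parse::<usize>().ok()) {
        if n < cell_ids.len() {
            return Ok(n);
        }
    }
    // 3. Report up to 10 known ids to help the caller recover.
    let available: Vec<&str> = cell_ids.iter().flatten().copied().take(10).collect();
    Err(format!(
        "cell '{}' not found; available ids: {}",
        cell_id,
        available.join(", ")
    ))
}

fn main() {
    let ids = [Some("alpha"), None, Some("beta")];
    println!("{:?}", resolve_cell_index(&ids, "beta"));
    println!("{:?}", resolve_cell_index(&ids, "cell-1"));
    println!("{:?}", resolve_cell_index(&ids, "missing"));
}
```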
Defensive behaviour
- Absolute path required. Refuses relative paths to avoid ambiguity across daemon cwd changes.
- .ipynb extension required. Other file types fall to FileEdit.
- Replace at end-of-cells auto-converts to insert. Lift from the upstream CLI (NotebookEditTool.ts:372-377). Requires cell_type in that path.
- Bad writes leave the file untouched. Validation runs before write; any error returns before std::fs::write.
Plan-mode classification
Classified FileEdit (mutating) in
nexo_core::plan_mode::MUTATING_TOOLS. Plan-mode-on goals get a
PlanModeRefusal rather than a silent edit.
Output
{
"notebook_path": "/abs/path/...",
"edit_mode": "replace" | "insert" | "delete",
"cell_id": "alpha",
"cell_type": "code" | "markdown",
"language": "python", // from notebook.metadata.language_info.name
"total_cells": 3,
"cells_delta": 1 // -1 for delete, +1 for insert, 0 for replace
}
Out of scope (deferred)
- Read-before-Edit guard. The upstream CLI requires Read to have been called on the file in the same session before NotebookEdit is allowed. Nexo-rs does not have a shared file-state cache yet; the guard becomes useful when Phase 67 driver-loop adds one.
- Attribution / file-history tracking. The upstream fileHistoryTrackEdit records edits to a per-session ledger. Skipped: the workspace-git layer (Phase 10.9) covers the same use case.
- Multi-cell batch edits. One operation per call by design; callers loop over cell ids.
References
- PRIMARY: upstream agent CLI, upstream agent CLI.
- SECONDARY: OpenClaw research/: no equivalent (grep -rln "ipynb|jupyter|nbformat" research/src/ returns nothing).
- Plan + spec: proyecto/PHASES.md::79.13.
RemoteTrigger (Phase 79.8)
RemoteTrigger lets the model publish a JSON payload to a
pre-configured outbound destination — webhook (HTTP POST) or
NATS subject. Destinations live in the agent's YAML allowlist; the
model passes only name + payload, never URLs or subjects.
Diff vs upstream
The upstream RemoteTriggerTool
(upstream agent CLI)
is a CRUD client for claude.ai's hosted scheduled-agent API
(/v1/code/triggers). Different concept entirely — Anthropic uses
"trigger" to mean "scheduled remote agent". Nexo-rs adopts the
name and ships a generic outbound publisher per our PHASES.md
spec. The two are conceptually unrelated; we cite the upstream CLI as
naming reference only.
Configuration
agents:
  - id: cody
    remote_triggers:
      - kind: webhook
        name: ops-pager
        url: https://hooks.example.com/abc
        secret_env: OPS_PAGER_SECRET   # optional; HMAC-SHA256 signs body
        timeout_ms: 5000               # default 5000
        rate_limit_per_minute: 10      # default 10; 0 = unlimited
      - kind: nats
        name: internal-ops
        subject: agent.outbound.ops
        rate_limit_per_minute: 30
Empty list (the default) keeps the tool registered but every call
refuses with "no destination named X in this agent's allowlist".
Tool shape
{
"name": "ops-pager",
"payload": { "level": "warn", "msg": "build red on main" }
}
payload accepts any JSON shape (object / array / scalar). Cap is
256 KiB serialised — oversize is rejected before any network
call.
Webhook headers
When dispatched as a webhook, every request carries:
| Header | Value |
|---|---|
| Content-Type | application/json |
| X-Nexo-Trigger-Name | trigger name (allowlist key) |
| X-Nexo-Timestamp | unix-seconds at dispatch |
| X-Nexo-Signature | sha256=<hex> HMAC of body using secret_env value (only when secret_env is set) |
Receivers MUST verify the signature when configured. Compute
HMAC-SHA256(body, secret) and compare against the
X-Nexo-Signature header in constant time.
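On the receiver side, the comparison step might look like the sketch below. It assumes `expected_hex` was already computed by an HMAC-SHA256 library of the receiver's choice (none is shown here), and both function names are hypothetical. The hand-rolled comparison is for illustration only; production code should prefer a vetted constant-time primitive.

```rust
/// Compare two byte strings without returning at the first mismatch.
/// Illustration only: prefer a vetted constant-time crate in production.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}

/// `header_value` is the raw X-Nexo-Signature header ("sha256=<hex>");
/// `expected_hex` is the hex HMAC the receiver computed over the body.
fn signature_matches(header_value: &str, expected_hex: &str) -> bool {
    match header_value.strip_prefix("sha256=") {
        Some(got) => constant_time_eq(
            got.to_ascii_lowercase().as_bytes(),
            expected_hex.to_ascii_lowercase().as_bytes(),
        ),
        None => false,
    }
}

fn main() {
    // Placeholder hex digests, not real HMAC outputs.
    println!("{}", signature_matches("sha256=ab12cd", "ab12cd"));
    println!("{}", signature_matches("sha256=ab12ce", "ab12cd"));
}
```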
Rate limit
Sliding-window token bucket per trigger name, 1-minute window,
default 10 calls / minute. Set to 0 for unlimited (no bucket).
Bucket lives in process memory — restarts reset.
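One way to picture the per-trigger limit is a sliding-window log keyed by trigger name. This is a sketch under assumptions: `RateLimiter` and `allow` are hypothetical names, time is passed in as whole seconds to keep the example deterministic, and the shipped limiter may track its window differently.

```rust
use std::collections::{HashMap, VecDeque};

/// In-process limiter: one queue of call timestamps per trigger name.
struct RateLimiter {
    window_secs: u64,
    hits: HashMap<String, VecDeque<u64>>,
}

impl RateLimiter {
    fn new() -> Self {
        Self { window_secs: 60, hits: HashMap::new() }
    }

    /// Returns true when the call may proceed. A limit of 0 means unlimited.
    fn allow(&mut self, name: &str, limit_per_minute: u32, now_secs: u64) -> bool {
        if limit_per_minute == 0 {
            return true;
        }
        let window = self.window_secs;
        let queue = self.hits.entry(name.to_string()).or_default();
        // Drop timestamps that have slid out of the 1-minute window.
        while let Some(&t) = queue.front() {
            if now_secs.saturating_sub(t) >= window {
                queue.pop_front();
            } else {
                break;
            }
        }
        if queue.len() as u32 >= limit_per_minute {
            return false;
        }
        queue.push_back(now_secs);
        true
    }
}

fn main() {
    let mut limiter = RateLimiter::new();
    for t in 0..3 {
        println!("t={} allowed={}", t, limiter.allow("ops-pager", 2, t));
    }
}
```

Because the state is a plain in-memory map, a restart starts every trigger with an empty window, which matches the restart behaviour noted above.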
Plan-mode
Classified Outbound (mutating) in
nexo_core::plan_mode::MUTATING_TOOLS. Plan-mode-on goals receive
PlanModeRefusal rather than a silent publish.
Security model
- Allowlist. The model sees only destination names; URLs and subjects are operator-owned in YAML. No way to coerce a trigger to a model-supplied URL.
- HMAC sign. Optional but recommended. secret_env resolves at call time, so secrets never enter YAML.
- Refuses unsigned when secret missing. If secret_env is set but the env var is empty, the call refuses rather than send unsigned (defence in depth: shipping unsigned could bypass receiver auth).
- Body cap + rate limit. Capacity controls bound the blast radius if a model goes haywire.
- Plan-mode gate. A goal in plan mode cannot publish.
Out of scope (deferred)
- Per-binding override. Today the canonical source is agents[].remote_triggers. A binding.remote_triggers override would let an operator scope per channel; not yet wired.
- Circuit breaker per trigger. Phase 2.5 CircuitBreaker is available but not yet wired in. Add when transient outbound failures become noisy enough to justify.
- Telemetry counters. nexo_remote_trigger_calls_total{name, result} + nexo_remote_trigger_latency_ms{name} are spec'd but not emitted. Wire when the tool is in active use.
Diff vs upstream (summary)
| Aspect | upstream | Nexo-rs |
|---|---|---|
| Purpose | claude.ai CCR scheduled-agent CRUD | Generic outbound publisher |
| Auth | Anthropic OAuth | HMAC-SHA256 (operator-shared secret) |
| Destinations | hardcoded /v1/code/triggers | YAML allowlist (webhook / NATS) |
| Rate limit | Anthropic-side | Per-trigger token bucket in-process |
References
- PRIMARY: PHASES.md::79.8 spec (own design). upstream agent CLI cited for naming + dispatcher shape only; semantics differ.
- SECONDARY: OpenClaw research/: no equivalent. Single-process TS reference uses plugin outbound paths directly; no allowlisted generic publisher exists.
Repl (Phase 79.12)
Stateful REPL tool — spawn persistent Python, Node.js, or bash subprocesses whose interpreter survives across LLM turns inside the same goal. Variables, imports, and definitions persist; the model executes code iteratively without restarting the interpreter each turn.
Feature-gated behind repl-tool. Off by default — arbitrary code
execution is dangerous and must be explicitly opted into per agent /
per binding.
Lift from
upstream agent CLI +
src/services/sandbox/.
Tool shape
{
"action": "spawn | exec | read | kill | list",
"session_id": "<uuid>", // required for exec, read, kill
"runtime": "python | node | bash", // required for spawn
"code": "print(1+1)", // required for exec
"cwd": "/tmp" // optional for spawn (defaults to agent workspace)
}
| Action | Behaviour |
|---|---|
| spawn | Launch a new persistent REPL session. Returns session_id. |
| exec | Run code in a running session. Returns stdout, stderr, timed_out, exit_code. Output is the difference since the last exec/read: only new bytes. |
| read | Read current output buffers without sending code. |
| kill | Terminate a session by id. |
| list | List all active sessions with runtime, cwd, spawned_at, output_len. |
Output
{
"stdout": "2\n",
"stderr": "",
"timed_out": false,
"exit_code": null
}
timed_out: true when exec exceeds timeout_secs (default 30 s).
Exit code is Some(n) only when the child process has terminated.
Configuration
# Agent-level default
agents:
  - id: ana
    repl:
      enabled: true
      allowed_runtimes: ["python", "node"]
      max_sessions: 3
      timeout_secs: 30
      max_output_bytes: 65536
    # Per-binding override (replaces the whole struct)
    inbound_bindings:
      - plugin: whatsapp
        repl:
          enabled: true
          allowed_runtimes: ["python"]
| Field | Type | Default | Description |
|---|---|---|---|
| enabled | bool | false | Gate the Repl tool. Off by default. |
| allowed_runtimes | [string] | ["python","node","bash"] | Runtimes the agent may spawn. |
| max_sessions | u32 | 3 | Maximum concurrent REPL sessions per agent. |
| timeout_secs | u32 | 30 | Seconds before an exec returns timed_out: true. |
| max_output_bytes | u64 | 65536 | Per-session output buffer cap. Oldest bytes dropped when cap reached. |
Runtimes
| Runtime | Binary | Flags |
|---|---|---|
| python | python3 | -u (unbuffered), -q (quiet), -i (interactive; required when stdin is piped) |
| node | node | -i (interactive), --no-warnings |
| bash | bash | --norc, --noprofile |
Session lifecycle
- Sessions are keyed by UUID (returned by spawn).
- Output is buffered per-session with oldest-first truncation at max_output_bytes.
- Background reader threads (blocking I/O via std::thread) read stdout/stderr.
- exec snapshots buffer lengths before sending code and returns only the new bytes.
- wait_for_new_output polls every 50 ms until the buffer grows beyond the snapshot or the process dies.
- Dead-process detection (child.try_wait()) on every exec/read returns a clear error if the session has terminated.
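The interaction between the output cap and the "only new bytes" snapshot can be sketched as follows. `OutputBuffer` and its methods are hypothetical names. The sketch tracks a running total of bytes written so that a snapshot stays meaningful after old bytes are dropped; the actual implementation snapshots buffer lengths and may handle truncation differently.

```rust
use std::collections::VecDeque;

/// Capped per-session output buffer (assumes cap > 0).
struct OutputBuffer {
    data: VecDeque<u8>,
    cap: usize,
    total_written: u64,
}

impl OutputBuffer {
    fn new(cap: usize) -> Self {
        Self { data: VecDeque::new(), cap, total_written: 0 }
    }

    /// Append bytes, dropping the oldest ones once the cap is reached.
    fn push(&mut self, bytes: &[u8]) {
        for &b in bytes {
            if self.data.len() == self.cap {
                self.data.pop_front();
            }
            self.data.push_back(b);
        }
        self.total_written += bytes.len() as u64;
    }

    /// Taken before sending code to the interpreter.
    fn snapshot(&self) -> u64 {
        self.total_written
    }

    /// Bytes written since `snapshot` that are still retained.
    fn since(&self, snapshot: u64) -> Vec<u8> {
        let new_bytes = (self.total_written - snapshot) as usize;
        let keep = new_bytes.min(self.data.len());
        self.data.iter().skip(self.data.len() - keep).copied().collect()
    }
}

fn main() {
    let mut buf = OutputBuffer::new(8);
    buf.push(b">>> ");
    let snap = buf.snapshot();
    buf.push(b"2\n");
    println!("{:?}", String::from_utf8(buf.since(snap)));
}
```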
Plan-mode classification
Classified Bash (mutating) in
nexo_core::plan_mode::MUTATING_TOOLS. Plan-mode-on goals receive a
PlanModeRefusal rather than a silent exec.
Sandbox note
Sandbox enforcement (bubblewrap / firejail / macOS sandbox-exec) is
tracked in the Phase 79.12 spec but deferred to a future sub-phase.
Current implementation trusts the operator to enable repl-tool only
for agents that need it, and limits blast radius via allowed_runtimes
and max_output_bytes.
Out of scope (deferred)
- Sandbox integration. bwrap / firejail / sandbox-exec probing and enforcement. Tracked in PHASES.md 79.12 sandbox matrix.
- allow_unsandboxed per-binding toggle. Locked behind capability gate: only via direct YAML edit, not self-config.
- Last-expression value. The spec's value: Option<Value> field (JSON-encoded last expression) is deferred.
- bash language is included as a pragmatic convenience for debugging; the PHASES.md spec says "no shell language" but the implementation ships it as the session-sandbox risk is lower than an ad-hoc BashTool invocation (no filesystem side effects beyond the session cwd).
References
- PRIMARY: upstream agent CLI + upstream agent CLI
- SECONDARY: OpenClaw research/: no equivalent (grep -rln "repl\|REPL\|stateful.*sandbox" research/src/ returns nothing).
- Implementation: crates/core/src/agent/repl_registry.rs, crates/core/src/agent/repl_tool.rs, crates/config/src/types/repl.rs.
- Plan + spec: proyecto/PHASES.md::79.12.
LSP tool (Phase 79.5)
The Lsp tool gives the model in-process access to four
Language Server Protocol servers — rust-analyzer, pylsp,
typescript-language-server, gopls — for code intelligence
queries (go-to-definition, hover, references, workspace symbol,
diagnostics) without spawning a sub-shell or shelling out to
grep.
Built-in language matrix
| Language | Binary | Install hint |
|---|---|---|
| rust | rust-analyzer | rustup component add rust-analyzer |
| python | pylsp | pip install python-lsp-server |
| typescript / javascript | typescript-language-server | npm i -g typescript-language-server |
| go | gopls | go install golang.org/x/tools/gopls@latest |
At boot the daemon probes PATH for each binary. Missing binaries
get a single tracing::warn! line that includes the install
hint. Discovered binaries are cached for the lifetime of the
process; nexo setup reload re-probes if the operator installs
a missing one.
Tool surface
A single tool, Lsp, dispatched by the kind field. All
positional fields (line, character) are 1-based to match
editor UX — the underlying LSP wire is 0-based but the tool
description and handler perform the conversion.
{ "kind": "go_to_def", "file": "src/foo.rs", "line": 42, "character": 8 }
{ "kind": "hover", "file": "src/foo.rs", "line": 42, "character": 8 }
{ "kind": "references", "file": "src/foo.rs", "line": 42, "character": 8 }
{ "kind": "workspace_symbol", "query": "Foo" }
{ "kind": "diagnostics", "file": "src/foo.rs" }
workspace_symbol accepts an empty query ("all symbols").
diagnostics returns the latest publishDiagnostics snapshot
for the file; if the file was just opened and no snapshot has
arrived yet, the tool waits up to 2 seconds before returning
(no diagnostics).
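The 1-based to 0-based conversion mentioned above is small enough to show in full. `to_lsp_position` is a hypothetical helper name; rejecting 0 as invalid input (rather than clamping) is an assumption about how the handler treats out-of-range values.

```rust
/// Convert the tool's 1-based (line, character) into the 0-based pair
/// the LSP wire format expects. Returns None for 0, which has no
/// valid 0-based counterpart.
fn to_lsp_position(line: u32, character: u32) -> Option<(u32, u32)> {
    Some((line.checked_sub(1)?, character.checked_sub(1)?))
}

fn main() {
    // The request { "line": 42, "character": 8 } maps to wire position (41, 7).
    println!("{:?}", to_lsp_position(42, 8));
    println!("{:?}", to_lsp_position(0, 1));
}
```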
Per-binding policy
Opt in per agent in agents.yaml:
agents:
  - id: cody
    lsp:
      enabled: true
      languages: [rust]          # whitelist; empty = all discovered
      prewarm: [rust]            # warmed at boot
      idle_teardown_secs: 600    # 10 min
enabled: false (the default) keeps the Lsp tool unregistered
for the agent — the model never sees it advertised.
languages: [] (the default when enabled: true) permits every
discovered language; provide an explicit list to sandbox a
binding to a single language.
Lifecycle
- Lazy spawn: the binary is launched on first tool call to a given (workspace_root, language) tuple. Cold start: ~500 ms for rust-analyzer, lower for the others.
- Pre-warm: listing a language under lsp.prewarm spawns it during boot so the first model call hits a warm session.
- Idle teardown: sessions with no model activity for idle_teardown_secs are shut down cleanly (shutdown → exit LSP requests, then bounded kill_on_drop). Phase 19/20 poller calls do not count as activity, so a workspace with only synthetic traffic still releases the server's RAM.
- Crash recovery: a session that exits non-zero increments a restart counter (cap max_restarts = 3); the next call re-spawns. After the cap, the session reports SessionDead and never auto-restarts again.
- -32801 ContentModified retries: rust-analyzer returns this code while mid-index; the session retries with exponential backoff (500 ms / 1 s / 2 s) before surfacing the error.
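The retry schedule and the restart cap from the list above can be sketched as two small functions. This is an illustration only: the function names are hypothetical, and it assumes the session turns dead once the counter exceeds the cap (the text does not say whether the third or the fourth crash is the fatal one).

```rust
use std::time::Duration;

/// Delays before each ContentModified retry, as listed above.
pub const CONTENT_MODIFIED_BACKOFF_MS: [u64; 3] = [500, 1_000, 2_000];
/// Restart cap from the lifecycle notes.
pub const MAX_RESTARTS: u32 = 3;

/// Delay before retry number `attempt` (0-based).
/// None means the schedule is exhausted and the error should surface.
pub fn retry_delay(attempt: usize) -> Option<Duration> {
    CONTENT_MODIFIED_BACKOFF_MS
        .get(attempt)
        .map(|ms| Duration::from_millis(*ms))
}

#[derive(Debug, PartialEq, Eq)]
pub enum SessionState {
    Restartable,
    Dead,
}

/// Bump the restart counter after a non-zero exit.
/// Assumption: the session is dead once the counter exceeds MAX_RESTARTS.
pub fn after_crash(restarts_so_far: u32) -> (u32, SessionState) {
    let restarts = restarts_so_far.saturating_add(1);
    if restarts > MAX_RESTARTS {
        (restarts, SessionState::Dead)
    } else {
        (restarts, SessionState::Restartable)
    }
}
```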
Plan-mode
All five MVP ops are read-only — Lsp is in READ_ONLY_TOOLS
so plan-mode never refuses an Lsp tool call.
Output format
Results are formatted as path:line:col with workspace-relative
paths when shorter, mirroring upstream agent CLI formatters.ts. A label is appended where the LSP server
provides one ("Definition: src/bar.rs:120:5 (struct Bar)").
Per-result cap: 100 KB total. Workspace-symbol queries that
exceed the cap are truncated with a +N more results note.
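A rough sketch of the path:line:col formatting described above, assuming a hypothetical `format_location` helper (the real formatter may differ in how it chooses between absolute and relative paths):

```rust
use std::path::Path;

/// Render one result as `path:line:col`, optionally followed by a label.
/// The workspace-relative path is used only when it is shorter than the
/// path as given. Illustrative helper, not the nexo-rs formatter.
pub fn format_location(
    workspace_root: &Path,
    file: &Path,
    line: u32,
    col: u32,
    label: Option<&str>,
) -> String {
    let full = file.display().to_string();
    let shown = match file.strip_prefix(workspace_root) {
        Ok(rel) => {
            let rel = rel.display().to_string();
            if rel.len() < full.len() { rel } else { full }
        }
        // File lives outside the workspace: keep the path as given.
        Err(_) => full,
    };
    match label {
        Some(text) => format!("{}:{}:{} ({})", shown, line, col, text),
        None => format!("{}:{}:{}", shown, line, col),
    }
}
```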
Errors
The handler returns {"ok": false, "error", "kind"} instead of
panicking. Stable kind discriminators:
- ServerUnavailable: binary missing on PATH (error includes the install hint).
- LanguageNotEnabled: binding's lsp.languages whitelist excludes the target.
- NoServerForExtension: file extension not in the matrix.
- FileNotFound / NotAFile / FileTooLarge (10 MB cap).
- RequestTimeout: server stalled past 30 seconds; the session receives $/cancelRequest and stays alive.
- SessionDead: max-restart cap exceeded.
- CapabilityMissing: the running server doesn't advertise the kind's capability (the dynamic description should hide this case in practice).
- Wire: JSON-RPC framing or argument parse error.
References
- PRIMARY: upstream agent CLI (LSPTool.ts, schemas.ts, formatters.ts, prompt.ts) and upstream agent CLI (LSPClient.ts, LSPServerInstance.ts, LSPServerManager.ts, manager.ts, passiveFeedback.ts).
- SECONDARY: research/src/agents/pi-bundle-lsp-runtime.ts (OpenClaw's hand-rolled JSON-RPC framing and capability-filtered tool registration).
- Spec + plan: proyecto/PHASES.md::79.5.
cron_create / cron_list / cron_delete / cron_pause / cron_resume
LLM-time scheduling: from inside a turn, the model registers a cron
entry that fires a future goal. Complements Phase 7 Heartbeat
(config-time only) and Phase 20 agent_turn poller (config-time
only) — this is the only path where the model itself mutates the
schedule.
Lift from
upstream agent CLI
(5-field cron schema, recurring + durable flags, 50-entry cap).
OpenClaw research/src/cron/schedule.ts provides the parallel
naming convention — we use Rust's cron = "0.12" crate (already a
transitive workspace dep).
Diff vs Phase 7 Heartbeat vs Phase 20 agent_turn poller
| Mechanism | Trigger source | Mutable at runtime | Persists |
|---|---|---|---|
| Phase 7 Heartbeat | YAML heartbeat.interval_secs | No (hot-reload only) | Config |
| Phase 20 agent_turn poller | YAML cron spec | No (hot-reload only) | Config |
| Phase 79.7 ScheduleCron | LLM tool call mid-turn | Yes (model-driven) | SQLite |
Tool surface and constraints
- cron_create { cron, prompt, channel?, recipient?, recurring? }: schedule a recurring or one-shot prompt. recipient is the to address for outbound publish (JID for WhatsApp, chat id for Telegram, email for SMTP); without it the dispatcher only logs the LLM response.
- cron_list: read-only, returns the binding's entries.
- cron_delete { id }: remove an entry.
- cron_pause { id }: soft-disable an entry (paused = true).
- cron_resume { id }: re-enable a paused entry (paused = false).
- 5-field cron expression (M H DoM Mon DoW); 6-field also accepted (passthrough).
- 60-second minimum interval: sub-minute schedules are refused with a clear message.
- Cap 50 entries per binding (lift from upstream).
- Origin-tagged binding namespace: entries from a whatsapp:ops goal stay isolated from telegram:bot entries. binding_id resolves from inbound origin (plugin:instance) with agent_id fallback for non-interactive turns.
- SQLite-backed (nexo_cron_entries table); survives daemon restart.
- Model pinning at schedule time: cron_create stores model_provider + model_name from effective binding policy so each fire can resolve the same provider/model pair later.
Runtime firing — shipped (end-to-end)
crates/core/src/cron_runner.rs::CronRunner polls
store.due_at(now) every 5 s and dispatches due entries
through an Arc<dyn CronDispatcher>. State advance is
policy-driven:
- recurring entries always advance (even on dispatch failure) so a broken downstream never hot-loops one row forever.
- one-shot entries delete on success; on failure they retry with bounded exponential backoff (runtime.cron.one_shot_retry) and are deleted only after the retry budget is exhausted.
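The state-advance policy above boils down to a small decision function. The sketch below uses hypothetical names (`NextStep`, `next_step`) and assumes the runner tracks how many retries a one-shot entry has already used:

```rust
/// What the runner does with an entry after a fire attempt (illustrative).
#[derive(Debug, PartialEq, Eq)]
pub enum NextStep {
    /// Recompute next_fire_at and keep the row.
    Reschedule,
    /// Keep the row and try again after a backoff delay.
    Retry,
    /// Remove the row.
    Delete,
}

/// Decide the next step from the entry kind and the dispatch result.
/// `retries_used` counts retries already spent on this one-shot entry.
pub fn next_step(
    recurring: bool,
    dispatch_ok: bool,
    retries_used: u32,
    max_retries: u32,
) -> NextStep {
    if recurring {
        // Recurring entries advance even when dispatch failed.
        return NextStep::Reschedule;
    }
    if dispatch_ok {
        return NextStep::Delete;
    }
    if retries_used < max_retries {
        NextStep::Retry
    } else {
        // Retry budget exhausted (or max_retries == 0): drop the entry.
        NextStep::Delete
    }
}
```

Advancing recurring entries unconditionally is what keeps a broken downstream from pinning the loop to one row.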
Production wiring at boot uses LlmCronDispatcher
(crates/core/src/llm_cron_dispatcher.rs): builds a
ChatRequest from entry.prompt, resolves the LLM client from the
entry's pinned model_provider/model_name (with legacy fallback
for old rows), logs the response with id + binding + cron expression
and a 200-char preview, then forwards the body to the user-facing channel
via BrokerChannelPublisher when the entry carries both a channel
and a recipient.
Tool-call execution is now available as an explicit opt-in:
runtime.cron.tool_calls.enabled: true. In that mode, the dispatcher
advertises the binding-filtered tool set, executes returned tool calls,
feeds tool_result messages back to the model, and repeats up to
runtime.cron.tool_calls.max_iterations.
Fallback: when no agents are configured or the LLM-client build
fails, the runner falls back to LoggingCronDispatcher so cron
fires stay observable in degraded boot.
Outbound publish
BrokerChannelPublisher parses <plugin>:<instance> from
entry.channel and emits an event on
plugin.outbound.<plugin>.<instance> carrying:
{ "kind": "text", "to": "<recipient>", "text": "<llm body>" }
This is the same envelope the WhatsApp / Telegram / Email outbound tools already speak — the receiving plugin's dispatcher delivers the message to the user.
Failure mode: a publish error is logged via tracing::warn!
but never fails fire(). The runner still advances state, so a
stuck downstream channel (NATS down, plugin not subscribed)
cannot deadlock the cron loop. Set both channel and
recipient on cron_create to enable user-facing delivery —
either missing → the dispatcher only logs.
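The topic derivation and the "both fields or log-only" rule can be sketched as follows. The function names are hypothetical; only the `<plugin>:<instance>` format and the `plugin.outbound.<plugin>.<instance>` topic shape come from the text above.

```rust
/// Build the outbound topic from a `<plugin>:<instance>` channel string.
/// Returns None when the channel does not have both parts.
pub fn outbound_topic(channel: &str) -> Option<String> {
    let (plugin, instance) = channel.split_once(':')?;
    if plugin.is_empty() || instance.is_empty() {
        return None;
    }
    Some(format!("plugin.outbound.{}.{}", plugin, instance))
}

/// Resolve (topic, recipient) for user-facing delivery.
/// None means the dispatcher should only log the LLM response.
pub fn delivery_target(
    channel: Option<&str>,
    recipient: Option<&str>,
) -> Option<(String, String)> {
    let topic = outbound_topic(channel?)?;
    let to = recipient?;
    Some((topic, to.to_string()))
}
```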
Tool shapes
cron_create
{
"cron": "*/5 * * * *",
"prompt": "Check the build queue and report",
"channel": "whatsapp:default",
"recipient": "5511999999999@s.whatsapp.net",
"recurring": true
}
Returns:
{
"ok": true,
"id": "01J...",
"binding_id": "whatsapp:default",
"cron": "*/5 * * * *",
"recurring": true,
"next_fire_at": 1700000300,
"instructions": "Entry persisted. The runtime fires it on schedule. Use cron_list to inspect, cron_pause/cron_resume to temporarily stop/restart, and cron_delete to cancel."
}
One-shot retry policy
Process-level policy in config/runtime.yaml:
cron:
one_shot_retry:
max_retries: 3 # 0 => drop on first failure
base_backoff_secs: 30 # attempt #1 delay
max_backoff_secs: 1800
tool_calls:
enabled: false # default: log-only for tool calls
max_iterations: 6
allowlist: [] # optional extra narrowing (glob syntax)
Attempt delays are exponential (base * 2^(attempt-1)), capped by
max_backoff_secs.
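As a worked instance of the formula, the sketch below computes the delay for a 1-based attempt number. The function name is hypothetical; the arithmetic follows base * 2^(attempt-1) capped by max_backoff_secs, with saturating math so large attempt numbers cannot overflow.

```rust
/// Delay in seconds before retry `attempt` (1-based).
/// Saturating arithmetic keeps very large attempt numbers safe.
pub fn one_shot_backoff_secs(attempt: u32, base_backoff_secs: u64, max_backoff_secs: u64) -> u64 {
    let exponent = attempt.saturating_sub(1);
    // 2^exponent, saturating to u64::MAX once the shift would overflow.
    let factor = 1u64.checked_shl(exponent).unwrap_or(u64::MAX);
    base_backoff_secs.saturating_mul(factor).min(max_backoff_secs)
}
```

With the sample values from the YAML (base 30, cap 1800), attempts 1 through 3 wait 30 s, 60 s, and 120 s; the cap first applies at attempt 7.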
cron_list
{}
Returns the binding's full entry list, sorted by next_fire_at asc.
cron_delete
{ "id": "01J..." }
cron_pause
{ "id": "01J..." }
cron_resume
{ "id": "01J..." }
Cron expression semantics
Standard 5-field UTC: M H DoM Mon DoW. Examples:
| Expression | Means |
|---|---|
| */5 * * * * | Every 5 minutes |
| 0 9 * * * | Daily 09:00 UTC |
| 30 14 28 2 * | Feb 28 14:30 UTC (one-shot if recurring: false) |
| 0 */2 * * * | Every 2 hours on the hour |
The 60-second minimum is enforced by checking that two consecutive fires are ≥ 60 seconds apart. Sub-minute expressions like `*/30 * * * * *` (every 30 s, 6-field) are rejected.
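The interval check can be sketched independently of any cron parser. The helper below is hypothetical and assumes the caller has already asked its cron library for the next few fire times as ascending Unix seconds:

```rust
/// Minimum allowed gap between consecutive fires, in seconds.
pub const MIN_INTERVAL_SECS: i64 = 60;

/// `fires` holds upcoming fire times as ascending Unix seconds, as produced
/// by whatever cron iterator the caller uses (not shown here).
pub fn check_min_interval(fires: &[i64]) -> Result<(), String> {
    for pair in fires.windows(2) {
        let gap = pair[1] - pair[0];
        if gap < MIN_INTERVAL_SECS {
            return Err(format!(
                "schedule fires {}s apart; the minimum interval is {}s",
                gap, MIN_INTERVAL_SECS
            ));
        }
    }
    Ok(())
}
```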
Plan-mode classification
- cron_create, cron_delete, cron_pause, and cron_resume → Schedule (mutating). Plan mode refuses with PlanModeRefusal.
- cron_list → ReadOnly. Stays callable while plan mode is on.
References
- PRIMARY: upstream agent CLI (schema, validation, 50-entry cap), plus the sibling CronListTool.ts / CronDeleteTool.ts / CronPauseTool.ts / CronResumeTool.ts.
- SECONDARY: research/src/cron/schedule.ts (OpenClaw: croner JS lib + cache pattern, semantically compatible).
- Plan + spec: proyecto/PHASES.md::79.7.
Poller V2 — Laravel-style dispatch
Phase 96 refactored the poller subsystem around a single principle:
the runner is a dumb scheduler. It knows nothing about channels,
credentials, outbound topics, or LLMs. Pollers reach the world
through one egress trait, PollerHost. Everything else is the
runner's business: schedule, lease, retry, breaker, cursor
persistence, telemetry.
If you've used Laravel's queue, the cut is familiar: Queue::push
takes an opaque job, Worker pops + invokes — the queue never
introspects what the job does or what it returns.
Why
V1 (pre-Phase-96) leaked too much. The runner needed to know:
- That outbound goes to a Channel enum (whatsapp / telegram / google).
- That every poller might want a CredentialsBundle + per-channel AgentCredentialResolver.
- That agent_turn specifically needs LlmRegistry + LlmConfig.
- How to translate OutboundDelivery { channel, recipient, payload } into plugin.outbound.<channel>.<account_id> topic publishes (dispatch.rs, ~200 LOC).
Every new poller kind risked widening the runner's surface — and out-of-tree pollers couldn't escape the in-tree types at all. Phase 96 cut all of it.
Contract
#[async_trait]
pub trait Poller: Send + Sync + 'static {
    fn kind(&self) -> &'static str;
    async fn tick(&self, ctx: &PollContext) -> Result<TickAck, PollerError>;
}

pub struct PollContext {
    pub job_id: String,
    pub agent_id: String,
    pub kind: &'static str,
    pub config: Value,
    pub cursor: Option<Vec<u8>>,
    pub now: DateTime<Utc>,
    pub interval_hint: Duration,
    pub cancel: CancellationToken,
    pub host: Arc<dyn PollerHost>,
}

pub struct TickAck {
    pub next_cursor: Option<Vec<u8>>,
    pub next_interval_hint: Option<Duration>,
    pub metrics: Option<TickMetrics>,
}
PollerHost is the single egress:
#[async_trait]
pub trait PollerHost: Send + Sync + 'static {
    async fn broker_publish(&self, topic: String, payload: Vec<u8>) -> Result<(), HostError>;
    async fn credentials_get(&self, channel: String) -> Result<Value, HostError>;
    async fn log(&self, level: LogLevel, message: String, fields: Value) -> Result<(), HostError>;
    async fn metric_inc(&self, name: String, labels: Value) -> Result<(), HostError>;
    async fn llm_invoke(&self, request: LlmInvokeRequest) -> Result<LlmInvokeResponse, HostError>;
}
Two host implementations
| Adapter | Use case | Crate |
|---|---|---|
| InProcessHost | In-tree builtins (webhook_poll, agent_turn) | nexo-poller (private use) |
| BrokerPollerHost | Subprocess plugin pollers | nexo-microapp-sdk::poller |
InProcessHost calls directly into the daemon's AnyBroker,
AgentCredentialResolver, LlmRegistry. BrokerPollerHost pipes
the same trait methods through broker reverse-RPC on
daemon.rpc.<plugin_id>. Both produce identical poller-visible
behavior — the trait surface is the contract, not the impl.
Dispatch topology
┌─────────────────────────────────────┐
│ daemon: PollerRunner │
│ ├─ webhook_poll (in-tree) │
│ ├─ agent_turn (in-tree) │
│ └─ plugin proxies (Arc<dyn Poller>) │
└──────────────┬──────────────────────┘
│
┌────────────────────┼────────────────────────┐
│ │ │
broker_publish broker tick request broker reverse-RPC
plugin.outbound.X.Y plugin.poller.<kind>.tick daemon.rpc.<plugin_id>
│ │ │
▼ ▼ │
┌──────────────┐ ┌──────────────┐ │
│ wa / tg / … │ │ subprocess │──────────────────┘
│ channel │ │ plugin │
│ plugin │ │ (PollerHandler)
└──────────────┘ └──────────────┘
The daemon's PluginPollerRouter owns (plugin_id, kinds, topic_prefix)
mappings. For each (handle, kind) it wraps a PluginPollerProxy
that implements Poller and forwards every tick through broker
JSON-RPC. The runner's registry is homogeneous — it cannot tell which
entries are in-tree and which forward over the wire.
[plugin.poller] manifest section
Seventh manifest section closing the Phase 81.33.b.real lineage (pairing → http → admin → metrics → dashboard → ✶ → poller).
[plugin.poller]
kinds = ["google_calendar"]
broker_topic_prefix = "plugin.poller.google_calendar"
lifecycle = "long_lived" # or "ephemeral"
max_concurrent_ticks = 1 # 1..=64, default 1
tick_timeout_secs = 60 # 1..=3600, default 60
Boot-time validation:
- kinds non-empty + unique within the plugin.
- Each kind matches ^[a-z][a-z0-9_]+$.
- broker_topic_prefix non-empty + no trailing dot + no spaces.
- Cross-plugin uniqueness: two plugins declaring the same kind fail boot loud (PollerRouteRegistrationError::DuplicateKind).
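The per-plugin rules above are simple enough to check without a regex crate. A minimal sketch, assuming hypothetical function names and plain string errors (the real validator returns typed errors, and cross-plugin uniqueness is checked elsewhere):

```rust
use std::collections::HashSet;

/// Hand-rolled match for ^[a-z][a-z0-9_]+$ : one lowercase letter followed
/// by at least one character from [a-z0-9_].
fn is_valid_kind(kind: &str) -> bool {
    let mut chars = kind.chars();
    let first_ok = matches!(chars.next(), Some('a'..='z'));
    let rest = chars.as_str();
    first_ok
        && !rest.is_empty()
        && rest.chars().all(|c| matches!(c, 'a'..='z' | '0'..='9' | '_'))
}

/// Validate one [plugin.poller] section (within-plugin rules only).
pub fn validate_poller_section(kinds: &[&str], broker_topic_prefix: &str) -> Result<(), String> {
    if kinds.is_empty() {
        return Err("kinds must be non-empty".to_string());
    }
    let mut seen = HashSet::new();
    for kind in kinds {
        if !is_valid_kind(kind) {
            return Err(format!("invalid kind: {}", kind));
        }
        if !seen.insert(*kind) {
            return Err(format!("duplicate kind: {}", kind));
        }
    }
    if broker_topic_prefix.is_empty()
        || broker_topic_prefix.ends_with('.')
        || broker_topic_prefix.contains(' ')
    {
        return Err("invalid broker_topic_prefix".to_string());
    }
    Ok(())
}
```

Note that the `+` quantifier means a single-letter kind such as `a` is rejected.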
Lifecycle
- long_lived (default): daemon spawns the plugin subprocess once at boot. The subprocess subscribes to its tick topic and replies through the message's reply_to. Best for pollers with warm state (OAuth tokens, HTTP connection pools, parsed feeds).
- ephemeral: manifest accepts the value but the daemon currently rejects it with a config error. Tracked as a Phase 96 follow-up: spawn-per-tick path requires new stdio JSON-RPC primitives (no broker subscription, direct stdin/stdout dispatch + SIGTERM on reply).
Reverse-RPC
Subprocess pollers call back to the daemon via broker request-reply
on daemon.rpc.<plugin_id>. Methods:
| Method | Daemon response |
|---|---|
| credentials_get(channel, agent_id) | { account_id, … } plus typed Google fields (client_id_path, token_path) when channel == google |
| log(level, message, fields) | Forwards to daemon tracing |
| metric_inc(name, labels) | Forwards to daemon tracing (Prometheus aggregator is a follow-up) |
| llm_invoke(request) | Proxies to LlmRegistry::build(...)::chat(...), returns { content, model_id, usage } |
Error envelopes use JSON-RPC codes that mirror PollerError
classification: -32001 transient (retry with backoff), -32002
permanent (auto-pause job until agent pollers reset <id>),
-32602 config (bad config — operator fixes YAML), -32601 method
not found.
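On the subprocess side, these codes can be mapped back to a retry decision. The enum and function names below are hypothetical; only the four code values and their meanings come from the text above.

```rust
/// Classification carried by the JSON-RPC error code (illustrative names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorClass {
    Transient,
    Permanent,
    Config,
    MethodNotFound,
}

/// Map a JSON-RPC error code to its class; None for codes not listed above.
pub fn class_for_code(code: i64) -> Option<ErrorClass> {
    match code {
        -32001 => Some(ErrorClass::Transient),
        -32002 => Some(ErrorClass::Permanent),
        -32602 => Some(ErrorClass::Config),
        -32601 => Some(ErrorClass::MethodNotFound),
        _ => None,
    }
}

/// Only transient failures are worth retrying with backoff.
pub fn should_retry(class: ErrorClass) -> bool {
    matches!(class, ErrorClass::Transient)
}
```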
What this unlocks
- New poller kinds (Jira, Linear, Stripe, custom internal APIs) ship as standalone crates published to crates.io. No fork of nexo-poller, no PR to the framework.
- The framework's nexo-poller dep tree no longer carries nexo-plugin-google: gmail + google_calendar moved out to nexo-rs-poller-gmail + nexo-rs-poller-google-calendar standalone repos. Closes the Phase 94 close-out follow-up.
- LLM-using pollers (agent_turn + future custom prompts) no longer need llm_registry / llm_config fields baked into PollContext. Any subprocess plugin can call host.llm_invoke and get the daemon's configured provider stack for free.
References
- Phase 81.33.b.real lineage: Stage 1 pairing → 2 http → 4 admin → 5 metrics → 6 dashboard → ✶ → Phase 96 poller. Same RwLock interior mutability pattern, same broker-RPC forwarder shape, same "construct empty at boot, populate after wire" rule.
- OpenClaw cron service (research/src/cron/service/locked.ts:11-21, service.restart-catchup.test.ts:79-116) informed lease + restart semantics.
- claude-code-leak MCP elicitation handler (src/services/mcp/elicitationHandler.ts:77-106) shaped the reverse-RPC pattern: server (here, subprocess) sends a request UP to the client (here, daemon), client responds via the same channel.
ListMcpResources + ReadMcpResource (Phase 79.11 — MVP)
Router-shaped MCP tools: a single discovery surface for agents talking to many MCP servers, instead of registering N×2 per-server tools (which still ship via the Phase 12.5 catalog).
Pattern derived from prior CLI work +
ReadMcpResourceTool/. The upstream CLI ships these as the LLM-driven
introspection layer over the per-server McpClient trait.
Diff vs Phase 12.5 per-server tools
| Aspect | Phase 12.5 per-server | Phase 79.11 router |
|---|---|---|
| Tool surface | mcp__<server>__list_resources per server | Single ListMcpResources { server: Option<String> } |
| Token cost | O(2 × N servers) tools in prompt | 2 fixed tools |
| Use when | 1–2 MCP servers connected | 3+ MCP servers, surface gets noisy |
Both surfaces coexist — operators can keep the per-server tools for the agents that need them and let the router-shaped tools handle wide deployments.
Tool shapes
ListMcpResources
{ "server": "github", "max": 100 }
server is optional — omit to enumerate every connected server.
max overrides the default 200-entry cap. Returns:
{
"resources": [
{ "server": "github", "uri": "...", "name": "...", "description": "...", "mime_type": "..." }
],
"truncated": false,
"errors": [],
"count": 12
}
errors carries per-server failures so a single bad server
doesn't drop the whole call. truncated: true flags that the cap
was hit before all resources were enumerated.
ReadMcpResource
{ "server": "github", "uri": "github://owner/repo/file", "max_bytes": 65536 }
server + uri required. max_bytes overrides the default
256 KiB cap. Returns:
{
"server": "github",
"uri": "...",
"contents": [
{ "uri": "...", "mime_type": "text/plain", "text": "...", "blob_length": null, "blob": null }
],
"truncated": false
}
Truncation respects UTF-8 char boundaries — never splits a
multi-byte sequence. Binary blob bodies are returned verbatim
(base64) without mid-string truncation; the blob_length field
is reported so the model can decide whether to ask for a smaller
slice.
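Truncating text on a char boundary is a short loop in Rust. A minimal sketch with a hypothetical `truncate_utf8` helper (the real module may handle this differently):

```rust
/// Cut `text` to at most `max_bytes` bytes without splitting a multi-byte
/// UTF-8 sequence. Returns the kept prefix and whether anything was dropped.
pub fn truncate_utf8(text: &str, max_bytes: usize) -> (&str, bool) {
    if text.len() <= max_bytes {
        return (text, false);
    }
    let mut end = max_bytes;
    // Walk back to the nearest char boundary; index 0 is always a boundary,
    // so the loop terminates.
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    (&text[..end], true)
}
```

The kept prefix may be a few bytes shorter than `max_bytes` when the cap lands inside a multi-byte character.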
Plan-mode classification
Both tools are ReadOnly — they query but never mutate, so they
stay callable while plan mode is on.
Out of scope (deferred)
- McpAuth. The McpClient trait does not expose a refresh hook (refresh is currently transparent inside the client). Tracked in FOLLOWUPS.md::Phase 79.11: once the trait grows the method (lift from upstream agent CLI), a third tool wires into the same module.
- Operator overrides for the caps. Today the caps are module constants; a follow-up can read them from the MCP YAML (mcp.list_max_resources, mcp.resource_max_bytes).
References
- SECONDARY: OpenClaw research/ has no equivalent router-shaped MCP tool.
- Plan + spec: proyecto/PHASES.md::79.11.
ConfigTool — gated self-config (Phase 79.10)
The Config tool lets an agent read and propose changes to its
own YAML configuration from inside a chat-driven turn. The flow
is intentionally two-step (propose then apply) so a remote
operator can approve or reject the change with a regular message
on the same channel that originated the proposal — there is no
host 'ask' permission prompt the way Claude Code's upstream shows
(upstream agent CLI).
Cargo feature gate
ConfigTool ships behind the config-self-edit Cargo feature
(off by default). Build with:
cargo build --workspace --features config-self-edit
A binary distributed without the feature cannot expose the tool
even when an operator sets config_tool.self_edit: true in YAML
— the gate is a hard ship-control until security review.
Per-agent YAML
agents:
- id: cody
config_tool:
self_edit: true # default false; opt-in per agent
allowed_paths: # empty = every SUPPORTED_SETTINGS key
- "model.model"
- "language"
approval_timeout_secs: 86400 # 24 h
allowed_paths is intersected with SUPPORTED_SETTINGS (12
keys at MVP: model.{provider,model}, language,
system_prompt, heartbeat.{enabled,interval_secs},
link_understanding.enabled, web_search.enabled,
lsp.{enabled,languages,idle_teardown_secs,prewarm}) and then
filtered by the hard-coded denylist.
Three operations
{ "op": "read", "key": "model.model" }
{ "op": "propose", "key": "model.model", "value": "claude-opus-4-7", "justification": "operator asked" }
{ "op": "apply", "patch_id": "01J7HVK..." }
op: read is read-only and passes plan-mode gating. propose
and apply are mutating — plan-mode refuses them.
Hard-coded denylist
crates/setup/src/capabilities.rs::CONFIG_SELF_EDIT_DENYLIST
holds 13 globs that the tool MUST NEVER touch:
| Glob | Intent |
|---|---|
| *_token, *_secret, *_password, *_key | credential-shaped suffixes |
| pairing.* | pairing internals; touching these revokes the operator's grip |
| capabilities.* | cannot widen own capabilities |
| mcp.servers.*.auth.*, mcp.servers.*.command | MCP auth + spawn args (running arbitrary binaries via config self-edit is game-over) |
| binding.*.role | cannot self-promote to coordinator |
| binding.*.plan_mode.* | cannot drop plan-mode guardrails |
| remote_triggers[*].url, remote_triggers[*].secret_env | outbound webhook URLs + signing keys |
| cron.user_max_entries | operator-only |
| agent_registry.store.* | changing the store under a running goal is unsafe |
Source-of-truth lives in code, not YAML — a model that proposes
a patch widening the denylist cannot succeed because that's a
code change requiring review. Validation runs at BOTH propose
(early reject) and apply (defense-in-depth: the staging file
may have been edited externally between propose and apply).
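To make the denylist check concrete, here is a sketch of a wildcard matcher run at both propose and apply time. It is an illustration under stated assumptions: `*` is taken to match any run of characters (including dots), brackets are treated as literals, and `DENYLIST_SAMPLE` is a hypothetical subset of the real list in `CONFIG_SELF_EDIT_DENYLIST`.

```rust
/// Hypothetical subset of the denylist, for illustration only.
pub const DENYLIST_SAMPLE: &[&str] = &[
    "*_token",
    "*_secret",
    "pairing.*",
    "mcp.servers.*.command",
    "remote_triggers[*].url",
];

/// Wildcard match where `*` matches any run of bytes (assumption: the real
/// glob semantics may be stricter, e.g. not crossing dots).
pub fn glob_match(pattern: &str, key: &str) -> bool {
    let (p, k) = (pattern.as_bytes(), key.as_bytes());
    let (mut pi, mut ki) = (0usize, 0usize);
    // (pattern index after the last '*', key index that '*' is tried at)
    let mut star: Option<(usize, usize)> = None;
    while ki < k.len() {
        if pi < p.len() && p[pi] == b'*' {
            star = Some((pi + 1, ki));
            pi += 1;
        } else if pi < p.len() && p[pi] == k[ki] {
            pi += 1;
            ki += 1;
        } else if let Some((sp, sk)) = star {
            // Backtrack: let the last '*' absorb one more byte.
            pi = sp;
            ki = sk + 1;
            star = Some((sp, sk + 1));
        } else {
            return false;
        }
    }
    // Trailing '*' in the pattern may match the empty string.
    while pi < p.len() && p[pi] == b'*' {
        pi += 1;
    }
    pi == p.len()
}

/// Return the first glob that forbids `key`, if any.
pub fn first_denied(key: &str) -> Option<&'static str> {
    DENYLIST_SAMPLE.iter().copied().find(|glob| glob_match(glob, key))
}
```

Returning the matched glob lets the handler include it in a ForbiddenKey error.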
Approval flow
- Model emits Config { op: "propose", key, value, justification }.
- Handler validates the triple gate (capability on, key in SUPPORTED_SETTINGS ∩ allowed_paths, key NOT in denylist), runs the per-key validator, generates a patch_id, persists the proposal under .nexo/config-proposals/<patch_id>.yaml, parks an approval oneshot with the ApprovalCorrelator, writes a proposed row to the audit store, and fires notify_origin on the binding's channel with: Operator: reply [config-approve patch_id=<id>] or [config-reject patch_id=<id> reason=...] within 24 h.
- Operator replies with the bracketed command on the SAME (channel, account_id) that originated the proposal. Cross-binding messages are rejected (the entry stays parked).
- Model emits Config { op: "apply", patch_id }. Handler awaits the correlator decision (Approved / Rejected / Expired):
  - Approved: snapshot agents.yaml, apply the patch, trigger Phase 18 hot-reload. On reload-Err: restore the snapshot + record rolled_back. On Ok: cleanup staging + record applied.
  - Rejected: record rejected with the reason.
  - Expired (24 h elapsed): record expired.
Audit log
<state_dir>/config_changes.db stores every state transition
(idempotent on (patch_id, status)). Read-only LLM tool
config_changes_tail { n: 20 } (always available, regardless
of the Cargo feature) returns a markdown table suitable for the
agent's post-mortem.
Schema:
patch_id TEXT
status TEXT -- proposed | applied | rolled_back | rejected | expired
binding_id TEXT
agent_id TEXT
op TEXT -- propose | apply | reject | expire
key TEXT
value TEXT -- pre-redacted: secret-suffix paths render as "<REDACTED>"
error TEXT
created_at INTEGER
applied_at INTEGER
PRIMARY KEY (patch_id, status)
Plan-mode behaviour
Config is in MUTATING_TOOLS but the dispatcher inspects
args.op at call time. op: "read" short-circuits the gate;
op: "propose" and op: "apply" refuse with a
PlanModeRefusal { tool_kind: Config }.
config_changes_tail is always in READ_ONLY_TOOLS — never
refuses.
Error kinds
The tool returns { "ok": false, "error", "kind", ... } on
failure. Stable kind discriminators:
- UnknownKey: key not in SUPPORTED_SETTINGS.
- PathNotAllowed: key not in this agent's allowed_paths.
- ForbiddenKey: denylist hit (returns the matched glob).
- ValidationFailed: per-key validator rejected the value.
- NoPending: apply called for a patch_id that was never proposed or already consumed.
- Rejected: operator replied [config-reject ...].
- Expired: 24 h elapsed without approval.
- RolledBack: reload rejected the post-apply config; the snapshot was restored.
- Yaml / Io / InternalError: fall-through.
References
- PRIMARY: upstream agent CLI (ConfigTool.ts, supportedSettings.ts, prompt.ts, constants.ts).
- Spec: proyecto/PHASES.md::79.10.
Team tools (Phase 79.6)
Five LLM tools that let an agent form a named team of up to
8 sub-agents that operate in parallel, share a Phase 14 TaskFlow
task list, and communicate via broker DMs. Distinct from the
existing 1-to-1 delegate (Phase 8) which is request/reply
between named agents — teams are coordinated multi-member work
rooted under one lead.
Five tools
| Tool | Op kind | Plan-mode |
|---|---|---|
| TeamCreate | mutating | Delegate (refused) |
| TeamDelete | mutating | Delegate (refused) |
| TeamSendMessage | mutating | Delegate (refused) |
| TeamList | read-only | always callable |
| TeamStatus | read-only | always callable |
MUTATING_TOOLS and READ_ONLY_TOOLS in
crates/core/src/plan_mode.rs are the source of truth.
Per-agent YAML
agents:
- id: cody
team:
enabled: true # default false; opt-in
max_members: 8 # clamped at 8
max_concurrent: 4 # clamped at 4
idle_timeout_secs: 3600 # 1 h stale-team threshold
worktree_per_member: false # default for TeamCreate;
# per-call override accepted
When enabled: false (the default) the 5 tools are not
registered for this agent — the model never sees them advertised.
SQL store
Three tables in <state_dir>/teams.db (idempotent CREATE):
CREATE TABLE teams (
team_id TEXT PRIMARY KEY,
display_name TEXT NOT NULL,
description TEXT,
lead_agent_id TEXT NOT NULL,
lead_goal_id TEXT NOT NULL,
flow_id TEXT NOT NULL,
worktree_per_member INTEGER NOT NULL,
created_at INTEGER NOT NULL,
deleted_at INTEGER,
last_active_at INTEGER NOT NULL
);
CREATE TABLE team_members (
team_id TEXT NOT NULL,
name TEXT NOT NULL, -- human-readable handle
agent_id TEXT NOT NULL, -- internal UUID
agent_type TEXT,
model TEXT,
goal_id TEXT NOT NULL,
worktree_path TEXT,
joined_at INTEGER NOT NULL,
is_active INTEGER NOT NULL,
last_active_at INTEGER NOT NULL,
PRIMARY KEY (team_id, name)
);
CREATE TABLE team_events (
event_id TEXT PRIMARY KEY,
team_id TEXT NOT NULL,
kind TEXT NOT NULL,
actor_member_name TEXT,
payload_json TEXT NOT NULL,
created_at INTEGER NOT NULL
);
team_id is the sanitised name (lowercase + non-alnum → -).
Composite PK on (team_id, name) enforces unique member names
within a team. deleted_at IS NOT NULL ⇒ soft-deleted.
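The sanitisation rule can be sketched in a few lines. This is a guess at the details: it assumes ASCII-only handling and does not collapse runs of `-`, neither of which the text above specifies, and the function name is hypothetical.

```rust
/// Derive a team_id from a display name: lowercase ASCII letters and digits
/// are kept, every other character becomes '-'.
/// Assumptions: ASCII-only rule, no collapsing of repeated '-'.
pub fn sanitize_team_id(display_name: &str) -> String {
    display_name
        .chars()
        .map(|c| {
            let lower = c.to_ascii_lowercase();
            if lower.is_ascii_alphanumeric() { lower } else { '-' }
        })
        .collect()
}
```

Because two display names can sanitise to the same id, a collision is what TeamNameTaken { existing_team_id } would report.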
Broker topics
| Topic | Direction | Payload |
|---|---|---|
| team.<team_id>.dm.<member_name> | point-to-point | DmFrame JSON |
| team.<team_id>.broadcast | fan-out (lead only) | DmFrame JSON |
pub struct DmFrame {
    pub team_id: String,
    pub to: String, // member_name or "broadcast"
    pub from: String,
    pub body: serde_json::Value,
    pub correlation_id: Option<String>,
}
The router subscribes once per process to team.>; per-team
in-memory tokio::sync::broadcast::Sender channels deliver to
member runtimes. When a member's goal is Pending (idle
between turns), Phase 67's wake-on hook flips it to Running
on the first DM.
Lifecycle
+-----------+ TeamCreate +-----------+
| no team | ----------------> | team |
+-----------+ | (1 lead) |
+-----+-----+
|
| (operator/79.6.b adds members)
v
+-----------+
| team |
| (N>=2) |
+-----+-----+
|
+-------------------------+--------------------------+
| | |
v v v
TeamSendMessage TeamSendMessage TeamDelete
to: <member> to: "broadcast" (zero running)
DM fan-out soft delete
↻ wake idle ↻ wake all + drop router
Caps
- TEAM_MAX_MEMBERS = 8 (incl. lead).
- TEAM_MAX_CONCURRENT_DEFAULT = 4 per agent.
- TEAM_NAME_MAX_LEN = 64, MEMBER_NAME_MAX_LEN = 32.
- TEAM_IDLE_TIMEOUT_SECS = 3600: reaper marks teams stale.
- SHUTDOWN_DRAIN_SECS = 30: TeamDelete drain budget.
- DM_BODY_MAX_BYTES = 64 * 1024 per TeamSendMessage.
Per-agent YAML can lower max_members / max_concurrent but
never raise above the constants.
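The "lower but never raise" rule is a clamp against the compile-time constants. A minimal sketch, where `effective_caps` is a hypothetical helper and the two constants are the ones listed above:

```rust
pub const TEAM_MAX_MEMBERS: u32 = 8;
pub const TEAM_MAX_CONCURRENT_DEFAULT: u32 = 4;

/// Resolve (max_members, max_concurrent) from optional YAML values.
/// YAML may lower a cap; values above the constant are clamped down.
pub fn effective_caps(
    yaml_max_members: Option<u32>,
    yaml_max_concurrent: Option<u32>,
) -> (u32, u32) {
    let members = yaml_max_members
        .unwrap_or(TEAM_MAX_MEMBERS)
        .min(TEAM_MAX_MEMBERS);
    let concurrent = yaml_max_concurrent
        .unwrap_or(TEAM_MAX_CONCURRENT_DEFAULT)
        .min(TEAM_MAX_CONCURRENT_DEFAULT);
    (members, concurrent)
}
```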
Error kinds
{ ok: false, kind: "...", error: "..." } shape:
- TeamingDisabled: team.enabled = false for this agent.
- InvalidName / InvalidMemberName.
- TeamNameTaken { existing_team_id }.
- TeamNotFound / TeamDeleted.
- MemberNotFound.
- TeamFull { count, cap }.
- ConcurrentCapExceeded { count, cap }.
- BodyTooLarge { actual, max }.
- NotLeader: only the team lead can TeamDelete / broadcast.
- OnlyLeadCanBroadcast: non-lead member tried to: "broadcast".
- NotMember: caller is neither the lead nor a current member; TeamStatus refuses without confirming team existence.
- BlockedByActiveMembers { names: [...] }: TeamDelete while members are still Running.
- TeammateCannotSpawnTeammate: caller's AgentContext has team_member_name = Some(...). Single-level fan-out only.
- Wire: missing required arg or malformed shape.
Plan-mode behaviour
- TeamCreate, TeamDelete, TeamSendMessage are in MUTATING_TOOLS. Under an active plan-mode they refuse with PlanModeRefusal { tool_kind: Delegate }.
- TeamList, TeamStatus are in READ_ONLY_TOOLS. Always callable.
Comparison vs delegate (Phase 8)
| | delegate (Phase 8) | Team* (Phase 79.6) |
|---|---|---|
| Topology | 1 → 1 | 1 lead + N parallel members |
| Lifecycle | request/reply | persistent team + audit log |
| Storage | none (broker only) | SQLite (3 tables) |
| Comms | agent.route.{target} | team.{id}.dm.{name} + team.{id}.broadcast |
| Idle/wake | n/a | goal Pending ↔ Running |
| Capability gate | allowed_delegates | team.enabled + caps |
| Best for | quick request to known peer | research fan-out, multi-source verify, full-stack work |
Deferred to 79.6.b
- Spawn-as-teammate via Phase 67 dispatch (AgentContext.team_id injection is wired; the actual sub-goal spawn that registers in team_members is the missing piece).
- Phase 14 TaskFlow link (flow_id is currently a placeholder equal to team_id).
- nexo team list / status / drop operator CLI.
- Force-kill drain in TeamDelete (today blocks; 79.6.b cancels in-flight goals after SHUTDOWN_DRAIN_SECS).
- MCP server mode (run_mcp_server) exposes the 5 tools, as part of the 79.M MCP exposure parity sweep.
References
- Spec: proyecto/PHASES.md::79.6.
MCP server exposable catalog (Phase 79.M)
nexo mcp-server advertises a curated subset of the runtime
tool registry to external MCP clients (Claude Desktop, Cursor,
Zed, etc.). The subset is defined in code by a static slice;
operators pick which entries to enable via
mcp_server.expose_tools.
Source-of-truth: EXPOSABLE_TOOLS
// crates/config/src/types/mcp_exposable.rs
pub static EXPOSABLE_TOOLS: &[ExposableToolEntry] = &[
    // ...
    ExposableToolEntry {
        name: "cron_list",
        tier: SecurityTier::ReadOnly,
        boot_kind: BootKind::Always,
        feature_gate: None,
    },
    // ...
];
Adding a tool to this slice does not expose it — the operator
must still list the name in mcp_server.expose_tools. The slice
controls what is legal to expose; YAML controls what is
actually exposed.
YAML
# config/mcp_server.yaml
mcp_server:
enabled: true
name: "kate"
expose_tools:
- cron_list
- cron_create
- ListMcpResources
- ReadMcpResource
- config_changes_tail
- web_search
- web_fetch
- EnterPlanMode
- ExitPlanMode
- ToolSearch
- TodoWrite
- NotebookEdit
expose_denied_tools:
- Heartbeat
denied_tools_profile:
enabled: true
require_auth: true
require_delegate_allowlist: true
require_remote_trigger_targets: true
allow:
heartbeat: true
delegate: false
remote_trigger: false
Three-bucket policy
| Bucket | BootKind | Behaviour |
|---|---|---|
| Expose | Always | Boot helper constructs the tool from McpServerBootContext; missing handle → labelled skip. |
| Expose (gated) | FeatureGated | Skipped unless the named Cargo feature is enabled. Config is the only entry today. |
| Deny by default | DeniedByPolicy { reason } | Dispatcher denies by default (Heartbeat, delegate, RemoteTrigger). run_mcp_server can optionally override selected entries via mcp_server.expose_denied_tools plus extra safety checks. |
| Defer | Deferred { phase, reason } | Wiring postponed to a follow-up sub-phase. Lsp, Team*. |
Boot dispatch flow
expose_tools (YAML) ┐
├──► EXPOSABLE_TOOLS lookup
│ │
│ ├──► Always → boot helper → Registered | SkippedInfraMissing
│ ├──► FeatureGated → cfg!(feature) check → Registered | SkippedFeatureGated
│ ├──► DeniedByPolicy → SkippedDenied (or override path in run_mcp_server)
│ └──► Deferred → SkippedDeferred
└──► (typo / removed) → UnknownName
Every outcome lands in two telemetry counters:
- mcp_server_tool_registered_total{name, tier}
- mcp_server_tool_skipped_total{name, reason}
reason ∈ {denied_by_policy, deferred, feature_gate_off, infra_missing, unknown_name}.
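The dispatch flow above can be sketched as a pure lookup. This is a minimal illustration under stated assumptions, not the code in crates/core: the Outcome enum, the toy CATALOG, and the handle_present / feature_on inputs are hypothetical stand-ins, and the extra locks on feature-gated tools are ignored. Only the five reason labels come from the text.

```rust
// Hypothetical, simplified model of the boot dispatch described above.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum BootKind {
    Always,
    FeatureGated,
    DeniedByPolicy,
    Deferred,
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    Registered,
    /// The payload is the `reason` label of the skipped counter.
    Skipped(&'static str),
}

/// Toy catalog standing in for EXPOSABLE_TOOLS (entries are illustrative).
pub const CATALOG: &[(&str, BootKind)] = &[
    ("cron_list", BootKind::Always),
    ("Config", BootKind::FeatureGated),
    ("Heartbeat", BootKind::DeniedByPolicy),
    ("Lsp", BootKind::Deferred),
];

/// Resolve one name from `expose_tools` to a boot outcome.
/// `handle_present` and `feature_on` are assumed inputs that a real boot
/// helper would derive from the boot context and the enabled Cargo features.
pub fn dispatch(name: &str, handle_present: bool, feature_on: bool) -> Outcome {
    let kind = match CATALOG.iter().find(|(n, _)| *n == name) {
        Some((_, kind)) => *kind,
        // A typo or a removed tool never reaches a boot helper.
        None => return Outcome::Skipped("unknown_name"),
    };
    match kind {
        BootKind::Always if handle_present => Outcome::Registered,
        BootKind::Always => Outcome::Skipped("infra_missing"),
        BootKind::FeatureGated if feature_on => Outcome::Registered,
        BootKind::FeatureGated => Outcome::Skipped("feature_gate_off"),
        BootKind::DeniedByPolicy => Outcome::Skipped("denied_by_policy"),
        BootKind::Deferred => Outcome::Skipped("deferred"),
    }
}
```

Every branch yields exactly one outcome, which is what lets each name in expose_tools increment exactly one of the two counters.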
Boot context
// crates/core/src/agent/mcp_server_bridge/context.rs
pub struct McpServerBootContext {
    pub agent_id: String,
    pub broker: AnyBroker,
    pub cron_store: Option<Arc<dyn CronStore>>,
    pub mcp_runtime: Option<Arc<SessionMcpRuntime>>,
    pub config_changes_store: Option<Arc<dyn ConfigChangesStore>>,
    pub web_search_router: Option<Arc<WebSearchRouter>>,
    pub link_extractor: Option<Arc<LinkExtractor>>,
    pub agent_context: Arc<AgentContext>,
}
run_mcp_server builds the context best-effort: it tries to
open ./data/cron.db, ./data/config_changes.db, and the
env-driven web-search providers when the corresponding entry
is in expose_tools. If a handle cannot be constructed the
relevant tool is skipped with a labelled warn line; the
server still boots.
Safe profile for denied overrides
Denied-by-default tools now require two explicit opt-ins:
- Tool name in mcp_server.expose_denied_tools.
- Matching allow-bit in mcp_server.denied_tools_profile.allow.* with denied_tools_profile.enabled: true.
Default profile is fail-closed (enabled: false, all allow bits false).
Additional hardening gates in the profile:
- require_auth (default true): requires mcp_server.auth_token_env or mcp_server.http.auth.
- require_delegate_allowlist (default true): delegate only boots when agents.<id>.allowed_delegates is non-empty and not ["*"].
- require_remote_trigger_targets (default true): RemoteTrigger only boots when agents.<id>.remote_triggers has at least one entry.
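The two opt-ins can be read as a conjunction. The sketch below assumes a hypothetical DeniedToolsProfile struct that mirrors the YAML keys shown earlier. It deliberately leaves out the three hardening gates, and it is not the check that run_mcp_server performs.

```rust
/// Hypothetical mirror of `mcp_server.denied_tools_profile`.
/// `Default` yields the fail-closed profile: disabled, all allow bits false.
#[derive(Debug, Clone, Default)]
pub struct DeniedToolsProfile {
    pub enabled: bool,
    pub allow_heartbeat: bool,
    pub allow_delegate: bool,
    pub allow_remote_trigger: bool,
}

/// Returns true only when both opt-ins are present for `tool`.
pub fn override_allowed(
    tool: &str,
    expose_denied_tools: &[&str],
    profile: &DeniedToolsProfile,
) -> bool {
    // Opt-in 1: the tool is named in `expose_denied_tools`.
    if !expose_denied_tools.contains(&tool) {
        return false;
    }
    // Opt-in 2: the profile is enabled AND the matching allow bit is set.
    if !profile.enabled {
        return false;
    }
    match tool {
        "Heartbeat" => profile.allow_heartbeat,
        "delegate" => profile.allow_delegate,
        "RemoteTrigger" => profile.allow_remote_trigger,
        // Anything else has no allow bit, so it stays denied.
        _ => false,
    }
}
```

With the YAML shown earlier (heartbeat: true, delegate: false), Heartbeat would pass this check while delegate would not, even if delegate were added to expose_denied_tools.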
Adding a new tool
- Implement the tool somewhere in nexo-core::agent::* so it has a tool_def() -> ToolDef and a ToolHandler impl.
- Add an ExposableToolEntry to EXPOSABLE_TOOLS with the appropriate tier + boot_kind.
- Add a match arm in boot_always (or per-bucket helper) that constructs the tool from the boot context and returns BootResult::Registered.
- Add a unit test in crates/core/src/agent/mcp_server_bridge/dispatch.rs::tests covering the missing-handle and present-handle cases.
- The conformance suite in crates/core/tests/exposable_catalog_test.rs will automatically pick it up via the every_always_entry_boots_* tests.
Comparison vs nexo run
| | nexo run | nexo mcp-server |
|---|---|---|
| Tool registry | full (~31 tools, per-binding) | curated subset of EXPOSABLE_TOOLS |
| Plan-mode gating | yes (MUTATING_TOOLS / READ_ONLY_TOOLS) | yes — same gates apply |
| Capability YAML | per-agent team.enabled, lsp.enabled, etc. | mcp_server.expose_tools allowlist |
| Auth | local trust + binding policy | optional auth_token_env / http.auth.kind |
Threat model — Config self-edit via MCP
The Config tool is the only entry that lets an external MCP
client mutate the agent's YAML at runtime. It is gated by four
locks that all must be open before the boot dispatcher
registers it:
| Lock | Where | Failure → |
|---|---|---|
| 1. Cargo feature config-self-edit | compile-time | SkippedFeatureGated |
| 2. mcp_server.auth_token_env or http.auth set | boot-time | SkippedDenied { config-requires-auth-token } |
| 3. agents.<id>.config_tool.self_edit = true | per-agent YAML | SkippedDenied { config-self-edit-policy-disabled } |
| 4. agents.<id>.config_tool.allowed_paths non-empty | per-agent YAML | SkippedDenied { config-allowed-paths-must-be-explicit } |
On top of the four locks, the inherent denylist
(crates/setup/src/capabilities.rs::CONFIG_SELF_EDIT_DENYLIST)
permanently blocks credentials, allowed_delegates,
outbound_allowlist, system_prompt, plugins, mcp_server.*, and
broker.*. The denylist is hard-coded, not operator-editable
from inside a Config call.
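The four locks in the table above behave like a chain of early returns. A minimal sketch, assuming a hypothetical ConfigLocks snapshot and assuming the locks are checked in table order (the text does not state the order); the error strings reuse the skip reasons from the table.

```rust
/// Hypothetical snapshot of the inputs the four locks inspect.
pub struct ConfigLocks<'a> {
    /// Lock 1: the Cargo feature is compiled in.
    pub feature_enabled: bool,
    /// Lock 2: an auth token env var or http.auth is configured.
    pub auth_configured: bool,
    /// Lock 3: agents.<id>.config_tool.self_edit.
    pub self_edit: bool,
    /// Lock 4: agents.<id>.config_tool.allowed_paths.
    pub allowed_paths: &'a [&'a str],
}

/// Ok(()) means the Config tool may be registered; Err carries the skip reason.
pub fn check_config_locks(locks: &ConfigLocks<'_>) -> Result<(), &'static str> {
    if !locks.feature_enabled {
        return Err("SkippedFeatureGated");
    }
    if !locks.auth_configured {
        return Err("config-requires-auth-token");
    }
    if !locks.self_edit {
        return Err("config-self-edit-policy-disabled");
    }
    if locks.allowed_paths.is_empty() {
        return Err("config-allowed-paths-must-be-explicit");
    }
    Ok(())
}
```

The denylist is a separate layer: it applies per Config call, after registration, so it is not modelled here.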
Approval flow:
- Model calls Config { op: "propose", key, value, justification }.
- ConfigTool stages the patch under <state_dir>/config-proposals/<patch_id>.yaml.
- ApprovalCorrelator parks a oneshot::Receiver keyed by patch_id.
- Operator sends [config-approve patch_id=<id>] on any plugin inbound topic the daemon subscribes to (works because mcp-server's correlator subscribes to plugin.inbound.> if NATS is shared with the operator's nexo run daemon).
- Model calls Config { op: "apply", patch_id }. If approved, the YAML write happens; ConfigChangesStore records the row; ReloadTrigger fires.
In mcp-server mode the McpServerReloadTrigger is a stub that
returns Ok with a log line. The mutated YAML is durable on disk;
the operator's nexo run daemon picks it up via the Phase 18 file
watcher. The mcp-server process itself does not run a
ConfigReloadCoordinator — same-process reload only happens in
nexo run.
Audit:
- Every read/propose/apply lands in config_changes SQLite (<state_dir>/config_changes.db) via ConfigChangesStore.
- Tail Config { op: ... } events with config_changes_tail (read-only, exposable).
- Secret values are redacted via DefaultSecretRedactor (matches *_token, *_secret, *_password, *_key suffixes).
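The suffix rule in the last item is easy to picture in code. The sketch below is an assumption-laden illustration, not DefaultSecretRedactor itself: the function names, the case-insensitive comparison, and the "[REDACTED]" placeholder are all hypothetical. Only the four suffixes come from the text.

```rust
/// Suffixes named in the audit notes above.
const SECRET_SUFFIXES: [&str; 4] = ["_token", "_secret", "_password", "_key"];

/// True when the key ends with one of the secret suffixes.
/// Lower-casing first is an assumption of this sketch.
pub fn looks_secret(key: &str) -> bool {
    let key = key.to_ascii_lowercase();
    SECRET_SUFFIXES.iter().any(|suffix| key.ends_with(*suffix))
}

/// Returns the value to store in the audit row for `key`.
/// The placeholder text is hypothetical.
pub fn redact(key: &str, value: &str) -> String {
    if looks_secret(key) {
        "[REDACTED]".to_string()
    } else {
        value.to_string()
    }
}
```

Because the match is on the suffix, a key such as keyboard is left alone while api_key is redacted.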
What an MCP client cannot do, even with all locks open:
- Change credentials, API keys, OAuth tokens (denylist).
- Add/remove agent bindings (denylist on inbound_bindings).
- Modify allowed_delegates, outbound_allowlist, system_prompt (denylist).
- Toggle plugins (denylist on plugins).
- Self-elevate mcp_server.expose_tools (denylist on mcp_server.*).
- Bypass approval: apply always blocks until the correlator gets a matching [config-approve patch_id=<id>] from inbound.
- Read secret values without redaction.
References
- Primary: upstream agent CLI.
- Secondary: research/docs/cli/mcp.md:30-120 (openclaw mcp serve curated catalog).
- Spec: proyecto/PHASES.md::79.M.
Fork subagent (Phase 80.19)
crates/fork/ — fork-with-cache-share subagent infrastructure. A
fork is a lightweight in-process LLM turn loop that:
- Shares the parent goal's prompt-cache key (system prompt, tools, model, message prefix) so cache hits transfer across the fork boundary.
- Runs as a single LLM turn loop (LlmClient::chat + tool dispatch + loop), NOT through Phase 67's heavyweight goal-flow driver-loop (which spawns claude subprocesses and runs acceptance + workspace checks).
- Optionally writes a transcript / agent-handle row, or stays invisible to agent ps when skip_transcript: true.
Fork is the primitive that downstream sub-phases consume:
| Sub-phase | Use of fork |
|---|---|
| 80.1 autoDream consolidation | ForkAndForget + AutoMemFilter (80.20) + 4-phase prompt |
| 80.14 AWAY_SUMMARY | ForkAndForget + read-only memory whitelist + transcript scan |
| Phase 51 eval harness | Sync mode + scripted prompts |
| Refactored delegation_tool.rs | Sync mode replacing the bespoke sync delegate |
The upstream runForkedAgent (upstream agent CLI)
is the verbatim reference. nexo's adaptation collapses 17 isolation
fields down to the handful that actually matter in Rust, because
Arc<...> shared state is already isolated by construction.
Public surface
use std::sync::Arc;
use std::time::Duration;

use nexo_fork::{
    DefaultForkSubagent, ForkSubagent, ForkParams, ForkOverrides,
    DelegateMode, QuerySource, CacheSafeParams, AllowAllFilter,
};

// 1. Snapshot the parent's last LLM request.
let cache_safe = CacheSafeParams::from_parent_request(&parent_chat_request);

// 2. Build a fork.
let handle = DefaultForkSubagent::new()
    .fork(ForkParams {
        parent_ctx,
        llm,
        tool_dispatcher,
        prompt_messages: vec![/* fork's first-turn user message */],
        cache_safe,
        tool_filter: Arc::new(AllowAllFilter),
        query_source: QuerySource::Custom("docs_example"),
        fork_label: "docs_example".into(),
        overrides: None,
        max_turns: 10,
        on_message: None,
        skip_transcript: true,
        mode: DelegateMode::ForkAndForget,
        timeout: Duration::from_secs(300),
        external_abort: None,
    })
    .await?;

// 3. Await completion when ready (or never, for true fire-and-forget).
let mut handle = handle;
let result = handle.take_completion().unwrap().await?;
Cache-key invariant (CRITICAL)
CacheSafeParams::fork_context_messages MUST preserve any incomplete
tool_use blocks from the parent. Filtering them strips the paired
tool_result rows, which breaks Anthropic's API (400 error) AND breaks
the cache prefix. nexo's crates/llm repairs missing pairings in
transport, same as the main thread, so the identical post-repair
prefix keeps the cache hit.
Reference: upstream agent CLI.
// CORRECT: pass through unchanged
let cs = CacheSafeParams::from_parent_request(&req);

// WRONG: never do this
// cs.fork_context_messages.retain(|m| !has_dangling_tool_use(m));
The test
cache_safe::tests::from_parent_request_preserves_message_prefix_with_partial_tool_use
verifies bit-for-bit pass-through.
Isolation strategy
KAIROS (TypeScript) clones 17 mutable fields per fork:
readFileState, abortController, getAppState, setAppState,
setResponseLength, nestedMemoryAttachmentTriggers,
toolDecisions, etc. Most of these are mutable closures or
mutable maps that JavaScript needs to deep-clone manually.
In nexo, every analogous field on AgentContext is either an Arc
(shared) or wrapped in Arc<RwLock<...>> (interior mutability with
explicit locking). Rust's ownership model already guarantees forks
cannot mutate the parent's state without going through the locks.
We therefore only override the fields whose isolation actually matters:
| Field | Default | Override |
|---|---|---|
| agent_id | parent's value | ForkOverrides::agent_id |
| critical_system_reminder | none | ForkOverrides::critical_system_reminder (consumed by run_turn_loop) |
| abort | new child token; parent → child cascade only | ForkParams::external_abort (caller supplies) |
| tool_filter | AllowAllFilter | ForkParams::tool_filter (e.g. AutoMemFilter for 80.1) |
DelegateMode
pub enum DelegateMode {
    Sync,          // block until completion
    ForkAndForget, // tokio::spawn + return ForkHandle immediately
}
ForkAndForget is right when the caller (autoDream, AWAY_SUMMARY) does
not need the result inline. The handle's Drop impl fires the fork's
abort token automatically when the completion future is never
consumed, which prevents leaked tokio tasks if the handle is dropped
without take_completion.
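The drop-guard idea can be shown without tokio. The sketch below is a std-only stand-in, not the real ForkHandle: AbortFlag replaces the cancellation token, SketchHandle is a hypothetical name, and the u32 completion is a placeholder for the real completion future.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Stand-in for a cancellation token: a shared flag the background task polls.
#[derive(Clone, Default)]
pub struct AbortFlag(Arc<AtomicBool>);

impl AbortFlag {
    pub fn cancel(&self) {
        self.0.store(true, Ordering::SeqCst);
    }
    pub fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

/// Hypothetical handle: `completion` stays Some until the caller takes it.
pub struct SketchHandle {
    abort: AbortFlag,
    completion: Option<u32>, // placeholder for the real completion future
}

impl SketchHandle {
    pub fn new(abort: AbortFlag) -> Self {
        Self { abort, completion: Some(0) }
    }
    pub fn take_completion(&mut self) -> Option<u32> {
        self.completion.take()
    }
}

impl Drop for SketchHandle {
    fn drop(&mut self) {
        // Completion never taken: nobody will observe the result,
        // so signal the background task to stop.
        if self.completion.is_some() {
            self.abort.cancel();
        }
    }
}
```

A caller that takes the completion keeps the work alive after the handle goes away; a caller that simply drops the handle stops it.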
Telemetry
Every fork emits a tracing span fork.subagent with fields:
- fork_run_id: uuid v4
- parent_agent: parent_ctx.agent_id
- fork_label: caller-supplied tag (e.g. auto_dream)
- query_source: QuerySource variant
- mode: Sync | ForkAndForget
- skip_transcript: bool
- cache_key_hash: u64 from CacheSafeParams::cache_key_hash
The turn loop additionally emits:
- fork.cache_break_detected (level WARN) when cache hit ratio drops below 0.5 on the first turn. This is an actionable signal that the fork's CacheSafeParams does not match the parent. Phase 77.4 cache-break heuristic.
- fork.tool_filter (level DEBUG) when the filter denies a tool call.
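The cache-break check reduces to a threshold on the first turn. The sketch below assumes the hit ratio is cached input tokens divided by total input tokens. That definition, the function name, and the parameters are hypothetical; only the 0.5 threshold and the first-turn restriction come from the text.

```rust
/// Returns true when the first turn of a fork looks like a cache miss.
/// `turn_index` is 0-based; the ratio definition is an assumption of this sketch.
pub fn cache_break_detected(turn_index: usize, cache_read_tokens: u64, input_tokens: u64) -> bool {
    // Only the first turn is informative: later turns extend the fork's own prefix.
    if turn_index != 0 || input_tokens == 0 {
        return false;
    }
    let ratio = cache_read_tokens as f64 / input_tokens as f64;
    ratio < 0.5
}
```

A fork whose first request reads 100 of 1000 input tokens from cache would trip the warning; one that reads 900 of 1000 would not.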
AutoMemFilter (Phase 80.20)
crates/fork::AutoMemFilter is the canonical ToolFilter for
forked memory-consolidation work — autoDream (Phase 80.1),
AWAY_SUMMARY (Phase 80.14), eval harness (Phase 51 future). Verbatim
port of upstream agent CLI.
What it allows
| Tool | Allowed when |
|---|---|
| REPL | always (inner primitives re-gate via this same filter; required for cache-key parity per upstream :171-180) |
| FileRead, Glob, Grep | always (inherently read-only) |
| Bash | nexo_driver_permission::is_read_only(command), which composes Phase 77.8 destructive-cmd warning + Phase 77.9 sed-in-place + a positive whitelist of ~45 read-only utilities + redirect / subshell / heredoc detection |
| FileEdit, FileWrite | file_path (post-canonicalize) starts with the filter's memory_dir |
| anything else | denied with structured tool_result body so the model can recover within the same turn |
Defense in depth
- Whitelist allow-list: only the seven tool names above; everything else is rejected at the filter layer.
- Bash classifier: composes existing Phase 77.x classifiers + a conservative whitelist that intentionally drops tee, awk, perl, python, node, ruby because they can shell out via system(...). Operators add them back per-call only if a pipe-only no-side-effects shape can be validated.
- Path canonicalize at construction (memory_dir resolved once) AND per-call (file_path resolved before starts_with). Defeats symlink swaps and .. traversal.
- Post-fork audit in 80.1: auto_dream independently re-checks files_touched paths after the fork completes, so a filter bypass would still be caught.
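The canonicalize-then-starts_with check from the list above can be sketched with the standard library alone. MemoryDirGuard is a hypothetical name and this is not AutoMemFilter's code. Falling back to the parent directory when the target file does not exist yet is an assumption made here so that writes of new files can be checked.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical guard holding the canonical memory directory.
pub struct MemoryDirGuard {
    root: PathBuf,
}

impl MemoryDirGuard {
    /// Canonicalize once at construction; the directory must already exist.
    pub fn new(memory_dir: &Path) -> io::Result<Self> {
        Ok(Self { root: memory_dir.canonicalize()? })
    }

    /// Canonicalize per call, then require the result to sit under `root`.
    pub fn allows(&self, file_path: &Path) -> bool {
        let resolved = match file_path.canonicalize() {
            Ok(p) => p,
            Err(_) => {
                // The file may not exist yet: resolve its parent instead.
                let (Some(parent), Some(name)) = (file_path.parent(), file_path.file_name())
                else {
                    return false;
                };
                match parent.canonicalize() {
                    Ok(dir) => dir.join(name),
                    Err(_) => return false, // unresolvable parent: deny
                }
            }
        };
        // Component-wise prefix test, so /memory-evil does not match /memory.
        resolved.starts_with(&self.root)
    }
}
```

Resolving before comparing is what turns a path like memory/../escape.md into its real location, so the prefix test sees the directory it actually lands in.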
Provider-agnostic
The filter operates on tool name + JSON args. It does NOT depend on
any specific LlmClient impl; it works under Anthropic, OpenAI,
MiniMax, Gemini, DeepSeek, or any future provider that implements
the trait. Tool names are canonical nexo strings (tool_filter::tool_names::*);
provider clients translate to/from native wire formats.
The filter expects flat top-level args. If a provider client wraps
args in a nested envelope (e.g. {"arguments": {...}}), the client
MUST unwrap before dispatch — the filter denies nested shapes
explicitly so a missing unwrap surfaces immediately.
Example
use std::sync::Arc;
use nexo_fork::{
    AutoMemFilter, DefaultForkSubagent, DelegateMode, ForkParams,
    ForkSubagent, QuerySource, CacheSafeParams,
};

let memory_dir = std::path::PathBuf::from("/var/lib/nexo/memory/agent_a");
std::fs::create_dir_all(&memory_dir)?;
let filter = Arc::new(AutoMemFilter::new(&memory_dir)?);

let handle = DefaultForkSubagent::new()
    .fork(ForkParams {
        parent_ctx,
        llm,
        tool_dispatcher,
        prompt_messages: vec![/* /dream prompt */],
        cache_safe: CacheSafeParams::from_parent_request(&parent_request),
        tool_filter: filter, // ← whitelist applied here
        query_source: QuerySource::AutoDream,
        fork_label: "auto_dream".into(),
        overrides: None,
        max_turns: 30,
        on_message: None,
        skip_transcript: true,
        mode: DelegateMode::ForkAndForget,
        timeout: std::time::Duration::from_secs(300),
        external_abort: None,
    })
    .await?;
Cross-process forks
Out of scope for 80.19. When Phase 32 multi-host orchestration lands,
a NatsForkSubagent impl will publish on
agent.fork.<run_id>.events so a fork can run on a remote daemon
sharing the parent's prompt cache via the upstream LLM provider's
cache plane.
Agent event firehose (Phase 82.11)
Operator UIs (and any microapp with the right capability) need real-time visibility into what agents are doing: chat lines, pause state changes, escalations to humans, batch-job results, future custom kinds. The agent event firehose is the single architectural seam that delivers them.
The wire format — AgentEventKind — is a #[non_exhaustive]
discriminated enum on nexo/notify/agent_event (admin RPC
reference).
This page documents the runtime composition that gets a frame
from a producer (transcript writer, processing handler,
escalation handler) onto every interested subscriber.
Trait
Every producer holds a single Arc<dyn AgentEventEmitter>:
#[async_trait]
pub trait AgentEventEmitter: Send + Sync + Debug {
    async fn emit(&self, event: AgentEventKind);
}
Implementations are best-effort: failures log and drop. The
contract is that emit MUST NOT block the producer. Boot is free
to swap in any composition without touching emit sites.
Source: crates/core/src/agent/agent_events.rs.
Implementations
BroadcastAgentEventEmitter — live in-process
A tokio::sync::broadcast::Sender<AgentEventKind> with a 256-frame
ring buffer. Subscribers that lag past the buffer get
RecvError::Lagged(n) rather than panic — they are expected to
call nexo/admin/agent_events/list to resync.
Single-daemon installs run happily with just this. No durability, no cross-host.
SqliteAgentEventLog — durable backfill
Append-only log keyed by autoincrement id. Denormalised columns
(kind, agent_id, tenant_id, at_ms) so the common filter
axes hit indexed paths; full AgentEventKind round-trips as JSON
in payload_json so future enum variants land non-breaking.
Doubles as AgentEventEmitter so it slots into the composition
without a separate wiring path. sweep_retention(retention_days, max_rows) mirrors the admin-audit sweep so a single boot
scheduler runs both.
Read API (AgentEventLog::list_recent) supports agent_id +
kind + tenant_id + since_ms + limit filters with
parameterised SQL.
Source: crates/core/src/agent/admin_rpc/agent_events_sqlite.rs.
NatsAgentEventEmitter — multi-host bridge
Publishes serialised AgentEventKind frames to
<prefix>.<agent_id>.<kind> (default prefix
nexo.agent_events). Subscribers route per-agent
(nexo.agent_events.ana.>), per-kind
(nexo.agent_events.*.processing_state_changed), or both at the
broker.
The pure helper agent_event_subject(prefix, &event) exposes the
routing key without a live client — useful for boot-time
validation and for tests. agent_id is sanitised at emit-site
(./*/>/whitespace → _) so a malformed config can't break
wildcard subscriptions.
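The subject layout and the sanitisation rule can be sketched as two small functions. These are illustrative stand-ins: sketch_subject takes plain strings instead of an event (the real agent_event_subject takes the prefix and an event reference), and sanitise_token is a hypothetical name for the emit-site cleanup.

```rust
/// Replace characters that carry meaning in NATS subjects
/// ('.', '*', '>', and whitespace) with '_'.
pub fn sanitise_token(raw: &str) -> String {
    raw.chars()
        .map(|c| {
            if c == '.' || c == '*' || c == '>' || c.is_whitespace() {
                '_'
            } else {
                c
            }
        })
        .collect()
}

/// Build `<prefix>.<agent_id>.<kind>` with a sanitised agent id.
pub fn sketch_subject(prefix: &str, agent_id: &str, kind: &str) -> String {
    format!("{}.{}.{}", prefix, sanitise_token(agent_id), kind)
}
```

Keeping the agent id to a single token is what makes the per-agent and per-kind wildcard subscriptions line up with fixed positions in the subject.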
Failure mode is best-effort: publish errors log and drop. The broker crate's circuit breaker + disk queue protect the daemon when NATS is unreachable.
Source: crates/core/src/agent/agent_events.rs
(NatsAgentEventEmitter, agent_event_subject).
TeeAgentEventEmitter — fan-out
Composes several inner emitters into a single
Arc<dyn AgentEventEmitter>. Boot wires:
Tee([
    BroadcastAgentEventEmitter, // live JSON-RPC notifications
    SqliteAgentEventLog,        // durable backfill across restart
    NatsAgentEventEmitter,      // multi-host bridge for SaaS
])
Per-sink failures stay isolated by trait contract. Tee preserves
that guarantee — emit returns after every inner has been polled
sequentially. Production keeps each inner non-blocking (broadcast
try_send, NATS publish, async SQLite append) so a slow sink
cannot throttle the whole tee.
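The fan-out shape is simple enough to sketch synchronously. The code below is a simplification under stated assumptions: the real trait is async and carries AgentEventKind, while this Emitter trait is a hypothetical synchronous stand-in that takes a string. Recorder and AlwaysFails exist only to exercise the tee.

```rust
use std::sync::{Arc, Mutex};

/// Synchronous stand-in for the emitter trait (the real one is async).
pub trait Emitter: Send + Sync {
    fn emit(&self, event: &str);
}

/// Fan-out: forwards every event to each inner emitter in order.
pub struct Tee {
    inner: Vec<Arc<dyn Emitter>>,
}

impl Tee {
    pub fn new(inner: Vec<Arc<dyn Emitter>>) -> Self {
        Self { inner }
    }
}

impl Emitter for Tee {
    fn emit(&self, event: &str) {
        // emit has no return value, so one sink's failure cannot
        // prevent the next sink from seeing the event.
        for sink in &self.inner {
            sink.emit(event);
        }
    }
}

/// Test sink that remembers what it saw.
#[derive(Default)]
pub struct Recorder {
    pub seen: Mutex<Vec<String>>,
}

impl Emitter for Recorder {
    fn emit(&self, event: &str) {
        self.seen.lock().unwrap().push(event.to_string());
    }
}

/// Test sink that models a failing backend: it drops the event silently.
pub struct AlwaysFails;

impl Emitter for AlwaysFails {
    fn emit(&self, _event: &str) {}
}
```

Because Tee itself implements the trait, producers keep holding a single emitter and never learn how many sinks sit behind it.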
Boot composition state
AdminBootstrapInputs (in nexo-setup) accepts an optional
agent_event_log: Option<Arc<SqliteAgentEventLog>>. When Some,
build_with_firehose composes Tee([Broadcast, Log])
internally — every emit through bootstrap.event_emitter() lands
in the durable log. The NATS bridge is library-side ready
(NatsAgentEventEmitter::new(client)) but not yet stitched by
boot — adding it is one line in the same composition once the
broker handle is threaded into bootstrap inputs.
NoopAgentEventEmitter
Default for headless installs and tests. Useful as an explicit
"no-op, by design" instead of None plumbed through every emit
site.
Subscribe paths
Subscribers reach events through three doors:
| Door | When | How |
|---|---|---|
| Live JSON-RPC notifications | Operator UI online during the emit | Microapp holds transcripts_subscribe / agent_events_subscribe_all; daemon delivers nexo/notify/agent_event frames automatically. |
| Backfill RPC | Operator UI starts after the emit | nexo/admin/agent_events/list reads from the MergingAgentEventReader — transcripts JSONL for transcript_appended, durable SQLite log for non-transcript kinds, merged by at_ms desc. |
| External NATS subscriber | Operator dashboard runs off-daemon | Subscribe directly at <prefix>.<agent_id>.<kind>. |
MergingAgentEventReader (in
crates/core/src/agent/admin_rpc/domains/agent_events.rs)
respects the kind filter:
- kind=Some("transcript_appended") → transcripts JSONL only.
- kind=Some(other) → durable log only.
- kind=None → both, merged by at_ms desc, capped at the caller's limit.
Boot wires the SQLite log as a Tee sink alongside the broadcast
emitter — meaning the log captures TranscriptAppended too. The
merger drops those on the log side so the JSONL reader stays
canonical for chat history; subscribers never see duplicates.
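The filter and de-duplication rules above fit in one function. This is a minimal in-memory sketch, not MergingAgentEventReader: Row and merge_recent are hypothetical names, and both sources are plain slices rather than a JSONL reader and a SQLite log.

```rust
/// Hypothetical projection of an event: just the kind and its timestamp.
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub kind: String,
    pub at_ms: u64,
}

/// Merge the two sources according to the `kind` filter, newest first.
pub fn merge_recent(
    transcripts: &[Row],
    durable_log: &[Row],
    kind: Option<&str>,
    limit: usize,
) -> Vec<Row> {
    const TRANSCRIPT: &str = "transcript_appended";
    let mut out: Vec<Row> = Vec::new();

    // JSONL side: used for the transcript kind, or when no filter is set.
    if kind.map_or(true, |k| k == TRANSCRIPT) {
        out.extend(transcripts.iter().filter(|r| r.kind == TRANSCRIPT).cloned());
    }

    // Durable log side: used for every other kind, or when no filter is set.
    if kind.map_or(true, |k| k != TRANSCRIPT) {
        out.extend(
            durable_log
                .iter()
                // Drop transcript rows here so the JSONL reader stays canonical.
                .filter(|r| r.kind != TRANSCRIPT)
                .filter(|r| kind.map_or(true, |k| r.kind == k))
                .cloned(),
        );
    }

    out.sort_by(|a, b| b.at_ms.cmp(&a.at_ms)); // at_ms descending
    out.truncate(limit);
    out
}
```

Even though the log also holds a copy of each transcript row, the unfiltered result contains it once, taken from the JSONL side.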
Variants today
| Variant | Producer | Notes |
|---|---|---|
| TranscriptAppended | TranscriptWriter::append_entry (Phase 82.11) | Body already-redacted at emit. |
| PendingInboundsDropped | inbound dispatcher under processing/pause (Phase 82.13.b.3) | Fired only on cap eviction. |
| EscalationRequested | (deferred) escalate_to_human built-in tool | Variant + emit shape pinned in 82.14.b.firehose. |
| EscalationResolved | escalations::resolve + auto_resolve_on_pause (Phase 82.14.b.firehose) | Same shape from both call sites so subscribers can't tell paths apart. |
| ProcessingStateChanged | processing::pause + processing::resume (Phase 82.13.b.firehose) | Carries prev_state + new_state so subscribers render correct deltas. Idempotent retries skip the emit. |
Adding a new kind
AgentEventKind is #[non_exhaustive] with #[serde(tag = "kind")],
so a new variant lands non-breaking in three steps:
- Add the variant in crates/tool-meta/src/admin/agent_events.rs. Mirror the conventions: agent_id denormalised, optional tenant_id (skip-when-None on the wire), an at_ms field for ordering.
- Wire the producer to call emitter.emit(...) from the place the event becomes true. Pre-fetch any "previous" state before the mutation so the wire frame carries both ends of the transition.
- Extend agent_events_sqlite::extract_metadata and agent_events::event_at_ms so the durable log + the merger know how to project the new variant. Unknown future variants fall through to a warn-skip on the durable side and the live broadcast still surfaces them; failure stays graceful.
No FTS schema change is required: search remains
TranscriptAppended-only today. Future revs that want full-text
search over non-transcript kinds add an
AgentEventLog::search_events method without touching existing
emit sites.
Pure-Rust quick tunnel
nexo-tunnel-quick exposes a local TCP port over
https://*.trycloudflare.com without the cloudflared Go
subprocess. It is the public-HTTPS plumbing the daemon needs for
WhatsApp QR pairing and dev-time webhook receivers.
Phase 92 lineage:
- cloudflare-quick-tunnel (crates.io, upstream): QUIC + Cap'n Proto-RPC client against Cloudflare's argotunnel edge.
- nexo-tunnel-quick (this crate): workspace wrapper that adds the sidecar URL file accessor, the lifecycle metrics module, and surfaces the supervisor knobs through a stable API.
- nexo-tunnel (legacy alias, v0.3.x): re-exports nexo-tunnel-quick::* verbatim until Phase 92.11 retires it.
Public API
use std::time::Duration;
use nexo_tunnel_quick::{TunnelManager, DEFAULT_GRACE_PERIOD};

let handle = TunnelManager::new(8080)
    .with_timeout(Duration::from_secs(30))
    .start()
    .await?;

println!("public URL: {}", handle.url);
println!("edge POP : {}", handle.location);
println!("tunnel id: {}", handle.tunnel_id);

if let Some(m) = handle.metrics().await {
    println!(
        "streams={} in={} out={} reconnects={}",
        m.streams_total, m.bytes_in, m.bytes_out, m.reconnects,
    );
}

handle.shutdown_with(DEFAULT_GRACE_PERIOD).await;
Supervisor
Heartbeat, reconnect-with-backoff and graceful unregisterConnection
all run inside the upstream supervisor (a Tokio task owned by
QuickTunnelHandle). Visible knobs:
| Constant | Value | Role |
|---|---|---|
| DEFAULT_HANDSHAKE_TIMEOUT | 30 s | wait for edge to register before failing |
| DEFAULT_GRACE_PERIOD | 30 s | drain in-flight streams during shutdown |
| MAX_RECONNECT_ATTEMPTS | 10 | consecutive supervisor reconnect failures |
MAX_RECONNECT_ATTEMPTS exhaustion surfaces as
TunnelError::PermanentFailure(attempts) on the next metrics()
poll (the supervisor task closes itself; the handle still answers
shutdown() cleanly).
Telemetry (Phase 92.10)
Process-wide lifecycle counters live in nexo_tunnel_quick::metrics
and follow the same lock-free pattern as Phase 86
(nexo-memory::metrics): LazyLock<AtomicU64> / LazyLock<DashMap>
storage, hand-rolled Prometheus text rendering, no prometheus
crate dep, no metrics server here. Operators stitch the renderer
output into the runtime's /metrics aggregator.
| Counter | Labels | Meaning |
|---|---|---|
| tunnel_starts_total | — | successful TunnelManager::start calls |
| tunnel_starts_failed_total | reason | api / discovery / quic_dial / register / … |
| tunnel_shutdowns_total | — | graceful TunnelHandle::shutdown_with invocations |
| tunnel_streams_total | tunnel_id | streams proxied (per tunnel, supervisor counter) |
| tunnel_bytes_in_total | tunnel_id | bytes edge → local |
| tunnel_bytes_out_total | tunnel_id | bytes local → edge |
| tunnel_reconnects_total | tunnel_id | supervisor reconnect cycles |
Two render entry points:
// Lifecycle counters only, no per-handle snapshot.
let text = nexo_tunnel_quick::metrics::render_prometheus();

// Lifecycle + per-tunnel supervisor counters from live handles.
let text = nexo_tunnel_quick::metrics::render_prometheus_for(&[&h1, &h2]).await;
reason cardinality is capped at 16 distinct labels — extra ones
collapse to "other" so a misbehaving error path can't blow up
Prometheus storage.
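The cardinality cap can be pictured as a small set with an overflow bucket. The sketch below is single-threaded and uses a HashSet, unlike the lock-free storage the crate describes. ReasonLabels is a hypothetical name, and whether "other" itself counts toward the 16 is an assumption (here it does not).

```rust
use std::collections::HashSet;

/// Tracks which reason labels have been admitted so far.
pub struct ReasonLabels {
    seen: HashSet<String>,
    cap: usize,
}

impl ReasonLabels {
    pub fn new(cap: usize) -> Self {
        Self { seen: HashSet::new(), cap }
    }

    /// Returns the label under which `reason` should be counted.
    pub fn label(&mut self, reason: &str) -> String {
        // Already admitted labels keep their own series.
        if self.seen.contains(reason) {
            return reason.to_string();
        }
        // Room left: admit the new label.
        if self.seen.len() < self.cap {
            self.seen.insert(reason.to_string());
            return reason.to_string();
        }
        // Cap reached: collapse everything new into one bucket.
        "other".to_string()
    }
}
```

Labels admitted before the cap keep counting under their own name afterwards; only labels first seen after the cap collapse.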
Sidecar URL file
nexo pair start is a separate process from the daemon, so the
active URL is published to a file at:
$NEXO_HOME/state/tunnel.url when NEXO_HOME is set, otherwise ~/.nexo/state/tunnel.url.
The daemon writes atomically (<path>.tmp + rename) on tunnel-up
and removes on shutdown:
use nexo_tunnel_quick::{write_url_file, read_url_file, clear_url_file};

write_url_file(&handle.url)?;
let active = read_url_file();
clear_url_file()?;
No daemon connection, no broker round-trip, no shared library state — the CLI reads the file directly.
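The tmp-plus-rename pattern is worth seeing spelled out. The sketch below uses only the standard library and takes an explicit path, unlike the crate's write_url_file / read_url_file, which resolve the location themselves. The function names and the Option return type are assumptions for illustration.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Write `url` to `path` so readers see either the old or the new content,
/// never a partial write.
pub fn write_url_atomic(path: &Path, url: &str) -> io::Result<()> {
    if let Some(dir) = path.parent() {
        fs::create_dir_all(dir)?;
    }
    // "<path>.tmp": append the suffix to the full file name.
    let mut tmp = path.as_os_str().to_owned();
    tmp.push(".tmp");
    let tmp = PathBuf::from(tmp);
    {
        let mut file = fs::File::create(&tmp)?;
        file.write_all(url.as_bytes())?;
        file.sync_all()?; // flush to disk before the rename publishes it
    }
    // Same directory, same filesystem: the rename replaces the target in one step.
    fs::rename(&tmp, path)
}

/// Read the published URL, treating a missing or empty file as "no tunnel".
pub fn read_url(path: &Path) -> Option<String> {
    fs::read_to_string(path)
        .ok()
        .map(|s| s.trim().to_string())
        .filter(|s| !s.is_empty())
}
```

Writing the temp file next to the target keeps both on one filesystem, which is what the rename step relies on.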
llm.yaml
LLM provider registry. Each agent's model.provider must resolve to a
key in this file.
Source: crates/config/src/types/llm.rs.
Shape
providers:
  minimax:
    api_key: ${MINIMAX_API_KEY:-}
    group_id: ${MINIMAX_GROUP_ID:-}
    base_url: https://api.minimax.io
    rate_limit:
      requests_per_second: 2.0
      quota_alert_threshold: 100000
  anthropic:
    api_key: ${ANTHROPIC_API_KEY:-}
    base_url: https://api.anthropic.com
    rate_limit:
      requests_per_second: 2.0
    auth:
      mode: oauth_bundle
      bundle: ./secrets/anthropic_oauth.json
retry:
  max_attempts: 5
  initial_backoff_ms: 1000
  max_backoff_ms: 60000
  backoff_multiplier: 2.0
Per-provider fields
| Field | Type | Required | Default | Purpose |
|---|---|---|---|---|
| api_key | string | ✅ | — | API key. Supports ${ENV_VAR} and ${file:…}. |
| base_url | url | ✅ | — | API endpoint. Override to use a proxy or a local server. |
| group_id | string | — | — | MiniMax-only. Group identifier. |
| rate_limit.requests_per_second | f64 | — | 2.0 | Outbound throttle. |
| rate_limit.quota_alert_threshold | u64 | — | — | Optional soft-alarm tokens-per-day threshold. |
| api_flavor | enum | — | openai_compat | openai_compat or anthropic_messages. Lets MiniMax expose the Anthropic wire. |
| embedding_model | string | — | — | Override model used for embeddings (e.g. Gemini's text-embedding-004). |
| safety_settings | JSON | — | — | Gemini-only; attached verbatim to requests. |
Top-level retry block
Applies to every provider that doesn't define its own:
| Field | Default | Purpose |
|---|---|---|
| max_attempts | 5 | Total attempts including the first try. |
| initial_backoff_ms | 1000 | First backoff. |
| max_backoff_ms | 60000 | Cap. |
| backoff_multiplier | 2.0 | Exponential factor. |
Retries are jittered to avoid thundering-herd reconnects. See Fault tolerance — Retry policies.
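The four retry fields combine into a capped exponential schedule. The sketch below computes the un-jittered delay only; how jitter is applied is not stated here, so it is left out. RetryPolicy and backoff_ms are hypothetical names, and the reading that a backoff precedes each retry (not the first try) is an assumption.

```rust
/// Hypothetical mirror of the top-level `retry` block.
pub struct RetryPolicy {
    pub max_attempts: u32,
    pub initial_backoff_ms: u64,
    pub max_backoff_ms: u64,
    pub backoff_multiplier: f64,
}

impl Default for RetryPolicy {
    /// Defaults taken from the table above.
    fn default() -> Self {
        Self {
            max_attempts: 5,
            initial_backoff_ms: 1000,
            max_backoff_ms: 60_000,
            backoff_multiplier: 2.0,
        }
    }
}

/// Un-jittered delay before retry number `retry` (1-based),
/// or None once the attempt budget is exhausted.
pub fn backoff_ms(policy: &RetryPolicy, retry: u32) -> Option<u64> {
    // max_attempts counts the first try, so at most max_attempts - 1 retries.
    if retry == 0 || retry >= policy.max_attempts {
        return None;
    }
    let raw = policy.initial_backoff_ms as f64 * policy.backoff_multiplier.powi(retry as i32 - 1);
    Some(raw.min(policy.max_backoff_ms as f64) as u64)
}
```

With the defaults this gives four waits of 1 s, 2 s, 4 s and 8 s; the 60 s cap only comes into play when max_attempts is raised.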
Auth modes
auth:
  mode: auto | static | token_plan | oauth_bundle
  bundle: ./secrets/anthropic_oauth.json
  setup_token_file: ./secrets/anthropic_setup.json
  refresh_endpoint: https://auth.example.com/refresh
  client_id: your-oauth-client
| mode | When |
|---|---|
| auto | Let the provider client decide from available credentials. |
| static | Use api_key verbatim. |
| token_plan | MiniMax "Token Plan" OAuth bundle. |
| oauth_bundle | Anthropic PKCE OAuth bundle written by agent setup. |
Supported providers
| Key | Notes |
|---|---|
| minimax | Primary provider. MiniMax M2.5. OpenAI-compat or Anthropic-flavour wire. |
| anthropic | Claude models. API key or OAuth subscription. |
| openai | OpenAI API and anything speaking its wire (Ollama, Groq, local proxies). |
| gemini | Google Gemini, including embedding support. |
Common mistakes
- api_key: sk-… committed to git. Use ${ENV_VAR} or ${file:./secrets/…}; the secrets/ directory is gitignored.
- Mismatched embedding_model dimensions. The vector store asserts embedding.dimensions matches the model output. A mismatch aborts startup with an explicit message.
- Setting both api_key and auth.mode: oauth_bundle. The auth mode wins. The api_key is kept as a fallback for tools that bypass the OAuth path.
Input-token reduction (context_optimization)
Four independent kill switches for prompt caching, online history compaction, pre-flight token counting, and the workspace bundle cache. Full schema, defaults, and rollout guidance in Operations → Context optimization.
broker.yaml
Broker topology, disk persistence, and fallback behavior.
Source: crates/config/src/types/broker.rs.
Shape
broker:
  type: nats # nats | local
  url: nats://localhost:4222
  auth:
    enabled: false
    nkey_file: ./secrets/nats.nkey
  persistence:
    enabled: true
    path: ./data/queue
  limits:
    max_payload: 4MB
    max_pending: 10000
  fallback:
    mode: local_queue
    drain_on_reconnect: true
Fields
| Field | Type | Default | Purpose |
|---|---|---|---|
| type | nats or local | local | local keeps the whole bus in-process; nats uses a real NATS server. |
| url | url | — | NATS connection URL (ignored when type: local). |
| auth.enabled | bool | false | Turn on NKey authentication. |
| auth.nkey_file | path | — | Path to the NKey file when auth.enabled. |
| persistence.enabled | bool | true | Turn on the SQLite disk queue. |
| persistence.path | path | ./data/queue | Directory for the disk queue SQLite DB. |
| limits.max_payload | size | 4MB | Reject events larger than this. |
| limits.max_pending | u64 | 10000 | Hard cap on the disk queue; past this, oldest events are shed. |
| fallback.mode | local_queue or drop | local_queue | What to do when NATS is unreachable. |
| fallback.drain_on_reconnect | bool | true | Replay the disk queue when NATS returns. |
Operational notes
- type: local for single-machine dev (and small prod). You don't need NATS running just to try the agent. The local broker matches NATS subject semantics, so everything works the same.
- Subprocess plugins work in local mode too (Phase 92). When type: local, the daemon derives a stdio_bridge transport and pipes broker.publish / broker.event for the extracted subprocess plugins (WhatsApp, Telegram, marketing) through the stdio JSON-RPC channel it already uses for tool calls, so no NATS server is required. stdio_bridge is daemon-derived; operators never pick it in YAML. Full picture: broker shapes.
- Switch at runtime with nexo set-broker. Rewrites this file + SIGHUPs the running daemon (~3 s blackout; in-flight messages drained from the persistence layer):
    nexo set-broker nats --url nats://localhost:4222   # → multi-host
    nexo set-broker local                              # → stdio bridge
- Disk queue always on in production. Even on a single machine. It's the guarantee against losing events on a NATS blip.
- drain_on_reconnect: true is FIFO. See Event bus — Disk queue.
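The interaction between limits.max_pending and a FIFO drain can be modelled with an in-memory queue. This is a sketch under stated assumptions, not the SQLite disk queue: PendingQueue is a hypothetical name, events are plain strings, and max_pending is assumed to be at least 1.

```rust
use std::collections::VecDeque;

/// In-memory stand-in for the disk queue used while NATS is unreachable.
pub struct PendingQueue {
    max_pending: usize,
    events: VecDeque<String>,
    shed: u64,
}

impl PendingQueue {
    /// `max_pending` is assumed to be at least 1 in this sketch.
    pub fn new(max_pending: usize) -> Self {
        Self { max_pending, events: VecDeque::new(), shed: 0 }
    }

    /// Enqueue an event; when the cap is hit, the oldest event is shed.
    pub fn push(&mut self, event: String) {
        if self.events.len() >= self.max_pending && self.events.pop_front().is_some() {
            self.shed += 1;
        }
        self.events.push_back(event);
    }

    /// Replay everything in arrival order, leaving the queue empty.
    pub fn drain_fifo(&mut self) -> Vec<String> {
        self.events.drain(..).collect()
    }

    /// Number of events dropped because of the cap.
    pub fn shed_count(&self) -> u64 {
        self.shed
    }
}
```

Shedding from the front means a long outage loses the oldest events first, while whatever survives is still replayed in the order it arrived.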
memory.yaml
Short-term sessions, long-term SQLite storage, and optional vector search.
Source: crates/config/src/types/memory.rs.
Shape
short_term:
  max_history_turns: 50
  session_ttl: 24h
  max_sessions: 10000
long_term:
  backend: sqlite
  sqlite:
    path: ./data/memory.db
vector:
  enabled: false
  backend: sqlite-vec
  default_recall_mode: hybrid
  embedding:
    provider: http
    base_url: https://api.openai.com/v1
    model: text-embedding-3-small
    api_key: ${OPENAI_API_KEY}
    dimensions: 1536
    timeout_secs: 30
Short-term
Per-session conversation buffer held in memory by
SessionManager.
| Field | Default | Purpose |
|---|---|---|
| max_history_turns | 50 | Turns kept before oldest are pruned into long-term memory. |
| session_ttl | 24h | How long a session lives idle before eviction. humantime syntax. |
| max_sessions | 10000 | Soft cap. On overflow the oldest-idle session is evicted (fires on_expire). 0 = unbounded. |
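The max_history_turns rule can be sketched as a bounded buffer that hands back whatever it prunes. ShortTerm is a hypothetical name and turns are plain strings. How SessionManager actually forwards pruned turns to long-term memory is not shown in this section, so the returned Vec stands in for that hand-off.

```rust
use std::collections::VecDeque;

/// Per-session buffer that keeps at most `max_history_turns` turns.
pub struct ShortTerm {
    max_history_turns: usize,
    turns: VecDeque<String>,
}

impl ShortTerm {
    pub fn new(max_history_turns: usize) -> Self {
        Self { max_history_turns, turns: VecDeque::new() }
    }

    /// Append a turn and return the turns pruned out, oldest first.
    /// The caller would persist the returned turns to long-term memory.
    pub fn push(&mut self, turn: String) -> Vec<String> {
        self.turns.push_back(turn);
        let mut pruned = Vec::new();
        while self.turns.len() > self.max_history_turns {
            if let Some(oldest) = self.turns.pop_front() {
                pruned.push(oldest);
            }
        }
        pruned
    }

    pub fn len(&self) -> usize {
        self.turns.len()
    }
}
```

Returning the pruned turns instead of discarding them keeps the pruning step and the persistence step separate, which is the split the memory-layer diagram below describes.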
Long-term
Persisted memory, durable across restarts.
| Field | Options | Default | Purpose |
|---|---|---|---|
| backend | sqlite or redis | sqlite | Storage engine. |
| sqlite.path | path | ./data/memory.db | SQLite file (with sqlite-vec extension loaded when vector enabled). |
| redis.url | url | — | Redis connection string (when backend: redis). |
Vector
Opt-in semantic memory.
| Field | Default | Purpose |
|---|---|---|
| enabled | false | Opt-in. |
| backend | sqlite-vec | Zero-extra-infrastructure vector index. |
| default_recall_mode | hybrid | Used when the memory tool call omits mode. Options: keyword, vector, hybrid. |
| embedding.provider | http | Where to fetch embeddings. http = any OpenAI-compatible embeddings server. |
| embedding.base_url | — | Embeddings endpoint. |
| embedding.model | — | Model id, e.g. text-embedding-3-small, nomic-embed-text. |
| embedding.api_key | — | Key for the embeddings server. Supports ${ENV_VAR} / ${file:…}. |
| embedding.dimensions | — | Must match the model output (1536 for OpenAI 3-small; 768 for nomic). Mismatch aborts startup. |
| embedding.timeout_secs | 30 | Embeddings request timeout. |
Memory layers
flowchart LR
MSG[incoming message] --> STM[short-term<br/>in-memory buffer]
STM -->|turns exceed max| PRUNE[prune]
PRUNE --> LTM[(long-term<br/>SQLite)]
LTM --> EMB{vector<br/>enabled?}
EMB -->|yes| VEC[(sqlite-vec index)]
TOOL[memory tool] --> RECALL{recall mode}
RECALL -->|keyword| LTM
RECALL -->|vector| VEC
RECALL -->|hybrid| LTM
RECALL -->|hybrid| VEC
Per-agent isolation
Each agent's memory DB lives under its workspace when
workspace_git is enabled — keeps memories forensically reviewable and
prevents one agent from reading another's history.
Drop-in agents
config/agents.d/*.yaml is a merge-directory for agent definitions
that should not live in agents.yaml — typically anything with
business content (sales prompts, pricing tables, internal phone
numbers, customer-facing identities).
Source: crates/config/src/lib.rs (merge logic).
Why it exists
- Keep agents.yaml public-safe and checked into git
- Keep sensitive content gitignored and loaded at runtime
- Compose layered configs (00-dev.yaml, 10-prod.yaml) without editing a single monolithic file
- Ship .example.yaml templates so the shape stays discoverable
.gitignore rules include:
config/agents.d/*.yaml
!config/agents.d/*.example.yaml
The .example.yaml files are committed and serve as templates; the
real .yaml files are not.
Merge order
Files are loaded in lexicographic filename order and their agents
arrays are concatenated to the base agents.yaml:
flowchart TD
BASE[agents.yaml] --> MERGE[merged catalog]
D1[agents.d/00-shared.yaml] --> MERGE
D2[agents.d/10-ana.yaml] --> MERGE
D3[agents.d/20-kate.yaml] --> MERGE
EX[agents.d/ana.example.yaml] -.->|committed template<br/>usually not loaded| MERGE
Every file must have the top-level agents: [...] shape:
# config/agents.d/10-ana.yaml
agents:
  - id: ana
    model:
      provider: minimax
      model: MiniMax-M2.5
    plugins: [whatsapp]
    inbound_bindings:
      - plugin: whatsapp
    system_prompt: |
      …private content…
Agent id collisions
Two files cannot define the same agent.id. On collision the loader
fails fast with a clear message. If you want to override an agent,
either:
- Replace the entry (rename or remove the original)
- Use inbound_bindings[] per-binding overrides inside a single entry
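The merge order and the fail-fast collision rule can be pictured with a short sketch. It assumes the files are already parsed into lists of agent dicts; `merge_agent_files` and its error text are hypothetical and do not reproduce the loader in crates/config/src/lib.rs.

```python
def merge_agent_files(base_agents, dropin_files):
    """Concatenate drop-in agents onto the base list in filename order.

    base_agents:  list of agent dicts from agents.yaml
    dropin_files: mapping of filename -> list of agent dicts (hypothetical input shape)
    """
    merged = list(base_agents)
    seen = {agent["id"]: "agents.yaml" for agent in base_agents}
    for name in sorted(dropin_files):               # lexicographic filename order
        for agent in dropin_files[name]:
            agent_id = agent["id"]
            if agent_id in seen:
                # Fail fast and name both files involved in the collision.
                raise ValueError(
                    f"duplicate agent id {agent_id!r}: defined in {seen[agent_id]} and {name}"
                )
            seen[agent_id] = name
            merged.append(agent)
    return merged
```

Because the arrays are concatenated rather than deep-merged, a second file cannot patch an existing agent; it can only add new ids.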
Common patterns
Public vs. private split
config/agents.yaml # committed, only support/ops agents
config/agents.d/ana.yaml # gitignored, full sales prompt
config/agents.d/kate.yaml # gitignored, personal assistant
config/agents.d/ana.example.yaml # committed, empty template
Environment layering
config/agents.d/00-common.yaml # shared defaults
config/agents.d/10-dev.yaml # dev-only overrides (loaded only on dev box)
Swap the 10-*.yaml file per environment. Docker compose can mount
the right one from a secret volume.
Validation
- #[serde(deny_unknown_fields)] still applies to every file
- validate_agents() runs after the merge and checks duplicate ids, missing plugin references, invalid skill directories
- Errors name the file and the offending agent id
Per-agent credentials
Bind each agent to specific WhatsApp / Telegram / Google accounts so outbound traffic originates from the right number, bot, or mailbox — never from a shared pool.
Mental model
Three layers:
- Plugin instance: a labelled WhatsApp session or Telegram bot in config/plugins/{whatsapp,telegram}.yaml. Each instance owns its own token / session_dir and an optional allow_agents list.
- Google account: an entry in the optional config/plugins/google-auth.yaml. Each account is 1:1 with an agent_id.
- Agent binding: in config/agents.d/<agent>.yaml, the credentials: block pins the agent to the instance / account it may use for outbound tool calls.
The runtime runs a boot-time gauntlet that cross-checks all three layers before any plugin boots. Every invariant violation surfaces in a single report so you can fix the full YAML in one edit.
Config schemas
config/agents.d/ana.yaml
agents:
  - id: ana
    credentials:
      whatsapp: personal        # must match whatsapp.yaml instance
      telegram: ana_bot         # must match telegram.yaml instance
      google: ana@gmail.com     # must match google-auth.yaml accounts[].id
      # Opt-out for the symmetric-binding warning when inbound bot and
      # outbound bot are intentionally different:
      # telegram_asymmetric: true
    inbound_bindings:
      - { plugin: whatsapp, instance: personal }
      - { plugin: telegram, instance: ana_bot }
config/plugins/whatsapp.yaml
whatsapp:
  - instance: personal
    session_dir: ./data/workspace/ana/whatsapp/personal
    media_dir: ./data/media/whatsapp/personal
    allow_agents: [ana]         # defense-in-depth ACL
  - instance: work
    session_dir: ./data/workspace/kate/whatsapp/work
    media_dir: ./data/media/whatsapp/work
    allow_agents: [kate]
config/plugins/telegram.yaml
telegram:
  - instance: ana_bot
    token: ${file:./secrets/telegram/ana_token.txt}
    allow_agents: [ana]
    allowlist:
      chat_ids: [1194292426]
  - instance: kate_bot
    token: ${file:./secrets/telegram/kate_token.txt}
    allow_agents: [kate]
config/plugins/google-auth.yaml
google_auth:
  accounts:
    - id: ana@gmail.com
      agent_id: ana             # 1:1; the gauntlet enforces it
      client_id_path: ./secrets/google/ana_client_id.txt
      client_secret_path: ./secrets/google/ana_client_secret.txt
      token_path: ./secrets/google/ana_token.json
      scopes:
        - https://www.googleapis.com/auth/gmail.modify
Agents that still declare the legacy inline google_auth block are
auto-migrated into this store on boot (a warning tells you to migrate).
What the gauntlet validates
| Check | Lenient | Strict |
|---|---|---|
| Duplicate session_dir across instances | error | error |
| session_dir that is a parent of another | error | error |
| Credential file with lax permissions (linux 0o077) | error | error |
| credentials.<ch> points to an instance that does not exist | error | error |
| Agent listens on >1 instance without declaring credentials.<ch> | error | error |
| Instance allow_agents excludes a binding agent | error | error |
| Inbound instance ≠ outbound instance (no <ch>_asymmetric) | warn | error |
| Inline agents.<id>.google_auth without matching google-auth.yaml | warn | warn |
Linux permission check is skipped for /run/secrets/* (Docker secrets)
and can be disabled entirely with CHAT_AUTH_SKIP_PERM_CHECK=1.
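A minimal sketch of the permission rule, assuming the check boils down to "any group/other bit set in the 0o077 mask is insecure". The function name and the way the mode and environment are passed in are illustrative, not the gauntlet's real signature.

```python
import os


def is_insecure(path, mode, env=None):
    """Return True when a credential file's mode grants group/other access.

    `mode` is the st_mode integer from os.stat(); `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    if env.get("CHAT_AUTH_SKIP_PERM_CHECK") == "1":
        return False                        # check disabled entirely
    if path.startswith("/run/secrets/"):
        return False                        # Docker secrets are exempt
    return (mode & 0o077) != 0              # any group/other bit is too lax
```

In practice a caller would pass `os.stat(path).st_mode`; a file created with `chmod 600` passes, while the common default of 0644 does not.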
Topics
Outbound tool calls land on instance-suffixed topics when the resolver has a binding:
plugin.outbound.whatsapp.<instance>
plugin.outbound.telegram.<instance>
Unlabelled (instance: None) plugin entries keep publishing to the
legacy bare topic plugin.outbound.whatsapp / plugin.outbound.telegram
for full back-compat.
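The topic naming rule is small enough to state as a function. This is a sketch of the naming convention only; `outbound_topic` is a hypothetical helper, not a function exported by the runtime.

```python
def outbound_topic(channel, instance=None):
    """Build the outbound topic for a channel, suffixing the instance when labelled."""
    base = f"plugin.outbound.{channel}"
    # Unlabelled entries (instance is None) stay on the legacy bare topic.
    return f"{base}.{instance}" if instance else base
```

For example, `outbound_topic("whatsapp", "personal")` yields the instance-suffixed form, while `outbound_topic("telegram")` yields the bare topic.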
CLI gate
# Run the full gauntlet without booting the daemon. Exits 0 clean,
# 1 on errors, 2 on warnings-only.
agent --config ./config --check-config
# Promote warnings to errors (CI lane).
agent --config ./config --check-config --strict
The gate scans agents.yaml, every agents.d/*.yaml,
whatsapp.yaml, telegram.yaml, and google-auth.yaml. Sample
failure:
credentials: FAILED with 1 error(s):
1. agent 'ana_per_binding_example' binds credentials.telegram='ana_tg' but no such telegram instance exists (available: [])
Secrets in logs
The credential layer never logs a raw account id. Every reference is
via an 8-byte sha256(account_id) fingerprint rendered as hex:
2025-04-24T16:03:42Z INFO credentials.audit agent="ana" channel="whatsapp" fp=a3f2…7c direction=outbound
The fingerprint is pinned — switching the algorithm is an explicit
breaking change tracked by crates/auth/tests/fingerprint_stability.rs.
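One way to read "8-byte sha256(account_id) fingerprint rendered as hex" is sketched below. Taking the first 8 bytes of the digest and encoding the id as UTF-8 are assumptions; the pinned algorithm lives in crates/auth and may differ in those details.

```python
import hashlib


def fingerprint(account_id: str) -> str:
    """Hex fingerprint of an account id: 8 bytes of SHA-256 (assumed: the leading 8)."""
    digest = hashlib.sha256(account_id.encode("utf-8")).digest()
    return digest[:8].hex()                 # 8 bytes -> 16 hex characters
```

The result is stable for a given id, so log lines can be correlated across runs without exposing the raw phone number or mailbox.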
Observability
Nine Prometheus series land at /metrics:
| Series | Type | Labels |
|---|---|---|
| credentials_accounts_total | gauge | channel |
| credentials_bindings_total | gauge | agent, channel |
| channel_account_usage_total | counter | agent, channel, direction, instance |
| channel_acl_denied_total | counter | agent, channel, instance |
| credentials_resolve_errors_total | counter | channel, reason |
| credentials_breaker_state | gauge | channel, instance |
| credentials_boot_validation_errors_total | counter | kind |
| credentials_insecure_paths_total | gauge | — |
| credentials_google_token_refresh_total | counter | account_fp, outcome |
Back-compat
- Configs without a credentials: block keep working. The resolver infers outbound from the single inbound_bindings entry when it is unambiguous; otherwise outbound tools are marked unbound and fall back to the legacy bare topic.
- Plugin entries with instance: None stay on the legacy bare topic.
- agents.<id>.google_auth still registers google_* tools for that agent; google-auth.yaml is preferred going forward.
Hot-reload (no daemon restart)
Edit agents.d/*.yaml, plugins/whatsapp.yaml, plugins/telegram.yaml,
or plugins/google-auth.yaml, then trigger a reload via the loopback
admin endpoint:
curl -fsSX POST http://127.0.0.1:9091/admin/credentials/reload | jq
{
"accounts_wa": 2,
"accounts_tg": 2,
"accounts_google": 1,
"warnings": [],
"version": 4
}
The resolver runs the gauntlet against the fresh files, then atomically
swaps bindings in place. Plugin tools holding Arc<…> references see
the new state on their next call. Failure mode: gauntlet errors
return HTTP 400 with the error list; the previous bindings stay
active so a typo in YAML does not knock out the runtime.
CredentialHandles already issued to in-flight tool calls keep
working — handles are by-value clones; the resolver only mediates
lookup of future calls.
What the reload does NOT cover
- Adding a brand-new WhatsApp / Telegram instance still requires a restart for the plugin (each instance owns its own session_dir + websocket). The resolver picks up the new account but the plugin side stays as-was until next boot.
- Removing an account leaks its breaker entry in BreakerRegistry until restart. No correctness impact.
Google client_id / client_secret rotation
Rewriting the secret files (./secrets/<agent>_google_client_id.txt,
..._client_secret.txt) is picked up automatically on the next
google_* tool call — GoogleAuthClient checks file mtime before
each network hop and re-reads when it advanced. No reload call
required for that case. Audit log line:
INFO credentials.audit event="google_secrets_refreshed" \
google_*: re-read client_id/client_secret after on-disk rotation
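The mtime-gated re-read described above can be sketched like this. `RotatingSecret` is a hypothetical stand-in for the relevant part of GoogleAuthClient; the only behaviour taken from the text is "check file mtime before each use and re-read when it advanced".

```python
import os


class RotatingSecret:
    """Cache a secret file's contents and re-read only when its mtime advances."""

    def __init__(self, path):
        self.path = path
        self._mtime_ns = None
        self._value = None

    def get(self):
        mtime_ns = os.stat(self.path).st_mtime_ns
        if self._mtime_ns is None or mtime_ns > self._mtime_ns:
            with open(self.path, encoding="utf-8") as handle:
                self._value = handle.read().strip()
            self._mtime_ns = mtime_ns       # remember which version is cached
        return self._value
```

A rotation that rewrites the file in place is picked up on the next `get()`; a write that leaves the mtime unchanged is not, which is the usual trade-off of mtime-based invalidation.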
Strict mode
agent --check-config --strict promotes warnings to errors. Two
checks behave differently under strict:
| Condition | Lenient | Strict |
|---|---|---|
| Inline agents.<id>.google_auth block (legacy) | warn + auto-migrate | BuildError::LegacyInlineGoogleAuth, fail boot |
| Asymmetric inbound ≠ outbound (no <ch>_asymmetric: true) | warn | error |
Run --strict in CI to gate PRs that touch credential YAML.
Migrating
- Add instance: + allow_agents: to each entry in whatsapp.yaml / telegram.yaml.
- Create config/plugins/google-auth.yaml with one accounts[] per agent that needs Gmail.
- Add credentials: to each agents.d/*.yaml.
- Run agent --check-config --strict. Fix every listed error.
- Commit.
pollers.yaml
The Phase 19 generic poller subsystem. One runner orchestrates N
modules — each module is an impl Poller (gmail, rss, calendar,
webhook_poll, or anything you write yourself) — and every module
shares the same scheduler, lease, breaker, cursor persistence, and
outbound dispatch via Phase 17 credentials.
Source: crates/poller/, crates/config/src/types/pollers.rs.
Top-level shape
pollers:
  enabled: true
  state_db: ./data/poller.db
  default_jitter_ms: 5000
  lease_ttl_factor: 2.0
  failure_alert_cooldown_secs: 3600
  breaker_threshold: 5
  jobs:
    - id: ana_leads
      kind: gmail
      agent: ana
      schedule: { every_secs: 60 }
      config:
        query: "is:unread subject:lead"
        deliver: { channel: whatsapp, to: "57300...@s.whatsapp.net" }
        message_template: |
          New lead 🚨
          {snippet}
Absent file → subsystem off (no jobs spawn, no admin endpoint).
Top-level fields
| Field | Default | Purpose |
|---|---|---|
| enabled | true | Master switch. false skips everything below. |
| state_db | ./data/poller.db | SQLite path for poll_state + poll_lease. Created if missing. |
| default_jitter_ms | 5000 | Random offset added to next_run_at when a job's schedule does not declare its own. Avoids thundering herd. |
| lease_ttl_factor | 2.0 | Lease TTL = factor × interval (min 30s). A daemon that crashes mid-tick releases the lease via expiry; another worker takes over without rerunning side effects unless your module is non-idempotent. |
| failure_alert_cooldown_secs | 3600 | Per-job cooldown for failure_to alerts. Persisted in poll_state.last_failure_alert_at so it survives restarts. |
| breaker_threshold | 5 | Consecutive Transient errors before the per-job circuit breaker opens. |
| jobs | [] | Per-job entries (see below). |
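The lease and jitter arithmetic from the table can be worked through in a few lines. The function names are hypothetical, and drawing the jitter uniformly from [0, jitter_ms] is an assumption; the table only says a random offset is added.

```python
import random


def lease_ttl_secs(interval_secs, lease_ttl_factor=2.0, floor_secs=30.0):
    """Lease TTL = factor x interval, never below the 30 s minimum."""
    return max(floor_secs, lease_ttl_factor * interval_secs)


def next_run_at(now, interval_secs, jitter_ms=5000, rng=random):
    """Next scheduled run: one interval ahead plus a random offset (assumed uniform)."""
    return now + interval_secs + rng.uniform(0, jitter_ms) / 1000.0
```

With the defaults, a 60-second job holds a 120-second lease, while a 10-second job is clamped to the 30-second floor.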
Per-job fields
| Field | Required | Purpose |
|---|---|---|
| id | ✅ | Unique. Used as session key for state, metrics, admin endpoints, lease. |
| kind | ✅ | Discriminator. Must match a registered Poller::kind() (see Built-ins and Build a poller). |
| agent | ✅ | Agent whose Phase 17 credentials this job uses. The runner looks up the binding for whatever channel the module needs (Google for fetch, WhatsApp/Telegram for outbound, etc). |
| schedule | ✅ | One of every, cron, at (see Schedules). |
| config | — | Module-specific options. Validated by Poller::validate at boot. Bad config rejects this job only; siblings keep loading. |
| failure_to | — | { channel, to } for an alert when consecutive_errors crosses breaker_threshold. Optional; omit to log only. |
| paused_on_boot | false | Persist paused = 1 in state at startup. Useful for staged rollouts. |
Schedules
# Repeat every N seconds. Most common.
schedule: { every_secs: 60 }
# 6-field cron: sec min hour dom mon dow.
schedule:
  cron: "0 */5 * * * *"       # every 5 minutes on the boundary
  tz: "America/Bogota"        # accepted; evaluated in UTC unless cron-tz feature on
  stagger_jitter_ms: 2000     # local override for this job
# One-shot at an RFC3339 instant. After it fires the job stays paused.
schedule: { at: "2026-04-26T15:00:00Z" }
Built-ins
| kind | Purpose | Cursor | Auth |
|---|---|---|---|
| gmail | Search Gmail, regex extract, dispatch | Reserved (Gmail UNREAD + mark_read does dedup) | Phase 17 Google |
| rss | RSS / Atom feeds | ETag + bounded seen-id ring | None |
| webhook_poll | Generic JSON GET / POST | Bounded seen-id ring | None / custom headers |
| google_calendar | Calendar v3 events incremental sync | nextSyncToken | Phase 17 Google |
gmail
- id: ana_leads
  kind: gmail
  agent: ana
  schedule: { every_secs: 60 }
  config:
    query: "is:unread subject:(lead OR interesado)"
    newer_than: "1d"            # avoids back-filling years on first deploy
    max_per_tick: 20
    dispatch_delay_ms: 1000     # throttle between dispatches in same tick
    sender_allowlist: ["@mycompany.com"]
    extract:
      name: "Nombre:\\s*(.+)"
      phone: "Tel:\\s*(\\+?\\d+)"
    require_fields: [name, phone]
    message_template: |
      New lead 🚨 {name} — {phone}
      {snippet}
    mark_read_on_dispatch: true
    deliver: { channel: whatsapp, to: "57300...@s.whatsapp.net" }
Multiple gmail jobs for the same agent share a cached
GoogleAuthClient — token refreshes happen once across all jobs.
google_* errors are classified: 401 / invalid_grant / revoked
→ Permanent (auto-pause), 5xx / network → Transient (backoff).
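How extract, require_fields, and message_template fit together can be approximated as below. This is a sketch using Python's `re` and `str.format_map`; the real module is Rust, and details such as which capture group is used (assumed: the first) or how missing placeholders are handled are not confirmed by the text.

```python
import re


def extract_fields(body, patterns, require_fields):
    """Apply each named regex to the mail body; return None if a required field is missing."""
    fields = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, body)
        if match:
            fields[name] = match.group(1).strip()   # assumed: first capture group wins
    if any(name not in fields for name in require_fields):
        return None                                 # skip dispatch for this message
    return fields


def render(template, fields):
    """Fill {placeholders} in the template from the extracted fields."""
    return template.format_map(fields)
```

A mail whose body lacks the `Tel:` line yields `None` here, which models the job skipping that message instead of dispatching a half-filled template.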
rss
- id: ana_blog_watch
  kind: rss
  agent: ana
  schedule: { every_secs: 600 }
  config:
    feed_url: https://example.com/feed.xml
    max_per_tick: 5
    message_template: "{title}\n{link}"
    deliver: { channel: telegram, to: "1194292426" }
ETag from the previous response is sent as If-None-Match. 304 Not Modified produces a zero-cost tick.
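The "bounded seen-id ring" cursor used by rss (and webhook_poll) can be modelled with a deque plus a set. The class name, its capacity parameter, and the eviction-on-insert policy are illustrative assumptions about a structure the text only names.

```python
from collections import deque


class SeenRing:
    """Remember the last `capacity` item ids; older ids fall off the front."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._order = deque()               # insertion order, oldest first
        self._ids = set()                   # fast membership checks

    def add_if_new(self, item_id):
        """Return True (and record the id) only the first time an id is seen."""
        if item_id in self._ids:
            return False
        if len(self._order) >= self.capacity:
            self._ids.discard(self._order.popleft())
        self._order.append(item_id)
        self._ids.add(item_id)
        return True
```

The bound keeps the cursor small, at the cost that an item older than the ring's window could be dispatched again if the feed re-serves it.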
webhook_poll
- id: ana_jira_assigned
  kind: webhook_poll
  agent: ana
  schedule: { every_secs: 300 }
  config:
    url: https://company.atlassian.net/rest/api/3/search
    method: GET
    headers:
      Authorization: "Bearer ${JIRA_TOKEN}"
      Accept: "application/json"
    items_path: "issues"        # dotted path to the array; "" for root
    id_field: "id"              # field used for dedup
    max_per_tick: 10
    message_template: "[{key}] {fields}"
    deliver: { channel: telegram, to: "1194292426" }
    # SSRF guard: must opt in to hit private / loopback hosts:
    # allow_private_networks: true
401 / 403 → Permanent. Any other 4xx → Permanent. 5xx →
Transient.
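The items_path and id_field options can be illustrated with a small sketch. Both helpers are hypothetical; a plain set stands in for the bounded seen-id ring, and applying max_per_tick to new items only is an assumption.

```python
def items_at(payload, items_path):
    """Walk a dotted path into the decoded JSON; an empty path means the root."""
    node = payload
    for key in filter(None, items_path.split(".")):
        node = node[key]
    return node


def new_items(payload, items_path, id_field, seen, max_per_tick):
    """Return items whose id has not been seen yet, capped at max_per_tick."""
    fresh = []
    for item in items_at(payload, items_path):
        item_id = item[id_field]
        if item_id in seen:
            continue                        # already dispatched on an earlier tick
        seen.add(item_id)
        fresh.append(item)
        if len(fresh) >= max_per_tick:
            break
    return fresh
```

Running the same payload through `new_items` twice returns the items once and then an empty list, which is what keeps an unchanged API response from producing duplicate messages.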
google_calendar
- id: ana_calendar_sync
  kind: google_calendar
  agent: ana
  schedule: { every_secs: 300 }
  config:
    calendar_id: primary
    skip_cancelled: true
    message_template: "📅 {summary} — {start}\n{html_link}"
    deliver: { channel: telegram, to: "1194292426" }
First tick captures nextSyncToken and dispatches nothing (baseline).
Subsequent ticks use syncToken=... and dispatch the diff. 410 Gone
(token expired) is classified Permanent — operator runs
agent pollers reset <id> to re-baseline.
Multi-job per built-in
Same agent + same kind, multiple jobs — completely independent. The
runner gives each its own cursor, breaker, schedule, metrics, and
pause/resume controls. The GoogleAuthClient is the only thing
shared (intentional, so quota and refresh costs aren't multiplied).
# Three Gmail polls for Ana, all independent
- id: ana_leads
  kind: gmail
  agent: ana
  schedule: { every_secs: 60 }
  config:
    query: "is:unread label:lead"
    deliver: { channel: whatsapp, to: "57300...@s.whatsapp.net" }
    # …
- id: ana_invoices
  kind: gmail
  agent: ana
  schedule: { every_secs: 600 }
  config:
    query: "is:unread label:invoice"
    deliver: { channel: telegram, to: "1194292426" }
    # …
- id: ana_alerts
  kind: gmail
  agent: ana
  schedule: { cron: "0 */15 * * * *" }
  config:
    query: "is:unread from:monitor@infra.com"
    deliver: { channel: telegram, to: "9876543210" }
    # …
Pause ana_invoices independently with
agent pollers pause ana_invoices.
CLI
agent pollers list # plain table; --json for machine output
agent pollers show ana_leads # detail of one job
agent pollers run ana_leads # manual tick (bypasses schedule + lease)
agent pollers pause ana_invoices # paused = 1
agent pollers resume ana_invoices
agent pollers reset ana_calendar_sync --yes # destructive; clears cursor
agent pollers reload # re-read pollers.yaml + diff
The daemon must be running (CLI hits the loopback admin server at
127.0.0.1:9091).
Admin endpoints
GET /admin/pollers
GET /admin/pollers/<id>
POST /admin/pollers/<id>/run
POST /admin/pollers/<id>/pause
POST /admin/pollers/<id>/resume
POST /admin/pollers/<id>/reset
POST /admin/pollers/reload
reload returns a ReloadPlan JSON: { add, replace, remove, keep }.
Validation runs across every job in the new file before any task is
touched — a typo never knocks healthy siblings offline.
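A plausible reading of the ReloadPlan buckets is sketched below. The semantics are inferred from the field names (add = new id, replace = same id with changed definition, remove = id dropped, keep = unchanged); the real diff in crates/poller may compare jobs differently.

```python
def reload_plan(old_jobs, new_jobs):
    """Diff two {job_id: job_definition} mappings into add/replace/remove/keep lists."""
    plan = {"add": [], "replace": [], "remove": [], "keep": []}
    for job_id in sorted(new_jobs):
        if job_id not in old_jobs:
            plan["add"].append(job_id)
        elif new_jobs[job_id] != old_jobs[job_id]:
            plan["replace"].append(job_id)  # definition changed: restart the task
        else:
            plan["keep"].append(job_id)     # untouched: leave the running task alone
    plan["remove"] = sorted(set(old_jobs) - set(new_jobs))
    return plan
```

Computing the whole plan before acting on it matches the guarantee above: validation failures surface while every existing task is still running.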
Agent tools
When the poller subsystem is up, every agent gets six LLM-callable
tools registered on its ToolRegistry:
| Tool | Effect |
|---|---|
| pollers_list | List every job + status |
| pollers_show | Inspect one job |
| pollers_run | Trigger a tick out-of-band |
| pollers_pause | Set paused = 1 |
| pollers_resume | Set paused = 0 |
| pollers_reset | Wipe cursor + errors (destructive) |
Each registered Poller impl can also expose per-kind custom tools
via Poller::custom_tools() — gmail ships gmail_count_unread out
of the box. See Build a poller.
Create / delete are intentionally not exposed: prompt-injection
could plant a webhook_poll aimed at internal infra. Operators
own pollers.yaml + agent pollers reload.
Failure-destination
- id: ana_leads
  kind: gmail
  # …
  failure_to:
    channel: telegram
    to: "1194292426"            # alerts on the operator's chat
When the per-job circuit breaker trips
(consecutive_errors >= breaker_threshold), the runner publishes a
text message to the configured channel (resolved via Phase 17 just
like the happy path) and records the timestamp for cooldown
gating. Cooldown is failure_alert_cooldown_secs global default,
overridable per job in a future revision.
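The alert gating reduces to two conditions: the breaker has tripped, and the cooldown since the last alert has elapsed. The sketch below assumes timestamps in seconds and a `None` value for "never alerted"; the function name is hypothetical.

```python
def should_alert(consecutive_errors, breaker_threshold, last_alert_at, now, cooldown_secs=3600):
    """Decide whether a failure_to alert should be sent for this job."""
    if consecutive_errors < breaker_threshold:
        return False                        # breaker has not tripped yet
    if last_alert_at is None:
        return True                         # first alert for this job
    return now - last_alert_at >= cooldown_secs
```

Because the last alert timestamp is persisted in poll_state, a daemon restart in the middle of an outage does not reset the cooldown.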
Observability
Seven Prometheus series exposed under /metrics:
| Series | Type | Labels |
|---|---|---|
| poller_ticks_total | counter | kind, agent, job_id, status={ok,transient,permanent,skipped} |
| poller_latency_ms | histogram | kind, agent, job_id |
| poller_items_seen_total | counter | kind, agent, job_id |
| poller_items_dispatched_total | counter | kind, agent, job_id |
| poller_consecutive_errors | gauge | job_id |
| poller_breaker_state | gauge | job_id (0=closed, 1=half-open, 2=open) |
| poller_lease_takeovers_total | counter | job_id |
Migrating from gmail-poller.yaml
The legacy crate nexo-plugin-gmail-poller keeps its YAML schema
but no longer drives its own loop. On boot the wizard
auto-translates every legacy job into a kind: gmail entry, folds
it into cfg.pollers.jobs, and logs a deprecation warn. Explicit
entries in pollers.yaml win on id collision so a manual migration
is never clobbered.
To migrate cleanly:
- Run agent --check-config to print every translated id.
- Copy each into config/pollers.yaml under pollers.jobs, adjusting the agent: field if the legacy agent_id was inferred.
- Delete config/plugins/gmail-poller.yaml.
Anthropic / Claude
Native Anthropic client with multiple authentication paths: static API key, setup tokens, full OAuth PKCE subscription flow, or automatic import from the local Claude Code CLI.
Source: crates/llm/src/anthropic.rs, crates/llm/src/anthropic_auth.rs.
Phase 15 added the subscription flow end-to-end.
Configuration
# config/llm.yaml
providers:
  anthropic:
    api_key: ${ANTHROPIC_API_KEY:-}
    base_url: https://api.anthropic.com
    rate_limit:
      requests_per_second: 2.0
    auth:
      mode: oauth_bundle
      bundle: ./secrets/anthropic_oauth.json
Per-agent selection:
model:
  provider: anthropic
  model: claude-haiku-4-5
Authentication modes
| auth.mode | Credential | Header |
|---|---|---|
| static | api_key (sk-ant-…) | x-api-key: <key> |
| setup_token | sk-ant-oat01-… (min 80 chars) | Authorization: Bearer <key> + anthropic-beta: oauth-2025-04-20 |
| oauth_bundle | {access, refresh, expires_at} JSON | Authorization: Bearer <access> |
| auto | first available source, in the order listed under "auto resolution order" | — |
auto resolution order
Used when auth.mode: auto or omitted:
flowchart TD
START[anthropic client build] --> B1{oauth_bundle<br/>file exists?}
B1 -->|yes| USE1[use OAuth bundle]
B1 -->|no| B2{Claude Code CLI<br/>credentials found?}
B2 -->|yes| USE2[import from<br/>~/.claude/.credentials.json]
B2 -->|no| B3{setup_token<br/>file exists?}
B3 -->|yes| USE3[use setup token]
B3 -->|no| B4{api_key<br/>set?}
B4 -->|yes| USE4[use static key]
B4 -->|no| FAIL([fail: no credentials])
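The flowchart is a first-match chain, which can be written out directly. The function and its boolean inputs are hypothetical; the real resolver inspects files and config itself, but the precedence is the one drawn above.

```python
def resolve_auth(oauth_bundle_exists, cli_credentials_found, setup_token_exists, api_key):
    """Pick the Anthropic credential source using the auto-mode precedence."""
    if oauth_bundle_exists:
        return "oauth_bundle"
    if cli_credentials_found:
        return "cli_import"                 # ~/.claude/.credentials.json
    if setup_token_exists:
        return "setup_token"
    if api_key:
        return "static"
    raise RuntimeError("no Anthropic credentials found")
```

Note that a static key is the last resort: any of the three subscription-style sources, when present, takes precedence over it.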
OAuth bundle
The wizard runs a PKCE flow in the browser and writes the bundle to
./secrets/anthropic_oauth.json:
{
"access_token": "...",
"refresh_token": "...",
"expires_at": "2026-05-01T12:00:00Z"
}
- Refresh endpoint: https://console.anthropic.com/v1/oauth/token
- Refresh cadence: 60 seconds before expires_at, background task POSTs grant_type=refresh_token
- Concurrency: all refreshes serialize behind a mutex
- Shared OAuth client id: 9d1c250a-e61b-44d9-88ed-5944d1962f5e
- Stale-token handling: a 401 mid-flight marks the token stale so the next refresh fires immediately instead of waiting for the expiry window
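The refresh decision combines the 60-second margin with the stale flag. The sketch below only models that decision; `needs_refresh` is a hypothetical name, and the network call, the mutex, and the bundle rewrite are left out.

```python
from datetime import datetime, timedelta, timezone

REFRESH_MARGIN = timedelta(seconds=60)


def needs_refresh(expires_at, now, stale=False):
    """True when the token is marked stale or is within 60 s of expires_at.

    `expires_at` is the RFC 3339 string from the bundle; `now` is an aware datetime.
    """
    expiry = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
    return stale or now >= expiry - REFRESH_MARGIN
```

A 401 seen mid-flight would set `stale=True`, so the next check refreshes immediately even if the stored expiry is still far away.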
CLI credentials import
If you're already running Claude Code CLI on the same host, the client
auto-detects and imports ~/.claude/.credentials.json. Zero config —
if it exists and is valid, it's used.
Tool calling
Native Anthropic shape:
- Tool definitions: {name, description, input_schema}
- Tool invocation: tool_use blocks with id, name, input
- Tool result: tool_result blocks correlated via tool_use_id
Streaming uses native SSE; a dedicated parser in
crates/llm/src/stream.rs handles message_start, content_block_*,
and message_delta events.
Error classification
| Response | Mapping | Behavior |
|---|---|---|
| 429 | LlmError::RateLimit { retry_after_ms } (fallback 60s) | Retried |
| 401 / 403 | LlmError::CredentialInvalid with context (API vs OAuth) | Marks OAuth token stale; fails fast so the operator sees it |
| 5xx | LlmError::ServerError | Retried |
| Other 4xx | LlmError::Other | Fail fast |
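The table maps onto a small status classifier. The tuple return shape and the function name are illustrative; the variant names and the 60-second fallback come from the table, and statuses outside 4xx/5xx are deliberately left unclassified.

```python
def classify(status, retry_after_ms=None):
    """Map an HTTP status to (error variant, fields, retried?) per the table above."""
    if status == 429:
        wait_ms = retry_after_ms if retry_after_ms is not None else 60_000
        return ("RateLimit", {"retry_after_ms": wait_ms}, True)
    if status in (401, 403):
        return ("CredentialInvalid", {}, False)     # fail fast, mark OAuth token stale
    if 500 <= status <= 599:
        return ("ServerError", {}, True)
    if 400 <= status <= 499:
        return ("Other", {}, False)
    return None                                     # not an error class covered here
```

The ordering matters: 429, 401, and 403 must be checked before the generic 4xx branch, otherwise they would all collapse into the fail-fast "Other" bucket.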
OAuth subscription request shape
Anthropic gates Opus 4.x and Sonnet 4.x behind a Claude-Code identity claim when the request is authenticated with a Bearer token (setup token or OAuth bundle). Without the claim, only Haiku passes — every other model returns a 4xx that surfaces as a vague "no quota" error.
When AnthropicAuth::is_subscription() is true (SetupToken or
OAuth variants), the client adds:
- Header anthropic-beta: claude-code-20250219, oauth-2025-04-20, fine-grained-tool-streaming-2025-05-14 (cache betas merged in on top).
- Header anthropic-dangerous-direct-browser-access: true.
- Header User-Agent: claude-cli/<version>.
- Header x-app: cli.
- A first system block whose text is exactly:
You are Claude Code, Anthropic's official CLI for Claude.
The user's system_prompt (and any structured system_blocks) follow
the spoof block, preserving the original instructions verbatim.
User-Agent version: defaults to the value of
CLAUDE_CLI_DEFAULT_VERSION in crates/llm/src/anthropic_auth.rs.
Operators can override it without rebuilding by exporting:
export NEXO_CLAUDE_CLI_VERSION=2.1.99
The API-key path is unchanged — none of these headers or the spoof
block are added when AnthropicAuth::ApiKey is in use.
Mirrors OpenClaw's anthropic-transport-stream.ts:558-641. Reference implementation lives in research/src/agents/.
Supported features
- Chat completions ✅
- Tool calling ✅
- Streaming (SSE) ✅
- Multimodal (images) ✅
- Prompt caching ✅ (via Anthropic beta headers)
- Extended thinking ✅ (model-dependent)
- OAuth subscription (Pro / Max plans) ✅ — Opus / Sonnet require the Claude-Code request shape documented above.
Prompt Cache Break Diagnostics (Phase 77.4)
Global detector (all providers/models) in
crates/core/src/agent/llm_behavior.rs:
- After each parsed response, the client compares cache_read_input_tokens against the previous turn in the same session.
- If cache-read drops by more than 50%, it emits a warning log: llm.cache_break.
- The log includes a suspected_breaker hint: provider_swap, model_swap, system_prompt_mutation, or unknown.
Anthropic-specific enrichment in crates/llm/src/anthropic.rs:
- Emits anthropic.cache_break with the same >50% drop trigger.
- The log includes a suspected_breaker hint based on request drift: model_swap, system_prompt_mutation, beta_header_drift, or unknown.
- First turn is baseline only (no comparison/log).
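The detector's trigger and hint can be sketched as two small functions. The request-snapshot dict, its keys, and the order in which drift causes are checked are assumptions; the >50% threshold, the first-turn baseline, and the hint values come from the lists above.

```python
def cache_break(prev_cache_read, curr_cache_read):
    """True when cache-read tokens dropped by more than 50% versus the previous turn."""
    if prev_cache_read is None or prev_cache_read <= 0:
        return False                        # first turn (or empty cache) is baseline only
    return curr_cache_read < prev_cache_read * 0.5


def suspected_breaker(prev_request, curr_request):
    """Guess what changed between two request snapshots (hypothetical dict keys)."""
    if prev_request["model"] != curr_request["model"]:
        return "model_swap"
    if prev_request["system_prompt"] != curr_request["system_prompt"]:
        return "system_prompt_mutation"
    if prev_request["beta_headers"] != curr_request["beta_headers"]:
        return "beta_header_drift"
    return "unknown"
```

A drop from 1000 to 400 cached tokens trips the detector, while a drop to exactly 500 does not, since the rule is "more than 50%".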
Common mistakes
- Setup-token string under 80 chars. The setup-token validator refuses it at parse time. Make sure you pasted the full string.
- api_key + oauth_bundle both set. The auth mode wins. The static key is kept only as a fallback the auto-resolver may pick up if the bundle is missing.
- Claude Code CLI credentials being used unintentionally. If auto mode is on and you installed CLI on the host, that path wins before api_key. Set auth.mode: static to pin the static key.
OpenAI-compatible
Client for OpenAI itself and for any upstream that speaks the same wire: Ollama, Groq, OpenRouter, LM Studio, vLLM, Azure OpenAI, or your own proxy.
Source: crates/llm/src/openai_compat.rs.
Configuration
# config/llm.yaml
providers:
  openai:
    api_key: ${OPENAI_API_KEY:-}
    base_url: https://api.openai.com/v1
    rate_limit:
      requests_per_second: 2.0
Per-agent:
model:
  provider: openai
  model: gpt-4o
Known-working upstreams
Point base_url at any of these and it works out of the box:
| Upstream | base_url |
|---|---|
| OpenAI | https://api.openai.com/v1 |
| Ollama | http://localhost:11434/v1 |
| Groq | https://api.groq.com/openai/v1 |
| OpenRouter | https://openrouter.ai/api/v1 |
| LM Studio | http://localhost:1234/v1 |
| vLLM | http://<host>:<port>/v1 |
| Azure OpenAI | Azure resource URL (watch for differences) |
| MiniMax (compat mode) | https://api.minimax.io |
Authentication
Single mode: static API key sent as Authorization: Bearer <key>.
Some upstreams ignore the key entirely (Ollama, local vLLM) — supply
any non-empty string to satisfy the config validator.
Features & gaps
| Feature | Status |
|---|---|
| Chat completions | ✅ |
| Tool calling | ✅ (OpenAI function-calling shape) |
| Streaming | ✅ |
| tool_choice: auto / required / none / {type:function} | ✅ |
| JSON mode / structured outputs | upstream-dependent |
| Multimodal | upstream-dependent |
| Embeddings | supported for OpenAI proper; other upstreams may vary |
Feature gating when the upstream lacks support: we do not pre-probe
features — a call that requires a feature the upstream doesn't speak
will fail with the upstream's own error (typically a 400). The error
bubbles up as LlmError::Other and does not retry, so you notice
quickly.
Error classification
| Response | Mapping | Behavior |
|---|---|---|
| 429 | LlmError::RateLimit (fallback 30s) | Retried |
| 5xx | LlmError::ServerError | Retried |
| Other 4xx | LlmError::Other | Fail fast |
Common mistakes
- Trailing slash in base_url. Some upstreams are lenient, some are not. Stick to the form shown in the table.
- Using Azure OpenAI without the deployment path. Azure requires an extra segment (/openai/deployments/<name>/chat/completions) that the vanilla OpenAI path doesn't. Currently not supported out of the box; use a proxy or a custom provider if you need Azure.
- Relying on JSON mode everywhere. Many local servers don't enforce schemas. Validate the response yourself when using Ollama / LM Studio for critical tool args.
DeepSeek
Connector for DeepSeek's hosted models. The API is OpenAI-compatible
end to end (same /v1/chat/completions shape, same SSE streaming,
same Bearer auth) so the connector is a thin factory that wraps
OpenAiClient with DeepSeek's default endpoint.
Source: crates/llm/src/deepseek.rs.
Configuration
# config/llm.yaml
providers:
  deepseek:
    api_key: ${DEEPSEEK_API_KEY}
    # base_url defaults to https://api.deepseek.com/v1 when blank.
    # Override only for self-hosted gateways or testing fixtures.
    base_url: ""
    rate_limit:
      requests_per_second: 2.0
      quota_alert_threshold: 100000
Pin the agent to it:
agents:
  - id: ana
    model:
      provider: deepseek
      model: deepseek-chat
Models
| Model id | Use case |
|---|---|
| deepseek-chat | General-purpose. Supports tool calling. |
| deepseek-reasoner | Long-form reasoning. No tool calling in current API revision. |
deepseek-reasoner agents must therefore leave allowed_tools empty
(or list only tools the agent never plans to invoke). Tool calls fired
against the reasoner endpoint return an error from upstream.
Streaming
Identical to OpenAI's SSE format, so OpenAiClient::chat_stream parses
it without per-provider code. nexo_llm_stream_ttft_seconds and
nexo_llm_stream_chunks_total Prometheus series labelled with
provider="deepseek" show up automatically.
Tool calling
deepseek-chat follows OpenAI's tool-calling spec verbatim. JSON
arguments deserialise the same way; parallel_tool_calls is honoured.
Rate limits
DeepSeek returns standard 429 with a retry-after header. The
existing retry plumbing (crates/llm/src/retry.rs) consumes that
header so 429s back off cleanly without touching the connector.
Quota / cost
DeepSeek's pricing is per-1M-tokens; the TokenUsage returned by
each ChatResponse is forwarded to the standard
agent_llm_tokens_total counter (labels: provider="deepseek",
model, usage_kind).
Known limitations
- No native embeddings client: DeepSeek does not currently publish an embeddings endpoint. Use a different provider for embedding_model if your agent needs vector search.
- Reasoner tool-call gap: see Models. Validate at boot by leaving allowed_tools: [] on agents pinned to deepseek-reasoner.
- Cache awareness: DeepSeek's KV-cache hit information is surfaced through the same cache_usage field as the OpenAI client reports it.
See also
- OpenAI-compatible — same wire format, full notes on the underlying client.
- Rate limiting & retry — backoff policy.
Multi-instance providers + secret-backed keys
Phase 82.10.s ships a long-overdue split between factory type (the crates/llm/src/<id>.rs client the daemon dispatches against) and provider instance (the YAML key under llm.yaml.providers.*).
Phase 82.10.t adds dynamic model discovery via /v1/models so SPA wizards show the live list a key actually has access to instead of a hardcoded catalog that drifts.
Why
Pre-82.10.s, providers.minimax was both the YAML id AND the factory
name — there was exactly one MiniMax per daemon. Two problems:
- Two microapps in the same daemon couldn't have separate MiniMax keys. The key was an env var (MINIMAX_API_KEY), and env vars are process-global. Microapp B would overwrite microapp A's key.
- A single tenant couldn't run two MiniMaxes with different keys for billing isolation between their own clients.
Post-82.10.s, the YAML can name as many instances of the same factory as the operator wants, each with its own key:
providers:
  # Legacy single-instance path still works (factory_type omitted →
  # the YAML key IS the factory id).
  minimax:
    api_key: ${MINIMAX_API_KEY}
    base_url: https://api.minimax.chat/v1
  # Multi-instance: name the instance whatever you want, point
  # factory_type at a registered factory, supply a per-instance
  # secret reference instead of a shared env var.
  minimax-cliente-a:
    factory_type: minimax
    base_url: https://api.minimax.chat/v1
    api_key_secret_id: LLM_MINIMAX_CLIENTE_A
  minimax-cliente-b:
    factory_type: minimax
    base_url: https://api.minimax.chat/v1
    api_key_secret_id: LLM_MINIMAX_CLIENTE_B
Agents then point at the instance id, not the factory:
agents:
  ana:
    model:
      provider: minimax-cliente-a # ← instance id
      model: MiniMax-M2.5
  pedro:
    model:
      provider: minimax-cliente-b # ← different instance, different key
      model: MiniMax-M2.5
Each agent dispatches against its own key. Quota / rate-limit / billing all separate.
API key sources — exactly one of three
LlmProviderConfig accepts the API key from one of three sources, and
the upsert RPC + boot resolver enforce exactly one:
| Source | Where it lives | When to use |
|---|---|---|
| api_key (inline) | YAML literal, usually ${ENV_VAR} | Dev / single-tenant single-instance |
| api_key_secret_id | Reference to <state_root>/secrets/<ID>.txt mode 0600 | Production multi-instance |
| api_key_env (legacy) | Env var name, resolved by the daemon at boot | Pre-82.10.s back-compat |
Setting two of the above at once → loud boot failure with the provider id and the conflicting sources listed.
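The exactly-one rule can be sketched as below. This is a minimal illustration, not the framework's code: KeyFields, KeySource and pick_key_source are hypothetical names, and the real LlmProviderConfig carries more fields than the three shown.

```rust
/// Hypothetical mirror of the three key fields; the real
/// `LlmProviderConfig` may differ in shape.
#[derive(Default)]
pub struct KeyFields {
    pub api_key: Option<String>,
    pub api_key_secret_id: Option<String>,
    pub api_key_env: Option<String>,
}

#[derive(Debug, PartialEq)]
pub enum KeySource {
    Inline(String),
    SecretId(String),
    Env(String),
}

/// Returns the single configured source, or an error naming the
/// provider id and every conflicting source.
pub fn pick_key_source(provider_id: &str, f: &KeyFields) -> Result<KeySource, String> {
    let mut found: Vec<(&str, KeySource)> = Vec::new();
    if let Some(v) = &f.api_key {
        found.push(("api_key", KeySource::Inline(v.clone())));
    }
    if let Some(v) = &f.api_key_secret_id {
        found.push(("api_key_secret_id", KeySource::SecretId(v.clone())));
    }
    if let Some(v) = &f.api_key_env {
        found.push(("api_key_env", KeySource::Env(v.clone())));
    }
    match found.len() {
        0 => Err(format!("{provider_id}: no API key source")),
        1 => Ok(found.remove(0).1),
        _ => {
            let names: Vec<&str> = found.iter().map(|(n, _)| *n).collect();
            Err(format!(
                "{provider_id}: conflicting API key sources: {}",
                names.join(", ")
            ))
        }
    }
}
```

Listing the conflicting field names in the error keeps the boot failure actionable without a second look at the YAML.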
Boot resolution
After AppConfig::load, main.rs walks every provider instance (global + tenant-scoped) and:
- Resolves api_key via LlmConfig::resolve_all_keys(&secrets).
  - Errors are collected per-instance (not fail-fast) so the operator sees every broken provider in one diagnostic instead of a fix-restart loop.
  - FsSecretsStore impls SecretsSource (sync read) so config-load reads <secrets_dir>/<id>.txt without async machinery.
- Validates factory_type via LlmRegistry::validate_config.
  - Each instance's resolved factory id (explicit factory_type or fallback to the YAML key) MUST be a registered factory.
  - Aggregates errors the same way; loud boot fail beats a runtime LLM dispatch error mid-traffic.
Sample boot failure:
Error: LLM provider API-key resolution failed for 2 instance(s):
· minimax-cliente-a: secret 'cliente-a-key' read failed: No such file
· openai: no API key source (set `api_key` inline or `api_key_secret_id`)
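The collect-then-report behaviour shown in the sample can be sketched as follows. This is an illustration under assumptions: resolve_all is a hypothetical helper, and the resolver is passed in as a closure rather than reading a real SecretsStore.

```rust
/// Runs `resolve` for every instance id and collects all failures
/// instead of stopping at the first one.
pub fn resolve_all<F>(instance_ids: &[&str], resolve: F) -> Result<Vec<String>, String>
where
    F: Fn(&str) -> Result<String, String>,
{
    let mut keys = Vec::new();
    let mut errors = Vec::new();
    for &id in instance_ids {
        match resolve(id) {
            Ok(key) => keys.push(key),
            Err(e) => errors.push(format!("  · {id}: {e}")),
        }
    }
    if errors.is_empty() {
        Ok(keys)
    } else {
        // One diagnostic that lists every broken instance.
        Err(format!(
            "LLM provider API-key resolution failed for {} instance(s):\n{}",
            errors.len(),
            errors.join("\n")
        ))
    }
}
```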
Admin RPC — nexo/admin/llm_providers/upsert
The admin handler now accepts:
{
"id": "minimax-cliente-a",
"factory_type": "minimax", // optional — defaults to id
"base_url": "https://api.minimax.chat/v1",
"api_key_secret_value": "sk-...", // write-through (audit-redacted)
// mutually exclusive with:
// "api_key_env": "MINIMAX_API_KEY" // legacy
// "api_key_secret_id": "PRE_STAGED_ID" // pre-staged via secrets/write
"tenant_id": "acme" // optional tenant scope
}
When api_key_secret_value is supplied, the daemon:
- Stamps the value into the SecretsStore under a derived id (LLM_<INSTANCE_UPPERCASE>), using an atomic file write with mode 0600.
- Sets api_key_secret_id: LLM_<INSTANCE> on the YAML.
- Triggers the reload signal so the rebuilt LlmRegistry picks up the key without a daemon restart.
Audit redactor masks api_key_secret_value as <redacted> so the
cleartext only persists in the SecretsStore, never on disk in
admin_audit.db. api_key_secret_id (a name, not a value) stays
visible for diagnostics.
Admin RPC — nexo/admin/llm_providers/catalog
Returns the list of registered factories with their default base URL + env var + curated model list. SPA wizards use this to render strict provider/model dropdowns without a hardcoded catalog drifting from the framework. Plugin-registered remote providers (Phase 81.25) appear here too as long as they registered before bootstrap.
Admin RPC — nexo/admin/llm_providers/probe
Phase 82.10.t extended the probe response with a model_names field
parsed from data[].id of an OpenAI-compat /v1/models payload:
{
"ok": true,
"status": 200,
"latency_ms": 142,
"model_count": 47,
"model_names": ["gpt-4o", "gpt-4o-mini", "gpt-4-turbo", "..."]
}
model_names is null when:
- The provider doesn't expose /v1/models (Anthropic, Gemini).
- The body isn't OpenAI-compat shaped.
- No data[].id strings could be extracted.
UI fallback in that case: the static factory catalog from
llm_providers/catalog. Names are capped at 200 to bound RPC payload
against pathological providers returning thousands of variants.
Frontend behaviour (agent-creator microapp ≥ 0.0.44)
The Agents page surfaces both flows:
- Top section: list of configured LLM instances. The "Nueva instancia" ("New instance") CTA opens a modal:
  - Factory dropdown (from llm_providers/catalog).
  - Instance id (validates slug, rejects duplicates client-side).
  - Base URL auto-filled from the catalog, editable.
  - API key (password input), written through via api_key_secret_value.
- Edit modal per agent: the provider dropdown lists the configured instances (minimax-cliente-a, minimax-cliente-b), not the factory types. Model dropdown:
  - Probes the instance's /v1/models after open.
  - Live names → green "✓ N modelos en vivo" ("N live models") indicator.
  - Probe failure / non-OpenAI shape → static catalog fallback with a hint explaining the provider doesn't expose /v1/models.
  - 60 s in-memory cache per instance; concurrent calls deduped.
Edge cases — defensive design notes
- Empty factory_type: "" is treated as absent (defensive against YAML typos that would otherwise match an empty-string factory).
- Empty secret value in the SecretsStore is treated as NotFound (an operator's echo "" > file doesn't half-succeed).
- Same factory_type + same key across instances is allowed; the operator owns fair-share quota when they explicitly clone.
- Tenant-scoped instance + global instance with same id: Phase 83.8.12 already wins-tenant; this layer doesn't change that.
- Plugin-registered remote providers appear in llm_providers/catalog after their register call. The catalogue snapshot used by admin RPC is taken at AdminRpcBootstrap::build time, so providers registered after that don't show up until restart.
Migration from legacy YAML
No migration needed — yamls without factory_type keep working under
the back-compat path (instance id IS the factory id). Operators only
touch their YAML when they want a second instance of the same factory
with a different key.
Credential schema (Phase 82.10.u)
Each LLM provider factory declares its credential field schema. The admin RPC llm_providers/catalog surfaces it; the SPA wizard renders one input per descriptor; the upsert handler validates the operator's payload against the same schema before persistence. Single source of truth, no drift.
Why
Pre-82.10.u, the operator's llm_providers/upsert always boiled
down to a single api_key. Two problems:
- MiniMax also needs group_id. Without it, /v1/models returns the provider's empty default list and the SPA shows nothing. There was no way to surface the field through the admin RPC.
- Anthropic supports OAuth, but the wire shape couldn't express "auth_mode dropdown + setup_token field that only appears when mode=setup_token + bundle JSON paste alternative".
Phase 82.10.u introduces a declarative CredentialFieldDescriptor
shape every factory advertises. The SPA renders dynamically. The
handler validates the same schema server-side.
Schema shape
pub struct CredentialFieldDescriptor {
    pub name: String,                        // yaml key + secret-store id suffix
    pub label: String,                       // operator-facing
    pub kind: FieldKind,                     // Text | Password | Select { options }
    pub required: bool,
    pub secret: bool,                        // → SecretsStore vs yaml inline
    pub default: Option<String>,
    pub help: Option<String>,
    pub validation: Option<FieldValidation>, // Regex | Length
    pub depends_on: Option<DependsOn>,       // visibility predicate
}
Persistence rule
- secret == true → value lands in the SecretsStore under derived id LLM_<INSTANCE>_<FIELD_UPPER>. YAML carries only a <field>_secret_id reference.
- secret == false → value lands inline in llm.yaml.providers.<id>.<field>.
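A minimal sketch of that placement decision, assuming the derived id upper-cases the instance and field names and maps `-` to `_` (inferred from ids such as LLM_MINIMAX_CLIENTE_A on this page, not confirmed against the source). Placement, upper_snake and place_field are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
pub enum Placement {
    /// Value goes to the SecretsStore; YAML keeps only the reference.
    Secret { secret_id: String, yaml_key: String },
    /// Value is written inline under the provider's YAML entry.
    Inline { yaml_key: String },
}

/// Upper-cases an id and maps `-` to `_` (assumed normalisation,
/// e.g. `minimax-cliente-a` -> `MINIMAX_CLIENTE_A`).
fn upper_snake(s: &str) -> String {
    s.chars()
        .map(|c| if c == '-' { '_' } else { c.to_ascii_uppercase() })
        .collect()
}

pub fn place_field(instance_id: &str, field: &str, secret: bool) -> Placement {
    if secret {
        Placement::Secret {
            secret_id: format!("LLM_{}_{}", upper_snake(instance_id), upper_snake(field)),
            yaml_key: format!("{field}_secret_id"),
        }
    } else {
        Placement::Inline { yaml_key: field.to_string() }
    }
}
```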
Validation
| Rule | Server check | SPA check |
|---|---|---|
| required + depends_on.satisfied | MISSING_FIELD if absent | red border on blur |
| Regex { pattern, hint } | INVALID_FORMAT with hint | hint shown inline |
| Length { min, max } | INVALID_FORMAT length n not in [min,max] | char count |
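The server-side half of the table might look roughly like this. It is a simplified sketch: FieldRule and validate_field are hypothetical, the Regex rule is left out because the standard library has no regex engine, and treating an empty string as absent is an assumption.

```rust
#[derive(Debug, PartialEq)]
pub enum FieldError {
    MissingField { field: String },
    InvalidFormat { field: String, hint: String },
}

/// Simplified descriptor: only the parts needed for the checks below.
pub struct FieldRule<'a> {
    pub name: &'a str,
    pub required: bool,
    /// `false` when the field's `depends_on` predicate is not met.
    pub depends_on_satisfied: bool,
    /// Optional inclusive length bounds (the `Length` rule).
    pub length: Option<(usize, usize)>,
}

pub fn validate_field(rule: &FieldRule, value: Option<&str>) -> Result<(), FieldError> {
    // Fields hidden by an unsatisfied depends_on are skipped entirely.
    if !rule.depends_on_satisfied {
        return Ok(());
    }
    let value = match value {
        Some(v) if !v.is_empty() => v,
        _ if rule.required => {
            return Err(FieldError::MissingField { field: rule.name.to_string() })
        }
        _ => return Ok(()),
    };
    if let Some((min, max)) = rule.length {
        let n = value.chars().count();
        if n < min || n > max {
            return Err(FieldError::InvalidFormat {
                field: rule.name.to_string(),
                hint: format!("length {n} not in [{min},{max}]"),
            });
        }
    }
    Ok(())
}
```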
Per-factory schemas
| Factory | Fields | Auth modes | /v1/models? |
|---|---|---|---|
| minimax | api_key (secret) · group_id (10-20 digits) · region (select) · key_kind (api/plan) | api_key, oauth_device_code | ✓ |
| anthropic | auth_mode (select) · api_key (depends_on api_key) · setup_token (depends_on setup_token) | api_key, setup_token, oauth_auth_code, oauth_bundle_import | ✗ (static catalog) |
| openai / deepseek / gemini | api_key | api_key | ✓ / ✓ / via Gemini-specific path |
Wire flow — operator creates a MiniMax instance
// 1) GET catalog
"nexo/admin/llm_providers/catalog" → {
"providers": [{
"id": "minimax",
"credential_schema": [
{"name":"api_key","kind":{"type":"password"},"required":true,"secret":true,...},
{"name":"group_id","kind":{"type":"text"},"required":true,"secret":false,
"validation":{"type":"regex","pattern":"^[0-9]{10,20}$","hint":"10-20 digits"}},
{"name":"region","kind":{"type":"select","options":[...]},"default":"global"},
{"name":"key_kind","kind":{"type":"select","options":[...]},"default":"api"}
],
"supported_auth_modes":["api_key","oauth_device_code"],
"supports_models_probe": true
}, ...]
}
// 2) Validate without persisting (Phase 82.10.u probe_draft)
"nexo/admin/llm_providers/probe_draft" {
"factory_type":"minimax",
"base_url":"https://api.minimax.io/v1",
"auth_mode":"api_key",
"fields":{"api_key":"sk-...", "group_id":"1234567890123", "region":"global", "key_kind":"api"}
}
→ { "ok":true, "status":200, "model_count":12, "model_names":[...] }
// 3) Persist
"nexo/admin/llm_providers/upsert" {
"id":"minimax-cliente-a",
"factory_type":"minimax",
"base_url":"https://api.minimax.io/v1",
"auth_mode":"api_key",
"fields":{ /* same as probe */ }
}
→ summary
Error taxonomy
LlmProviderError rides in AdminRpcError::data so the SPA
discriminates by code:
pub enum LlmProviderError {
    MissingField { field },
    UnknownField { field },
    InvalidFormat { field, hint },
    InvalidAuthMode { factory, mode },
    SessionExpired,  // OAuth: TTL elapsed
    SessionNotFound, // OAuth: never issued or replayed
    OAuthExchangeFailed { upstream_status, message },
    ProbeFailed { upstream_status, message },
    YamlWriteFailed { detail },
    SecretWriteFailed { detail },
}
Audit
Schema-driven payloads are walked by redact_secret_keys so any
field whose name matches api_key, setup_token,
access_token, refresh_token, oauth_bundle, password,
token, secret is masked as <redacted>. Non-secret
identifiers (group_id, region, key_kind) stay literal in the
audit log for diagnostics.
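As an illustration of the masking rule, the sketch below redacts a flat key/value payload. It assumes exact key-name matches and a flat map; the real redact_secret_keys walks nested payloads and its matching rule may be broader. redact_flat is a hypothetical name.

```rust
use std::collections::BTreeMap;

/// Key names taken from the list above.
const SECRET_KEYS: [&str; 8] = [
    "api_key", "setup_token", "access_token", "refresh_token",
    "oauth_bundle", "password", "token", "secret",
];

/// Returns a copy of `fields` with secret values replaced by `<redacted>`.
pub fn redact_flat(fields: &BTreeMap<String, String>) -> BTreeMap<String, String> {
    fields
        .iter()
        .map(|(k, v)| {
            let masked = if SECRET_KEYS.contains(&k.as_str()) {
                "<redacted>".to_string()
            } else {
                v.clone() // non-secret identifiers stay literal
            };
            (k.clone(), masked)
        })
        .collect()
}
```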
Back-compat
- Pre-82.10.u microapps that send api_key_env / api_key_secret_value keep working; the handler picks the legacy path when fields is empty.
- Pre-82.10.u daemons that don't carry credential_schema in the catalog response → SPA falls back to the legacy single-api_key UI (default [] from the optional ?? []).
OAuth flows (Phase 82.10.u)
Two-step admin RPC flow that lets an operator authorise a Claude subscription or MiniMax Token Plan from the SPA wizard without the SPA ever touching the PKCE verifier or refresh tokens.
Why an admin RPC and not just stdin
Pre-82.10.u, OAuth lived inside crates/setup/src/services/:
interactive stdin paste, claude login style. That works for
single-tenant operators who own the daemon shell, but for a
multi-tenant SaaS the operator is a browser tab — there is no
stdin.
Phase 82.10.u extracts the PKCE primitives (crates/llm-auth)
and exposes them over admin RPC so a SPA can drive the same flow.
The framework owns the verifier across the two HTTP requests via
InMemoryVerifierStore; the SPA only sees opaque session ids.
Endpoints
nexo/admin/llm_providers/oauth_start
// req
{
"factory_type": "anthropic",
"auth_mode": "oauth_auth_code",
"tenant_id": null
}
// resp (auth_code)
{
"session_id": "f2c1...",
"authorize_url": "https://claude.ai/oauth/authorize?...",
"expires_at_ms": 1714776600000,
"flow_kind": "auth_code"
}
// resp (device_code) — for `(minimax, oauth_device_code)`
{
"session_id": "9a3e...",
"authorize_url": "https://api.minimax.io/...",
"expires_at_ms": ...,
"flow_kind": "device_code",
"user_code": "ABC123",
"polling_interval_ms": 2000
}
nexo/admin/llm_providers/oauth_finish
// req — auth_code
{
"session_id": "f2c1...",
"instance_id": "anthropic-personal",
"code": "abc#def" // operator pasted from callback page
}
// req — device_code (no code; daemon polls)
{
"session_id": "9a3e...",
"instance_id": "minimax-cliente-a"
}
// resp
{
"ok": true,
"account_email": "user@example.com", // Anthropic only
"expires_at_ms": 1714780200000,
"secret_id": "LLM_ANTHROPIC_PERSONAL_OAUTH_BUNDLE"
}
State machine
[oauth_start]                        [oauth_finish]
─────────────                        ─────────────
                       (10 min TTL)
PKCE gen → store.put → ........... → take → exchange/poll → bundle → SecretsStore
                                                                          ↓
                                                                     yaml patch:
                                                                       auth.mode = oauth_bundle
                                                                       auth.bundle = <secret path>
                                                                          ↓
                                                                     reload_signal()
Defensive design
| Concern | Mitigation |
|---|---|
| CSRF | state is checked against the stashed PKCE state inside exchange_code |
| Replay | take() removes the entry BEFORE exchange; second call → SESSION_NOT_FOUND |
| Expired sessions | peek_status discriminates Live / Expired / Missing so the SPA gets accurate diagnostics |
| Memory bloat | Background sweep every 60 s drops stale entries; capacity 100 with FIFO eviction |
| Verifier leak | Verifier never travels to the SPA — only opaque session_id |
| Audit | code, access_token, refresh_token, oauth_bundle masked via redact_secret_keys |
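The replay, expiry and capacity rows can be illustrated with a small in-memory store. This is a sketch under assumptions, not the actual InMemoryVerifierStore: the type and method signatures are hypothetical, time is injected as a parameter so the behaviour is deterministic, and the 60 s background sweep is omitted.

```rust
use std::collections::{HashMap, VecDeque};
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
pub enum SessionStatus { Live, Expired, Missing }

pub struct VerifierStore {
    ttl: Duration,
    capacity: usize,
    entries: HashMap<String, (String, Instant)>, // session_id -> (verifier, created_at)
    order: VecDeque<String>,                     // insertion order for FIFO eviction
}

impl VerifierStore {
    pub fn new(ttl: Duration, capacity: usize) -> Self {
        Self { ttl, capacity, entries: HashMap::new(), order: VecDeque::new() }
    }

    pub fn put(&mut self, session_id: &str, verifier: &str, now: Instant) {
        // Evict the oldest session when full (FIFO).
        while self.entries.len() >= self.capacity {
            match self.order.pop_front() {
                Some(oldest) => { self.entries.remove(&oldest); }
                None => break,
            }
        }
        self.entries.insert(session_id.to_string(), (verifier.to_string(), now));
        self.order.push_back(session_id.to_string());
    }

    pub fn peek_status(&self, session_id: &str, now: Instant) -> SessionStatus {
        match self.entries.get(session_id) {
            None => SessionStatus::Missing,
            Some((_, created)) if now.duration_since(*created) > self.ttl => SessionStatus::Expired,
            Some(_) => SessionStatus::Live,
        }
    }

    /// Removes the entry before any exchange happens, so a replayed
    /// `oauth_finish` finds nothing. Expired entries yield `None` too.
    pub fn take(&mut self, session_id: &str, now: Instant) -> Option<String> {
        let (verifier, created) = self.entries.remove(session_id)?;
        self.order.retain(|id| id.as_str() != session_id);
        if now.duration_since(created) > self.ttl { None } else { Some(verifier) }
    }
}
```

Because take removes the entry unconditionally, a failed upstream exchange cannot be retried with the same session; the operator restarts the flow, which is the trade-off the Replay row accepts.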
Client SDK
crates/llm-auth exposes the primitives so any consumer (admin
RPC, CLI wizard, future MCP server) shares the same crypto + HTTP
shape:
use nexo_llm_auth::{gen_pkce, StateEncoding};
use nexo_llm_auth::anthropic::{build_authorize_url, exchange_code, TOKEN_URL};

let pkce = gen_pkce(StateEncoding::HexOnly);
let url = build_authorize_url(&pkce);
// ... operator pastes `<code>#<state>` ...
let bundle = exchange_code(&pkce, &code, &state, TOKEN_URL).await?;
CLI flow (unchanged)
agent setup anthropic → oauth_login mode still uses
crates/setup/src/services/anthropic_oauth.rs which now wraps
the same nexo-llm-auth primitives. Operators with shell access
keep the stdin paste UX.
Microapp wizard
The SPA-side UI lives in
agent-creator-microapp/frontend/src/components/OAuthPane.tsx
and the zustand store
agent-creator-microapp/frontend/src/lib/oauthFlow.ts. State
machine: idle → starting → awaiting_user → exchanging → success | error. Auth-code variant renders a paste box; device-code
variant renders the user_code + verification_uri with a Confirm
button (the daemon polls upstream).
Rate limiting & retry
Every LLM provider client sits behind a token bucket and a bounded retry policy with decorrelated jittered exponential backoff. This page is the definitive reference for those two mechanisms.
Source: crates/llm/src/retry.rs, crates/llm/src/rate_limiter.rs,
crates/llm/src/quota_tracker.rs.
Rate limiter
Token bucket, acquired before every outbound request.
- interval = 1 / requests_per_second
- One token per request
- Bucket fully refills after interval per slot
- Per-provider, per-agent: each client has its own bucket, so one noisy agent can't starve another even when they share a provider
rate_limit:
  requests_per_second: 2.0
  quota_alert_threshold: 100000 # optional
At 2.0 rps, the bucket tops up a slot every 500 ms. A burst of 3
requests will wait briefly on the third.
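A token bucket of this kind can be sketched as follows. The burst capacity is an assumption (it is not a documented config field); a capacity of 2 at 2.0 rps is chosen here so that a burst of 3 waits on the third request, as described above. TokenBucket and acquire are hypothetical names, and the clock is passed in so the sketch stays deterministic. requests_per_second must be greater than zero.

```rust
use std::time::{Duration, Instant};

pub struct TokenBucket {
    interval: Duration, // time to refill one slot = 1 / requests_per_second
    capacity: f64,      // burst size (assumption, not a documented field)
    tokens: f64,
    last: Instant,
}

impl TokenBucket {
    pub fn new(requests_per_second: f64, capacity: u32, now: Instant) -> Self {
        Self {
            interval: Duration::from_secs_f64(1.0 / requests_per_second),
            capacity: capacity as f64,
            tokens: capacity as f64,
            last: now,
        }
    }

    /// Takes one token and returns how long the caller should wait
    /// before sending (zero when a token was immediately available).
    pub fn acquire(&mut self, now: Instant) -> Duration {
        let elapsed = now.duration_since(self.last).as_secs_f64();
        self.tokens = (self.tokens + elapsed / self.interval.as_secs_f64()).min(self.capacity);
        self.last = now;
        self.tokens -= 1.0;
        if self.tokens >= 0.0 {
            Duration::ZERO
        } else {
            // Negative balance is debt; wait until it has been refilled.
            self.interval.mul_f64(-self.tokens)
        }
    }
}
```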
Quota tracker
Optional. When a provider returns remaining-quota info (header,
response body), quota_tracker records it via record_usage() on the
token response. If the remaining crosses quota_alert_threshold, a
structured warn log is emitted:
WARN quota threshold crossed provider=minimax remaining=99500 threshold=100000
Pair with a Prometheus log-scraping rule for an alert.
Retry policy
Retries live above the circuit breaker. They handle transient failures that don't warrant flipping the breaker.
| Error class | Max attempts | Backoff curve |
|---|---|---|
| 429 (rate limit) | 5 | max(retry-after, jittered_backoff) |
| 5xx (server) | 3 | jittered_backoff |
| 401 (auth) | 1 refresh + 1 retry | (internal to the client) |
| Other 4xx | 0 (fail fast) | — |
Decorrelated jittered backoff
Not simple exponential — the next backoff is a uniform random draw in a growing range:
next = uniform(base, max(base, last × multiplier))
Defaults from llm.yaml retry block:
| Field | Default |
|---|---|
| initial_backoff_ms | 1000 |
| max_backoff_ms | 60000 |
| backoff_multiplier | 2.0 |
Why decorrelated jitter: multiple clients hitting the same 429 don't re-fire in lockstep. Desynchronization is built-in.
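A sketch of the draw with the defaults above. The formula does not show where max_backoff_ms applies, so capping the result is an assumption here; the xorshift generator only exists to keep the example dependency-free (it needs a non-zero seed) and is not what retry.rs uses. Backoff and XorShift64 are hypothetical names.

```rust
/// Tiny xorshift64 generator so the sketch needs no external crate.
pub struct XorShift64(pub u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
    /// Uniform draw in `[lo, hi]` (inclusive); modulo bias is ignored here.
    fn uniform(&mut self, lo: u64, hi: u64) -> u64 {
        lo + self.next_u64() % (hi - lo + 1)
    }
}

pub struct Backoff {
    pub initial_ms: u64,
    pub max_ms: u64,
    pub multiplier: f64,
}

impl Backoff {
    /// next = uniform(base, max(base, last * multiplier)), capped at max_ms.
    pub fn next_ms(&self, last_ms: u64, rng: &mut XorShift64) -> u64 {
        let upper = ((last_ms as f64) * self.multiplier) as u64;
        let upper = upper.max(self.initial_ms);
        rng.uniform(self.initial_ms, upper).min(self.max_ms)
    }
}
```

Each client feeds its own previous delay back in as last_ms, so two clients that saw the same 429 drift apart after the first draw.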
flowchart LR
REQ[request] --> API{API response}
API -->|200| OK[return ChatResponse]
API -->|429| RL[RateLimit]
API -->|5xx| SE[ServerError]
API -->|401| AU[CredentialInvalid]
API -->|4xx| F[Other fail fast]
RL --> D1{attempts<br/>< 5?}
SE --> D2{attempts<br/>< 3?}
AU --> REF[auth refresh<br/>+ single retry]
D1 -->|yes| BO1[wait max(retry_after,<br/>jittered_backoff)]
D1 -->|no| F
D2 -->|yes| BO2[wait jittered_backoff]
D2 -->|no| F
BO1 --> REQ
BO2 --> REQ
REF --> REQ
Error classification per provider
The providers classify HTTP responses into a shared LlmError so the
retry layer can be common code:
| HTTP | LlmError variant | Retried? |
|---|---|---|
| 200 | Ok(ChatResponse) | — |
| 429 | RateLimit { retry_after_ms } | ✅ up to 5 |
| 5xx | ServerError { status, body } | ✅ up to 3 |
| 401 / 403 | CredentialInvalid | ❌ (client handles refresh internally) |
| Other 4xx | Other | ❌ |
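The table maps onto a small classification function. The sketch below uses a simplified enum without payloads; ErrorClass, classify and max_retries are hypothetical names, and sending statuses the table does not list (other 2xx, 3xx) to Other is an assumption.

```rust
#[derive(Debug, PartialEq)]
pub enum ErrorClass {
    Ok,
    RateLimit,
    ServerError,
    CredentialInvalid,
    Other,
}

pub fn classify(status: u16) -> ErrorClass {
    match status {
        200 => ErrorClass::Ok,
        429 => ErrorClass::RateLimit,
        401 | 403 => ErrorClass::CredentialInvalid,
        500..=599 => ErrorClass::ServerError,
        _ => ErrorClass::Other,
    }
}

/// Retry budget of the shared retry layer for each class, per the table.
pub fn max_retries(class: &ErrorClass) -> u32 {
    match class {
        ErrorClass::RateLimit => 5,
        ErrorClass::ServerError => 3,
        // Auth refresh is handled inside the client, not by this layer.
        _ => 0,
    }
}
```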
Tuning
- Bursty workloads: bump requests_per_second cautiously; the upstream's own rate limits won't move, so you'll just pay more 429s to find the ceiling.
- Flaky networks: raise max_attempts for 5xx; keep max_backoff_ms bounded so slow agents don't spiral.
- Subscription plans: lower requests_per_second to keep daily usage under caps; pair with quota_alert_threshold.
Telegram
Bot API channel with long-polling intake, multi-bot routing, full send/reply/reaction/edit/location/media tool surface, and optional voice auto-transcription.
Source: standalone repo at
nexo-rs-plugin-telegram
(extracted from crates/plugins/telegram/ per Phase 81.18; see
PHASES.md
for the migration notes). The crate ships as a lib + bin Shape B
package: the lib re-exports TelegramPlugin for in-process
consumers (an Android embedded host tomorrow), and the bin is
the subprocess entrypoint the daemon spawns per
cfg.plugins.telegram entry.
Install (Phase 81.18.b.1 — operator action required)
The daemon stopped constructing TelegramPlugin in-tree as of
Phase 81.18.b.1; it now spawns the standalone subprocess binary
per cfg entry. Operators with cfg.plugins.telegram populated
must install the binary and surface its directory through
plugins.discovery.search_paths before starting the daemon, or
the discovery walker logs a clear warning and the plugin never
boots:
# Recommended — download the pre-built tarball from the plugin's
# GitHub Releases into the daemon's plugin dir:
nexo plugin install lordmacu/nexo-plugin-telegram
nexo plugin list
# Or build from source:
cargo install --git https://github.com/lordmacu/nexo-plugin-telegram
nexo plugin install lands the binary + plugin.toml under
<state_dir>/plugins/telegram/, which the daemon's discovery
walker scans by default — no search_paths edit needed. If you
build with cargo install --git instead, point discovery at the
install dir in agents.yaml:
plugins:
  discovery:
    search_paths:
      - ~/.cargo/bin # or wherever you installed the binary
Each cfg.plugins.telegram[] entry maps to one subprocess;
per-instance state (offset_path, media_dir, instance topic
suffix, bot token) is seeded into the child via
NEXO_PLUGIN_TELEGRAM_* env vars at spawn time so multi-bot
operators get true process isolation.
Topics
| Direction | Subject | Notes |
|---|---|---|
| Inbound | plugin.inbound.telegram | Legacy single-bot |
| Inbound | plugin.inbound.telegram.<instance> | Per-bot routing |
| Outbound | plugin.outbound.telegram | Legacy single-bot |
| Outbound | plugin.outbound.telegram.<instance> | Per-bot routing |
Each instance subscribes only to its own outbound topic, so two bots in the same process don't cross-wire.
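The subject naming in the table reduces to one rule: append the instance label when there is one, otherwise use the bare legacy subject. A minimal sketch (Direction and telegram_subject are hypothetical names, not part of the plugin's API):

```rust
#[derive(Clone, Copy)]
pub enum Direction { Inbound, Outbound }

pub fn telegram_subject(direction: Direction, instance: Option<&str>) -> String {
    let dir = match direction {
        Direction::Inbound => "inbound",
        Direction::Outbound => "outbound",
    };
    match instance {
        // Labelled bot: per-bot routing.
        Some(name) if !name.is_empty() => format!("plugin.{dir}.telegram.{name}"),
        // Unlabelled bot: legacy bare topic.
        _ => format!("plugin.{dir}.telegram"),
    }
}
```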
Config
# config/plugins/telegram.yaml
telegram:
  token: ${file:./secrets/telegram_token.txt}
  instance: sales_bot
  polling:
    enabled: true
    interval_ms: 25000
    offset_path: ./data/media/telegram/sales_bot.offset
  allowlist:
    chat_ids: [] # empty = accept all
  auto_transcribe:
    enabled: false
    command: ./extensions/openai-whisper/target/release/openai-whisper
    language: es
  bridge_timeout_ms: 120000
Key fields:
| Field | Default | Purpose |
|---|---|---|
| token | none (required) | Bot API token from @BotFather. |
| instance | None | Label for multi-bot routing. Unlabelled keeps the legacy bare topic. |
| allow_agents | [] | Agents permitted to publish from this bot. Empty = accept any agent holding a resolver handle. Defense-in-depth for the per-agent credentials binding. |
| polling.enabled | true | Long-polling intake. Webhook not yet supported. |
| polling.interval_ms | 25000 | Long-poll timeout hint. Telegram clamps to [1 s, 50 s]. |
| polling.offset_path | ./data/media/telegram/offset | File to persist update offset across restarts. |
| allowlist.chat_ids | [] | Numeric chat ids allowed. Empty = accept all. |
| auto_transcribe.enabled | false | Voice → text. |
| auto_transcribe.command | ./extensions/openai-whisper/.../openai-whisper | Path to whisper binary. |
| bridge_timeout_ms | 120000 | Handler deadline before a bridge_timeout event fires. |
Auth
Single mode: static bot token. No OAuth. Store it under
./secrets/ and reference via ${file:...}.
flowchart LR
SETUP[agent setup] --> ASK[ask for bot token]
ASK --> F[./secrets/telegram_token.txt]
F -.->|${file:...}| CFG[config/plugins/telegram.yaml]
CFG --> RUN[runtime: HTTP Bot API with long-poll]
Tools exposed to the LLM
| Tool | Notes |
|---|---|
| telegram_send_message | Send text to chat id (negative for groups/channels). |
| telegram_send_reply | Quote a specific prior message. |
| telegram_send_reaction | Emoji on a message. |
| telegram_edit_message | Modify a prior message's text. |
| telegram_send_location | GPS coordinates. |
| telegram_send_media | File upload with caption and mime hint. |
All tools enforce outbound_allowlist.telegram per binding.
Event shapes
// message
{
"kind": "message",
"from": "12345",
"chat": "12345",
"chat_type": "private",
"text": "hi",
"reply_to": null,
"is_group": false,
"timestamp": 1714000000,
"msg_id": "42",
"username": "jdoe",
"media": [],
"latitude": null,
"longitude": null,
"forward": null
}
// media item (inside `media`)
{
"kind": "voice" | "photo" | "video" | "document" | "audio",
"local_path": "./data/media/telegram/....ogg",
"file_id": "AgACAgEA...",
"mime_type": "audio/ogg",
"duration_s": 4,
"width": null,
"height": null,
"file_name": null
}
// callback_query (inline-keyboard button press, auto-ACKed)
{"kind": "callback_query", "from": "...", "chat": "...", "data": "buy"}
// chat_membership
{"kind": "chat_membership", "chat": "...", "status": "added" | "kicked" | ...}
// lifecycle
{"kind": "connected" | "disconnected"}
{"kind": "bridge_timeout", "msg_id": "...", "waited_ms": ...}
Forwarded messages include a forward object:
"forward": {
"source": "user" | "channel" | "chat",
"from_user_id": 12345,
"from_chat_id": null,
"date": 1714000000
}
Gotchas
- Webhook mode is not supported yet. Long-polling only.
- polling.interval_ms is clamped by Telegram. Values outside [1000, 50000] get capped server-side; the default 25000 is a good middle ground.
- Negative chat ids are groups/channels. Telegram uses negative ids for group chats and positive ids for private chats. Don't strip the sign.
- Auto-transcribe requires the whisper skill extension. The command path must point at a working binary, otherwise inbound voice messages arrive without text.
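Three of these rules are easy to express in code. The helpers below are hypothetical and only mirror the documented behaviour (the clamp range, the empty-allowlist rule from the config table, and the sign convention); they are not functions exported by the plugin.

```rust
/// Effective long-poll timeout after the documented [1000, 50000] ms clamp.
pub fn effective_poll_interval_ms(configured: u64) -> u64 {
    configured.clamp(1_000, 50_000)
}

/// An empty allowlist accepts every chat; otherwise the id must be listed.
pub fn chat_allowed(chat_id: i64, allowlist: &[i64]) -> bool {
    allowlist.is_empty() || allowlist.contains(&chat_id)
}

/// Chat ids are signed: negative means group or channel, so they must
/// be carried as `i64` and never have their sign stripped.
pub fn is_group_or_channel(chat_id: i64) -> bool {
    chat_id < 0
}
```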
Email plugin
Multi-account IMAP/SMTP channel for Nexo agents. Receives messages
through IMAP IDLE (with a 60 s polling fallback for servers that
don't speak IDLE), sends through SMTP under a circuit-breaker, and
exposes six tools (email_send, email_reply, email_archive,
email_move_to, email_label, email_search) so an agent can read
and act on a mailbox.
Status (Phase 81.20.x shipped 2026-05-16). Email is now a standalone subprocess plugin distributed via crates.io. Install with cargo install nexo-plugin-email; the daemon's binary-mode discovery walker auto-detects the binary, probes --print-manifest, and wires all five auto-discovery stages (pairing adapter, HTTP routes, admin RPC, Prometheus metrics, dashboard) without any daemon-side code change. The daemon binary no longer compiles nexo-plugin-email in-tree (cargo tree -i nexo-plugin-email returns "did not match any packages").
Install
cargo install nexo-plugin-email # latest crates.io release
The binary lands in $HOME/.cargo/bin/nexo-plugin-email. The
daemon's PluginDiscoveryConfig::default() already includes that
directory in its search_paths, so a fresh nexo daemon boot
finds the plugin without manifest editing. The walker spawns
nexo-plugin-email --print-manifest, captures the bundled TOML,
and registers the plugin's 12 tools + 5 manifest sections via the
generic auto-discovery contract.
If your environment hardens against arbitrary binary execution
during boot, set
plugins.discovery.auto_detect_binaries: false in
config/discovery.yaml and add an explicit
nexo-plugin.toml reference under search_paths instead.
Configuration
config/plugins/email.yaml — multi-account schema. Credentials live
in nexo-auth (Phase 17), not in this YAML; see Per-account
credentials below.
email:
  enabled: true
  max_body_bytes: 32768 # body_text truncation
  max_attachment_bytes: 26214400 # 25 MiB; oversized attachments are
                                 # written truncated and flagged
  attachments_dir: data/email-attachments
  outbound_queue_dir: data/email-outbound
  poll_fallback_seconds: 60 # used when IDLE isn't supported
  idle_reissue_minutes: 28 # < RFC 2177's 29-minute ceiling
  spf_dkim_warn: true # boot-time DNS check, non-fatal
  loop_prevention:
    auto_submitted: true # RFC 3834
    list_headers: true # List-Id / List-Unsubscribe / Precedence
    self_from: true # bounce-back from our own outbound
  accounts:
    - instance: ops
      address: ops@example.com
      provider: custom # gmail | outlook | yahoo | icloud | custom
      imap: { host: imap.example.com, port: 993, tls: implicit_tls }
      smtp: { host: smtp.example.com, port: 587, tls: starttls }
      folders:
        inbox: INBOX
        sent: Sent
        archive: Archive
      filters:
        from_allowlist: []
        from_denylist: []
Topics: plugin.inbound.email.<instance> (parsed inbound),
plugin.outbound.email.<instance> (commands you publish to send),
plugin.outbound.email.<instance>.ack (per-message ack), and
email.bounce.<instance> (DSNs).
Per-account credentials
secrets/email/<instance>.toml — chmod 0o600 enforced at boot.
Three auth kinds are supported.
# Password (app password works fine for Outlook / iCloud / Yahoo).
[auth]
kind = "password"
username = "ops@example.com"
password = "${EMAIL_OPS_PASSWORD}"
# Pre-issued OAuth2 bearer (bring-your-own-token).
[auth]
kind = "oauth2_static"
username = "ops@gmail.com"
access_token = "${EMAIL_OPS_TOKEN}"
refresh_token = "${EMAIL_OPS_REFRESH}" # optional
expires_at = 1735689600 # optional unix sec
# Reuse an account already in `config/plugins/google-auth.yaml`.
[auth]
kind = "oauth2_google"
username = "ops@gmail.com"
google_account_id = "ops"
${ENV} placeholders are resolved at boot via
nexo_config::env::resolve_placeholders. The OAuth2-Google variant
delegates token reads to the Google credential store and shares
its per-account refresh mutex so concurrent IMAP IDLE workers
never race a token rotation.
Provider auto-detect
The setup helper provider_hint(domain) recognises five families
out of the box:
| Domain | Provider | IMAP host | SMTP host |
|---|---|---|---|
| gmail.com, googlemail.com | Gmail | imap.gmail.com:993 | smtp.gmail.com:587 |
| outlook.com, hotmail.com, live.com, msn.com | Outlook | outlook.office365.com:993 | smtp.office365.com:587 |
| yahoo.com, yahoo.co.uk, ymail.com, rocketmail.com | Yahoo | imap.mail.yahoo.com:993 | smtp.mail.yahoo.com:587 |
| icloud.com, me.com, mac.com | iCloud | imap.mail.me.com:993 | smtp.mail.me.com:587 |
| anything else | Custom | (prompt) | (prompt) |
Gmail addresses also get a suggest_oauth_google = true hint so
the wizard offers to reuse google-auth.yaml instead of asking
for an app password.
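The lookup behind that table could be sketched as a plain match on the domain. ProviderHint and provider_hint_sketch are hypothetical; the real provider_hint return type is not shown on this page, and lower-casing the domain first is an assumption.

```rust
#[derive(Debug, PartialEq)]
pub struct ProviderHint {
    pub provider: &'static str,
    pub imap: Option<&'static str>, // None means the wizard prompts
    pub smtp: Option<&'static str>,
    pub suggest_oauth_google: bool,
}

pub fn provider_hint_sketch(domain: &str) -> ProviderHint {
    let d = domain.to_ascii_lowercase();
    let (provider, imap, smtp) = match d.as_str() {
        "gmail.com" | "googlemail.com" => {
            ("gmail", Some("imap.gmail.com:993"), Some("smtp.gmail.com:587"))
        }
        "outlook.com" | "hotmail.com" | "live.com" | "msn.com" => {
            ("outlook", Some("outlook.office365.com:993"), Some("smtp.office365.com:587"))
        }
        "yahoo.com" | "yahoo.co.uk" | "ymail.com" | "rocketmail.com" => {
            ("yahoo", Some("imap.mail.yahoo.com:993"), Some("smtp.mail.yahoo.com:587"))
        }
        "icloud.com" | "me.com" | "mac.com" => {
            ("icloud", Some("imap.mail.me.com:993"), Some("smtp.mail.me.com:587"))
        }
        _ => ("custom", None, None),
    };
    ProviderHint { provider, imap, smtp, suggest_oauth_google: provider == "gmail" }
}
```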
Tools
The agent gets six tools when the email plugin is active:
| Tool | Purpose |
|---|---|
| email_send | Send a new message. from is pinned to the account address (anti-spoof). |
| email_reply | Fetch the parent by UID, derive recipients (reply_all adds parent.To/Cc minus own), inherit In-Reply-To / References. |
| email_archive | UID MOVE to the configured archive folder; falls back to COPY + STORE \Deleted + EXPUNGE. |
| email_move_to | Same as archive but to an arbitrary folder (no auto-create). |
| email_label | Gmail-only: STORE +X-GM-LABELS / -X-GM-LABELS. Errors on non-Gmail. |
| email_search | Portable JSON DSL → IMAP SEARCH atoms. Default limit 50, max 200. |
Every result is wrapped in a { ok: bool, ... } envelope. Errors
become { ok: false, error: "..." } rather than thrown exceptions
so the agent doesn't have to branch on exception types.
email_search query shape:
{
"instance": "ops",
"folder": "INBOX",
"query": {
"from": "alice@x", "to": "bob@x",
"subject": "report", "body": "kpi",
"since": "2024-01-01", "before": "2024-12-31",
"unseen": true, "seen": false
},
"limit": 50
}
User-controlled strings pass through imap_quote (RFC 3501
quoted-string + CR/LF collapse) before reaching the wire — that's
the security boundary against atom injection.
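To make the quoting step concrete, here is a rough approximation of what such a helper does: wrap the value in double quotes, backslash-escape `"` and `\`, and collapse CR/LF runs into a single space. This is an illustrative sketch named imap_quote_sketch, not the plugin's imap_quote; in particular it ignores non-ASCII input, which RFC 3501 quoted strings cannot carry.

```rust
pub fn imap_quote_sketch(input: &str) -> String {
    let mut out = String::with_capacity(input.len() + 2);
    out.push('"');
    let mut prev_was_break = false;
    for c in input.chars() {
        match c {
            // A run of CR/LF characters becomes one space so the value
            // can never terminate the command line early.
            '\r' | '\n' => {
                if !prev_was_break {
                    out.push(' ');
                }
                prev_was_break = true;
                continue;
            }
            // Quoted-specials are backslash-escaped.
            '"' | '\\' => {
                out.push('\\');
                out.push(c);
            }
            _ => out.push(c),
        }
        prev_was_break = false;
    }
    out.push('"');
    out
}
```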
Outbound attachments are referenced by file path; the dispatcher
reads the bytes at enqueue time so a missing file fails fast with
ack: Failed instead of parking a doomed job:
{
"instance": "ops",
"to": ["alice@x"],
"subject": "Report",
"body": "see attached",
"attachments": [
{ "data_path": "/tmp/q3.pdf", "filename": "q3.pdf" }
]
}
Inbound events
Published as JSON on plugin.inbound.email.<instance>:
{
"account_id": "ops@example.com",
"instance": "ops",
"uid": 42,
"internal_date": 1700000000,
"raw_bytes": "<.eml bytes (binary-safe via serde_bytes)>",
"meta": {
"message_id": "<abc@x>",
"in_reply_to": "<parent@x>",
"references": ["<root@x>", "<parent@x>"],
"from": { "address": "alice@x", "name": "Alice Doe" },
"to": [{ "address": "ops@example.com" }],
"cc": [],
"subject": "Re: hi",
"body_text": "...",
"body_html": null,
"date": 1700000000,
"headers_extra": { "list-id": "<l@x>" },
"body_truncated": false
},
"attachments": [
{
"sha256": "abc...",
"local_path": "data/email-attachments/abc...",
"size_bytes": 4096,
"mime_type": "application/pdf",
"filename": "report.pdf",
"disposition": "attachment",
"truncated": false
}
],
"thread_root_id": "<root@x>"
}
thread_root_id is the canonical session key — pass it through
session_id_for_thread() (UUIDv5) to bridge into nexo-core's
session map.
Bounce events
Delivery reports never reach the LLM as conversational content.
They publish on email.bounce.<instance>:
{
"account_id": "ops@example.com",
"instance": "ops",
"original_message_id": "<our-outbound@example.com>",
"recipient": "ghost@unknown.com",
"status_code": "5.1.1",
"action": "failed",
"reason": "smtp; 550 5.1.1 user unknown",
"classification": "permanent"
}
classification follows SMTP convention: 5.x.x → permanent,
4.x.x → transient, anything else → unknown. The detector
fires on a Content-Type: multipart/report; report-type=delivery-status envelope; legacy Postfix / sendmail bounces without that
marker are caught via a From localpart heuristic
(MAILER-DAEMON, mail-daemon, mail.daemon, postmaster).
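As an illustration of the two rules above, here is a minimal sketch. The type and function names are hypothetical, and matching the localpart case-insensitively is an assumption; the plugin's real detector may differ.

```rust
/// Hypothetical mirror of the `classification` field.
#[derive(Debug, PartialEq, Eq)]
enum BounceClass {
    Permanent,
    Transient,
    Unknown,
}

/// Classify an enhanced status code such as "5.1.1" by its first digit.
fn classify_status(status_code: &str) -> BounceClass {
    match status_code.trim().split('.').next() {
        Some("5") => BounceClass::Permanent,
        Some("4") => BounceClass::Transient,
        _ => BounceClass::Unknown,
    }
}

/// Legacy heuristic: does the From localpart look like a mail daemon?
/// Case-insensitive comparison is an assumption of this sketch.
fn is_daemon_sender(from_address: &str) -> bool {
    let localpart = from_address
        .split('@')
        .next()
        .unwrap_or("")
        .to_ascii_lowercase();
    matches!(
        localpart.as_str(),
        "mailer-daemon" | "mail-daemon" | "mail.daemon" | "postmaster"
    )
}
```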
Loop-prevention
After parse, before publish, the worker walks LoopPreventionCfg
in priority order and short-circuits on the first match:
| Reason | Trigger |
|---|---|
| auto_submitted | Auto-Submitted header is anything other than no (RFC 3834). |
| list_mail | List-Id or List-Unsubscribe present (RFC 2369). |
| precedence_bulk | Precedence is bulk, junk, or list (RFC 2076). |
| self_from | Inbound From matches the account's own address. |
| dsn_inbound | parse_bounce returned Some (handled before loop walk). |
Each suppressed message advances the IMAP cursor — it has been processed, just not surfaced.
SPF / DKIM boot warns
When spf_dkim_warn: true, each account triggers a 3 s
non-blocking DNS lookup at start. WARN lines are
operator-actionable:
| Tag | Means |
|---|---|
| email.spf.missing | No v=spf1 TXT record at the apex of the From domain. |
| email.spf.misalignment | SPF policy exists but doesn't authorise the configured SMTP host. |
| email.dkim.missing | No TXT at default._domainkey.<domain>. Try selectors default, google, selector1, mail. |
| email.spf_dkim.dns_unavailable | The DNS lookup itself failed. Often transient. |
DMARC, multi-selector DKIM rotation, and signature verification are deliberately out of scope for v1.
Troubleshooting
- email.idle.unsupported: the server doesn't advertise IDLE; the worker is permanently in 60 s polling mode. Yahoo Plus and some legacy IMAP servers behave this way.
- email.uidvalidity.changed: the mailbox was recreated server-side; the cursor reset to last_uid=0 and every existing message will be processed again.
- Outbound DLQ growing: inspect data/email-outbound/<instance>.dlq.jsonl. After 5 transient attempts (or any 5xx) jobs land here; there's no auto-purge.
- email.auth.xoauth2_failed: the OAuth2 token was rejected. The worker retries once with a forced refresh; if it still fails the SMTP / IMAP circuit-breaker opens.
- EMAIL_INSECURE_TLS=1: disables TLS cert verification. Logged at WARN; only safe for fake servers / loopback.
Limitations
| Deferred | Tracked in |
|---|---|
| Persistent bounce history | proyecto/FOLLOWUPS.md |
| Interactive setup wizard | proyecto/FOLLOWUPS.md |
| greenmail e2e test harness | proyecto/FOLLOWUPS.md |
| Email-specific Prometheus metrics | proyecto/FOLLOWUPS.md |
| Phase 16 binding-policy auto-filter | proyecto/FOLLOWUPS.md |
| HTML body in outbound | (text/plain only in v1) |
| .ics calendar invites | Phase 65 |
| Vision OCR over attached images | Phase 49 |
Deployment (Phase 81.19.b)
The email plugin is shipped as a standalone repo:
nexo-rs-plugin-email
(nexo-plugin-email v0.1.2+ on crates.io). The crate is dual-mode:
| Mode | Used for | Wire path |
|---|---|---|
| In-process | Default — daemon registers a singleton factory | factory_registry.register("email", email_plugin_factory(...)) |
| Subprocess | Operator drops manifest in search_paths and removes the in-tree factory | discovery walker auto-spawns the binary via JSON-RPC stdio |
By default the daemon runs the email plugin in-process, exactly
as before the extract. The factory wins over discovery's
auto-subprocess fallback (init_loop.rs:417), so an email manifest
in plugins.discovery.search_paths does NOT spawn the subprocess
unless the operator strips the in-tree factory registration.
Subprocess opt-in (advanced)
For deployments that want process-level isolation of the IMAP/SMTP work, install the binary and remove the in-tree factory:
cargo install nexo-plugin-email
mkdir -p ~/.config/nexo/plugins.d/
cp $(which nexo-plugin-email) ~/.config/nexo/plugins.d/
# Copy the manifest from $CARGO_HOME/.../nexo-plugin-email-0.1.2/
# nexo-plugin.toml into the same dir.
Then in agents.yaml:
plugins:
discovery:
search_paths:
- ~/.config/nexo/plugins.d
And strip the in-tree email_plugin_factory registration from
the daemon source (proyecto/src/main.rs Phase 81.19.b block).
Without that strip, both paths are visible but the factory wins.
The subprocess advertises zero tool defs in its initialize
reply — tool dispatch (email_send / email_reply / …) requires
the in-process surface and currently doesn't work in pure
subprocess mode. Follow-up 81.19.b.tool-dispatch-subprocess
tracks closing that gap.
Browser (Chrome DevTools Protocol)
Drives a real Chrome/Chromium instance via CDP. Agents can navigate, click, fill, screenshot, and run JS — with stable element refs that work across DOM mutations within a single turn.
Phase 81.17.c (2026-05-07). The browser plugin now ships as a standalone subprocess (nexo-rs-plugin-browser), loaded by the daemon via discovery + auto-subprocess fallback (Phase 81.17.b). The 12 browser_* tools route through the 81.29 RemoteToolHandler over JSON-RPC stdio. The in-tree crates/plugins/browser/ source stays in the workspace, dormant, for one migration window; deletion is tracked in follow-up 81.17.c.in-tree-removal.
Source-of-truth: standalone repo
github.com/lordmacu/nexo-plugin-browser
(local: /home/familia/chat/nexo-rs-plugin-browser/). In-tree mirror at
crates/plugins/browser/ is dormant; the daemon does NOT instantiate it
in-process anymore.
Out-of-tree subprocess install
cd /path/to/nexo-rs-plugin-browser
cargo build --release
# Copy binary + manifest into a discovery search path.
mkdir -p ~/.local/share/nexo/plugins/browser
cp target/release/nexo-plugin-browser ~/.local/share/nexo/plugins/browser/
cp nexo-plugin.toml ~/.local/share/nexo/plugins/browser/
In plugins.yaml:
plugins:
discovery:
search_paths:
- ~/.local/share/nexo/plugins
The discovery walker picks up the manifest on next boot;
auto-subprocess fallback spawns the binary; tool handlers register
in the agent's scoped registry via RemoteToolHandler. ENV vars
flow from cfg.plugins.browser YAML via the daemon's
seed_browser_subprocess_env helper.
The standalone repo's README covers ENV var reference, sandbox notes, and the latency budget.
Topics
| Direction | Subject | Notes |
|---|---|---|
| Outbound | plugin.outbound.browser | Tool invocations |
| Events | plugin.events.browser.<method_suffix> | Mirrored CDP notifications |
Browser is an outbound-only plugin — there is no unsolicited inbound event from a web page to the agent.
Config
Two shapes accepted: a single map (0.2.x back-compat — keeps the
legacy per-agent_id profile fan-out from Phase 81.17.c.multi-profile)
or a sequence of maps (0.3.0+ declared multi-instance — operator
names each session, every instance has its own Chrome process +
user_data_dir).
Single-map (legacy)
# config/plugins/browser.yaml
browser:
headless: false
executable: "" # empty → search PATH
cdp_url: "" # empty → launch new Chrome
user_data_dir: ./data/browser/profile
window_width: 1280
window_height: 800
connect_timeout_ms: 10000
command_timeout_ms: 15000
args: [] # extra CLI flags for Chrome
Multi-instance (0.3.0+)
# config/plugins/browser.yaml
browser:
- instance: marketing
headless: false
user_data_dir: "" # empty → ${state_dir}/instances/marketing/
allow_agents: [ana] # empty = accept any agent
- instance: research
headless: true
allow_agents: [juan, marketing]
Every browser_* tool gains an optional instance: string argument.
Resolution:
- Explicit instance matches a declared label → routes there.
- No instance + exactly 1 declared instance → uses it (compat shim).
- No instance + 0 declared instances → falls back to the legacy per-agent_id profile path.
- No instance + N>1 declared instances → ArgumentInvalid (the caller must name an instance).
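The resolution rules can be sketched as a single match. The types and the function name below are hypothetical, and the handling of an explicit instance that matches no declared label (returning ArgumentInvalid) is an assumption, since the rules above do not cover that case.

```rust
/// Hypothetical routing outcome for a browser_* call.
#[derive(Debug, PartialEq, Eq)]
enum Route<'a> {
    Instance(&'a str),
    LegacyAgentProfile,
}

#[derive(Debug, PartialEq, Eq)]
enum ResolveError {
    ArgumentInvalid(String),
}

fn resolve_instance<'a>(
    requested: Option<&str>,
    declared: &'a [String],
) -> Result<Route<'a>, ResolveError> {
    match (requested, declared) {
        // Explicit label: route there if declared. Rejecting an unknown
        // label is an assumption of this sketch.
        (Some(name), _) => declared
            .iter()
            .find(|d| d.as_str() == name)
            .map(|d| Route::Instance(d.as_str()))
            .ok_or_else(|| ResolveError::ArgumentInvalid(format!("unknown instance: {name}"))),
        // No label, nothing declared: legacy per-agent_id profile path.
        (None, []) => Ok(Route::LegacyAgentProfile),
        // No label, exactly one declared: compat shim.
        (None, [only]) => Ok(Route::Instance(only.as_str())),
        // No label, several declared: the caller must name one.
        (None, _) => Err(ResolveError::ArgumentInvalid(
            "multiple instances declared; name one".to_string(),
        )),
    }
}
```

The allow_agents check is left out here; it would presumably run after an instance has been selected.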
Pairing flow: the dashboard surfaces each declared instance with its
.nexo-paired sentinel status. Operator clicks "open Chrome" via the
admin RPC nexo/admin/browser/launch_visible, logs in to the sites
manually, then "mark paired" persists the sentinel so the wizard
flips green. The runtime headless: true → false toggle on
launch_visible is a deferred follow-up
(browser.launch_visible.runtime).
Cost: ~200-500 MB RAM per declared Chrome instance. Single-instance shared-profile mode stays available via the legacy single-map shape or a 1-element array.
| Field | Default | Purpose |
|---|---|---|
| headless | false | Launch Chrome without a UI. |
| executable | "" | Chrome binary path. Empty = search PATH. |
| cdp_url | "" | Connect to an existing Chrome DevTools server (e.g. http://127.0.0.1:9222). Empty = launch a new instance. |
| user_data_dir | ./data/browser/profile | Chrome profile cache. Keeps cookies, logins. |
| window_width / window_height | 1280 / 800 | Viewport. |
| connect_timeout_ms | 10000 | How long to wait for Chrome startup / remote connect. |
| command_timeout_ms | 15000 | Per-CDP-command execution timeout. |
| args | [] | Extra CLI flags forwarded verbatim to the spawned Chrome. Ignored when cdp_url is set. Later args win, so use this to override built-in flags when a restricted environment needs it (e.g. --no-sandbox on Termux). |
Auth
None. CDP is an unauthenticated protocol — use cdp_url only with a
loopback / firewalled Chrome.
Tools exposed to the LLM
| Tool | Purpose |
|---|---|
| browser_navigate | Load URL and wait for load event. |
| browser_click | Click by element ref (@e12) or CSS selector. |
| browser_fill | Type into input / textarea / contenteditable. Replaces content. |
| browser_screenshot | Base64 PNG of the viewport. |
| browser_evaluate | Run JS, return value as JSON. |
| browser_snapshot | Text DOM tree with stable element refs. |
| browser_scroll_to | Scroll a target element into view. |
| browser_current_url | Current page URL. |
| browser_wait_for | Poll for an element to appear. |
| browser_go_back / browser_go_forward | Navigation history. |
| browser_press_key | Keyboard events. |
All tools are prefixed browser_* for glob filtering in
allowed_tools.
Element refs
browser_snapshot emits a text tree where every actionable element
has a ref like @e12. Those refs are stable within the snapshot
turn but invalidated by any subsequent DOM mutation:
sequenceDiagram
participant A as Agent
participant B as Browser plugin
participant C as Chrome
A->>B: browser_snapshot
B->>C: DOM.describeNode(..)
C-->>B: tree
B-->>A: "Login @e12\nEmail @e13\n..."
A->>B: browser_fill(@e13, "user@…")
B->>C: DOM.focus + Input.dispatch
A->>B: browser_click(@e12)
Note over A,B: refs still valid<br/>(same snapshot turn)
A->>B: browser_snapshot
Note over B: refs from prior snapshot<br/>now INVALID
Rule: take a snapshot, act on refs from that snapshot, take a new snapshot before acting again.
Gotchas
- browser_fill replaces content. No append mode. To add text to existing content, read the current value first (via evaluate) then send the merged string.
- Connecting to an existing Chrome (cdp_url) skips the profile setup. Any login state is whatever that Chrome already has.
- Element refs expire on DOM mutation. The plugin does not auto-refresh, so refs from a stale snapshot will error or misfire.
- Headless sites break. Some sites detect headless Chrome and behave differently. Use headless: false for those.
Google (OAuth, Gmail, Calendar, Drive) + gmail-poller
Two related subsystems:
- google plugin: per-agent OAuth client plus a generic google_call tool that lets an agent hit any Google API the granted scopes allow
- gmail-poller plugin: cron-style scheduler that polls Gmail, matches subjects/bodies with regex, and dispatches results to any outbound topic (WhatsApp, Telegram, another agent)
Phase 94: extracted to standalone subprocess plugin. The agent-callable surface (google_auth_start, google_auth_status, google_call, google_auth_revoke) now lives in nexo-rs-plugin-google, packaged as a separate binary the daemon spawns via discovery. Operator install:
cargo install nexo-plugin-google
The binary self-publishes its manifest at boot (nexo-plugin-google --print-manifest) and exposes a --oauth-once <agent_id> CLI subcommand the setup wizard uses for initial consent (loopback by default; --device for headless).
The in-tree crates/plugins/google/ lib survives as the dep for nexo-poller's google_calendar + gmail builtins (which call the OAuth client in-process). Future cleanup: migrate poller to the published nexo-plugin-google 0.2.0 lib crate.
Sources: nexo-rs-plugin-google/ (standalone repo) and the legacy
in-tree crates/plugins/google/ (poller-only).
google — per-agent OAuth
Config
Two shapes supported:
Preferred (Phase 17) — declare accounts in a dedicated store and
bind them from the agent via credentials.google:
# config/plugins/google-auth.yaml
google_auth:
accounts:
- id: ana@gmail.com
agent_id: ana # 1:1; gauntlet enforces the binding
client_id_path: ./secrets/google/ana_client_id.txt
client_secret_path: ./secrets/google/ana_client_secret.txt
token_path: ./secrets/google/ana_token.json
scopes:
- https://www.googleapis.com/auth/gmail.modify
Gmail-poller picks these up automatically; agents see google_* tools
when the store has an entry matching their agent_id.
Legacy inline (still works, logs a migration warn):
# agents.yaml
google_auth:
client_id: ${GOOGLE_CLIENT_ID}
client_secret: ${file:./secrets/google_secret.txt}
scopes:
- gmail.readonly
- gmail.send
- calendar
- drive.file
token_file: ./data/workspace/ana/google_token.json
redirect_port: 17653
| Field | Default | Purpose |
|---|---|---|
| client_id / client_secret | — | OAuth app creds from Google Cloud Console. |
| scopes | — | OAuth scopes. Short-form (gmail.readonly) auto-expanded to full URL. |
| token_file | google_tokens.json | Persistent refresh-token JSON. Relative paths resolve from workspace. |
| redirect_port | 8765 | Loopback callback port. Must match the "Authorized redirect URI" in the OAuth client. |
Pairing flow
sequenceDiagram
participant A as Agent LLM
participant T as google_auth_start
participant B as Browser
participant L as Loopback listener<br/>127.0.0.1:<port>/callback
participant G as Google OAuth
A->>T: invoke
T->>L: start listener
T-->>A: return consent URL
A->>B: ask user to open URL
B->>G: consent flow
G->>L: redirect w/ code
L->>G: exchange code → tokens
L->>L: persist refresh_token<br/>(mode 0o600)
L-->>A: success
The wizard wraps this as a one-shot step, but runtime tools expose the same primitives for re-auth.
Device-code flow (headless setup)
agent setup google offers a second consent path that does not
require a local browser — useful for servers, CI, and SSH-only
environments. The wizard:
- POSTs to oauth2.googleapis.com/device/code with the account's client_id and scopes.
- Prints a short user_code (HBQM-WLNF in the sample output) + a verification_url to the terminal.
- Polls oauth2.googleapis.com/token (default every 5 s) until the operator approves on any device.
- Persists the resulting refresh_token at token_path with mode 0o600.
╭─ Device-code OAuth ───────────────────────────────────────
│ Open in any browser: https://www.google.com/device
│ Code to enter: HBQM-WLNF
│ (valid for 1800s)
╰───────────────────────────────────────────────────────────
Waiting for approval...
✔ Tokens persisted at ./secrets/ana_google_token.json.
The Google Cloud Console OAuth client must be type "TVs and
Limited Input devices" for this flow — Desktop/Web clients reject
device-code with client_type_disabled.
Lazy-refresh of client_id / client_secret
GoogleAuthClient.config is ArcSwap<GoogleAuthConfig>. Every
network call (exchange_code, request_device_code,
poll_device_token, refresh_token) first invokes
refresh_secrets_if_changed, which compares mtime on
client_id_path and client_secret_path and re-reads them when
they advance. Rotating the secret files (e.g. quarterly key
rotation in Google Cloud Console) takes effect on the next
tool call without a daemon restart.
Steady-state cost: one fs::metadata call per outbound request.
Audit trail (target credentials.audit):
INFO event="google_secrets_refreshed" \
google_*: re-read client_id/client_secret after on-disk rotation
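The mtime comparison can be illustrated with a standard-library-only sketch. SecretFile and its methods are hypothetical names; the real client keeps its config in an ArcSwap and refreshes both files together, whereas this sketch tracks one file behind a plain &mut self for brevity.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;
use std::time::SystemTime;

/// Hypothetical holder for one secret file (e.g. the client_secret_path).
struct SecretFile {
    path: PathBuf,
    value: String,
    loaded_mtime: SystemTime,
}

impl SecretFile {
    fn load(path: impl Into<PathBuf>) -> io::Result<Self> {
        let path = path.into();
        let loaded_mtime = fs::metadata(&path)?.modified()?;
        let value = fs::read_to_string(&path)?.trim().to_string();
        Ok(Self { path, value, loaded_mtime })
    }

    /// One metadata call per invocation; re-reads the file only when its
    /// mtime has advanced. Returns Ok(true) when the value was reloaded.
    fn refresh_if_changed(&mut self) -> io::Result<bool> {
        let mtime = fs::metadata(&self.path)?.modified()?;
        if mtime <= self.loaded_mtime {
            return Ok(false);
        }
        self.value = fs::read_to_string(&self.path)?.trim().to_string();
        self.loaded_mtime = mtime;
        Ok(true)
    }
}
```

Calling refresh_if_changed at the top of every network call gives the behaviour described above: the steady-state cost is a single metadata lookup, and a rotated file is picked up on the next call.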
Tools exposed
| Tool | Purpose |
|---|---|
| google_auth_start | Start OAuth, return the consent URL. |
| google_auth_status | Report {authenticated, expires_in_secs, has_refresh, scopes}. Safe to poll. |
| google_call | Generic {method, url, body?} against any *.googleapis.com endpoint. Auto-refreshes access token. |
| google_auth_revoke | Revoke the refresh token; forces full re-auth. |
Supported APIs
Anything under *.googleapis.com that the granted scopes permit.
Common call shapes:
- Gmail v1: https://gmail.googleapis.com/gmail/v1/users/me/messages?q=is:unread
- Calendar v3: https://www.googleapis.com/calendar/v3/calendars/primary/events
- Drive v3: https://www.googleapis.com/drive/v3/files?q=mimeType='application/pdf'
- Sheets v4: https://sheets.googleapis.com/v4/spreadsheets/<id>/values/A1:D10
Gotchas
- 401 means the refresh token was revoked. Re-auth via google_auth_start.
- 403 means a scope wasn't granted. Add the scope, revoke, re-auth.
- Token file leaks → revoke immediately. The file holds a refresh token with the granted scopes.
gmail-poller — cron-style Gmail bridge
Poll Gmail, extract fields via regex, render a template, dispatch to any outbound topic. Multi-account, allowlisted by sender substring, rate-limited per dispatch.
Config
# config/plugins/gmail-poller.yaml
gmail_poller:
enabled: true
interval_secs: 60
accounts:
- id: default
agent_id: ana # Phase 17 — binds the account to an agent; defaults to `id` when omitted
token_path: ./data/workspace/ana/google_token.json
client_id_path: ./secrets/google_client_id.txt
client_secret_path: ./secrets/google_client_secret.txt
jobs:
- name: lead_forward
account: default
query: "is:unread subject:(lead OR interesado)"
newer_than: 1d
interval_secs: 120
forward_to_subject: plugin.outbound.whatsapp.default
forward_to: "573000000000@s.whatsapp.net"
extract:
name: "Nombre:\\s*(.+)"
phone: "Tel:\\s*(\\+?\\d+)"
require_fields: [name, phone]
message_template: |
New lead 🚨
{name} — {phone}
Subject: {subject}
{snippet}
mark_read_on_dispatch: true
max_per_tick: 20
dispatch_delay_ms: 1000
sender_allowlist: ["@mycompany.com", "partners@"]
Per-job fields
| Field | Default | Purpose |
|---|---|---|
| name | — (required) | Job id. |
| account | "default" | Which OAuth account to use. |
| query | — (required) | Gmail search (is:unread, etc.). |
| newer_than | — | Gmail newer_than: suffix (1d, 2h); avoids back-filling. |
| interval_secs | root interval | Override per-job poll cadence. |
| forward_to_subject | — | Broker topic to publish dispatched message. |
| forward_to | — | Recipient passed through (JID, chat id, phone). |
| extract | {} | Named regex groups applied to the email body. First group wins. |
| require_fields | [] | Skip dispatch if any listed extracted field is empty. |
| message_template | — (required) | Template with {field}, {subject}, {snippet} placeholders. |
| mark_read_on_dispatch | true | Mark the thread as read after successful dispatch. |
| dispatch_delay_ms | 1000 | Sleep between multi-match dispatches. |
| max_per_tick | 20 | Hard cap per poll cycle. |
| sender_allowlist | [] | Substring/domain filter on From: header. Empty = accept all. |
Event shape
{
"to": "<forward_to>",
"kind": "text",
"text": "<rendered message_template>",
"subject": "<email subject>",
"<extract key>": "<captured group>"
}
Published to <forward_to_subject>.
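To make the text field concrete, here is a minimal sketch of {field} placeholder substitution. render_template is a hypothetical name, and two behaviours are assumptions of this sketch rather than documented facts: unknown placeholders are left verbatim, and substituted values are never re-scanned for further placeholders.

```rust
use std::collections::HashMap;

/// Single-pass substitution of `{key}` placeholders (hypothetical helper).
fn render_template(template: &str, fields: &HashMap<&str, &str>) -> String {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(open) = rest.find('{') {
        out.push_str(&rest[..open]);
        let after = &rest[open + 1..];
        // Look for the closing brace and a known key between the braces.
        let hit = after
            .find('}')
            .and_then(|close| fields.get(&after[..close]).map(|value| (close, *value)));
        match hit {
            Some((close, value)) => {
                out.push_str(value);
                rest = &after[close + 1..];
            }
            // Unknown key or unbalanced brace: keep the text as written.
            None => {
                out.push('{');
                rest = after;
            }
        }
    }
    out.push_str(rest);
    out
}
```

Scanning once from left to right means an email body that happens to contain the text {phone} cannot trigger a second substitution.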
Error backoff
Sustained errors are backed off: [0, 0, 0, 30, 60, 120, 300] seconds
(caps at 300). Transient failures don't stop the poll loop.
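A small sketch of the schedule lookup follows. The indexing is an assumption: here the n-th consecutive failure (1-based) maps to entry n-1, so the first three failures retry immediately; the poller's actual counter may be offset differently.

```rust
use std::time::Duration;

/// The schedule quoted above, in seconds.
const BACKOFF_SECS: [u64; 7] = [0, 0, 0, 30, 60, 120, 300];

/// Delay to apply after the given consecutive failure (1-based).
/// Anything past the end of the table stays at the 300 s cap.
fn backoff_for(failure_number: usize) -> Duration {
    let idx = failure_number
        .saturating_sub(1)
        .min(BACKOFF_SECS.len() - 1);
    Duration::from_secs(BACKOFF_SECS[idx])
}
```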
Gotchas
- Gmail API only, no IMAP. This plugin is Google-specific. For generic IMAP triage, use a custom extension.
- sender_allowlist is substring, not regex. Simpler to read, simpler to get wrong. Quote boundary characters explicitly.
- extract regex must compile. Invalid regex fails the whole job at boot with an error naming the field.
Long-term memory (SQLite)
Durable memory shared by every agent in the process. One SQLite file,
multi-tenant via an agent_id column on every row. Survives restarts.
Source: crates/memory/src/long_term.rs.
Storage location
long_term:
backend: sqlite
sqlite:
path: ./data/memory.db
One file for all agents. Per-agent isolation is enforced by
WHERE agent_id = ? on every query — not by separate DB files. An
idx_memories_agent(agent_id, created_at DESC) index keeps those
queries fast.
If you want per-agent file separation, override sqlite.path per
agent via an inbound_bindings[] override or a per-agent config
directory.
Schema
The runtime creates these tables at boot if they don't exist.
memories — atomic facts
CREATE TABLE memories (
id TEXT PRIMARY KEY, -- UUID
agent_id TEXT NOT NULL,
content TEXT NOT NULL,
tags TEXT DEFAULT '[]', -- JSON array
concept_tags TEXT DEFAULT '[]', -- auto-derived (phase 10.7)
created_at INTEGER NOT NULL -- ms since epoch
);
CREATE INDEX idx_memories_agent ON memories(agent_id, created_at DESC);
memories_fts — full-text search (FTS5)
CREATE VIRTUAL TABLE memories_fts USING fts5(
content,
id UNINDEXED,
agent_id UNINDEXED
);
Powers the keyword recall mode with BM25 ranking.
interactions — conversation archive
CREATE TABLE interactions (
id TEXT PRIMARY KEY,
session_id TEXT NOT NULL,
agent_id TEXT NOT NULL,
role TEXT,
content TEXT,
created_at INTEGER
);
CREATE INDEX idx_interactions_session ON interactions(session_id, created_at DESC);
reminders — phase 7 heartbeat reminders
CREATE TABLE reminders (
id TEXT PRIMARY KEY,
agent_id TEXT NOT NULL,
session_id TEXT NOT NULL,
plugin TEXT,
recipient TEXT,
message TEXT,
due_at INTEGER,
claimed_at INTEGER,
delivered_at INTEGER,
created_at INTEGER
);
CREATE INDEX idx_reminders_due
ON reminders(agent_id, delivered_at, due_at ASC);
recall_events — signal tracking (phase 10.5)
CREATE TABLE recall_events (
id INTEGER PRIMARY KEY AUTOINCREMENT,
agent_id TEXT,
memory_id TEXT,
query TEXT,
score REAL,
ts_ms INTEGER
);
Every recall() hit records a row. Dream sweeps read this to decide
what to promote.
memory_promotions — dreaming ledger (phase 10.6)
CREATE TABLE memory_promotions (
memory_id TEXT PRIMARY KEY,
agent_id TEXT,
promoted_at INTEGER,
score REAL,
phase TEXT
);
Prevents double-promotion across sweeps.
vec_memories — vector index (phase 5.4, optional)
Created on demand when vector.enabled: true. See
Vector search.
What gets written when
| Action | Writes to |
|---|---|
| Agent calls memory.remember(content, tags) | memories, memories_fts, vec_memories (if enabled) |
| Every turn | interactions (used for transcripts, not promoted into memories) |
| Agent calls forge_reminder(...) | reminders |
| Every recall() hit | recall_events (one row per result returned) |
| Dream sweep promotes hot memory | memory_promotions |
Memory tool
Single unified tool with three actions, visible to the LLM as memory:
| Action | Required | Optional | Returns |
|---|---|---|---|
| remember | content | tags[], context | {ok, id} |
| recall | query | limit (default 5), mode (keyword / vector / hybrid) | {ok, results: [{id, content, tags}]} |
| forget | id | — | {ok} |
Results do not include similarity scores — only content and tags. Scores are used internally for dreaming signal tracking but aren't surfaced to the LLM to avoid encouraging score-gaming prompts.
Other memory-related tools:
- forge_memory_checkpoint: snapshot the workspace-git repo (phase 10.9)
- memory_history: git log + optional unified diff (phase 10.9)
Per-agent isolation
flowchart TB
subgraph PROC[agent process]
DB[(./data/memory.db<br/>single SQLite file)]
end
A1[agent: ana] -->|WHERE agent_id = 'ana'| DB
A2[agent: kate] -->|WHERE agent_id = 'kate'| DB
A3[agent: ops] -->|WHERE agent_id = 'ops'| DB
One LongTermMemory instance per process, shared across agents via
Arc. The MemoryTool attached to each agent passes
ctx.agent_id to every query.
Workspace-git (phase 10.9)
A separate per-agent git repo lives in the agent's workspace
directory (not inside the memory DB). When workspace_git.enabled: true, the runtime commits after:
- Dream sweeps (Phase 10.6)
- forge_memory_checkpoint tool calls
- Session close (on_expire)
Good for forensic replay — you can git log to see the memory state
at any point. See Soul — MEMORY.md.
Gotchas
- One DB, multi-tenant. A query missing its agent_id filter would leak across agents. All runtime code goes through the LongTermMemory API which injects it automatically.
- Vacuum is manual. SQLite does not auto-compact after deletes. Run VACUUM; periodically (or set PRAGMA auto_vacuum=incremental from day one).
- recall_events grows without bound when dreaming is off. Dream sweeps prune it periodically, but a dreaming-disabled agent's table will grow forever. Add a retention job if you run without dreaming.
Vector search
Optional semantic memory via sqlite-vec — a virtual table inside the same SQLite file used for long-term memory. No separate service, no extra process, no migration.
Source: crates/memory/src/vector.rs,
crates/memory/src/embedding/.
Turning it on
vector:
enabled: true
backend: sqlite-vec
default_recall_mode: hybrid
embedding:
provider: http
base_url: https://api.openai.com/v1
model: text-embedding-3-small
api_key: ${OPENAI_API_KEY}
dimensions: 1536
timeout_secs: 30
Dimension must match the model output:
| Model | Dimensions |
|---|---|
| text-embedding-3-small | 1536 |
| text-embedding-3-large | 3072 |
| nomic-embed-text | 768 |
| Gemini text-embedding-004 | 768 |
A mismatch aborts startup with an explicit error. If you already have vectors at a different dimension, you must delete the DB (or the vector table) and rebuild the index.
Storage
CREATE VIRTUAL TABLE vec_memories USING vec0(
memory_id TEXT PRIMARY KEY,
embedding FLOAT[<dimensions>]
);
The virtual table lives in the same SQLite file as memories. A join
on memory_id brings you back the content and tags.
Embedding provider
trait EmbeddingProvider {
    fn dimension(&self) -> usize;
    async fn embed(&self, texts: &[String]) -> Result<Vec<Vec<f32>>>;
}
Phase 5.4 ships one provider: http — any OpenAI-compatible
/embeddings endpoint. That covers OpenAI, Gemini (via its API),
Ollama, LM Studio, and self-hosted inference.
Local-only providers (fastembed, candle) are intentional follow-ups — the HTTP provider is enough to unblock everything downstream.
Recall modes
Set the default in memory.yaml and override per tool call with the
mode argument.
keyword — FTS5 + concept expansion
flowchart LR
Q[query] --> CT[derive 3 concept tags]
Q --> M[FTS5 MATCH<br/>query OR tag1 OR tag2 OR tag3]
CT --> M
M --> R[rank by BM25]
R --> RES[top N]
- Fast, no embedding cost
- Misses semantic neighbors that don't share vocabulary
- The extra concept tags are auto-derived from the query and help narrow down concept matches
vector — nearest-neighbor
flowchart LR
Q[query] --> EMB[embed]
EMB --> VEC[vec_memories<br/>MATCH k=N*2]
VEC --> JOIN[join memories<br/>filter by agent_id]
JOIN --> RES[top N by distance]
- Catches paraphrases and cross-vocabulary matches
- Embedding request on every call — watch costs and latency
- Provider errors surface in pure vector mode; the fallback to keyword happens only in hybrid mode
hybrid — Reciprocal Rank Fusion
The default recommendation. Runs both keyword and vector, then fuses
ranks with the RRF formula 1 / (K + rank + 1) where K = 60:
flowchart LR
Q[query] --> K[keyword search]
Q --> V[vector search]
K --> RRF[RRF fusion<br/>K=60]
V --> RRF
RRF --> RES[top N by fused score]
Vector errors degrade gracefully to keyword-only without raising.
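The fusion step can be sketched as follows. It reads the formula above with a zero-based rank (hence the + 1), takes the two ranked lists as plain memory ids, and breaks ties by id; rrf_fuse, its signature, and the tie-break are assumptions for illustration rather than the crate's actual API.

```rust
use std::collections::HashMap;

const RRF_K: f64 = 60.0;

/// Fuse two ranked id lists with Reciprocal Rank Fusion.
/// Each list is ordered best-first; `rank` is zero-based.
fn rrf_fuse<'a>(keyword: &[&'a str], vector: &[&'a str], limit: usize) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for list in [keyword, vector] {
        for (rank, id) in list.iter().enumerate() {
            // Each appearance contributes 1 / (K + rank + 1).
            *scores.entry(*id).or_insert(0.0) += 1.0 / (RRF_K + rank as f64 + 1.0);
        }
    }
    let mut fused: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(id, score)| (id.to_string(), score))
        .collect();
    // Highest fused score first; ties broken by id for a stable order.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused.truncate(limit);
    fused
}
```

A memory that appears in both lists accumulates two contributions, which is why a result ranked second by one search and first by the other can outrank a result that only one search found. Passing an empty vector list reduces the function to keyword-only ranking, matching the graceful degradation described above.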
Tool interaction
The memory tool takes an optional mode param:
{
"action": "recall",
"query": "what's the client's address?",
"limit": 5,
"mode": "hybrid"
}
If omitted, default_recall_mode is used.
Cost and latency profile
| Mode | Per recall |
|---|---|
| keyword | 1 SQL query, no LLM call |
| vector | 1 embedding HTTP call + 1 SQL query |
| hybrid | 1 embedding HTTP call + 2 SQL queries + fusion |
For high-throughput agents that recall on every turn, start with
keyword and upgrade to hybrid only where you see miss rate
matter.
Gotchas
- Changing embedding model = full reindex. The dimension check catches the obvious case, but even same-dimension model swaps produce semantically different vectors; the old index becomes stale.
- sqlite3_auto_extension registers once per process. Not a problem in production, but test suites that instantiate multiple SQLite connections across tests may hit edge cases.
- Vector returns distance, not similarity. Lower is closer. Hybrid fusion normalizes across both, so callers don't see this directly unless they bypass the tool.
Stdio runtime + Discovery
The stdio runtime is the default way extensions run: a child process speaking line-delimited JSON-RPC over stdin/stdout. This page covers how the runtime discovers, spawns, supervises, and registers tools from a stdio extension.
Source: crates/extensions/src/discovery.rs,
crates/extensions/src/runtime/stdio.rs.
Discovery
# config/extensions.yaml
extensions:
enabled: true
search_paths: [./extensions]
ignore_dirs: [node_modules, .git, target]
disabled: []
allowlist: [] # empty = all allowed
max_depth: 4
follow_links: false
watch:
enabled: false
debounce_ms: 500
ExtensionDiscovery walks each search path, looking for
plugin.toml files:
flowchart TD
ROOT[search_paths root] --> WALK[walkdir max_depth]
WALK --> IGNORE{dir in<br/>ignore_dirs?}
IGNORE -->|yes| SKIP[skip]
IGNORE -->|no| FIND[find plugin.toml]
FIND --> PARSE[parse + validate manifest]
PARSE --> SIDE[sidecar .mcp.json if manifest<br/>has no mcp_servers]
SIDE --> PRUNE[prune nested candidates]
PRUNE --> DEDUP[dedupe by id]
DEDUP --> DIS[apply disabled filter]
DIS --> ALLOW[apply allowlist filter]
ALLOW --> SORT[sort by root_index, id]
SORT --> CANDS[DiscoveryReport<br/>candidates + diagnostics]
Prune-nested removes any candidate whose root_dir is a strict
descendant of another — avoids registering an extension twice if it
happens to live inside another extension's tree. Algorithm is
O(N × depth).
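For illustration, the prune-nested rule can be written as a simple filter. This sketch is a quadratic comparison of every pair, chosen for readability; it is not the O(N × depth) algorithm mentioned above, and prune_nested is a hypothetical name.

```rust
use std::path::PathBuf;

/// Drop every root that is a strict descendant of another root.
/// Path::starts_with compares whole components, so "ext/ab" is not
/// treated as living inside "ext/a".
fn prune_nested(roots: &[PathBuf]) -> Vec<PathBuf> {
    roots
        .iter()
        .filter(|candidate| {
            !roots
                .iter()
                .any(|other| other != *candidate && candidate.starts_with(other))
        })
        .cloned()
        .collect()
}
```

Identical paths are left alone here; collapsing those would fall to the dedupe-by-id step that follows in the pipeline.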
follow_links = false is the default (monorepo-safe). When enabled,
symlink escapes out of the root raise DiagnosticLevel::Error.
Gating
Before spawn, Requires::missing() runs:
flowchart LR
CAND[candidate] --> REQ[requires.bins<br/>+ requires.env]
REQ --> BINS{all on $PATH?}
BINS -->|no| SKIP1[warn + skip]
BINS -->|yes| ENV{all env set?}
ENV -->|no| SKIP2[warn + skip]
ENV -->|yes| SPAWN[spawn runtime]
A skipped extension does not register any tools. The warn log names exactly which bin or env var was missing.
Spawn model
sequenceDiagram
participant H as Host (agent)
participant S as StdioRuntime
participant C as Child process
H->>S: spawn(manifest, cwd)
S->>C: tokio::process::Command
S->>C: {"jsonrpc":"2.0","method":"initialize",<br/>"params":{"agent_version","extension_id"},"id":0}
C-->>S: {"result":{"server_version","tools":[...],"hooks":[...]}}
S-->>H: HandshakeInfo
H->>H: register each tool as ExtensionTool
H->>H: register each hook as ExtensionHook
- Child is spawned with the extension's directory as cwd
- stdin + stdout is the RPC channel (line-delimited JSON)
- stderr is routed to the agent's tracing output
- Handshake timeout: default 10 s
Tool descriptors
{
"name": "get_weather",
"description": "Look up weather by city.",
"input_schema": { "type": "object", "properties": { "city": { "type": "string" } }, "required": ["city"] }
}
The host wraps each descriptor in an ExtensionTool:
- Registered name: ext_{plugin_id}_{tool_name} (truncated with hash suffix if it exceeds 64 chars)
- Description prefixed with [ext:{id}] so the LLM knows the origin
- input_schema copied to the registered tool
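The naming rule can be sketched like this. Which hash the host uses and how long its suffix is are not stated here, so the 8-hex-digit suffix from the standard library's DefaultHasher is purely an assumption, as is the function name.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const MAX_TOOL_NAME_LEN: usize = 64;

/// Build `ext_{plugin_id}_{tool_name}`, truncating with a hash suffix when
/// the result would exceed 64 bytes. Hash choice and suffix width are
/// assumptions of this sketch.
fn registered_tool_name(plugin_id: &str, tool_name: &str) -> String {
    let full = format!("ext_{plugin_id}_{tool_name}");
    if full.len() <= MAX_TOOL_NAME_LEN {
        return full;
    }
    // Hash the untruncated name so two long names sharing a prefix
    // still end up with different suffixes (barring collisions).
    let mut hasher = DefaultHasher::new();
    full.hash(&mut hasher);
    let suffix = format!("_{:08x}", hasher.finish() as u32);
    // Walk back to a char boundary so slicing never panics on non-ASCII.
    let mut cut = MAX_TOOL_NAME_LEN - suffix.len();
    while !full.is_char_boundary(cut) {
        cut -= 1;
    }
    format!("{}{}", &full[..cut], suffix)
}
```

DefaultHasher output is not guaranteed to be stable across Rust releases, so a real implementation that needs names to survive upgrades would want a fixed hash function.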
Context passthrough
If the manifest sets context.passthrough = true, every call()
injects:
{ "_meta": { "agent_id": "...", "session_id": "..." }, ...user_args }
The extension can decide how to split state per agent or session.
Env injection
The host passes through most env vars to the child, but blocks secret-like names via substring/suffix rules:
- Suffixes: _TOKEN, _KEY, _SECRET, _PASSWORD, _CREDENTIAL, _PAT, _AUTH, _APIKEY, _BEARER, _SESSION
- Substrings: PASSWORD, SECRET, CREDENTIAL, PRIVATE_KEY
Extensions that need a secret should read it from a file path the
host passes by argument, or have the secret baked into their own
requires.env entry (which the operator whitelists consciously).
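A minimal sketch of the name-based filter, using the suffix and substring lists above. The function names are hypothetical, and the case-insensitive comparison is an assumption; the source does not say whether matching is case-sensitive.

```rust
const BLOCKED_SUFFIXES: &[&str] = &[
    "_TOKEN", "_KEY", "_SECRET", "_PASSWORD", "_CREDENTIAL",
    "_PAT", "_AUTH", "_APIKEY", "_BEARER", "_SESSION",
];
const BLOCKED_SUBSTRINGS: &[&str] = &["PASSWORD", "SECRET", "CREDENTIAL", "PRIVATE_KEY"];

/// True when the variable name looks like it carries a secret.
pub fn is_secret_like(name: &str) -> bool {
    // Assumption: names are compared case-insensitively.
    let upper = name.to_ascii_uppercase();
    BLOCKED_SUFFIXES.iter().any(|s| upper.ends_with(*s))
        || BLOCKED_SUBSTRINGS.iter().any(|s| upper.contains(*s))
}

/// Filters the host environment for a child process. Names listed in the
/// manifest's `requires.env` pass through even when they look secret-like.
pub fn child_env<'a>(
    host_env: impl IntoIterator<Item = (&'a str, &'a str)>,
    requires_env: &[&str],
) -> Vec<(String, String)> {
    host_env
        .into_iter()
        .filter(|(name, _)| requires_env.iter().any(|r| r == name) || !is_secret_like(name))
        .map(|(name, value)| (name.to_string(), value.to_string()))
        .collect()
}
```

Under these rules `OPENAI_API_KEY` is stripped by the `_KEY` suffix unless the manifest declares it.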
Supervision
stateDiagram-v2
[*] --> Spawning
Spawning --> Ready: handshake ok
Ready --> Restarting: child crash
Restarting --> Ready: handshake ok again
Restarting --> Failed: max attempts<br/>in restart_window
Ready --> Shutdown: graceful signal
Failed --> Shutdown
Shutdown --> [*]
Supervisor policy:
- Max restart attempts within a sliding restart_window
- Exponential backoff base_backoff → max_backoff
- Each transport is wrapped in a CircuitBreaker named ext:stdio:{id} so hung children don't freeze the agent loop
Graceful shutdown sends an empty message, waits shutdown_grace
(default 3 s), then kills the child.
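The restart policy can be sketched with two small pieces: a capped exponential backoff and a sliding-window attempt counter. The doubling factor, the type name `RestartBudget`, and the method names are assumptions; the source only names restart_window, base_backoff, and max_backoff.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Delay before restart number `attempt` (0-based): base * 2^attempt, capped at `max`.
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).unwrap_or(max).min(max)
}

/// Tracks restarts inside a sliding window (hypothetical helper).
pub struct RestartBudget {
    max_attempts: usize,
    window: Duration,
    recent: VecDeque<Instant>,
}

impl RestartBudget {
    pub fn new(max_attempts: usize, window: Duration) -> Self {
        Self { max_attempts, window, recent: VecDeque::new() }
    }

    /// Records a restart at `now`. Returns false once the budget for the
    /// window is spent, which corresponds to the Failed state.
    pub fn allow_restart(&mut self, now: Instant) -> bool {
        // Drop restarts that have slid out of the window.
        while let Some(&oldest) = self.recent.front() {
            if now.saturating_duration_since(oldest) > self.window {
                self.recent.pop_front();
            } else {
                break;
            }
        }
        if self.recent.len() >= self.max_attempts {
            return false;
        }
        self.recent.push_back(now);
        true
    }
}
```

Passing `now` in explicitly keeps the policy deterministic under test; a supervisor would call it with `Instant::now()`.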
Watcher (phase 11.2 follow-up)
With extensions.watch.enabled: true the runtime watches
search_paths for changes to any plugin.toml. Change-set is
debounced (debounce_ms) and compared by SHA-256 of the file to
squash spurious writes.
On change the runtime logs — it does not auto-reload. The operator restarts the agent to pick up the new manifest. Hot reload is a future phase.
Gotchas
- Blocked env vars surprise extensions. If an extension expected OPENAI_API_KEY to come through and it wasn't declared in requires.env, the name-based block may silently strip it. Declare the env you need; that whitelists it.
- follow_links: true + symlinked monorepo layouts can cause discovery to traverse out of the search root. Keep follow_links: false unless you know the layout is bounded.
- Children crashing during handshake. You get a single DiagnosticLevel::Error per candidate, not a retry loop. Fix the binary, restart the host.
NATS runtime
For extensions that run out-of-process and manage their own lifecycle — a long-lived service on another machine, a container in an orchestrator, an operator-maintained daemon. The agent talks to them over NATS RPC instead of stdin/stdout.
Source: crates/extensions/src/runtime/nats.rs.
When to pick NATS over stdio
| Use stdio | Use NATS |
|---|---|
| Extension is a binary you ship with the agent | Extension is a separate service you operate |
| Lifecycle is tied to the agent | Lifecycle is independent (k8s, systemd) |
| Fast local startup; co-resident on same host | Might be remote or shared between hosts |
| Dev-loop: install once and forget | Sensitive deployment — deploy independently of the agent |
Stdio is the default. Reach for NATS when the extension's failure domain must be separated from the agent's.
Manifest
[plugin]
id = "heavy-compute"
version = "0.3.0"
[capabilities]
tools = ["long_running_job"]
[transport]
type = "nats"
subject_prefix = "ext.heavy-compute"
Wire shape
Single request/reply subject:
{subject_prefix}.{extension_id}.rpc
sequenceDiagram
participant A as Agent
participant N as NATS
participant E as Extension service
A->>N: publish ext.heavy-compute.rpc<br/>{method:"initialize", ...}
N->>E: deliver
E->>N: reply HandshakeInfo
N-->>A: tools + hooks
A->>A: register ExtensionTool per tool
Note over A,E: steady state
loop tool call
A->>N: {method:"tools/long_running_job", params, id}
N->>E: deliver
E-->>N: result
N-->>A: reply
end
The JSON-RPC shape is identical to stdio — only the transport changes. Extensions don't need to know which form the host chose.
Liveness
Instead of supervising a child process, the NATS runtime uses heartbeats:
| Field | Default | Purpose |
|---|---|---|
| heartbeat_interval | 15 s | Expected beacon cadence from the extension. |
| heartbeat_grace_factor | 3 | Mark failed after grace_factor × interval silence. |
A failed extension logs a warn and is marked unavailable. Tools stay registered in the registry but calls error out immediately. When the extension starts beaconing again, it's automatically marked available.
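The liveness rule reduces to one comparison. A minimal sketch with the documented defaults follows; the `Liveness` type and its method names are hypothetical, and treating silence of exactly grace_factor × interval as still available is an assumption.

```rust
use std::time::{Duration, Instant};

/// Heartbeat bookkeeping for one NATS extension (hypothetical type).
pub struct Liveness {
    heartbeat_interval: Duration,
    heartbeat_grace_factor: u32,
    last_beacon: Instant,
}

impl Liveness {
    /// Starts with the documented defaults: 15 s interval, grace factor 3.
    pub fn new(now: Instant) -> Self {
        Self {
            heartbeat_interval: Duration::from_secs(15),
            heartbeat_grace_factor: 3,
            last_beacon: now,
        }
    }

    /// Called whenever a beacon arrives; this is also how a failed
    /// extension becomes available again.
    pub fn record_beacon(&mut self, now: Instant) {
        self.last_beacon = now;
    }

    /// Available while the silence is at most grace_factor × interval.
    pub fn is_available(&self, now: Instant) -> bool {
        let silence = now.saturating_duration_since(self.last_beacon);
        silence <= self.heartbeat_interval * self.heartbeat_grace_factor
    }
}
```

With the defaults, an extension is marked failed after more than 45 s without a beacon.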
Circuit breaker
Same pattern as stdio: one CircuitBreaker per extension,
ext:nats:{id}, wrapping every RPC. Prevents a flapping extension
from piling up outstanding calls against it.
Deployment recipes
Docker compose side service
services:
agent:
image: nexo-rs:latest
depends_on: [nats, heavy-compute]
nats:
image: nats:2.10-alpine
heavy-compute:
image: my-ext:0.3.0
command: ["--nats-url", "nats://nats:4222",
"--subject-prefix", "ext.heavy-compute"]
Kubernetes
Run the extension as its own Deployment with its own resource
limits, rollouts, and observability. Share the NATS cluster via a
Service. Scale extensions independently of agents.
Gotchas
- subject_prefix collisions. Two extensions with the same prefix will step on each other. Enforce uniqueness in your ops convention.
- Latency. NATS over LAN is sub-millisecond, but any network hop is orders of magnitude slower than stdio's pipe. Don't pick NATS for a 1 kHz tool call pattern.
- Auth on the broker. NATS auth applies to extensions too: if you turn on NKey mTLS, every extension service must be enrolled.
1Password extension
A bundled stdio extension that wraps the op CLI
with a service-account token. Read-only: it never creates or edits
secrets. Two main use cases:
- Look up a secret you don't already have in env (read_secret).
- Use a secret in a command without ever exposing it to the agent (inject_template).
Source: extensions/onepassword/. Skill prompt: skills/onepassword/SKILL.md.
Tools
| Tool | Reveals secret? | Audited |
|---|---|---|
| status | no | no |
| whoami | no | no |
| list_vaults | no | no |
| list_items | no (strips field values) | no |
| read_secret | only if OP_ALLOW_REVEAL=true | yes |
| inject_template | template-only mode reveals only with OP_ALLOW_REVEAL=true; exec mode never reveals to the LLM | yes |
read_secret
{ "action": "read_secret", "reference": "op://Prod/Stripe/api_key" }
Default response (reveal off):
{
"ok": true,
"reference": "op://Prod/Stripe/api_key",
"vault": "Prod", "item": "Stripe", "field": "api_key",
"length": 26,
"fingerprint_sha256_prefix": "3f9a7c2e1b48d5a0",
"reveal": false
}
With OP_ALLOW_REVEAL=true|1|yes set on the agent process, the
response also contains { "value": "...", "reveal": true }.
inject_template
Resolves {{ op://Vault/Item/field }} placeholders via op inject.
Two execution paths:
Template-only
{ "action": "inject_template",
"template": "Authorization: Bearer {{ op://Prod/API/token }}\n" }
- Reveal off → { length, fingerprint_sha256_prefix, reveal: false }
- Reveal on → { rendered: "Authorization: Bearer abc…", reveal: true }
Exec (piped to a command)
{ "action": "inject_template",
"template": "Bearer {{ op://Prod/API/token }}",
"command": "curl",
"args": ["-H", "@-", "https://api.example.com/me"] }
- command must be in OP_INJECT_COMMAND_ALLOWLIST (comma-separated). Default empty → exec mode disabled.
- Rendered template is never returned to the LLM. Only the downstream command's exit_code, stdout (capped at max_stdout_bytes, default 4096, max 16384), and stderr.
- Both stdout and stderr are redacted before being returned: Bearer JWT, sk-…, sk-ant-…, AKIA…, and 32+ char hex tokens are replaced with [REDACTED:<label>].
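The allowlist gate and the stdout cap can be sketched as below. The function names are hypothetical, and trimming whitespace around allowlist entries and matching commands by exact string are assumptions; only the comma-separated format, the empty default, and the 4096 / 16384 limits come from the text above.

```rust
pub const DEFAULT_MAX_STDOUT_BYTES: usize = 4096;
pub const HARD_MAX_STDOUT_BYTES: usize = 16384;

/// Parses the comma-separated allowlist. Unset or empty means exec mode is disabled.
pub fn parse_allowlist(raw: Option<&str>) -> Vec<String> {
    raw.unwrap_or("")
        .split(',')
        .map(str::trim)
        .filter(|entry| !entry.is_empty())
        .map(String::from)
        .collect()
}

/// Assumption: the command must match an allowlist entry exactly.
pub fn command_allowed(allowlist: &[String], command: &str) -> bool {
    allowlist.iter().any(|allowed| allowed == command)
}

/// Resolves the caller's `max_stdout_bytes` against the default and hard cap.
pub fn effective_stdout_cap(requested: Option<usize>) -> usize {
    requested
        .unwrap_or(DEFAULT_MAX_STDOUT_BYTES)
        .min(HARD_MAX_STDOUT_BYTES)
}

/// Returns the bytes to hand back plus a `truncated` flag for the audit log.
pub fn cap_stdout(stdout: &[u8], cap: usize) -> (&[u8], bool) {
    if stdout.len() > cap {
        (&stdout[..cap], true)
    } else {
        (stdout, false)
    }
}
```

Truncating before redaction or after it is a separate design choice this sketch does not cover.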
Dry run
{ "action": "inject_template",
"template": "{{ op://A/B/c }} {{ op://X/Y/z }}",
"dry_run": true }
Validates each op:// reference's shape without resolving values.
Returns references_validated.
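A shape check of this kind needs no call to the op CLI. The sketch below assumes the three-segment op://Vault/Item/field form used throughout this page; the type and function names are hypothetical, and references with an extra section segment are not handled.

```rust
/// The three parts of an `op://Vault/Item/field` reference.
#[derive(Debug, PartialEq)]
pub struct OpReference<'a> {
    pub vault: &'a str,
    pub item: &'a str,
    pub field: &'a str,
}

/// Validates the shape only; nothing is resolved against 1Password.
pub fn parse_reference(reference: &str) -> Option<OpReference<'_>> {
    let rest = reference.strip_prefix("op://")?;
    let mut parts = rest.split('/');
    let (vault, item, field) = (parts.next()?, parts.next()?, parts.next()?);
    // Assumption: exactly three non-empty segments.
    if parts.next().is_some() || [vault, item, field].iter().any(|p| p.is_empty()) {
        return None;
    }
    Some(OpReference { vault, item, field })
}

/// Collects the contents of every `{{ ... }}` placeholder in a template.
pub fn placeholders(template: &str) -> Vec<&str> {
    let mut found = Vec::new();
    let mut rest = template;
    while let Some(start) = rest.find("{{") {
        let after = &rest[start + 2..];
        match after.find("}}") {
            Some(end) => {
                found.push(after[..end].trim());
                rest = &after[end + 2..];
            }
            None => break, // unterminated placeholder: stop scanning
        }
    }
    found
}
```

A dry run would then report how many placeholders passed `parse_reference`.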
Configuration
Environment variables consumed by the extension:
| Var | Purpose | Default |
|---|---|---|
| OP_SERVICE_ACCOUNT_TOKEN | required | (none) |
| OP_ALLOW_REVEAL | true/1/yes to allow value reveal | off |
| OP_AUDIT_LOG_PATH | JSONL audit log path | ./data/secrets-audit.jsonl |
| OP_INJECT_COMMAND_ALLOWLIST | comma-separated allowed exec commands | empty (exec disabled) |
| OP_INJECT_TIMEOUT_SECS | per-call timeout (capped at MAX_TIMEOUT_SECS) | 30 |
| OP_TIMEOUT_SECS | per-call timeout for non-inject commands | 15 |
| AGENT_ID | injected by the host on spawn; appears in audit | (none) |
| AGENT_SESSION_ID | injected by the host on spawn | (none) |
Audit log
read_secret and inject_template append one JSON line per call to
OP_AUDIT_LOG_PATH. The log is append-only and contains only
metadata — never the secret value.
{"ts":"2026-04-25T18:00:00Z","action":"read_secret","agent_id":"kate","session_id":"f1...","op_reference":"op://Prod/Stripe/token","fingerprint_sha256_prefix":"a1b2c3d4e5f6789a","reveal_allowed":false,"ok":true}
{"ts":"2026-04-25T18:00:05Z","action":"inject_template","agent_id":"kate","session_id":"f1...","references":["op://Prod/Stripe/token"],"command":"curl","args_count":4,"dry_run":false,"ok":true,"exit_code":0,"stdout_total_bytes":124,"stdout_returned_bytes":124,"stdout_truncated":false}
{"ts":"2026-04-25T18:00:10Z","action":"inject_template","agent_id":"kate","session_id":null,"references":["op://Bad/Ref"],"command":"rm","args_count":0,"dry_run":false,"ok":false,"error":"command_not_in_allowlist"}
Failures writing the log are reported to stderr and never block the tool — the secret has already been read or piped; refusing to log would be worst-of-both-worlds.
Rotate with logrotate or any append-aware rotator. Keeping the log
on a partition with limited write access (separate user, AppArmor,
or dedicated tmpfs) reduces forensic tampering surface.
Threat model
- The agent process is trusted. Reveal is gated by an env var the operator controls; once on, the value is just a string in memory that flows through the LLM, transcripts, and any tool that touches it.
- Exec mode is the recommended path for any operation that does not require the agent to see the secret. The LLM only knows that the operation succeeded, not what the credential looked like.
- Redaction is best-effort. Stdout from a poorly-behaved command could still leak a secret in a shape we don't recognize. Cap the max_stdout_bytes aggressively when in doubt.
- The audit log is not encrypted. It contains references and fingerprints, not values. If even the references are sensitive, put the log on a permissioned filesystem.
Building microapps in Rust
A microapp is an external program that talks to the Nexo daemon over a stable wire contract. It can be a single JSON-RPC stdio extension (Phase 11), a NATS subscriber, an HTTP service consuming the webhook envelope, or any combination.
This page lists the helper crates published from the framework that take care of the wire-shape boilerplate so you can focus on the microapp's actual logic.
Tier A — publishable utility crates
nexo-tool-meta
Wire-shape types shared between the daemon and any consumer. Slim, four-dependency, sub-second compile.
[dependencies]
nexo-tool-meta = "0.1"
What's inside:
- BindingContext: (channel, account_id, agent_id, session_id, binding_id, mcp_channel_source) tuple stamped on every tool call. Read it from params._meta.nexo.binding. Stable across turns within a binding.
- InboundMessageMeta: per-turn metadata about the message that triggered the agent turn (kind, sender_id, msg_id, inbound_ts, reply_to_msg_id, has_media, origin_session_id). Read it from params._meta.nexo.inbound. Provider-agnostic shape; same for whatsapp / future channels / webhook / event-subscriber / delegation / heartbeat.
- InboundKind: 3-way discriminator (external_user / internal_system / inter_session) surfacing the origin of the turn so microapps can branch handlers without re-deriving from sender presence alone.
- build_meta_value / parse_binding_from_meta / parse_inbound_from_meta: the inverse trio around the dual-write _meta payload. The daemon emits, the microapp parses.
- WebhookEnvelope: typed JSON envelope the daemon publishes to NATS after every accepted webhook request.
- format_webhook_source: Phase 72 turn-log marker helper.
Round-trip example:
use nexo_tool_meta::{
    parse_binding_from_meta, parse_inbound_from_meta, BindingContext, InboundKind,
};

// Inside a JSON-RPC `tools/call` handler.
fn handle_call(args: &serde_json::Value) {
    let meta = &args["_meta"];
    if let Some(binding) = parse_binding_from_meta(meta) {
        // Route the work to the right tenant.
        match binding.channel.as_deref() {
            Some("whatsapp") => { /* WA-specific */ }
            _ => { /* future channels */ }
        }
    } else {
        // Bindingless path: delegation receive, heartbeat
        // bootstrap, tests. Microapps that don't care still
        // see the legacy flat block at `meta["agent_id"]` etc.
    }

    // Per-turn metadata: who sent what, when, replying to which
    // earlier message, with media or not.
    if let Some(inbound) = parse_inbound_from_meta(meta) {
        match inbound.kind {
            InboundKind::ExternalUser => {
                // Real end-user: apply per-sender rate limits,
                // anti-loop heuristics, etc.
                let _sender = inbound.sender_id.as_deref();
                let _msg_id = inbound.msg_id.as_deref();
            }
            InboundKind::InternalSystem => {
                // Cron tick / scheduler / yaml-declared internal
                // event: skip user-facing checks.
            }
            InboundKind::InterSession => {
                // Peer-agent delegation: `origin_session_id`
                // carries the calling peer's request token.
                let _origin = inbound.origin_session_id;
            }
            _ => { /* future kinds */ }
        }
    }
}
Wire layout
Both buckets live as siblings under _meta.nexo.*:
{
"_meta": {
"agent_id": "ana",
"session_id": "00000000-0000-0000-0000-000000000000",
"nexo": {
"binding": {
"agent_id": "ana",
"channel": "whatsapp",
"account_id": "personal",
"binding_id": "whatsapp:personal"
},
"inbound": {
"kind": "external_user",
"sender_id": "+5491100",
"msg_id": "wa.ABCD1234",
"inbound_ts": "2026-05-01T12:34:56Z",
"reply_to_msg_id": "wa.PREV0001",
"has_media": false
}
}
}
}
Either bucket can be absent: binding is omitted on bindingless
paths (delegation receive, heartbeat, tests), inbound is omitted
when the producer didn't populate it (legacy paths predating
Phase 82.5). A microapp must tolerate either being missing.
Producers
| Path | kind | sender_id | msg_id | Source |
|---|---|---|---|---|
| whatsapp inbound | external_user | E.164 phone | wa.<id> | core runtime intake |
| event-subscriber | yaml-declared | JSONPath extract | event id | core runtime synthesizer |
| webhook receiver | yaml-declared (via subscriber) | header/body extract | request id | webhook receiver → subscriber |
| delegation receive | inter_session | None | None | core runtime route_sub |
| proactive tick | internal_system | None | None | core runtime heartbeat_sub |
| email-followup tick | internal_system | None | None | llm_behavior |
nexo-webhook-receiver
Provider-agnostic per-source webhook verification primitives. HMAC-SHA256 / HMAC-SHA1 / raw-token signature verify + event kind extraction (header or JSON path) + NATS publish topic rendering. No HTTP listener — pure-fn surface.
[dependencies]
nexo-webhook-receiver = "0.1"
nexo-webhook-server
Axum-based HTTP listener that mounts the receiver behind a 5-gate
defense pipeline (method / body cap / per-source concurrency /
(source, client_ip) rate limit / signature). Suitable as a
standalone webhook ingestion service in any Rust daemon.
[dependencies]
nexo-webhook-server = "0.1"
nexo-resilience
Circuit breaker + retry + rate-limit primitives. Nothing nexo-specific — drop-in for any Rust service that needs them.
[dependencies]
nexo-resilience = "0.1"
nexo-driver-permission
Bash safety classifier — destructive-command warning, sed-in-place detection, read-only validation, sandbox heuristic. Useful for any tool that lets an LLM (or any other untrusted source) emit shell commands.
[dependencies]
nexo-driver-permission = "0.1"
Tier B — runtime helpers (Phase 83.4)
nexo-microapp-sdk (planned) will package the JSON-RPC stdio
loop, the BindingContext parser, and the webhook envelope
consumer behind ergonomic helpers — replaces the ~200 LOC of
boilerplate every microapp would otherwise rewrite. Watch
Phase 83 in proyecto/PHASES-microapps.md.
Forward-compatibility
Every Tier A type that crosses the wire is either
#[non_exhaustive] (microapps cannot rely on field exhaustivity
when reading) or has a documented field-add policy. Field
additions are deliberate semver-minor: a microapp built against
0.1.0 keeps working when the daemon emits a 0.2.x-shaped
payload because:
- Read-side: serde's permissive default ignores unknown keys.
- Write-side: the daemon never removes fields without bumping major.
Reference microapp
agent-creator-microapp (out-of-tree at
https://github.com/lordmacu/agent-creator-microapp) is a
working microapp that demonstrates:
- JSON-RPC stdio loop (initialize / tools/list / tools/call / shutdown / hooks).
- Wire-contract integration test that spawns the binary as a subprocess and asserts the daemon-side payload shape.
- parse_binding_from_meta consumption from nexo-tool-meta.
Use it as the starting template until the dedicated crates/template-rust/
microapp scaffold lands in Phase 83.7.
Testing microapps
Microapps you build on nexo-microapp-sdk get a full in-process
test harness so tool / hook handlers run without a daemon. Two
pieces:
- MicroappTestHarness drives a Microapp builder through the JSON-RPC dispatch loop end-to-end, returning the parsed result frame. Tools and hooks see the same ToolCtx / HookCtx they would in production.
- MockAdminRpc is a programmable stand-in for the daemon side of nexo/admin/*. Register canned responses per method, hand the mock to the harness, and your tools that call ctx.admin().call(...) see the canned values. The mock also records every request so tests assert on shape.
Both ship behind the SDK's test-harness cargo feature; the
MockAdminRpc additionally requires the admin feature.
# In your microapp's Cargo.toml
[dev-dependencies]
nexo-microapp-sdk = { version = "0.1", features = ["admin", "test-harness"] }
The reference test in extensions/template-microapp-rust/src/main.rs
exercises every piece below; copy it as a starting template.
Smoke test (no admin, no binding)
use nexo_microapp_sdk::{Microapp, MicroappTestHarness, ToolCtx, ToolError, ToolReply};
use serde_json::{json, Value};

async fn ping(_args: Value, _ctx: ToolCtx) -> Result<ToolReply, ToolError> {
    Ok(ToolReply::ok_json(json!({ "pong": true })))
}

#[tokio::test]
async fn ping_returns_pong() {
    let app = Microapp::new("my-microapp", "0.1.0").with_tool("ping", ping);
    let h = MicroappTestHarness::new(app);
    let out = h.call_tool("ping", json!({})).await.unwrap();
    assert_eq!(out["pong"], true);
}
The harness consumes the Microapp once per call. Tests that
need multiple calls build a fresh app each time, or factor the
builder into a build_app() helper (see the template).
Tool with BindingContext
ctx.binding() returns the (agent_id, channel, account_id, …)
the daemon resolved for this turn. In production it's threaded
through _meta.nexo.binding; tests inject a MockBindingContext
through the same path.
use nexo_microapp_sdk::{MicroappTestHarness, MockBindingContext};
use serde_json::json;

#[tokio::test]
async fn tool_reads_agent_id_from_binding() {
    let binding = MockBindingContext::new()
        .with_agent("ana")
        .with_channel("whatsapp")
        .with_account("acme")
        .build();
    // `build_app()` is the shared builder helper from the template.
    let h = MicroappTestHarness::new(build_app());
    let out = h
        .call_tool_with_binding("greet", json!({ "name": "world" }), binding)
        .await
        .unwrap();
    assert_eq!(out["agent_id"], "ana");
}
MockBindingContext::new().build() panics if agent_id is
unset — the daemon never delivers a tool call without one, so
the panic surfaces test wiring mistakes immediately.
Tool that calls nexo/admin/*
When a tool calls ctx.admin().call(...) the production path
talks JSON-RPC over stdio to the daemon. The harness installs
the MockAdminRpc's AdminClient instead:
use nexo_microapp_sdk::admin::MockAdminRpc;
use nexo_microapp_sdk::AdminError;

#[tokio::test]
async fn whoami_calls_admin_and_surfaces_detail() {
    let mock = MockAdminRpc::new();
    // Register a canned `Ok(value)` response.
    mock.on(
        "nexo/admin/agents/get",
        json!({ "id": "ana", "active": true, "model": { "provider": "minimax" } }),
    );

    let binding = MockBindingContext::new().with_agent("ana").build();
    let h = MicroappTestHarness::new(build_app())
        .with_admin_mock(&mock)
        .await;
    let out = h
        .call_tool_with_binding("whoami", json!({}), binding)
        .await
        .unwrap();
    assert_eq!(out["queried_agent"], "ana");

    // Mock recorded the request; assert on shape.
    let calls = mock.requests_for("nexo/admin/agents/get");
    assert_eq!(calls.len(), 1);
    assert_eq!(calls[0].params["agent_id"], "ana");
}
Three flavours of on*
| Method | Signature | When |
|---|---|---|
| on(method, value) | &self, &str, Value | Static Ok(value) |
| on_err(method, err) | &self, &str, AdminError | Static Err(err) |
| on_with(method, F) | &self, &str, F: Fn(Value) -> Result<Value, AdminError> | Closure responder: receives the request params, returns the result. Use this when the response depends on input or the test wants to count invocations |
A method without a registered responder returns
AdminError::MethodNotFound. The mock is fail-loud on purpose
— tests that forget to wire a response see a clear error rather
than hanging on a default response.
Asserting on errors
The error round-trip is variant-preserving. A daemon that
returns CapabilityNotGranted on the wire shows up as the same
typed variant on the microapp side, and the mock matches that
shape:
mock.on_err(
    "nexo/admin/agents/upsert",
    AdminError::CapabilityNotGranted {
        capability: "agents_crud".into(),
        method: "nexo/admin/agents/upsert".into(),
    },
);
The tool's ctx.admin().call(...) returns Err(AdminError::CapabilityNotGranted { .. })
verbatim — so the tool's error-mapping logic gets exercised
exactly as it would against the live daemon.
Counting invocations from a closure
on_with captures any state the closure needs:
use std::sync::Arc;
use std::sync::atomic::{AtomicUsize, Ordering};

let count = Arc::new(AtomicUsize::new(0));
let count_clone = Arc::clone(&count);
mock.on_with("nexo/admin/ping", move |_| {
    count_clone.fetch_add(1, Ordering::SeqCst);
    Ok(json!({}))
});

// ... drive the harness ...

assert_eq!(count.load(Ordering::SeqCst), 3);
Hooks
fire_hook(hook_name, args) returns the parsed HookOutcome.
Same harness, different surface:
let h = MicroappTestHarness::new(build_app());
let outcome = h
    .fire_hook("before_message", json!({ "body": "hi" }))
    .await
    .unwrap();
assert!(matches!(outcome, HookOutcome::Continue));
For Abort cases, match on the variant and inspect reason.
What the harness does NOT do
- Boot a real daemon. No NATS, no agents.yaml, no live agent loop. Use the harness for tool / hook unit tests; reach for an end-to-end test (a real daemon process spawned from the test) when you need the full pipeline.
- Subscribe to the firehose. nexo/notify/agent_event delivery is daemon-side; the harness exits after one request/response. Future helper lands in 83.15.b.b.
- Persist anything. Every harness call gets a fresh Handlers registry; admin mock state is the MockAdminRpc you explicitly hand it. Tests are isolated by construction.
Reference
The template microapp ships every pattern above as runnable tests:
cargo test -p template-microapp-rust
See extensions/template-microapp-rust/src/main.rs#tests for
the source. Copy whichever tests apply when you start a new
microapp.
Compliance primitives — when to use which
nexo-compliance-primitives (Phase 83.5) ships six reusable
primitives every conversational microapp needs. This page maps
each primitive to the decision it makes and the symptom that
tells you to wire it in.
| Primitive | Decision | Symptom that demands it |
|---|---|---|
| AntiLoopDetector | Block when same body N+ times in window OR auto-reply signature seen | Bot is talking to a bot; "Recibido / Mensaje automático" replies bouncing |
| AntiManipulationMatcher | Block on prompt-injection / role-hijack | "Ignore previous instructions", "Act as", system-prompt extraction |
| OptOutMatcher | Block + do_not_reply_again on opt-out keyword | User says "STOP", "no me escribas más", "unsubscribe" |
| PiiRedactor | Transform: strip PII before LLM sees text | Compliance / data-handling rules forbid passing card / phone / email through LLM |
| RateLimitPerUser | Block on bucket exhausted | One user spamming; protect downstream from runaway senders |
| ConsentTracker | Gate outbound: only send when OptedIn | GDPR / CAN-SPAM / WhatsApp Business cold-outbound rules |
All six plug into the Phase 83.3 hook interceptor. The microapp
casts the verdict into a HookOutcome::{Block, Transform, Continue} and the daemon acts.
AntiLoopDetector — anti-loop
When to wire it:
- Channel is bidirectional and the user-agent could be another bot (WhatsApp Business with auto-replies, email out-of-office responders, etc.).
- You see your own bot's messages echoed back in the inbound feed.
Wire shape:
use std::sync::{LazyLock, Mutex};
use std::time::Duration;

// Shared across hook invocations; recording needs mutable access.
static DETECTOR: LazyLock<Mutex<AntiLoopDetector>> =
    LazyLock::new(|| Mutex::new(AntiLoopDetector::new(3, Duration::from_secs(300))));

async fn before_message(args: Value, ctx: HookCtx) -> Result<HookOutcome, ToolError> {
    let body = args.get("body").and_then(|v| v.as_str()).unwrap_or("");
    match DETECTOR.lock().unwrap().record_and_evaluate(body) {
        LoopVerdict::Repetition { count } => Ok(HookOutcome::Block {
            reason: format!("loop: same message {count}× in 5 min"),
            do_not_reply_again: true,
        }),
        LoopVerdict::AutoReplySignature { phrase } => Ok(HookOutcome::Block {
            reason: format!("auto-reply signature: `{phrase}`"),
            do_not_reply_again: true,
        }),
        LoopVerdict::Clear => Ok(HookOutcome::Continue),
    }
}
Tunables: threshold (count to trip) + window (rolling
duration). Custom signature lists via with_signatures(...).
AntiManipulationMatcher — prompt-injection
When to wire it:
- User-controlled inbound text reaches the LLM verbatim.
- Compliance rules prohibit roles or instruction overrides.
Wire shape:
let m = AntiManipulationMatcher::default();

async fn before_message(args: Value, _ctx: HookCtx) -> Result<HookOutcome, ToolError> {
    let body = args.get("body").and_then(|v| v.as_str()).unwrap_or("");
    match m.evaluate(body) {
        ManipulationVerdict::Matched { phrase } => Ok(HookOutcome::Block {
            reason: format!("manipulation phrase: `{phrase}`"),
            do_not_reply_again: false,
        }),
        ManipulationVerdict::Clear => Ok(HookOutcome::Continue),
    }
}
Tunables: add domain-specific phrases via
with_extra_phrases(...). Replace the entire list via
with_phrases(...) — useful if you only care about a small
subset.
OptOutMatcher — unsubscribe
When to wire it:
- Channel is regulated (CAN-SPAM, GDPR, WhatsApp Business).
- User can text a keyword to stop replies.
Wire shape:
let opt_out = OptOutMatcher::default();

async fn before_message(args: Value, ctx: HookCtx) -> Result<HookOutcome, ToolError> {
    let body = args.get("body").and_then(|v| v.as_str()).unwrap_or("");
    match opt_out.evaluate(body) {
        OptOutVerdict::OptOut { keyword } => {
            // Persist the opt-out via ConsentTracker so future
            // outbound is also gated.
            CONSENT.lock().unwrap().opt_out(
                ctx.binding()
                    .map(|b| b.account_id.as_deref().unwrap_or("default"))
                    .unwrap_or("default"),
                "stop_keyword",
            );
            Ok(HookOutcome::Block {
                reason: format!("opt-out keyword: `{keyword}`"),
                do_not_reply_again: true,
            })
        }
        OptOutVerdict::Clear => Ok(HookOutcome::Continue),
    }
}
Whole-word matching avoids substring false positives — "baja"
in "darse de baja" matches; "baja" inside "bajaron" does
not.
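Whole-word matching can be sketched by tokenising both the body and the keyword and looking for the keyword's words as a contiguous run. This is an illustration of the technique, not the crate's matcher: `matches_keyword` is a hypothetical name, and splitting on non-alphanumeric characters with lowercase comparison is an assumption.

```rust
/// Lowercased words of `text`, split on anything that is not alphanumeric.
fn words(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .map(|w| w.to_lowercase())
        .collect()
}

/// True when `keyword` (one or more words) appears in `body` as whole words.
pub fn matches_keyword(body: &str, keyword: &str) -> bool {
    let body = words(body);
    let keyword = words(keyword);
    // `windows(0)` would panic, so an empty keyword never matches.
    !keyword.is_empty()
        && body
            .windows(keyword.len())
            .any(|window| window == keyword.as_slice())
}
```

Because comparison happens per word, "baja" never matches inside "bajaron", while multi-word phrases such as "no me escribas más" still match across punctuation.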
PiiRedactor — strip PII before LLM
When to wire it:
- Compliance rules say card / phone / email cannot leave the trust boundary.
- LLM provider's terms of service constrain what you can send.
Wire shape:
let redactor = PiiRedactor::new().with_luhn(true);

async fn before_message(args: Value, _ctx: HookCtx) -> Result<HookOutcome, ToolError> {
    let body = args.get("body").and_then(|v| v.as_str()).unwrap_or("");
    let (clean, stats) = redactor.redact(body);
    if stats.total() == 0 {
        return Ok(HookOutcome::Continue);
    }
    Ok(HookOutcome::Transform {
        transformed_body: clean,
        reason: Some(format!(
            "redacted: {} cards, {} phones, {} emails",
            stats.cards_redacted, stats.phones_redacted, stats.emails_redacted
        )),
        do_not_reply_again: false,
    })
}
Tunables: with_luhn(true) filters Luhn-invalid
16-digit runs out of the card path (cuts false positives).
skip_phones() / skip_cards() / skip_emails() turn off
individual categories.
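For reference, the Luhn checksum that such a filter relies on fits in a few lines. This is a standalone sketch of the algorithm, not the redactor's internals; `luhn_valid` is a hypothetical name, and skipping spaces and hyphens is an assumption about how card runs are normalised.

```rust
/// Luhn check over a digit run. Spaces and hyphens are skipped;
/// any other non-digit makes the run invalid.
pub fn luhn_valid(run: &str) -> bool {
    let mut sum = 0u32;
    let mut count = 0usize;
    let digits = run.chars().rev().filter(|c| *c != ' ' && *c != '-');
    for (i, ch) in digits.enumerate() {
        let digit = match ch.to_digit(10) {
            Some(d) => d,
            None => return false,
        };
        // Double every second digit counting from the right;
        // subtract 9 when the doubled value has two digits.
        let value = if i % 2 == 1 {
            let doubled = digit * 2;
            if doubled > 9 { doubled - 9 } else { doubled }
        } else {
            digit
        };
        sum += value;
        count += 1;
    }
    count > 0 && sum % 10 == 0
}
```

A random 16-digit run passes this check only about one time in ten, which is where the drop in false positives comes from.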
RateLimitPerUser — token bucket
When to wire it:
- One user can spam.
- Protect downstream services (LLM provider, your DB, your outbound channel API).
Wire shape:
let mut limiter = RateLimitPerUser::flat(20, Duration::from_secs(60));

async fn before_message(args: Value, ctx: HookCtx) -> Result<HookOutcome, ToolError> {
    let user_key = ctx
        .inbound()
        .map(|m| m.from.clone())
        .unwrap_or_else(|| "anon".into());
    match limiter.try_acquire(&user_key) {
        RateLimitVerdict::Allowed { .. } => Ok(HookOutcome::Continue),
        RateLimitVerdict::Denied { retry_after } => Ok(HookOutcome::Block {
            reason: format!("rate-limited; retry in {retry_after:?}"),
            do_not_reply_again: false,
        }),
    }
}
Tunables: RateLimitPerUser::new(rate, window, max) lets
you specify a different burst max from the long-run rate. The
constructor clamps max >= rate so the long-run rate is
honoured.
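A per-user token bucket with the same three knobs can be sketched as follows. `PerUserLimiter` and its methods are hypothetical stand-ins, not the crate's `RateLimitPerUser`; continuous refill and an explicit `now` parameter are design assumptions made so the sketch is deterministic.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

struct Bucket {
    tokens: f64,
    last_refill: Instant,
}

/// Token bucket keyed by user: `rate` tokens per `window`, burst up to `max`.
pub struct PerUserLimiter {
    rate: f64,
    window: Duration,
    max: f64,
    buckets: HashMap<String, Bucket>,
}

impl PerUserLimiter {
    pub fn new(rate: u32, window: Duration, max: u32) -> Self {
        assert!(rate > 0 && !window.is_zero(), "rate and window must be non-zero");
        // Clamp the burst so it is never below the long-run rate.
        let max = max.max(rate) as f64;
        Self { rate: rate as f64, window, max, buckets: HashMap::new() }
    }

    pub fn flat(rate: u32, window: Duration) -> Self {
        Self::new(rate, window, rate)
    }

    pub fn burst_capacity(&self) -> u32 {
        self.max as u32
    }

    /// Ok(()) when a token was taken; Err(retry_after) when the bucket is empty.
    pub fn try_acquire_at(&mut self, user_key: &str, now: Instant) -> Result<(), Duration> {
        let per_sec = self.rate / self.window.as_secs_f64();
        let max = self.max;
        let bucket = self
            .buckets
            .entry(user_key.to_string())
            .or_insert(Bucket { tokens: max, last_refill: now });
        let elapsed = now.saturating_duration_since(bucket.last_refill).as_secs_f64();
        bucket.tokens = (bucket.tokens + elapsed * per_sec).min(max);
        bucket.last_refill = now;
        if bucket.tokens >= 1.0 {
            bucket.tokens -= 1.0;
            Ok(())
        } else {
            Err(Duration::from_secs_f64((1.0 - bucket.tokens) / per_sec))
        }
    }
}
```

Each key gets its own bucket, so one noisy sender cannot drain another user's allowance.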
ConsentTracker — opt-in gate for outbound
When to wire it:
- Cold outbound (you message users who didn't message first).
- Regulatory: GDPR, CAN-SPAM, WhatsApp Business policy.
Wire shape:
use std::sync::{LazyLock, Mutex};

// Shared tracker; every call site below goes through the same instance.
static TRACKER: LazyLock<Mutex<ConsentTracker>> =
    LazyLock::new(|| Mutex::new(ConsentTracker::new()));

// On every outbound dispatch:
fn can_send(user_key: &str) -> bool {
    TRACKER.lock().unwrap().allows_outbound(user_key)
}

// On opt-in form submission:
TRACKER.lock().unwrap().opt_in(user_key, "web_form");

// On opt-out keyword:
TRACKER.lock().unwrap().opt_out(user_key, "stop_keyword");
Default Unknown status means no outbound — CAN-SPAM
"express consent" default-deny. The audit log
(history_for_user(...)) gives you a per-user timestamped
record of every consent change.
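The default-deny behaviour falls out naturally if the current status is derived from an append-only history. The sketch below is an illustration only: `ConsentLedger`, `ConsentStatus`, and `ConsentChange` are hypothetical types, with method names borrowed from the calls shown above.

```rust
use std::collections::HashMap;
use std::time::SystemTime;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ConsentStatus {
    Unknown,
    OptedIn,
    OptedOut,
}

/// One timestamped entry in a user's consent history.
pub struct ConsentChange {
    pub status: ConsentStatus,
    pub source: String,
    pub at: SystemTime,
}

#[derive(Default)]
pub struct ConsentLedger {
    history: HashMap<String, Vec<ConsentChange>>,
}

impl ConsentLedger {
    fn record(&mut self, user_key: &str, status: ConsentStatus, source: &str) {
        self.history.entry(user_key.to_string()).or_default().push(ConsentChange {
            status,
            source: source.to_string(),
            at: SystemTime::now(),
        });
    }

    pub fn opt_in(&mut self, user_key: &str, source: &str) {
        self.record(user_key, ConsentStatus::OptedIn, source);
    }

    pub fn opt_out(&mut self, user_key: &str, source: &str) {
        self.record(user_key, ConsentStatus::OptedOut, source);
    }

    /// Latest recorded status, or Unknown for a user never seen before.
    pub fn status(&self, user_key: &str) -> ConsentStatus {
        self.history_for_user(user_key)
            .last()
            .map(|change| change.status)
            .unwrap_or(ConsentStatus::Unknown)
    }

    /// Default-deny: only an explicit opt-in allows outbound.
    pub fn allows_outbound(&self, user_key: &str) -> bool {
        self.status(user_key) == ConsentStatus::OptedIn
    }

    pub fn history_for_user(&self, user_key: &str) -> &[ConsentChange] {
        self.history.get(user_key).map(Vec::as_slice).unwrap_or(&[])
    }
}
```

Keeping the history as the source of truth means the audit trail and the gate can never disagree.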
Composition
In production you wire ALL of them in one before-message hook:
async fn before_message(args: Value, ctx: HookCtx) -> Result<HookOutcome, ToolError> {
    let body = extract_body(&args);
    let user_key = extract_user_key(&ctx);

    // Order matters: cheapest checks first.
    if matches!(opt_out.evaluate(body), OptOutVerdict::OptOut { .. }) {
        return block_with_do_not_reply();
    }
    if matches!(manipulation.evaluate(body), ManipulationVerdict::Matched { .. }) {
        return block();
    }
    if let RateLimitVerdict::Denied { .. } = limiter.try_acquire(user_key) {
        return block_rate_limited();
    }
    if let LoopVerdict::Repetition { .. } | LoopVerdict::AutoReplySignature { .. } =
        loop_detector.record_and_evaluate(body)
    {
        return block_loop();
    }

    // PII redaction LAST so the redacted body goes to the agent.
    let (clean, stats) = pii.redact(body);
    if stats.total() > 0 {
        return Ok(HookOutcome::Transform {
            transformed_body: clean,
            reason: None,
            do_not_reply_again: false,
        });
    }
    Ok(HookOutcome::Continue)
}
See also
- contract.md — wire protocol spec.
- rust.md — Rust SDK reference.
- Phase 83.3 hook interceptor (vote-to-block) — the interceptor that consumes these verdicts.
- Phase 82.1 BindingContext: supplies agent_id, channel, account_id keys for per-tenant compliance state.
Publishing the SDK crates
This page documents the publish sequence for the framework's microapp-author-facing crates. Operators run these in order when cutting a 0.x release.
Publishable crates (Tier A)
The framework publishes four crates microapp authors consume:
| Crate | Phase | Depends on |
|---|---|---|
| nexo-tool-meta | 82.2.b | (no nexo deps) |
| nexo-plugin-manifest | 81 | (no nexo deps) |
| nexo-compliance-primitives | 83.5 | (no nexo deps) |
| nexo-microapp-sdk | 83.4 | nexo-tool-meta |
Other framework crates (nexo-core, nexo-config,
nexo-broker, etc.) are NOT publishable — they are
daemon-internal and microapp authors should not import them.
Publish order
Because nexo-microapp-sdk depends on nexo-tool-meta, the
publish sequence is:
# 1. Standalone crates (any order)
cargo publish -p nexo-tool-meta
cargo publish -p nexo-plugin-manifest
cargo publish -p nexo-compliance-primitives
# 2. Dependent crate (after tool-meta lands on crates.io)
cargo publish -p nexo-microapp-sdk
Verify each step with a dry-run first:
cargo publish -p nexo-tool-meta --dry-run
A clean dry-run prints
Uploading … warning: aborting upload due to dry run and
exits 0. Anything else (path-dep complaint, missing license,
etc.) aborts before upload.
Path-dep elision
The workspace's root Cargo.toml declares each crate with
both version = "..." AND path = "...". Cargo strips the
path segment when packaging for crates.io, so the published
artifact contains version-only deps. Operators do not need
to edit Cargo.toml between local dev and publish.
Versioning policy
Pre-1.0 (current state):
- Breaking changes ALLOWED with a minor bump (0.1 → 0.2).
- Per-crate CHANGELOG.md describes the migration.
- One release of grace before removing a deprecated symbol, i.e. release N marks the symbol #[deprecated], release N+1 removes it.
- Wire-format changes propagate together: bumping nexo-tool-meta triggers a coordinated bump on nexo-microapp-sdk.
Post-1.0 (future):
- Strict semver. Breaking changes require a major bump.
- Wire format changes require coordinated multi-language release (Rust SDK + any future Python/TS SDK).
Out-of-tree microapp migration
After publish, out-of-tree microapps swap their Cargo.toml:
[dependencies]
-nexo-microapp-sdk = { path = "../nexo-rs/crates/microapp-sdk" }
-nexo-tool-meta = { path = "../nexo-rs/crates/tool-meta" }
-nexo-compliance-primitives = { path = "../nexo-rs/crates/compliance-primitives" }
+nexo-microapp-sdk = "0.1"
+nexo-tool-meta = "0.1"
+nexo-compliance-primitives = "0.1"
A microapp that depends only on published versions can build without any nexo-rs source on disk — strictly the published artifact.
CI integration (deferred)
Auto-publish on tag via release-plz is the operator-side deliverable per Phase 83.14:
- .github/workflows/publish.yml runs on v*.*.* tags.
- Reads CARGO_REGISTRY_TOKEN from secrets.
- Calls cargo publish per crate in the order documented above.
- Sequencing: publishes nexo-tool-meta first, waits for crates.io index propagation (typically <60 s), then publishes the rest.
The release-plz integration itself lands when the operator
actually tags v0.1.0; until then the per-crate
Cargo.toml + CHANGELOG.md is publish-ready and the dry-run
checks pass.
Model Context Protocol (MCP)
nexo-rs is both an MCP client (consumes tools from external MCP servers) and an MCP server (exposes its own tools so editors like Claude Desktop, Cursor, Zed can use them). Same wire, different directions.
Source: crates/mcp/, bridges in crates/core/src/agent/mcp_*.
The two directions
flowchart LR
subgraph IDE[MCP clients]
CD[Claude Desktop]
CUR[Cursor]
ZED[Zed]
end
subgraph AGENT[agent process]
AS[Agent-as-server<br/>stdio bridge]
AC[Agent-as-client<br/>session runtime]
end
subgraph EXT[External MCP servers]
GS[Gmail MCP]
DB[DB MCP]
WF[Workflow MCP]
end
IDE --> AS
AS --> AR[Agent tools registry]
AC --> EXT
AR --> AC
- Server side: an MCP client (e.g. Claude Desktop) runs agent mcp serve. The agent's internal tools appear as MCP tools in that client.
- Client side: the agent spawns external MCP servers (stdio or HTTP) and registers their tools into its own ToolRegistry, so agents can call them exactly like built-ins or extensions.
Phase map
| Phase | What it adds |
|---|---|
| 12.1 | MCP client over stdio |
| 12.2 | MCP client over HTTP (streamable + SSE fallback) |
| 12.3 | Tool catalog — merge MCP tools with extensions and built-ins |
| 12.4 | Session runtime — per-session child spawn, sentinel-shared default |
| 12.5 | Resources — resources/list + resources/read with optional LRU cache |
| 12.6 | Agent as MCP server (stdio) |
| 12.7 | MCP servers declared by extensions |
| 12.8 | tools/list_changed debounced hot-reload |
All eight landed. See PHASES.md.
Why both sides
Being a client lets agents tap any MCP ecosystem without needing a custom extension per service — if the thing you want speaks MCP, you can reach it today.
Being a server lets the carefully-sandboxed tool surface of
nexo-rs (allowed_tools, outbound_allowlist, etc.) be reused from
any MCP-speaking client. Your LLM-driven IDE gets access to WhatsApp
send, Gmail poll, browser CDP, and everything else — without you
wiring each one into the IDE's config.
Wire shape (both directions)
JSON-RPC 2.0. Three transports:
- stdio: child process, line-delimited JSON on stdin/stdout
- streamable HTTP: the modern MCP transport (introduced in spec revision 2025-03-26)
- SSE: legacy (the HTTP+SSE transport of the 2024-11-05 spec); used as automatic fallback
sequenceDiagram
participant H as Host (agent or IDE)
participant S as MCP server
H->>S: initialize (id=0)
S-->>H: InitializeResult (capabilities, serverInfo)
H->>S: notifications/initialized (fire-and-forget)
loop steady state
H->>S: tools/list
S-->>H: tools[]
H->>S: tools/call {name, args}
S-->>H: content blocks
end
alt tool list changes
S-->>H: notifications/tools/list_changed
H->>S: tools/list (debounced refresh)
end
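The opening handshake in the diagram can be sketched as line-delimited JSON-RPC frames. This is a minimal illustration using plain string formatting; the helper names, the clientInfo values, and the protocolVersion string are assumptions for the example, not the crate's actual API:

```rust
/// Frames one JSON-RPC 2.0 request as a single line.
/// `params_json` must already be valid JSON (this sketch does no escaping).
fn request_line(id: u64, method: &str, params_json: &str) -> String {
    format!(
        "{{\"jsonrpc\":\"2.0\",\"id\":{id},\"method\":\"{method}\",\"params\":{params_json}}}\n"
    )
}

/// Frames a notification: no `id`, so the peer must not reply.
fn notification_line(method: &str) -> String {
    format!("{{\"jsonrpc\":\"2.0\",\"method\":\"{method}\"}}\n")
}

/// The two frames a host writes before entering steady state.
fn handshake() -> Vec<String> {
    // Illustrative initialize params; real hosts advertise their own capabilities.
    let init_params = r#"{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"example-host","version":"0.0.0"}}"#;
    vec![
        request_line(0, "initialize", init_params),
        notification_line("notifications/initialized"),
    ]
}
```

On the stdio transport each frame is written to the child's stdin as one line; the absence of an id on the second frame is what makes it fire-and-forget.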
Where to go next
- Client (stdio + HTTP) — consuming external MCP servers from agents
- Agent as MCP server — exposing the agent's tools over MCP
MCP client (stdio + HTTP)
How nexo-rs consumes tools from external MCP servers. Every MCP tool
ends up in the same ToolRegistry that hosts built-ins and
extensions — the LLM calls them identically.
Source: crates/mcp/src/client.rs, crates/mcp/src/http/client.rs,
crates/mcp/src/manager.rs, crates/mcp/src/session.rs,
crates/core/src/agent/mcp_catalog.rs.
Config
# config/mcp.yaml
mcp:
enabled: true
session_ttl: 30m
idle_reap_interval: 60s
connect_timeout_ms: 10000
call_timeout_ms: 30000
shutdown_grace_ms: 3000
servers:
gmail:
transport:
type: stdio
command: ./mcp-gmail
args: []
env:
GMAIL_TOKEN: ${file:./secrets/gmail_token.json}
workflow:
transport:
type: http
url: https://mcp.example.com/workflow
mode: auto # streamable_http | sse | auto
headers:
Authorization: Bearer ${WORKFLOW_TOKEN}
resource_cache:
enabled: true
ttl: 30s
max_entries: 256
resource_uri_allowlist: [] # empty = permissive
strict_root_paths: false
context:
passthrough: true
sampling:
enabled: false
watch:
enabled: false
debounce_ms: 200
Transports
stdio
Child process per server. Line-delimited JSON-RPC 2.0 over
stdin/stdout. stderr is routed to the agent's tracing output.
sequenceDiagram
participant M as McpRuntimeManager
participant S as Server (child process)
M->>S: spawn Command(cmd, args, env)
M->>S: {"method":"initialize","id":0, ...}
S-->>M: capabilities + serverInfo
M->>S: notifications/initialized (no-reply)
Note over M,S: steady state — tools/list, tools/call, resources/*
M->>S: notifications/cancelled (per in-flight id)<br/>then shutdown_grace
HTTP — streamable vs SSE
Three modes selectable per server:
| mode | Behavior |
|---|---|
| streamable_http | Streamable HTTP (MCP spec revision 2025-03-26), the modern transport |
| sse | Legacy Server-Sent Events fallback |
| auto (default) | Try streamable_http; on 404/405/415, fall back to SSE |
Each connection gets an mcp-session-id header. Additional headers
(auth, routing) pass through a HeaderMap; values are env-resolved
at config load.
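The auto fallback rule in the table reduces to a small decision function. A sketch with hypothetical type names (the real client types may differ):

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum HttpMode {
    StreamableHttp,
    Sse,
    Auto,
}

#[derive(Debug, PartialEq, Eq)]
enum Transport {
    StreamableHttp,
    Sse,
}

/// Picks the transport to use. `probe_status` is the HTTP status returned by
/// the first streamable-HTTP attempt; it only matters in `Auto` mode.
fn select_transport(mode: HttpMode, probe_status: u16) -> Transport {
    match mode {
        HttpMode::StreamableHttp => Transport::StreamableHttp,
        HttpMode::Sse => Transport::Sse,
        // 404 / 405 / 415 signal a server that does not speak streamable HTTP.
        HttpMode::Auto if matches!(probe_status, 404 | 405 | 415) => Transport::Sse,
        HttpMode::Auto => Transport::StreamableHttp,
    }
}
```

Other failure statuses (for example 500) do not trigger the fallback in this sketch; they would surface as ordinary errors on the streamable transport.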
Session runtime
A single McpRuntimeManager lives per process. Inside, a
SessionMcpRuntime per conversation session keeps its own map of
live MCP clients:
flowchart TB
MGR[McpRuntimeManager<br/>one per process]
MGR --> SENT[Sentinel session<br/>UUID = nil<br/>shared by all agents]
MGR --> S1[session A runtime]
MGR --> S2[session B runtime]
SENT --> C1[mcp client: gmail]
SENT --> C2[mcp client: workflow]
S1 --> CX[session-scoped clients<br/>for stateful servers]
- Sentinel session (UUID = nil) is the default shared namespace: all agents see the same clients, avoiding duplicate child processes for servers that don't need per-session isolation
- Per-session runtimes are spawned when a server genuinely needs independent state (example: a workflow engine that tracks its own context per user)
- Idle reap: every idle_reap_interval, the manager disposes sessions unused for longer than session_ttl, shutting their clients down gracefully
- Config fingerprinting: changes to the servers set produce a new fingerprint; runtimes are rebuilt on request; concurrent requests de-dupe so only one rebuild happens
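The idle-reap rule can be sketched with a map of last-use timestamps. `SessionTable` is a hypothetical stand-in for the manager's internal state, and time is passed in explicitly (as an offset on a monotonic clock) so the logic stays deterministic; graceful client shutdown is left out:

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Hypothetical stand-in for the per-session runtime map.
struct SessionTable {
    session_ttl: Duration,
    /// session id -> time of last use
    last_used: HashMap<String, Duration>,
}

impl SessionTable {
    fn new(session_ttl: Duration) -> Self {
        Self { session_ttl, last_used: HashMap::new() }
    }

    /// Records that a session was used at `now`.
    fn touch(&mut self, id: &str, now: Duration) {
        self.last_used.insert(id.to_string(), now);
    }

    /// Removes and returns (sorted) the sessions idle for longer than `session_ttl`.
    /// A real manager would also shut each session's MCP clients down here.
    fn reap(&mut self, now: Duration) -> Vec<String> {
        let ttl = self.session_ttl;
        let mut expired: Vec<String> = self
            .last_used
            .iter()
            .filter(|(_, last)| now.saturating_sub(**last) > ttl)
            .map(|(id, _)| id.clone())
            .collect();
        expired.sort();
        for id in &expired {
            self.last_used.remove(id);
        }
        expired
    }
}
```

A timer firing every idle_reap_interval would call `reap` with the current time.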
Tool catalog
McpToolCatalog::build() calls tools/list on every configured
server in parallel and merges the results:
flowchart LR
LIST[tools/list per server<br/>parallel] --> PREFIX[prefix names:<br/>server_toolname]
PREFIX --> MERGE[merge into ToolRegistry]
MERGE --> LLM[tools visible to LLM]
LIST -.->|single-server error| ERR[non-fatal:<br/>server visible with error=...]
- Names are always prefixed {server_name}_{tool_name} so collisions across servers can't happen
- Duplicates within the same server → first wins, warn log
- input_schema is passed through verbatim
- Server capability resources unlocks two meta-tools for reading resources
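The prefixing and first-wins rules above can be sketched as a pure merge function. `merge_catalog` is a hypothetical name; the real catalog also carries schemas and per-server error state, which are omitted here:

```rust
use std::collections::HashSet;

/// Merges per-server tool lists into one namespace.
/// Returns (registered names in order, warnings for dropped duplicates).
fn merge_catalog(servers: &[(&str, Vec<&str>)]) -> (Vec<String>, Vec<String>) {
    let mut names = Vec::new();
    let mut warnings = Vec::new();
    let mut seen = HashSet::new();
    for (server, tools) in servers {
        for tool in tools {
            // Every name is prefixed with its server: {server_name}_{tool_name}.
            let full = format!("{server}_{tool}");
            if seen.insert(full.clone()) {
                names.push(full);
            } else {
                // Duplicate within the same server: the first definition wins.
                warnings.push(format!(
                    "duplicate tool {tool} on server {server}; keeping the first"
                ));
            }
        }
    }
    (names, warnings)
}
```

Two servers exposing a tool with the same bare name (send below) end up as distinct registry entries.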
Tool call flow
sequenceDiagram
participant A as Agent
participant C as McpCatalog tool
participant R as SessionMcpRuntime
participant S as MCP server
participant CB as CircuitBreaker
A->>C: invoke gmail_list_messages(...)
C->>R: call(server=gmail, tool=list_messages, args)
R->>CB: allow?
CB-->>R: yes
R->>S: tools/call {name, args, _meta}
S-->>R: content blocks
R-->>C: content
C-->>A: result
Every RPC goes through a per-server CircuitBreaker. If the breaker
is open, the call fails fast instead of hanging on a dead server.
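For readers unfamiliar with the pattern, here is a minimal circuit-breaker sketch. The trip policy (consecutive failures, then a cooldown before one probe is let through) and every name in it are assumptions for illustration; the source does not document the actual thresholds or states used by nexo-rs:

```rust
use std::time::Duration;

/// Illustrative breaker: opens after `failure_threshold` consecutive failures,
/// then lets calls through again once `cooldown` has elapsed.
struct CircuitBreaker {
    failure_threshold: u32,
    cooldown: Duration,
    consecutive_failures: u32,
    opened_at: Option<Duration>,
}

impl CircuitBreaker {
    fn new(failure_threshold: u32, cooldown: Duration) -> Self {
        Self { failure_threshold, cooldown, consecutive_failures: 0, opened_at: None }
    }

    /// Fail-fast check performed before every RPC.
    fn allow(&self, now: Duration) -> bool {
        match self.opened_at {
            None => true,
            // After the cooldown a probe call is allowed (half-open).
            Some(opened) => now.saturating_sub(opened) >= self.cooldown,
        }
    }

    fn record_success(&mut self) {
        self.consecutive_failures = 0;
        self.opened_at = None;
    }

    fn record_failure(&mut self, now: Duration) {
        self.consecutive_failures += 1;
        if self.consecutive_failures >= self.failure_threshold {
            self.opened_at = Some(now);
        }
    }
}
```

Keeping one breaker per server means a dead server only fails its own calls fast, which matches the per-server scoping described above.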
Context passthrough
When mcp.context.passthrough: true, tools/call injects:
{ "_meta": { "agent_id": "ana", "session_id": "..." }, ...args }
Server-side code can use this to scope state per agent without the schema leaking that concern.
Resources
Servers advertising resources capability unlock:
- resources/list (paginated via cursor, max 64 pages)
- resources/read (optionally cached via LRU)
- resources/templates/list (URI templates)
Cache config:
resource_cache:
enabled: true
ttl: 30s
max_entries: 256
Cache invalidates on
notifications/resources/list_changed. Optional per-scheme allowlist
(resource_uri_allowlist: ["file", "db"]) rejects unknown URI
schemes before dispatch.
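The scheme allowlist check can be sketched as follows. `scheme_allowed` is a hypothetical helper; the empty-list-is-permissive rule comes from the config comment, while the case-insensitive comparison is an assumption based on URI schemes being case-insensitive:

```rust
/// Returns true when `uri` may be dispatched under `allowlist`.
/// An empty allowlist is permissive, mirroring `resource_uri_allowlist: []`.
fn scheme_allowed(uri: &str, allowlist: &[&str]) -> bool {
    if allowlist.is_empty() {
        return true;
    }
    match uri.split_once(':') {
        // Assumption: schemes are compared case-insensitively.
        Some((scheme, _)) => allowlist.iter().any(|a| a.eq_ignore_ascii_case(scheme)),
        // No scheme at all: reject under a non-empty allowlist.
        None => false,
    }
}
```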
Hot reload (phase 12.8)
flowchart LR
S[server notifies<br/>tools/list_changed] --> DBC[200 ms debounce]
DBC --> REL[catalog rebuild]
REL --> REG[ToolRegistry re-populated<br/>with new schema]
Same flow for resources. Agents in flight at the moment of the rebuild keep their references to the old tool definitions — next turn uses the refreshed registry.
Gotchas
- One MCP child per server by default. Turn on per-session isolation only for servers that genuinely need it; spawning a child per session multiplies resource cost.
- notifications/initialized is fire-and-forget. If the server insists on acknowledging it, you have a broken server.
- SSE is a last resort. It's in auto for compatibility; new server deployments should speak streamable HTTP.
- Circuit breakers are per-server. One bad server doesn't freeze the catalog, but a flapping one still slows the agent loop via backoff waits.
Agent as MCP server
Expose the agent's tools over MCP so Claude Desktop, Cursor, Zed, or any other MCP-speaking client can use them. Stdio transport; the agent runs as a child process of the consuming client.
Source: crates/mcp/src/server/, crates/core/src/agent/mcp_server_bridge.rs.
Config
# config/mcp_server.yaml
enabled: true
name: agent
allowlist: [] # empty = every native tool; populated = strict allowlist
expose_proxies: false # set true to also expose ext_* and mcp_* proxy tools
auth_token_env: "" # optional env var holding a shared bearer token
| Field | Default | Purpose |
|---|---|---|
| enabled | false | Must be true for the server subcommand to start. |
| name | "agent" | Reported as serverInfo.name in handshake. |
| allowlist | [] | Empty = all native tools. Populated = only these names reach the MCP client. Globs (memory_*) supported. |
| expose_proxies | false | Whether ext_* (extension) and mcp_* (upstream MCP) proxy tools are surfaced. |
| auth_token_env | "" | If set, the initialize request must present this token; unauthenticated clients get rejected. |
Running it
agent mcp serve --config ./config
The process reads JSON-RPC from stdin and writes responses to stdout — exactly the shape Claude Desktop, Cursor, etc. expect.
Claude Desktop example
~/Library/Application Support/Claude/claude_desktop_config.json:
{
"mcpServers": {
"nexo": {
"command": "/usr/local/bin/agent",
"args": ["mcp", "serve", "--config", "/srv/nexo-rs/config"],
"env": {
"ANTHROPIC_API_KEY": "sk-ant-..."
}
}
}
}
The Anthropic client spawns the agent, handshakes, and then every agent tool shows up in the conversation's tool list.
Wire flow
sequenceDiagram
participant IDE as MCP client (Claude Desktop)
participant A as agent mcp serve
participant TR as ToolRegistry
participant AG as Agent tools
IDE->>A: initialize (auth_token if configured)
A-->>IDE: capabilities + serverInfo (name, version)
IDE->>A: notifications/initialized
loop every turn
IDE->>A: tools/list
A->>TR: filtered by allowlist + expose_proxies
A-->>IDE: tool defs
IDE->>A: tools/call {name, args}
A->>AG: invoke tool
AG-->>A: result
A-->>IDE: content blocks
end
Tool exposure rules
flowchart TD
ALL[every tool registered in ToolRegistry]
ALL --> FILT1{allowlist<br/>empty?}
FILT1 -->|yes| NATIVE[keep native tools only]
FILT1 -->|no| GLOB[keep tools matching allowlist]
NATIVE --> FILT2{expose_proxies?}
GLOB --> FILT2
FILT2 -->|yes| OUT[include ext_* and mcp_* too]
FILT2 -->|no| SKIP[drop ext_* and mcp_*]
OUT --> EMIT[tools/list response]
SKIP --> EMIT
- Native tools: memory_*, whatsapp_*, telegram_*, browser_*, forge_*, etc.
- Proxy tools: ext_<id>_<tool> for extensions, <server>_<tool> for upstream MCP. Hidden by default to avoid proxying an external server through to another external client.
Capabilities advertised
- tools: always
- resources: advertised only if the agent exposes any via the server handler (phase 12.5 puts the groundwork in, consumer features follow)
- prompts: reserved, not advertised yet
- logging: conditional on handler implementation
Auth
When auth_token_env is set, the initialize request must present
the token (via a server-specific header convention or as an _meta
field). Clients that don't know the token get rejected before
anything else happens. Useful when the agent is launched through a
shared-host proxy rather than a local command: spawn.
Security model
- Read-only by default? No. The server exposes whatever the allowlist permits. Model it explicitly:
  allowlist:
    - memory_recall # read memory
    - memory_store # write memory (remove for read-only)
- Outbound channels (whatsapp_send_message, telegram_send_message) will send real messages from the agent's configured accounts. Include them in the allowlist only if the IDE user should be able to do that.
- expose_proxies: true is transitive power. It gives the IDE the full tool set of every extension and upstream MCP server too.
Gotchas
- Allowlist globs match tool names, not prefixes. memory_* matches memory_recall and memory_store but not memory_history (phase 10.9 tool). Write the pattern to match the real set.
- No per-IDE-user identity. The server has one identity = the agent's configured credentials. If multiple humans share the IDE, they share the agent's blast radius.
- Proxies forward the agent's rate limits. Calling whatsapp_send_message through the MCP server is the same as an agent calling it: it counts against the same WhatsApp rate bucket.
MCP channels — inbound surfaces from Slack / Telegram / iMessage
An MCP channel is any MCP server that declares the
experimental['nexo/channel'] capability and pushes user
messages into the agent via notifications/nexo/channel. The
runtime treats those messages as trusted inbound: it wraps them
in <channel source="...">…</channel> XML and delivers them
through the same intake lane as a paired WhatsApp / Telegram /
email message.
Outbound is the mirror image: the agent invokes the server's
send_message tool (or the operator-configured equivalent) via
the channel_send LLM tool. Per-server permission relay lets a
user approve risky tools from their phone via a structured
yes <id> / no <id> reply.
This page covers the operator-facing surface. For the schema
details see agents.channels in the YAML reference.
Why channels
Channels turn the agent from a thing you ask things on a terminal into a thing that lives in the platforms your team already uses. The same primitives that drive chat-side intake (pairing, dispatch policy, per-binding rate limits) apply to channel inbound — channels are not a special case for the gates that decide whether a sender is trusted.
YAML shape
agents:
- id: kate
channels:
enabled: true
max_content_chars: 16000
default_rate_limit:
rps: 5.0
burst: 20
approved:
- server: slack
plugin_source: slack@anthropic
outbound_tool_name: chat.postMessage
rate_limit:
rps: 10.0
burst: 50
- server: telegram
# plugin_source omitted — accept any installed source
# outbound_tool_name omitted — defaults to "send_message"
# rate_limit omitted — inherits default_rate_limit
inbound_bindings:
- plugin: telegram
instance: kate_tg
allowed_channel_servers:
- slack
- telegram
The 5-step gate
Every channel registration runs through a 5-step filter:
1. Capability: server declared experimental['nexo/channel'].
2. Killswitch: agents.channels.enabled = true. Hot reloadable.
3. Per-binding session allowlist: server name is in the binding's allowed_channel_servers.
4. Plugin source verification: when the approved entry declares plugin_source, the runtime's stamp must match exactly. Catches a malicious plugin clone with a different source.
5. Approved allowlist: server appears in agents.channels.approved. Operators can separate "binding may route through this server" (gate 3) from "we vetted the server itself" (gate 5).
Each gate emits a typed Skip { kind, reason } on failure so
debug output points at the exact YAML knob to fix.
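The five gates can be sketched as an ordered chain of early returns. The struct and variant names below are hypothetical (the source only names the Skip { kind, reason } shape), and the reason string is omitted for brevity:

```rust
#[derive(Debug, PartialEq, Eq)]
enum SkipKind {
    Capability,
    Disabled,
    BindingAllowlist,
    PluginSource,
    NotApproved,
}

/// One entry of agents.channels.approved (illustrative shape).
struct Approved<'a> {
    server: &'a str,
    plugin_source: Option<&'a str>,
}

/// What the runtime knows about a server asking to register.
struct Candidate<'a> {
    server: &'a str,
    declares_channel_capability: bool,
    /// The runtime's stamp for the plugin that installed the server.
    plugin_source: Option<&'a str>,
}

fn gate(
    c: &Candidate,
    channels_enabled: bool,
    allowed_channel_servers: &[&str],
    approved: &[Approved],
) -> Result<(), SkipKind> {
    // 1. Capability
    if !c.declares_channel_capability {
        return Err(SkipKind::Capability);
    }
    // 2. Killswitch
    if !channels_enabled {
        return Err(SkipKind::Disabled);
    }
    // 3. Per-binding session allowlist
    if !allowed_channel_servers.contains(&c.server) {
        return Err(SkipKind::BindingAllowlist);
    }
    // 4. Plugin source verification (only when the entry declares one)
    let entry = approved.iter().find(|a| a.server == c.server);
    if let Some(Approved { plugin_source: Some(expected), .. }) = entry {
        if c.plugin_source != Some(*expected) {
            return Err(SkipKind::PluginSource);
        }
    }
    // 5. Approved allowlist
    if entry.is_none() {
        return Err(SkipKind::NotApproved);
    }
    Ok(())
}
```

Because each gate returns its own kind, a failed registration points at exactly one YAML knob.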
Threading
Each (server, meta) pair maps to a stable agent session uuid
via ChannelSessionKey::derive. Threading priority goes
thread_ts (Slack) → thread_id → chat_id (Telegram, Discord)
→ conversation_id → room_id → channel_id → to. Without
any matching key the session collapses to one per server.
The mapping persists through the SQLite-backed
SqliteSessionRegistry so daemon restarts don't reset Slack
threads — the bot doesn't have to re-introduce itself every
reboot.
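The key-priority rule can be sketched as a first-match lookup over the notification's meta map. `pick_thread_key` is a hypothetical helper; how ChannelSessionKey::derive turns the chosen key into a session uuid is not described in the source and is not modelled here:

```rust
use std::collections::HashMap;

/// Threading keys in priority order, as listed above.
const THREAD_KEY_PRIORITY: [&str; 7] = [
    "thread_ts",
    "thread_id",
    "chat_id",
    "conversation_id",
    "room_id",
    "channel_id",
    "to",
];

/// Returns the highest-priority threading key present in `meta`, with its value.
/// None means the session collapses to one per server.
fn pick_thread_key<'a>(meta: &'a HashMap<String, String>) -> Option<(&'static str, &'a str)> {
    THREAD_KEY_PRIORITY
        .iter()
        .find_map(|key| meta.get(*key).map(|value| (*key, value.as_str())))
}
```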
Outbound + permission relay
channel_send(server, content, arguments?) resolves the
server's outbound tool from the RegisteredChannel snapshot
(default send_message, configurable per-server) and invokes
it through the existing MCP runtime. arguments is passed
verbatim; content populates a text key when the operator
hasn't supplied one.
When a tool requires approval AND the agent's binding has a
channel server with experimental['nexo/channel/permission'],
the runtime emits notifications/nexo/channel/permission_request
to the server and races every channel reply against the local
prompt. The first decision wins. Reply format the server
parses and forwards as a structured event:
^\s*(y|yes|n|no)\s+([a-km-z]{5})\s*$
The 5-letter ID uses the alphabet a-z minus l (visually
confusable with 1 / I in many fonts). Phone autocorrect's
capitalisation of the prefix is tolerated.
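The reply grammar is small enough to parse without a regex engine. The sketch below is one reading of the pattern: the yes/no word is matched case-insensitively (to tolerate autocorrect), while the ID is assumed to stay lowercase-only as in the character class. `Decision` and `parse_reply` are hypothetical names:

```rust
#[derive(Debug, PartialEq, Eq)]
enum Decision {
    Approve,
    Deny,
}

/// Parses "yes <id>" / "no <id>" replies. Returns None for anything else.
fn parse_reply(text: &str) -> Option<(Decision, String)> {
    let mut parts = text.split_whitespace();
    let verb = parts.next()?;
    let id = parts.next()?;
    if parts.next().is_some() {
        return None; // trailing tokens are not allowed
    }
    let decision = match verb.to_ascii_lowercase().as_str() {
        "y" | "yes" => Decision::Approve,
        "n" | "no" => Decision::Deny,
        _ => return None,
    };
    // Exactly five letters from a-z, excluding 'l'.
    let id_ok = id.len() == 5 && id.bytes().all(|b| b.is_ascii_lowercase() && b != b'l');
    if id_ok {
        Some((decision, id.to_string()))
    } else {
        None
    }
}
```

Note that an ID such as hello is rejected because it contains l, which the alphabet excludes.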
Rate limit
Per-server token bucket throttles inbound before parsing. When
the bucket is empty the message is dropped with a structured
warn — a noisy server cannot blow up memory or flood the
conversation context. Configure via default_rate_limit (global
ceiling) and per-server rate_limit (override). 0/0 means
unthrottled; the validator caps rps at 1000 to catch typos.
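A token bucket with these semantics can be sketched as below. The struct and method names are illustrative, time is passed in explicitly to keep the example deterministic, and the validator's rps cap is not modelled:

```rust
use std::time::Duration;

/// Illustrative per-server token bucket.
struct TokenBucket {
    rps: f64,
    burst: f64,
    tokens: f64,
    last: Duration,
}

impl TokenBucket {
    /// Starts full, so an initial burst of `burst` messages is admitted.
    fn new(rps: f64, burst: f64, now: Duration) -> Self {
        Self { rps, burst, tokens: burst, last: now }
    }

    /// Returns true when the message may proceed; false means "drop it".
    fn try_acquire(&mut self, now: Duration) -> bool {
        // 0/0 means unthrottled.
        if self.rps == 0.0 && self.burst == 0.0 {
            return true;
        }
        // Refill proportionally to elapsed time, capped at `burst`.
        let elapsed = now.saturating_sub(self.last).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.rps).min(self.burst);
        self.last = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

The caller checks the bucket before parsing the inbound payload, so a dropped message costs almost nothing.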
Hot-reload
Flipping channels.enabled or removing a server from
approved triggers a re-evaluation of every active
registration via ChannelRegistry::reevaluate. Entries that no
longer pass the gate get unregistered with a typed
SkipKind reason; surviving entries stay live without a daemon
restart.
LLM tools the agent gets
channel_list— list active registrations for the agent's current binding (read-only, auto-approve-friendly).channel_send— outbound wrapper.channel_status [server?]— diagnostic surface (registered? plugin source? permission relay? registered-at-ms?). Whenserveris omitted, returns one row per registered server.
All three resolve binding_id from ctx.effective.binding_index
at call time, falling back to agent_id for paths without a
binding match.
Audit
Every turn driven by a channel inbound writes
source: "channel:<server>" into the Phase 72 turn-log
(goal_turns table). Operators can answer "what came in via
Slack today?" with a single SQL filter on the indexed source
column.
See also
- Channel doctor (operator CLI)
- Concept: pairing. Channel inbound flows through the same pairing gate as WhatsApp / Telegram inbound, so a sender that hasn't been allowlisted will see a [pairing] denial just like any other surface.
MCP server (HTTP + SSE)
The agent can expose its own tools as an MCP server
so other clients (Claude Desktop, Cursor, Zed, custom IDE plugins,
remote consumers, third-party plugins like the upcoming
nexo-marketing extension) can call them. The transport ships in
two flavours, both backed by the same Dispatcher and so both
share identical wire-level behaviour:
| Transport | Status | Path | Use case |
|---|---|---|---|
| stdio | shipped (Phase 12.6) | agent mcp-server over the process stdio | Local IDE plugins that spawn the agent as a subprocess |
| HTTP+SSE (Streamable) | shipped (Phase 76.1) | POST /mcp, GET /mcp, DELETE /mcp | Remote clients, multi-process consumers, browser-based tools |
| Legacy SSE alias | optional (Phase 76.1) | GET /sse, POST /messages?sessionId=… | Older Claude Desktop builds still on the 2024-11-05 spec |
Phase 76.1 only ships the transport layer. Pluggable auth (Phase 76.3), multi-tenant isolation (76.4), per-tool rate-limit (76.5), durable sessions + SSE replay (76.8 — see "Session resumption" below), and TLS-in-process (76.13) are tracked separately. For production today, terminate TLS at nginx/caddy/Traefik in front of the loopback bind.
Enabling HTTP
Edit config/mcp_server.yaml:
mcp_server:
enabled: true
http:
enabled: true
bind: "127.0.0.1:7575"
auth_token_env: "NEXO_MCP_HTTP_TOKEN"
allow_origins:
- "http://localhost"
- "http://127.0.0.1"
body_max_bytes: 1048576
request_timeout_secs: 30
session_idle_timeout_secs: 300
max_sessions: 1000
enable_legacy_sse: false
Start the daemon as usual; agent mcp-server boots both stdio
and the HTTP listener when http.enabled: true.
Authentication (Phase 76.3)
The HTTP transport supports four pluggable authentication modes.
All modes share an anti-enumeration response shape: every rejection
returns the same 401 body
({"jsonrpc":"2.0","error":{"code":-32001,"message":"unauthorized"}})
so a probing client cannot distinguish missing token, wrong
token, expired token, unknown kid, etc. The reason is logged
via tracing::warn! only.
Configure via mcp_server.http.auth. The block is mutually
exclusive with the legacy auth_token_env; set one or the other.
kind: none
Disables authentication. The runtime refuses to boot if
bind is not a loopback address (127.0.0.0/8 or ::1). For
local dev only.
kind: static_token
Constant-time-compared bearer token.
mcp_server:
http:
enabled: true
auth:
kind: static_token
token_env: "NEXO_MCP_TOKEN"
The env var must resolve to a non-empty string at boot. Clients
present the token via either Authorization: Bearer <token> or
Mcp-Auth-Token: <token>. Comparison runs through subtle::ct_eq
to defeat timing side-channels; length-mismatch returns false
immediately (the length channel is not protected — pick a
fixed-length token).
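To illustrate what the comparison does, here is a hand-rolled equivalent. It is a teaching sketch only: production code should keep using the subtle crate, because an optimizing compiler is free to rewrite a loop like this one. `token_matches` is a hypothetical name:

```rust
/// Compares two tokens without an early exit on the first differing byte.
/// The length check returns immediately, so token length is not protected.
fn token_matches(presented: &[u8], expected: &[u8]) -> bool {
    if presented.len() != expected.len() {
        return false;
    }
    // Accumulate every byte difference; the loop always runs to the end.
    let mut diff = 0u8;
    for (a, b) in presented.iter().zip(expected) {
        diff |= a ^ b;
    }
    diff == 0
}
```

Because the running time depends only on the length, a fixed-length token leaves nothing for a timing probe to measure.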
kind: bearer_jwt
JWT validated against a remote JWKS endpoint with cache + stale-OK fallback.
mcp_server:
http:
enabled: true
auth:
kind: bearer_jwt
jwks_url: "https://idp.example.com/.well-known/jwks.json"
jwks_ttl_secs: 300
jwks_refresh_cooldown_secs: 10
algorithms: ["RS256"]
issuer: "https://idp.example.com/"
audiences: ["nexo-mcp"]
tenant_claim: "tenant_id"
scopes_claim: "scope"
leeway_secs: 30
Boot-time validation rejects:
- Empty algorithms list.
- algorithms containing none.
- Mixing HMAC (HS*) and asymmetric (RS* / ES* / PS*) algorithms in the same list (the algorithm-confusion CVE class).
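The three boot-time checks can be sketched as a single validation function. The function name and error strings are illustrative, and the case-insensitive match on none is an assumption:

```rust
/// Validates the `algorithms` list of a bearer_jwt auth block.
fn validate_algorithms(algs: &[&str]) -> Result<(), String> {
    if algs.is_empty() {
        return Err("algorithms must not be empty".to_string());
    }
    if algs.iter().any(|a| a.eq_ignore_ascii_case("none")) {
        return Err("algorithm 'none' is not allowed".to_string());
    }
    let has_hmac = algs.iter().any(|a| a.starts_with("HS"));
    let has_asymmetric = algs
        .iter()
        .any(|a| a.starts_with("RS") || a.starts_with("ES") || a.starts_with("PS"));
    // Mixing families enables algorithm-confusion attacks, where a public key
    // is replayed as an HMAC secret.
    if has_hmac && has_asymmetric {
        return Err("cannot mix HMAC and asymmetric algorithms".to_string());
    }
    Ok(())
}
```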
JWKS robustness:
- The cache uses single-flight refresh (one in-flight HTTP fetch per kid, others wait on tokio::sync::Notify).
- Refresh attempts are rate-limited by jwks_refresh_cooldown_secs.
- If a refresh fails and a previously-cached key for the same kid exists, the stale key is reused and a warn! line is emitted (the IdP is allowed transient outages).
- If no usable cached key is available, the request returns HTTP 503 (-32099 authentication backend unavailable) rather than 401, since the failure is on our side.
The Principal produced by a successful JWT validation carries
tenant_id, subject, and scopes — those flow into
DispatchContext.principal and are available to handlers.
kind: mutual_tls (mode: from_header)
mTLS terminated by a reverse proxy (nginx, Caddy, Traefik). The proxy validates the client cert and forwards the CN/SAN via a trusted header.
mcp_server:
http:
enabled: true
bind: "127.0.0.1:7575" # MUST be loopback in this mode
auth:
kind: mutual_tls
mode: from_header
header_name: "X-Client-Cert-Cn"
cn_allowlist:
- "agent-1.internal"
- "agent-2.internal"
The runtime refuses to boot when bind is not loopback in
this mode — without that constraint any internet client could
forge the header. cn_allowlist is exact-match (no glob, no
substring).
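Both boot-time rules can be expressed with the standard library alone. The helper names are hypothetical; the loopback test relies on `IpAddr::is_loopback`, which covers 127.0.0.0/8 and ::1:

```rust
use std::net::{AddrParseError, SocketAddr};

/// Parses a `bind` value such as "127.0.0.1:7575" and reports whether it is loopback.
fn bind_is_loopback(bind: &str) -> Result<bool, AddrParseError> {
    let addr: SocketAddr = bind.parse()?;
    Ok(addr.ip().is_loopback())
}

/// Exact-match allowlist check: no globbing, no substring matching.
fn cn_allowed(cn: &str, allowlist: &[&str]) -> bool {
    allowlist.iter().any(|allowed| *allowed == cn)
}
```

The same loopback check applies to kind: none, which also refuses to boot on a non-loopback bind.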
Backward compatibility
The legacy mcp_server.http.auth_token_env field still works.
When set with no auth block, the runtime promotes it to
AuthConfig::StaticToken and emits a tracing::warn! with a
deprecation hint. Setting both auth and auth_token_env
simultaneously fails fast at boot.
Tenant isolation (Phase 76.4)
Every authenticated request carries a validated TenantId on its
Principal. The tenant flows from the auth boundary into
DispatchContext::tenant(), and from there into helpers that
namespace filesystem paths and SQLite databases.
Origin of the tenant id
The tenant id is always server-derived from the Principal. A
tool must never read tenant_id from its own arguments — that
would let a caller forge a tenant tag. Pattern ported from
upstream agent CLI:
the client passes only repo, the organizationId is validated
on the server side from the Bearer token. Nexo follows the same
discipline.
How each auth mode derives the tenant
| Mode | Source | Default | Failure |
|---|---|---|---|
| none | hardcoded "local" | n/a | n/a |
| static_token | YAML tenant: field | "default" | invalid id → boot fail |
| bearer_jwt | JWT claim named by tenant_claim | reject if missing | invalid format → 401 (TenantClaimMissing) |
| mutual_tls (from_header) | cn_to_tenant map → CN itself | n/a | dotted CN without remap → 401 |
mcp_server:
http:
enabled: true
auth:
kind: static_token
token_env: NEXO_MCP_TOKEN
tenant: prod-corp # 76.4 — pin the tenant for this token
mcp_server:
http:
enabled: true
auth:
kind: mutual_tls
mode: from_header
cn_allowlist: [agent-1.internal, agent-2.internal]
cn_to_tenant: # 76.4 — required for dotted CNs
agent-1.internal: tenant-a
agent-2.internal: tenant-b
Dotted CNs (e.g. agent-1.internal) cannot be parsed as tenant ids on their own: the strict TenantId validator rejects the . character. Provide cn_to_tenant to remap, or rename the CN. We deliberately do not silently rewrite CNs (no automatic . → - substitution); silent rewrites of identity claims are a security smell.
TenantId validation
TenantId::parse(raw) enforces:
- No NUL bytes (C-syscall truncation vector).
- Input must already be in NFKC canonical form: fullwidth-form bypasses (e.g. Ｔｅｎａｎｔ, ．．／) are rejected.
- Percent-decode-and-recheck: %2e%2e%2f smuggling is rejected.
- Length: 1–64 bytes.
- Charset: [a-z0-9_-] only (lowercase ASCII; no dot, slash, uppercase, or whitespace).
- No leading or trailing _ or -.
These rules are direct ports of
upstream agent CLI
(sanitizePathKey).
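A compact sketch of these rules follows. It is not the crate's implementation: it leans on the observation that a strict ASCII charset check already rejects NUL bytes, fullwidth (non-ASCII) forms, and percent-encoded input (% is outside the set), whereas the real validator reportedly performs those checks as separate steps. The error strings are illustrative:

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
struct TenantId(String);

impl TenantId {
    fn parse(raw: &str) -> Result<TenantId, &'static str> {
        if raw.is_empty() || raw.len() > 64 {
            return Err("length must be 1-64 bytes");
        }
        // [a-z0-9_-] only; this also excludes NUL, '%', '.', '/', and non-ASCII.
        let charset_ok = raw
            .bytes()
            .all(|b| b.is_ascii_lowercase() || b.is_ascii_digit() || b == b'_' || b == b'-');
        if !charset_ok {
            return Err("charset must be [a-z0-9_-]");
        }
        let is_edge = |b: u8| b == b'_' || b == b'-';
        let bytes = raw.as_bytes();
        if is_edge(bytes[0]) || is_edge(bytes[bytes.len() - 1]) {
            return Err("no leading or trailing '_' or '-'");
        }
        Ok(TenantId(raw.to_string()))
    }
}
```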
Path scoping
use nexo_mcp::server::auth::{tenant_db_path, tenant_scoped_canonicalize, tenant_scoped_path};

// New writes: non-canonicalising, fast.
let p = tenant_scoped_path(&root, ctx.tenant(), "memory/notes.txt");

// Reads: symlink-aware; ports upstream agent CLI (validateTeamMemWritePath).
let p = tenant_scoped_canonicalize(&root, ctx.tenant(), "memory/notes.txt")?;
tenant_scoped_canonicalize performs a two-pass containment check:
- Lexical resolution rejects .. and absolute suffixes.
- realpath() on the deepest existing ancestor follows symlinks and asserts the resolved path is strictly under <root>/tenants/<tenant>/. Symlink loops (ELOOP), dangling symlinks, and sibling-tenant traversal (tenants/t-evil/... trying to pass as tenants/t/...) all surface as distinct TenantPathError variants.
Symlink defense is gated on cfg(unix) — Windows
std::fs::canonicalize returns UNC paths that break the prefix
check. Phase 76.4 production targets are Linux musl + Termux; full
Windows port is a follow-up.
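The first (lexical) pass can be sketched with std::path alone. `scoped_path` is a hypothetical helper, the tenant string is assumed to be validated already, and the symlink-aware second pass is deliberately left out:

```rust
use std::path::{Component, Path, PathBuf};

/// Joins `suffix` under <root>/tenants/<tenant>/, rejecting any component
/// that could lexically escape that directory.
fn scoped_path(root: &Path, tenant: &str, suffix: &str) -> Option<PathBuf> {
    let mut out = root.join("tenants").join(tenant);
    for component in Path::new(suffix).components() {
        match component {
            Component::Normal(part) => out.push(part),
            Component::CurDir => {}
            // `..`, a leading root, or a Windows prefix would leave the tenant dir.
            Component::ParentDir | Component::RootDir | Component::Prefix(_) => return None,
        }
    }
    Some(out)
}
```

Rejecting every .. outright is stricter than resolving it, but it keeps the check trivially auditable.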
TenantScoped<T> trip-wire
use nexo_mcp::server::auth::TenantScoped;

let db = TenantScoped::new(tenant_a.clone(), open_db_for("tenant-a"));
let raw = db.try_into_inner(&tenant_b)?; // → CrossTenantError
Thin wrapper that pairs a value with the tenant it was constructed
for. try_into_inner is the trip-wire: extracting under a wrong
tenant returns CrossTenantError rather than silently leaking. Not
a load-bearing security boundary on its own — the actual isolation
comes from path scoping at construction time — but cheap defense
in depth against future bugs.
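A wrapper with this behaviour takes only a few lines. The sketch below is self-contained and uses a simplified TenantId; the field names of CrossTenantError are assumptions:

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
struct TenantId(String);

#[derive(Debug, PartialEq, Eq)]
struct CrossTenantError {
    expected: TenantId,
    actual: TenantId,
}

/// Pairs a value with the tenant it was constructed for.
struct TenantScoped<T> {
    tenant: TenantId,
    value: T,
}

impl<T> TenantScoped<T> {
    fn new(tenant: TenantId, value: T) -> Self {
        Self { tenant, value }
    }

    /// Releases the value only to the tenant it was built for.
    fn try_into_inner(self, caller: &TenantId) -> Result<T, CrossTenantError> {
        if &self.tenant == caller {
            Ok(self.value)
        } else {
            Err(CrossTenantError { expected: self.tenant, actual: caller.clone() })
        }
    }
}
```

Since try_into_inner consumes the wrapper, a failed extraction cannot be retried under another tenant without rebuilding the value.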
SQLite layout
tenant_db_path(root, tenant) returns
<root>/tenants/<tenant>/state.sqlite3. One DB per tenant is the
strongest isolation rusqlite makes easy: a corrupted DB blasts
exactly one tenant. The production reference at
upstream agent CLI is
file-based + server-side scope enforcement; one-DB-per-tenant in
nexo is a step beyond that, suited to the in-process MCP server
shape.
Per-principal rate-limit (Phase 76.5)
A second rate-limit layer sits inside the dispatcher,
keyed on (tenant_id, tool_name). It complements the per-IP
layer (Phase 76.1, HTTP middleware): the per-IP layer rejects
broad floods at the HTTP level (429 + Retry-After); the
per-principal layer protects individual tools from a single
authenticated tenant exhausting them (200 + JSON-RPC -32099 + data.retry_after_ms).
Wire shape
The per-IP and per-principal layers return different wire shapes — intentional, since they fire at different stack levels:
| Layer | Status | Body |
|---|---|---|
| Per-IP (76.1, before parsing) | 429 Too Many Requests + Retry-After: <secs> header | minimal |
| Per-principal (76.5, inside dispatcher) | 200 OK + JSON-RPC error | {"jsonrpc":"2.0","error":{"code":-32099,"message":"rate limit exceeded","data":{"retry_after_ms":<n>}},"id":<request_id>} |
A client that handles both sees one shape (HTTP 429) for "you're
hitting the public IP gate too hard" and another (JSON-RPC -32099)
for "this tenant has used its tool quota". retry_after_ms is
the time until one token refills.
The Retry-After header parsing pattern (seconds → milliseconds)
is ported from
upstream agent CLI getRetryAfterMs.
Configuration
mcp_server:
http:
enabled: true
per_principal_rate_limit:
enabled: true # default
default: { rps: 100.0, burst: 200.0 } # applies to any tool not in per_tool
per_tool:
agent_turn: { rps: 10.0, burst: 20.0 } # heavier tool, lower limit
memory_search: { rps: 50.0, burst: 100.0 }
max_buckets: 50000 # hard cap on the bucket map
stale_ttl_secs: 300 # prune buckets idle > 5 min
warn_threshold: 0.8 # log when utilization ≥ 80%
When the per_principal_rate_limit block is omitted entirely,
the limiter is not built (zero overhead in the dispatcher
hot path). When the block is present but enabled: false,
the limiter is built but check() short-circuits.
What gets rate-limited
| JSON-RPC method | Gated by 76.5? |
|---|---|
| tools/call | yes |
| tools/list | no (list calls are cheap, no abuse vector beyond per-IP) |
| initialize | no (once per session, gated by auth + per-IP) |
| shutdown | no |
| resources/* | no (Phase 76.7 may add a separate gate) |
Stdio principals (auth_method: stdio) bypass the limiter
entirely — stdio is single-tenant by construction, so a
self-throttling agent makes no sense.
Bucket eviction
The bucket map is bounded by max_buckets (default 50 000) with
two eviction strategies running in parallel:
- Hard cap: when len() ≥ max_buckets and a fresh key is about to be inserted, the limiter evicts ~1% of the cap from the buckets with the smallest last_seen timestamp (LRU).
- Background sweeper: a tokio::spawn task wakes every 60 s and prunes any bucket with last_seen older than stale_ttl_secs. The task holds a Weak<Self> so it dies when the limiter is dropped.
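The two strategies can be sketched with a plain `HashMap`. This is a simplified, single-threaded illustration: the real limiter is described as using a concurrent map and a spawned task, and `BucketMap`, `touch`, and `sweep` are hypothetical names. Timestamps are plain seconds passed in by the caller.

```rust
use std::collections::HashMap;

pub struct BucketMap {
    max_buckets: usize,
    stale_ttl_secs: u64,
    last_seen: HashMap<String, u64>, // key -> last_seen (seconds)
}

impl BucketMap {
    pub fn new(max_buckets: usize, stale_ttl_secs: u64) -> Self {
        Self { max_buckets, stale_ttl_secs, last_seen: HashMap::new() }
    }

    /// Records activity for `key`. If the key is new and the map is full,
    /// evicts ~1% of the cap (at least one entry), oldest first.
    pub fn touch(&mut self, key: &str, now: u64) {
        if !self.last_seen.contains_key(key) && self.last_seen.len() >= self.max_buckets {
            let n = (self.max_buckets / 100).max(1);
            self.evict_oldest(n);
        }
        self.last_seen.insert(key.to_string(), now);
    }

    fn evict_oldest(&mut self, n: usize) {
        let mut entries: Vec<(u64, String)> =
            self.last_seen.iter().map(|(k, t)| (*t, k.clone())).collect();
        entries.sort(); // smallest last_seen first
        for (_, key) in entries.into_iter().take(n) {
            self.last_seen.remove(&key);
        }
    }

    /// What the background sweeper would do on each wake-up: drop every
    /// entry whose last activity is older than `stale_ttl_secs`.
    pub fn sweep(&mut self, now: u64) {
        let ttl = self.stale_ttl_secs;
        self.last_seen.retain(|_, seen| now.saturating_sub(*seen) <= ttl);
    }

    pub fn len(&self) -> usize { self.last_seen.len() }
    pub fn contains(&self, key: &str) -> bool { self.last_seen.contains_key(key) }
}
```

Sorting the whole map on eviction is O(n log n); that is acceptable in a sketch because eviction only runs when the cap is hit, but a production version would likely sample or keep an ordered index.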
This pattern is ported from OpenClaw
research/src/gateway/control-plane-rate-limit.ts:6-7,101-110
(10 k cap + 5-min stale-TTL pruner). The upstream CLI is
client-side only and does not implement server-side rate
limiting itself; we port the wire shape from the upstream CLI
and the eviction policy from OpenClaw.
Early-warning log
When a bucket's utilization crosses warn_threshold (default
0.8), the limiter emits a tracing::warn! with tenant, tool,
and the current utilization. Useful as an "approaching saturation"
signal so operators can pre-emptively raise a per-tool override
before clients start hitting -32099. Pattern from
upstream agent CLI EARLY_WARNING_CONFIGS, simplified to a single fixed threshold.
Per-principal concurrency cap + per-call timeout (Phase 76.6)
The third gate in the dispatch path. Sits after the rate-limit layer (76.5) and protects against a different failure mode: not "too many requests per second" but "too many requests in flight at once" — typical when handlers are slow and a client keeps firing.
| Layer | Measures | Wire when exceeded |
|---|---|---|
| 76.1 per-IP (HTTP middleware) | requests / second per source IP | HTTP 429 |
| 76.5 per-principal rate-limit | requests / second per (tenant, tool) | JSON-RPC -32099 |
| 76.6 per-principal concurrency cap | in-flight requests per (tenant, tool) | JSON-RPC -32002 |
| 76.6 per-call timeout | wall-clock duration of a single call | JSON-RPC -32001 |
A request must clear all four to reach the handler.
Wire shape
| Outcome | Code | Body data |
|---|---|---|
| Concurrency cap exceeded (queue wait expired) | -32002 | {"max_in_flight": <n>, "queue_wait_ms_exceeded": <n>} |
| Per-call timeout exceeded | -32001 | {"timeout_ms": <n>} |
-32002 is reserved for "operator-side overload" — distinct from
-32099 which means "you, the client, asked too much".
Configuration
mcp_server:
  http:
    enabled: true
    per_principal_concurrency:
      enabled: true                        # default
      default: { max_in_flight: 10 }       # per-(tenant, tool) default
      per_tool:
        agent_turn: { max_in_flight: 5, timeout_secs: 300 }
        memory_search: { max_in_flight: 20, timeout_secs: 5 }
      default_timeout_secs: 30             # fallback when per-tool omits
      queue_wait_ms: 5000                  # how long to wait for a permit
      max_buckets: 50000                   # hard cap on the semaphore map
      stale_ttl_secs: 300                  # prune buckets idle > 5 min
When the block is omitted entirely, the cap is not built
(zero overhead). When enabled: false, the cap is built but
acquire short-circuits to a no-op permit.
What gets capped
| JSON-RPC method | Capped by 76.6? |
|---|---|
| tools/call | yes |
| tools/list | no |
| initialize | no |
| shutdown | no |
| resources/* | no |
Stdio principals (auth_method: stdio) bypass the cap entirely
(single-tenant by construction).
How permits work
Each (tenant, tool) pair gets a tokio::sync::Semaphore with
max_in_flight permits. The dispatcher acquires one permit before
calling the handler and drops it (RAII) on:
- successful return,
- handler error,
- per-call timeout firing,
- client/session cancellation.
The permit is always released — there is no path that strands
one. Verified by tests/http_concurrency_load_test.rs and the
test fixture in PHASES.md (handler sleeps 60 s with timeout 5 s →
returns -32001 within ~5 s, semaphore back to full permits).
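The "always released" guarantee comes from tying the permit to a value with a `Drop` impl. The sketch below shows the idea with std atomics only; the server itself is described as using `tokio::sync::Semaphore`, so `Cap`, `Permit`, and `run_handler` are hypothetical stand-ins and there is no queue wait here.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

pub struct Cap {
    max_in_flight: usize,
    in_flight: AtomicUsize,
}

/// Holding a `Permit` counts as one in-flight call; dropping it frees the slot.
pub struct Permit {
    cap: Arc<Cap>,
}

impl Cap {
    pub fn new(max_in_flight: usize) -> Arc<Cap> {
        Arc::new(Cap { max_in_flight, in_flight: AtomicUsize::new(0) })
    }

    /// Non-blocking acquire, equivalent to `queue_wait_ms: 0`.
    pub fn try_acquire(cap: &Arc<Cap>) -> Option<Permit> {
        let mut current = cap.in_flight.load(Ordering::Acquire);
        loop {
            if current >= cap.max_in_flight {
                return None;
            }
            match cap.in_flight.compare_exchange(
                current, current + 1, Ordering::AcqRel, Ordering::Acquire,
            ) {
                Ok(_) => return Some(Permit { cap: Arc::clone(cap) }),
                Err(actual) => current = actual,
            }
        }
    }

    pub fn available(&self) -> usize {
        self.max_in_flight - self.in_flight.load(Ordering::Acquire)
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        self.cap.in_flight.fetch_sub(1, Ordering::AcqRel);
    }
}

/// The permit lives for the whole handler call and is released on every
/// exit path (success, error, early return) because `_permit` is dropped.
pub fn run_handler(
    cap: &Arc<Cap>,
    handler: impl FnOnce() -> Result<u32, String>,
) -> Result<u32, String> {
    let _permit = Cap::try_acquire(cap)
        .ok_or_else(|| "-32002: concurrency cap exceeded".to_string())?;
    handler()
}
```

Because release happens in `Drop`, a handler that returns an error leaves the counter exactly where a successful one would.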
Queue wait
When all permits are taken, a new request waits up to
queue_wait_ms for one to free up. If the wait expires, the
request is rejected with -32002. queue_wait_ms: 0 means "reject
immediately if no permit is available" (no queueing).
Cancellation during the wait (HTTP client disconnect, session
shutdown, tokio::select! on the caller side) propagates: the
acquire returns Cancelled → dispatcher returns -32800 request cancelled rather than waiting out the full queue interval.
Per-call timeout
Independent of the concurrency cap. Wraps the handler future in
tokio::time::timeout(timeout_for(tool), ...). On elapse the
inner future is dropped at its next .await (cooperative
cancellation), the permit is released, and the dispatcher returns
-32001 with data.timeout_ms. Lookup priority for the timeout:
1. per_tool[<name>].timeout_secs
2. default.timeout_secs
3. default_timeout_secs
Hard cap on any timeout is 600 s (mirrors
http_config::MAX_REQUEST_TIMEOUT_SECS).
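The lookup order can be written as a short fallback chain. This is a sketch under assumptions: the struct names are hypothetical, and clamping to the 600 s cap at lookup time is a guess (the server may instead reject oversized values during config validation).

```rust
use std::collections::HashMap;

/// Mirrors the documented hard cap; the constant name is taken from the text.
pub const MAX_REQUEST_TIMEOUT_SECS: u64 = 600;

pub struct ToolLimits {
    pub max_in_flight: u32,
    pub timeout_secs: Option<u64>,
}

pub struct ConcurrencyConfig {
    pub default: ToolLimits,
    pub per_tool: HashMap<String, ToolLimits>,
    pub default_timeout_secs: u64,
}

impl ConcurrencyConfig {
    /// Resolves the per-call timeout for `tool` in priority order:
    /// per_tool entry, then default.timeout_secs, then default_timeout_secs.
    pub fn timeout_for(&self, tool: &str) -> u64 {
        let secs = self
            .per_tool
            .get(tool)
            .and_then(|limits| limits.timeout_secs)
            .or(self.default.timeout_secs)
            .unwrap_or(self.default_timeout_secs);
        // Assumption: oversized values are clamped rather than rejected here.
        secs.min(MAX_REQUEST_TIMEOUT_SECS)
    }
}
```

With the configuration shown earlier, `agent_turn` resolves to 300 s, `memory_search` to 5 s, and any other tool to the 30 s fallback.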
Bucket eviction
Same shape as 76.5: a hard cap (max_buckets, default 50 000)
with LRU eviction at insert + a background sweeper that runs
every 60 s and prunes entries with last_seen older than
stale_ttl_secs. The sweeper only drops entries whose
semaphore has all permits available — it never strands an
in-flight permit. Worst case: a tenant that always has at least
one call in flight never gets its entry pruned, bounded by the
hard cap LRU at insert time.
Reference patterns
- RAII permit + cancel-aware acquire: in-tree crates/mcp/src/client.rs:873-899 (76.1 client side).
- DashMap + sweeper + hard-cap eviction: Phase 76.5 per_principal_rate_limit.rs. We mirror the same shape with Semaphore in place of TokenBucket.
- tokio::select! cancellation: Phase 76.2 dispatch.rs:201-205 (biased; cancel; do_dispatch).
- AbortSignal/AbortController equivalent: upstream agent CLI and src/services/tools/toolExecution.ts:415-416. The upstream CLI does not implement server-side concurrency caps (it's a client), so only the cancellation propagation idea is portable.
- Anti-pattern (NOT ported): OpenClaw research/src/acp/control-plane/session-actor-queue.ts:6-37 uses an unbounded keyed-async-queue. Phase 76.6 explicitly rejects unbounded queues (max_buckets + queue_wait_ms together bound both memory and tail latency).
Server-side notifications + streaming (Phase 76.7)
Phase 76.7 closes the server→client notification loop on top of
the per-session SSE channel that Phase 76.1 already wired. Three
JSON-RPC notifications are now emitted by the in-tree dispatcher,
plus a fourth (notifications/progress) that tools opt into via
a streaming-aware handler method.
| Notification | Trigger | Wire shape |
|---|---|---|
| notifications/tools/list_changed | HttpServerHandle::notify_tools_list_changed() | {"jsonrpc":"2.0","method":"notifications/tools/list_changed"} |
| notifications/resources/list_changed | HttpServerHandle::notify_resources_list_changed() | {"jsonrpc":"2.0","method":"notifications/resources/list_changed"} |
| notifications/resources/updated | HttpServerHandle::notify_resource_updated(uri, contents) | {"jsonrpc":"2.0","method":"notifications/resources/updated","params":{"uri":<…>,"contents":<…>?}} |
| notifications/progress | tool calls progress.report(progress, total?, message?) | {"jsonrpc":"2.0","method":"notifications/progress","params":{"progressToken":<echoed>,"progress":<n>,"total":<n>?,"message":<…>?}} |
Capability advertisement
The default McpServerHandler::capabilities() now returns:
{
"tools": { "listChanged": true },
"resources": { "listChanged": true, "subscribe": true }
}
Implementors that don't support subscriptions can override the method.
Progress reporter
A tool that wants to emit progress overrides call_tool_streaming
on its McpServerHandler (the default delegates to call_tool
and ignores the reporter):
async fn call_tool_streaming(
    &self,
    name: &str,
    args: Value,
    progress: ProgressReporter,
) -> Result<McpToolResult, McpError> {
    for i in 1..=100 {
        progress.report(i as f64, Some(100.0), Some(format!("step {i}")));
        do_one_step().await;
    }
    Ok(/* result */)
}
- progress.report is non-blocking. Drop-oldest on broadcast overflow; sender never panics if the SSE consumer disconnected.
- A 20 ms coalescing gate (per reporter) collapses storms: a tool that calls report 1 000 times in a tight loop produces ≤ 50 events/sec on the wire, with the most recent values emitted on each gate fire.
- The reporter is a noop when the originating request did not include params._meta.progressToken. Tools call report unconditionally without branching.
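The 20 ms gate is what bounds the wire rate to 1000 / 20 = 50 events per second. A minimal sketch of such a gate follows; `Coalescer` is a hypothetical name, time is passed in as milliseconds for determinism, and the explicit `flush` for the trailing value is an assumption (the real reporter may use a timer instead).

```rust
pub struct Coalescer {
    gate_ms: u64,
    last_emit_ms: Option<u64>,
    pending: Option<f64>,
}

impl Coalescer {
    pub fn new(gate_ms: u64) -> Self {
        Self { gate_ms, last_emit_ms: None, pending: None }
    }

    /// Records a progress value. Returns `Some(value)` when the gate is
    /// open and the most recent value should go on the wire now.
    pub fn report(&mut self, now_ms: u64, progress: f64) -> Option<f64> {
        self.pending = Some(progress); // newer values overwrite older ones
        let open = match self.last_emit_ms {
            None => true,
            Some(last) => now_ms.saturating_sub(last) >= self.gate_ms,
        };
        if open {
            self.last_emit_ms = Some(now_ms);
            self.pending.take()
        } else {
            None
        }
    }

    /// Emits whatever is still pending, e.g. when the tool call finishes.
    pub fn flush(&mut self) -> Option<f64> {
        self.pending.take()
    }
}
```

Feeding it one report per millisecond for a full second yields exactly 50 emissions, each carrying the latest value seen at that moment.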
resources/subscribe semantics
→ {"jsonrpc":"2.0","method":"resources/subscribe","params":{"uri":"file:///x"},"id":1}
← {"jsonrpc":"2.0","result":{},"id":1}
Subscriptions are stored in a DashSet<String> on the session,
cleared when the session is removed. The host pushes
notifications/resources/updated via
HttpServerHandle::notify_resource_updated(uri, contents); only
sessions whose subscription set contains uri receive the event.
Reference patterns
- upstream agent CLI: client-side consumption of tools/list_changed. The upstream CLI is client-side and does NOT implement server-side notifications; we port the wire shape and build the server-side broadcast ourselves on top of the existing broadcast::Sender<SessionEvent> per session (Phase 76.1, crates/mcp/src/server/http_session.rs:39-46).
- crates/mcp/src/server/http_transport.rs:815-820: Lagged event handling on SSE overflow. Reused as-is for notifications/progress storm scenarios.
Session resumption + SSE replay (Phase 76.8)
The HTTP transport persists every server-pushed SSE frame to a
SQLite event store so a reconnecting client can replay the gap via
the Last-Event-ID header instead of re-initialize-ing from
scratch.
Wire contract
- SSE frames carry id: <seq> (per-session monotonic, starting at 1) plus event: message / data: <json-rpc-frame>.
- Reconnect: GET /mcp with Mcp-Session-Id: <uuid> + Last-Event-ID: <seq>. The server replays persisted frames with seq > <Last-Event-ID> (capped at max_replay_batch) before the live broadcast loop attaches.
- Header absent → no replay (live only). Header present (any numeric value, including 0) → replay everything above.
- Unknown Mcp-Session-Id → HTTP 404 + JSON-RPC body {"error":{"code":-32001,"message":"Session not found"}}. This matches the prior agent CLI client's isMcpSessionExpiredError contract: a permanent failure that the client must recover from by re-initialize-ing.
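The replay rules reduce to a filter over the stored frames. The sketch below assumes frames are held as `(seq, json)` pairs in ascending `seq` order; `parse_last_event_id` and `replay` are hypothetical names, and treating a non-numeric header as absent is an assumption, since the text only covers numeric values.

```rust
/// Parses the `Last-Event-ID` header. Absent or non-numeric means "no replay".
pub fn parse_last_event_id(header: Option<&str>) -> Option<u64> {
    header.and_then(|v| v.trim().parse().ok())
}

/// Returns the frames to send before attaching to the live stream.
/// `stored` must be sorted by ascending sequence number.
pub fn replay(
    stored: &[(u64, String)],
    last_event_id: Option<u64>,
    max_replay_batch: usize,
) -> Vec<(u64, String)> {
    match last_event_id {
        // Header absent: live only.
        None => Vec::new(),
        // Header present: everything strictly above it, capped.
        Some(last) => stored
            .iter()
            .filter(|(seq, _)| *seq > last)
            .take(max_replay_batch)
            .cloned()
            .collect(),
    }
}
```

Because sequence numbers are strictly greater than the header value, `Last-Event-ID: 0` replays from the first persisted frame, while omitting the header replays nothing.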
Configuration
mcp_server:
  http:
    session_event_store:
      enabled: true                     # opt-in; default off when block omitted
      db_path: "data/mcp_sessions.db"   # absolute path recommended in prod
      max_events_per_session: 10000     # ring cap; oldest pruned every 1000 emits
      max_replay_batch: 1000            # hard ceiling per replay (max 10000)
      purge_interval_secs: 60           # background prune older than session_max_lifetime_secs
The session_max_lifetime_secs (default 24 h) gates how long
events live in the store. The background purge worker stops on
parent shutdown; SIGTERM does not block on it.
What does not survive a daemon restart
The in-memory HttpSession (broadcast channel + cancellation
token) is gone after a restart. Only events + subscriptions
persist on disk. A client that reconnects with its old session-id
gets the 404 + -32001 contract above and is expected to
re-initialize. Full session reattach (rehydrating
HttpSession entire) is parked as 76.8.b until a real client
asks for it — the upstream client treats expired sessions as
permanent failure, so the parity gap is intentional.
Observability
The same mcp_requests_total{outcome} and mcp_request_duration_seconds
metrics from 76.10 cover replay path requests transparently.
Replay-specific counters (mcp_replay_rows_total,
mcp_replay_skipped_total{reason="cap"}) are deferred to a
follow-up — file an issue if you need them sooner.
Reference patterns
- upstream agent CLI: wire format SSE id: + Last-Event-ID reconnect.
- upstream agent CLI: HTTP 404 + JSON-RPC -32001 permanent-failure contract.
- crates/agent-registry/src/turn_log.rs:64-89: in-tree TurnLogStore pattern mirrored verbatim for the SessionEventStore trait shape (Phase 72 alignment).
Observability + health (Phase 76.10)
The server emits Prometheus metrics for every dispatch path
plus enriched /healthz + /readyz responses. Metrics are
hand-rolled (LazyLock<DashMap<Key, AtomicU64>> module globals)
following the in-tree pattern (crates/web-search/src/telemetry.rs,
crates/llm/src/telemetry.rs) — render-on-scrape, no
prometheus crate dependency.
Metric inventory
| Metric | Type | Labels | Bumped at |
|---|---|---|---|
| mcp_requests_total | counter | tenant, tool, outcome | Dispatcher post-call (every tools/call outcome) |
| mcp_request_duration_seconds | histogram (8 buckets: 50/100/250/500/1k/2.5k/5k/10k ms) | tenant, tool | Dispatcher post-call |
| mcp_in_flight | gauge (signed) | tenant, tool | RAII InFlightGuard: increment on entry, decrement on every exit path (incl. panic unwind) |
| mcp_rate_limit_hits_total | counter | tenant, tool | 76.5 rate-limit reject |
| mcp_timeouts_total | counter | tenant, tool | 76.6 per-call timeout reject (-32001) |
| mcp_concurrency_rejections_total | counter | tenant, tool | 76.6 concurrency cap reject (-32002) |
| mcp_progress_notifications_total | counter | outcome (ok\|drop) | 76.7 reporter emit / drop-oldest overflow |
outcome enum (bounded set, byte-stable):
ok | error | cancelled | timeout | rate_limited | denied | panicked.
Cardinality discipline
Tool labels are bounded by MAX_DISTINCT_TOOLS = 256. Beyond that,
every new tool name collapses to "other". Pattern ported from
upstream agent CLI
(mcp__* tools collapsed to 'mcp'). Tenant labels are bounded
by TenantId::parse ([a-z0-9_-]{1,64}) — even a misconfigured
deployment can't blow up the metric.
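Both bounds are cheap to enforce at label time. The following sketch illustrates the idea only: `ToolLabels` and `is_valid_tenant_id` are hypothetical names, and the first-come allowlist (the first 256 distinct names win) is an assumption about how the cap is filled.

```rust
use std::collections::HashSet;

pub const MAX_DISTINCT_TOOLS: usize = 256;

/// Tracks which tool names already own a label slot.
pub struct ToolLabels {
    seen: HashSet<String>,
    max: usize,
}

impl ToolLabels {
    pub fn with_cap(max: usize) -> Self {
        Self { seen: HashSet::new(), max }
    }

    /// Returns the label to use for `tool`: its own name while slots
    /// remain, `"other"` once the cap has been reached.
    pub fn label(&mut self, tool: &str) -> String {
        if self.seen.contains(tool) {
            return tool.to_string();
        }
        if self.seen.len() < self.max {
            self.seen.insert(tool.to_string());
            return tool.to_string();
        }
        "other".to_string()
    }
}

/// Checks the documented tenant shape `[a-z0-9_-]{1,64}` without a regex crate.
pub fn is_valid_tenant_id(s: &str) -> bool {
    (1..=64).contains(&s.len())
        && s.bytes()
            .all(|b| b.is_ascii_lowercase() || b.is_ascii_digit() || b == b'_' || b == b'-')
}
```

A tool that already has a slot keeps its own label even after the cap is hit, so existing time series stay stable.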
correlation_id propagation
The HTTP transport extracts X-Request-ID from request headers
(or generates a UUIDv4 when absent), echoes it in the response
header, and stamps it on DispatchContext.correlation_id. The
dispatcher logs it on every mcp.dispatch span:
INFO mcp.dispatch{tenant=acme tool=agent_turn correlation_id=4d8c...} ...
Client-supplied values longer than 128 chars are replaced with a fresh UUIDv4 — don't trust unbounded headers.
/healthz vs /readyz
/healthz (port from Phase 9.3): liveness only, returns
200 {"status":"ok"} as long as the process is alive.
/readyz: structured readiness check with cached snapshot
(TTL 5 s — absorbs scrape thundering-herd):
{
"ready": true,
"checks": {
"broker": true,
"sessions_capacity_ok": true
}
}
Returns HTTP 200 when ready is true, 503 otherwise. Operators
should hit /readyz from k8s readinessProbe and /healthz
from livenessProbe.
Reference patterns
- Cardinality bounding: upstream agent CLI (MCP tool collapsing) and :281-299 (model-name normalisation). Direct port: 256-tool allowlist + "other" collapse.
- In-tree precedent: crates/web-search/src/telemetry.rs:14-260 (8-bucket histogram layout), crates/core/src/telemetry.rs:483-557 (aggregator).
- Anti-pattern flagged: crates/poller/src/telemetry.rs:74-94 uses user-provided job_id: String as a label, which can grow unboundedly. Phase 76.10 deliberately avoids unbounded labels.
Defaults and hardening
HttpTransportConfig::validate() refuses to boot the HTTP
listener when the operator picks an insecure combination:
- Non-loopback bind without auth_token_env.
- Non-loopback bind with empty allow_origins.
- Non-loopback bind with allow_origins: ["*"].
- body_max_bytes above the 16 MiB hard cap.
- session_idle_timeout_secs above 86 400 s (24 h hard cap).
- request_timeout_secs above 600 s.
- session_max_lifetime_secs < session_idle_timeout_secs.
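The refusal rules above translate directly into a validation function. This is a sketch, not `HttpTransportConfig` itself: `HttpConfigSketch`, its field types, the error strings, and the values in `loopback_default` are all hypothetical.

```rust
use std::net::SocketAddr;

const BODY_MAX_BYTES_CAP: u64 = 16 * 1024 * 1024;
const SESSION_IDLE_TIMEOUT_CAP_SECS: u64 = 86_400;
const REQUEST_TIMEOUT_CAP_SECS: u64 = 600;

pub struct HttpConfigSketch {
    pub bind: SocketAddr,
    pub auth_token_env: Option<String>,
    pub allow_origins: Vec<String>,
    pub body_max_bytes: u64,
    pub session_idle_timeout_secs: u64,
    pub session_max_lifetime_secs: u64,
    pub request_timeout_secs: u64,
}

impl HttpConfigSketch {
    /// Illustrative defaults for a loopback-only listener (values assumed).
    pub fn loopback_default() -> Self {
        Self {
            bind: "127.0.0.1:7575".parse().expect("valid socket address"),
            auth_token_env: None,
            allow_origins: Vec::new(),
            body_max_bytes: 1024 * 1024,
            session_idle_timeout_secs: 300,
            session_max_lifetime_secs: 86_400,
            request_timeout_secs: 30,
        }
    }

    pub fn validate(&self) -> Result<(), String> {
        if !self.bind.ip().is_loopback() {
            if self.auth_token_env.is_none() {
                return Err("non-loopback bind requires auth_token_env".into());
            }
            if self.allow_origins.is_empty() {
                return Err("non-loopback bind requires allow_origins".into());
            }
            if self.allow_origins.iter().any(|o| o == "*") {
                return Err("allow_origins must not contain \"*\"".into());
            }
        }
        if self.body_max_bytes > BODY_MAX_BYTES_CAP {
            return Err("body_max_bytes above 16 MiB".into());
        }
        if self.session_idle_timeout_secs > SESSION_IDLE_TIMEOUT_CAP_SECS {
            return Err("session_idle_timeout_secs above 24 h".into());
        }
        if self.request_timeout_secs > REQUEST_TIMEOUT_CAP_SECS {
            return Err("request_timeout_secs above 600 s".into());
        }
        if self.session_max_lifetime_secs < self.session_idle_timeout_secs {
            return Err("session_max_lifetime_secs < session_idle_timeout_secs".into());
        }
        Ok(())
    }
}
```

Running the checks before the listener is created means a misconfigured public bind fails at boot rather than serving unauthenticated traffic.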
Body parsing is hardened against pathological inputs:
- JSON nesting beyond depth 64 is rejected (-32600) BEFORE serde_json allocates, which defends against stack-overflow payloads.
- Batch (array) requests are rejected (MCP 2025-11-25 forbids them).
- method and params.name strings beyond 64 KiB are rejected.
- Notifications (id absent) yield 202 Accepted and never produce a response body.
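A depth pre-check can run over the raw bytes without building any JSON value. The sketch below is one way to do it, assuming a single linear scan that tracks string state so brackets inside strings are not counted; `exceeds_depth` is a hypothetical name and the scan does not validate the JSON itself.

```rust
/// Returns `true` if `input` opens more than `max_depth` nested
/// objects/arrays. Brackets inside string literals are ignored.
pub fn exceeds_depth(input: &[u8], max_depth: usize) -> bool {
    let mut depth = 0usize;
    let mut in_string = false;
    let mut escaped = false;
    for &b in input {
        if in_string {
            if escaped {
                escaped = false; // this byte is consumed by the escape
            } else if b == b'\\' {
                escaped = true;
            } else if b == b'"' {
                in_string = false;
            }
            continue;
        }
        match b {
            b'"' => in_string = true,
            b'{' | b'[' => {
                depth += 1;
                if depth > max_depth {
                    return true;
                }
            }
            b'}' | b']' => depth = depth.saturating_sub(1),
            _ => {}
        }
    }
    false
}
```

The scan allocates nothing and stops at the first byte that pushes the depth past the limit, so a payload of a million `[` characters is rejected after reading 65 of them when the limit is 64.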
Endpoints
POST /mcp
JSON-RPC over HTTP. initialize allocates a new session — the
response carries Mcp-Session-Id: <uuid>. Every subsequent
request MUST include the same header; missing or unknown
session id returns 404.
curl -i -H "Authorization: Bearer ${TOKEN}" \
-H 'Content-Type: application/json' \
-d '{"jsonrpc":"2.0","method":"initialize","params":{},"id":1}' \
http://127.0.0.1:7575/mcp
GET /mcp (SSE)
Opens a Server-Sent Events stream for unsolicited notifications
(tools/list_changed, future progress events). Required header
is Mcp-Session-Id. Stream events:
- event: message carries a JSON-RPC envelope from server to client.
- event: lagged carries payload {"dropped": <n>} when the per-session buffer (default 256) overflows due to a slow consumer.
- event: shutdown carries payload {"reason": "<…>"} on graceful daemon shutdown.
- event: end carries payload {"reason": "session_closed" | "max_age" | "expired"}.
DELETE /mcp
Tears down the session referenced by Mcp-Session-Id. Returns
204 on success, 404 if the id is unknown. SSE consumers
listening on the same session receive event: end with
reason: "session_closed".
GET /healthz and GET /readyz
Always reachable, never authenticated, no origin check.
/healthz returns 200 ok while the listener is alive.
/readyz returns 503 until the first successful initialize,
then 200 for the rest of the process lifetime.
Legacy SSE alias (enable_legacy_sse: true)
- GET /sse opens an SSE stream and emits a single event: endpoint whose data is the absolute URL the client must POST to (http://<host>/messages?sessionId=<uuid>). Subsequent server→client events come through the same stream.
- POST /messages?sessionId=X is equivalent to POST /mcp, but the JSON-RPC response is delivered on the SSE stream as an event: message rather than in the HTTP body. The HTTP response is 202 Accepted with an empty body.
Reverse-proxy guidance
In production, terminate TLS in front of the agent. Three recipes below.
Nginx
server {
listen 443 ssl http2;
server_name mcp.example.com;
ssl_certificate /etc/letsencrypt/live/mcp.example.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/mcp.example.com/privkey.pem;
location /mcp {
proxy_pass http://127.0.0.1:7575;
proxy_http_version 1.1;
proxy_buffering off; # keep SSE responsive
proxy_read_timeout 1h; # SSE long-poll
proxy_set_header Host $host;
proxy_set_header X-Forwarded-For $remote_addr;
proxy_set_header X-Forwarded-Proto $scheme;
}
location /healthz {
proxy_pass http://127.0.0.1:7575;
proxy_http_version 1.1;
}
location /readyz {
proxy_pass http://127.0.0.1:7575;
proxy_http_version 1.1;
}
}
Caddy (v2)
Caddy auto-provisions Let's Encrypt certificates. Minimal
Caddyfile:
mcp.example.com {
reverse_proxy /mcp* 127.0.0.1:7575
reverse_proxy /healthz 127.0.0.1:7575
reverse_proxy /readyz 127.0.0.1:7575
# SSE needs these tuned:
@sse path /mcp
header @sse Cache-Control no-store
header @sse X-Accel-Buffering no
}
Traefik (v3)
YAML static config snippet:
entryPoints:
  websecure:
    address: ":443"
    http:
      tls:
        certResolver: letsencrypt
http:
  routers:
    mcp:
      rule: "Host(`mcp.example.com`)"
      entryPoints: ["websecure"]
      service: mcp-backend
      tls:
        certResolver: letsencrypt
  services:
    mcp-backend:
      loadBalancer:
        servers:
          - url: "http://127.0.0.1:7575"
With Docker labels (Compose):
services:
  nexo-mcp:
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.mcp.rule=Host(`mcp.example.com`)"
      - "traefik.http.routers.mcp.entrypoints=websecure"
      - "traefik.http.routers.mcp.tls.certresolver=letsencrypt"
      - "traefik.http.services.mcp.loadbalancer.server.port=7575"
      # SSE: disable buffering on the MCP route
      - "traefik.http.middlewares.mcp-sse.buffering.maxRequestBodyBytes=0"
      - "traefik.http.routers.mcp.middlewares=mcp-sse"
mTLS (mutual TLS)
For in-VPC or zero-trust deployments where the MCP server must authenticate the client via certificate:
server {
listen 443 ssl http2;
server_name mcp.internal.example.com;
ssl_certificate /etc/mcp/server.crt;
ssl_certificate_key /etc/mcp/server.key;
ssl_client_certificate /etc/mcp/client_ca.crt;
ssl_verify_client on;
ssl_verify_depth 2;
error_page 495 /_mtls_fail;
location /_mtls_fail {
internal;
return 400 "client certificate required\n";
}
location /mcp {
proxy_pass http://127.0.0.1:7575;
proxy_http_version 1.1;
proxy_buffering off;
proxy_read_timeout 1h;
proxy_set_header Host $host;
proxy_set_header X-Forwarded-For $remote_addr;
proxy_set_header X-Client-Cert-Subject $ssl_client_s_dn;
}
}
Caddy mTLS:
mcp.internal.example.com {
tls /etc/mcp/server.crt /etc/mcp/server.key {
client_auth {
mode require_and_verify
trusted_ca_cert_file /etc/mcp/client_ca.crt
}
}
reverse_proxy 127.0.0.1:7575
}
Note: mTLS provides transport-level authentication. When the proxy enforces client certificates, the MCP server's application-layer token/auth requirement can be relaxed (validate accepts tls.client_ca_path as a substitute for auth_token).
In-process TLS (server-tls feature)
For deployments that can't/won't run a reverse proxy, the crate ships
an optional server-tls feature:
# Cargo.toml
nexo-mcp = { version = "...", features = ["server-tls"] }
# config/mcp_server.yaml
mcp_server:
  enabled: true
  http:
    tls:
      cert_path: /etc/mcp/server.crt
      key_path: /etc/mcp/server.key
      client_ca_path: /etc/mcp/client_ca.crt   # optional: enables mTLS
Current status: the YAML schema and config validation accept the
tls block. The runtime in-process TLS listener is blocked on axum
0.7's serve() which only accepts TcpListener; full support lands
with the axum 0.8 upgrade (generic Listener trait). Today, use the
reverse-proxy recipes above and leave the tls block empty.
The agent's per-IP rate limiter trusts X-Forwarded-For only when
the listener is bound to loopback (operator behind a proxy);
otherwise the direct peer IP is authoritative.
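The trust rule can be sketched as a single function. Everything here beyond the stated rule is an assumption: `rate_limit_ip` is a hypothetical name, and picking the right-most `X-Forwarded-For` entry (the one appended by the nearest proxy) is a guess about which hop the limiter uses.

```rust
use std::net::IpAddr;

/// Chooses the address the per-IP limiter should key on.
///
/// - Listener not on loopback: the header is ignored, the peer wins.
/// - Listener on loopback (behind a proxy): use the forwarded address,
///   falling back to the peer when the header is missing or malformed.
pub fn rate_limit_ip(bind: IpAddr, peer: IpAddr, x_forwarded_for: Option<&str>) -> IpAddr {
    if !bind.is_loopback() {
        return peer;
    }
    x_forwarded_for
        // Assumption: the right-most entry is the one our own proxy added.
        .and_then(|value| value.rsplit(',').next())
        .and_then(|entry| entry.trim().parse::<IpAddr>().ok())
        .unwrap_or(peer)
}
```

With the Nginx recipe above, which sets the header to `$remote_addr`, the header holds exactly one address, so the choice of entry does not matter in that setup.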
Exposing additional tools (Phase 76.16)
By default the MCP server exposes the five agent introspection tools
(who_am_i, what_do_i_know, my_stats, memory, session_logs).
To surface any subset of the Phase 79 agentic tools to external MCP
clients, add them to expose_tools in config/mcp_server.yaml:
mcp_server:
  expose_tools:
    - EnterPlanMode     # puts the session into read-only plan review mode
    - ExitPlanMode      # lifts plan-mode; requires operator approval
    - ToolSearch        # on-demand schema fetch for deferred tools
    - TodoWrite         # ephemeral intra-turn checklist
    - SyntheticOutput   # typed/structured output forcing
    - NotebookEdit      # Jupyter cell-level edits
    - RemoteTrigger     # webhook / NATS publish from inside a turn
Unknown names and the two gated tools (Config, Lsp) are skipped
with a tracing::warn! log at startup — the daemon continues
normally. The existing allowlist field in mcp_server.yaml still
applies on top of expose_tools, letting operators further restrict
which of the registered tools each client session may call.
Denied-by-default tools (Heartbeat, delegate, RemoteTrigger)
require an additional safe profile:
- List the tool in expose_denied_tools.
- Enable denied_tools_profile.enabled.
- Set the matching denied_tools_profile.allow.* = true.
Example (safe minimal override for reminders only):
mcp_server:
  auth_token_env: MCP_SERVER_TOKEN
  expose_tools: ["Heartbeat"]
  expose_denied_tools: ["Heartbeat"]
  denied_tools_profile:
    enabled: true
    require_auth: true
    require_delegate_allowlist: true
    require_remote_trigger_targets: true
    allow:
      heartbeat: true
      delegate: false
      remote_trigger: false
Security note: Config (self-config write-back) and Lsp (in-process rust-analyzer / pylsp) require additional infrastructure and are deferred to a later sub-phase. They are intentionally not enabled via expose_tools today.
Testing the server
Run the full conformance + fuzz suite (Phase 76.12):
cargo test -p nexo-mcp --features server-conformance
This runs:
- 5 proptest cases over parse_jsonrpc_frame: arbitrary bytes, strings, methods, depths, and batch arrays. Invariant: no panic.
- 11 HTTP conformance cases: MCP 2025-11-25 spec fixtures via HTTP transport.
- 11 stdio conformance cases: same fixtures via stdio transport, verifying transport parity.
For the load smoke test (50 sessions × 200 requests = 10 000 calls, p99 gate < 500 ms; takes ~5 s):
cargo test -p nexo-mcp --features server-conformance \
-- --include-ignored load_smoke
Coming in later sub-phases
- 76.13 ✅: TLS config schema + feature flag + nginx/caddy/Traefik/mTLS reverse-proxy recipes. In-process TLS listener deferred to axum 0.8 upgrade.
- 76.14 ✅: nexo mcp-server CLI ops: inspect, bench, tail-audit. All three subcommands wired and smoke-tested.
Track the rollout in PHASES.md
and the public surface diff in CLAUDE.md.
Building an MCP server extension
Phase 76.15: operator-friendly walk-through for forking the template-mcp-server skeleton into a domain-specific MCP server (e.g. nexo-marketing, nexo-crm). The companion chapter HTTP+SSE transport documents the production knobs (auth, multi-tenant, rate-limit, audit, resume); this chapter is the developer's quickstart.
When to build an MCP server extension
You want one when:
- You have a domain (marketing, CRM, billing, ops) with its own tools, types, and access policy that should NOT live inside the agent process.
- You want separate deployment + auth — for example, the marketing team owns the marketing MCP server and exposes it on their VPC; the agent process is shared infrastructure.
- You want third-party access — Claude Code, Cursor, custom scripts, or another agent connect over HTTP+SSE while the agent proxies through the same surface.
You DON'T want one when:
- The capability is shared by every agent in the workspace: ship it as a built-in tool inside crates/core/.
- The capability is a thin wrapper over one HTTP API: ship it as an agent extension (stdio JSON-RPC) instead. MCP servers carry per-call auth + rate-limit + audit overhead that's wasted on a single-tenant private endpoint.
The skeleton — extensions/template-mcp-server/
template-mcp-server/
├── Cargo.toml # depends on nexo-mcp (path dep in-tree, crates.io after copy)
├── plugin.toml # extension manifest (id, capabilities)
├── config.example.yaml # documented HTTP block ready to paste
├── README.md # quickstart + production checklist
└── src/
├── main.rs # boot stdio + optional HTTP via env var
└── tools.rs # one typed Echo tool using McpServerBuilder
The whole skeleton is under 250 LOC of Rust. It deliberately
stops short of multi-tenant + audit + rate-limit so the diff stays
readable; everything you need to enable those is in
config.example.yaml plus pointers to the operator chapter.
The 5-step fork
1. Copy + rename
cp -r extensions/template-mcp-server ~/code/nexo-marketing
cd ~/code/nexo-marketing
2. Cargo.toml
- Bump name to nexo-marketing (or whatever).
- Drop publish = false if you intend to release.
- Switch the nexo-mcp path dep to a published version:
nexo-mcp = "0.1.1" # was: { path = "../../crates/mcp" }
3. plugin.toml
- Bump id, name, description.
- List your tools under [capabilities].tools.
- Decide whether the agent's extension supervisor should fork the binary directly (keep transport.command) or whether you'll run it as a long-lived service (drop the line, register the URL in the agent's mcp_server.http block).
4. src/tools.rs
Replace the Echo tool with your domain logic. A typed tool is
three structs + one impl Tool:
#[derive(Deserialize, JsonSchema)]
pub struct SendEmailArgs {
    pub to: String,
    pub subject: String,
    pub body: String,
}

#[derive(Serialize, JsonSchema)]
pub struct SendEmailOut {
    pub message_id: String,
}

pub struct SendEmail { /* config: API key, smtp, etc. */ }

#[async_trait]
impl Tool for SendEmail {
    type Args = SendEmailArgs;
    type Output = SendEmailOut;

    fn name(&self) -> &str { "send_email" }
    fn description(&self) -> &str { "Send a transactional email." }

    async fn call(&self, args: Self::Args, _ctx: ToolCtx<'_>) -> Result<Self::Output, McpError> {
        // Call your provider; propagate errors as McpError variants.
        Ok(SendEmailOut { message_id: "...".into() })
    }
}
The schema for SendEmailArgs is derived once at registration
and cached; runtime cost per tools/call is one BTreeMap::get
on the tool name. The _ctx: ToolCtx<'_> parameter exposes
tenant, correlation_id, session_id, progress, cancel —
use them when you need multi-tenant routing or want to emit
notifications/progress for long-running calls.
5. Wire the agent
Two patterns:
Pattern A — child process (simplest). The agent's extension
supervisor forks your binary as a child and pipes stdio. Add to
the agent's config/agents.yaml:
extensions:
  - id: nexo-marketing
    command: "/path/to/nexo-marketing"
    transport: stdio
Pattern B — long-lived HTTP service. Run the binary as a
systemd unit on its own host. Configure the agent's
mcp_server.http block in config/mcp_server.yaml to point at
it. This unlocks per-tenant auth, audit, rate-limit, and lets
non-agent clients (Claude Code, Cursor) hit the same server.
Production checklist
Before exposing a forked server beyond loopback:
| Knob | Phase | Why |
|---|---|---|
| auth.kind: static_token or bearer_jwt | 76.3 | Non-loopback bind without auth refuses to boot |
| allow_origins: [...] (no *) | 76.1 | CORS hard-rejected on non-loopback bind |
| audit_log.enabled: true | 76.11 | Per-call durable trail; survives restart |
| per_principal_rate_limit.enabled: true | 76.5 | Cap noisy tenants before they exhaust paid APIs |
| per_principal_concurrency.enabled: true | 76.6 | Keep one tenant from starving others |
| session_event_store.enabled: true | 76.8 | SSE consumers can resume via Last-Event-ID |
| TLS in front (nginx/caddy/Traefik) | 76.13 (deferred) | Direct rustls parked; treat binary as cleartext |
Each row maps to a config block in
config.example.yaml.
Uncomment and fill in.
Quickstart smoke
# Fresh checkout, in-tree build:
cargo build -p template-mcp-server
# Stdio (what the agent's supervisor sees):
echo '{"jsonrpc":"2.0","method":"initialize","params":{},"id":1}' | \
./target/debug/template-mcp-server
# HTTP for direct curl/claude-mcp testing:
MCP_TEMPLATE_HTTP_BIND=127.0.0.1:7676 \
cargo run -p template-mcp-server &
curl -s -X POST http://127.0.0.1:7676/mcp \
-H 'Content-Type: application/json' \
-d '{"jsonrpc":"2.0","method":"initialize","params":{},"id":1}'
What's NOT in the template
These are deliberately out of scope to keep the skeleton small:
- Tool with notifications/progress: see crates/mcp/src/server/progress.rs and Phase 76.7 docs.
- Tool with notifications/tools/list_changed for hot-reload: the runtime can broadcast it via HttpServerHandle::notify_tools_list_changed(), but the template doesn't ship a sample of when to fire it.
- Resources / prompts surface: only tools are wired. See crates/mcp/src/bin/mock_mcp_server.rs for the full resources/list + resources/read shape.
- Custom error types: the template returns McpError via ?. Map your provider errors to specific JSON-RPC error codes (-32602, -32603, custom application codes ≥ -32000) when you need clients to distinguish them.
Reference patterns mined for this template
- upstream agent CLI: minimal stdio MCP server in the prior agent CLI source. The upstream CLI uses imperative setRequestHandler(SchemaName, async (req) => …) per spec method; we collapse that into one McpServerBuilder::tool(impl Tool) chain.
- upstream agent CLI: the upstream factory pattern returning a configured Server. Our build_handler closure plays the same role per transport.
- crates/mcp/src/bin/mock_mcp_server.rs: exhaustive in-tree reference for protocol corner cases (initialize, paginate, errors, notifications, resources, sampling). Read it when this template's surface stops being enough.
- research/extensions/firecrawl/: OpenClaw extension layout. Different model (TypeScript provider contracts) but informs the plugin.toml shape.
Gating by env / bins
Both kinds of skills (extension skills under extensions/ and local
skills under skills_dir) declare what they need to work. The
runtime checks those preconditions at load time and reacts
differently depending on skill kind.
The declaration
Both kinds use the same shape. For an extension, it lives in
plugin.toml:
[requires]
bins = ["ffmpeg", "ffprobe"]
env = ["OPENAI_API_KEY"]
For a local skill it lives in the YAML frontmatter of SKILL.md:
---
name: "Whisper transcription"
requires:
  bins: ["ffmpeg"]
  env: ["OPENAI_API_KEY"]
---
Check semantics (source: crates/extensions/src/manifest.rs
Requires::missing(), crates/core/src/agent/skills.rs):
- bins: each name is looked up on $PATH. On Windows, <bin>.exe is also tried.
- env: each name must be set and non-empty.
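These two checks are small enough to sketch in full. The code below is an illustration of the stated semantics, not the actual `Requires::missing()`: the free function `missing`, the `Missing` struct, and passing the PATH directories and the env lookup as parameters are all choices made here so the logic can be exercised without touching the real environment.

```rust
use std::path::PathBuf;

#[derive(Debug, Default, PartialEq)]
pub struct Missing {
    pub bins: Vec<String>,
    pub env: Vec<String>,
}

/// Reports which declared requirements are not satisfied.
///
/// `path_dirs` stands in for the split `$PATH`; `lookup_env` stands in
/// for `std::env::var(name).ok()`.
pub fn missing(
    bins: &[&str],
    env: &[&str],
    path_dirs: &[PathBuf],
    lookup_env: impl Fn(&str) -> Option<String>,
) -> Missing {
    // A binary counts as present if any PATH directory contains it
    // (or `<bin>.exe` when compiled for Windows).
    let on_path = |bin: &str| {
        path_dirs.iter().any(|dir| {
            dir.join(bin).is_file()
                || (cfg!(windows) && dir.join(format!("{bin}.exe")).is_file())
        })
    };
    Missing {
        bins: bins
            .iter()
            .copied()
            .filter(|bin| !on_path(*bin))
            .map(String::from)
            .collect(),
        // Unset and set-but-empty are both treated as missing.
        env: env
            .iter()
            .copied()
            .filter(|name| lookup_env(*name).map_or(true, |value| value.is_empty()))
            .map(String::from)
            .collect(),
    }
}
```

In a real caller, `path_dirs` could come from `std::env::split_paths` over the `PATH` variable and `lookup_env` from `std::env::var`.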
Two reactions, one mechanism
flowchart TD
CHECK[Requires::missing] --> ANY{missing bin<br/>or env?}
ANY -->|no| OK[proceed]
ANY -->|yes| KIND{skill kind}
KIND -->|extension| WARN[warn<br/>continue<br/>tools still registered]
KIND -->|local skill| SKIP[warn<br/>skip<br/>not injected into prompt]
| Skill kind | On missing preconditions |
|---|---|
| Extension | Warn log, still spawn + register tools. A subsequent tool call will fail visibly when the bin/env is absent. |
| Local skill | Warn log, do not inject into the system prompt. The LLM never hears the skill existed. |
Why the difference
A local skill is a description the LLM reads and internalizes —
"you have a transcription skill, call whisper_transcribe." If the
backing binary is missing, the tool call will fail. But the LLM was
told the capability exists, so it will keep trying. Not injecting
the skill prevents promising capabilities that can't be delivered.
An extension tool is observable: the LLM calls it, gets a
concrete error back ("command tesseract not found on PATH"), and
can adapt in the same turn. Warn-and-continue is the friendlier
behavior — the operator sees the warning and can fix the config
without the agent crash-looping.
Where this is logged
Both kinds emit the same structured warn log fields:
WARN skill=weather missing_bins=[] missing_env=[WEATHER_API_KEY]
"skill disabled: required env vars unset or empty"
WARN extension=docker-api missing_bins=[docker] missing_env=[]
"extension preflight: declared requires not satisfied (continuing anyway)"
Filter on missing_env or missing_bins to alert proactively.
Pre-deploy verification
Use the CLI:
agent ext doctor --runtime
This runs Requires::missing() for every discovered extension,
and with --runtime actually spawns each stdio extension to run
the handshake, so a broken extension surfaces before deploy rather
than on the first tool call.
For local skills, a failing agent turn logs all skipped skills — a dry run against the smallest scripted input gives you the same signal without needing a separate command.
Reserved env for secrets
Extensions receive a filtered copy of the host's env. Names matching
the secret-like patterns below are stripped before spawn
(crates/extensions/src/runtime/stdio.rs):
- Suffixes: _TOKEN, _KEY, _SECRET, _PASSWORD, _PASSWD, _PWD, _CREDENTIAL, _CREDENTIALS, _PAT, _AUTH, _APIKEY, _BEARER, _SESSION
- Substrings: PASSWORD, SECRET, CREDENTIAL, PRIVATE_KEY
Declaring an env in requires.env whitelists it past the
blocklist. That's the only supported way for an extension to
receive a secret env var. Gating and whitelisting come from the same
field — preconditions you declare travel alongside the value you
want.
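As a rough sketch of how the blocklist and the requires.env whitelist interact, the filter can be written as below. The function names are hypothetical, and matching case-insensitively is an assumption; the real logic lives in crates/extensions/src/runtime/stdio.rs and may differ.

```rust
const SECRET_SUFFIXES: &[&str] = &[
    "_TOKEN", "_KEY", "_SECRET", "_PASSWORD", "_PASSWD", "_PWD", "_CREDENTIAL",
    "_CREDENTIALS", "_PAT", "_AUTH", "_APIKEY", "_BEARER", "_SESSION",
];
const SECRET_SUBSTRINGS: &[&str] = &["PASSWORD", "SECRET", "CREDENTIAL", "PRIVATE_KEY"];

/// True when the variable name matches one of the secret-like patterns.
/// Assumption: names are compared case-insensitively.
pub fn is_secret_like(name: &str) -> bool {
    let upper = name.to_ascii_uppercase();
    SECRET_SUFFIXES.iter().any(|s| upper.ends_with(*s))
        || SECRET_SUBSTRINGS.iter().any(|s| upper.contains(*s))
}

/// Builds the env handed to an extension: secret-like names are dropped
/// unless the extension declared them in `requires.env`.
pub fn filter_env(host: &[(String, String)], requires_env: &[String]) -> Vec<(String, String)> {
    host.iter()
        .filter(|(key, _)| requires_env.contains(key) || !is_secret_like(key))
        .cloned()
        .collect()
}
```

With `requires.env = ["OPENAI_API_KEY"]`, a host env containing PATH, OPENAI_API_KEY, and GITHUB_TOKEN would reach the extension as PATH and OPENAI_API_KEY only.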
Write-gating in practice
Some shipped extensions gate destructive operations behind dedicated
flags — separate from requires.env:
| Extension | Write gate env var |
|---|---|
| docker-api | DOCKER_API_ALLOW_WRITE |
| proxmox | PROXMOX_ALLOW_WRITE |
| onepassword | OP_ALLOW_REVEAL (reveal vs metadata-only) |
| google | GOOGLE_ALLOW_SEND, GOOGLE_ALLOW_CALENDAR_WRITE, GOOGLE_ALLOW_DRIVE_WRITE, GOOGLE_ALLOW_TASKS_WRITE, GOOGLE_ALLOW_PEOPLE_WRITE |
These are not handled by the generic gating layer — the extension reads them itself and refuses destructive methods when unset. Good pattern to adopt when your own extension wraps an API with destructive endpoints.
Gotchas
- Empty env counts as missing. EXAMPLE_KEY= is treated the same as EXAMPLE_KEY unset. This is intentional — empty strings rarely mean "use the default" for a secret.
- requires.bins checks $PATH at discovery. A binary installed after the agent starts won't be picked up until restart — or until you run agent ext doctor --runtime as a secondary gate.
- Local-skill skip is silent to the LLM. If you expected a skill to be present and you don't see it in the system prompt, check the warn logs for the skip reason before debugging agent behavior.
Dependencies — modes and bin versions
A skill that depends on a CLI tool or an environment variable can
declare those needs in requires. The runtime resolves the
declarations at load time and decides whether to expose the skill,
hide it, or expose it with a visible warning the LLM can see.
---
name: ffmpeg-tools
requires:
bins: [ffmpeg]
env: [TRANSCODE_OUTPUT_DIR]
bin_versions:
ffmpeg: ">=4.0"
mode: strict # default
---
Modes
| Mode | When deps are missing | LLM sees the skill? |
|---|---|---|
| strict (default) | Skill is dropped | No |
| warn | Skill loads with a > ⚠️ MISSING DEPS … banner prepended to its body | Yes — with the warning inline |
| disable | Skill is always dropped, even when deps are satisfied | No |
Per-agent override
Operators override a skill's declared mode without editing the skill file:
agents:
- id: kate
skills: [ffmpeg-tools]
skill_overrides:
ffmpeg-tools: warn
Resolution order:
- agents.<id>.skill_overrides[<name>] (operator wins)
- Skill frontmatter requires.mode
- strict (built-in default)
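The resolution order and the modes table can be sketched together. The `Mode` and `Outcome` enums and both function names are hypothetical, chosen only to mirror the tables above.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Mode {
    Strict,
    Warn,
    Disable,
}

/// What the loader does with a skill once its mode is known.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Load,
    LoadWithBanner,
    Drop,
}

/// Operator override first, then the skill's frontmatter, then `strict`.
pub fn resolve_mode(
    overrides: &HashMap<String, Mode>,
    skill: &str,
    frontmatter: Option<Mode>,
) -> Mode {
    overrides
        .get(skill)
        .copied()
        .or(frontmatter)
        .unwrap_or(Mode::Strict)
}

/// Applies the modes table: `disable` always drops, `strict` drops only
/// when deps are missing, `warn` keeps the skill and adds a banner.
pub fn decide(mode: Mode, deps_missing: bool) -> Outcome {
    match (mode, deps_missing) {
        (Mode::Disable, _) => Outcome::Drop,
        (Mode::Strict, true) => Outcome::Drop,
        (Mode::Warn, true) => Outcome::LoadWithBanner,
        (_, false) => Outcome::Load,
    }
}
```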
Bin versions
requires.bin_versions adds a semver constraint on top of mere bin
presence. Failing the constraint is treated like a missing dep —
the active mode decides whether to skip or warn.
Constraint syntax
semver request strings:
| Want | Constraint |
|---|---|
| At least 4.0 | ">=4.0" |
| Any 4.x compatible release | "^4.0" |
| 4.x but no 5 | ">=4.0, <5.0" |
| Exact 4.2.1 | "=4.2.1" |
| Patch-compatible to 5.1.3 | "~5.1.3" |
Versions like 4.2 are normalized to 4.2.0 before comparison so
constraint matching works against partial outputs.
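To make the normalization step concrete, here is a std-only sketch that pads a partial version to three components and checks it against comma-separated comparison clauses. It deliberately covers only `>=`, `>`, `<=`, `<`, and `=`; caret and tilde constraints are left out, and a real implementation would more likely lean on a semver library. The function names are hypothetical.

```rust
pub type Version = (u64, u64, u64);

/// Parses "4", "4.2" or "4.2.1"; missing components default to 0.
pub fn normalize(raw: &str) -> Option<Version> {
    let mut parts = raw.trim().split('.');
    let major: u64 = parts.next()?.parse().ok()?;
    let minor: u64 = match parts.next() {
        Some(p) => p.parse().ok()?,
        None => 0,
    };
    let patch: u64 = match parts.next() {
        Some(p) => p.parse().ok()?,
        None => 0,
    };
    if parts.next().is_some() {
        return None; // more than three components
    }
    Some((major, minor, patch))
}

/// Returns `None` when the version or the constraint cannot be parsed
/// (including operators this sketch does not support).
pub fn satisfies(found: &str, constraint: &str) -> Option<bool> {
    let version = normalize(found)?;
    for clause in constraint.split(',') {
        let clause = clause.trim();
        // Two-character operators are tried first so ">=" is not read as ">".
        let (op, rest) = ["<=", ">=", "<", ">", "="]
            .iter()
            .find_map(|op| clause.strip_prefix(*op).map(|rest| (*op, rest)))?;
        let bound = normalize(rest)?;
        // Tuples compare lexicographically: major, then minor, then patch.
        let ok = match op {
            ">=" => version >= bound,
            ">" => version > bound,
            "<=" => version <= bound,
            "<" => version < bound,
            _ => version == bound,
        };
        if !ok {
            return Some(false);
        }
    }
    Some(true)
}
```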
Custom probe
Defaults: <bin> --version, regex \d+\.\d+(?:\.\d+)?. Override
when a tool emits something idiosyncratic:
requires:
bin_versions:
curl:
constraint: ">=8.0"
command: "--help"
regex: 'curl (\d+\.\d+(?:\.\d+)?)'
The shorthand form bin: ">=4.0" and the long form
bin: { constraint: …, command: …, regex: … } are both accepted.
Probe fail modes
| Reason | When |
|---|---|
| bin_not_found | Binary not on PATH |
| probe_failed | Spawn errored or timed out (5 s cap) |
| parse_failed | The default regex (or override) didn't match |
| constraint_unsatisfied | Found version doesn't match the constraint |
| invalid_constraint | Constraint string couldn't be parsed as semver |
Invalid constraints log at error level; the skill is treated as
having a missing dep — boot continues so a typo in one skill doesn't
take the whole agent down. Probes are cached process-wide by absolute
path so a bin shared across skills only spawns once.
Banner format
When mode: warn and any dep is missing, the skill body is rendered
to the LLM with this prefix:
> ⚠️ MISSING DEPS for skill `ffmpeg-tools`:
> - bin not found: ffmpeg
> - env unset: TRANSCODE_OUTPUT_DIR
> - version mismatch: ffmpeg requires >=4.0 (found 3.4.2)
> Calls into this skill may fail.
The LLM treats this like any other markdown context, so it has the information it needs to either avoid the skill or report a useful error to the user when a tool call fails.
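A small sketch of how such a banner could be assembled from a list of findings. The `MissingDep` enum and `render_banner` are hypothetical names; only the output layout is taken from the example above.

```rust
/// One unmet dependency, as it would appear in the banner.
pub enum MissingDep {
    Bin(String),
    Env(String),
    Version { bin: String, constraint: String, found: String },
}

/// Renders the markdown blockquote that is prepended to the skill body.
pub fn render_banner(skill: &str, missing: &[MissingDep]) -> String {
    let mut out = format!("> ⚠️ MISSING DEPS for skill `{skill}`:\n");
    for dep in missing {
        let line = match dep {
            MissingDep::Bin(bin) => format!("bin not found: {bin}"),
            MissingDep::Env(name) => format!("env unset: {name}"),
            MissingDep::Version { bin, constraint, found } => {
                format!("version mismatch: {bin} requires {constraint} (found {found})")
            }
        };
        out.push_str(&format!("> - {line}\n"));
    }
    out.push_str("> Calls into this skill may fail.\n");
    out
}
```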
Backwards compatibility
Skills without requires.mode, requires.bin_versions, or
agents.<id>.skill_overrides keep the prior behavior (strict, no
version checks). The defaults are chosen so an unmodified skill
catalog and existing agents.yaml continue to work unchanged.
TaskFlow model
TaskFlow is a durable, multi-step flow runtime that survives process restarts and external waits. It's designed for work that spans more LLM turns than a single conversation buffer can hold — approvals, data pipelines, delegated subtasks, scheduled actions.
Source: crates/taskflow/ (types.rs, store.rs, engine.rs).
When to use it
Use TaskFlow when any of the following apply:
- A task needs to pause and resume later (hours, days)
- Multiple agents collaborate on one outcome
- You need a full audit trail of what happened and when
- You need recovery from a crash mid-task
If it's a one-shot turn, don't reach for TaskFlow — the runtime's normal session buffer is enough.
Flow shape
A flow is an opaque state_json (free-form JSON) plus metadata:
| Field | Purpose |
|---|---|
| id | UUID generated on creation. |
| controller_id | String label identifying the flow definition (e.g. kate/inbox-triage). |
| goal | Human-readable statement of intent. |
| owner_session_key | agent:<id>:session:<session_id> — hard tenancy gate. |
| requester_origin | Who asked (user id, external system id). |
| current_step | String label for the current phase ("classify", "await_approval", …). |
| state_json | Free-form JSON owned by the flow — the LLM mutates this over time. |
| wait_json | Current wait condition while status = Waiting. |
| status | See state machine below. |
| cancel_requested | Sticky flag that forces the next valid transition to Cancelled. |
| revision | Monotonic integer; increments on every update. Used for optimistic concurrency. |
| created_at / updated_at | Timestamps. |
state_json is shallow-merged on updates: a patch { "foo": 1 }
replaces only the foo key, everything else is preserved.
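The practical consequence of a shallow merge is easiest to see on a nested value. In this sketch the state is modelled as a map from top-level key to raw JSON text, which is a simplification for illustration (the real column holds one JSON document); `State` and `shallow_merge` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Top-level keys of `state_json`; each value is kept as raw JSON text.
pub type State = BTreeMap<String, String>;

/// Every key in `patch` replaces the same key in `state` wholesale.
/// Keys that the patch does not mention are left untouched.
pub fn shallow_merge(state: &mut State, patch: State) {
    for (key, value) in patch {
        state.insert(key, value);
    }
}
```

Patching `foo` leaves a sibling `nested` object alone, while patching `nested` with `{"a":9}` discards any other fields that object used to hold.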
State machine
stateDiagram-v2
[*] --> Created
Created --> Running: start_running
Running --> Waiting: set_waiting(condition)
Waiting --> Running: resume
Running --> Finished: finish
Running --> Failed: fail
Waiting --> Failed: fail
Created --> Cancelled: cancel
Running --> Cancelled: cancel
Waiting --> Cancelled: cancel
Finished --> [*]
Failed --> [*]
Cancelled --> [*]
- Terminal states: Finished, Failed, Cancelled. No further transitions allowed.
- Sticky cancel: cancel_requested = true forces the next allowed transition to land on Cancelled. The flag survives restart and is idempotent — multiple cancel requests converge on the same outcome.
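The diagram and the two rules above fit in a small transition table. This is a sketch under stated assumptions: the enum mirrors the state names in the diagram, but `can_transition_to` and `transition` are hypothetical helpers, not the API in crates/taskflow.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FlowStatus {
    Created,
    Running,
    Waiting,
    Finished,
    Failed,
    Cancelled,
}

impl FlowStatus {
    pub fn is_terminal(self) -> bool {
        matches!(self, FlowStatus::Finished | FlowStatus::Failed | FlowStatus::Cancelled)
    }

    /// Edges taken directly from the state diagram.
    pub fn can_transition_to(self, next: FlowStatus) -> bool {
        use FlowStatus::*;
        matches!(
            (self, next),
            (Created, Running)
                | (Running, Waiting)
                | (Waiting, Running)
                | (Running, Finished)
                | (Running, Failed)
                | (Waiting, Failed)
                | (Created, Cancelled)
                | (Running, Cancelled)
                | (Waiting, Cancelled)
        )
    }
}

/// Validates the requested edge, then lets a pending cancel override the target.
pub fn transition(
    current: FlowStatus,
    requested: FlowStatus,
    cancel_requested: bool,
) -> Result<FlowStatus, String> {
    if current.is_terminal() {
        return Err(format!("{current:?} is terminal"));
    }
    if !current.can_transition_to(requested) {
        return Err(format!("{current:?} -> {requested:?} is not allowed"));
    }
    // Sticky cancel: any otherwise valid transition lands on Cancelled.
    Ok(if cancel_requested { FlowStatus::Cancelled } else { requested })
}
```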
Persistence
SQLite-backed via sqlx, pool size 5. Default path
./data/taskflow.db, override with TASKFLOW_DB_PATH.
Tables
CREATE TABLE flows (
id TEXT PRIMARY KEY,
controller_id TEXT,
goal TEXT,
owner_session_key TEXT,
requester_origin TEXT,
current_step TEXT,
state_json TEXT,
wait_json TEXT,
status TEXT,
cancel_requested BOOLEAN,
revision INTEGER,
created_at INTEGER,
updated_at INTEGER
);
CREATE TABLE flow_steps (
id TEXT PRIMARY KEY,
flow_id TEXT NOT NULL,
runtime TEXT, -- Managed | Mirrored
child_session_key TEXT,
run_id TEXT,
task TEXT,
status TEXT,
result_json TEXT,
created_at INTEGER,
updated_at INTEGER,
UNIQUE (flow_id, run_id)
);
CREATE TABLE flow_events (
id INTEGER PRIMARY KEY AUTOINCREMENT,
flow_id TEXT NOT NULL,
kind TEXT,
payload_json TEXT,
at INTEGER
);
- flows.revision drives optimistic concurrency (see FlowManager).
- flow_events is append-only — every transition leaves a trail.
- The UNIQUE (flow_id, run_id) constraint on flow_steps catches duplicate observations at the DB layer, not in a race-prone manager-level check.
Wait conditions
Persisted in wait_json while status = Waiting.
enum WaitCondition {
    Timer { at: DateTime<Utc> },                             // auto-resume at time
    ExternalEvent { topic: String, correlation_id: String }, // resume when matching event arrives
    Manual,                                                  // resume only via explicit call
}
| Condition | Resumed by |
|---|---|
| Timer | WaitEngine::tick() when now >= at |
| ExternalEvent | try_resume_external(flow_id, topic, correlation_id, payload) |
| Manual | FlowManager::resume(id, patch) — typically via CLI or a deliberate LLM turn |
There is no timeout built into the wait itself. To time out, pair
any wait with a Timer fallback (e.g. fan out "wait for approval OR
24 h elapsed") via orchestration in the flow's step logic.
Audit trail
Every transition writes a flow_events row with:
- kind: created, started, waiting, resumed, finished, failed, cancelled, state_updated, step_observed, ...
- payload_json: contextual data (wait condition, result, reason, step info)
- at: timestamp
The audit append happens inside the same SQLite transaction as the state update — you can never see a flow state that doesn't have a matching audit event, even after a crash mid-operation.
Mirrored flows
Beyond Managed flows (owned by FlowManager), you can create Mirrored flows that just observe externally-driven work:
- create_mirrored(input) inserts a flow already in Running state
- record_step_observation(StepObservation) upserts into flow_steps by (flow_id, run_id) — new observations merge with existing rows
- Emits step_observed audit events
Useful for tracking tasks executed elsewhere — a delegation to another agent, a subprocess spawned out-of-band — while keeping one unified audit surface.
Next
- FlowManager — the mutation API, revision retry, and agent-facing tools
FlowManager, tools, and CLI
FlowManager owns the mutation API for flows. It wraps the
FlowStore with revision-checked atomic updates, the agent-facing
taskflow tool, the WaitEngine, and the agent flow CLI.
Source: crates/taskflow/src/manager.rs,
crates/taskflow/src/engine.rs,
crates/core/src/agent/taskflow_tool.rs.
Responsibilities
flowchart LR
subgraph FM[FlowManager]
CREATE[create_managed<br/>create_mirrored]
RUN[start_running<br/>set_waiting<br/>resume<br/>finish<br/>fail<br/>cancel]
PATCH[update_state<br/>request_cancel]
QUERY[get / list_by_owner / list_by_status / list_steps]
OBS[record_step_observation]
end
FM --> STORE[FlowStore<br/>SQLite]
FM --> ENG[WaitEngine]
TOOL[taskflow tool<br/>agent-facing] --> FM
CLI[agent flow CLI] --> FM
ENG --> STORE
One manager per store — typically one per process. The same database file can be opened by multiple managers safely as long as each goes through the revision protocol.
Optimistic concurrency
Every mutation follows this loop:
flowchart TD
START[mutation requested] --> FETCH[fetch current flow]
FETCH --> APPLY[apply closure:<br/>transition, patch, etc.]
APPLY --> SAVE[store.update_and_append<br/>WHERE id=? AND revision=?]
SAVE --> RES{result}
RES -->|ok| DONE([return updated flow])
RES -->|RevisionMismatch| REFETCH[refetch + retry]
REFETCH --> LIMIT{attempts >= 2?}
LIMIT -->|no| APPLY
LIMIT -->|yes| ERR([surface RevisionMismatch])
- revision is a monotonic integer on every flow
- Update runs UPDATE ... WHERE id=? AND revision=? — only one writer wins per revision
- Retry budget is 2 attempts (1 fetch + 1 refetch); persistent conflict bubbles up to the caller
- Update and audit-event append happen inside a single SQLite transaction — crash mid-operation cannot produce a desync between state and audit trail
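The retry loop can be illustrated with an in-memory stand-in for the store. Everything here is a sketch: `Store`, `Flow`, `StoreError`, and `mutate` are hypothetical names, a mutex-guarded map replaces SQLite, and the audit-event append is omitted. Only the compare-on-revision update and the two-attempt budget come from the description above.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

#[derive(Debug, Clone, PartialEq)]
pub struct Flow {
    pub revision: u64,
    pub step: String,
}

#[derive(Debug, PartialEq)]
pub enum StoreError {
    NotFound,
    RevisionMismatch,
}

#[derive(Default)]
pub struct Store {
    flows: Mutex<HashMap<String, Flow>>,
}

impl Store {
    pub fn insert(&self, id: &str, step: &str) {
        let flow = Flow { revision: 0, step: step.to_string() };
        self.flows.lock().unwrap().insert(id.to_string(), flow);
    }

    pub fn get(&self, id: &str) -> Option<Flow> {
        self.flows.lock().unwrap().get(id).cloned()
    }

    /// In-memory stand-in for `UPDATE ... WHERE id=? AND revision=?`.
    pub fn update(&self, id: &str, expected: u64, mut next: Flow) -> Result<Flow, StoreError> {
        let mut flows = self.flows.lock().unwrap();
        let current = flows.get_mut(id).ok_or(StoreError::NotFound)?;
        if current.revision != expected {
            return Err(StoreError::RevisionMismatch);
        }
        next.revision = expected + 1;
        *current = next.clone();
        Ok(next)
    }
}

/// Fetch, apply, save; on a revision conflict refetch once, then give up.
pub fn mutate(store: &Store, id: &str, mut apply: impl FnMut(&mut Flow)) -> Result<Flow, StoreError> {
    const MAX_ATTEMPTS: usize = 2;
    for _ in 0..MAX_ATTEMPTS {
        let mut flow = store.get(id).ok_or(StoreError::NotFound)?;
        let expected = flow.revision;
        apply(&mut flow);
        match store.update(id, expected, flow) {
            Err(StoreError::RevisionMismatch) => continue,
            other => return other,
        }
    }
    Err(StoreError::RevisionMismatch)
}
```

A single racing writer costs one retry; a writer that wins every round exhausts the budget and the caller sees `RevisionMismatch`.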
WaitEngine
Broker-agnostic scheduler. Pull-based tick() advances any flow
whose wait condition has fired.
flowchart LR
TICK[WaitEngine::tick_at] --> SCAN[scan all Waiting flows]
SCAN --> EVAL{evaluate wait}
EVAL -->|Timer expired| RESUME1[resume]
EVAL -->|still future| STAY1[stay waiting]
EVAL -->|ExternalEvent / Manual| STAY2[stay waiting]
EVAL -->|cancel_requested| CAN[transition to Cancelled]
EXT[try_resume_external<br/>topic + correlation_id] --> MATCH{wait condition<br/>matches?}
MATCH -->|yes| RESUME2[resume + merge payload into<br/>state.resume_event]
MATCH -->|no| NOOP[no-op]
- tick_at(now) — a single scan. Returns a TickReport with counters: scanned, resumed, cancelled, still waiting, errors.
- run(interval, shutdown_token) — long-running loop; drive from heartbeat or a dedicated tokio task.
- try_resume_external(flow_id, topic, correlation_id, payload) — called by a NATS subscriber or the CLI when an external event arrives; matches against the flow's persisted wait_json and resumes if it fits.
Correlation ids are caller-chosen strings. Typical pattern: when a
flow delegates to another agent via agent.route.<target_id>,
include the flow's id or a fresh UUID as the correlation id, and
have the receiver echo it on reply.
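The per-flow decision a tick makes, and the match an external event has to pass, can be sketched as two pure functions. Assumptions are labeled in the comments: timestamps are plain Unix seconds instead of `DateTime<Utc>`, the cancel flag is checked before the timer, and `evaluate`, `matches_external`, and `TickAction` are hypothetical names.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum WaitCondition {
    Timer { at: u64 }, // Unix seconds; stand-in for DateTime<Utc>
    ExternalEvent { topic: String, correlation_id: String },
    Manual,
}

#[derive(Debug, PartialEq)]
pub enum TickAction {
    Resume,
    Cancel,
    StayWaiting,
}

/// What one tick does with one waiting flow.
/// Assumption: a pending cancel wins over an expired timer.
pub fn evaluate(wait: &WaitCondition, cancel_requested: bool, now: u64) -> TickAction {
    if cancel_requested {
        return TickAction::Cancel;
    }
    match wait {
        WaitCondition::Timer { at } if now >= *at => TickAction::Resume,
        // Future timers, external events and manual waits are left alone.
        _ => TickAction::StayWaiting,
    }
}

/// True when an incoming event fits the flow's persisted wait condition.
pub fn matches_external(wait: &WaitCondition, topic: &str, correlation_id: &str) -> bool {
    matches!(
        wait,
        WaitCondition::ExternalEvent { topic: t, correlation_id: c }
            if t == topic && c == correlation_id
    )
}
```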
Agent-facing tool
Single taskflow tool with dispatch by action:
| Action | Params | Result |
|---|---|---|
| start | controller_id, goal, optional current_step (default "init"), optional state | {ok, flow} — auto-transitions Created → Running |
| status | flow_id | {ok, flow} or {ok:false, error:"not_found"} |
| advance | flow_id, optional patch, optional current_step | {ok, flow} with merged state |
| cancel | flow_id | {ok, flow} |
| list_mine | — | {ok, count, flows: [...]} |
Session tenancy
Every call derives owner_session_key = "agent:<id>:session:<session_id>".
The manager rejects any mutation whose owner does not match the
flow's — "belongs to a different session" error. Cross-session
access from the LLM is not possible.
Revision hidden from the LLM
The tool fetches the flow before every mutation and uses the live revision internally. The LLM never sees or reasons about revision numbers — fewer tokens, fewer mistakes.
CLI
agent flow list [--json]
agent flow show <id> [--json]
agent flow cancel <id>
agent flow resume <id>
- list prints a table sorted by updated_at DESC
- show prints the flow plus every recorded step
- cancel calls manager.cancel(id)
- resume is a manual unblock for Manual or ExternalEvent waits — useful in ops / testing when an expected event never arrived
All commands honor TASKFLOW_DB_PATH (default ./data/taskflow.db).
End-to-end example
From crates/taskflow/tests/e2e_test.rs:
// 1. Create + run + park.
let f = manager.create_managed(input).await?;
let f = manager.start_running(f.id).await?;
let f = manager.set_waiting(f.id, json!({"kind": "manual"})).await?;

// 2. Process exits. Reopen the SAME database file from a fresh manager.
let reloaded = manager.get(f.id).await?.unwrap();
assert_eq!(reloaded.status, FlowStatus::Waiting);
assert_eq!(reloaded.state_json["verses_done"], 10); // partial work survived

// 3. Resume picks up where we left off.
let resumed = manager.resume(reloaded.id, None).await?;
assert_eq!(resumed.status, FlowStatus::Running);
Shipped shape of CreateManagedInput:
{
"controller_id": "kate/inbox-triage",
"goal": "triage inbox",
"owner_session_key": "agent:kate:session:abc",
"requester_origin": "user-1",
"current_step": "classify",
"state_json": { "messages": 10, "processed": 0 }
}
There is no YAML flow-definition format — flows are built in code
(or driven by the taskflow tool's start action).
Garbage collection
store.prune_terminal_flows(retain_days) deletes flows whose
terminal state is older than the retention window. Wire this into a
scheduled job when your flows pile up — audit trails accumulate
forever otherwise.
Gotchas
- state_json is shallow-merged. Nested updates require the caller to build the full replacement object for the key being changed.
- revision conflicts retry only twice. If two callers are fighting over a flow continuously, the second persistently surfaces RevisionMismatch — treat that as a signal that you should either serialize at a higher level, or have the loser retry at the app layer.
- No flow-level mutex. The DB-level UNIQUE (flow_id, run_id) on steps keeps step-observation races safe; revision checks keep mutation races safe. But two observers can read a flow simultaneously — don't rely on read-time consistency for decisions.
- wait_json is cleared on resume. If you need to remember the wait condition for audit purposes, the flow_events table has it.
Wait / resume
Durable flows can park themselves between steps. The runtime drives
parked flows back to Running either on a wall-clock deadline (timer),
when an external signal arrives (NATS), or when an operator resumes
them by hand (manual).
Two pieces wire this together:
- WaitEngine — single global tokio task. Every tick_interval it scans Waiting flows and resumes any whose timer has fired or whose cancel intent has been set.
- taskflow.resume bridge — single broker subscriber that translates incoming events into WaitEngine::try_resume_external calls.
Source: crates/taskflow/src/engine.rs, src/main.rs::spawn_taskflow_resume_bridge.
Wait conditions
The wait_json column on a flow stores one of:
| Kind | Shape | Resumed by |
|---|---|---|
| timer | {kind:"timer", at:"<RFC3339>"} | WaitEngine.tick() once now >= at |
| external_event | {kind:"external_event", topic:"…", correlation_id:"…"} | taskflow.resume bridge with matching (topic, correlation_id) |
| manual | {kind:"manual"} | Explicit manager.resume(...) (CLI / ops) |
Timer.at is validated by the tool against taskflow.timer_max_horizon
(default 30 days). Past deadlines and topics/correlation_ids that are
empty are rejected before the flow ever enters Waiting.
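These guardrails amount to a few comparisons. The sketch below uses Unix seconds rather than RFC 3339 timestamps, treats a deadline equal to "now" as already past (an assumption), and invents the `WaitError` variants and function names for illustration.

```rust
#[derive(Debug, PartialEq)]
pub enum WaitError {
    PastDeadline,
    BeyondHorizon,
    EmptyTopic,
    EmptyCorrelationId,
}

/// Default `timer_max_horizon` of 30 days, expressed in seconds.
pub const TIMER_MAX_HORIZON_SECS: u64 = 30 * 24 * 60 * 60;

/// Rejects deadlines in the past or further out than the horizon.
pub fn validate_timer(at: u64, now: u64, max_horizon: u64) -> Result<(), WaitError> {
    if at <= now {
        return Err(WaitError::PastDeadline);
    }
    if at - now > max_horizon {
        return Err(WaitError::BeyondHorizon);
    }
    Ok(())
}

/// Rejects external-event waits that could never be matched.
pub fn validate_external(topic: &str, correlation_id: &str) -> Result<(), WaitError> {
    if topic.is_empty() {
        return Err(WaitError::EmptyTopic);
    }
    if correlation_id.is_empty() {
        return Err(WaitError::EmptyCorrelationId);
    }
    Ok(())
}
```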
Tool actions
The taskflow tool exposes the LLM-facing surface. Beyond the existing
start | status | advance | cancel | list_mine, three actions drive
the wait/resume lifecycle:
wait
{
"action": "wait",
"flow_id": "…uuid…",
"wait_condition": {"kind": "timer", "at": "2026-04-26T09:00:00Z"}
}
Move flow Running → Waiting. Validates wait_condition shape and
guardrails before persisting.
finish
{
"action": "finish",
"flow_id": "…uuid…",
"final_state": {"result": "ok"}
}
Move flow → Finished. final_state (optional) is shallow-merged
into state_json before transition.
fail
{
"action": "fail",
"flow_id": "…uuid…",
"reason": "downstream-error"
}
Move flow → Failed. reason is required. The reason is stamped
under state_json.failure.reason and recorded in the audit event.
NATS resume bridge
A single subscriber lives at taskflow.resume. Anything that wants to
wake a parked flow publishes a JSON message there:
{
"flow_id": "f5e0…",
"topic": "agent.delegate.reply",
"correlation_id": "corr-42",
"payload": {"answer": 42}
}
The bridge calls WaitEngine::try_resume_external(flow_id, topic, correlation_id, payload). If the flow is Waiting with a matching
external_event condition, it resumes; the payload (if any) is
merged into state_json.resume_event. Mismatches and unknown flow
ids are silent debug logs.
Example with the nats CLI:
nats pub taskflow.resume '{
"flow_id": "f5e0…",
"topic": "agent.delegate.reply",
"correlation_id": "corr-42",
"payload": {"answer": 42}
}'
Single subject (no flow_id in suffix) is intentional — it keeps the subject namespace flat and avoids per-flow subscription churn. Volume is expected to be low (<10/s); if that ever changes, the bridge can shard internally without protocol changes.
Configuration
config/taskflow.yaml (optional; absent → defaults):
tick_interval: 5s # WaitEngine cadence
timer_max_horizon: 30d # max future Timer.at allowed by tool
db_path: ./data/taskflow.db # also honored via TASKFLOW_DB_PATH
agents.yaml enables the tool per agent:
agents:
- id: kate
plugins: [taskflow, memory]
Without taskflow in plugins, the agent does not see the tool —
the engine and bridge still run process-wide.
Tick interval guidance
- 5s (default) is plenty for human-scale timers.
- Bring it down to 1s only if you have sub-minute timers and care about the worst-case lag.
- The tick is idempotent and pull-based; missing a tick is harmless.
Telemetry
Each tick logs at debug level when scanned > 0:
DEBUG wait engine tick scanned=3 resumed=1 cancelled=0 still_waiting=2 errors=0
The bridge logs at info on each successful resume:
INFO taskflow resumed via NATS flow_id=… topic=…
Identity & workspace
Every agent has a workspace directory — a small set of markdown files that describe who it is, what it knows, and how it's meant to behave. The runtime loads those files at session start and injects them into the system prompt. The agent reads them; some of them, the agent also writes back to.
Source: crates/core/src/agent/workspace.rs,
crates/core/src/agent/self_report.rs.
Workspace files
<workspace>/
├── IDENTITY.md # 10.1 — persona facts (name, vibe, emoji)
├── SOUL.md # 10.2 — prompt-like character document
├── USER.md # who the human is (if single-user)
├── AGENTS.md # peers this agent knows about
├── MEMORY.md # 10.3 — self-curated facts index
├── DREAMS.md # dreaming diary (10.6)
├── notes/ # per-day notes
└── .git/ # 10.9 — per-agent repo for forensics
Configured per agent:
agents:
- id: kate
workspace: ./data/workspace/kate
workspace_git:
enabled: true
IDENTITY.md (phase 10.1)
Short, structured. Five optional fields parsed from a markdown bullet list:
- **Name:** Kate
- **Creature:** octopus
- **Vibe:** warm but sharp
- **Emoji:** 🐙
- **Avatar:** https://.../kate.png
The parser:
- Silently skips template placeholders in parens (e.g. _(pick something)_) so the bootstrap template never leaks into the persona
- Produces an AgentIdentity { name, creature, vibe, emoji, avatar } struct, all fields Option<String>
Rendered into the system prompt as a single line:
# IDENTITY
name=Kate, emoji=🐙, vibe=warm but sharp
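A minimal sketch of a parser with the two behaviors listed above. The struct fields follow the `AgentIdentity` shape named in the text; `parse_identity` and its exact matching rules (bold key, `_(…)_` placeholder detection, case-insensitive keys) are assumptions rather than the logic in crates/core/src/agent/workspace.rs.

```rust
#[derive(Debug, Default, PartialEq)]
pub struct AgentIdentity {
    pub name: Option<String>,
    pub creature: Option<String>,
    pub vibe: Option<String>,
    pub emoji: Option<String>,
    pub avatar: Option<String>,
}

/// Reads `- **Key:** value` bullets; unknown keys and placeholders are ignored.
pub fn parse_identity(markdown: &str) -> AgentIdentity {
    let mut identity = AgentIdentity::default();
    for line in markdown.lines() {
        let Some(rest) = line.trim().strip_prefix("- **") else { continue };
        let Some((key, value)) = rest.split_once(":**") else { continue };
        let value = value.trim();
        // Skip empty values and template placeholders such as `_(pick something)_`.
        if value.is_empty() || (value.starts_with("_(") && value.ends_with(")_")) {
            continue;
        }
        let slot = match key.trim().to_ascii_lowercase().as_str() {
            "name" => &mut identity.name,
            "creature" => &mut identity.creature,
            "vibe" => &mut identity.vibe,
            "emoji" => &mut identity.emoji,
            "avatar" => &mut identity.avatar,
            _ => continue,
        };
        *slot = Some(value.to_string());
    }
    identity
}
```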
SOUL.md (phase 10.2)
Free-form markdown. No parsing. Injected verbatim after the IDENTITY block. This is where long-form character, operating principles, tone, and hard rules live.
Loaded on every session start. Main and shared sessions both see SOUL.md — the privacy boundary is MEMORY.md, not SOUL.md (shared groups should never leak private memories, but the persona is fine to surface).
MEMORY.md (phase 10.3)
The agent's self-curated index of things it remembers. Markdown sections with bullet lists — no special schema:
## People
- Luis prefers Spanish but is fine switching to English.
- Ana uses a Samsung, not an iPhone.
## Dreamed 2026-04-23 03:00 UTC
- User's timezone is America/Bogota _(score=0.42, hits=5, days=3)_
- Prefers short replies on WhatsApp _(score=0.38, hits=4, days=2)_
## Open questions
- What phone carrier does Luis use?
Scope rules:
- Loaded only in main (DM-style) sessions. Group and broadcast sessions never see MEMORY.md — per-user facts must not leak into multi-user chats.
- Appended automatically by dreaming sweeps (Phase 10.6)
- Truncation: 12 000 chars per file cap (whole workspace total budget: 60 000 chars). Files over the cap get a [truncated] marker.
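One way the two caps could interact is sketched below. The source only states the limits and the marker, so the rest is assumed: files are processed in order, each file gets the smaller of the per-file cap and whatever is left of the total budget, and the marker itself is not counted against either limit. `apply_budget` and `truncate_chars` are hypothetical names.

```rust
pub const PER_FILE_CAP: usize = 12_000;
pub const TOTAL_CAP: usize = 60_000;
const MARKER: &str = "[truncated]";

/// Keeps at most `max_chars` characters and appends the marker when cut.
fn truncate_chars(text: &str, max_chars: usize) -> String {
    if text.chars().count() <= max_chars {
        return text.to_string();
    }
    let mut out: String = text.chars().take(max_chars).collect();
    out.push_str(MARKER);
    out
}

/// Applies the per-file cap and the shared total budget to `(name, body)` pairs.
pub fn apply_budget(files: &[(&str, &str)]) -> Vec<(String, String)> {
    let mut remaining = TOTAL_CAP;
    let mut out = Vec::new();
    for (name, body) in files {
        let cap = PER_FILE_CAP.min(remaining);
        let rendered = truncate_chars(body, cap);
        // Charge the budget for the characters actually kept.
        remaining -= body.chars().count().min(cap);
        out.push((name.to_string(), rendered));
    }
    out
}
```

Counting characters rather than bytes keeps multi-byte text such as emoji from being split mid-codepoint.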
USER.md and AGENTS.md
- USER.md — who this agent is talking to. Loaded in main sessions only.
- AGENTS.md — which peers this agent can delegate to. Pairs with allowed_delegates in agents.yaml.
Both are free-form markdown read into the prompt.
Transcripts (phase 10.4)
Per-session, append-only JSONL files in transcripts_dir:
{"type":"session","version":1,"id":"<uuid>","timestamp":"2026-04-24T...","agent_id":"kate","source_plugin":"telegram"}
{"type":"entry","timestamp":"...","role":"user","content":"hello","message_id":"...","source_plugin":"telegram","sender_id":"user123"}
{"type":"entry","timestamp":"...","role":"assistant","content":"hello Luis","source_plugin":""}
- One file per session at <transcripts_dir>/<session_id>.jsonl
- No time-based rotation (session close = file close)
- First line is a session header with metadata, every subsequent line is a turn
Transcripts are write-only from the runtime's point of view — they're for replay, audit, and human review, not read-back into the prompt.
Self-report tools (phase 10.8)
Four tools let the agent inspect its own state:
| Tool | Returns | Use |
|---|---|---|
| who_am_i | {agent_id, model, workspace_dir, identity{…}, soul_excerpt} | When asked "who are you?" |
| what_do_i_know | {sections: [{heading, bullets}], truncated} with optional filter | Search MEMORY.md by section name |
| my_stats | {sessions_total, memories_stored, memories_promoted, last_dream_ts, recall_events_7d, top_concept_tags_7d, workspace_files_present} | Meta-awareness |
| session_logs | {ok, sessions/entries/hits, …} — actions: list_sessions, read_session, search, recent | Inspect own JSONL transcripts for self-reflection, debugging, cross-session search |
The first three return concise JSON designed for the LLM to consume in
one turn. Soul excerpt in who_am_i is truncated to 2 048 chars;
what_do_i_know caps at 6 144 bytes serialized with at most 10
bullets per section.
session_logs is registered automatically when the agent has a non-empty
transcripts_dir. It is scoped to that directory — agents cannot read
each other's transcripts. Default limits: 50 entries per call (max 500),
200 chars per content preview (max 4 000). When recent is invoked
without session_id, it defaults to the current session. If the agent's
allowed_tools patterns exclude session_logs, it is filtered after
registration like every other tool.
Load flow
flowchart TD
SESSION[new session] --> LOADER[WorkspaceLoader.load scope]
LOADER --> SCOPE{scope}
SCOPE -->|Main| FULL[load IDENTITY + SOUL + USER +<br/>AGENTS + daily notes + MEMORY]
SCOPE -->|Shared| SHARED[load IDENTITY + SOUL +<br/>AGENTS only]
FULL --> TRUNC[enforce 12k/file, 60k total]
SHARED --> TRUNC
TRUNC --> RENDER[render_system_blocks<br/>into prompt]
RENDER --> PROMPT[# IDENTITY<br/># SOUL<br/># USER<br/># AGENTS<br/># MEMORY]
Next
- MEMORY.md — write cadence and promotion rules
- Dreaming — how sleeps turn recall signals into MEMORY.md entries
MEMORY.md + recall signals + workspace-git
This page covers how the agent's knowledge evolves over time: the MEMORY.md index, the recall signals that drive dreaming, how concept tags are derived, and how the workspace-git repo captures a full audit history.
For the underlying storage mechanics (tables, queries, vector index), see Memory — long-term.
What goes where
flowchart LR
subgraph DB[SQLite data/memory.db]
MEM[memories]
FTS[memories_fts]
REC[recall_events]
PROM[memory_promotions]
end
subgraph WS[workspace dir]
MD[MEMORY.md]
DRM[DREAMS.md]
GIT[.git]
end
TOOL[memory.remember] --> MEM
TOOL --> FTS
MEM -. recall hits .-> REC
REC --> DRM2[dream sweep]
DRM2 --> PROM
DRM2 --> MD
DRM2 --> DRM
CHK[forge_memory_checkpoint] --> GIT
DRM2 --> GIT
Each layer has its own update cadence:
| Layer | Write trigger | Consumer |
|---|---|---|
| memories table | Agent calls memory.remember | Next turn's memory.recall |
| recall_events table | Every memory.recall hit | Dream sweep (10.6) |
| memory_promotions table | Promotion during dream | Prevents double-promote across sweeps |
| MEMORY.md | Dream sweep (10.6) | Next session's system prompt (main scope only) |
| DREAMS.md | Dream sweep (10.6) | Historical diary for humans + my_stats |
| .git | Dream finish, session close, forge_memory_checkpoint | memory_history tool, post-mortem via git log |
Recall signals (phase 10.5)
The recall_events table captures every hit of memory.recall:
CREATE TABLE recall_events (
id INTEGER PRIMARY KEY AUTOINCREMENT,
agent_id TEXT,
memory_id TEXT,
query TEXT, -- the search string that surfaced this memory
score REAL, -- relevance score from the recall call
ts_ms INTEGER
);
Aggregation over a per-memory window produces the signals struct consumed by dreaming:
| Signal | Meaning |
|---|---|
| frequency | Log-normalized count of hits |
| relevance | Mean score across hits |
| recency | Exponential decay from last-hit timestamp |
| diversity | Distinct query strings, normalized (saturates at 5+) |
| recall_count | Raw hit count — used by gates |
| unique_days | Distinct UTC days the memory was surfaced |
The signals are weighted and summed into the score that drives promotion (see Dreaming).
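Four of these signals follow directly from their one-line definitions and can be sketched without further parameters. Frequency and recency are left out because their normalization constants are not given here. The `RecallEvent` and `Signals` shapes are hypothetical, and reading "saturates at 5+" as `min(distinct, 5) / 5` is an assumption.

```rust
use std::collections::HashSet;

/// One row of `recall_events` for a single memory (subset of columns).
pub struct RecallEvent<'a> {
    pub query: &'a str,
    pub score: f64,
    pub ts_ms: i64,
}

#[derive(Debug, PartialEq)]
pub struct Signals {
    pub relevance: f64,
    pub diversity: f64,
    pub recall_count: usize,
    pub unique_days: usize,
}

const MS_PER_DAY: i64 = 86_400_000;

/// Aggregates the hits for one memory; `None` when it was never recalled.
pub fn aggregate(events: &[RecallEvent<'_>]) -> Option<Signals> {
    if events.is_empty() {
        return None;
    }
    let recall_count = events.len();
    // Mean of the per-hit relevance scores.
    let relevance = events.iter().map(|e| e.score).sum::<f64>() / recall_count as f64;
    // Assumed normalization: distinct queries, saturating at 5.
    let queries: HashSet<&str> = events.iter().map(|e| e.query).collect();
    let diversity = queries.len().min(5) as f64 / 5.0;
    // Distinct UTC days, derived from millisecond timestamps.
    let days: HashSet<i64> = events.iter().map(|e| e.ts_ms.div_euclid(MS_PER_DAY)).collect();
    Some(Signals { relevance, diversity, recall_count, unique_days: days.len() })
}
```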
Concept tags (phase 10.7)
Every memory row has a concept_tags JSON column populated at insert
time — not via TF-IDF but via a deterministic pipeline:
- Glossary match. Hard-coded list of protected tech terms (multilingual) — backup, openai, migration, etc.
- Compound tokens. Regex preserves file paths and identifiers (src/main.rs, camelCaseNames).
- Unicode word segmentation. UAX #29 word boundaries split the rest.
- Per-token rules:
  - NFKC normalization + lowercase
  - 32-char max; 3-char min for Latin, 2-char min for CJK
  - Reject pure digits, ISO dates, and 100+ shared stop-words across English, Spanish, and path noise
  - Underscores converted to dashes
Output capped at 8 tags per memory. Stored as JSON array on the
memories row; expanded into keyword recall searches as part of the
FTS5 MATCH query.
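The per-token rules and the 8-tag cap can be approximated with the standard library alone. This sketch leaves out the glossary, compound-token, and UAX #29 stages and skips NFKC normalization (not available in std); the stop-word list is a tiny illustrative sample, the CJK test only covers the main unified-ideograph block, and dropping duplicate tags is an assumption. All names are hypothetical.

```rust
const MAX_TAGS: usize = 8;
// Illustrative sample only; the real list is described as 100+ entries.
const STOP_WORDS: &[&str] = &["the", "and", "for", "con", "para", "los"];

/// True for strings shaped like `YYYY-MM-DD`.
fn is_iso_date(token: &str) -> bool {
    let bytes = token.as_bytes();
    bytes.len() == 10
        && bytes.iter().enumerate().all(|(i, b)| {
            if i == 4 || i == 7 { *b == b'-' } else { b.is_ascii_digit() }
        })
}

/// Applies the per-token rules; `None` means the token is rejected.
pub fn normalize_token(raw: &str) -> Option<String> {
    // NFKC normalization is omitted in this sketch.
    let token = raw.to_lowercase().replace('_', "-");
    let len = token.chars().count();
    let is_cjk = token.chars().any(|c| ('\u{4e00}'..='\u{9fff}').contains(&c));
    let min_len = if is_cjk { 2 } else { 3 };
    if len < min_len || len > 32 {
        return None;
    }
    if token.chars().all(|c| c.is_ascii_digit()) || is_iso_date(&token) {
        return None;
    }
    if STOP_WORDS.contains(&token.as_str()) {
        return None;
    }
    Some(token)
}

/// Normalizes tokens in order, drops duplicates, and stops at 8 tags.
pub fn tags_for(tokens: &[&str]) -> Vec<String> {
    let mut tags: Vec<String> = Vec::new();
    for token in tokens {
        if let Some(tag) = normalize_token(token) {
            if !tags.contains(&tag) {
                tags.push(tag);
            }
        }
        if tags.len() == MAX_TAGS {
            break;
        }
    }
    tags
}
```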
Dream sweeps backfill tags for older memories that were created before the tagging pipeline existed.
MEMORY.md write cadence
Dreaming sweeps append blocks:
## Dreamed 2026-04-24 03:00 UTC
- Luis lives in Bogota and prefers Spanish _(score=0.42, hits=5, days=3)_
- Kate should default to short WhatsApp replies _(score=0.38, hits=4, days=2)_
- One block per sweep
- Promoted memories shown as bullets with score, hit count, unique days
- Existing sections preserved; the file is only ever appended to (manual editing by humans is fine — the dream sweep appends a new block rather than rewriting anything)
Privacy rules:
- MEMORY.md is injected into main-scope sessions only. Groups / broadcasts never see it.
transcripts_diris separate from workspace and is not committed to workspace-git by default.
Workspace-git (phase 10.9)
When workspace_git.enabled: true, the agent's workspace
directory is a git repo. Commits happen automatically at three
moments:
flowchart LR
T1[dream sweep finishes] --> C[commit_all promote]
T2[session close<br/>on_expire callback] --> C2[commit_all session-close]
T3[forge_memory_checkpoint<br/>tool call] --> C3[commit_all checkpoint:note]
C --> LOG[.git history]
C2 --> LOG
C3 --> LOG
Mechanics (crates/core/src/agent/workspace_git.rs):
- Staged: every non-ignored file (respects auto-generated .gitignore)
- Skipped: files larger than 1 MiB (MAX_COMMIT_FILE_BYTES)
- Idempotent: no-op commit when the tree is clean
- Author: {agent_id} <agent@localhost> (configurable via workspace_git.author_name / author_email)
- Auto .gitignore excludes transcripts/, media/, *.tmp, *.swp, .DS_Store
- No remote configured by default; operators add one if forensic archival matters
Tools that touch git
| Tool | Purpose | Returns |
|---|---|---|
| forge_memory_checkpoint(note) | Commit right now with checkpoint: <note> subject | {ok, oid(short), subject, skipped} |
| memory_history(limit?, include_diff?) | git log of the last limit commits (max 100); optional unified diff oldest→HEAD | {commits: [...], diff?} |
Good uses of explicit checkpoints:
- Before a risky update sequence the agent is about to perform
- After receiving a non-obvious instruction from the user
- As bookends around a taskflow step boundary
Gotchas
- MEMORY.md can grow unbounded over years. Workspace-git keeps the history, but the in-prompt view is truncated at 12 KB. Keep an eye on size, and prune old ## Dreamed blocks if they stop being useful.
- Concept-tag derivation is deterministic per content. Editing a memory's content in-place does not re-derive tags; the tags that were computed at insert stick. Re-insert to refresh.
- git log replays tell the truth. If you're debugging a surprising agent behavior, memory_history --include-diff is the fastest way to see what the agent wrote to itself and when.
Dreaming
"Dreaming" is a scheduled offline sweep that consolidates an agent's memory. It reads recall signals, scores each memory that was recently surfaced, promotes the strongest ones into MEMORY.md, and commits the workspace-git repo.
Source: crates/core/src/agent/dreaming.rs.
When it runs
# agents.yaml
agents:
  - id: kate
    heartbeat:
      enabled: true
      interval: 30s
    dreaming:
      enabled: false
      interval_secs: 86400   # 24 h
      min_score: 0.35
      min_recall_count: 3
      min_unique_queries: 2
      max_promotions_per_sweep: 20
      weights:
        frequency: 0.24
        relevance: 0.30
        recency: 0.15
        diversity: 0.15
        consolidation: 0.10
Dreaming is heartbeat-driven: it ticks inside the heartbeat loop
and actually sweeps when interval_secs has elapsed since the last
sweep. Disable the heartbeat and dreaming stops firing.
Default interval_secs: 86400 (24 hours). Run nightly or tune down
for high-throughput agents.
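A minimal sketch of the heartbeat-driven check, assuming a hypothetical DreamClock struct (the real state in dreaming.rs may look different). It also encodes the rule that only a completed sweep moves the clock; sweeping on the very first tick is an assumption.

```rust
use std::time::{Duration, Instant};

/// Hypothetical scheduler state for the sweep; not the actual type.
struct DreamClock {
    /// Corresponds to interval_secs in the YAML above.
    interval: Duration,
    /// Time of the last sweep that ran to completion, if any.
    last_completed: Option<Instant>,
}

impl DreamClock {
    /// Called on every heartbeat tick; true when a sweep is due.
    fn is_due(&self, now: Instant) -> bool {
        match self.last_completed {
            // Assumption: an agent that never swept is due immediately.
            None => true,
            Some(last) => now.duration_since(last) >= self.interval,
        }
    }

    /// Only a completed sweep resets the clock, so a failed sweep
    /// is retried on the next heartbeat tick.
    fn mark_completed(&mut self, now: Instant) {
        self.last_completed = Some(now);
    }
}
```

With a 30 s heartbeat and the 24 h default, is_due returns false on almost every tick, which is why the heartbeat loop stays cheap.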
Three phases (Light / REM / Deep)
The phase model is conceptually borrowed from the OpenClaw design; nexo-rs currently ships two of the three phases, Light → Deep:
flowchart LR
START[sweep tick] --> LIGHT[Light:<br/>gather memories with<br/>>=1 recall event]
LIGHT --> DEEP[Deep:<br/>score + gate + promote]
DEEP --> WRITE[append MEMORY.md block]
WRITE --> DIARY[append DREAMS.md entry]
DIARY --> GIT[commit workspace]
(REM — thematic summarization with an LLM — is intentionally deferred.)
Scoring
For each candidate memory:
score = w.frequency × frequency
+ w.relevance × relevance
+ w.recency × recency
+ w.diversity × diversity
+ w.consolidation × consolidation
Each signal is derived from recall_events.
Consolidation is a modest bias toward memories that recurred in diverse queries over multiple days — taking the memory from "hit once" to "actually load-bearing."
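The weighted sum translates directly into code. In this sketch the Weights and Signals structs and the default_weights helper are hypothetical names, and each signal is assumed to be pre-normalized to the range 0 to 1 (the text above does not state the range).

```rust
/// Mirrors the dreaming.weights YAML block.
struct Weights {
    frequency: f64,
    relevance: f64,
    recency: f64,
    diversity: f64,
    consolidation: f64,
}

/// Per-memory signals derived from recall_events.
/// Assumption: each value is already normalized to [0, 1].
struct Signals {
    frequency: f64,
    relevance: f64,
    recency: f64,
    diversity: f64,
    consolidation: f64,
}

/// The default weights from the configuration example.
fn default_weights() -> Weights {
    Weights { frequency: 0.24, relevance: 0.30, recency: 0.15, diversity: 0.15, consolidation: 0.10 }
}

/// score = sum over the five signals of weight x signal.
fn score(w: &Weights, s: &Signals) -> f64 {
    w.frequency * s.frequency
        + w.relevance * s.relevance
        + w.recency * s.recency
        + w.diversity * s.diversity
        + w.consolidation * s.consolidation
}
```

The default weights sum to 0.94, so under the normalization assumption a memory that maxes out every signal scores 0.94, comfortably above the 0.35 default threshold.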
Gates
A candidate is promoted only if all of these hold:
| Gate | Default | Meaning |
|---|---|---|
| recall_count >= min_recall_count | 3 | Surfaced at least 3 times |
| unique_days >= 1 | 1 | Hits fall on at least 1 distinct day |
| distinct_queries >= min_unique_queries | 2 | More than one query style hit it |
| score >= min_score | 0.35 | Weighted composite over the threshold |
| !is_promoted(memory_id) | n/a | Not already promoted in a prior sweep |
At most max_promotions_per_sweep (default 20) candidates are promoted per run,
taken in descending score order.
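The gate-then-cap step can be sketched as a filter, a sort, and a truncate. Candidate, Gates and select_promotions are hypothetical names; the already_promoted flag stands in for the is_promoted(memory_id) lookup, and the u64 id type is an assumption.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    memory_id: u64, // hypothetical id type
    recall_count: u32,
    unique_days: u32,
    distinct_queries: u32,
    score: f64,
    already_promoted: bool, // stands in for is_promoted(memory_id)
}

/// Thresholds from the dreaming YAML block.
struct Gates {
    min_score: f64,
    min_recall_count: u32,
    min_unique_queries: u32,
    max_promotions_per_sweep: usize,
}

/// Keep candidates that pass every gate, best score first, capped per sweep.
fn select_promotions(mut candidates: Vec<Candidate>, g: &Gates) -> Vec<Candidate> {
    candidates.retain(|c| {
        c.recall_count >= g.min_recall_count
            && c.unique_days >= 1
            && c.distinct_queries >= g.min_unique_queries
            && c.score >= g.min_score
            && !c.already_promoted
    });
    // Descending score; total_cmp gives a total order over f64.
    candidates.sort_by(|a, b| b.score.total_cmp(&a.score));
    candidates.truncate(g.max_promotions_per_sweep);
    candidates
}
```

Because the cap is applied after sorting, lowering max_promotions_per_sweep drops the weakest passing candidates first.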
Outputs
MEMORY.md append
## Dreamed 2026-04-24 03:00 UTC
- Luis lives in Bogota and prefers Spanish _(score=0.42, hits=5, days=3)_
- Kate should default to short WhatsApp replies _(score=0.38, hits=4, days=2)_
Only memories promoted this sweep appear in the block.
DREAMS.md diary
A longer-form diary entry the agent can read back in
my_stats().last_dream_ts context. One per sweep.
Side effects
- memory_promotions row per promoted memory (prevents double-promote across sweeps)
- concept_tags backfilled on older memories that were created before the tagging pipeline landed
- workspace_git.commit_all("promote", <body with delta>) captures the full change
Idempotency
Re-running a sweep during the same interval is a no-op:
- Promotions consult memory_promotions before writing
- MEMORY.md is appended to, not rewritten
- Git commit returns cleanly with skipped: true when the tree is unchanged
You can safely call a manual "dream now" during a stuck session
(currently via restart with a lowered interval_secs) without
corrupting state.
Safety rails
- Shutdown cancellation. Dream sweeps run under a cancellation token tied to the shutdown sequence. Partial sweeps don't leave inconsistent state: the atomic trio (DB row + MEMORY.md append + git commit) runs after all candidates are scored and gated.
- Heartbeat-only. Dreaming never fires from a user message turn, so a long sweep cannot block a user response.
- Read-mostly. Sweep reads from recall_events; the only writes are memory_promotions, MEMORY.md append, DREAMS.md append, and git commit. Existing memory rows are untouched except for tag backfill.
What dreaming is not
- Not a summarizer. It does not rewrite content.
- Not a deduplicator. Two similar memories remain two memories; the recall layer will simply surface both and let the LLM pick.
- Not an LLM call. The whole sweep is deterministic — no model inference, no per-sweep cost.
Tuning
| Situation | Change |
|---|---|
| Memories stay too cold to promote | Lower min_score (e.g. 0.25) |
| Too many noise promotions | Raise min_recall_count to 5 |
| MEMORY.md grows too fast | Lower max_promotions_per_sweep |
| Very chatty agent | Increase interval_secs — 24 h is already safe |
Observability
Every sweep emits a summary log line with:
- candidates scanned
- candidates promoted
- skipped (already promoted)
- score range of the promoted set
- workspace-git commit OID (or "clean tree")
Wire it into Prometheus via log scraping if you want time-series counters — no dedicated metric is exposed yet.
Gotchas
- Turning dreaming on with min_score default produces a long first sweep. If the agent has been running for weeks without dreaming, there are a lot of candidates. Expect the first sweep to promote near the cap and subsequent sweeps to tail off.
- Concept-tag backfill is O(candidates). Large backlogs will show first-sweep latency proportional to the candidate count. Not a bug; run the first sweep in a maintenance window if the backlog is large.
- interval_secs is measured from last completed sweep. A failed sweep does not reset the clock, so a retry will fire on the next heartbeat tick regardless.
Two-tier consolidation: light + deep (Phase 80.1)
Everything above describes the light pass — a deterministic scoring sweep that runs on the heartbeat. Phase 80.1 adds a deep pass: a forked subagent that periodically scans transcripts and rewrites the memory directory in-depth. The two pillars complement each other.
| Dimension | Light pass (scoring) | Deep pass (fork) |
|---|---|---|
| Crate | crates/core/src/agent/dreaming.rs | crates/dream/ |
| Cadence | Every heartbeat tick | Every 24 h, ≥ 5 transcripts |
| Cost | ~1 SQLite query + ranking | A forked LLM goal, up to 30 turns |
| Writes | Append to MEMORY.md | Rewrite top-level *.md files in memory_dir |
| Failure mode | Returns empty DreamReport | Fails the audit row, rolls back the lock |
| Coordination | Defers when deep pass holds the lock | Acquires lock for the duration of the fork |
| Reference | Phase 10.6 (existing) | Phase 80.1 (this) |
You can run either alone or both together. Each is production-safe
on its own; run together, they share the same memory_dir and the
deep pass briefly suspends the light pass while it runs (see
Coordination below).
Deep pass via fork (Phase 80.1)
The deep pass spawns a forked subagent — a fresh ChatRequest with
skip_transcript: true and a 4-phase consolidation prompt — to
rewrite memory under a constrained tool whitelist.
Gates (cheapest first)
A turn fires the fork only when all of these hold:
- kairos_active == false (KAIROS uses a disk skill, skip to avoid double-fire).
- is_remote_mode() == false.
- is_auto_memory_enabled() == true.
- auto_dream.enabled == true (per-binding YAML).
- Time gate: hours_since(last_consolidated_at) ≥ min_hours (default 24 h).
- Scan throttle: bail if a scan ran in the last 10 min.
- Session gate: ≥ min_sessions transcripts touched since last fork (default 5).
- Lock acquire: try_acquire_consolidation_lock() succeeds.
If any gate rejects, the runner returns RunOutcome::Skipped { gate }
without firing.
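The cheapest-first ordering can be sketched as a chain of early returns, with the lock attempt last so it is never touched when a cheaper gate already rejects. Gate, GateInputs and first_rejecting_gate are hypothetical names; the real runner would wrap the rejecting gate into RunOutcome::Skipped { gate }. The 10-minute throttle is hard-coded here for brevity.

```rust
/// Hypothetical labels for the gate that rejected the run.
#[derive(Debug, PartialEq)]
enum Gate {
    KairosActive,
    RemoteMode,
    AutoMemoryDisabled,
    AutoDreamDisabled,
    TimeGate,
    ScanThrottle,
    SessionGate,
    LockHeld,
}

/// Snapshot of everything the gates look at (hypothetical struct).
#[derive(Clone)]
struct GateInputs {
    kairos_active: bool,
    remote_mode: bool,
    auto_memory_enabled: bool,
    auto_dream_enabled: bool,
    hours_since_last: f64,
    min_hours: f64,
    minutes_since_scan: Option<f64>, // None = no scan recorded yet
    sessions_touched: u32,
    min_sessions: u32,
}

/// Returns the first gate that rejects, or None when the fork may fire.
/// try_lock is only invoked after every cheaper gate has passed.
fn first_rejecting_gate(i: &GateInputs, try_lock: impl FnOnce() -> bool) -> Option<Gate> {
    if i.kairos_active { return Some(Gate::KairosActive); }
    if i.remote_mode { return Some(Gate::RemoteMode); }
    if !i.auto_memory_enabled { return Some(Gate::AutoMemoryDisabled); }
    if !i.auto_dream_enabled { return Some(Gate::AutoDreamDisabled); }
    if i.hours_since_last < i.min_hours { return Some(Gate::TimeGate); }
    if let Some(m) = i.minutes_since_scan {
        if m < 10.0 { return Some(Gate::ScanThrottle); }
    }
    if i.sessions_touched < i.min_sessions { return Some(Gate::SessionGate); }
    if !try_lock() { return Some(Gate::LockHeld); }
    None
}
```

Passing the lock attempt as a closure keeps the filesystem write out of the hot path: on most turns the time gate rejects and the closure never runs.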
ConsolidationLock
The lock file lives at <memory_dir>/.consolidate-lock. Single
instance per binding (one fork at a time). Properties:
- mtime IS lastConsolidatedAt: one stat() per turn is cheaper than reading a separate state file.
- Body is the holder's PID. The lock is stale if the PID is dead OR now - mtime ≥ holder_stale (default 1 h).
- No heartbeat. If a fork legitimately runs longer than 1 h, raise holder_stale.
- try_acquire: write our PID, re-read; if matches → acquired.
- rollback(prior_mtime): rewind mtime to pre-acquire. prior == 0 → unlink.
The path is canonicalized at construction so a later symlink swap cannot redirect the lock target.
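The staleness rule and the rollback decision are both small pure functions. The sketch below uses hypothetical names (is_lock_stale, Rollback, rollback_action) and takes the PID liveness result as a plain bool rather than probing the OS.

```rust
use std::time::Duration;

/// Stale when the holder PID is dead OR the lock is older than holder_stale.
fn is_lock_stale(holder_alive: bool, lock_age: Duration, holder_stale: Duration) -> bool {
    !holder_alive || lock_age >= holder_stale
}

/// What a rollback does to the lock file (hypothetical enum).
#[derive(Debug, PartialEq)]
enum Rollback {
    /// prior == 0: no lock existed before acquire, so remove the file.
    Unlink,
    /// Otherwise restore the pre-acquire mtime (milliseconds).
    RewindMtime(u64),
}

fn rollback_action(prior_mtime_ms: u64) -> Rollback {
    if prior_mtime_ms == 0 {
        Rollback::Unlink
    } else {
        Rollback::RewindMtime(prior_mtime_ms)
    }
}
```

Note that a live holder is still treated as stale once the age limit passes, which is why a fork that legitimately needs more than 1 h requires a higher holder_stale.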
4-phase consolidation prompt
The forked subagent runs through:
- Orient: read existing MEMORY.md, top-level *.md files, recent transcripts.
- Gather: extract candidate facts, decisions, patterns from the sessions since the last consolidation.
- Consolidate: rewrite the memory files, merging duplicates, refining wording.
- Prune: drop stale entries, keep the index lean.
See crates/dream/src/consolidation_prompt.rs for the full prompt template.
AutoMemFilter (Phase 80.20)
The fork only sees memory-safe tools:
- FileRead, Glob, Grep, REPL: unrestricted.
- Bash: only when bash_security::is_read_only returns true (~45 read-only utilities: ls, find, grep, cat, stat, wc, head, tail, ...).
- FileEdit, FileWrite: only when the path resolves under the agent's canonical memory_dir. Paths outside trigger a structured denial.
Provider-agnostic — the filter runs at the dispatch layer, not the LLM provider layer.
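A simplified sketch of the write-path containment check for FileEdit / FileWrite. It is lexical only: it resolves . and .. components without touching the filesystem, whereas the text above describes canonical paths, so the real filter also has to deal with symlinks. normalize and is_write_allowed are hypothetical names.

```rust
use std::path::{Component, Path, PathBuf};

/// Resolve `.` and `..` lexically, without following symlinks.
fn normalize(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for comp in path.components() {
        match comp {
            Component::ParentDir => {
                out.pop();
            }
            Component::CurDir => {}
            other => out.push(other.as_os_str()),
        }
    }
    out
}

/// True when target, interpreted relative to memory_dir if it is not
/// absolute, stays inside memory_dir after normalization.
fn is_write_allowed(memory_dir: &Path, target: &Path) -> bool {
    let joined = if target.is_absolute() {
        target.to_path_buf()
    } else {
        memory_dir.join(target)
    };
    // Path::starts_with compares whole components, so a sibling such as
    // "kate-evil" does not count as being under "kate".
    normalize(&joined).starts_with(normalize(memory_dir))
}
```

A relative target such as ../../secrets.env normalizes to a path outside memory_dir and is denied, as is any absolute path elsewhere on disk.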
Post-fork escape audit
After a fork completes, the runner re-scans for any FileEdit/Write
that landed outside memory_dir (e.g. via a Bash redirect that
slipped through). If found, the outcome flips to
RunOutcome::EscapeAudit { run_id, escapes, prior_mtime } and the
audit row is updated. This is defense-in-depth on top of
AutoMemFilter.
Cap
MAX_TURNS = 30. Server-side enforced. The fork is bounded; if the
prompt explodes, the cap closes the run with RunOutcome::TimedOut.
Coordination: skip pattern (Phase 80.1.e)
When both passes are enabled, the light pass checks the
consolidation lock at the start of run_sweep. If a live PID is
holding the lock, the light pass skips entirely:
if let Some(probe) = &self.consolidation_probe {
    // A live PID holds the consolidation lock: skip this sweep entirely.
    if probe.is_live_holder() {
        return Ok(DreamReport {
            deferred_for_fork: true,
            candidates_considered: 0,
            promoted: vec![],
            .. // remaining fields elided
        });
    }
}
The light pass logs:
INFO dreaming agent_id=kate dream sweep deferred — autoDream fork holds consolidation lock
Trade-off: a memory that would have been promoted during the fork window is deferred to the next turn. Memories that score high still score high next turn, so the deferral is recoverable. The cost is at most one turn of latency, versus the complexity of a buffer pattern (which we considered and rejected).
The pattern is mutually exclusive per turn: when one writer is active, the other defers entirely.
If the light pass runs without the deep pass enabled, the probe is
None and the skip arm never fires — original behaviour preserved.
Audit trail
Two artifacts let you reconstruct what every fork did:
SQLite dream_runs table (Phase 80.18)
<state_root>/dream_runs.db carries one row per fork run:
| Column | Type | Notes |
|---|---|---|
| id | UUID | Primary key, also the run_id echoed to git commits |
| goal_id | UUID | The driver-loop goal that triggered the fork |
| status | enum | Running → Completed / Failed / Killed / LostOnRestart |
| phase | enum | Starting → Updating (flips on first FileEdit) |
| sessions_reviewing | int | Count of transcripts the fork looked at |
| prior_mtime_ms | int? | Lock mtime before acquire (for rollback). Some(0) is distinct from None. |
| files_touched | JSON | Array of PathBuf: paths the fork wrote to (deduplicated) |
| turns | JSON | Last MAX_TURNS = 30 assistant turns. Trimmed server-side. |
| started_at | TS | When the fork acquired the lock |
| ended_at | TS? | When the run reached terminal status |
| fork_label | string | auto_dream, away_summary, eval, ... |
| fork_run_id | UUID? | Optional pointer to nexo_fork::ForkHandle::run_id |
Defenses: server-side MAX_TURNS = 30 cap, tail clamped at
TAIL_HARD_CAP = 1000, idempotent insert on (goal_id, started_at).
Git commits (Phase 80.1.g)
When workspace_git.enabled = true for the binding, every successful
fork that touched files lands a commit:
auto_dream: 3 file(s) consolidated
audit_run_id: 7a3b2f00-deaf-cafe-beef-001122334455
- MEMORY.md
- decisions/2026-04.md
- followups.md
Cross-link from git log back to the SQLite row:
$ git -C <workspace> log --grep "auto_dream" --pretty=oneline
<oid> auto_dream: 3 file(s) consolidated
$ nexo agent dream status 7a3b2f00-deaf-cafe-beef-001122334455
The Phase 77.7 secret guard runs transparently before each commit.
If a fork somehow wrote a credential, the commit returns Err, a
warning is logged, and the audit row stays intact (the audit row is
the source of truth; the commit is bonus forensics).
Operator CLI: nexo agent dream (Phase 80.1.d)
Three sub-commands. None require a running daemon — they read the
SQLite store directly. Read paths use a read-only pool; kill uses
a read-write pool plus a filesystem lock-file rewind.
tail — list recent runs
$ nexo agent dream tail
# Dream Runs (db: /home/.../state/dream_runs.db)
| ID | Goal | Status | Phase | Sessions | Files | Started | Ended | Label |
|----------|----------|-----------|----------|----------|-------|---------------------|---------------------|------------|
| 7a3b2f00 | b91c2d3a | Completed | Updating | 5 | 3 | 2026-04-30T10:12:01 | 2026-04-30T10:13:45 | auto_dream |
| f88e1100 | b91c2d3a | Failed | Starting | 7 | 0 | 2026-04-30T08:00:01 | 2026-04-30T08:00:42 | auto_dream |
2 rows shown (last 20).
Filter by goal, change page size, or get JSON for scripting:
$ nexo agent dream tail --goal=b91c2d3a-... --n=5
$ nexo agent dream tail --json | jq '.[] | select(.status == "Failed") | .id'
Empty / missing DB returns a friendly message and exit 0:
$ nexo agent dream tail
(no dream runs recorded yet — db not found at /home/.../state/dream_runs.db)
status — single run detail
$ nexo agent dream status 7a3b2f00-deaf-cafe-beef-001122334455
# Dream Run 7a3b2f00-deaf-cafe-beef-001122334455
- **goal_id**: b91c2d3a-...
- **status**: Completed
- **phase**: Updating
- **sessions_reviewing**: 5
- **fork_label**: auto_dream
- **started_at**: 2026-04-30T10:12:01Z
- **ended_at**: 2026-04-30T10:13:45Z
- **prior_mtime_ms**: 1745939518000
## Files touched (3):
- MEMORY.md
- decisions/2026-04.md
- followups.md
kill — abort a running fork
$ nexo agent dream kill 7a3b2f00-... --force --memory-dir=/path/to/memory
[dream-kill] run_id=7a3b2f00-... status was Running, transitioning to Killed
[dream-kill] lock rollback: prior_mtime=1745939518000 → memory_dir=/path/to/memory
[dream-kill] done
Without --force on a Running row, the command warns and exits 2:
[dream-kill] run_id=7a3b2f00-... is still Running. Pass --force to abort.
Without --memory-dir, status flips but the lock is NOT rewound —
the next fork tick may see the stale mtime as if a consolidation
just completed:
[dream-kill] WARN: status flipped but lock not rolled back. Pass --memory-dir <path> next time to rewind the consolidation lock.
Already-terminal rows are no-op:
[dream-kill] run_id=7a3b2f00-... already in terminal state Completed; nothing to do
Database path resolution
The CLI resolves the dream-runs DB in three tiers:
- --db <path> (explicit override, beats everything).
- NEXO_STATE_ROOT env → <state_root>/dream_runs.db.
- XDG default ~/.local/share/nexo/state/dream_runs.db.
The YAML tier is intentionally absent — agents.state_root is not
a config field today (state_root flows into BootDeps directly per
Phase 80.1.b.b.b). Set NEXO_STATE_ROOT to align the CLI with your
daemon's actual data dir.
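The three-tier resolution can be sketched as a pure function. resolve_dream_db is a hypothetical name, and the flag value, the NEXO_STATE_ROOT value and the home directory are passed in as arguments so the example stays testable; the real CLI reads them from its own argument parser and environment.

```rust
use std::path::PathBuf;

/// Resolve the dream-runs DB path: explicit flag, then state-root env,
/// then the XDG-style default under the user's home directory.
fn resolve_dream_db(db_flag: Option<&str>, state_root_env: Option<&str>, home: &str) -> PathBuf {
    // Tier 1: --db <path> beats everything.
    if let Some(path) = db_flag {
        return PathBuf::from(path);
    }
    // Tier 2: NEXO_STATE_ROOT → <state_root>/dream_runs.db.
    if let Some(root) = state_root_env {
        return PathBuf::from(root).join("dream_runs.db");
    }
    // Tier 3: ~/.local/share/nexo/state/dream_runs.db.
    PathBuf::from(home).join(".local/share/nexo/state/dream_runs.db")
}
```

If the daemon runs with a non-default data dir and NEXO_STATE_ROOT is not exported in the operator's shell, the CLI falls through to tier 3 and reports an empty or missing database.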
LLM tool: dream_now (Phase 80.1.c)
When enabled, the agent itself can force a memory consolidation mid-turn:
{
"name": "dream_now",
"description": "Force a memory consolidation pass now, bypassing time/session gates. Use when you've just learned a lot and want it consolidated into long-term memory before continuing.",
"parameters": {
"type": "object",
"properties": {
"reason": {
"type": "string",
"description": "Optional human-readable reason recorded in the audit row."
}
},
"additionalProperties": false
}
}
The tool returns a structured envelope across all six RunOutcome
variants:
{
"outcome": "completed",
"run_id": "7a3b2f00-deaf-cafe-beef-001122334455",
"files_touched": ["MEMORY.md"],
"duration_ms": 12450,
"reason": "user just locked in 4 architectural decisions"
}
Other outcomes: skipped (with gate field), lock_blocked
(another fork in progress), errored, timed_out, escape_audit.
Capability gate (Phase 80.1.c.b)
Two layers must both allow the tool for it to register on a binding's surface:
- Host-level: operator must export NEXO_DREAM_NOW_ENABLED=true. Default is deny: without the env var, register_dream_now_tool short-circuits with tracing::info!("dream_now: host-level capability gate closed; tool not registered").
- Per-binding: Phase 16 allowed_tools: ["dream_now", ...] must include the tool name on the binding's allowlist.
Verify with:
$ nexo setup doctor capabilities
# ... capability table ...
| dream | NEXO_DREAM_NOW_ENABLED | enabled | Medium | Allow the LLM to force a memory-consolidation pass via the `dream_now` tool. ... |
The capability listing is provider-agnostic. Same gate semantics under Anthropic, MiniMax, OpenAI, Gemini, DeepSeek, xAI, Mistral.
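The two-layer gate for dream_now reduces to a logical AND. In this sketch dream_now_registers is a hypothetical name, and treating only the literal value true as enabling the host gate is an assumption; the text above only shows that spelling.

```rust
/// True only when both the host-level env gate and the per-binding
/// allowlist permit the dream_now tool.
fn dream_now_registers(env_value: Option<&str>, allowed_tools: &[&str]) -> bool {
    // Host level: default deny when NEXO_DREAM_NOW_ENABLED is unset.
    // Assumption: only the literal "true" opens the gate.
    let host_open = env_value == Some("true");
    // Binding level: the tool name must be on the allowlist.
    let binding_open = allowed_tools.contains(&"dream_now");
    host_open && binding_open
}
```

Either layer alone is not enough: an allowlisted binding on a host without the env var never sees the tool, and vice versa.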
Configuration (Phase 80.1)
# agents.yaml
agents:
  - id: kate
    workspace_git:
      enabled: true   # required for auto_dream → git commits (Phase 80.1.g)
      author_name: "kate"
      author_email: "kate@nexo.local"
    dreaming:
      enabled: true
      interval_secs: 86400
      # ... existing scoring-sweep config from sections above
    auto_dream:
      enabled: true
      min_hours: 24h
      min_sessions: 5
      scan_interval: 10min
      holder_stale: 1h
      fork_timeout: 5min
      memory_dir: null   # null = default <workspace>/.nexo-memory/<agent_id>
Boot logging confirms wiring:
INFO boot.auto_dream agent=kate auto_dream runner registered git_checkpoint_wired=true
Setting auto_dream.enabled = false (or omitting the block
entirely) disables the deep pass — the light pass keeps running
under dreaming.enabled = true. Setting dreaming.enabled = false
turns off the light pass but leaves the deep pass independent.
See also
- Phase 10.9 git-backed memory: crates/core/src/agent/workspace_git.rs::MemoryGitRepo
- Phase 18 hot-reload: auto_dream config changes apply without restart via ArcSwap
- Phase 77.7 secret guard: auto-applied to all git commits, blocks credentials before they land
- Phase 80.18 audit row: crates/agent-registry/src/dream_run.rs
- Phase 80.20 AutoMemFilter: crates/fork/src/auto_mem_filter.rs
The autonomous agent — capabilities overview
This page is the bird's-eye map of what an agent running on
nexo can actually do without you holding its hand. Every
sub-feature has its own page (linked at the end of each
section); this page exists so you can see the whole picture
without piecing it together from individual reference docs.
"Autonomous" here doesn't mean "AGI". It means: the agent runs in the background, decides when to act on its own schedule, remembers what it has learned, talks to the user through every channel the operator wired (Slack, Telegram, iMessage, email, WhatsApp), approves or escalates risky actions through curated gates, and survives daemon restarts without losing context.
The agent never executes anything the operator didn't authorise in YAML. Every autonomous behaviour is a knob the operator flips on with explicit consent — there are no implicit defaults that ship a user from "ran nexo for the first time" to "the agent is texting my boss".
1. Living in the background
The agent doesn't need a foreground TTY to run.
- Session kinds: every running goal carries a SessionKind enum, one of Interactive (default, attached to a terminal), Bg (detached background goal, nexo agent run --bg <prompt>), Daemon (a long-running goal supervised by the daemon process itself), or DaemonWorker (a child of a daemon).
- nexo agent run --bg "<prompt>" spawns a goal, returns the goal_id immediately, detaches. The agent keeps running even after you close the terminal.
- nexo agent ps lists running goals filtered by kind; --all includes Interactive. RO SQLite: works without a daemon up.
- nexo agent attach <goal_id> renders a markdown snapshot of any goal: kind, status, phase, started_at, finished_at, diff_stat, last decision, last event. Useful to check progress without interrupting.
- nexo agent discover lists Running goals filtered to detached / daemon kinds. Pass --include-interactive to broaden.
- Reattach on restart: boot flips prior-run Running rows to LostOnRestart and fires notify_origin once per goal so the originating chat sees a clean [abandoned] closure instead of silence.
- Drain on SIGTERM: drain_running_goals runs BEFORE plugin teardown so [shutdown] notify_origin actually leaves the channel before the daemon dies. Per-hook 2 s timeout prevents stuck publishers from hanging shutdown.
→ See Background agents (agent run --bg / ps / attach)
2. Memory + self-improvement
The agent learns. Three tiers, each with a different cost / durability trade-off.
- Short-term memory: per-session, in RAM, scoped to the current goal. Cheap; gone on goal completion.
- Long-term memory: SQLite + sqlite-vec embeddings (crates/memory/src/long_term.rs). Survives restarts; searchable by semantic + lexical query.
- Git-backed MEMORY.md: every memory promotion writes a markdown file and commits it to a per-agent git repo. Full history; operator can git log MEMORY.md to audit what the agent decided to remember.
Three self-improvement loops the agent runs without operator intervention:
- Light-pass dreaming: scoring-based consolidation that runs on the heartbeat once interval_secs has elapsed. Cheap, no LLM call, just promotes warm memories via decay × access × recency.
- Deep-pass autoDream (Phase 80.1): heavier consolidation via a forked sub-agent with its own 4-phase prompt, runs behind 7 gates: kairos active, time-since-last (default 24 h), session count ≥ 5 transcripts, scan throttle (10 min), live consolidation lock (PID + mtime), force bypass, post-fork escape audit. Deferred for fork (deferred_for_fork: true) when another process holds the lock; promotions land on the next turn rather than racing.
- extract_memories (Phase 77.5): post-turn LLM-driven extraction. After each turn, a small LLM call asks "what surprised you, what did you learn, what should we remember?" and writes structured memory rows.
Defenses:
- Secret scanner (Phase 77.7): regex set blocks Anthropic / OpenAI / GitHub / AWS / Stripe / Google / JWT key shapes before any memory commit. Fails the commit loud.
- AutoMemFilter (Phase 80.20): when a forked sub-agent writes memory, the can_use_tool whitelist locks FileEdit / FileWrite to paths under memory_dir, Bash to read-only classifier (Phase 77.8/77.9 destructive + sed-in-place defenses still apply), REPL unrestricted. Defense-in-depth.
- Memdir relevance scorer (Phase 77.6): relevance × recency × access ranking with age decay so old / unused memories don't inflate the working-memory cost.
→ See Dreaming, Memdir scanner
3. Self-driving execution loop
When the agent receives work, what runs the loop?
- Driver-loop (Phase 67): replaces a single LLM request/response with a multi-turn execution: read context → plan → propose tool calls → run permission gate → execute → inspect results → loop. Goal-scoped, with budget caps on turns + time + tokens. Persists to agent_handles SQLite so every turn survives a daemon restart.
- Acceptance autodetect (Phase 75): at goal completion the loop runs an autodetect pass: cargo build for Rust, pyproject.toml build for Python, npm test for Node, cmake --build for CMake, cargo test --no-run for cargo. Mismatch fails the goal; the agent doesn't claim "done" on a broken build.
- Plan mode (Phase 79.1): EnterPlanMode toggle puts the agent into a read-only mode where it can only call read tools + planning advisors (no Bash, no Write). ExitPlanMode resolves the plan with operator approval and re-enters the full surface.
- Sleep { duration_ms, reason } tool (Phase 77.20): the agent can decide "no work to do for now, wake me in 20 min" without holding a shell process. The runtime intercepts the sentinel result, pauses the goal, and schedules a wake-up with cache-aware timing (≤ 270 s keeps prompt cache warm, ≥ 1200 s amortises a cache miss; avoids the 270-1200 s window that pays the miss without benefit).
- Forked sub-agent infra (Phase 80.19): delegation_tool with mode: { Sync | ForkAndForget }. Cache-safe parameters (system_prompt + user_context + system_context + tool_use_context + fork_context_messages; all five must match parent for cache hit). skipTranscript: true keeps the fork's messages out of the parent's history.
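The cache-aware timing bands described for the Sleep tool can be sketched as a classifier. SleepBand and classify_sleep are hypothetical names, and the sketch only labels a requested duration; how the runtime adjusts a duration that lands in the middle window is not specified above, so that step is deliberately left out.

```rust
/// Which cache regime a requested sleep falls into (hypothetical enum).
#[derive(Debug, PartialEq)]
enum SleepBand {
    /// At most 270 s: the prompt cache is still warm on wake-up.
    CacheWarm,
    /// Between 270 s and 1200 s: pays the cache miss without amortising it.
    DeadZone,
    /// At least 1200 s: long enough to amortise the cache miss.
    Amortised,
}

fn classify_sleep(duration_ms: u64) -> SleepBand {
    const WARM_MAX_MS: u64 = 270_000;
    const AMORTISED_MIN_MS: u64 = 1_200_000;
    if duration_ms <= WARM_MAX_MS {
        SleepBand::CacheWarm
    } else if duration_ms >= AMORTISED_MIN_MS {
        SleepBand::Amortised
    } else {
        SleepBand::DeadZone
    }
}
```

The "wake me in 20 min" example from the bullet above is 1 200 000 ms, which sits exactly on the lower edge of the amortised band.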
→ See Acceptance autodetect (deferred), Self-driving guide (deferred)
4. Time-based action
The agent can fire on its own schedule.
- Heartbeat (Phase 7): config-time, per-agent. Every N seconds invoke on_heartbeat(). Used for proactive messages, reminders, periodic state sync.
- Cron (Phase 79.7): LLM-time scheduled fires. The agent itself can call cron_create to schedule a future task; the runtime fires it via LlmCronDispatcher. Up to 50 entries per binding.
- Cron jitter cluster (Phase 80.2-80.6), six knobs:
  - enabled: global killswitch.
  - recurring_frac: fraction of next-fire interval used as jitter window.
  - recurring_cap_ms: absolute cap (5 min default).
  - one_shot_max_ms / one_shot_floor_ms: backward lead for one-shots.
  - one_shot_minute_mod: modulus gate (mod=0 = never jitter one-shots).
  - recurring_max_age_ms: auto-expire old recurring entries (permanent: true exempt).
  All hot-reloadable via Arc<ArcSwap>. jitter_frac_from_entry_id derives the offset from the UUID hex prefix so retries don't move the target.
- Boot-time missed-task quarantine: sweep_missed_entries(skew_ms) rewrites overdue next_fire_at to i64::MAX so a long-down daemon doesn't stampede on the next tick.
- agent_turn poller (Phase 20): config-time scheduled LLM turn → channel publish. Provider-agnostic; primary use case is "every morning at 7am, summarise the inbox and post to Slack".
- Proactive mode (Phase 77.20): proactive: { enabled: true, tick_interval_secs, jitter_pct, max_idle_secs } injects a periodic <tick> message into the agent's session. The agent decides whether to act on it or call Sleep. Mutually exclusive with role: coordinator.
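One way to derive a stable jitter offset from an entry's UUID, in the spirit of the cron jitter bullet above. This is an illustrative guess, not the actual jitter_frac_from_entry_id: the function names are hypothetical, using the first 8 hex digits is an assumption, and so is applying recurring_cap_ms to the window rather than to the final offset.

```rust
/// Map the first 8 hex digits of an entry id to a fraction in [0, 1).
/// Returns None when the id does not start with 8 hex digits.
fn jitter_frac_from_hex_prefix(entry_id: &str) -> Option<f64> {
    let prefix = entry_id.get(..8)?;
    let n = u32::from_str_radix(prefix, 16).ok()?;
    // Divide by 2^32 so the result is always strictly below 1.0.
    Some(n as f64 / (u32::MAX as f64 + 1.0))
}

/// Jitter for a recurring entry: a fixed fraction of a window that is
/// recurring_frac of the interval, capped at recurring_cap_ms.
fn recurring_jitter_ms(
    entry_id: &str,
    interval_ms: u64,
    recurring_frac: f64,
    recurring_cap_ms: u64,
) -> Option<u64> {
    let frac = jitter_frac_from_hex_prefix(entry_id)?;
    let window = (interval_ms as f64 * recurring_frac).min(recurring_cap_ms as f64);
    Some((window * frac) as u64)
}
```

Because the fraction depends only on the id, retrying the same entry always produces the same offset, while different entries spread across the window.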
→ See Cron jitter (deferred), Proactive mode
5. Communication — every surface the agent can reach
5.1. Inbound from the user
- Pairing (Phase 26): every (channel, account_id) inbound goes through a pairing gate. Senders that haven't been allowlisted via nexo pair seed get a pairing challenge. Per-binding pairing_policy + auto_challenge knobs. Seeded senders survive daemon restarts via PairingStore::list_allow.
- WhatsApp / Telegram / email / browser: first-party plugins (Phases 6, 22, plus email + browser CDP). Each is a Channel impl that maps inbound platform events to agent.intake.<binding> broker subjects.
- MCP channels (Phase 80.9): any MCP server that declares experimental['nexo/channel'] can push user messages into the agent. Provider-agnostic: write a Slack adapter as an MCP server and the agent gets Slack inbound for free.
  - 5-step gate: capability + killswitch + per-binding allowlist + plugin source verification + approved allowlist.
  - SQLite-backed session registry: Slack threads survive daemon restarts.
  - Token bucket rate limit per server.
  - Audit marker source: "channel:<server>" in the turn-log.
  - Operator CLI nexo channel list / doctor / test.
5.2. Outbound to the user
- notify_origin / notify_channel hooks: Phase 67.F callback shape so the agent can surface mid-goal updates back to the originating channel without holding the request open.
- send_user_message tool (Phase 80.8): when brief mode is active, the agent's visible output flows through this tool. status: "normal" for replies, "proactive" for unsolicited surfacings. Free text outside the tool stays visible in the detail view.
- channel_send tool (Phase 80.9): invoke any MCP channel server's outbound tool by name. Configurable outbound_tool_name per server (default send_message).
- Reminder tool (Phase 7.3): schedule a future message to any channel.
5.3. Inbound from the world
- Generic webhook receiver (Phase 80.12): HTTP receiver behind a tunnel. Configure each source by YAML: signature_spec (HMAC-SHA256/SHA1/raw token) + event_kind_from (header or body json-path) + publish_to (NATS subject). Constant-time signature compare via subtle::ConstantTimeEq. Provider-agnostic: GitHub, Stripe, Calendly, Zapier all in YAML.
- Pollers (Phase 19): config-time external endpoint polls. Fan-out to per-source NATS subjects.
5.4. Multi-agent coordination
- Peer inbox (Phase 80.11): every running goal has a NATS subject agent.inbox.<goal_id>. list_peers returns reachable peers (filtered by allowed_delegates); send_to_peer sends a typed InboxMessage with correlation_id.
- InboxRouter (Phase 80.11.b): single broker subscriber on agent.inbox.>, dashmap per-goal buffers (MAX_QUEUE=64, FIFO eviction). Renders <peer-message from="..." sent_at="..." correlation_id="..."> block into the agent's next turn.
- Teams (Phase 79.6): N parallel coordinated agents with a shared scratchpad directory. Distinct from Agent 1-to-1 delegation; suited to research fan-out + massive refactors.
- Delegation tool (Phase 8): agent-to-agent routing on agent.route.{target_id} with correlation_id. Sync mode awaits the response; ForkAndForget (Phase 80.19) fires the delegate without blocking.
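The per-goal buffer behaviour named in the InboxRouter bullet (MAX_QUEUE=64, FIFO eviction) can be sketched with a VecDeque. GoalInbox is a hypothetical type holding plain strings; the real router keeps one buffer per goal in a dashmap and stores typed InboxMessage values. Reading "FIFO eviction" as "drop the oldest message when full" is an assumption.

```rust
use std::collections::VecDeque;

const MAX_QUEUE: usize = 64;

/// Hypothetical per-goal buffer of pending peer messages.
struct GoalInbox {
    queue: VecDeque<String>,
}

impl GoalInbox {
    fn new() -> Self {
        Self { queue: VecDeque::with_capacity(MAX_QUEUE) }
    }

    /// Enqueue a message; when the buffer is full, the oldest message
    /// is evicted and returned so the caller can log the drop.
    fn push(&mut self, msg: String) -> Option<String> {
        let evicted = if self.queue.len() == MAX_QUEUE {
            self.queue.pop_front()
        } else {
            None
        };
        self.queue.push_back(msg);
        evicted
    }

    /// Take every buffered message in arrival order, e.g. to render
    /// them into the agent's next turn.
    fn drain(&mut self) -> Vec<String> {
        self.queue.drain(..).collect()
    }
}
```

A bounded buffer means a chatty peer can never grow a goal's memory without limit; the cost is that the oldest unread messages are lost first.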
→ See MCP channels, Multi-agent coordination, AWAY_SUMMARY
6. Permission + safety
The agent has powerful tools. The safety story is layered.
- Per-binding capability override (Phase 16): each binding has its own EffectiveBindingPolicy that filters allowed_tools, rate limits, outbound allowlists, and capability gates. Same agent can have a public WhatsApp binding (locked-down tool set) AND a private Telegram binding (full power).
- Auto-approve dial (Phase 80.17): auto_approve: true skips the prompt for read-only / scoped-write tools, while destructive Bash + writes outside workspace + ConfigTool + REPL + remote_trigger always ask. The is_curated_auto_approve decision table has 25 entries, with symlink-escape defense + parent-canonicalize fallback for new files. Tools with an mcp_ / ext_ prefix default to ask. Default arm: _ => false.
- Capability inventory: crates/setup/src/capabilities.rs::INVENTORY registers every dangerous env toggle (NEXO_DREAM_NOW_ENABLED, NEXO_KAIROS_REMOTE_CONTROL, etc). nexo setup doctor capabilities surfaces every armed knob.
- Bash safety (Phase 77.8-77.10):
  - Destructive command warning: flags rm -rf /-shaped invocations.
  - Sed-in-place + path validation: rejects sed -i against paths outside the workspace.
  - shouldUseSandbox heuristic with bwrap / firejail probe.
- Channel permission relay (Phase 80.9.b): ChannelRelayDecider decorator races the local approval prompt against any channel reply (yes <id> / no <id> from the user's phone). First decision wins. 5-letter ID alphabet a-z minus l (anti-confusable); substring blocklist for offensive combos. Local prompt always runs in parallel; channel approval is a second surface, never a replacement.
- Setup doctor: nexo setup doctor audits (channel, account_id) tuples, capability gates, dispatch policy consistency, pairing allowlist coverage.
→ See Auto-approve dial, Capability toggles, Bash safety knobs
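The reply format described in the channel permission relay bullet (yes <id> / no <id>, five letters from a-z without l) can be sketched as a small parser. This is a minimal illustration, not the code in crates/mcp/src/channel_permission.rs: the names RelayDecision, is_valid_request_id, and parse_channel_reply are hypothetical, the offensive-substring blocklist is left out, and treating the verbs as case-sensitive is an assumption.

```rust
/// Decision carried by a channel reply such as `yes abcde` (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum RelayDecision {
    Allow,
    Deny,
}

/// True when `id` is exactly five characters drawn from a-z, excluding `l`.
pub fn is_valid_request_id(id: &str) -> bool {
    id.len() == 5 && id.chars().all(|c| c.is_ascii_lowercase() && c != 'l')
}

/// Parses `yes <id>` / `no <id>`. Anything else yields None so that ordinary
/// chat messages are never mistaken for an approval.
pub fn parse_channel_reply(text: &str) -> Option<(RelayDecision, String)> {
    let mut parts = text.split_whitespace();
    let verb = parts.next()?;
    let id = parts.next()?;
    // Reject trailing words and malformed ids.
    if parts.next().is_some() || !is_valid_request_id(id) {
        return None;
    }
    let decision = match verb {
        "yes" => RelayDecision::Allow,
        "no" => RelayDecision::Deny,
        _ => return None,
    };
    Some((decision, id.to_string()))
}
```

Returning None for anything that is not an exact match keeps the relay conservative: an unparseable reply leaves the local prompt as the only pending decision surface.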
7. Audit + observability
Everything the agent does leaves a trail.
- Turn-level audit log (Phase 72): every driver-loop AttemptResult writes a row to the goal_turns SQLite table: outcome, decision text, summary, diff_stat, error, raw_json, plus the channel source marker. 1000-row tail cap. Idempotent on (goal_id, turn_index) so a replay doesn't corrupt history.
- agent_turns_tail goal_id=<uuid> [n=20] tool: read tool that surfaces the last N turns of a goal as a markdown table. Post-mortem debug surface.
- DreamTask audit (Phase 80.18): dream_runs SQLite table joined to goal_id with status, phase, sessions_reviewing, files_touched (JSON), prior_mtime_ms, started_at, ended_at. dream_runs_tail LLM tool. nexo agent dream tail/status/kill CLI.
- Agent registry persistence (Phase 71): agent_handles SQLite table tracks every Running / completed / aborted goal. Survives daemon restarts.
- Channel turn-log marker (Phase 80.9.h): channel-driven turns write source: "channel:<server>". A single SQL filter answers "what came in via Slack today?".
- Prometheus metrics (Phase 9.2): counters + gauges per agent / per binding / per tool / per channel. The health.bind YAML key wires the scrape endpoint.
- Tracing logs: every gate / every dispatch / every retry emits a tracing::info! or warn! with structured fields (server, binding, kind, reason, error). Operator-readable.
- Config-changes log (Phase 79.10): when ConfigTool mutates YAML, a row lands in the config_changes table with patch_id, actor_origin, allowed paths.
→ See Logging, Metrics, Turn-level audit log (deferred)
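The idempotency property of the turn-level audit log can be illustrated with an in-memory stand-in for the goal_turns table. This is a sketch under stated assumptions: TurnLog and TurnRow are hypothetical names, "idempotent" is modelled as first-write-wins on the (goal_id, turn_index) key, and the tail cap is interpreted as an upper bound on how many rows a read returns. The real store is SQLite and may differ on all three points.

```rust
use std::collections::BTreeMap;

/// Simplified row; the real table carries more columns (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct TurnRow {
    pub outcome: String,
    pub source: Option<String>,
}

/// In-memory stand-in for the goal_turns table.
pub struct TurnLog {
    cap: usize,
    rows: BTreeMap<(String, u32), TurnRow>,
}

impl TurnLog {
    pub fn new(cap: usize) -> Self {
        TurnLog { cap, rows: BTreeMap::new() }
    }

    /// Insert-or-ignore keyed on (goal_id, turn_index).
    /// Returns true when the row was stored, false when it was a replay.
    pub fn record(&mut self, goal_id: &str, turn_index: u32, row: TurnRow) -> bool {
        let key = (goal_id.to_string(), turn_index);
        if self.rows.contains_key(&key) {
            return false;
        }
        self.rows.insert(key, row);
        true
    }

    /// Last `n` turns of a goal in ascending turn order, bounded by the cap.
    pub fn tail(&self, goal_id: &str, n: usize) -> Vec<(u32, &TurnRow)> {
        let n = n.min(self.cap);
        let mut rows: Vec<(u32, &TurnRow)> = self
            .rows
            .iter()
            .filter(|((g, _), _)| g.as_str() == goal_id)
            .map(|((_, i), r)| (*i, r))
            .collect();
        let skip = rows.len().saturating_sub(n);
        rows.drain(..skip);
        rows
    }
}
```

Because a replayed turn hits an existing key, re-running the driver loop over the same goal leaves the recorded history unchanged.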
8. Operator surface
The CLI commands a human runs to drive / debug / observe the agent:
| Command | What it does |
|---|---|
| nexo run --config config/agents.yaml | Daemon entrypoint |
| nexo agent run [--bg] "<prompt>" | Spawn a goal |
| nexo agent ps [--all] [--kind=...] | List running goals |
| nexo agent attach <goal_id> | Snapshot of a goal |
| nexo agent discover [--include-interactive] | List discoverable goals |
| nexo agent dream tail/status/kill | DreamTask audit + control |
| nexo channel list/doctor/test | MCP channels surface |
| nexo pair list/seed/start/revoke | Pairing gate management |
| nexo flow list/show/cancel/resume | TaskFlow runtime |
| nexo setup | Interactive wizard |
| nexo setup doctor | Configuration audit |
| nexo setup migrate --dry-run/--apply | Schema migrations |
| nexo doctor capabilities | Env toggle inventory |
| nexo ext install/list/uninstall/run | Extension management |
| nexo mcp-server | Run nexo as an MCP server |
→ See CLI reference
9. End-to-end use case
This is the kind of workflow the autonomous agent is built for.
Scenario: a marketing-agent named kate runs as a daemon
process, paired with the operator's Slack workspace + Telegram
account. It manages the editorial calendar and replies to user
queries during business hours.
agents:
  - id: kate
    model:
      provider: anthropic
      model: claude-sonnet-4-5
    plugins: [memory, browser, web_search]
    assistant_mode:
      enabled: true
    auto_approve: true
    proactive:
      enabled: true
      tick_interval_secs: 1800 # check in every 30 min
      max_idle_secs: 86400
    auto_dream:
      enabled: true
    channels:
      enabled: true
      approved:
        - server: slack
        - server: telegram
    inbound_bindings:
      - plugin: telegram
        instance: kate_tg
        allowed_channel_servers: [slack, telegram]
        auto_approve: true
        dispatch_policy:
          mode: full
What happens at runtime:
1. Boot: nexo run spawns kate as a daemon. The daemon reads the YAML, validates it, opens the broker, and opens the SQLite stores (memory, agent registry, dream runs, turn log, channel sessions, pairing). It connects the configured MCP servers, spawns a ChannelInboundLoop per (binding, server) plus a single ChannelBridge per process, and wraps the inner permission decider in ChannelRelayDecider.
2. First Slack DM: alice writes "what are we publishing today?" in Slack thread 1700000000.000. The Slack MCP server emits notifications/nexo/channel. The runtime parses it, derives session_key = "slack|thread_ts=1700000000.000", resolves a fresh session_uuid, persists it in mcp_channel_sessions.sqlite, and hands off the <channel source="slack" thread_ts="1700000000.000"> to the intake. The pairing gate verifies alice is allowlisted (or challenges her).
3. Agent decides: the LLM reads recent context (long-term memory + transcripts) and decides to look up the calendar. It calls Bash(python check_calendar.py). Auto-approve skips the prompt because the command is read-only and inside the workspace.
4. Reply: the agent calls channel_send(server: "slack", content: "The Q2 blog post is still pending", arguments: { thread_ts: "1700000000.000" }). The runtime resolves the outbound tool name from the registered server's snapshot and invokes it through the MCP runtime. The Slack MCP server posts to the Slack API.
5. Cron fires at 8 PM: cron_create from a previous turn scheduled a daily summary. The cron runner picks it up and dispatches an LLM turn through LlmCronDispatcher. Output goes to the operator's Telegram via notify_channel.
6. Risky tool prompt: the agent decides to schedule an email blast. The local approval prompt opens; in parallel the runtime emits notifications/nexo/channel/permission_request to both Slack and Telegram. The operator's phone shows Approve "Schedule email blast?" — yes abcde / no abcde. The operator types yes abcde in Telegram; the Telegram MCP server parses it and emits notifications/nexo/channel/permission. ChannelRelayDecider wins the race and returns AllowOnce. The email sends.
7. Operator sleeps: the agent keeps running. It receives Slack DMs from team members and replies through the same threads. Cron tasks fire on schedule. Memory consolidates at midnight via auto_dream.
8. Daemon restart: the operator pushes a new YAML; the watcher detects it, validates it, and swaps it in via the Phase 18 ArcSwap. The ChannelRegistry::reevaluate pass evicts handlers that no longer pass the gate. SQLite stores survive. When alice writes again in the same Slack thread, the agent reattaches to the same session, so the bot doesn't re-introduce itself.
9. Operator returns after 12 h silence: the first inbound triggers the AWAY_SUMMARY digest. The agent composes a markdown report of the past 12 h: 14 channel messages handled, 2 permission prompts approved, 1 cron fire completed. It is sent before processing the operator's actual message.
10. Operator audits: agent_turns_tail goal_id=<uuid> n=50 shows every decision the agent made in the last 50 turns. nexo channel doctor validates the YAML against the gate. nexo agent dream tail shows the last consolidations.
The operator never sat at a terminal during steps 5-9. The agent is autonomous within the bounds of the YAML.
10. Provider-agnostic by design
Every autonomous behaviour works against any LLM provider:
- MiniMax M2.5 (primary)
- Anthropic Claude (subscription OAuth, API key, or Claude Code import)
- OpenAI-compat providers
- Gemini
- Local llama.cpp (Phase 68 backlog — model-agnostic GGUF loader for tier-0 inference)
The LlmClient trait is the abstraction. No autonomous feature
hard-codes a provider; everything routes through the registry +
binding-level provider selection.
Channels work the same way: any MCP server that follows the protocol becomes a channel, regardless of which platform it adapts.
Pollers, webhooks, and channel adapters are all data-driven via YAML — operators don't write per-provider Rust to add a new external surface.
11. Code map — where each capability lives
| Capability | Crate / file | Tests |
|---|---|---|
| Driver-loop | crates/driver-loop/ | + integration tests |
| Permission decider | crates/driver-permission/src/decider.rs | inline |
| Auto-approve dial | crates/driver-permission/src/auto_approve.rs | 27 |
| Channel relay decorator | crates/driver-permission/src/channel_relay.rs | 8 |
| Bash safety | crates/driver-permission/src/bash_destructive.rs | 19 |
| Long-term memory | crates/memory/src/long_term.rs | inline |
| Memdir relevance scorer | crates/memory/src/memdir/ | inline |
| Secret guard | crates/memory/src/secret_guard.rs | inline |
| autoDream runner | crates/dream/ | 67 |
| Cron schedule + jitter | crates/core/src/cron_schedule.rs | 80 |
| Channels gate + parser + bridge | crates/mcp/src/channel*.rs | 109 |
| Channel session store | crates/mcp/src/channel_session_store.rs | 9 |
| Channel permission relay | crates/mcp/src/channel_permission.rs | 27 |
| Channel boot helpers | crates/mcp/src/channel_boot.rs | 5 |
| Channel LLM tools | crates/core/src/agent/channel_*_tool.rs | 21 |
| Pairing | crates/pairing/ | inline |
| TaskFlow | crates/taskflow/ | inline |
| Agent registry persistence | crates/agent-registry/ | 51 |
| Turn-level audit log | crates/agent-registry/src/turn_log.rs | 9 |
| Inbox router | crates/core/src/agent/inbox*.rs | 17 |
| Webhook receiver | crates/webhook-receiver/ | 33 |
| Forked sub-agent | crates/fork/ | 42 |
| Driver / runtime hookup | src/main.rs | smoke |
Total channel-related lib tests: 168 green, spread across 5 crates. The workspace-wide test count is much larger; see the phase-specific docs for the per-feature breakdown.
12. What's NOT done yet
Honest list of polish items still backlogged:
- Sample MCP channel server fixture: extensions/sample-channel-server/ reference impl so operators can wire a fake channel quickly without writing an MCP server from scratch. ~200 LOC, high educational value, no functional impact.
- Setup wizard panel for channels: nexo setup → Configurar agente → Channels interactive opt-in. UX nice-to-have.
- Live-runtime channel doctor: the current nexo channel doctor is static against YAML. A live version would consult the active ChannelRegistry via NATS to show what's actually registered in the running daemon.
- channel_history LLM tool: tail of the turn-log filtered by source: "channel:<server>", useful for the agent to ask itself "what did Slack send today".
- Phase 67.10–67.13: escalation-to-channel paths for driver-loop are largely subsumed by notify_origin / notify_channel already. Remaining tickets are in PHASES.md.
- Phase 68 Local LLM tier (llama.cpp): 15 sub-phases for tier-0 inference (PII / embeddings / poller pre-filter / classifiers / fallback). Planned to run on Termux ARM CPU + desktop CPU/GPU.
None of these block the autonomous agent's current capabilities.
13. Where to go next
- Setting up your first autonomous agent → Quick start + Setup wizard.
- Deep dive on assistant mode + auto-approve → Assistant mode overview.
- MCP channels specifically → MCP channels.
- Multi-agent coordination patterns → Multi-agent coordination.
- Audit + observability stack → Logging + Metrics + Turn-level audit log (deferred).
- Phase tracking: PHASES.md at repo root has the exhaustive sub-phase status (✅ MVP / ⬜ open / DEFERRED).
Assistant mode
Assistant mode is a per-binding behavioural toggle that flips an agent into a proactive posture: it can act on its own when no user is in the chat, run long-lived background goals, coordinate with peers, and summarise activity when the user re-connects after a silence. Default is disabled — bindings without the block keep their conventional request-response behaviour.
Quick start
# agents.yaml
agents:
  - id: kate
    workspace_git:
      enabled: true # required for `auto_dream` git commits
    dreaming:
      enabled: true
      interval_secs: 86400
    assistant_mode:
      enabled: true
      # Operator override (optional). When omitted, the bundled
      # default text is appended to the system prompt.
      system_prompt_addendum: null
      # Auto-spawn teammates at boot. Wired in 80.15.b follow-up;
      # accepted at parse time so YAML doesn't need migration later.
      initial_team: []
    auto_approve: true # see `auto-approve.md`
    away_summary:
      enabled: true
      threshold_hours: 4
      max_events: 50
Boot logging confirms wiring:
INFO boot.assistant agent=kate assistant_mode runner registered
What changes when assistant mode is on
1. Proactive system prompt addendum
The binding's effective system prompt picks up an addendum that nudges the agent toward proactive behaviour:
You are running in assistant mode. Your default posture is proactive: when the user is away, you may use scheduled triggers (cron) and channel inbound to drive your own actions, including spawning teammates, calling tools to gather context, and waiting on external events. When you have something useful to report, surface it succinctly through the configured outbound channel; otherwise stay quiet rather than narrating idle time. Only block on user input when you genuinely need a decision they can supply.
Operator can override the text via system_prompt_addendum: "...".
Empty strings are rejected — omit the field to use the default.
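The override rule (omitted field falls back to the bundled default, empty string is rejected) might look like the following. This is an illustrative sketch only: resolve_addendum and DEFAULT_ADDENDUM are hypothetical names, the default text is abbreviated, and treating whitespace-only strings as empty is an assumption rather than documented behaviour.

```rust
/// Abbreviated stand-in for the bundled default text (hypothetical constant).
pub const DEFAULT_ADDENDUM: &str = "You are running in assistant mode.";

/// Resolves the effective addendum from the optional YAML value.
/// - None             -> bundled default
/// - Some("")         -> validation error (assumed to include whitespace-only)
/// - Some(non-empty)  -> operator override, used verbatim
pub fn resolve_addendum(configured: Option<&str>) -> Result<String, String> {
    match configured {
        None => Ok(DEFAULT_ADDENDUM.to_string()),
        Some(text) if text.trim().is_empty() => Err(
            "system_prompt_addendum must not be empty; omit the field to use the default"
                .to_string(),
        ),
        Some(text) => Ok(text.to_string()),
    }
}
```

Rejecting the empty string instead of silently falling back makes a typo in the YAML visible at validation time.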
2. Boot-immutable flag
The enabled flag is captured at boot. Toggling requires a daemon
restart so a single turn never sees a half-flipped state. The
addendum content itself IS hot-reloadable through the Phase 18
config-watcher path — operators can iterate on the prompt text
without bouncing the daemon.
3. Curated auto-approve dial (companion feature)
assistant_mode: true is most useful paired with auto_approve: true
(see auto-approve.md). Without the dial, the agent hangs on every
tool call waiting for interactive approval — the proactive
posture dies the first time it tries to run ls /tmp. With the
dial, safe read-only / scoped-write tools auto-allow while
destructive Bash, writes outside workspace, and self-config-edit
tools always ask.
nexo setup doctor warns when these are misaligned (assistant_mode
on but auto_approve off — see 80.17.c follow-up for the audit).
4. Always-on lifecycle
Bindings in assistant mode typically pair with:
- BG sessions (agent run --bg): long-lived goals that survive shell exit. See cli/agent-bg.md.
- AWAY_SUMMARY: re-connection digest after silence. See away-summary.md.
- Multi-agent coordination: list_peers + send_to_peer for in-process peer messaging. See multi-agent-coordination.md.
- Heartbeat / cron: for time-driven proactive triggers (existing Phase 7 + future Phase 80.2 jitter cluster).
Reading the flag from code
Boot-time helpers resolve the configured value through a single view:
use nexo_assistant::ResolvedAssistant;
use nexo_config::types::assistant::AssistantConfig;

// At boot, per binding:
let resolved = ResolvedAssistant::resolve(cfg.assistant_mode.as_ref());

// AgentContext.assistant: ResolvedAssistant, read by:
// - llm_behavior (system prompt addendum injection)
// - cron defaults (80.15.c follow-up)
// - brief mode auto-on (80.15.d follow-up)
// - dream context kairos signal (Phase 80.1)
// - remote-control auto-tier (80.17.b.b follow-up)
Status (Phase 80 cluster)
The assistant-mode cluster ships across multiple sub-phases. As of the most recent Phase 80 sweep:
| Sub-phase | Feature | Status |
|---|---|---|
| 80.15 | assistant_mode flag + addendum + ResolvedAssistant | ✅ MVP |
| 80.10 | SessionKind enum + agent run --bg + agent ps | ✅ MVP |
| 80.16 | agent attach + agent discover (DB-only viewer) | ✅ MVP |
| 80.17 + 80.17.b | auto_approve dial + decorator | ✅ MVP |
| 80.14 | AWAY_SUMMARY digest helper | ✅ MVP |
| 80.11 | Agent inbox + list_peers / send_to_peer tools | ✅ MVP |
| 80.11.b | Receive side router + per-goal buffer + render | ✅ MVP |
| 80.1 cluster | auto_dream fork-style consolidation | ✅ MVP |
| 80.16.b | Live event streaming via NATS for attach | ⬜ |
| 80.2-80.6 | Cron jitter cluster | ⬜ |
| 80.8 | Brief mode + SendUserMessage tool | ⬜ |
| 80.9 | MCP channels routing (7-step gate) | ⬜ |
| 80.12 | Generic webhook receiver | ⬜ |
| 80.21 | Docs + admin-ui sweep | ✅ (this page) |
Each sub-phase ships its infrastructure standalone (testable in isolation, opt-in). Wiring the whole cluster end-to-end requires the operator to thread the deferred main.rs hookup snippets when their daemon dirty state allows.
See also
- Auto-approve dial
- AWAY_SUMMARY digest
- Multi-agent coordination
- Background agents (agent run --bg / ps / attach)
- Dreaming: autoDream consolidation
- Proactive mode: Phase 77.20 tick loop
Auto-approve dial
The auto_approve per-binding flag flips the approval gate from
"always ask the operator" to "auto-allow a curated subset of safe
tools". It is the missing piece that makes assistant mode
practical: a proactive agent running cron-driven goals can't block
on interactive approvals at every tool call.
Default is disabled — current interactive-approval behaviour preserved unchanged for every existing binding.
What auto-approves
When auto_approve: true AND the tool is in the curated subset
AND its call passes the per-tool conditional checks, the approval
prompt is skipped and the tool runs as AllowOnce. Otherwise the
existing approval pipeline takes over.
| Bucket | Tools | Notes |
|---|---|---|
| Read-only / info | FileRead, Glob, Grep, LSP, WebFetch, WebSearch, list_agents, agent_status, agent_turns_tail, memory_history, dream_runs_tail, list_mcp_resources, read_mcp_resource, list_followups, list_peers, task_get | Always auto when dial on |
| Bash conditional | Bash | Only when is_read_only AND not destructive_command AND not sed_in_place |
| Scoped writes | FileEdit, FileWrite | Only when path canonicalises under workspace_path; new-file case canonicalises parent then re-attaches filename; symlink-escape resistant |
| Notify + memory | notify_origin, notify_channel, notify_push, forge_memory_checkpoint, dream_now, ask_user_question | Always auto |
| Coordination | delegate, team_create, team_delete, send_to_peer, task_create, task_update, task_stop | Always auto |
What ALWAYS asks (regardless of dial)
| Tool | Why |
|---|---|
| ConfigTool / config_self_edit | Self-editing YAML is too dangerous |
| REPL | Stateful subprocess side-effects |
| remote_trigger | Outbound webhook to arbitrary URL |
| schedule_cron | Persistent state mutation |
| Bash with destructive / sed-in-place | Phase 77.8/77.9 vetoes ALWAYS apply |
| mcp_* / ext_* prefix | Heterogeneous per-server semantics |
| Unknown tool name | Default-deny: new tools must be explicitly added |
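The two tables above can be read as a single match with a default-deny arm. The sketch below shows that shape only. It is not the contents of crates/driver-permission/src/auto_approve.rs: the function name auto_approve_sketch and the CallFacts struct are hypothetical, only a handful of the listed tools are included, and the per-call facts are assumed to be computed by the caller.

```rust
/// Facts the caller has already computed about one tool call
/// (hypothetical shape, for illustration only).
pub struct CallFacts {
    pub bash_read_only: bool,
    pub bash_destructive: bool,
    pub bash_sed_in_place: bool,
    pub path_in_workspace: bool,
}

/// Returns true when the approval prompt may be skipped.
pub fn auto_approve_sketch(tool: &str, dial_on: bool, facts: &CallFacts) -> bool {
    // Dial off: every call keeps the interactive approval path.
    if !dial_on {
        return false;
    }
    // Prefixed tools have per-server semantics, so they always ask.
    if tool.starts_with("mcp_") || tool.starts_with("ext_") {
        return false;
    }
    match tool {
        // Read-only / info bucket (subset).
        "FileRead" | "Glob" | "Grep" | "list_peers" | "agent_turns_tail" => true,
        // Bash only when read-only and not vetoed by the safety heuristics.
        "Bash" => {
            facts.bash_read_only && !facts.bash_destructive && !facts.bash_sed_in_place
        }
        // Scoped writes only when the path resolves under the workspace.
        "FileEdit" | "FileWrite" => facts.path_in_workspace,
        // Notify + coordination buckets (subset).
        "notify_origin" | "notify_channel" | "delegate" | "send_to_peer" => true,
        // Explicit always-ask tools, listed for readability.
        "ConfigTool" | "REPL" | "remote_trigger" | "schedule_cron" => false,
        // Default-deny: unknown tools never auto-approve.
        _ => false,
    }
}
```

The explicit always-ask arm is redundant with the default arm; it is kept so that the dangerous tools are visible in the table itself.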
Layering
The dial composes with the existing gates rather than replacing them:
┌─────────────────────────────────────────────────────────────┐
│ 1. Phase 16 binding `allowed_tools` (tool name filter) │
│ └── Tool not in list → never even reaches the registry │
│ 2. Capability gate (env vars + cargo features) │
│ └── Host-level dangerous toggle off → tool stripped │
│ 3. Auto-approve dial (THIS layer) │
│ └── Curated subset → AllowOnce │
│ └── Otherwise → fall through to operator prompt │
│ 4. Bash destructive heuristic (Phase 77.8/77.9) │
│ └── ALWAYS vetoes regardless of dial │
│ 5. Operator interactive approval │
│ └── Companion-tui / pairing / chat reply │
└─────────────────────────────────────────────────────────────┘
The dial NEVER widens the tool surface. A tool absent from the
binding's allowed_tools is still absent. The dial only skips
the prompt for tools that are already on the surface AND fall in
the curated subset.
Configuration
agents:
  - id: kate
    workspace: /home/kate/projects # used to scope FileEdit/Write
    allowed_tools: ["FileRead", "Bash", "FileEdit", "delegate"]
    auto_approve: true # agent-level default
    inbound_bindings:
      - plugin: whatsapp
        # Per-binding override (optional). None inherits agent default.
        auto_approve: false # this binding stays interactive
agent.auto_approve is the agent-level default; per-binding
override at inbound_bindings[].auto_approve is Option<bool>
where None inherits.
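The inheritance rule reduces to a one-line resolution. A minimal sketch, assuming a free function named effective_auto_approve (hypothetical; the real resolution lives in EffectiveBindingPolicy):

```rust
/// Resolves the per-binding value: an explicit override wins,
/// None inherits the agent-level default.
pub fn effective_auto_approve(agent_default: bool, binding_override: Option<bool>) -> bool {
    binding_override.unwrap_or(agent_default)
}
```

With the YAML above, the whatsapp binding resolves to false even though the agent default is true, while a binding that omits the key resolves to true.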
Wiring on the operator side (deferred 80.17.b.b/c)
Today's slim MVP ships:
- is_curated_auto_approve(tool_name, args, on, workspace_path) -> bool decision table (crates/driver-permission/src/auto_approve.rs)
- AutoApproveDecider<D> decorator (same module) wrapping any PermissionDecider chain
- AgentConfig.auto_approve: bool + InboundBinding.auto_approve: Option<bool> YAML schema
- EffectiveBindingPolicy.auto_approve: bool + workspace_path: Option<PathBuf> resolved per binding
Pending:
- 80.17.b.b: boot-time wrap of the active decider with AutoApproveDecider::new(...) (1-line snippet)
- 80.17.b.c: caller-side metadata population: the wire that constructs PermissionRequest must insert metadata.auto_approve and metadata.workspace_path from the resolved policy before invoking the decider
- 80.17.c: nexo setup doctor warn for assistant_mode + !auto_approve misconfiguration
Until those ship, the decorator is a transparent pass-through —
the helper is called but the metadata never reads true. Test it
locally by hand-populating metadata in your decider wrapper.
Defense-in-depth
Five layers protect against agent misbehaviour even with the dial on:
- Phase 16 binding policy: tool not on the surface = never reachable.
- Default-deny match arm: newly introduced tools never auto-approve until explicitly added to the decision table.
- Phase 77.8/77.9 destructive heuristic: rm -rf, dd, mkfs, sed -i, fork-bomb shapes always veto.
- Workspace-scoped writes: symlink-escape resistant via Path::canonicalize + starts_with.
- Operator restart kill-switch: flipping auto_approve: false and restarting takes < 5 seconds.
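The workspace-scoped write check (canonicalize, then starts_with, with a parent-canonicalize fallback for files that do not exist yet) can be sketched with the standard library alone. The function names below are hypothetical and the error handling is simplified; the point is only to show why canonicalizing before the prefix test defeats `..` segments and symlinks.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Canonicalizes `target`. When the file does not exist yet, canonicalizes
/// the parent directory and re-attaches the file name.
pub fn resolve_for_write(target: &Path) -> io::Result<PathBuf> {
    match fs::canonicalize(target) {
        Ok(resolved) => Ok(resolved),
        Err(e) if e.kind() == io::ErrorKind::NotFound => {
            let parent = target
                .parent()
                .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "no parent"))?;
            let name = target
                .file_name()
                .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "no file name"))?;
            Ok(fs::canonicalize(parent)?.join(name))
        }
        Err(e) => Err(e),
    }
}

/// True only when the resolved target lies under the resolved workspace root.
/// Any resolution error is treated as "outside" (fail closed).
pub fn is_inside_workspace(workspace: &Path, target: &Path) -> bool {
    let Ok(root) = fs::canonicalize(workspace) else {
        return false;
    };
    match resolve_for_write(target) {
        Ok(resolved) => resolved.starts_with(&root),
        Err(_) => false,
    }
}
```

Path::starts_with compares whole path components, so /home/kate/projects-evil does not count as being under /home/kate/projects.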
See also
- Assistant mode
- AWAY_SUMMARY digest — what the agent reports when you re-connect after running auto-approved
- Multi-agent coordination
AWAY_SUMMARY digest
When the user has been silent for a configurable threshold (default 4 hours), the next inbound message triggers a short markdown digest that summarises everything the agent did during the silence: goals completed, aborts, failures, and turn counts. Default is disabled — per-binding opt-in.
Why
In assistant mode, the agent runs proactively in the user's absence. When the user comes back to the chat, they need a quick recap before the agent processes their new request. AWAY_SUMMARY is that recap.
The digest answers "what did you do while I was gone?" with a few counter bullets, NOT a long narrative. If you want a richer LLM-summarised version (1–3 sentences of natural prose), that's the deferred 80.14.b follow-up.
Configuration
agents:
  - id: kate
    away_summary:
      enabled: true
      threshold_hours: 4 # default
      max_events: 50 # default
| Field | Type | Default | Notes |
|---|---|---|---|
| enabled | bool | false | Master toggle |
| threshold_hours | u64 | 4 | Hours of silence before next inbound triggers digest. 0 would fire on every inbound; operator-side rate limiting becomes their responsibility. Rejected at validate time when > 30 days (likely operator confusion). |
| max_events | usize | 50 | Cap on events included in the digest. Larger windows still fire but truncate with a (showing the most recent N — older events may exist) suffix. Rejected at validate time when 0. |
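The defaults and validation rules in the table can be expressed as a small config type. This is a sketch, not the real nexo-config schema: the struct name AwaySummaryConfig, the validate method, and the MAX_THRESHOLD_HOURS constant are assumptions made for illustration.

```rust
/// Hypothetical mirror of the away_summary YAML block.
#[derive(Debug, Clone, PartialEq)]
pub struct AwaySummaryConfig {
    pub enabled: bool,
    pub threshold_hours: u64,
    pub max_events: usize,
}

impl Default for AwaySummaryConfig {
    fn default() -> Self {
        // Defaults taken from the table: disabled, 4 hours, 50 events.
        AwaySummaryConfig { enabled: false, threshold_hours: 4, max_events: 50 }
    }
}

/// 30 days expressed in hours.
pub const MAX_THRESHOLD_HOURS: u64 = 30 * 24;

impl AwaySummaryConfig {
    /// Boot-time validation: thresholds above 30 days and a zero event cap
    /// are rejected. A zero threshold is allowed (fires on every inbound).
    pub fn validate(&self) -> Result<(), String> {
        if self.threshold_hours > MAX_THRESHOLD_HOURS {
            return Err(format!(
                "away_summary.threshold_hours must be <= {} (30 days)",
                MAX_THRESHOLD_HOURS
            ));
        }
        if self.max_events == 0 {
            return Err("away_summary.max_events must be > 0".to_string());
        }
        Ok(())
    }
}
```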
Output shape
Template-based markdown (no LLM call in the slim MVP — pure-fn render):
**While you were away** (last 6h12m):
- 7 goal turn(s) recorded
- 4 completed
- 1 aborted/cancelled
- 1 failed
- 1 in progress / other
When the event count hits max_events, a truncation suffix is
appended:
_(showing the most recent 50 — older events may exist)_
Wiring (operator-side)
Today's slim MVP ships the digest helper as a pure-async function
in nexo-dispatch-tools::away_summary:
use nexo_dispatch_tools::try_compose_away_digest;

// Inbound handler, called when a new user message arrives.
let digest = try_compose_away_digest(
    &cfg.away_summary.unwrap_or_default(),
    last_seen, // Option<DateTime<Utc>> from caller storage
    chrono::Utc::now(),
    turn_log_store.as_ref(),
).await?;

if let Some(text) = digest {
    // Deliver via notify_origin BEFORE processing the user inbound.
    notify_origin(channel, &text).await?;
}

// Atomically update last_seen = now after composing
// (caller-managed storage: pairing store, separate SQLite, in-memory map).
update_last_seen(channel, sender_id, chrono::Utc::now()).await?;

// Now process the user's actual message.
process_inbound(...).await?;
The helper walks 4 gates cheapest-first:
- cfg.enabled (opt-in)
- last_seen.is_some(): None returns None without firing, so the caller can use it as the bootstrap path (set last_seen = now without burning the threshold)
- now - last_seen >= threshold: negative elapsed (clock skew) returns None
- Turn-log has at least one event since last_seen: an empty digest is not worth sending
When all four pass, returns Some(markdown).
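The four gates can be written as a chain of early returns. The sketch below uses plain Unix-second timestamps instead of chrono types to stay dependency-free, and should_fire_digest is a hypothetical name; it models only the fire/no-fire decision, not the rendering.

```rust
/// Decides whether a digest should be composed.
/// Timestamps are Unix seconds; `events_since` is the number of turn-log
/// events recorded after `last_seen`.
pub fn should_fire_digest(
    enabled: bool,
    last_seen: Option<i64>,
    now: i64,
    threshold_hours: u64,
    events_since: usize,
) -> bool {
    // Gate 1: opt-in.
    if !enabled {
        return false;
    }
    // Gate 2: bootstrap path, no previous timestamp to compare against.
    let Some(last) = last_seen else {
        return false;
    };
    // Gate 3: elapsed time; negative elapsed means clock skew.
    let elapsed = now - last;
    if elapsed < 0 {
        return false;
    }
    if (elapsed as u64) < threshold_hours.saturating_mul(3600) {
        return false;
    }
    // Gate 4: an empty digest is not worth sending.
    events_since > 0
}
```

Ordering the checks this way means the turn-log (the only gate that needs storage access) is consulted last, which matches the cheapest-first description.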
Atomic update pattern
The last_seen storage is operator-managed in the slim MVP: the
helper accepts the timestamp as a parameter, doesn't couple to
nexo-pairing or any specific table. Whatever you choose, update
it atomically AFTER composing the digest so a rapid double-inbound
doesn't fire twice.
Defense-in-depth
| Edge case | Behaviour |
|---|---|
| enabled: false | Returns None |
| last_seen: None (bootstrap) | Returns None; caller sets last_seen without firing |
| now - last_seen < threshold | Returns None |
| now < last_seen (clock skew) | Returns None |
| Turn-log empty | Returns None |
| max_events == 0 | Validate rejects at boot (not at runtime) |
| threshold_hours > 30 days | Validate rejects at boot |
Deferred follow-ups
- 80.14.b: LLM-summarised version: forks a subagent that takes the events list and renders a 1–3 sentence prose summary. Today's MVP is template-based.
- 80.14.c: last_seen_at tracking in nexo-pairing::PairingStore with SQLite migration so operators don't roll their own.
- 80.14.d: Per-channel-adapter rendering (whatsapp / telegram render markdown differently).
- 80.14.e: Time-of-day awareness ("don't ping at 3am unless awake_hours covers").
- 80.14.f: Custom prompt template per agent (relevant once 80.14.b ships).
- 80.14.g: main.rs inbound interceptor wire (1-line invocation site, blocked on dirty-state pattern).
See also
- Assistant mode
- Auto-approve dial — pair with AWAY_SUMMARY for the "agent works while user sleeps" workflow
- Multi-agent coordination
Multi-agent coordination
Two LLM tools — list_peers and send_to_peer — let in-process
agents discover each other and exchange messages without going
through a delegate-style RPC. Pairs cleanly with assistant mode:
the agent's plain text output is NOT visible to other agents, so
peer messaging IS the only way to communicate.
Subject contract
Per-goal NATS subject:
agent.inbox.<goal_id>
Wire format is JSON. Payload is InboxMessage:
pub struct InboxMessage {
    pub from_agent_id: String,
    pub from_goal_id: GoalId,
    pub to_agent_id: String,
    pub body: String,
    pub sent_at: DateTime<Utc>,
    pub correlation_id: Option<Uuid>,
}
correlation_id is omitted from the wire when None so
request/response patterns can re-use the channel without
accidentally injecting noise on one-shots.
Constants exposed via nexo_core::agent::inbox:
- INBOX_SUBJECT_PREFIX = "agent.inbox"
- MIN_BODY_CHARS = 1
- MAX_BODY_BYTES = 64 * 1024
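The subject contract and body limits can be sketched as follows. The three constants mirror the values listed above; the helper functions inbox_subject, goal_id_from_subject, and validate_body are hypothetical names, and measuring the minimum length after trimming whitespace is an assumption.

```rust
pub const INBOX_SUBJECT_PREFIX: &str = "agent.inbox";
pub const MIN_BODY_CHARS: usize = 1;
pub const MAX_BODY_BYTES: usize = 64 * 1024;

/// Builds the per-goal subject, e.g. `agent.inbox.<goal_id>`.
pub fn inbox_subject(goal_id: &str) -> String {
    format!("{}.{}", INBOX_SUBJECT_PREFIX, goal_id)
}

/// Extracts the goal id from a subject; None when the subject is malformed.
pub fn goal_id_from_subject(subject: &str) -> Option<&str> {
    let rest = subject
        .strip_prefix(INBOX_SUBJECT_PREFIX)?
        .strip_prefix('.')?;
    // Exactly one token is expected after the prefix.
    if rest.is_empty() || rest.contains('.') {
        None
    } else {
        Some(rest)
    }
}

/// Checks the body against the minimum character and maximum byte limits.
pub fn validate_body(body: &str) -> Result<(), String> {
    if body.trim().chars().count() < MIN_BODY_CHARS {
        return Err("message body is empty".to_string());
    }
    if body.len() > MAX_BODY_BYTES {
        return Err(format!("message body exceeds {} bytes", MAX_BODY_BYTES));
    }
    Ok(())
}
```

Note that the lower bound counts characters while the upper bound counts bytes, so a body made of multi-byte characters reaches the cap with fewer than 65,536 characters.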
list_peers LLM tool
Read-only enumeration. No-arg shape; returns peer summaries
excluding the calling agent. Each entry includes a reachable
flag based on the binding's allowed_delegates filter (Phase 16).
{
"peers": [
{
"agent_id": "researcher",
"description": "research agent",
"reachable": true
},
{
"agent_id": "writer",
"description": "writer agent",
"reachable": false
}
]
}
When the binding has no PeerDirectory configured, returns:
{
"peers": [],
"note": "this agent has no PeerDirectory configured"
}
Use this BEFORE send_to_peer to discover valid to: targets.
send_to_peer LLM tool
Fire-and-forget. Resolves to: to a peer agent_id, looks up the
peer's live goals, publishes the InboxMessage to each goal's
inbox subject, returns the per-goal delivery report.
Tool shape
{
"to": "researcher",
"message": "task #1 ready for handoff",
"correlation_id": "7a3b2f00-..." // optional UUID
}
Output
{
"delivered_to": ["b91c2d3a-...", "f88e1100-..."],
"unreachable_reasons": []
}
Or, on failures:
{
"delivered_to": [],
"unreachable_reasons": [
"unknown agent_id `ghost`"
]
}
Validation gates
The handler walks 6 gates:
- to must be present + non-empty after trim
- to != ctx.agent_id (self-sends rejected with explicit error)
- message must be present + non-empty
- Body ≤ MAX_BODY_BYTES (64 KB cap; rejected with explicit limit)
- to must exist in PeerDirectory (fast-path "unknown agent_id" unreachable when not; fail-fast before broker round-trip)
- Lookup must return at least one live goal id (empty → "no live goals" unreachable)
When all 6 pass, the handler iterates the live goals, builds an
InboxMessage per goal with from_goal_id = ctx.session_id.map(GoalId).unwrap_or_else(GoalId::new)
(best-effort; provenance preserved via from_agent_id), publishes via
the broker, and accumulates per-goal results.
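The gate sequence can be sketched with a plain map standing in for the PeerDirectory and the live-goal lookup. Everything named here is hypothetical (send_to_peer_sketch, DeliveryReport, the HashMap shape), and the broker publish is elided: every live goal is treated as a successful delivery. The split between hard errors (first four gates) and entries in unreachable_reasons (last two) follows the description above.

```rust
use std::collections::HashMap;

pub const MAX_BODY_BYTES: usize = 64 * 1024;

/// Mirrors the tool output shape shown earlier.
#[derive(Debug, Default, PartialEq)]
pub struct DeliveryReport {
    pub delivered_to: Vec<String>,
    pub unreachable_reasons: Vec<String>,
}

/// `live_goals` maps a known agent_id to its live goal ids; a missing key
/// means the agent is unknown, an empty list means no live goals.
pub fn send_to_peer_sketch(
    self_id: &str,
    to: &str,
    message: &str,
    live_goals: &HashMap<String, Vec<String>>,
) -> Result<DeliveryReport, String> {
    let to = to.trim();
    // Gates 1-4: input errors are rejected outright.
    if to.is_empty() {
        return Err("`to` is required".to_string());
    }
    if to == self_id {
        return Err("cannot send a message to yourself".to_string());
    }
    if message.trim().is_empty() {
        return Err("`message` is required".to_string());
    }
    if message.len() > MAX_BODY_BYTES {
        return Err(format!("message exceeds {} bytes", MAX_BODY_BYTES));
    }
    // Gates 5-6: reachability problems are reported, not raised.
    let mut report = DeliveryReport::default();
    match live_goals.get(to) {
        None => report
            .unreachable_reasons
            .push(format!("unknown agent_id `{}`", to)),
        Some(goals) if goals.is_empty() => report
            .unreachable_reasons
            .push(format!("no live goals for `{}`", to)),
        // Publish step elided: each live goal counts as delivered.
        Some(goals) => report.delivered_to.extend(goals.iter().cloned()),
    }
    Ok(report)
}
```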
Per-goal fan-out
A peer with multiple live goals (Bg + Daemon + Interactive) gets
the message at every goal's inbox. Per-goal failures don't cancel
the whole call — unreachable_reasons accumulates while
delivered_to records successful publishes.
Receive side (Phase 80.11.b)
Peer messages are queued in a per-goal in-memory FIFO buffer with
a 64-message cap (FIFO eviction on overflow). The receiving
goal's runtime drains the buffer at next turn start and renders
the messages as a <peer-message from="..."> system block:
# PEER MESSAGES
<peer-message from="researcher" sent_at="2026-04-30T14:00:00+00:00">
task #1 ready for handoff
</peer-message>
correlation_id attribute is added when Some so the receiver
can correlate replies back through the same channel.
Buffer-on-demand
Messages addressed to a goal that hasn't register()'d yet
queue in a fresh buffer. When the goal eventually registers, it
sees the buffered messages; this is race-safe under
fast-spawn-then-immediate-send.
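The per-goal buffer with FIFO eviction and buffer-on-demand can be sketched with a VecDeque per goal. This is a single-threaded illustration under assumptions: InboxBuffers and BUFFER_CAP are hypothetical names, messages are plain strings, and the real InboxRouter hands out shared buffer handles rather than exposing a map.

```rust
use std::collections::{HashMap, VecDeque};

/// Cap taken from the description above (64 messages per goal).
pub const BUFFER_CAP: usize = 64;

#[derive(Default)]
pub struct InboxBuffers {
    buffers: HashMap<String, VecDeque<String>>,
}

impl InboxBuffers {
    /// Queues a message. The buffer is created on demand, so a message
    /// sent before the goal registers is not lost. When the buffer is
    /// full, the oldest message is evicted and returned.
    pub fn push(&mut self, goal_id: &str, body: String) -> Option<String> {
        let buf = self.buffers.entry(goal_id.to_string()).or_default();
        let evicted = if buf.len() == BUFFER_CAP {
            buf.pop_front()
        } else {
            None
        };
        buf.push_back(body);
        evicted
    }

    /// Drains everything queued for a goal, oldest first.
    pub fn drain(&mut self, goal_id: &str) -> Vec<String> {
        self.buffers
            .get_mut(goal_id)
            .map(|b| b.drain(..).collect::<Vec<String>>())
            .unwrap_or_default()
    }
}
```

Returning the evicted message from push gives the caller a natural place to emit the warning log mentioned in the edge-case table.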
Wiring
use nexo_core::agent::inbox_router::{InboxRouter, render_peer_messages_block};

// Boot: single per-process spawn.
let router = InboxRouter::new(broker.clone());
let _handle = router.spawn(cancel.clone());

// Per-goal startup.
let buf = router.register(goal_id);
ctx.inbox_buffer = Some(buf);

// Per-turn loop, adjacent to assistant addendum push site.
let drained = ctx.inbox_buffer.as_ref().map(|b| b.drain()).unwrap_or_default();
if let Some(block) = render_peer_messages_block(&drained) {
    channel_meta_parts.push(block);
}

// Goal terminal.
router.forget(goal_id);
Defense-in-depth
| Edge case | Behaviour |
|---|---|
| Self-send | Rejected by handler |
| Body > 64 KB | Rejected with explicit limit in error |
| Empty body | Rejected |
| Unknown agent_id | unreachable_reasons: ["unknown agent_id ..."] |
| No live goals for peer | unreachable_reasons: ["no live goals ..."] |
| Per-goal publish fails | Recorded in unreachable_reasons, others continue |
| Buffer full (64 messages) | FIFO eviction with tracing::warn! |
| Subject malformed | Dropped with tracing::debug! |
| Payload garbage | Dropped with tracing::debug! |
| Race: peer terminates between list_peers and send_to_peer | Falls through to unreachable_reasons, no panic |
Deferred follow-ups
- 80.11.b.b: Hook InboxRouter drain + render into the per-turn loop in llm_behavior.rs (1-line snippet).
- 80.11.b.c: main.rs router spawn + per-goal register / forget on goal lifecycle hooks.
- 80.11.c: Broadcast to: "*" with cap (linear in team size, marked expensive in tool description).
- 80.11.d: Cross-machine inbox via NATS cluster (works automatically with NATS, documents the operator's broker config requirement).
- 80.11.e: Bridge protocol responses (shutdown_request / plan_approval_request JSON shapes; niche, defer).
- 80.11.f: main.rs tool registration wire.
See also
- Coordinator mode (Phase 84): role-aware system prompt for role: coordinator bindings, <task-notification> envelope, SendMessageToWorker continuation tool, continue-vs-spawn matrix.
- Worker mode (Phase 84.4): sister persona for role: worker bindings.
- Assistant mode
- Background agents (agent run --bg / ps / attach)
- Auto-approve dial
Coordinator mode (Phase 84)
Phase 77.18 introduced role: coordinator | worker as a binding flag
and gated the team-coordination tool surface behind it. Phase 84
closes the gap that remained: until 84.1 shipped, a coordinator
binding only saw the tools — it ran the same system prompt as any
other binding and treated worker results as opaque chat fragments.
What 84.1 ships
When a binding's resolved BindingRole is Coordinator, the runtime
prepends a purpose-built coordinator persona block ahead of the
agent's existing system prompt. The block is deterministic (same
inputs → same bytes) so prompt-cache prefix matching stays warm
across turns.
Order of the rendered system prompt:
# COORDINATOR ROLE
{persona block — sections below}
{agent.system_prompt}
# CHANNEL ADDENDUM
{binding.system_prompt_extra — when set}
Worker, Proactive, and absent role bindings are byte-identical
to today; the persona prefix only kicks in for Coordinator.
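The ordering above can be sketched as a pure function. This is a minimal illustration, not the runtime's code: the Role enum, the function name, and the exact newline placement are assumptions, and it assumes the channel addendum is appended for every role when system_prompt_extra is set.

```rust
/// Hypothetical role enum for illustration only.
#[derive(Clone, Copy, PartialEq, Eq)]
pub enum Role {
    Coordinator,
    Worker,
    Proactive,
    Absent,
}

/// Assemble the system prompt in the documented order.
/// `persona` is the already-rendered persona block; `extra` stands in
/// for binding.system_prompt_extra.
pub fn render_system_prompt(
    role: Role,
    persona: &str,
    agent_prompt: &str,
    extra: Option<&str>,
) -> String {
    let mut out = String::new();
    // Only Coordinator gets the persona prefix in this sketch.
    if role == Role::Coordinator {
        out.push_str("# COORDINATOR ROLE\n");
        out.push_str(persona);
        out.push('\n');
    }
    out.push_str(agent_prompt);
    if let Some(extra) = extra {
        out.push_str("\n# CHANNEL ADDENDUM\n");
        out.push_str(extra);
    }
    out
}
```

Because the function reads no clock, random source, or map iteration order, the same inputs always produce the same bytes, which is the property prompt-cache prefix matching relies on.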
Persona block sections
- Role declaration — frames the agent as a coordinator: directs workers, synthesizes results, communicates with the user.
- Tools available to you — the binding's allowed_tools filtered to the curated coordinator surface (TeamCreate, TeamDelete, SendToPeer, ListPeers, SendMessageToWorker, TaskStop, TaskList, TaskGet, TodoWrite). Tools the binding doesn't surface drop out.
- Worker result envelope — instruction to treat the <task-notification> XML envelope (Phase 84.2) as a system event, never as a user message.
- Continue-vs-spawn matrix — decision table guiding when to reuse a finished worker (SendMessageToWorker, Phase 84.3) vs. spawn fresh (TeamCreate, Phase 79.6) vs. message a live peer (SendToPeer, Phase 80.11).
- Synthesis discipline — coordinator must produce implementation specs with file paths and line numbers. Anti-pattern: "based on your findings, fix the bug" (delegates understanding back to the worker).
- Verification rigor — real verification (run failing case first, apply fix, run again, confirm broader suite, read the diff). "The build passed" is not verification.
- Parallelism — independent work fans out via concurrent tool calls in a single assistant message.
- Scratchpad (optional) — appears when the binding has TodoWrite (Phase 79.4) in allowed_tools. Mandatory for 3+ workers.
- Known workers (optional) — only rendered when the CoordinatorPromptCtx.workers slice is populated; the boot path passes empty by default since peer discovery is dynamic.
Configuring a coordinator binding
agents:
  ana:
    inbound_bindings:
      - plugin: whatsapp
        instance: ana_main
        role: coordinator
        allowed_tools:
          - TeamCreate
          - TeamDelete
          - SendToPeer
          - ListPeers
          - SendMessageToWorker   # Phase 84.3
          - TaskStop
          - TodoWrite             # enables Scratchpad section
The allowed_tools list shapes both the runtime tool surface
(Phase 16) and the persona block's tool list — keeping the two in
sync without a parallel config.
Continue-vs-spawn matrix
| Situation | Action |
|---|---|
| Worker finished; new ask builds on its loaded context | Continue (SendMessageToWorker, Phase 84.3) |
| New work has no overlap with any finished worker | Spawn fresh (TeamCreate) |
| Two unrelated streams of work | Spawn in parallel — one assistant message, both calls |
| Worker still in_progress; want to nudge | Send to peer (SendToPeer) |
| Worker silent past budget | TaskStop, then decide spawn-vs-continue from partial |
Default to continue when the new ask shares >50% of the prior worker's read files / search terms. Default to spawn when in a different subsystem.
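The >50% rule of thumb can be expressed as a set-overlap check. The sketch below is one reading of it, with the prior worker's files / search terms as the denominator; the names Decision, overlap_ratio, and decide are hypothetical, and the persona prompt leaves this judgment to the model rather than to code.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Continue,
    Spawn,
}

/// Share of the prior worker's read files / search terms that the new ask
/// also touches. Returns 0.0 when the prior worker touched nothing.
pub fn overlap_ratio(prior: &HashSet<&str>, new_ask: &HashSet<&str>) -> f64 {
    if prior.is_empty() {
        return 0.0;
    }
    let shared = prior.intersection(new_ask).count();
    shared as f64 / prior.len() as f64
}

/// Continue when more than half of the prior context is reused, else spawn.
pub fn decide(prior: &HashSet<&str>, new_ask: &HashSet<&str>) -> Decision {
    if overlap_ratio(prior, new_ask) > 0.5 {
        Decision::Continue
    } else {
        Decision::Spawn
    }
}
```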
<task-notification> envelope (Phase 84.2)
Worker results arrive in the coordinator's session wrapped in a
<task-notification> XML block. The block carries task-id,
status, summary, optional result, and optional usage:
<task-notification>
<task-id>goal-9f3a</task-id>
<status>completed</status>
<summary>Found 3 candidate fixes</summary>
<result>See `crates/auth.rs:142`.</result>
<usage>
<total_tokens>1280</total_tokens>
<tool_uses>4</tool_uses>
<duration_ms>12400</duration_ms>
</usage>
</task-notification>
status is one of completed | failed | killed | timeout.
Optional elements (<result>, <usage>) collapse out when the
producer has no value.
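To make the "optional elements collapse out" behaviour concrete, here is a minimal rendering sketch. The Notification and Usage types are hypothetical stand-ins, not the nexo-fork definitions, and XML escaping of the text fields is deliberately left out.

```rust
/// Hypothetical stand-in for the usage payload.
pub struct Usage {
    pub total_tokens: u64,
    pub tool_uses: u32,
    pub duration_ms: u64,
}

/// Hypothetical stand-in for the envelope type.
pub struct Notification {
    pub task_id: String,
    pub status: String, // completed | failed | killed | timeout
    pub summary: String,
    pub result: Option<String>,
    pub usage: Option<Usage>,
}

impl Notification {
    /// Render the envelope; <result> and <usage> are emitted only when set.
    pub fn to_xml(&self) -> String {
        let mut xml = String::from("<task-notification>\n");
        xml.push_str(&format!("<task-id>{}</task-id>\n", self.task_id));
        xml.push_str(&format!("<status>{}</status>\n", self.status));
        xml.push_str(&format!("<summary>{}</summary>\n", self.summary));
        if let Some(result) = &self.result {
            xml.push_str(&format!("<result>{}</result>\n", result));
        }
        if let Some(u) = &self.usage {
            xml.push_str("<usage>\n");
            xml.push_str(&format!("<total_tokens>{}</total_tokens>\n", u.total_tokens));
            xml.push_str(&format!("<tool_uses>{}</tool_uses>\n", u.tool_uses));
            xml.push_str(&format!("<duration_ms>{}</duration_ms>\n", u.duration_ms));
            xml.push_str("</usage>\n");
        }
        xml.push_str("</task-notification>");
        xml
    }
}
```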
Treat these blocks as system events, not user messages. The
persona prompt explicitly forbids <thank> / <acknowledge>
responses to a notification — read it, factor into synthesis, and
either continue the worker (84.3), spawn the next one, or report
to the user.
Producer surface lives in nexo-fork::fork_handle:
let n = fork_result.to_task_notification(task_id, summary, duration_ms);
let xml = n.to_xml(); // injected into the coordinator's next user turn
fork_error_to_task_notification(err, task_id, duration_ms) covers
the failure paths (ForkError::Aborted → killed,
ForkError::Timeout → timeout, others → failed).
Consumer wiring (the producer-to-LLM-context bridge) lands with 84.3 — that's where the fork-pass + TeamCreate completion paths actually exist. The 84.2 work pre-builds the type + producer helpers so 84.3 has one canonical path.
Backwards compatibility: TaskNotification::parse_block(text)
returns None when the input lacks the envelope, so legacy
consumers that read raw final text keep working during the rollout.
SendMessageToWorker continuation tool (Phase 84.3)
The coordinator can re-engage a finished worker by appending a new
user turn to its loaded session context. Distinct from
SendToPeer (peer-to-peer messaging to a live agent) and
TeamCreate (spawn fresh worker with empty context).
{
"tool": "SendMessageToWorker",
"args": {
"worker_id": "w-research", // task-id from prior <task-notification>
"message": "Continue: investigate the token-expiry boundary at auth.rs:142"
}
}
Response shape
| Outcome | kind | Notes |
|---|---|---|
| Worker exists, finished, this binding spawned it | Continued | Returns worker_id, prior_status, messages_count, pipeline_pending: true |
| worker_id matches no registry entry in this binding | UnknownWorker | Same error returned for cross-binding probes (defense-in-depth — no existence oracle) |
| Worker exists but is Running | WorkerStillRunning | Use SendToPeer for live peer messaging |
| Message > 32 KiB | MessageTooLarge | Hard cap |
| Binding has no resolved channel/account | BindingUnresolved | Synthesised policies (delegation, heartbeat) refuse cleanly |
| Binding role isn't coordinator | RoleRefused | Defense-in-depth — even if allowed_tools: ["*"] |
Cross-binding isolation
The registry keys workers by (coordinator_binding_key, worker_id),
where coordinator_binding_key comes from
EffectiveBindingPolicy.binding_id() — the canonical
<channel>:<account_id|"default"> render. A worker registered
under binding A is invisible to binding B; the lookup returns
Unknown, not WrongBinding, so binding B can't enumerate
binding A's worker ids.
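The isolation property falls out of the composite key. A minimal sketch, assuming a plain in-memory map (WorkerRegistry, Lookup, and the stored status string are hypothetical names, not the shipped types):

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub enum Lookup {
    Found(String), // carries the worker's recorded status
    Unknown,
}

#[derive(Default)]
pub struct WorkerRegistry {
    // Key: (coordinator_binding_key, worker_id).
    workers: HashMap<(String, String), String>,
}

impl WorkerRegistry {
    pub fn register(&mut self, binding_key: &str, worker_id: &str, status: &str) {
        self.workers.insert(
            (binding_key.to_string(), worker_id.to_string()),
            status.to_string(),
        );
    }

    /// A miss looks the same whether the id never existed or belongs to
    /// another binding, so callers cannot probe for existence.
    pub fn lookup(&self, binding_key: &str, worker_id: &str) -> Lookup {
        match self
            .workers
            .get(&(binding_key.to_string(), worker_id.to_string()))
        {
            Some(status) => Lookup::Found(status.clone()),
            None => Lookup::Unknown,
        }
    }
}
```

Since the binding key is part of the map key, there is no code path that can distinguish "wrong binding" from "no such worker", which is what removes the existence oracle.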
Pipeline pending
The 84.3 sub-phase ships the type, the registry, the tool, and
all four spec error scenarios. The actual transcript-resume
execution (loading the worker's prior messages, appending the
new user turn, running another fork loop, emitting a fresh
<task-notification> on completion) is deferred to the
fork-as-tool spawn pipeline that lives outside this sub-phase.
Today the success path returns pipeline_pending: true so a
coordinator can verify the request was accepted; the resume
itself wires up alongside the worker-spawn pipeline.
Composition with other phases
| Phase | Composition |
|---|---|
| 16 — EffectiveBindingPolicy | Persona prepend runs inside resolve() after allowed_tools is computed so the tool list reflects the effective binding surface |
| 77.18 — BindingRole | The role string is parsed via BindingRole::from_role_str; only Coordinator triggers the prepend |
| 79.4 — TodoWrite | When present in allowed_tools, the Scratchpad section is rendered |
| 79.6 — TeamCreate / TeamDelete | Listed in the persona's tool section when on the binding's surface |
| 80.11 — SendToPeer / ListPeers | Same |
| 84.2 — <task-notification> envelope | The persona instructs the agent to treat these blocks as system events |
| 84.3 — SendMessageToWorker | Listed in the persona's tool section; the continue-vs-spawn matrix references it |
Inspecting the rendered prompt
For verification on a configured agent, the prompt is visible in
the runtime EffectiveBindingPolicy::resolve(...).system_prompt
field. A cargo test -p nexo-core agent::effective::tests::coordinator
run exercises the full path including a YAML-fixture smoke test.
Worker mode (Phase 84.4)
Complement to coordinator mode. Bindings
with role: worker get a worker-specific system prompt block
prepended to the agent's existing system_prompt. Workers run
self-contained tasks dispatched by a coordinator; the persona
steers them away from user-facing dialogue and toward terse,
verified, on-spec output.
What 84.4 ships
When a binding's resolved BindingRole is Worker, the runtime
prepends the worker persona block. Coordinator bindings get the
coordinator block instead; Proactive and absent role are
byte-identical to today.
# WORKER ROLE
{persona block — sections below}
{agent.system_prompt}
# CHANNEL ADDENDUM
{binding.system_prompt_extra — when set}
Persona block sections
- Role declaration — frames the agent as an executor: do the work, report results, do not initiate user-facing dialogue. Scope questions go back to the coordinator via the final answer ("blocked: need X").
- Output discipline — the final answer is read by another agent, not a human. Optimize for parseability:
  - Code work: file path + line range + actual diff (or commit hash).
  - Research: bullet list with file_path:line references.
  - Failures: actual error output verbatim, not paraphrased.
- Self-verification — typecheck + test + read the diff before reporting done. False "done" reports poison the synthesis above.
- Tools available to you — the binding's allowed_tools list verbatim (workers see exactly the surface the operator granted, no curated subset). Followed by an explicit reminder that TeamCreate, SendToPeer, SendMessageToWorker, and TaskStop are not worker tools — wanting one means the task scope is too large.
- Scratchpad (optional) — appears when TodoWrite is in allowed_tools. Worker scratchpad is for the worker's own multi-step state, not for cross-worker coordination.
Configuring a worker binding
agents:
  ana-worker:
    inbound_bindings:
      - plugin: whatsapp
        instance: ana_worker
        role: worker
        allowed_tools:
          - BashTool
          - FileEdit
          - WebFetch
          - TodoWrite
Composition with other phases
| Phase | Composition |
|---|---|
| 16 — EffectiveBindingPolicy | Worker prepend runs inside resolve() after allowed_tools is computed |
| 77.18 — BindingRole::Worker | The role string parses to BindingRole::Worker; only that triggers the prepend |
| 79.4 — TodoWrite | Scratchpad section appears when TodoWrite is on the binding's surface |
| 84.1 — coordinator persona | The two persona builders are sister modules; the boot-path matcher in apply_persona_prefix dispatches on role |
| 84.2 — <task-notification> envelope | A worker's final assistant text becomes the result field of the envelope when the spawn pipeline ships |
| 84.3 — SendMessageToWorker | Workers don't call SendMessageToWorker (it's coordinator-only), but their session is what the tool resumes |
Inspecting the rendered prompt
The full path is exercised in
cargo test -p nexo-core agent::effective::tests::worker_role_loaded_from_yaml_renders_persona_block,
which deserializes a YAML binding fixture and asserts the
# WORKER ROLE prefix lands ahead of the agent's own prompt with
the expected sections.
Proactive Mode (Phase 77.20)
Proactive mode lets an agent run autonomously between user messages.
Instead of waiting for a new inbound event, the runtime injects periodic
<tick> prompts and the model decides whether to do work now or call
Sleep { duration_ms, reason }.
Configuration
Enable at agent level or per binding (inbound_bindings[].proactive):
proactive:
  enabled: true
  tick_interval_secs: 600
  jitter_pct: 25
  max_idle_secs: 86400
  initial_greeting: true
  cache_aware_schedule: true
  allow_short_intervals: false
  daily_turn_budget: 200
Per-binding override replaces the full proactive block for that binding.
Sleep Tool
Sleep is the canonical way to wait in proactive mode.
Do not use shell sleep for this.
- Bounds: duration_ms is clamped to [60_000, 86_400_000].
- Wake-up: runtime injects a synthetic <tick> with elapsed time + reason.
- Interrupt: real inbound user messages cancel pending sleep immediately.
Inbound Queue Priority
Inbound events can optionally carry priority in payload:
- now — highest priority (urgent interrupt)
- next — default priority (normal user input)
- later — deferred background notifications
When multiple messages are batched in the same debounce window, runtime
processes them in now > next > later order, preserving FIFO within
each priority class.
now also bypasses debounce delay and flushes immediately.
If now arrives during an in-flight turn, runtime preempts that turn and
runs the now message first.
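The batch ordering (now > next > later, FIFO within a class) is what a stable sort on the priority gives for free. A minimal sketch with hypothetical names; the debounce bypass and turn preemption for now are not modelled here:

```rust
/// Declaration order defines the derived ordering: Now < Next < Later.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Priority {
    Now,
    Next,
    Later,
}

/// Reorder a debounce batch by priority. `sort_by_key` is a stable sort,
/// so messages of equal priority keep their arrival order.
pub fn order_batch<T>(mut batch: Vec<(Priority, T)>) -> Vec<(Priority, T)> {
    batch.sort_by_key(|(priority, _)| *priority);
    batch
}
```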
Cache-Aware Scheduling
When cache_aware_schedule: true, runtime biases sleep duration to avoid
the Anthropic cache dead-zone:
- <= 270_000ms: keep as-is (cache warm window).
- 270_001..1_199_999ms: snap to 270_000 or 1_200_000 (nearest).
- >= 1_200_000ms: keep as-is.
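The snapping rule, together with the Sleep bounds described earlier, can be sketched as two small functions. The function names are hypothetical, and resolving an exact midpoint tie toward the warm window is an assumption the text above does not settle.

```rust
const WARM_MAX_MS: u64 = 270_000; // upper edge of the cache-warm window
const COLD_MIN_MS: u64 = 1_200_000; // lower edge of the "long sleep" range

/// Clamp a requested sleep to the documented Sleep bounds.
pub fn clamp_sleep(duration_ms: u64) -> u64 {
    duration_ms.clamp(60_000, 86_400_000)
}

/// Snap a duration out of the dead-zone to the nearest edge.
/// Assumption: an exact tie goes to the warm edge.
pub fn snap_cache_aware(duration_ms: u64) -> u64 {
    if duration_ms <= WARM_MAX_MS || duration_ms >= COLD_MIN_MS {
        return duration_ms;
    }
    let to_warm = duration_ms - WARM_MAX_MS;
    let to_cold = COLD_MIN_MS - duration_ms;
    if to_warm <= to_cold {
        WARM_MAX_MS
    } else {
        COLD_MIN_MS
    }
}
```

For example, a requested 300_000 ms sleep snaps down to 270_000, while 1_000_000 ms snaps up to 1_200_000.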
Daily Tick Budget
daily_turn_budget limits proactive tick-driven turns per 24h window.
- 0 means unlimited.
- When exhausted, wake-ups are suppressed and re-armed using the effective tick interval.
This prevents runaway autonomous loops from burning quota.
Telemetry
Prometheus counter:
nexo_proactive_events_total{agent,event}
Events:
- tick.fired
- sleep.entered
- sleep.interrupted
- cache_aware.snapped
Relation to agent_turn Poller
Phase 20 agent_turn is cron-driven external scheduling.
Proactive mode is model-driven self-pacing inside a live goal.
They are complementary and can coexist across different bindings.
Compact tiers
Context compaction and memory extraction in Nexo currently have four tiers:
Tier 1: micro compact (inline tool-result shrink)
Reduces oversized tool_result payloads before request send, keeping
tool_use_id correlation stable while replacing bulky content with a
compact marker (or provider-summary path when configured).
Operational intent:
- protect prompt budget from one-off large tool outputs
- preserve turn continuity without rewriting full history
Tier 2: auto compact (history folding — Phase 67.9 + 77.2)
When token pressure crosses configured thresholds or session age expires, runtime folds older history into a compact summary while preserving the hot tail.
Two independent triggers (Phase 77.2):
Token-pressure trigger
Fires when estimated_tokens / context_window >= token_pct (default 0.80
when auto block is present, fallback to legacy threshold 0.70 when
absent).
Age trigger
Fires when session_age_minutes >= max_age_minutes (default 120).
Disabled when auto block is absent or max_age_minutes: 0.
Guards
- Anti-storm: min_turns_between (default 5) turns must elapse between consecutive compactions.
- Circuit breaker: after max_consecutive_failures (default 3) consecutive compaction failures, the policy stops requesting compacts for the remainder of the goal. A successful compact resets the counter.
- Buffer tokens: buffer_tokens (default 13000) safety margin below effective context window.
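How the two triggers and the first two guards might compose is sketched below. This is an illustration under stated assumptions, not the policy's source: the struct and field names are hypothetical, guards are checked before triggers, and the buffer_tokens margin is left out.

```rust
/// Hypothetical mirror of the `auto` block.
pub struct AutoCompact {
    pub token_pct: f64,
    pub max_age_minutes: u64, // 0 disables the age trigger
    pub min_turns_between: u32,
    pub max_consecutive_failures: u32,
}

/// Hypothetical per-goal state the policy would consult.
pub struct SessionState {
    pub estimated_tokens: u64,
    pub context_window: u64,
    pub session_age_minutes: u64,
    pub turns_since_last_compact: u32,
    pub consecutive_failures: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Trigger {
    TokenPressure,
    Age,
}

pub fn should_compact(cfg: &AutoCompact, s: &SessionState) -> Option<Trigger> {
    // Circuit breaker: stop asking after too many consecutive failures.
    if s.consecutive_failures >= cfg.max_consecutive_failures {
        return None;
    }
    // Anti-storm: enforce a minimum gap between compactions.
    if s.turns_since_last_compact < cfg.min_turns_between {
        return None;
    }
    let ratio = s.estimated_tokens as f64 / s.context_window as f64;
    if ratio >= cfg.token_pct {
        return Some(Trigger::TokenPressure);
    }
    if cfg.max_age_minutes > 0 && s.session_age_minutes >= cfg.max_age_minutes {
        return Some(Trigger::Age);
    }
    None
}
```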
Operational intent:
- keep long-running sessions inside context window
- age-based trigger catches memory pressure from accumulated tool outputs even when estimated tokens are low
- reduce repeated cost of stale historical turns
Events
| Event | Subject | When |
|---|---|---|
| CompactRequested | agent.driver.compact | Policy classifies and schedules a compact turn |
| CompactCompleted | agent.driver.compact.completed | Turn after compact, with after_tokens |
Tier 3: session memory compact (Phase 77.3)
Persists compact summaries to long-term memory so resumed sessions can inject the last compact summary into the prompt without re-executing elided turns.
Operational intent:
- survive daemon restart without losing compaction progress
- feed prior summary into resumed goal's first-turn prompt
- avoid redundant re-compaction of the same history
How it works
- After a successful compact turn, the orchestrator extracts the LLM-generated summary from result.final_text.
- Summary is persisted via LongTermMemory::remember() with tag compact_summary and goal_id embedded in the content for FTS5 recall.
- On goal resume (daemon restart), load() retrieves the most recent summary and injects it into next_extras as compact_summary.
- PostCompactCleanup runs after persistence (no-op placeholder for 77.5+ extractMemories integration).
Events
| Event | Subject | When |
|---|---|---|
| CompactSummaryStored | agent.driver.compact.summary_stored | Summary persisted to LTM |
Config
compact_policy:
  sm_compact:                       # Phase 77.3 (optional)
    min_tokens: 10000               # min tokens before store (default 10000)
    max_tokens: 40000               # max tokens per summary (default 40000)
    store_in_long_term_memory: true # default true
sm_compact defaults to None — set it to enable session-memory
persistence. store_in_long_term_memory: false uses the noop store
for testing.
Tier 4: extractMemories (post-turn LLM extraction — Phase 77.5)
After every N eligible turns, a small LLM call reads the recent
conversation transcript and writes durable memories to the persistent
memory directory (~/.claude/projects/<path>/memory/*.md + MEMORY.md).
Four-type taxonomy (user / feedback / project / reference) with an
explicit exclusion list (code patterns, git history, debug recipes,
CLAUDE.md contents, ephemeral task details). Extraction is single-turn:
the existing memory manifest is pre-injected into the system prompt so
the LLM can decide what to update without file-system exploration.
Response is parsed as a JSON array of {file_path, content} objects.
Operational intent:
- complement Phase 10.6 dreaming (offline/recall-signal-based) with an inline/transcript-based path
- keep the memory directory current without manual remember invocations
- surface durable context to future sessions without re-reading full conversation history
Guards
- Throttle: turns_throttle (default 1 = every turn; recommend 3+ in production to limit token cost).
- Circuit breaker: after max_consecutive_failures (default 3) consecutive extraction failures, the breaker opens and extraction is skipped for the remainder of the goal.
- Mutual exclusion: at most one extraction in-flight per goal. When a new turn arrives mid-extraction, its context is coalesced and runs as a single trailing extraction.
- Main-agent write detection: extraction is skipped when the main agent already wrote to the memory directory this turn, avoiding clobbering intentional user-directed writes.
- Path sandbox: file paths from the LLM are validated — absolute paths and .. traversal are rejected.
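The path-sandbox guard can be illustrated with the standard library's path components. This is a sketch of the general technique, not the shipped validator; the function name is hypothetical, and it is stricter than necessary in that it rejects any .. component, even one that would stay inside the directory.

```rust
use std::path::{Component, Path};

/// Accept only relative paths made of normal segments (and `.`).
/// Root, drive-prefix, and `..` components are all rejected.
pub fn is_safe_relative(file_path: &str) -> bool {
    if file_path.is_empty() {
        return false;
    }
    Path::new(file_path)
        .components()
        .all(|c| matches!(c, Component::Normal(_) | Component::CurDir))
}
```

Checking components rather than searching the string for ".." avoids false positives on names such as notes..md while still catching a/../../x.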
Events
| Event | Subject | When |
|---|---|---|
| ExtractMemoriesCompleted | agent.driver.extract_memories.completed | Extraction succeeded, N memories saved |
| ExtractMemoriesSkipped | agent.driver.extract_memories.skipped | Extraction skipped (disabled / throttled / in-progress / circuit-breaker / main-agent-wrote) |
Config
compact_policy:
  extract_memories:              # Phase 77.5 (optional — default: disabled)
    enabled: true                # master switch (default false — opt-in)
    turns_throttle: 3            # run every N eligible turns (default 1)
    max_turns: 5                 # max LLM turns per extraction (default 5)
    max_consecutive_failures: 3  # circuit breaker (default 3, 0=disabled)
extract_memories defaults to None — set it to enable post-turn
extraction. The LLM backend is wired via the driver orchestrator's
extract_memories() builder method; the binary crate supplies the
LlmClient adapter.
Configuration surface
All tiers are controlled under llm.context_optimization.compaction in
llm.yaml, with per-agent enable switches in agents.yaml.
Driver-side config (config/driver/claude.yaml):
compact_policy:
  enabled: true
  context_window: 200000         # model context window in tokens
  threshold: 0.7                 # legacy token-pressure threshold (0.0-1.0)
  min_turns_between_compacts: 5
  auto:                          # Phase 77.2 (optional — age trigger disabled when absent)
    token_pct: 0.80              # token-pressure threshold (0.0-1.0, default 0.80)
    max_age_minutes: 120         # fire age trigger after 2 h (0 disables, default 120)
    buffer_tokens: 13000         # safety margin below context window (default 13000)
    min_turns_between: 5         # anti-storm gap (default 5)
    max_consecutive_failures: 3  # circuit breaker (default 3)
  sm_compact:                    # Phase 77.3 (optional)
    min_tokens: 10000
    max_tokens: 40000
    store_in_long_term_memory: true
  extract_memories:              # Phase 77.5 (optional — default disabled)
    enabled: true
    turns_throttle: 3
    max_turns: 5
    max_consecutive_failures: 3
Agent-side config (agents.yaml or per-binding llm.context_optimization.compaction):
compaction:
  enabled: true
  compact_at_pct: 0.7            # legacy threshold
  auto:                          # Phase 77.2
    token_pct: 0.80
    max_age_minutes: 120
    buffer_tokens: 13000
    min_turns_between: 5
    max_consecutive_failures: 3
Telemetry to watch
- llm_compaction_triggered_total{agent,trigger,outcome} — trigger is token_pressure or age
- llm_compaction_duration_seconds{agent,outcome}
- agent_driver_compaction_requested_total{trigger}
- agent_driver_compaction_completed_total{outcome}
- agent_driver_compact_summary_stored_total
- agent_driver_extract_memories_completed_total
- agent_driver_extract_memories_skipped_total{reason}
- prompt/token drift counters from token counter telemetry
Memdir scanner
memdir scanner support is currently documented through the MCP server
extension flow and OpenClaw-parity references.
Current status:
- scanner-style memory path logic is referenced in docs/src/extensions/mcp-server.md (teamMemPaths parity notes)
- there is no standalone operator CLI page yet for a dedicated memdir scan command
What operators should do today
- Use the MCP server extension docs as the canonical path for memory directory layout and exposure behavior.
- Rely on existing memory docs for storage/runtime semantics.
- Track roadmap follow-ups in PHASES.md / FOLLOWUPS.md for an explicit scanner command surface.
Configuration — memory.secret_guard (C5)
The Phase 77.7 secret scanner blocks memory writes that contain API
keys, tokens, or private keys. From C5 onwards, operators control
its behaviour via the memory.secret_guard block in
config/memory.yaml:
memory:
  short_term: { ... }
  long_term: { ... }
  # C5 — secret-scanner policy (provider-agnostic).
  # Omit the entire block for the secure default (enabled=true,
  # on_secret=block, rules=all, exclude_rules=[]).
  secret_guard:
    enabled: true        # master switch (default true)
    on_secret: block     # block | redact | warn (default block)
    rules: all           # "all" or a list of rule IDs
    exclude_rules: []    # list of rule IDs to skip (default empty)
| Field | Type | Default | Effect |
|---|---|---|---|
| enabled | bool | true | Master switch. false makes every check a no-op. |
| on_secret | block \| redact \| warn | block | What to do on detection. block returns an error and the write is refused; redact replaces matched secrets with [REDACTED:rule_id] and writes; warn writes intact and emits a warn log + event. |
| rules | "all" or [rule_id, ...] | "all" | Which rules to apply. List form selects only the named rules. |
| exclude_rules | [rule_id, ...] | [] | Rule IDs to silence (false positives). |
YAML-typo values (on_secret: deny, malformed rules, etc.) fail
boot loud — never silent.
Provider-agnostic
The scanner detects API keys for every supported LLM provider
(Anthropic, MiniMax, OpenAI, Gemini, DeepSeek, xAI, Mistral) using
the same regex set. exclude_rules operates on rule IDs (kebab-case
like github-pat, aws-access-token, openai-api-key), not on
providers — silencing one rule narrows by pattern shape, not by
LLM-provider identity.
Common operator workflows
- Switch from block to warn (e.g. dev environment debugging):
  secret_guard: { on_secret: warn }
  Memory writes are not refused; the daemon logs every detection for review.
- Suppress a known false positive:
  secret_guard:
    exclude_rules: [github-pat]
  All other 35 rules stay active.
- Hard-disable for an isolated test (NOT recommended in production):
  secret_guard: { enabled: false }
Prior art (validated, not copied)
- upstream agent CLI, 596-615,312-324 — hardcoded scanner with no YAML knob; activation via build flag (feature('TEAMMEM')) only. Operator override impossible without recompile. We adopt a richer operator-facing config rather than the hardcoded model.
- research/src/config/zod-schema.ts — OpenClaw uses 2-value enums (redactSensitive: off|tools, mode: enforce|warn). We extend to 3 (block|redact|warn) for richer behaviour without forcing operators to choose between block and disabled.
Bash safety knobs
Nexo's Bash safety model is layered. Even when the Bash tool is
available, execution is constrained by policy and runtime gates.
Main safety layers
- Per-binding tool allowlist
  - allowed_tools can remove Bash entirely for selected channels.
- Plan mode gating
  - mutating paths are blocked until explicit exit/approval workflow.
- Destructive-intent integration
  - plan-mode policy can auto-enter on destructive command detection.
- Worker-role curation
  - worker bindings run a constrained tool surface by default.
Relevant config knobs
- agents[].allowed_tools
- agents[].inbound_bindings[].allowed_tools
- agents[].plan_mode.*
- agents[].inbound_bindings[].plan_mode.*
- agents[].inbound_bindings[].role
Operational guidance
- For user-facing channels, prefer narrowing allowed_tools rather than trusting prompt-only behavior.
- Keep plan mode enabled for coordinator bindings.
- Use worker role for delegated execution to reduce blast radius.
Channel doctor
nexo channel is an operator CLI for debugging the MCP-channels
surface without a running daemon. Three verbs:
nexo channel list [--config=<path>] [--json]
nexo channel doctor [--config=<path>] [--binding=<id>] [--json]
nexo channel test <server> [--binding=<id>] [--content=...]
[--config=<path>] [--json]
All three read from the operator's YAML directly. They never spin up the daemon, never connect to a live MCP server, and never publish on the broker. Safe to run on production configs from any operator workstation.
nexo channel list
Walks every agent and surfaces (enabled, approved_servers, bindings) per agent. When --json is passed the output is
machine-readable; otherwise the renderer groups by agent for
human reading.
$ nexo channel list
## agent kate — channels.ENABLED (2 approved)
approved: slack
approved: telegram
binding telegram:kate_tg: 2 server(s) — slack, telegram
When an agent has no channels.approved entries the
(no approved servers) placeholder makes the gap obvious. When
no binding lists allowed_channel_servers, (no binding has allowed_channel_servers) highlights the configuration is
incomplete.
nexo channel doctor
Runs the static half of the 5-step gate against every
(agent, binding, server) triple in the YAML. The doctor
cannot probe a live MCP server, so gate 1 (capability declared)
is assumed true; gates 2/3/5 run normally; gate 4 (plugin
source) reads from the approved entry. Each row carries one of
three outcomes:
- WOULD REGISTER — every static gate passes; the only thing the live daemon will check is whether the server actually declares the capability.
- SKIP { kind, reason } — typed reason. disabled = channels.enabled: false. session = binding doesn't list the server. marketplace = plugin_source mismatch. allowlist = server isn't in approved.
- NOT BOUND — the server appears in approved but no binding lists it. Surfaces a half-configured state where the operator vetted the server but forgot to bind it.
Filter to one binding with --binding=<plugin>:<instance>. The
binding id format mirrors what the runtime registers — the same
string that shows up in agent logs.
$ nexo channel doctor --binding=telegram:kate_tg
| Agent | Binding | Server | Outcome | Skip | Reason |
|-------|--------------------|----------|----------------|------------|--------|
| kate | telegram:kate_tg | slack | WOULD REGISTER | - | all static gates pass; live runtime must declare the capability |
| kate | telegram:kate_tg | telegram | WOULD REGISTER | - | all static gates pass; live runtime must declare the capability |
nexo channel test
Synthesises a notifications/nexo/channel payload (with sample
chat_id and user meta) and runs it through
parse_channel_notification + wrap_channel_message. Prints
the model-facing <channel> block plus the derived
session_key. Cheap dry-run for tuning meta-key whitelists or
verifying content-cap behaviour.
$ nexo channel test slack
# Channel test — server=slack
session_key: slack|chat_id=C_TEST
--- rendered XML (model-facing) ---
<channel source="slack" chat_id="C_TEST" user="operator">
hello from slack — channel test payload
</channel>
Override the body with --content="..." to test how the
content cap (agents.channels.max_content_chars) clips long
payloads. The output flags [content truncated by max_content_chars] when the cap fired.
When to use which
- Setting up channels for the first time → list to verify the YAML structure, then doctor to confirm the gate would let the binding register, then start the daemon.
- A server stopped delivering messages → doctor to see if the gate would still register it. Common causes: channels.enabled flipped off; binding's allowed_channel_servers doesn't include the server (typo); approved entry got renamed.
- Tuning meta-key whitelists / content caps → test <server> with various --content payloads.
Live-runtime checks
doctor is intentionally static. To check live state — what's
actually registered in the running daemon — the agent calls
channel_list / channel_status from inside a turn, or the
operator inspects the mcp.channel.> NATS subjects directly.
Live-runtime CLI is on the roadmap.
See also
- MCP channels concept — the full picture including threading, permission relay, and the hot-reload re-evaluation pass.
Webhook receiver
Inbound HTTP webhook surface for any third-party provider that signs payloads with HMAC-SHA256 / HMAC-SHA1 / a raw shared token and exposes the event kind in a header or JSON body field. Provider-agnostic by construction: declare sources in YAML, no Rust code change per provider.
Successful requests are published to a NATS subject; downstream pollers, agent turns, or microapps subscribe and react.
Quick start
# config/webhook_receiver.yaml
enabled: true
bind: "0.0.0.0:8081"
body_cap_bytes: 1048576
request_timeout_ms: 15000
# (optional) defense for floods — token-bucket per (source, ip).
default_rate_limit:
  rps: 10
  burst: 20
# (optional) max in-flight requests per source. 0 = unbounded.
default_concurrency_cap: 32
# (optional) honour `X-Forwarded-For` only when the socket peer
# is in one of these CIDR blocks.
trusted_proxies:
  - "10.0.0.0/8"
allow_realip_fallback: false
sources:
  - id: "github_main"
    path: "/webhooks/github"
    signature:
      algorithm: "hmac-sha256"
      header: "X-Hub-Signature-256"
      prefix: "sha256="
      secret_env: "WEBHOOK_GITHUB_MAIN_SECRET"
    publish_to: "webhook.github_main.${event_kind}"
    event_kind_from:
      kind: "header"
      name: "X-GitHub-Event"
    # (optional) per-source overrides
    rate_limit:
      rps: 20.0
      burst: 40
    concurrency_cap: 8
Set the secret in the environment before starting the daemon:
export WEBHOOK_GITHUB_MAIN_SECRET='your-shared-secret'
Pipeline
Every accepted POST goes through six gates in order. Failure at any gate short-circuits the request; the dispatcher only fires when every gate passes.
| Gate | Reject status | What it checks |
|---|---|---|
| 1. Method | 405 | Only POST <path> matches the route. |
| 2. Body cap | 413 | tower_http::limit::RequestBodyLimitLayer enforces per-source body_cap_bytes. |
| 3. Concurrency | 503 + Retry-After: 1 | Per-source semaphore. 0 = unbounded. |
| 4. Rate limit | 429 | Token bucket per (source_id, client_ip). LRU-evicts at 4096 keys to defend against IP-flood OOM. |
| 5. Signature | 401 / 422 / 500 | HMAC verify (constant-time) + event-kind extract from header or JSON body path. 500 only when secret_env is unset. |
| 6. Dispatch | 502 / 422 | BrokerWebhookDispatcher publishes the envelope. 502 = broker unavailable; 422 = envelope serialise rejected. |
Successful dispatch returns 204 No Content.
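Gate 4's token bucket is the least obvious of the six, so a minimal single-key sketch may help. The type and method names are hypothetical; the real receiver keeps one bucket per (source_id, client_ip) with LRU eviction, which is omitted here, and time is passed in explicitly to keep the example deterministic.

```rust
/// Hypothetical single-key token bucket. `rps` is the refill rate,
/// `burst` the bucket capacity.
pub struct TokenBucket {
    rps: f64,
    burst: f64,
    tokens: f64,
    last_ms: u64,
}

impl TokenBucket {
    /// Start with a full bucket so an initial burst is admitted.
    pub fn new(rps: f64, burst: u32, now_ms: u64) -> Self {
        let burst = f64::from(burst);
        Self { rps, burst, tokens: burst, last_ms: now_ms }
    }

    /// Refill for the elapsed time, then try to spend one token.
    /// A `false` return corresponds to a 429 at the gate.
    pub fn try_acquire(&mut self, now_ms: u64) -> bool {
        let elapsed_s = now_ms.saturating_sub(self.last_ms) as f64 / 1000.0;
        self.tokens = (self.tokens + elapsed_s * self.rps).min(self.burst);
        self.last_ms = now_ms;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

With rps: 10 and burst: 20, twenty back-to-back requests pass, the twenty-first is rejected, and capacity returns at ten tokens per second.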
NATS envelope
The dispatcher publishes a typed WebhookEnvelope (JSON):
{
"schema": 1,
"source_id": "github_main",
"event_kind": "pull_request",
"body_json": { "action": "opened", "...": "..." },
"headers_subset": {
"x-github-delivery": "abc-123",
"user-agent": "GitHub-Hookshot/..."
},
"received_at_ms": 1746147600000,
"envelope_id": "0c4a...-uuid",
"client_ip": "1.2.3.4"
}
Subscribers can filter on topic == "webhook.<source_id>.<event_kind>"
or on the broker Event.source field (which doubles as
source_id).
Headers forwarded vs stripped
Forwarding every header would leak Authorization / Cookie /
the signature itself to NATS subscribers. The receiver allowlists
just the non-secret correlation headers downstream consumers
actually need:
- x-github-delivery
- x-stripe-event-id
- x-event-id
- x-request-id
- idempotency-key
- user-agent
Operating behind a reverse proxy
If the daemon is behind nginx / Cloudflare / a load balancer:
- Set trusted_proxies to the proxy's source CIDR.
- Optionally enable allow_realip_fallback if your proxy uses X-Real-IP instead of X-Forwarded-For.
Untrusted peers always have their forwarded headers ignored —
clients claiming to be a proxy from outside the trusted CIDR
get their socket address used for rate-limit keying. This is the
correct defensive posture; tighten trusted_proxies until only
your real proxies fit.
Reserved ports
- 8080: health server (Kubernetes liveness)
- 9091: admin server (loopback only)
The webhook bind address must not collide with either; validation
rejects collisions at boot with a typed
WebhookConfigError::ReservedBind.
Secret rotation
Secrets are read via std::env::var on every request, with no
caching. A running process still only sees the environment it was
started with, so exporting a new value in a shell does not reach
the daemon. To rotate:
- Set the new value in the environment the daemon is launched from.
- Restart the daemon so it starts with the new value.
- Verify with a known-good signed request.
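To produce a known-good signed request for that last step, the sender side can be sketched in a few lines. This is a minimal illustration, assuming the signature is the hex HMAC-SHA256 of the raw request body with the configured prefix (the scheme the sample source config describes for X-Hub-Signature-256). The function names are hypothetical and this is not the receiver's own code.

```python
import hashlib
import hmac


def sign_body(secret: str, body: bytes, prefix: str = "sha256=") -> str:
    """Return the header value a provider would send for this raw body."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return prefix + digest


def verify_signature(secret: str, body: bytes, header_value: str,
                     prefix: str = "sha256=") -> bool:
    """Recompute the HMAC over the raw body and compare in constant time."""
    if not header_value.startswith(prefix):
        return False
    expected = sign_body(secret, body, prefix)
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(expected.encode("utf-8"),
                               header_value.encode("utf-8"))
```

Send the body bytes exactly as signed; re-serialising the JSON before posting changes the bytes and yields a 401.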
Troubleshooting
- All requests 401: tracing::warn! shows signature mismatch. Re-check that the operator-side WEBHOOK_<SOURCE>_SECRET env matches what the provider signs with.
- All requests 500: the variable named by secret_env is unset. Check the environment for the configured variable name.
- Bursts get 429s: tighten the provider's retry/backoff or raise default_rate_limit.burst. The token bucket admits up to burst requests at once, then only rps per second; design for steady-state load plus a margin.
- Bursts get 503s: default_concurrency_cap reached. Raise the cap, or lower the per-source concurrency_cap for noisy sources to keep them from starving the rest.
Validation errors at boot
| Error | Cause | Fix |
|---|---|---|
| BodyCapZero | body_cap_bytes: 0 | Raise to a positive value (default 1 MiB). |
| RequestTimeoutZero | request_timeout_ms: 0 | Raise to a positive value (default 15 000 ms). |
| DuplicateId | Two sources share an id. | Rename one. |
| DuplicatePath | Two sources share a path. | Pick distinct paths. |
| ReservedBind | bind port is 8080 or 9091. | Pick a free port. |
| Source { id, detail } | Per-source schema invalid. | Read detail; typically empty path or empty secret_env. |
| DefaultRateLimit | rps negative or > 1000. | Use a sane positive value. |
| ConcurrencyCapZero | Per-source concurrency_cap: 0 | Use null to inherit the global cap. |
Event subscribers
Per-agent NATS subject patterns that, when matched, fire an agent turn. Covers the gap between webhook receivers / pollers / microapps publishing events and the agent runtime consuming them. Provider-agnostic by construction.
Quick start
# In agents.yaml under each agent:
agents:
  - id: marketing
    event_subscribers:
      - id: github_main
        subject_pattern: "webhook.github_main.>"
        synthesize_inbound: synthesize   # synthesize | tick | off
        inbound_template: "GitHub {{event_kind}}: {{body_json.repository.full_name}} — {{body_json.action}}"
        max_concurrency: 4
        max_buffer: 64
        overflow_policy: drop-oldest     # drop-oldest | drop-newest
When a NATS event matches subject_pattern, the runtime
synthesises an inbound message and fires an agent turn. The
agent receives the rendered template (or raw JSON fallback)
in its turn context and decides what to do.
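For readers new to NATS subjects, the matching rule behind subject_pattern can be sketched as follows. This assumes standard NATS wildcard semantics (`*` matches exactly one dot-separated token, `>` matches one or more trailing tokens). The function name is hypothetical and the runtime delegates the real matching to the broker.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True when a NATS-style pattern matches a concrete subject."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' is only valid as the last token and needs at least
            # one remaining subject token to consume.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    # Without a trailing '>', both sides must have the same token count.
    return len(p_tokens) == len(s_tokens)
```

With this rule, webhook.github_main.> matches webhook.github_main.pull_request but not the bare prefix webhook.github_main.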
Synthesis modes
| Mode | Behaviour |
|---|---|
| synthesize (default) | Render inbound_template against the event payload via mustache-lite ({{path.to.field}}). Fallback to JSON-stringify when no template. |
| tick | Fire an agent turn with a <event subject="..." envelope_id="..."/> marker as the body. Cheap on context window; the agent can ignore it or fetch the payload via tooling. |
| off | Subscriber inactive. Useful for staging YAML before flipping it on (requires daemon restart at v0). |
Auto-synthesised binding
When you declare an event_subscribers entry, the boot
supervisor automatically synthesises a matching
inbound_bindings entry:
# Implied automatically from event_subscribers above:
inbound_bindings:
  - plugin: event
    instance: github_main
If you declare the binding manually (e.g. to override
allowed_tools or sender_rate_limit for that source), your
manual entry survives — the auto-synth is idempotent.
Template syntax
Mustache-lite. Only {{path.to.field}} substitution; no
conditionals or loops.
- {{event_kind}}: top-level field.
- {{body_json.action}}: nested object access.
- {{tags.0}}: array index.
- Missing path → <missing> placeholder (does not crash).
- Object/array at the leaf → <missing> (avoids leaking struct shape into the agent body).
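The substitution rules above can be sketched in a few lines. This is an illustrative model, not the runtime's renderer. How non-string scalars are stringified is an assumption (JSON encoding here), and render and lookup are hypothetical names.

```python
import json
import re

MISSING = "<missing>"
_PLACEHOLDER = re.compile(r"\{\{([^{}\s]+)\}\}")


def lookup(payload, path: str) -> str:
    """Walk a dotted path through dicts and lists; never raise."""
    node = payload
    for part in path.split("."):
        if isinstance(node, dict) and part in node:
            node = node[part]
        elif (isinstance(node, list) and part.isascii() and part.isdigit()
              and int(part) < len(node)):
            node = node[int(part)]
        else:
            return MISSING
    if isinstance(node, (dict, list)):
        # Objects and arrays at the leaf are hidden rather than dumped.
        return MISSING
    # Assumption: non-string scalars are rendered as JSON literals.
    return node if isinstance(node, str) else json.dumps(node)


def render(template: str, payload) -> str:
    """Replace every {{path.to.field}} with the value found in payload."""
    return _PLACEHOLDER.sub(lambda m: lookup(payload, m.group(1)), template)
```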
Buffer + concurrency
Each binding gets its own:
- Bounded buffer (max_buffer, default 64): absorbs bursts without blocking the broker.
- Concurrency cap (max_concurrency, default 1 = serial): enforces ordering and limits in-flight turns.
- Overflow policy (drop-oldest by default, since recent events are more relevant; drop-newest for conservative buffering).
Drops emit tracing::warn! with binding_id + drop counter.
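The difference between the two overflow policies is easiest to see in a small model. The sketch below is a hypothetical single-threaded buffer that only illustrates the drop semantics; the class and method names are assumptions and the runtime's buffer is not shown in this document.

```python
from collections import deque


class BoundedBuffer:
    """Fixed-capacity FIFO with a configurable overflow policy."""

    def __init__(self, max_buffer: int = 64, overflow_policy: str = "drop-oldest"):
        if max_buffer < 1:
            raise ValueError("max_buffer must be positive")
        if overflow_policy not in ("drop-oldest", "drop-newest"):
            raise ValueError("unknown overflow_policy")
        self.max_buffer = max_buffer
        self.overflow_policy = overflow_policy
        self.dropped = 0          # drop counter, reported alongside the warn
        self._items = deque()

    def push(self, event) -> bool:
        """Enqueue an event; return True if the new event was kept."""
        if len(self._items) < self.max_buffer:
            self._items.append(event)
            return True
        self.dropped += 1
        if self.overflow_policy == "drop-oldest":
            self._items.popleft()      # evict the stalest event
            self._items.append(event)
            return True
        return False                   # drop-newest: reject the incoming event

    def pop(self):
        """Dequeue the next event, or None when empty."""
        return self._items.popleft() if self._items else None
```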
Defensive guards
- Loop guard: if a binding's subject_pattern accidentally matches its own re-publish topic (plugin.inbound.event.<id>), the producer drops the self-event with a warn and never blows the buffer.
- id validation at boot: rejects ., *, >, or whitespace in id (would mis-parse the re-publish topic).
- Pattern validation: >, plugin.>, plugin.*.>, plugin.inbound.* are all rejected as loop-risk patterns at boot.
- Per-binding cancel token: SIGTERM drains all subscribers within ≤1s.
Worked example: GitHub webhook → marketing agent
- Phase 82.2 webhook receiver verifies a GitHub webhook → publishes webhook.github_main.pull_request to NATS with the typed WebhookEnvelope.
- Phase 82.4 event_subscriber matches webhook.github_main.> → renders "GitHub pull_request: anthropic/repo — opened" → re-publishes to plugin.inbound.event.github_main.
- The existing inbound resolver matches the auto-synthesised { plugin: "event", instance: "github_main" } binding → constructs BindingContext with event_source: Some({ subject: "webhook.github_main.pull_request", synthesis_mode: "synthesize", ... }).
- The agent fires a turn; tools see the metadata in params._meta.nexo.binding.event_source.
Microapps consuming nexo-tool-meta parse the event_source
field via parse_binding_from_meta(args._meta).event_source.
Hot-reload
v0 spawns subscribers at boot only. Adding/removing
event_subscribers requires a daemon restart. Hot-reload via
Phase 18 reload coordinator is the deferred 82.4.c follow-up.
Operator notes
- Subscribers are independent of the agent's session — they feed events into the standard inbound flow, which the runtime routes per session like any other inbound.
- The _nexo_event_source extension field on the re-published payload is the canonical seam between the EventSubscriber and the inbound resolver. Microapps should read it from the agent-side _meta.nexo.binding.event_source, not from the raw broker payload.
- tracing::info! summary at boot: look for event subscribers online: count=N to confirm wiring took.
- Validation failures are non-fatal: an invalid binding logs an error and skips; the daemon stays up.
Per-binding tool rate-limits
Phase 82.7 lets operators declare per-binding tool rate-limits on top of the per-agent ones from Phase 9.2. Same agent + same tool, two bindings → two independent buckets with independent caps. Use it to enforce SaaS tier policies (free / pro / enterprise) without spinning up separate agent processes.
When to use
- Same agent answers a free-tier WhatsApp account AND an enterprise account; the enterprise tenant must not be starved by free-tier traffic on the shared marketing_send_drip tool.
- An event-subscriber binding ingests cron tickers; these should run unlimited regardless of how the agent's other bindings are configured.
- A webhook binding receives bursty github events; you want a cap so a runaway CI pipeline can't spam the LLM.
The agent-level tool_rate_limits from Phase 9.2 still applies
when no per-binding override is declared. When an override IS
declared on the matched binding, it FULLY REPLACES the global
decision for that binding (no fall-through to global patterns).
Wire shape
agents:
  - id: ana
    inbound_bindings:
      - plugin: whatsapp
        instance: free_tier
        tool_rate_limits:
          patterns:
            marketing_send_drip:
              rps: 0.167             # 10 per minute
              burst: 10
              essential_deny_on_miss: true
            "memory_*":
              rps: 1.0
              burst: 5
            _default:
              rps: 5.0
              burst: 20
      - plugin: whatsapp
        instance: enterprise
        # no override → unlimited (or global default if defined)
      - plugin: webhook
        instance: github
        tool_rate_limits:
          patterns:
            "*":                     # everything on this binding
              rps: 2.0
              burst: 10
Field reference
| Field | Type | Default | Meaning |
|---|---|---|---|
| patterns.<glob>.rps | f64 | required | Tokens added per second. 0.167 ≈ 10/min. |
| patterns.<glob>.burst | u64 | ceil(rps).max(1) | Initial bucket capacity. Higher burst = more leniency for bursty workloads. |
| patterns.<glob>.essential_deny_on_miss | bool | false | When true, the bucket is fail-closed: if LRU pressure evicts the bucket and the key is reallocated, the next call denies once before allocating fresh. Use for paid / quota-bound tools where you'd rather drop a single call than risk leaking quota. |
| patterns._default | object | none | Reserved key matched when no explicit pattern catches the tool. Same shape as other entries. |
Glob matching
Same minimal glob as the agent-level patterns:
- * alone matches anything.
- foo* matches strings starting with foo.
- *bar matches strings ending with bar.
- foo*bar matches strings starting foo and ending bar.
Patterns evaluate in deterministic alphabetical order; first
match wins. _default is always last.
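A compact model of this resolution order, for checking how a pattern map will behave before deploying it. This is a sketch under stated assumptions: "alphabetical" is taken to mean plain code-point ordering, only a single * per pattern is handled (the text does not say how multiple stars behave), and glob_match / resolve_limit are hypothetical names.

```python
def glob_match(pattern: str, name: str) -> bool:
    """Minimal glob: at most one '*', matching any run of characters."""
    if "*" not in pattern:
        return pattern == name
    prefix, _, suffix = pattern.partition("*")
    # The length check stops prefix and suffix from overlapping.
    return (len(name) >= len(prefix) + len(suffix)
            and name.startswith(prefix)
            and name.endswith(suffix))


def resolve_limit(patterns: dict, tool: str):
    """Return (matched_key, config) or None when the tool is unlimited."""
    for key in sorted(k for k in patterns if k != "_default"):
        if glob_match(key, tool):
            return key, patterns[key]          # first match wins
    if "_default" in patterns:
        return "_default", patterns["_default"]  # always evaluated last
    return None
```

Note that under code-point ordering a bare "*" sorts before any lettered pattern, so it would shadow more specific entries in the same map.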
Per-binding fully replaces global
Important semantic — different from how allowed_tools /
outbound_allowlist overrides work in some other crates:
- Binding declares tool_rate_limits: Some(map) → ONLY the patterns in map apply. Tools that don't match any pattern in the override (and don't match _default either) become unlimited on that binding, regardless of any global agent-level config.
- Binding declares tool_rate_limits: None (or the field is omitted) → fall through to agent-level agents.<id>.tool_rate_limits from Phase 9.2.
Operators wanting "binding tighter, with global fallback for tools the binding doesn't mention" must explicitly include those global patterns in the binding map. The full-replace semantic is documented this way to keep the resolution path unambiguous and predictable in audit logs.
Free / pro / enterprise example
- id: ana
  inbound_bindings:
    # Free tier: strict caps on paid tools
    - plugin: whatsapp
      instance: free_tier
      tool_rate_limits:
        patterns:
          marketing_send_drip:
            rps: 0.167        # 10/min
            burst: 10
            essential_deny_on_miss: true
          web_search:
            rps: 0.083        # 5/min
            burst: 5
          _default:
            rps: 1.0
            burst: 5
    # Pro tier: relaxed caps
    - plugin: whatsapp
      instance: pro
      tool_rate_limits:
        patterns:
          marketing_send_drip:
            rps: 1.667        # 100/min
            burst: 100
          _default:
            rps: 10.0
            burst: 50
    # Enterprise: unlimited (no override)
    - plugin: whatsapp
      instance: enterprise
A single marketing_send_drip flood from free_tier cannot
deny calls on pro or enterprise; their buckets are
independent.
Bucket lifecycle + LRU eviction
Buckets are allocated lazily — the first call for a given
(agent, binding_id, tool) triple allocates a TokenBucket.
Bucket cardinality is capped (default 10_000); the cap fires
only when allocating a new bucket would push the count past the
limit. Eviction picks the stalest bucket by last_touch (a
monotonic counter stamped on every try_acquire). Steady-state
traffic amortises eviction cost to near zero.
When the evicted bucket's config had essential_deny_on_miss = true, the key is stamped into a separate "recently evicted
essentials" set. The next call for that key consumes the entry
and denies once, then allocates a fresh bucket. This adapts the
fail-open + ESSENTIAL deny opt-in pattern from upstream
production agent CLIs to the LRU eviction context.
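The lifecycle can be modelled in a few dozen lines. The sketch below is illustrative only: time is passed in explicitly, eviction scans linearly for the stalest bucket, and the fresh bucket is allocated in the same call that denies. All three are simplifying assumptions, and Limiter here is a hypothetical stand-in for the real limiter.

```python
import itertools
import math


class TokenBucket:
    def __init__(self, rps, burst, essential, now):
        self.rps = rps
        self.capacity = float(burst)
        self.tokens = float(burst)      # a fresh bucket starts full
        self.essential = essential
        self.updated = now
        self.last_touch = 0

    def try_acquire(self, now):
        # Refill for the elapsed time, capped at the burst capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rps)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class Limiter:
    def __init__(self, max_buckets=10_000):
        self.max_buckets = max_buckets
        self.buckets = {}                 # (agent, binding_id, tool) -> bucket
        self.evicted_essentials = set()
        self._clock = itertools.count(1)  # monotonic last_touch stamps

    def _allocate(self, key, cfg, now):
        if len(self.buckets) >= self.max_buckets:
            stale_key = min(self.buckets,
                            key=lambda k: self.buckets[k].last_touch)
            stale = self.buckets.pop(stale_key)
            if stale.essential:
                self.evicted_essentials.add(stale_key)
        burst = cfg.get("burst", max(1, math.ceil(cfg["rps"])))
        bucket = TokenBucket(cfg["rps"], burst,
                             cfg.get("essential_deny_on_miss", False), now)
        self.buckets[key] = bucket
        return bucket

    def try_acquire(self, key, cfg, now):
        bucket = self.buckets.get(key)
        deny_once = False
        if bucket is None:
            # Consume the "recently evicted essential" entry, if any.
            deny_once = key in self.evicted_essentials
            self.evicted_essentials.discard(key)
            bucket = self._allocate(key, cfg, now)
        bucket.last_touch = next(self._clock)
        if deny_once:
            return False
        return bucket.try_acquire(now)
```

Without the evicted-essentials set, an evicted key would simply come back with a full bucket, which is the extra burst that essential_deny_on_miss is meant to guard against.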
Phase 72 audit log marker
Every denial emits a tracing::info! event with the canonical
marker:
rate_limited:tool=<name>,binding=<id|none>,rps=<f64>
Example:
rate_limited:tool=marketing_send_drip,binding=whatsapp:free_tier,rps=0.167
binding=none indicates a denial on the legacy single-tenant
path (delegation receive, heartbeat, pre-Phase-82.7 callers).
Operator audit pipelines parse this format for billing / SaaS
fair-use metrics. The format is wire-shape stable —
format_rate_limit_hit in nexo-tool-meta is the source of
truth.
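An audit pipeline that consumes this marker might parse it along these lines. This is a sketch that assumes the exact field order shown above and a plain decimal rps; parse_marker is a hypothetical helper, not part of nexo-tool-meta.

```python
import re

_MARKER = re.compile(
    r"^rate_limited:tool=(?P<tool>[^,]+),"
    r"binding=(?P<binding>[^,]+),"
    r"rps=(?P<rps>[0-9]+(?:\.[0-9]+)?)$"
)


def parse_marker(marker: str):
    """Return {'tool', 'binding', 'rps'} or None if the line is not a marker."""
    m = _MARKER.match(marker.strip())
    if m is None:
        return None
    binding = m.group("binding")
    return {
        "tool": m.group("tool"),
        # binding=none marks the legacy single-tenant path.
        "binding": None if binding == "none" else binding,
        "rps": float(m.group("rps")),
    }
```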
Hot-reload behaviour
Per-binding overrides participate in the existing Phase 18 config snapshot path. After a yaml reload:
- Existing buckets keep their state until naturally aged out by LRU.
- New buckets allocated post-reload use the new config.
- Worst case is a single turn of slack while the snapshot swap propagates.
For an immediate cold start of all buckets, restart the daemon.
Admin RPC integration
The limiter exposes drop_buckets_for_agent(agent: &str) so the
admin RPC delete-agent path (Phase 82.10) can clear (agent, *, *) cells when an operator removes an agent. Without this,
buckets would leak until LRU eviction.
Observability
Useful tracing fields when investigating denials:
- agent_id: which agent ran the call
- marker: canonical rate_limited:... string (parse for binding/tool/rps)
- tool: tool name as the LLM saw it
Tracking metrics:
- nexo_rate_limit_buckets_active: total live buckets across all agents (TODO; not yet emitted as Prometheus)
Limitations
- Bucket evictions during a sustained burst can briefly allow a burst's worth of extra calls before the new bucket settles. Use essential_deny_on_miss: true on tools where this is unacceptable.
- The marker's rps= field reflects the configured rate at the time of denial. After a hot-reload that changes the rate, the marker may show the old value for buckets that haven't been re-resolved yet.
- The _default pattern only applies within its own scope: a per-binding _default does not fall through to the global _default.
See also
- Rate limiting & retry (LLM provider) — different layer, applies to outbound LLM calls.
- Sender rate limit: drop-at-intake guard, runs before this limiter (see crates/core/src/agent/sender_rate_limit.rs).
- Capability toggles: env-var-driven feature toggles separate from per-binding policy.
Context optimization
Four independent mechanisms reduce the number of tokens sent to the LLM
on every request, without changing the agent's behavior. They live
under llm.context_optimization in llm.yaml and can be flipped per
agent under agents.<id>.context_optimization.
# config/llm.yaml
context_optimization:
  prompt_cache:
    enabled: true                  # default
    long_ttl_providers: [anthropic, vertex]
  compaction:
    enabled: false                 # default off; opt in per agent
    compact_at_pct: 0.75
    tail_keep_tokens: 20000
    tool_result_max_pct: 0.30
    summarizer_model: ""           # empty = reuse the agent's main model
    lock_ttl_seconds: 300
  token_counter:
    enabled: true                  # default
    backend: auto                  # auto | anthropic_api | tiktoken
    cache_capacity: 1024
  workspace_cache:
    enabled: true                  # default
    watch_debounce_ms: 500
    max_age_seconds: 0             # 0 = never force refresh (notify is authoritative)
1. Prompt caching
Materializes the system prompt as a list of cache_control blocks on
the Anthropic wire so the stable prefix (workspace + skills + tool
catalog + binding glue) is billed at 0.1× input cost on every cache
hit. OpenAI / DeepSeek paths surface their automatic
prompt_tokens_details.cached_tokens field through the same
CacheUsage struct. Gemini and MiniMax flatten the blocks into the
legacy system slot today (warned once per process).
Block layout (4 cache breakpoints, the Anthropic max):
- workspace: IDENTITY / SOUL / USER / AGENTS / MEMORY (Ephemeral1h)
- skills: per-binding skill catalog (Ephemeral1h)
- binding_glue: peer directory + per-binding system prompt + language directive (Ephemeral1h)
- channel_meta: sender id + per-turn context (Ephemeral5m)
Tools array is sorted alphabetically by name (the registry iterates a
non-deterministic DashMap) and the last tool gets a 1h
cache_control marker when cache_tools=true.
What to watch
- llm_cache_read_tokens_total{agent, provider, model}: should dominate llm_cache_creation_tokens_total after the first turn of a warm session.
- llm_cache_hit_ratio{agent}: target >0.7 on multi-turn agents; <0.3 means you're paying the write premium without the discount.
When to flip off
- Provider rejects the request with a 400 mentioning cache_control (very old model). Mitigation: the framework already strips markers for claude-2.x; if Anthropic adds another exception, override ANTHROPIC_CACHE_BETA="..." to disable the beta header.
- A custom-built LLM gateway in front of Anthropic doesn't pass the cache_control field through.
2. Compaction (online history folding)
When the pre-flight token estimate crosses compact_at_pct * effective_window, the agent runs a secondary LLM call to fold
history[..tail_start] into a single summary string. The summary
replaces the head; the last tail_keep_tokens worth of turns ride
forward verbatim. Subsequent turns prepend the summary as a synthetic
user/assistant pair so Anthropic's role-alternation rule stays valid.
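The trigger and the head/tail split can be sketched as plain arithmetic. This is a simplified model: it treats "crosses" as greater-or-equal, splits on whole turns, and ignores the tool-result truncation and boundary rules the real compactor applies. Both function names are hypothetical.

```python
def should_compact(estimated_tokens: int, effective_window: int,
                   compact_at_pct: float = 0.75) -> bool:
    """True once the pre-flight estimate reaches the configured fraction."""
    return estimated_tokens >= compact_at_pct * effective_window


def tail_start(turn_tokens: list, tail_keep_tokens: int = 20000) -> int:
    """Index where the verbatim tail begins; history[:index] gets folded.

    Walks backwards from the newest turn, keeping whole turns while the
    running total stays within tail_keep_tokens.
    """
    kept = 0
    start = len(turn_tokens)
    for i in range(len(turn_tokens) - 1, -1, -1):
        if kept + turn_tokens[i] > tail_keep_tokens:
            break
        kept += turn_tokens[i]
        start = i
    return start
```

For example, with the defaults and a 200 000-token window, compaction triggers at an estimate of 150 000 tokens.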
Defaults are intentionally conservative: off by default. Roll out
per agent via agents.<id>.context_optimization.compaction: true.
agents:
  - id: ana
    context_optimization:
      compaction: true   # ana opts in early, others stay off
What to watch
- llm_compaction_triggered_total{agent, outcome}: outcomes are ok, failed, lock_held, no_boundary, tool_result_truncated.
- llm_compaction_duration_seconds{agent, outcome="ok"|"failed"}: a rising p99 means the summarizer model is overloaded; lower compact_at_pct so triggers are smaller (cheaper) and more frequent.
When to flip off
- Quality regression in long sessions: the summary may be losing active-task state. Inspect compactions_v1 rows in the SQLite store to see what was folded; bump tail_keep_tokens so more verbatim context survives.
- Lock contention spikes: multiple processes (NATS multi-node) racing on the same session. The lock is per-session so this only happens with sticky-session misrouting; fix at the broker level rather than disabling compaction.
Safety nets
- compaction_locks_v1 carries a TTL (lock_ttl_seconds): a crashed compactor doesn't deadlock the session; the next acquire after the TTL wins automatically.
- Audit log: every successful compaction inserts a row in compactions_v1 with the summary text + token cost. Inspect with sqlite3 memory.db "SELECT * FROM compactions_v1 WHERE session_id = ? ORDER BY compacted_at DESC" (substitute the session id for ?).
- Failure path: 3 retries with backoff; on total failure the original history goes to the LLM unchanged (graceful degradation, never silent data loss).
3. Token counting (pre-flight sizing)
TokenCounter trait with two backends:
- AnthropicTokenCounter: calls POST /v1/messages/count_tokens. Exact (matches billing). LRU-cached on blake3(payload): the stable tools+identity prefix hashes the same on every turn, so the network round-trip happens ~once per process lifetime.
- TiktokenCounter: offline cl100k_base approximation. Drift vs Anthropic billing measured at 5–15%. Fine for budget gating, not for hard limits.
The cascade wraps the primary in a CircuitBreaker
(failure_threshold=3, 30s→300s backoff): on count_tokens outage the
agent loop falls back to tiktoken so the request still goes through.
Once the breaker has opened at least once, is_exact() flips to false
for the rest of the process so dashboards don't conflate sample
populations.
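The cascade's behaviour can be modelled roughly as below. This is a deliberately simplified sketch: the cooldown doubles from 30 s up to 300 s on each re-open (the growth factor is an assumption), there is no dedicated half-open probe state, and CountingCascade is a hypothetical name rather than the framework's type.

```python
class CountingCascade:
    """Exact counter behind a breaker, with an approximate fallback."""

    def __init__(self, primary, fallback, failure_threshold=3,
                 initial_cooldown_s=30.0, max_cooldown_s=300.0):
        self.primary = primary            # callable: payload -> int, may raise
        self.fallback = fallback          # callable: payload -> int, offline
        self.failure_threshold = failure_threshold
        self.initial_cooldown_s = initial_cooldown_s
        self.max_cooldown_s = max_cooldown_s
        self.cooldown_s = initial_cooldown_s
        self.failures = 0
        self.open_until = 0.0
        self.ever_opened = False

    def is_exact(self) -> bool:
        # Sticky: once the breaker has opened, counts are no longer
        # reported as exact for the rest of the process.
        return not self.ever_opened

    def count(self, payload, now: float) -> int:
        if now >= self.open_until:
            try:
                tokens = self.primary(payload)
                self.failures = 0
                self.cooldown_s = self.initial_cooldown_s
                return tokens
            except Exception:
                self.failures += 1
                if self.failures >= self.failure_threshold:
                    self.ever_opened = True
                    self.open_until = now + self.cooldown_s
                    self.cooldown_s = min(self.cooldown_s * 2,
                                          self.max_cooldown_s)
                    self.failures = 0
        # Breaker open, or the primary just failed: approximate instead.
        return self.fallback(payload)
```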
What to watch
- llm_prompt_tokens_estimated{agent, provider, model}: compare against llm_prompt_tokens_drift{...} (histogram in percent).
- A drift p99 climbing past 20% means the active backend is wrong for your model; switch from tiktoken to anthropic_api (or vice versa for non-Anthropic providers).
When to flip off
- The agent runs against a self-hosted gateway that doesn't honor count_tokens. Set backend: tiktoken to skip the round-trip.
4. Workspace bundle cache
Reads of IDENTITY / SOUL / USER / AGENTS / MEMORY MDs go through an
in-memory Arc<WorkspaceBundle> cache keyed by (root, scope, sorted extras). A notify-debouncer-full watcher (default 500ms) drops
every entry under a workspace root when any *.md changes. Non-MD
file changes are ignored.
What to watch
- workspace_cache_hits_total{path} should dominate workspace_cache_misses_total{path} once the cache is warm.
- workspace_cache_invalidations_total{path} rising without operator edits points to a tool that writes to the workspace too aggressively.
When to flip off
- NFS / FUSE filesystems where inotify(7), which backs the notify watcher on Linux, drops events. Set workspace_cache.max_age_seconds: 60 (or similar) to force a refresh after the absolute TTL even without a watch event.
Per-agent overrides
The four enables — and only the enables — can be flipped per agent in
agents.yaml. The numeric knobs (compact_at_pct, tail_keep_tokens,
watch_debounce_ms, …) stay global to keep the surface narrow.
agents:
  - id: ana
    context_optimization:
      prompt_cache: true
      compaction: true
      token_counter: true
      workspace_cache: true
  - id: bob
    context_optimization:
      prompt_cache: false   # bob runs against a gateway that strips cache_control
Hot-reload behavior
Changing global knobs (llm.yaml) takes effect on the next request
once the reload coordinator picks up the file change (Phase 18). For
per-agent enables, the override rides on Arc<AgentConfig> inside
RuntimeSnapshot and is observed on the next
policy_for(...) lookup. The LlmAgentBehavior struct itself still
caches its compactor / prompt_cache_enabled fields at construction —
toggling those without a process restart requires the future
ArcSwap<CompactionRuntime> refactor noted in proyecto/FOLLOWUPS.md.
Rollout playbook
- Deploy with everything at defaults: prompt_cache=true, compaction=false, token_counter=true, workspace_cache=true.
- Watch llm_cache_hit_ratio for 24h. Expect it to climb to >0.7 on chatty agents; if it stays low, check that the workspace bundle is stable across turns (no MD writes mid-session).
- Pick one agent, opt it into compaction (agents.<id>.context_optimization.compaction: true), reload config, watch for a week.
- If llm_compaction_triggered_total{outcome="ok"} > 0 and quality feedback is positive, roll compaction out to the rest of the fleet.
- If drift on llm_prompt_tokens_drift is consistently <10%, leave token_counter.backend: auto. If higher, consider backend: tiktoken for non-Anthropic providers; it saves the round-trip without losing accuracy you didn't have anyway.
Link understanding
When a user message contains URLs, the runtime can fetch them, extract
the main text, and inject a # LINK CONTEXT block into the system
prompt for that turn. The agent stops saying "I can't see what's at
that link" and starts answering against the actual page content.
The feature is off by default. Opt in per agent (and optionally override per binding).
Per-agent config
# config/agents.yaml
agents:
  - id: ana
    link_understanding:
      enabled: true              # default: false
      max_links_per_turn: 3      # cap URLs fetched per message
      max_bytes: 262144          # 256 KiB per response, streamed
      timeout_ms: 8000           # per-fetch HTTP timeout
      cache_ttl_secs: 600        # 0 disables cache
      deny_hosts:                # appended to built-in denylist
        - internal.corp
Built-in denylist (always applied, cannot be removed):
localhost, 127.0.0.1, ::1, metadata.google.internal,
169.254.169.254. Defense against SSRF to internal endpoints.
Per-binding override
Per-binding link_understanding overrides the agent default. Useful
to disable on a noisy channel:
agents:
  - id: ana
    link_understanding: { enabled: true }
    bindings:
      - inbound: plugin.inbound.whatsapp.*
        link_understanding: { enabled: false }   # narrow on WA
      - inbound: plugin.inbound.telegram.*
        # inherits agent default (enabled: true)
null / omitted = inherit. Any object = full replace.
What gets injected
For each fetched URL, one bullet:
# LINK CONTEXT
- https://example.com/post — Title of the page
First paragraphs of main text, collapsed to ~max_bytes characters,
HTML stripped, scripts and styles dropped.
The block lands inside the system prompt for that turn only. Cache hits skip the fetch but still render the block.
Hard caps (cannot be raised by config)
| Cap | Value |
|---|---|
| URL length | 2048 chars |
| Redirect chain | 5 hops |
| User-Agent | nexo-link-understanding/0.1 |
| Response stream cutoff | max_bytes (drops the rest) |
| Newlines / control chars in extracted text | sanitised (prompt-injection guard) |
Operations
- A single shared LinkExtractor (HTTP client + LRU cache, capacity 256) is built at boot and reused by every agent runtime in the process.
- Cache is in-process only. Restarts cold.
- Telemetry exported on /metrics:
  - nexo_link_understanding_fetch_total{result="ok|blocked|timeout|non_html|too_big|error"}: counter, one increment per fetch attempt.
  - nexo_link_understanding_cache_total{hit="true|false"}: counter, incremented on every TTL-cached lookup so dashboards can compute hit-rate without instrumenting the agent loop.
  - nexo_link_understanding_fetch_duration_ms: histogram (single series, no labels). Only observed for attempts that actually issued an HTTP request; cache hits and host-blocked URLs skip it so latency percentiles reflect real network work.
When to leave it off
- Agents talking to untrusted senders where the agent must not be pivoted into fetching attacker-controlled URLs.
- Channels with strict latency budgets: a fetch can add up to timeout_ms to the turn.
- Privacy-sensitive deployments where outbound HTTP from the agent host is not allowed.
Web search
The web_search built-in tool lets an agent query the web through one
of four providers: Brave, Tavily, DuckDuckGo, Perplexity.
The runtime owns provider selection, caching, sanitisation, and circuit
breaking — agents only see results.
The feature is off by default. Operators opt in per agent (and optionally override per binding).
Per-agent config
# config/agents.yaml
agents:
  - id: ana
    web_search:
      enabled: true          # default false
      provider: auto         # "auto" | "brave" | "tavily" | "duckduckgo" | "perplexity"
      default_count: 5       # 1..=10
      cache_ttl_secs: 600    # 0 disables cache
      expand_default: false  # default value of `expand` arg
provider: auto
Picks the first credentialed provider in this order:
- brave (env BRAVE_SEARCH_API_KEY)
- tavily (env TAVILY_API_KEY)
- perplexity (env PERPLEXITY_API_KEY, requires the perplexity feature)
- duckduckgo (no key; bundled by default, the always-available fallback)
DuckDuckGo scrapes html.duckduckgo.com and is rate-limited / captcha-prone;
the runtime detects bot challenges and trips the breaker so the next call
rotates to a different provider.
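The selection order can be modelled as a filtered list. This is a sketch of the documented order only: the function name and the way open breakers and the perplexity feature flag are passed in are assumptions made for illustration.

```python
# (provider, env var holding its API key); None means no key is needed.
AUTO_ORDER = [
    ("brave", "BRAVE_SEARCH_API_KEY"),
    ("tavily", "TAVILY_API_KEY"),
    ("perplexity", "PERPLEXITY_API_KEY"),
    ("duckduckgo", None),
]


def candidates(env, open_breakers=frozenset(), perplexity_enabled=True):
    """Return providers to try, in order, for provider: auto."""
    out = []
    for name, key_var in AUTO_ORDER:
        if name == "perplexity" and not perplexity_enabled:
            continue                      # feature not compiled in
        if key_var is not None and not env.get(key_var):
            continue                      # not credentialed
        if name in open_breakers:
            continue                      # breaker tripped: rotate past it
        out.append(name)
    return out
```

Passing os.environ as env reproduces the lookup against the real process environment.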
Per-binding override
Same shape as link_understanding: null (default) inherits the agent
value, any object replaces it.
agents:
  - id: ana
    web_search: { enabled: true }
    bindings:
      - inbound: plugin.inbound.whatsapp.*
        web_search: { enabled: false }   # silent on WA
      - inbound: plugin.inbound.telegram.*
        # inherits agent default
Tool surface
The LLM sees this signature:
{
  "name": "web_search",
  "parameters": {
    "query": "string (required)",
    "count": "integer (1-10, optional)",
    "provider": "string (optional override)",
    "freshness": "day | week | month | year (optional)",
    "country": "ISO-3166 alpha-2 (optional)",
    "language": "ISO-639-1 (optional)",
    "expand": "boolean (optional)"
  }
}
Return shape:
{
  "provider": "brave",
  "query": "rust async runtimes",
  "from_cache": false,
  "results": [
    {
      "url": "https://example.com/post",
      "title": "Title",
      "snippet": "First 4 KiB of the description, sanitised.",
      "site_name": "example.com",
      "published_at": "2026-04-20T00:00:00Z"
    }
  ]
}
When expand: true and Phase 21 link understanding is enabled, the
top three hits also get a body field populated by the shared
LinkExtractor. Bodies obey the same denylist + size caps that
Link understanding describes.
Cache
In-process SQLite cache shared across every agent. Key format:
sha256(SCHEMA_VERSION || provider || query || canonical_params)
canonical_params excludes provider (router decides) and expand
(post-processing). cache_ttl_secs: 0 disables caching entirely.
Operators that want a separate cache file or schema migration set
web_search.cache.path in web_search.yaml (planned — see
FOLLOWUPS).
Circuit breaker
Every provider call goes through nexo_resilience::CircuitBreaker
keyed web_search:<provider>. Default config: 5 consecutive failures
trip the breaker, exponential backoff up to 120 s. Open-state calls
return ProviderUnavailable(provider) immediately and the router
rotates to the next candidate (when called via auto-detect).
Sanitisation
Every title, url, and snippet returned by a provider passes
through sanitise_for_prompt:
- control chars stripped,
- CR / LF / tab collapsed to single spaces,
- runs of whitespace collapsed,
- byte-capped at 4 KiB (snippet) / 512 B (title) / 2 KiB (URL),
- truncation respects UTF-8 char boundaries.
This is the same defence-in-depth Phase 19 (language directive) and
Phase 21 (# LINK CONTEXT) apply: SERPs are attacker-controlled input.
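The steps listed above can be approximated as follows. This is an illustrative sketch, not the sanitise_for_prompt implementation: "control chars" is interpreted as Unicode category Cc, and trimming leading/trailing whitespace is an added assumption.

```python
import re
import unicodedata


def sanitise_for_prompt(text: str, max_bytes: int) -> str:
    """Flatten provider-supplied text to one line within a byte budget."""
    # CR / LF / tab become spaces first so they separate words
    # instead of vanishing with the other control characters.
    text = re.sub(r"[\r\n\t]", " ", text)
    # Strip the remaining control characters (category Cc).
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cc")
    # Collapse runs of whitespace (trimming the ends is an assumption).
    text = re.sub(r"\s+", " ", text).strip()
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Cutting bytes may split a multi-byte character; errors="ignore"
    # drops the incomplete trailing sequence so the result stays valid.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")
```

A snippet would be passed with max_bytes=4096, a title with 512, and a URL with 2048, matching the caps above.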
Telemetry
Exported on /metrics:
- nexo_web_search_calls_total{provider,result}: counter, one increment per provider attempt. result is ok (provider returned hits), error (network / HTTP / parse failure), or unavailable (the breaker short-circuited the call before it left the process).
- nexo_web_search_cache_total{provider,hit}: counter, every TTL-cached lookup. provider is the first candidate (the one the cache key is built from). Compute hit rate as cache_total{hit="true"} / sum(cache_total).
- nexo_web_search_breaker_open_total{provider}: counter; one increment per request the breaker rejected. Pair with circuit_breaker_state{breaker="web_search:<provider>"} to alert on sustained open state vs a flap.
- nexo_web_search_latency_ms{provider}: histogram. Only observed for attempts that issued an HTTP request, so the percentile reflects real provider latency (cache hits and breaker short-circuits would pull p50 down to 0 and hide regressions).
When to leave it off
- Privacy-sensitive deployments where outbound HTTP from the agent host is not allowed.
- Channels where the cost of a noisy SERP in the prompt outweighs the agent's value (use per-binding enabled: false).
- Agents that already have link_understanding for the URLs the user shares; no need for SERP duplication.
Web fetch
The web_fetch built-in tool lets an agent retrieve the cleaned
body text + title for one or more URLs the agent already knows.
Companion to Web search: web_search finds
URLs, web_fetch retrieves them.
Distinct from web_search.expand=true because the agent often
knows the URL up-front (skill output, RSS poll, calendar
attachment, user message) and would otherwise have to either
hallucinate a search query or shell out to a fetch-url
extension.
When to use which
| Scenario | Tool |
|---|---|
| Agent needs to find content matching a query | web_search |
| Agent has a URL from a web_search hit and wants the body | web_search(expand=true) |
| Agent has a URL from a poller / skill / user message | web_fetch |
| Agent has a list of URLs to triage | web_fetch(urls=[...]) |
Tool signature
{
  "name": "web_fetch",
  "parameters": {
    "urls": ["https://example.com/article", "https://other.com/page"],
    "max_bytes": 65536 // optional; clamped to deployment cap
  }
}
Response shape:
{
  "results": [
    {
      "url": "https://example.com/article",
      "title": "Example article",
      "body": "First paragraph...",
      "ok": true
    },
    {
      "url": "https://internal.intranet.local/private",
      "ok": false,
      "reason": "fetch failed (host blocked, timeout, non-HTML, oversized, or transport error). Check `nexo_link_understanding_fetch_total{result}` for the bucket."
    }
  ],
  "count": 2
}
A bad URL returns a {ok: false, reason} row instead of bailing
the whole call, so the agent can still consume the successful
ones. Per-call cap of 5 URLs; longer lists get trimmed with a
warn log.
Configuration
web_fetch has no dedicated config. It rides on
Link understanding:
- link_understanding.enabled: gates the tool entirely. With it false, every fetch returns {ok: false, reason: "disabled by policy"}.
- link_understanding.max_bytes: deployment-wide ceiling. The tool's max_bytes arg can shrink but never grow past this.
- link_understanding.deny_hosts: host blocklist (loopback, private subnets, internal cloud metadata endpoints, plus whatever the operator added).
- link_understanding.timeout_ms: per-fetch HTTP timeout.
- link_understanding.cache_ttl_secs: cache TTL. Successful fetches are cached so a second web_fetch of the same URL inside the TTL is free.
Per-binding overrides via EffectiveBindingPolicy::link_understanding
(see Per-binding capability override).
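The max_bytes rule ("can shrink but never grow") amounts to a clamp. A minimal Python sketch, assuming that an omitted argument falls back to the deployment ceiling (the source only says the argument is optional); effective_max_bytes is a hypothetical name:

```python
def effective_max_bytes(requested, deployment_cap):
    """Per-call max_bytes may lower the limit but never raise it."""
    if requested is None:
        # Assumption: no per-call value means the deployment ceiling applies.
        return deployment_cap
    return min(requested, deployment_cap)
```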
Telemetry
web_fetch reuses every counter the auto-link pipeline emits.
There's no separate dashboard:
- nexo_link_understanding_fetch_total{result}: ok / blocked / timeout / non_html / too_big / error.
- nexo_link_understanding_cache_total{hit}: true / false.
- nexo_link_understanding_fetch_duration_ms: histogram, only populated when an HTTP request actually went out (cache hits and host-blocked URLs skip it so percentiles reflect real fetch work).
The bundled Grafana dashboard
(ops/grafana/nexo-llm.json)
already plots all three.
Why a per-call cap of 5 URLs
A runaway agent given the prompt "fetch every link in this 10k
RSS dump" would otherwise queue thousands of HTTP requests
synchronously, blowing the prompt budget and hammering the
target hosts. 5 covers every realistic agentic workflow
(read 3 candidates, pick the best two, summarise) while leaving
a clear ceiling. Operators who want batch behaviour should
spawn a TaskFlow that calls web_fetch
in chunks with cursor persistence.
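The chunk-and-cursor idea can be sketched as follows. This is plain Python for illustration, not TaskFlow code; next_batch is a hypothetical helper and the persistence step is left as a comment.

```python
MAX_URLS_PER_CALL = 5  # per-call cap described above


def next_batch(urls, cursor):
    """Return the next batch of at most 5 URLs plus the advanced cursor."""
    batch = urls[cursor:cursor + MAX_URLS_PER_CALL]
    return batch, cursor + len(batch)


urls = [f"https://example.com/item/{i}" for i in range(12)]
cursor = 0
batches = []
while cursor < len(urls):
    batch, cursor = next_batch(urls, cursor)
    # A real flow would call web_fetch(urls=batch) here and persist the
    # cursor before moving on, so a restart resumes at the right offset.
    batches.append(batch)
```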
Comparison to extensions
The fetch-url Python extension does roughly the same thing.
web_fetch differs in three ways:
- In-process: no subprocess spawn, no Python interpreter, no extension wire protocol. Sub-100ms cold path on the happy case.
- Shared cache + telemetry: links the user shares (auto-expanded by Phase 21 link-understanding) AND links the agent fetches via web_fetch populate the same LRU. A second access inside the cache TTL is free.
- Same security defaults: same deny-host list, same size cap, same timeout. Operators tune one knob, two surfaces honour it.
Use the extension when the runtime path is the wrong shape (custom
auth, post-only endpoints, non-HTML responses you want raw).
Use web_fetch for the standard "give me the article" case,
which is most of them.
Implementation
The tool lives at
crates/core/src/agent/web_fetch_tool.rs::WebFetchTool and is
registered for every agent unconditionally in src/main.rs.
The per-binding link_understanding.enabled policy gates
whether the underlying fetch happens; the tool itself is always
visible in the agent's tool list so operators can write
"call web_fetch on URL X" prompts without needing a per-agent
web_fetch.enabled flag.
Source of truth for FOLLOWUPS W-2 closure.
Pairing protocol
Two coexisting protocols ship in nexo-pairing:
- DM-challenge inbound gate: opt-in per binding. Unknown senders on WhatsApp / Telegram receive a one-time human-friendly code; the operator approves them via CLI. Existing senders pass through unchanged.
- Setup-code QR: operator-initiated. nexo pair start issues a short-lived HMAC-signed bearer token + a gateway URL, packs them into a base64url payload, and renders a QR. A companion app scans, presents the token to the daemon, and gets a session token in return.
The feature is off by default. Existing setups see no behaviour
change until the operator flips pairing_policy.auto_challenge on a
binding.
DM-challenge gate
Per-binding config
# config/agents.yaml
agents:
- id: ana
inbound_bindings:
- plugin: whatsapp
instance: personal
pairing_policy:
auto_challenge: true # default false
The gate runs before the plugin publishes to the broker. Three outcomes per inbound message:
| Outcome | When | Plugin action |
|---|---|---|
| Admit | sender in pairing_allow_from (or policy off) | publish as normal |
| Challenge { code } | unknown sender, auto_challenge: true, slot free | reply with code, drop message |
| Drop | max-pending exhausted (3 per channel/account) | silent drop |
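The three outcomes can be read as a small decision function. The sketch below is illustrative Python, not the gate's actual code; decide and its parameters are hypothetical, and the case of a sender who already holds a pending code is not modelled because the table does not specify it.

```python
MAX_PENDING = 3  # per (channel, account), from the table above


def decide(sender, allow_from, auto_challenge, pending_count, code):
    """Map one inbound message to admit / challenge / drop."""
    if not auto_challenge or sender in allow_from:
        return ("admit", None)        # publish as normal
    if pending_count < MAX_PENDING:
        return ("challenge", code)    # reply with code, drop the message
    return ("drop", None)             # silent drop
```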
Operator workflow
$ nexo pair list
CODE CHANNEL ACCOUNT CREATED SENDER
K7M9PQ2X whatsapp personal 2026-04-25T13:21:00Z +57311...
$ nexo pair approve K7M9PQ2X
Approved whatsapp:personal:+57311... (added to allow_from)
The next message from +57311... admits through the gate.
pair list only shows pending challenges by default. Use
--all to also dump every active row in pairing_allow_from
(approved + seeded), and --include-revoked to keep soft-deleted
entries in the listing for audit:
$ nexo pair list --all
No pending pairing requests.
CHANNEL ACCOUNT SENDER VIA APPROVED REVOKED
telegram cody_nexo_bot 1194292426 seed 2026-04-26 17:52:10 UTC -
whatsapp personal +57311... cli 2026-04-25 13:21:00 UTC -
$ nexo pair list --all --include-revoked --json | jq '.allow[0]'
{
"channel": "whatsapp",
"account_id": "personal",
"sender_id": "+57311...",
"approved_via": "cli",
"approved_at": "2026-04-25T13:21:00Z"
}
--json always returns { "pending": [...], "allow": [...] } so
consumers get a stable shape regardless of --all.
Cache + revoke
The gate caches decisions for 30 s to keep SQLite off the hot path. Revokes (and freshly-seeded admits) are eventually consistent within that window:
$ nexo pair revoke whatsapp:+57311...
Revoked whatsapp:+57311...
For an immediate effect, trigger a hot-reload — the coordinator
runs PairingGate::flush_cache as a post-reload hook (Phase 70.7),
so nexo reload (or any file-watched config edit) drops the cache
and the next inbound message re-queries the store:
$ nexo reload
A daemon restart still works as a hammer when reload is disabled.
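The eventual-consistency window follows directly from how a TTL cache behaves. A minimal Python sketch, assuming a simple per-key timestamp; DecisionCache is a hypothetical stand-in and does not mirror the real PairingGate internals:

```python
import time


class DecisionCache:
    """Tiny TTL cache: entries expire after ttl_secs, flush drops them all."""

    def __init__(self, ttl_secs=30.0, clock=time.monotonic):
        self._ttl = ttl_secs
        self._clock = clock
        self._entries = {}

    def get(self, key):
        hit = self._entries.get(key)
        if hit is None:
            return None
        value, stored_at = hit
        if self._clock() - stored_at >= self._ttl:
            del self._entries[key]   # expired: the caller re-queries the store
            return None
        return value

    def put(self, key, value):
        self._entries[key] = (value, self._clock())

    def flush(self):
        # Corresponds to the post-reload hook: the next lookup misses.
        self._entries.clear()
```

A revoke written to the store is invisible to the gate until the cached entry expires or flush runs, which is why a reload gives an immediate effect.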
Migrating an existing bot
If you already have known senders, seed them so the gate doesn't
challenge mid-conversation when you flip auto_challenge: true:
$ nexo pair seed whatsapp personal +57311... +57222... +57333...
Seeded 3 sender(s) into whatsapp:personal allow_from
seed is idempotent; running it twice is safe and re-activates any
sender that was previously revoked.
Setup-code QR
Issuing
$ nexo pair start --public-url wss://nexo.example.com --qr-png /tmp/p.png --json
{
"url": "wss://nexo.example.com",
"url_source": "pairing.public_url",
"bootstrap_token": "eyJwcm9maWxlIjoi...",
"expires_at": "2026-04-25T13:32:00Z",
"payload": "eyJ1cmwi..."
}
payload is what goes in the QR. The companion decodes it to recover
{url, bootstrap_token, expires_at}, opens the WebSocket, and
presents the token as Authorization: Bearer <bootstrap_token>.
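A companion-side sketch of the decode step, in Python for illustration. It assumes the payload is unpadded base64url over a JSON object, which is consistent with the eyJ1cmwi... prefix in the sample output (it decodes to {"url") but is not otherwise confirmed here. The helper names are hypothetical.

```python
import base64
import json


def b64u_decode(text):
    """Decode base64url, restoring any stripped '=' padding first."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def b64u_encode(raw):
    """Encode to unpadded base64url (assumed wire format)."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def decode_payload(payload):
    """Recover {url, bootstrap_token, expires_at} from the QR payload."""
    return json.loads(b64u_decode(payload))


# Round trip with made-up values, then build the header the daemon expects.
sample = {"url": "wss://nexo.example.com", "bootstrap_token": "abc.def",
          "expires_at": "2026-04-25T13:32:00Z"}
decoded = decode_payload(b64u_encode(json.dumps(sample).encode("utf-8")))
headers = {"Authorization": "Bearer " + decoded["bootstrap_token"]}
```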
URL resolution
Priority chain (first non-empty wins):
- --public-url (CLI flag)
- tunnel.url (Phase tunnel; TODO: wire when accessor lands)
- gateway.remote.url
- LAN bind address (when gateway.bind=lan)
- fail-closed: the daemon refuses to issue a code on a loopback-only gateway.
As of Phase 70.5 the CLI also prints a ready-to-run
nexo pair seed <channel> <account> <SENDER> for every plugin
instance configured under config/plugins/, so a dev-machine
operator can skip the QR flow entirely:
$ nexo pair start --ttl-secs 300
Pairing-start needs a non-loopback gateway URL.
For local testing you usually don't need the QR flow at all — seed
the operator's chat into the allowlist directly:
  nexo pair seed telegram cody_nexo_bot <YOUR_TELEGRAM_USER_ID>
  nexo pair seed whatsapp default <YOUR_WHATSAPP_NUMBER>
Or, to keep using the QR flow, set one of:
  - `pairing.public_url` in config/pairing.yaml
  - `--public-url <wss://…>` flag
  - run `nexo` with the tunnel enabled (writes tunnel.url)
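The "first non-empty wins, otherwise fail closed" chain can be sketched like this. Illustrative Python only; resolve_gateway_url and the source labels are hypothetical and are not the values the CLI reports in url_source.

```python
def resolve_gateway_url(cli_public_url=None, tunnel_url=None,
                        remote_url=None, lan_url=None):
    """Walk the priority chain; raise instead of returning a loopback URL."""
    candidates = [
        ("cli_flag", cli_public_url),
        ("tunnel", tunnel_url),
        ("gateway_remote", remote_url),
        ("lan_bind", lan_url),
    ]
    for source, url in candidates:
        if url:  # None and "" both count as empty
            return url, source
    raise ValueError("pairing needs a non-loopback gateway URL")
```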
ws/wss security policy
Cleartext ws:// is allowed only on hosts the operator can
reasonably trust to be private:
- 127.0.0.1 / ::1 (loopback)
- RFC1918 (10/8, 172.16/12, 192.168/16)
- link-local (169.254/16)
- *.local mDNS hostnames
- 10.0.2.2 (Android emulator)
- Any host listed in pairing.ws_cleartext_allow_extra
Everything else requires wss://. This matches OpenClaw's posture in
research/src/pairing/setup-code.ts.
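A rough Python rendering of that allowlist, using the standard ipaddress module. It is a sketch of the policy as listed, not the daemon's implementation; cleartext_ws_allowed is a hypothetical name. Note that 10.0.2.2 already falls inside 10/8.

```python
import ipaddress

# Networks taken from the list above.
ALLOWED_NETS = [ipaddress.ip_network(n) for n in (
    "127.0.0.1/32", "::1/128",                       # loopback
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", # RFC1918
    "169.254.0.0/16",                                # link-local
)]


def cleartext_ws_allowed(host, allow_extra=()):
    """True when ws:// may be issued for this host; otherwise wss:// is required."""
    if host in allow_extra:
        return True
    if host.lower().endswith(".local"):
        return True  # mDNS hostname
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # any other hostname needs wss://
    return any(addr.version == net.version and addr in net
               for net in ALLOWED_NETS)
```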
Token format
b64u(claims_json) + "." + b64u(hmac_sha256(secret, claims_json))
- claims_json = {"profile":"companion-v1","expires_at":"...","nonce":"<32 hex>","device_label":"..."}
- secret = 32 bytes in ~/.nexo/secret/pairing.key (auto-generated on first boot with 0600 perms; rotate by deleting + restarting).
Verification is constant-time (subtle crate) so timing leaks don't
discriminate between "wrong sig" and "wrong claims".
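The token layout can be reproduced with the Python standard library. This is a sketch under assumptions: unpadded base64url, compact JSON serialisation of the claims, and hmac.compare_digest standing in for the subtle crate. sign and verify are hypothetical names, and the expiry check on expires_at is left out.

```python
import base64
import hashlib
import hmac
import json


def b64u(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64u_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign(secret: bytes, claims: dict) -> str:
    """b64u(claims_json) + "." + b64u(hmac_sha256(secret, claims_json))"""
    claims_json = json.dumps(claims, separators=(",", ":")).encode("utf-8")
    sig = hmac.new(secret, claims_json, hashlib.sha256).digest()
    return b64u(claims_json) + "." + b64u(sig)


def verify(secret: bytes, token: str):
    """Return the claims when the signature matches, else None."""
    try:
        claims_part, sig_part = token.split(".")
        claims_json = b64u_decode(claims_part)
        sig = b64u_decode(sig_part)
    except ValueError:  # wrong part count or malformed base64
        return None
    expected = hmac.new(secret, claims_json, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):  # constant-time compare
        return None
    return json.loads(claims_json)
```

The HMAC is computed over the exact claims bytes carried in the token, so the verifier never has to re-serialise the JSON.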
Threat model
| Concern | Mitigation |
|---|---|
| Brute-force pairing code | 32^8 ≈ 10^12 keyspace; 60 min TTL; max 3 pending per (channel, account) |
| Token replay after expiry | TTL on expires_at (default 10 min); HMAC verify fails closed |
| Token forgery | HMAC-SHA256 with 32-byte secret; constant-time compare |
| Secret leak | Rotate via rm ~/.nexo/secret/pairing.key && restart; all in-flight tokens invalidate |
| TOCTOU on approve | Single SQL transaction (approve reads + insert + delete in one tx) |
| ws cleartext on hostile network | Refuse to issue cleartext URL outside private-host allowlist |
| DoS via flood of pending requests | Max 3 per (channel, account); TTL 60 min auto-prunes |
Storage layout
Two SQLite tables in <memory_dir>/pairing.db:
pairing_pending (channel, account_id, sender_id PRIMARY KEY,
code, created_at, meta_json)
pairing_allow_from (channel, account_id, sender_id PRIMARY KEY,
approved_at, approved_via, revoked_at)
Soft-delete (revoked_at) keeps historical context: an operator can
later see "+57311 was approved on X, revoked on Y" for audit.
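Soft-delete plus idempotent seeding can be tried against an in-memory SQLite database. The sketch below uses Python's sqlite3 module; the composite primary key, the column types, and keeping the original approved_at on re-seed are assumptions, since the layout above only lists column names.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE pairing_allow_from (
        channel TEXT, account_id TEXT, sender_id TEXT,
        approved_at TEXT, approved_via TEXT, revoked_at TEXT,
        PRIMARY KEY (channel, account_id, sender_id)  -- assumed composite key
    )""")


def seed(channel, account, sender, now):
    # Idempotent upsert: a revoked sender is re-activated by clearing revoked_at.
    db.execute("""
        INSERT INTO pairing_allow_from VALUES (?, ?, ?, ?, 'seed', NULL)
        ON CONFLICT (channel, account_id, sender_id)
        DO UPDATE SET revoked_at = NULL""", (channel, account, sender, now))


def revoke(channel, sender, now):
    # Soft delete: the row stays for audit, only revoked_at is stamped.
    db.execute("""
        UPDATE pairing_allow_from SET revoked_at = ?
        WHERE channel = ? AND sender_id = ?""", (now, channel, sender))


def is_allowed(channel, account, sender):
    row = db.execute("""
        SELECT 1 FROM pairing_allow_from
        WHERE channel = ? AND account_id = ? AND sender_id = ?
          AND revoked_at IS NULL""", (channel, account, sender)).fetchone()
    return row is not None
```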
When to leave it off
- Single-user setups where the operator is the only sender — the gate adds a SQL hit per message for no security gain.
- Bots that take public input by design (e.g. a self-service support bot) — the gate would block every customer.
- Until you have an agent setup web-search-style wizard, manual pair seed is the only friendly migration path.
Adapter registry
Each channel that participates in pairing implements
PairingChannelAdapter in its plugin crate. The adapter owns three
channel-specific decisions the runtime cannot make on its own:
- normalize_sender(raw): canonicalise inbound sender ids before the gate hits the store. WhatsApp strips @c.us / @s.whatsapp.net and prepends +; Telegram lower-cases @username and passes numeric chat ids through.
- format_challenge_text(code): render the operator-facing pairing message. The default is plain UTF-8; the Telegram adapter overrides it to escape MarkdownV2 reserved characters and wrap the code in backticks so the user can long-press to copy.
- send_reply(account, to, text): publish the challenge through the channel's outbound topic (plugin.outbound.{whatsapp,telegram}[.<account>]) using the payload shape that channel's dispatcher expects.
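The normalisation rules described for normalize_sender can be approximated in a few lines. This is a Python illustration of the stated rules, not the adapters' Rust code; the function names are hypothetical, and leaving an existing leading + untouched is an assumption.

```python
def normalize_whatsapp(raw):
    """Strip the JID suffix and make sure the id starts with '+'."""
    for suffix in ("@c.us", "@s.whatsapp.net"):
        if raw.endswith(suffix):
            raw = raw[: -len(suffix)]
            break
    # Assumption: an id that already starts with '+' is not prefixed again.
    return raw if raw.startswith("+") else "+" + raw


def normalize_telegram(raw):
    """Lower-case @usernames; numeric chat ids pass through unchanged."""
    return raw.lower() if raw.startswith("@") else raw
```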
The bin (src/main.rs) constructs a PairingAdapterRegistry at boot
and registers the WhatsApp + Telegram adapters. The runtime consults
the registry on every inbound event whose binding has
pairing_policy.auto_challenge: true. Channels with no registered
adapter fall back to a hardcoded broker publish that mirrors the
legacy text on plugin.outbound.{channel} — operators still see the
challenge in their channel, but without per-channel formatting.
Telemetry lives under
pairing_inbound_challenged_total{channel,result} with result one of
delivered_via_adapter, delivered_via_broker, publish_failed,
no_adapter_no_broker_topic, so dashboards can split adapter vs.
fallback delivery rates per channel.
CLI reference
nexo pair start [--for-device <name>] [--public-url <url>]
[--qr-png <path>] [--ttl-secs <n>] [--json]
nexo pair list [--channel <id>] [--all] [--include-revoked] [--json]
nexo pair approve <CODE> [--json]
nexo pair revoke <channel>:<sender_id>
nexo pair seed <channel> <account_id> <sender_id> [<sender_id>...]
nexo pair help
Benchmarks
The workspace ships criterion benchmark suites for every hot path
that runs on the data plane. CI executes them on every PR + weekly
on main so regressions are visible before merge.
Quick run
# Single crate:
cargo bench -p nexo-resilience
# Single bench within a crate:
cargo bench -p nexo-broker --bench topic_matches
# Single group within a bench:
cargo bench -p nexo-broker --bench topic_matches -- 'topic_matches/wildcard'
Output goes to target/criterion/. Open index.html under that
directory in a browser for the full HTML report.
Coverage matrix
| Crate | Bench | What it measures | Run target |
|---|---|---|---|
| nexo-resilience | circuit_breaker | CircuitBreaker::allow (closed + open), on_success, on_failure, 8-task concurrent allow contention | sub-100ns per call |
| nexo-broker | topic_matches | NATS-style pattern matching (exact, single-wildcard *, multi-wildcard >, 50-pattern storm) | sub-100ns per match |
| nexo-broker | local_publish | End-to-end LocalBroker::publish with 0 / 1 / 10 / 50 subscribers (DashMap scan + try_send + slow-consumer drop counter) | sub-10µs at 50 subs |
| nexo-llm | sse_parsers | OpenAI / Anthropic / Gemini SSE parsers, 50-chunk fixtures (typical short answer) | chunks/sec scales linearly |
| nexo-taskflow | tick | WaitEngine::tick at 10 / 100 / 1 000 active waiting flows | sub-millisecond at single-host scale |
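For readers unfamiliar with the pattern syntax the topic_matches bench exercises, here is a reference sketch of NATS-style matching in Python: * matches exactly one token, > matches one or more trailing tokens. It follows the NATS subject rules as commonly documented; whether nexo-broker's matcher agrees in every edge case is an assumption.

```python
def topic_matches(pattern, subject):
    """NATS-style match on dot-separated tokens."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' must be the last pattern token and consume at least one token.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)
```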
What's NOT benched yet
These are tracked under Phase 35.5 follow-up:
- nexo-core transcripts FTS search: needs SQLite fixture seed before the bench is meaningful.
- nexo-core redaction pipeline: wait for the local-LLM redaction backend (Phase 68.7) so we measure the real path operators ship.
- nexo-mcp encode_request / parse_notification_method: cheap to add; will land alongside an MCP-stdio round-trip bench.
- nexo-memory vector-search recall: needs a public dataset baseline.
Add a bench by following the patterns in crates/<x>/benches/:
- [dev-dependencies] adds criterion = "0.5" (with async_tokio if you need a runtime).
- [[bench]] registers name = "<bench>" and harness = false.
- Bench file uses Throughput::Elements(N) so output is ops/sec, not raw ns/iter.
- Each criterion_group! covers a distinct conceptual path; don't bundle unrelated paths.
CI integration
.github/workflows/bench.yml runs the matrix on:
- every PR that touches crates/**, Cargo.lock, or Cargo.toml
- weekly on Sunday 04:00 UTC against main
- manual workflow_dispatch
Each run uploads target/criterion/ as an artifact retained 30
days. PR runs save with --save-baseline pr-<number>; main runs
save as main. Compare locally with:
# Pull the artifact for PR #42
gh run download <run-id> --name bench-nexo-broker-<run-id>
# Compare against the local main baseline
cargo bench -p nexo-broker -- --baseline main
Today the CI job is informational: a regression doesn't
fail the PR. Once we have ~10 main runs of baseline data per
crate, the workflow will gate on >10% regression per group. That's
the Phase 35.6 done-criteria.
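The planned gate boils down to a per-group relative comparison. A hypothetical Python sketch of that check follows; the function name, the input shape (mean time per group in nanoseconds), and skipping groups missing from either side are all assumptions, not the workflow's actual logic.

```python
def regressed_groups(baseline_ns, current_ns, threshold=0.10):
    """Return {group: relative_change} for groups slower than the threshold."""
    failures = {}
    for group, base in baseline_ns.items():
        cur = current_ns.get(group)
        if cur is None or base <= 0:
            continue  # nothing comparable for this group
        change = (cur - base) / base
        if change > threshold:
            failures[group] = change
    return failures
```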
Known limitations
- GitHub Actions runners are noisy. The ubuntu-latest shared runner tier shows ±5-10% variance on microbenchmarks. This is why we don't gate on small regressions yet: the baseline noise floor is itself ~5%.
- Benches don't measure cold cache. cargo bench's warm-up phase reaches steady-state CPU caches; first-call latency on a cold runtime is not captured. Add a separate bench_cold_* group when this matters (it usually doesn't; hot path is what matters at scale).
- No cross-crate end-to-end benchmark yet. Phase 35.3 (load test rig) covers that; today's suites are per-crate microbenchmarks.
Reading criterion output
A typical run prints:
publish/mixed_50_subs time: [12.347 µs 12.451 µs 12.567 µs]
thrpt: [3.9786 Melem/s 4.0153 Melem/s 4.0494 Melem/s]
change: time: [-0.4% +0.3% +1.1%] (p = 0.62 > 0.05)
thrpt: [-1.1% -0.3% +0.4%]
No change in performance detected.
- time is the per-iteration latency (lower better).
- thrpt is throughput (higher better), only present when the bench declared Throughput::Elements(N).
- change compares against the previous run on the same hardware. p > 0.05 means the difference is within noise.
Look for change reporting "Performance has regressed" with a
red bar — that's the signal a PR introduced a regression.
Native install (no Docker)
If you'd rather run nexo-rs directly on a Linux / macOS host — development loop, single-machine deploy, restricted container environment — this page walks through every step and names the bootstrap script that automates it.
Fast path
git clone git@github.com:lordmacu/nexo-rs.git
cd nexo-rs
./scripts/bootstrap.sh
scripts/bootstrap.sh verifies prerequisites, installs a local
NATS, creates the runtime directories, stages example configs, and
builds the agent binary. Re-runnable — each step is idempotent.
Keep reading for what it actually does (and what to do when a step needs manual intervention).
Prerequisites
| Tool | Required for | Notes |
|---|---|---|
| Rust (stable, edition 2021) | building the binaries | rust-toolchain.toml pins the channel |
| Git | cloning + per-agent workspace-git | default on most hosts |
| NATS ≥ 2.10 | the broker | binary or dev docker container is fine |
| SQLite ≥ 3.38 | memory + broker disk queue | ships with most distros |
| Chrome / Chromium | browser plugin (optional) | skip if you don't use the browser plugin |
| ffmpeg + ffprobe | media-related skills (optional) | skip if you don't ship those skills |
| yt-dlp / tesseract / tmux / ssh | individual skills (optional) | each skill declares its requires.bins |
On Ubuntu / Debian:
sudo apt update
sudo apt install -y build-essential pkg-config libsqlite3-dev git curl
On macOS:
xcode-select --install
brew install sqlite git
Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source "$HOME/.cargo/env"
rustup component add rustfmt clippy
The repo's rust-toolchain.toml pins the channel; no manual version
pick is needed.
Install NATS
Pick one path:
Option A — native NATS server
# Linux x86_64
curl -L -o /tmp/nats.tar.gz \
https://github.com/nats-io/nats-server/releases/download/v2.10.20/nats-server-v2.10.20-linux-amd64.tar.gz
tar -xzf /tmp/nats.tar.gz -C /tmp
sudo mv /tmp/nats-server-*/nats-server /usr/local/bin/
For macOS: brew install nats-server.
Start it:
nats-server -js # foreground
nats-server -js -D # foreground with debug
# or, as a systemd service: see below
Option B — dev throwaway via Docker
Even on a "no-Docker" box, a single short-lived container for the broker is often fine:
docker run -d --name nexo-nats --restart unless-stopped \
-p 4222:4222 -p 8222:8222 nats:2.10-alpine
This is the same broker the compose stack would use; only the broker itself runs in a container.
Systemd unit (Linux, production)
/etc/systemd/system/nats-server.service:
[Unit]
Description=NATS Server
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/bin/nats-server -js
Restart=on-failure
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl enable --now nats-server
Build nexo-rs
git clone git@github.com:lordmacu/nexo-rs.git
cd nexo-rs
cargo build --release
The output is ./target/release/agent. Symlink it into $PATH if
you want:
sudo ln -sf "$(pwd)/target/release/agent" /usr/local/bin/agent
Prepare runtime directories
mkdir -p ./data/{queue,workspace,media,transcripts}
mkdir -p ./secrets # gitignored; holds API keys, nkey files, etc.
chmod 700 ./secrets # restrictive — the credential gauntlet checks this
Stage config
The repo ships config/*.yaml with safe defaults. Override whatever
you need:
# Optional: copy the ana sales agent template into the gitignored dir
cp config/agents.d/ana.example.yaml config/agents.d/ana.yaml
# Add an API key:
export MINIMAX_API_KEY=...
export MINIMAX_GROUP_ID=...
# or write to secrets/ files referenced from config/llm.yaml via ${file:...}
See Configuration — layout for the full reference.
Pair channels and set secrets
./target/release/agent setup
The wizard pairs WhatsApp / Telegram / Google / LLM credentials interactively. See Setup wizard.
First run
./target/release/agent --config ./config
Watch the startup summary — it tells you exactly which plugins loaded, which extensions were skipped and why, and whether the broker is reachable. If anything's missing, the log line names the specific file or env var to fix.
Run as a systemd service
/etc/systemd/system/nexo-rs.service:
[Unit]
Description=nexo-rs agent
Requires=nats-server.service
After=nats-server.service
[Service]
Type=simple
User=nexo
Group=nexo
WorkingDirectory=/srv/nexo-rs
Environment=RUST_LOG=info
Environment=AGENT_ENV=production
ExecStart=/usr/local/bin/agent --config /srv/nexo-rs/config
Restart=on-failure
RestartSec=5
# Optional: writable paths; this only restricts writes when combined with ProtectSystem=strict
ReadWritePaths=/srv/nexo-rs/data /srv/nexo-rs/secrets
[Install]
WantedBy=multi-user.target
sudo useradd -r -s /bin/false -d /srv/nexo-rs nexo
sudo chown -R nexo:nexo /srv/nexo-rs
sudo systemctl daemon-reload
sudo systemctl enable --now nexo-rs
Logs:
journalctl -u nexo-rs -f
macOS launchd
~/Library/LaunchAgents/dev.nexo-rs.agent.plist:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key> <string>dev.nexo-rs.agent</string>
<key>WorkingDirectory</key><string>/Users/you/nexo-rs</string>
<key>ProgramArguments</key>
<array>
<string>/Users/you/nexo-rs/target/release/agent</string>
<string>--config</string><string>/Users/you/nexo-rs/config</string>
</array>
<key>EnvironmentVariables</key>
<dict>
<key>RUST_LOG</key><string>info</string>
</dict>
<key>RunAtLoad</key> <true/>
<key>KeepAlive</key> <true/>
</dict>
</plist>
launchctl load -w ~/Library/LaunchAgents/dev.nexo-rs.agent.plist
launchctl start dev.nexo-rs.agent
Verify
agent status # lists running agents
curl localhost:8080/ready # readiness
curl localhost:9090/metrics # Prometheus metrics
See Metrics + health.
Upgrading
cd nexo-rs
git pull
cargo build --release
sudo systemctl restart nexo-rs # Linux
# or: launchctl kickstart -k gui/$UID/dev.nexo-rs.agent # macOS
The graceful shutdown sequence drains in-flight work and persists the disk queue before exit.
Uninstalling
sudo systemctl disable --now nexo-rs nats-server
sudo rm /etc/systemd/system/{nexo-rs,nats-server}.service
sudo rm /usr/local/bin/{agent,nats-server}
sudo userdel nexo
sudo rm -rf /srv/nexo-rs
See also
- Quick start — the five-minute dev loop
- Docker — container path for comparison
- Setup wizard
- Configuration
Debian / Ubuntu (.deb)
Fedora / RHEL / Rocky (.rpm)
Termux (Android) install
Run nexo-rs directly on an Android phone under Termux. No Docker, no server — a self-hosted agent in your pocket.
Use this path for a personal agent (one phone, one WhatsApp, one Telegram). For multi-tenant / multi-process deployments the regular Linux setup on a server is the right shape.
Quickest path — pre-built .deb
Once a v* release is published (recipe lives in
packaging/termux/build.sh), download the asset and install with
one command:
# Inside Termux on the phone:
curl -LO https://github.com/lordmacu/nexo-rs/releases/latest/download/nexo-rs_aarch64.deb
pkg install ./nexo-rs_aarch64.deb
The deb pulls the runtime deps Termux already ships (libsqlite,
openssl, ffmpeg, tesseract, python, yt-dlp). Its
postinst scaffolds ~/.nexo/{data,secret} and prints the next
steps. Skip the build-from-source section below if this works.
Root vs non-root
Everything in this guide runs without root. You do not need to root your phone to self-host nexo-rs on it.
Root only unlocks extras:
| Scenario | Needs root? |
|---|---|
| Build + run the agent daemon | ❌ no |
| Pair WhatsApp, Telegram, Google | ❌ no |
| Local broker (broker.type: local) | ❌ no |
| Native NATS Go binary | ❌ no (installs to $PREFIX/bin) |
| termux-wake-lock, Termux:Boot autostart | ❌ no |
| Install skills from pkg (ffmpeg, tesseract, yt-dlp) | ❌ no |
| MCP client / server mode | ❌ no |
| Browser plugin via cdp_url to a chromium you launched yourself | ❌ no |
| Docker compose stack (via proot-distro or Linux Deploy) | ✅ yes |
| SELinux permissive (if Chromium sandbox misbehaves) | ✅ yes |
| Running multiple proot-distro containers side by side | ✅ yes |
| Bypass Android's battery optimizer more aggressively | ✅ yes |
Short version: don't root just for nexo-rs. Root if you want the full compose stack in a Linux-Deploy chroot, otherwise skip it.
What works
| Area | Status |
|---|---|
| Core runtime, memory, TaskFlow, dreaming | ✅ full |
| Broker: type: local (in-process) or native NATS Go binary | ✅ full |
| LLM providers (MiniMax / Anthropic / OpenAI-compat / Gemini) | ✅ all rustls-based |
| WhatsApp plugin (pure Rust + Signal Protocol) | ✅ pairing via Unicode QR |
| Telegram plugin | ✅ Bot API over HTTP |
| Gmail / Google plugin + gmail-poller | ✅ OAuth over HTTP |
| Extensions (stdio + NATS) | ✅ spawn works |
| Skills: fetch-url, dns-tools, rss, weather, wikipedia, pdf-extract, brave-search, wolfram-alpha, summarize, translate | ✅ pure Rust |
| MCP client + server | ✅ stdio + HTTP |
| Health / metrics / admin HTTP servers (8080 / 9090 / 9091) | ✅ unprivileged ports |
What needs a tweak
| Thing | Workaround |
|---|---|
| Service manager (no systemd) | termux-services (runit) or tmux + nohup |
| Run at boot | install the Termux:Boot app + drop a script in ~/.termux/boot/ |
| Survives screen-off | termux-wake-lock (from the Termux:API add-on) before running the agent |
| Browser plugin (Chrome/Chromium) | use cdp_url: to a chromium you start manually with --no-sandbox --disable-dev-shm-usage; or disabled: [browser] if you don't need it |
| Secrets file permission gauntlet | export CHAT_AUTH_SKIP_PERM_CHECK=1 (Android filesystem perms model differs) |
| WhatsApp public tunnel (cloudflared) | skip the public tunnel; pair locally via Unicode QR rendered on the terminal |
| Docker / compose | use broker.type: local or native NATS binary — no containers involved |
Prerequisites
From a fresh Termux install:
pkg update
pkg install -y rust git curl sqlite openssl clang pkg-config
Optional (enables specific skills):
pkg install -y ffmpeg tesseract yt-dlp tmux openssh
Optional (browser plugin):
pkg install -y tur-repo
pkg install -y chromium
Optional (run in background without the terminal session alive):
pkg install -y termux-services termux-api
# install the companion app "Termux:API" from F-Droid
Fast path — bootstrap script
The repo's scripts/bootstrap.sh auto-detects Termux and picks the
right defaults:
git clone https://github.com/lordmacu/nexo-rs
cd nexo-rs
./scripts/bootstrap.sh --yes
What it does on Termux:
- Verifies rust, git, curl, sqlite from pkg
- Downloads the static nats-server Go binary (arm64), drops it in $PREFIX/bin/, or skip with --nats=skip to use the local broker
- Creates ./data/** and ./secrets/ (with Termux-compatible perms)
- Stages config/agents.d/*.example.yaml → *.yaml if missing
- Runs cargo build --release (grab a coffee: ~20–40 min on phone hardware)
- Optionally launches agent setup to pair channels
Expect a ~60–100 MB final binary.
Manual install
1. Install Rust and deps
pkg install -y rust git curl sqlite openssl clang pkg-config
2. Clone and build
git clone https://github.com/lordmacu/nexo-rs
cd nexo-rs
cargo build --release --bin agent
3. Broker
Option A — local (simplest):
# config/broker.yaml
broker:
type: local
persistence:
enabled: true
path: ./data/queue
No NATS binary needed. All pub/sub stays in-process.
Option B — native NATS binary:
curl -L -o /tmp/nats.tar.gz \
https://github.com/nats-io/nats-server/releases/download/v2.10.20/nats-server-v2.10.20-linux-arm64.tar.gz
tar -xzf /tmp/nats.tar.gz -C /tmp
install -m 0755 "$(find /tmp -name nats-server -type f | head -1)" \
$PREFIX/bin/nats-server
nats-server -js &
Go binaries are static and work on Termux without libc surprises.
4. Runtime directories and secrets
mkdir -p ./data/{queue,workspace,media,transcripts} ./secrets
Termux stores files under /data/data/com.termux/files/home by
default. Avoid pointing config paths at /sdcard — Android's
scoped-storage model breaks directory permissions there.
5. Relax the credentials perm check
Android's filesystem doesn't handle permission bits the way a regular Linux host does, so the credentials gauntlet would otherwise refuse to boot on false-positive warnings. Skip the check:
export CHAT_AUTH_SKIP_PERM_CHECK=1
Add it to ~/.bashrc (or your shell's profile) or a wrapper shell
script so it's set every time; ~/.termux/termux.properties holds
Termux app settings, not environment variables.
6. Launch the wizard
./target/release/agent setup
For the WhatsApp pairing step, the wizard renders the QR as Unicode blocks directly in the terminal — scan from the phone's WhatsApp app (Settings → Linked Devices). No public tunnel needed.
7. Run the agent
termux-wake-lock # keep CPU awake even with screen off
./target/release/agent --config ./config
Staying alive in the background
Android's aggressive task killing is the biggest operational surprise. Pick one:
A — termux-wake-lock + foreground notification
termux-wake-lock
# agent in foreground:
./target/release/agent --config ./config
The wake-lock persists until you run termux-wake-unlock or kill
the session. Minimum friction, most reliable.
B — termux-services (runit)
pkg install -y termux-services
sv-enable termux-services
mkdir -p ~/.config/service/nexo-rs
cat > ~/.config/service/nexo-rs/run <<'EOF'
#!/data/data/com.termux/files/usr/bin/sh
cd /data/data/com.termux/files/home/nexo-rs
export CHAT_AUTH_SKIP_PERM_CHECK=1
exec ./target/release/agent --config ./config 2>&1
EOF
chmod +x ~/.config/service/nexo-rs/run
sv up nexo-rs
sv status nexo-rs
C — Termux:Boot (start on device boot)
Install the Termux:Boot app from F-Droid, then:
mkdir -p ~/.termux/boot
cat > ~/.termux/boot/start-agent <<'EOF'
#!/data/data/com.termux/files/usr/bin/sh
termux-wake-lock
cd /data/data/com.termux/files/home/nexo-rs
export CHAT_AUTH_SKIP_PERM_CHECK=1
exec ./target/release/agent --config ./config
EOF
chmod +x ~/.termux/boot/start-agent
Disabling the browser plugin
If you don't need headless browser control (most phone-hosted
agents don't), drop it from config/extensions.yaml:
extensions:
disabled: [browser]
Or, if you have tur-repo chromium installed and want nexo-rs to
spawn it, use the browser.args field to forward the flags Termux
needs:
# config/plugins/browser.yaml
browser:
headless: true
executable: /data/data/com.termux/files/usr/bin/chromium
args:
- --no-sandbox
- --disable-dev-shm-usage
- --disable-gpu
The built-in launch flags still apply; args is appended after
them so you can also override any of the built-ins (Chrome's CLI
parser uses last-wins).
Alternative: launch chromium yourself and attach via cdp_url:
# config/plugins/browser.yaml
browser:
# Start chromium yourself with:
# chromium --headless --no-sandbox --disable-dev-shm-usage \
# --disable-gpu --remote-debugging-port=9222 &
cdp_url: http://127.0.0.1:9222
When cdp_url is set, args is ignored — nexo-rs doesn't spawn
Chrome, only connects to yours.
Verify
curl localhost:8080/ready
curl localhost:9090/metrics
./target/release/agent status
Upgrading
cd ~/nexo-rs
git pull
cargo build --release
# restart under whichever method you picked (wake-lock / runit / Boot)
The agent's graceful shutdown still runs on SIGTERM under Android: closing the Termux session or stopping the process with SIGTERM drains the disk queue cleanly.
See also
- Installation — shared prerequisites
- Native install (no Docker) — the Linux/macOS path the Termux recipe is a sibling of
- Plugins — Browser
- Config — broker.yaml
Install — Nix
Nexo ships a Nix flake that pins the toolchain (Rust 1.80, MSRV) and the native build deps so a contributor or operator can go from clean shell to working binary without touching the host system.
Run without installing
nix run github:lordmacu/nexo-rs -- --help
First invocation builds from source (~3-5 min on cold cache); subsequent runs hit the local Nix store.
Build a local binary
nix build github:lordmacu/nexo-rs
./result/bin/nexo --help
The binary is the same nexo produced by cargo build --release --bin nexo. The build outputs a result/ symlink that the operator can link
into /usr/local/bin/ or copy elsewhere.
Contributor dev shell
git clone https://github.com/lordmacu/nexo-rs
cd nexo-rs
nix develop
Drops you into a shell with:
- rustc 1.80 + cargo + clippy + rustfmt + rust-src
- cargo-edit, cargo-watch, cargo-nextest, cargo-deny
- mdbook + mdbook-mermaid (for mdbook build docs)
- sqlite, pkg-config, openssl, libgit2 (build deps)
RUST_LOG=info is exported by default. The toolchain version is
pinned in flake.nix — bump in lockstep with
[workspace.package].rust-version in Cargo.toml.
What the flake does NOT install
The nexo binary alone is not enough for full functionality.
Runtime tools the channel plugins shell out to live at the system
level, not in the flake:
- Chrome / Chromium: required by the browser plugin
- cloudflared: used by the tunnel plugin
- ffmpeg: media transcoding for WhatsApp voice notes
- tesseract-ocr: OCR skill
- yt-dlp: the yt-dlp extension
Operators install these via their distro's package manager. The native install guide lists the apt / pacman / brew commands. The Docker image bundles all of them — that's the path of least friction for a "just works" deploy.
Pinning a release
Once v* tags are published, pin to a specific release:
nix run github:lordmacu/nexo-rs/v0.1.1 -- --help
Or in a flake input:
{
inputs.nexo-rs.url = "github:lordmacu/nexo-rs/v0.1.1";
}
Verifying the build
nix flake check
The command verifies the flake metadata and evaluates all outputs
(packages, apps, devShells, formatter) without actually building
them. Useful in CI to catch flake regressions early.
Troubleshooting
- "experimental feature 'flakes' is disabled" — add to ~/.config/nix/nix.conf: experimental-features = nix-command flakes
- First build is very slow — the build re-fetches and re-compiles every cargo dependency in the sandbox. Subsequent builds are cached. A future Phase 27.x will publish a cachix cache so nix build pulls the binary directly.
- Build fails on macOS arm64 — git2-rs occasionally lags on Apple silicon. Workaround: build the binary inside the Docker image instead (see Docker).
Agent-centric setup wizard
The hub menu's Configurar agente (canal, modelo, idioma, skills)
entry drops the operator into a per-agent submenu. Where the rest of
the wizard groups actions by service (Telegram, OpenAI, the
browser plugin), this submenu groups them by agent: pick one agent
up front, then mutate its model, language, channels, and skills from
a single dashboard. Every action reuses the existing channel / LLM /
skill flows underneath, so behavior stays in lockstep with the rest
of the wizard.
./target/release/agent setup
# → Configurar agente (canal, modelo, idioma, skills)
Dashboard
Agente: kate
Modelo: anthropic / claude-haiku-4-5 [creds ✔]
Idioma: es
Canales: ✔ telegram:default (bound)
✗ whatsapp:default (unbound)
Skills: 8 / 24 attached
The dashboard is recomputed from disk on every loop iteration, so the screen always reflects the most recent YAML state.
Action menu
After the dashboard renders, the operator picks one of:
| Action | Effect |
|---|---|
| Modelo | Attach / detach / change the LLM provider + model name. Re-uses the LLM credential form when secrets are missing. |
| Idioma | Pick from es / en / pt / fr / it / de, or clear the directive. |
| Canales | Auth/Reauth, Bind, or Unbind a channel for this agent. Auth flows are the same services_imperative dispatchers the legacy menu uses. |
| Skills | Multi-select against the skill catalog. Newly added skills with required secrets prompt for creds. |
| ← volver | Exit the submenu, return to the hub. |
YAML mutations
| Action | YAML path | Operation |
|---|---|---|
| Attach model | agents[<id>].model.provider, …model.model | upsert_agent_field |
| Detach model | agents[<id>].model | remove_agent_field |
| Set language | agents[<id>].language | upsert_agent_field |
| Clear language | agents[<id>].language | remove_agent_field |
| Bind channel | agents[<id>].plugins[], agents[<id>].inbound_bindings[] | append_agent_list_item (idempotent) |
| Unbind channel | agents[<id>].plugins[], agents[<id>].inbound_bindings[] | remove_agent_list_item by predicate |
| Replace skills | agents[<id>].skills | upsert_agent_field (full sequence) |
All mutations land atomically (tempfile + rename) and are gated by the same process-wide YAML mutex the legacy upsert path uses, so concurrent wizard sessions don't corrupt the file.
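The tempfile + rename pattern under a process-wide lock can be sketched as follows. This is a minimal illustration using only the standard library, not the code in yaml_patch.rs; the names YAML_LOCK and write_atomically are hypothetical.

```rust
use std::fs;
use std::io::Write;
use std::path::Path;
use std::sync::Mutex;

// Hypothetical process-wide lock; the wizard's real mutex may be named and scoped differently.
static YAML_LOCK: Mutex<()> = Mutex::new(());

/// Replace `target` with `contents` by writing a sibling temp file and renaming it.
fn write_atomically(target: &Path, contents: &str) -> std::io::Result<()> {
    // Serialize writers inside this process; recover the guard if a previous holder panicked.
    let _guard = YAML_LOCK.lock().unwrap_or_else(|e| e.into_inner());
    let tmp = target.with_extension("yaml.tmp");
    {
        let mut file = fs::File::create(&tmp)?;
        file.write_all(contents.as_bytes())?;
        file.sync_all()?; // flush to disk before the rename makes the data visible
    }
    // On POSIX, rename is atomic when both paths are on the same filesystem.
    fs::rename(&tmp, target)
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("nexo_agents_demo.yaml");
    write_atomically(&path, "agents: []\n")?;
    write_atomically(&path, "agents:\n  - id: kate\n")?;
    print!("{}", fs::read_to_string(&path)?);
    Ok(())
}
```

A reader of the target file sees either the old or the new content, never a half-written file. Note that an in-process mutex only serializes writers inside one process; two separate wizard processes would need a file lock on top of it.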
Hot-reload
After every successful mutation, the wizard fires a best-effort
nexo --config <dir> reload so a running daemon picks up the YAML
edit without a manual restart. The call is fire-and-forget: when
the binary isn't on PATH or the daemon isn't running, the wizard
keeps going silently.
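A fire-and-forget call of this kind can be sketched with std::process::Command. The function name fire_reload is hypothetical; the argument order follows the nexo --config <dir> reload invocation described above.

```rust
use std::process::{Command, Stdio};

/// Try to trigger a reload. Returns whether the process could be spawned.
/// Every failure is swallowed on purpose: the caller never blocks or aborts.
fn fire_reload(binary: &str, config_dir: &str) -> bool {
    Command::new(binary)
        .args(["--config", config_dir, "reload"])
        .stdin(Stdio::null())
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .spawn() // do not wait on the child: the wizard moves on immediately
        .is_ok()
}

fn main() {
    // When `nexo` is not on PATH this prints `false` and nothing else happens.
    let spawned = fire_reload("nexo", "./config");
    println!("reload spawned: {spawned}");
}
```

Because the child is never waited on, its exit status is lost. That is acceptable for a best-effort hint, but it also means a failed reload is invisible to the operator.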
Where the code lives
- crates/setup/src/agent_wizard.rs — submenu + dashboard.
- crates/setup/src/yaml_patch.rs — read_agent_field, upsert_agent_field, remove_agent_field, append_agent_list_item, remove_agent_list_item.
- crates/setup/tests/agent_wizard_yaml.rs — schema-roundtrip tests that re-parse the mutated YAML through nexo_config::AgentsConfig.
Reproducible builds + SBOM
Every Nexo release ships with two artefacts that let an operator verify provenance and exact composition:
- CycloneDX SBOM (sbom-cyclonedx.json) — every cargo dependency at the exact version + hash that was compiled into the binary.
- SPDX SBOM (sbom-spdx.json) — full filesystem scan via syft, captures anything that wasn't a cargo dep (bundled binaries, generated assets, vendored data files).
Both SBOMs are Cosign-signed (*.bundle) using the same keyless
OIDC chain documented in Verifying releases.
Reading the SBOMs
# Pretty-print the CycloneDX dep tree:
jq '.components | map({name, version, purl})' sbom-cyclonedx.json | less
# Find a specific crate:
jq '.components[] | select(.name == "tokio")' sbom-cyclonedx.json
# Audit the cargo deps with `cargo-audit` (reads Cargo.lock from the
# checkout, so no rebuild is needed):
cargo audit --db ~/.cargo/advisory-db --json | \
jq -r '.vulnerabilities.list[].advisory.id'
Reproducible build claim
The release workflow targets a bit-identical binary across two
runs given the same git sha + rust-toolchain.toml + Cargo.lock.
The pipeline pins:
- Rust toolchain: rust-toolchain.toml fixes the channel + components.
- Dependency versions: Cargo.lock is committed and --locked is used by every release build.
- Build environment: GitHub Actions ubuntu-latest runner + cargo build --release with no RUSTFLAGS overrides.
- Build provenance: SLSA Level 2 attestation generated by actions/attest-build-provenance (Phase 27.2 wiring).
- Cosign signature: each binary + SBOM signed via OIDC (Phase 27.3).
Reproducing a release locally
# 1. Check out the exact tag.
git clone https://github.com/lordmacu/nexo-rs && cd nexo-rs
git checkout v0.1.1
# 2. Build with the locked deps.
rustup show # confirms the toolchain matches rust-toolchain.toml
cargo build --release --bin nexo --locked
# 3. Compare your binary's sha256 against the release asset:
sha256sum target/release/nexo
# Expected: same hash listed in `sha256sums.txt` on the GitHub release.
If the hashes don't match: the build is not reproducible on your host. Common reasons:
- Different glibc version → embedded __VERSIONED_SYMBOL strings drift. The release workflow runs on ubuntu-latest (currently Ubuntu 24.04, glibc 2.39); building on Debian 12 (glibc 2.36) produces different bytes.
- Different LLVM in your local rustc build (rare, mostly affects Mac users compiling with stable + nightly side-by-side).
- Local ~/.cargo/config.toml injecting RUSTFLAGS.
- Building with the dev profile instead of the release profile.
For a guaranteed bit-identical reproduction, build inside the same container the workflow uses:
docker run --rm -v $(pwd):/src -w /src \
rust:1.80-bookworm \
cargo build --release --bin nexo --locked
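This pins the build to one fixed environment: the image's glibc, toolchain version, and LLVM. Note that rust:1.80-bookworm is Debian 12 based, so its glibc differs from the ubuntu-latest runner described above; the hash matches the release asset only if the release job itself builds inside this image.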
SLSA verification
The workflow attaches an attestation.intoto.jsonl (SLSA Level 2
provenance) per release. Verify with slsa-verifier:
go install github.com/slsa-framework/slsa-verifier/v2/cli/slsa-verifier@latest
slsa-verifier verify-artifact nexo \
--provenance-path attestation.intoto.jsonl \
--source-uri github.com/lordmacu/nexo-rs \
--source-tag v0.1.1
A green verification proves:
- The artefact came from the lordmacu/nexo-rs repo
- It was built by a GitHub-hosted runner (not a fork or local box)
- The build inputs match what's recorded in the provenance
Auditing for known CVEs
The SBOM lets cargo-audit work without rebuilding:
# Convert CycloneDX → cargo-audit's format:
cyclonedx-cli convert --input-format json \
--output-format json sbom-cyclonedx.json | \
jq '...' > deps.json
# Or just feed it to grype (broader scope, multi-format):
grype sbom:./sbom-cyclonedx.json
grype sbom:./sbom-spdx.json
Grype catches CVEs across both Rust crates and any system-level deps captured by syft.
Out of scope (deferred)
- apk / pkg SBOM for the Termux deb — Termux's package metadata doesn't speak SPDX yet. The release SBOMs cover the same artifact contents though.
- Reproducible Docker image layers — the current Dockerfile uses apt-get update && apt-get install which pulls whatever's latest at build time. Pinning to specific Debian package versions is a follow-up (Phase 34 hardening).
MCP server from Claude Desktop
Expose nexo-rs tools (memory, Gmail, WhatsApp send, browser, etc.) to the Anthropic desktop app so your agent-sandboxed capabilities show up inside Claude conversations.
Same technique works for Cursor, Zed, and anything else that speaks MCP — the config shape is identical.
Prerequisites
- Built agent binary at a known path (e.g. /usr/local/bin/agent)
- A working config/ directory (reuse the one your daemon normally uses, or point at a dedicated one)
- Anthropic API key (or OAuth bundle) configured for the agent
1. Enable the MCP server
config/mcp_server.yaml:
enabled: true
name: nexo
allowlist:
- memory_* # recall + store + history
- forge_memory_checkpoint
- google_* # if you paired Google OAuth
- browser_* # if you want Claude to drive Chrome
expose_proxies: false # hide ext_* and mcp_* from the IDE
auth_token_env: "" # leave empty for local spawn; set if tunneling
Pick the smallest allowlist that covers what you want the IDE to do. Each glob is power you're handing the IDE user.
2. Wire Claude Desktop
Edit ~/Library/Application Support/Claude/claude_desktop_config.json
(macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):
{
"mcpServers": {
"nexo": {
"command": "/usr/local/bin/agent",
"args": ["mcp-server", "--config", "/srv/nexo-rs/config"],
"env": {
"RUST_LOG": "info",
"AGENT_LOG_FORMAT": "json"
}
}
}
}
Restart Claude Desktop. The nexo block should appear in the tool
picker; pick tools from it the same way you pick built-ins.
3. Verify
Ask Claude: "use the nexo tool my_stats and show me the output."
If it works, Claude calls agent mcp-server as a subprocess, which
emits JSON-RPC over stdin/stdout. Logs hit Claude's app-level log
file plus stderr of the spawned agent (configurable via
AGENT_LOG_FORMAT=json).
Wire shape
sequenceDiagram
participant CD as Claude Desktop
participant A as agent mcp-server
participant TR as ToolRegistry
participant MEM as Memory tool
participant LTM as SQLite
CD->>A: spawn subprocess
CD->>A: initialize
A-->>CD: {capabilities: {tools}}
CD->>A: notifications/initialized
CD->>A: tools/list
A->>TR: enumerate (allowlist-filtered)
TR-->>A: tool defs
A-->>CD: [memory_recall, memory_store, …]
CD->>A: tools/call {name: memory_recall, args: {query: "..."}}
A->>MEM: invoke
MEM->>LTM: SELECT ...
LTM-->>MEM: rows
MEM-->>A: result
A-->>CD: content
Recipes within the recipe
Recall my cross-session memory from Claude
Allowlist:
allowlist:
- memory_recall
- memory_history
Now inside a Claude conversation: "recall what I told you about
Luis's address last week." Claude calls memory_recall on your
agent's SQLite — Claude itself has no persistent memory; this is how
you give it one.
Post to WhatsApp from Claude
Allowlist:
allowlist:
- whatsapp_send_message
⚠ Be careful. This gives whoever sits at the IDE the ability to send WhatsApp messages from your paired account. Only enable if you trust the IDE user as much as you'd trust the agent.
Read-only Gmail from Claude
Allowlist:
allowlist:
- google_auth_status
- google_call
Pair with GOOGLE_ALLOW_SEND= (unset) to keep the google_call
tool read-only.
Auth token
If you expose the MCP server over a tunnel (not a local spawn), set
auth_token_env to guard the initialize call:
auth_token_env: NEXO_MCP_TOKEN
Then set NEXO_MCP_TOKEN in the agent's env and have the client
send it on initialize. Clients that don't present the token are
rejected.
Gotchas
- expose_proxies: true transitively exposes every upstream MCP server. If the agent already consumes a Gmail MCP server, turning this on lets Claude reach through — usually not what you want.
- Allowlist globs match whole tool names. memory_* is OK; mem* is not — enumerate with agent ext list and real tool names before wiring globs.
- Rate limits still apply. whatsapp_send_message through this path counts against the same WhatsApp rate bucket as the agent's own uses.
Future marketing plugin: multi-client autonomous operation
This recipe shows how the current runtime can operate with a future
marketing plugin without changing core architecture.
Goal
Run multiple marketing clients in the same system while preserving:
- strict client isolation by plugin instance
- per-agent model isolation (no cross-token usage)
- autonomous review loops with operator interrupts
1. Agent template
Start from:
config/agents.d/marketing.multiclient.example.yaml
The template maps one client surface to one agent via strict bindings:
- marketing_acme_intake listens only to plugin.inbound.marketing.acme_inbox
- marketing_bravo_retention listens only to plugin.inbound.marketing.bravo_retention
- marketing_charlie_exec listens only to plugin.inbound.marketing.charlie_exec
Each agent has its own model.provider + model.model.
2. Future plugin event contract
Your future plugin should publish inbound events with this shape:
- topic: plugin.inbound.marketing.<instance>
- Event.session_id: deterministic UUID per conversation thread
- payload fields:
  - text (required for text turns)
  - from (sender/account/contact id)
  - priority (optional: now | next | later)
  - optional metadata (channel, campaign_id, thread_id, etc.)
Minimal example payload:
{
"text": "Customer asked to pause campaign due to legal review.",
"from": "client-ops@acme.com",
"priority": "next"
}
Urgent interrupt payload:
{
"text": "STOP ALL SENDS NOW",
"from": "head-of-marketing@acme.com",
"priority": "now"
}
3. Practical flow with current runtime logic
- The plugin publishes on plugin.inbound.marketing.acme_inbox.
- Runtime matches inbound_bindings strictly by (plugin, instance).
- marketing_acme_intake receives the event; other agents do not.
- Turn runs with ACME agent model only (MiniMax-M2.5 in template).
- If the agent chooses Sleep, runtime schedules proactive wake and injects a synthetic <tick> later.
- If priority: now arrives during an in-flight turn, runtime preempts the current turn and handles the urgent message first.
- A BRAVO event on plugin.inbound.marketing.bravo_retention runs on the BRAVO agent model only (claude-haiku-4-5 in template).
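The strict (plugin, instance) match in the flow above can be sketched as a pure function. This is an illustration under stated assumptions, not the runtime's routing code: the Binding struct and the parse_topic and route helpers are hypothetical names, and the topic is assumed to have the shape plugin.inbound.<plugin>.<instance>.

```rust
/// One inbound binding: the (plugin, instance) pair an agent listens to.
struct Binding {
    agent: &'static str,
    plugin: &'static str,
    instance: &'static str,
}

/// Split `plugin.inbound.<plugin>.<instance>` into its two routing keys.
fn parse_topic(topic: &str) -> Option<(&str, &str)> {
    let rest = topic.strip_prefix("plugin.inbound.")?;
    let (plugin, instance) = rest.split_once('.')?;
    if plugin.is_empty() || instance.is_empty() {
        return None;
    }
    Some((plugin, instance))
}

/// Return the agents whose binding matches the topic exactly (no wildcards, no fallback).
fn route(bindings: &[Binding], topic: &str) -> Vec<&'static str> {
    match parse_topic(topic) {
        Some((plugin, instance)) => bindings
            .iter()
            .filter(|b| b.plugin == plugin && b.instance == instance)
            .map(|b| b.agent)
            .collect(),
        None => Vec::new(),
    }
}

fn main() {
    let bindings = [
        Binding { agent: "marketing_acme_intake", plugin: "marketing", instance: "acme_inbox" },
        Binding { agent: "marketing_bravo_retention", plugin: "marketing", instance: "bravo_retention" },
    ];
    println!("{:?}", route(&bindings, "plugin.inbound.marketing.acme_inbox"));
}
```

An event for an instance with no binding routes to nobody, which is the property that keeps one client's traffic away from another client's agent.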
4. Why this is already ready for production hardening
- Queue priority is built-in (now > next > later) with in-flight preemption.
- Proactive loop is built-in (Sleep + wake <tick> + daily budget).
- Session isolation is built-in (per-session debounce task).
- Binding isolation is built-in (strict plugin/instance matching).
- Model isolation is built-in (resolved from the matched agent/binding policy).
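The now > next > later ordering can be illustrated with a standard-library binary heap. This is a sketch only: it models the ordering and FIFO behavior within one priority level, not in-flight preemption, and the Priority, Queued, and InboundQueue types are hypothetical.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

// Derived Ord follows declaration order: Later < Next < Now.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Priority {
    Later,
    Next,
    Now,
}

#[derive(Debug, PartialEq, Eq)]
struct Queued {
    priority: Priority,
    seq: u64, // arrival counter, used to keep FIFO order within one priority
    text: String,
}

impl Ord for Queued {
    fn cmp(&self, other: &Self) -> Ordering {
        // Higher priority wins; within a priority the lower (earlier) seq wins.
        self.priority
            .cmp(&other.priority)
            .then_with(|| other.seq.cmp(&self.seq))
    }
}

impl PartialOrd for Queued {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

struct InboundQueue {
    heap: BinaryHeap<Queued>,
    next_seq: u64,
}

impl InboundQueue {
    fn new() -> Self {
        Self { heap: BinaryHeap::new(), next_seq: 0 }
    }

    fn push(&mut self, priority: Priority, text: &str) {
        self.heap.push(Queued { priority, seq: self.next_seq, text: text.to_string() });
        self.next_seq += 1;
    }

    fn pop(&mut self) -> Option<String> {
        self.heap.pop().map(|q| q.text)
    }
}

fn main() {
    let mut q = InboundQueue::new();
    q.push(Priority::Later, "weekly report");
    q.push(Priority::Next, "pause campaign");
    q.push(Priority::Now, "STOP ALL SENDS NOW");
    while let Some(text) = q.pop() {
        println!("{text}");
    }
}
```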
5. Rollout checklist when plugin is built
- Emit deterministic session_id per thread.
- Publish to instance-scoped inbound topics (plugin.inbound.marketing.<instance>).
- Send priority: now only for real interrupts.
- Keep one agent per client surface when you need strict token/cost isolation.
- Narrow allowed_tools once the marketing tool surface is finalized.
Architecture Decision Records
Short documents capturing why the architecture is the way it is. Each ADR names an alternative that was considered and rejected, and the forces that drove the choice. Read these when you're tempted to change something load-bearing.
Format loosely follows Michael Nygard's ADR template: context, decision, consequences.
Index
| # | Title | Status |
|---|---|---|
| 0001 | Single-process runtime over microservices | Accepted |
| 0002 | NATS as the broker | Accepted |
| 0003 | sqlite-vec for vector search | Accepted |
| 0004 | Per-agent tool sandboxing at registry build time | Accepted |
| 0005 | Drop-in agents.d/ directory for private configs | Accepted |
| 0006 | Per-agent git repo for memory forensics | Accepted |
| 0007 | WhatsApp via whatsapp-rs (Signal Protocol) | Accepted |
| 0008 | MCP dual role — client and server | Accepted |
| 0009 | Dual MIT / Apache-2.0 licensing | Accepted |
Writing a new ADR
- Copy the template (next ADR below, or use 0001 as a reference)
- Number sequentially: NNNN-short-slug.md
- Set status: Proposed while in review, flip to Accepted or Rejected after the discussion settles
- Link from this index
- Do not edit accepted ADRs in place. Create a new ADR that supersedes it and mark the old one Superseded by NNNN.
ADRs are load-bearing documentation — they're how future you (and future contributors) learn that "NATS over RabbitMQ was not an accident."
ADR 0001 — Single-process runtime over microservices
Status: Accepted Date: 2026-01
Context
nexo-rs hosts N agents, each with its own LLM client, channel plugins, memory views, and extensions. The natural first instinct for Rust systems targeting real uptime is to split this into microservices: an agent service, a plugin service per channel, a memory service, etc., wired over the broker.
Every microservice adds:
- A serialization boundary (more CPU, more latency)
- A deployment artifact (more Dockerfiles, more CI)
- A failure mode (service down vs process down)
- An ops surface (metrics, health, logs per service)
The alternative — one binary hosting every subsystem as tokio tasks — gives up none of the durability (the disk queue + DLQ survive a process restart anyway) and keeps all in-memory caches naturally shared.
Decision
Ship one binary (agent) that hosts:
- Every agent runtime (one tokio task per agent)
- Every channel plugin (WhatsApp, Telegram, browser, …)
- Broker client + disk queue + DLQ
- Memory (short-term in-mem, long-term SQLite, vector sqlite-vec)
- Extension runtimes (stdio / NATS)
- MCP client and server
- TaskFlow runtime
- Metrics + health + admin HTTP servers
Coordination between tasks happens over the broker (NATS or the local mpsc fallback) exactly as if they were separate processes. Swapping to microservices later requires zero code changes on either side of the bus.
Consequences
Positive
- One Dockerfile, one health probe, one metrics endpoint
- No IPC overhead on hot paths (LLM tool calls go ToolRegistry → Extension through a tokio channel, not a network hop)
- Memory caches (session, tool registry) are naturally shared
- Simpler ops: one log stream, one trace span hierarchy
Negative
- A bug that panics the process takes down every agent at once (the single-instance lockfile mitigates the blast radius by preventing silent double-boot)
- Scaling out means running more agent processes pointed at the same NATS — isolation between them requires deliberate NATS subject partitioning
Escape hatch
If a subsystem needs its own lifecycle (example: a GPU-heavy inference service), ship it as a NATS extension — it's automatically out-of-process and auto-discovered by the agent. Microservices by the back door, without splitting the monolith first.
ADR 0002 — NATS as the broker
Status: Accepted Date: 2026-01
Context
The event bus sits under every inter-plugin and inter-agent communication. Requirements:
- Subject-based routing with wildcards (plugin.inbound.*, agent.route.<id>)
- Low-latency pub/sub (sub-millisecond on LAN)
- No broker-side state to manage unless we opt in
- Clustered production deployments
- Mature async Rust client
Alternatives considered:
- RabbitMQ — heavier, queue-per-binding mental model fits less well for fan-out across plugin instances, ops overhead higher
- Redis streams / pub-sub — streams are great for durable event logs but the stream-per-subject model clashes with free-form plugin.outbound.<channel>.<instance> naming; pub-sub has no durability
- Kafka — overkill for sub-millisecond request/reply loops, heavy ops, partition count becomes a thing you think about
- Custom over TCP — too much invented complexity
Additional implementation note: a crate literally called natsio
came up in early design research; it does not exist on crates.io.
The real Rust client is async-nats (from the NATS org itself),
matching the NATS 2.10 server line.
Decision
Use NATS as the broker. Specifically:
- Client: async-nats = "0.35" (pinned in Cargo.toml)
- Subject namespace: plugin.inbound.*, plugin.outbound.*, plugin.health.*, agent.events.*, agent.route.*
- Fallback: a local tokio::mpsc bus implementing the same Broker trait for offline / single-machine runs
- Durability: SQLite disk queue in front of every publish; drains FIFO on reconnect; 3 attempts before DLQ
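The durability rule (FIFO drain, 3 attempts, then DLQ) can be sketched with an in-memory stand-in for the SQLite disk queue. The OutboundQueue type and its methods are hypothetical, and stopping the drain at the first failure is an assumption made here to preserve ordering; the real queue may behave differently.

```rust
use std::collections::VecDeque;

const MAX_ATTEMPTS: u32 = 3; // "3 attempts before DLQ"

struct Pending {
    subject: String,
    payload: String,
    attempts: u32,
}

#[derive(Default)]
struct OutboundQueue {
    pending: VecDeque<Pending>,
    dead_letters: Vec<Pending>,
}

impl OutboundQueue {
    fn enqueue(&mut self, subject: &str, payload: &str) {
        self.pending.push_back(Pending {
            subject: subject.to_string(),
            payload: payload.to_string(),
            attempts: 0,
        });
    }

    /// Drain in FIFO order; `publish` returns true on success.
    /// Returns how many messages were delivered in this pass.
    fn drain(&mut self, mut publish: impl FnMut(&str, &str) -> bool) -> usize {
        let mut delivered = 0;
        while let Some(mut msg) = self.pending.pop_front() {
            if publish(&msg.subject, &msg.payload) {
                delivered += 1;
                continue;
            }
            msg.attempts += 1;
            if msg.attempts >= MAX_ATTEMPTS {
                self.dead_letters.push(msg); // give up and park the message in the DLQ
            } else {
                self.pending.push_front(msg); // keep its FIFO position for the next pass
                break; // assumption: stop at the first failure to preserve ordering
            }
        }
        delivered
    }
}

fn main() {
    let mut q = OutboundQueue::default();
    q.enqueue("plugin.outbound.telegram.default", "hello");
    let delivered = q.drain(|subject, payload| {
        println!("publish {subject}: {payload}");
        true
    });
    println!("delivered {delivered}, dead letters {}", q.dead_letters.len());
}
```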
Consequences
Positive
- Standard ops path (monitor on :8222/healthz, prometheus exporter, clustering via well-known recipes)
- Pub/sub semantics are trivial to reason about
- Swapping in JetStream later for persistent streams is additive
- Zero broker state in the happy path — restart NATS without catastrophe thanks to the disk queue
Negative
- NATS auth (NKey / JWT) has its own learning curve — see the NATS TLS + auth recipe
- No built-in message ordering guarantee across subjects (only per-subscriber). Callers that need ordering (e.g. delegation with correlation id) must enforce it themselves
Forbidden anti-pattern
- Do not use natsio or any other non-async-nats client. The crate doesn't exist on crates.io; copy-paste from older design docs will mislead.
ADR 0003 — sqlite-vec for vector search
Status: Accepted Date: 2026-02
Context
Agents benefit from semantic recall — surface a memory whose text doesn't share keywords with the query but shares meaning. The usual playbook: run a dedicated vector database.
Requirements:
- Zero extra infrastructure for single-machine deployments
- Same durability and transactional model as the rest of memory
- Embedding-dimension sanity checks at startup
- Hybrid retrieval (keyword ⊔ vector) without a separate query plane
Alternatives considered:
- Qdrant / Weaviate / Milvus — all excellent; all require an extra service, network hop, and ops surface
- pgvector — would force Postgres everywhere, abandoning SQLite for long-term memory
- Simple numpy file + linear scan — works for small datasets, falls over past ~10k memories per agent
Decision
Use sqlite-vec: a SQLite extension that adds a vec0 virtual
table in the same DB file as long-term memory.
- One SQLite file holds memories, memories_fts, and vec_memories — a single JOIN returns content + tags alongside similarity
- Dimension is checked at schema init; mismatch between config and existing rows aborts startup with an explicit message
- sqlite3_auto_extension registers once per process
- Hybrid retrieval uses Reciprocal Rank Fusion (K=60) over the keyword FTS5 hits and the vector neighbors
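Reciprocal Rank Fusion scores each item as the sum, over every ranked list it appears in, of 1 / (K + rank), with 1-based ranks. A minimal sketch with K = 60 follows; the function name rrf and the use of plain u64 memory ids are assumptions for illustration, not the project's API.

```rust
use std::collections::HashMap;

const K: f64 = 60.0;

/// Fuse ranked id lists (each list is best-first) into one list sorted by RRF score.
fn rrf(lists: &[&[u64]]) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for list in lists {
        for (i, id) in list.iter().enumerate() {
            let rank = (i + 1) as f64; // ranks are 1-based
            *scores.entry(*id).or_insert(0.0) += 1.0 / (K + rank);
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    // Highest score first; ties broken by id so the output is deterministic.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then(a.0.cmp(&b.0)));
    fused
}

fn main() {
    let keyword_hits: &[u64] = &[11, 42, 7]; // e.g. FTS5 results, best first
    let vector_hits: &[u64] = &[42, 99, 11]; // e.g. nearest neighbors, best first
    for (id, score) in rrf(&[keyword_hits, vector_hits]) {
        println!("memory {id}: {score:.5}");
    }
}
```

Only ranks enter the formula, so the keyword and vector scores never need to be put on a common scale. Items that appear in both lists accumulate two terms and rise above items that appear in only one.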
Consequences
Positive
- Zero-infra single-machine deploys keep working — no extra service to run
- Backups, replication, export are all just "copy the .db file"
- Transactional writes: INSERT into memories + vec_memories in one transaction; no dual-write races
- Hybrid retrieval is easy (see vector docs)
Negative
- sqlite-vec is newer than Qdrant; its indexing algorithm improves over time. Large indexes may need re-sorting periodically
- Changing embedding models (even same-dimension ones) produces a stale index — the ADR doesn't solve this, users must reindex
- The sqlite3_auto_extension registration happens once per process and has caught test suites that spawn many short-lived connections off-guard
Swap-out path
EmbeddingProvider is a trait and the recall_mode = vector branch
is a single code path. Replacing sqlite-vec with Qdrant is a
day's work, not a rewrite.
ADR 0004 — Per-agent tool sandboxing at registry build time
Status: Accepted Date: 2026-02
Context
The same process hosts agents with very different blast radii.
Ana runs on WhatsApp against leads; Kate manages a personal Telegram;
ops has Proxmox credentials. The LLM in one agent must never see —
let alone invoke — tools registered for another agent.
Three enforcement points are possible:
- Prompt-level sandboxing — "don't use these tools." Relies on model compliance. Fails under adversarial prompts.
- Runtime filter — every tools/call checks a policy before dispatch. Robust, but the LLM still sees the tools in tools/list and can hallucinate calls.
- Registry build-time pruning — the agent's ToolRegistry is built with only the allowed tools. The LLM literally cannot see the others.
Decision
Default to registry build-time pruning.
- allowed_tools: [] (empty) = every registered tool visible
- allowed_tools: [glob, …] = strict allowlist, tools not matching are removed from the registry before the LLM's tools/list call is answered
- For agents with inbound_bindings[], the base registry keeps every tool and per-binding overrides apply build-time filtering at turn time — a single agent can narrow its surface differently per channel
Additional layers stack on top:
- outbound_allowlist.<channel>: [recipients] — even with whatsapp_send_message in the registry, the runtime rejects sends to unlisted recipients (defense in depth)
- tool_rate_limits — per-tool rate limiting for side-effectful tools
- Per-agent workspace and long-term memory (WHERE agent_id = ?) — data-level isolation
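Build-time pruning can be sketched as a filter that runs once, before any tool list is handed to the model. The glob semantics here (a * matches any run of characters, everything else is literal) are an assumption for illustration, and glob_match and prune are hypothetical names rather than the ToolRegistry API.

```rust
/// Minimal glob: `*` matches any run of characters (including none); all else is literal.
fn glob_match(pattern: &str, name: &str) -> bool {
    let parts: Vec<&str> = pattern.split('*').collect();
    if parts.len() == 1 {
        return pattern == name; // no wildcard: exact match only
    }
    let (first, last) = (parts[0], parts[parts.len() - 1]);
    if !name.starts_with(first) {
        return false;
    }
    let mut rest = &name[first.len()..];
    // Each middle piece must appear, in order, in what is left of the name.
    for mid in &parts[1..parts.len() - 1] {
        match rest.find(mid) {
            Some(i) => rest = &rest[i + mid.len()..],
            None => return false,
        }
    }
    rest.ends_with(last)
}

/// Keep only the tools the allowlist admits; an empty allowlist keeps everything.
fn prune<'a>(registry: &[&'a str], allowed_tools: &[&str]) -> Vec<&'a str> {
    if allowed_tools.is_empty() {
        return registry.to_vec();
    }
    registry
        .iter()
        .copied()
        .filter(|name| allowed_tools.iter().any(|p| glob_match(p, name)))
        .collect()
}

fn main() {
    let registry = ["memory_recall", "memory_store", "whatsapp_send_message", "google_call"];
    let visible = prune(&registry, &["memory_*", "google_call"]);
    println!("{visible:?}"); // whatsapp_send_message never reaches the model's tool list
}
```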
Consequences
Positive
- Adversarial prompts can't invoke missing tools — the model has no token string for them
- Easy mental model: grep allowed_tools to see what an agent can do
- Prompt tokens stay small (tool list scales with allowlist, not registry)
Negative
- A misconfigured allowed_tools silently hides tools the LLM expected to use — the agent returns "I can't do that," puzzling both user and developer. Mitigation: agent status shows the effective tool set per agent
- Dynamic granting mid-session is not supported (would require re-handshake with the MCP clients)
Related
- Config — agents.yaml (allowed_tools semantics)
- Per-agent credentials — the gauntlet validates that the binding's channel instance is actually allowed
ADR 0005 — Drop-in agents.d/ directory for private configs
Status: Accepted Date: 2026-02
Context
Two kinds of agent content coexist in the same project:
- Public — the framework demo agents, ops helpers, templates
- Private — sales prompts, tarifarios, internal phone numbers, compliance-flagged customer scripts
The obvious "one agents.yaml" approach forces everything to be
either committed (leaking business content) or gitignored (losing
the template reference). Neither is acceptable.
Decision
Split by path convention:
- config/agents.yaml — committed, public-safe defaults
- config/agents.d/*.yaml — gitignored drop-in directory
- config/agents.d/*.example.yaml — committed templates
- Merge happens at load time: every .yaml in agents.d/ gets its agents: array concatenated to the base list
- Files load in lexicographic filename order, so 00-common.yaml then 10-prod.yaml composes predictably
- .gitignore includes:
  config/agents.d/*.yaml
  !config/agents.d/*.example.yaml
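The load-time merge can be sketched as a pure function over already-parsed files. This is an illustration, not the loader itself: DropIn, add, and merge_agents are hypothetical names, and agents are reduced to their ids. The duplicate check mirrors the startup rejection of colliding agent ids.

```rust
use std::collections::HashSet;

/// (file name, agent ids declared in that file); stands in for parsed YAML.
type DropIn = (String, Vec<String>);

fn add(seen: &mut HashSet<String>, merged: &mut Vec<String>, file: &str, id: &str) -> Result<(), String> {
    if !seen.insert(id.to_string()) {
        return Err(format!("duplicate agent id '{id}' in {file}"));
    }
    merged.push(id.to_string());
    Ok(())
}

/// Concatenate the base list and every drop-in, in lexicographic filename order.
fn merge_agents(base: &[String], mut drop_ins: Vec<DropIn>) -> Result<Vec<String>, String> {
    drop_ins.sort_by(|a, b| a.0.cmp(&b.0));
    let mut seen = HashSet::new();
    let mut merged = Vec::new();
    for id in base {
        add(&mut seen, &mut merged, "agents.yaml", id)?;
    }
    for (file, ids) in &drop_ins {
        for id in ids {
            add(&mut seen, &mut merged, file, id)?;
        }
    }
    Ok(merged)
}

fn main() {
    let base = vec!["demo".to_string()];
    let drop_ins = vec![
        ("10-prod.yaml".to_string(), vec!["ana".to_string()]),
        ("00-common.yaml".to_string(), vec!["kate".to_string()]),
    ];
    println!("{:?}", merge_agents(&base, drop_ins));
}
```

Sorting by filename before concatenating is what makes the numeric prefixes meaningful: the directory listing order returned by the filesystem is not guaranteed, so the sort has to be explicit.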
Consequences
Positive
- Safe to open-source the repo; real business content stays private
- Templates stay in git (ana.example.yaml) so newcomers can copy and fill
- Per-environment layering falls out for free (00-dev.yaml vs 10-prod.yaml per deploy)
Negative
- Agent-id collisions across files are possible — the loader rejects them at startup with an explicit error. Operators must coordinate file naming
- Not every config is split this way — some operators expected plugins.d/, llm.d/, etc. We decided against the generalization until a concrete need appeared
Related
- Config — drop-in agents — full mechanics
- Recipes — WhatsApp sales agent — shows the pattern in practice
ADR 0006 — Per-agent git repo for memory forensics
Status: Accepted Date: 2026-03
Context
An agent's memory evolves over time — dream sweeps promote memories, the agent writes USER.md / AGENTS.md / SOUL.md revisions, session closes append to MEMORY.md. When an agent misbehaves, "what did it know and when?" is a real debugging question.
Options considered:
- Append-only audit log per write — possible, but rolls out a custom scheme for every file
- DB-level revision history — works for LTM rows but not for workspace markdown files
- Git — battle-tested, standard tooling, git log and git blame ship with every developer's laptop
Decision
When workspace_git.enabled: true, the agent's workspace
directory is a per-agent git repository. The runtime commits at
three specific moments:
- Dream sweep finishes — commit subject promote, body lists promoted memories with scores
- Session close — commit subject session-close, body includes session id and agent id
- Explicit forge_memory_checkpoint(note) tool call — commit subject checkpoint: {note}
Commit mechanics:
- Staged: every non-ignored file (respects auto-generated .gitignore that excludes transcripts/, media/, *.tmp)
- Skipped: files larger than 1 MiB (MAX_COMMIT_FILE_BYTES)
- Idempotent: no-op commit if tree clean
- Author: {agent_id} <agent@localhost> (configurable)
- No remote by default — operators add one if archival matters
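The staging rules can be condensed into one predicate. This is a sketch under assumptions: should_stage is a hypothetical helper, the ignore rules are hard-coded rather than read from the generated .gitignore, and a file of exactly 1 MiB is treated as committable because the rule says "larger than".

```rust
use std::path::Path;

const MAX_COMMIT_FILE_BYTES: u64 = 1024 * 1024; // 1 MiB

/// Decide whether a workspace file (path relative to the repo root) gets staged.
fn should_stage(rel_path: &Path, size_bytes: u64) -> bool {
    // Look at directory components only, so a file merely named "media" is not excluded.
    let mut dirs: Vec<_> = rel_path.components().collect();
    dirs.pop(); // drop the file name itself
    let in_ignored_dir = dirs.iter().any(|c| {
        let dir = c.as_os_str();
        dir == "transcripts" || dir == "media"
    });
    let is_tmp = rel_path.extension().map_or(false, |ext| ext == "tmp");
    !in_ignored_dir && !is_tmp && size_bytes <= MAX_COMMIT_FILE_BYTES
}

fn main() {
    let candidates = [
        ("MEMORY.md", 2_048u64),
        ("transcripts/session-1.jsonl", 512),
        ("notes/draft.tmp", 64),
        ("exports/dump.bin", 5 * 1024 * 1024),
    ];
    for (path, size) in candidates {
        println!("{path}: staged = {}", should_stage(Path::new(path), size));
    }
}
```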
Consequences
Positive
- git log gives you a timestamped history of every memory evolution, for free
- memory_history tool lets the LLM reason about its own past state — e.g. "what did I believe about this user last week?"
- git diff <oldest>..HEAD is one command away when debugging
- Familiar tooling for humans (git bisect a misbehaving agent)
Negative
- Repositories grow over time; operators should add a remote with periodic push-and-repack
- Commits are process-scoped — an agent process crash between "write MEMORY.md" and "commit" leaves an uncommitted diff. The next commit picks it up, but at that point the audit event is merged
- Transcripts are intentionally excluded from commits — they can be enormous and aren't the forensic artifact the ADR is aimed at
Related
- Soul — MEMORY.md + workspace-git
- Agent runtime — Graceful shutdown (session-close commit runs here)
ADR 0007 — WhatsApp via whatsapp-rs (Signal Protocol)
Status: Accepted Date: 2026-02
Context
"Add WhatsApp support" has three common paths:
- Official WhatsApp Business API — rate-limited, costs per message, requires business verification, limits proactive outreach to approved templates. Fine for some deployments, a bad fit for "run an agent on your personal number for a small business."
- Unofficial web-scraping libraries (e.g. whatsapp-web.js) — pretend to be a browser, fragile against UI changes, frequently banned
- Signal Protocol reimplementation — speak the native protocol that the WhatsApp mobile app speaks. Stable, fast, no scraping, permits all message types (voice, media, reactions, edits, etc.)
Decision
Use whatsapp-rs (Cristian's crate) which implements the Signal
Protocol handshake + pairing + message layer in Rust. nexo-rs wraps
it in crates/plugins/whatsapp:
- Pairing: setup-time QR scan via Client::new_in_dir() — the wizard creates a per-agent session dir and renders the QR as Unicode blocks
- Runtime: the plugin subscribes to inbound messages, forwards to plugin.inbound.whatsapp[.<instance>], handles the outbound side via the tool family (whatsapp_send_message, whatsapp_send_reply, whatsapp_send_reaction, whatsapp_send_media)
- Credentials expiry: the plugin does not fall back to a runtime QR on 401 — the operator must re-pair via the wizard. The runtime refuses to boot without valid creds. This is a deliberate safety net against silent re-pair loops that would cross-deliver to the wrong account
- Multi-account: each agent points at its own session dir. No XDG_DATA_HOME mutation
Consequences
Positive
- Full feature coverage (voice, media, reactions, edits, groups)
- No per-message cost beyond the bandwidth
- No business-verification paperwork
- Works on a personal number, a secondary SIM, anything you can pair to WhatsApp's Linked Devices
Negative
- Signal Protocol parity is non-trivial; keeping up with WhatsApp protocol evolution is an ongoing commitment of whatsapp-rs
- Running an agent on a personal number is a policy choice. WhatsApp's Terms of Service don't love automated accounts; use whatsapp-rs on numbers you own and are ready to re-pair if they get banned
- Multi-account needs careful session-dir management — see Plugins — WhatsApp gotchas
Forbidden alternatives
- Puppeteer / whatsapp-web.js / selenium — pulls the entire Chromium runtime into the process, breaks constantly, and is detected and banned faster than the Signal Protocol path
- Business API — only if the deployment pays for it and the agent flow survives template constraints; ship a separate plugin if this comes up
Related
- ../whatsapp-rs/ sibling crate (Signal Protocol + pairing + Client)
- Plugins — WhatsApp
- Recipes — WhatsApp sales agent
ADR 0008 — MCP dual role: client and server
Status: Accepted Date: 2026-03
Context
Model Context Protocol is becoming the de facto integration surface for LLM-driven tools. Two questions arose during the Phase 12 design:
- Should the agent be an MCP client (consume external MCP servers as tools)?
- Should the agent be an MCP server (expose its own tools to external MCP clients like Claude Desktop, Cursor, Zed)?
These are independent decisions. Picking one does not force the other.
Decision
Do both. Same process, same ToolRegistry, different transports.
- Client: McpRuntimeManager spawns stdio or HTTP MCP servers per session (with a shared "sentinel session" for servers that don't need per-session isolation). Their tools register into the per-session ToolRegistry with names like {server_name}_{tool_name} and are callable by the agent like any built-in
- Server: the agent mcp-server subcommand reads JSON-RPC from stdin and writes responses to stdout. An mcp_server.yaml allowlist controls which tools are exposed. A configurable auth_token_env guards the initialize call when the server is exposed through a tunnel
Both sides speak MCP 2024-11-05 (streamable HTTP) with SSE fallback for legacy servers.
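The two naming rules in this decision can be sketched in a few lines: client-side tools are namespaced as `{server_name}_{tool_name}`, and the server side exposes only what an allowlist names. Both helpers below, and the tool names `github_create_issue` and `memory_recall`, are hypothetical; the real ToolRegistry and the mcp_server.yaml schema are not shown in this ADR.

```rust
use std::collections::HashSet;

/// Hypothetical helper: namespaces an upstream MCP tool as
/// `{server_name}_{tool_name}` before it is registered.
fn namespaced_tool_name(server_name: &str, tool_name: &str) -> String {
    format!("{}_{}", server_name, tool_name)
}

/// Hypothetical helper: keeps only the registered tools named in the
/// allowlist, preserving registration order.
fn exposed_tools<'a>(registered: &[&'a str], allowlist: &HashSet<&str>) -> Vec<&'a str> {
    registered
        .iter()
        .copied()
        .filter(|tool| allowlist.contains(*tool))
        .collect()
}

fn main() {
    let upstream = namespaced_tool_name("github", "create_issue");
    let registered = [upstream.as_str(), "memory_recall"];
    // Only allowlisted tools are visible to an external MCP client.
    let allowlist: HashSet<&str> = ["memory_recall"].into_iter().collect();
    println!("{:?}", exposed_tools(&registered, &allowlist));
}
```

An allowlist that defaults to exposing nothing matches the cautious default described for expose_proxies; whether the real server treats an empty allowlist that way is an assumption here.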
Consequences
Positive
- Being a client: any MCP-speaking tool ecosystem is reachable without writing a custom extension
- Being a server: the agent's tools + memory become available inside Claude Desktop / Cursor / Zed — cross-session memory, remote actions, etc.
- Interop with the broader MCP catalog is a configuration change, not a code change
Negative
- Two independent code paths to keep current as the MCP spec evolves
- expose_proxies configuration gotcha: enabling it on the server side makes every upstream MCP server transitively visible to the consuming client. Default is false and the docs call this out explicitly
- MCP spec churn (2024-11-05 vs future versions) needs staying power
Related
ADR 0009 — Dual MIT / Apache-2.0 licensing
Status: Accepted Date: 2026-04
Context
Open-sourcing nexo-rs required picking a license. Constraints:
- The Rust ecosystem convention (rustc, tokio, serde, clap, axum…) is dual MIT / Apache-2.0
- Downstream projects should be able to pick whichever license fits their own project's obligations
- Attribution to the original author must be legally enforceable — the author explicitly asked that users "use it, just name me"
- The author doesn't want to ship a custom / restrictive license that confuses or scares off contributors
Alternatives considered:
- MIT alone — fine, but missing the explicit patent grant that Apache-2 gives (relevant to corporate downstream users)
- Apache-2 alone — fine, but incompatible with GPLv2 downstream (MIT is compatible)
- AGPL-3 — forces source-release on SaaS; nexo-rs isn't trying to prevent cloud forks
- BSL (Business Source License) — source-available with time-delayed open-source conversion; inappropriate for a framework whose value is in wide adoption
- Custom "use it, name me" — would need a lawyer for every edge case; a solved problem doesn't need a new solution
Decision
Dual-license under MIT OR Apache-2.0:
- LICENSE-MIT: full text of the MIT License, 2026 Cristian García
- LICENSE-APACHE: full text of the Apache-2.0 License
- Cargo.toml: license = "MIT OR Apache-2.0" (SPDX)
- NOTICE file at repo root (required to be preserved by Apache-2.0 §4(d)) carries the attribution: author, contact, original repo URL
- README links all three + explains the SPDX choice
Downstream users pick whichever they prefer. Attribution is mandatory under both.
Consequences
Positive
- Fits existing Rust ecosystem tooling (crates.io, rustdoc headers, CI scanners)
- Maximum compatibility: GPLv2 projects pick MIT, patent-sensitive corporate projects pick Apache-2
- NOTICE file gives the author the strongest attribution lever available in permissive OSS: removing it is a license violation
Negative
- Contributors who want to submit PRs agree (per Apache-2 §5) that their contributions are dual-licensed under the same terms. Some contributors may require a CLA discussion; none so far
- Trademark on the name "nexo-rs" is not covered — this ADR is about the code, not the brand. If the brand becomes load-bearing, register a trademark separately
Related
Contributing
PRs welcome. A few ground rules keep the codebase coherent.
Workflow
All feature work follows the /forge pipeline:
/forge brainstorm <topic> → /forge spec <topic> → /forge plan <topic> → /forge ejecutar <topic>
Per-sub-phase done criteria live in
PHASES.md.
Rules of the road
- All code, code comments, and Markdown docs in English.
- No hardcoded secrets. Use ${ENV_VAR} or ${file:...} in YAML.
- Every external call goes through CircuitBreaker. No exceptions.
- Don't commit anything under secrets/.
- Don't skip hooks (--no-verify). Fix the underlying lint / test issue instead.
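To make the `${ENV_VAR}` and `${file:...}` rule concrete, here is a minimal sketch of how a whole YAML scalar could be classified and resolved. It is not the nexo-config implementation: the `Placeholder` enum, `classify`, and `resolve` are hypothetical names, and the sketch assumes a placeholder spans the entire value and that trailing whitespace in a secret file is not significant.

```rust
use std::{env, fs, io};

/// A config value after classifying its placeholder syntax (hypothetical type).
#[derive(Debug, PartialEq)]
enum Placeholder<'a> {
    Env(&'a str),
    File(&'a str),
    Literal(&'a str),
}

/// Classifies a whole scalar: `${file:PATH}`, `${NAME}`, or a plain literal.
fn classify(raw: &str) -> Placeholder<'_> {
    match raw.strip_prefix("${").and_then(|s| s.strip_suffix('}')) {
        Some(inner) => match inner.strip_prefix("file:") {
            Some(path) => Placeholder::File(path),
            None => Placeholder::Env(inner),
        },
        None => Placeholder::Literal(raw),
    }
}

/// Resolves a value, reading the environment or the file only when needed.
fn resolve(raw: &str) -> io::Result<String> {
    match classify(raw) {
        // A missing variable is an error rather than an empty secret.
        Placeholder::Env(name) => env::var(name)
            .map_err(|e| io::Error::new(io::ErrorKind::NotFound, format!("{name}: {e}"))),
        // Trailing newlines are trimmed (an assumption of this sketch).
        Placeholder::File(path) => Ok(fs::read_to_string(path)?.trim_end().to_string()),
        Placeholder::Literal(value) => Ok(value.to_string()),
    }
}

fn main() {
    println!("{:?}", classify("${API_KEY}"));
    println!("{:?}", classify("${file:./secrets/token.txt}"));
    println!("{:?}", resolve("plain-value"));
}
```

Failing loudly on a missing variable keeps a typo in a placeholder from silently turning into an empty credential.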
Docs must follow
Any change that touches user-visible behavior — features, config
fields, CLI flags, tool surfaces, retry policies — must update the
mdBook under docs/ in the same commit. Docs phase plan:
docs/PHASES.md.
All mdBook pages must be written in English.
Pure-internal changes (private renames, refactors, test-only) are exempt — mention that explicitly in the commit body.
Local checks
cargo fmt --all
cargo clippy --workspace --all-targets -- -D warnings
cargo test --workspace
./scripts/check_mdbook_english.sh
./scripts/check_markdown_english.sh
mdbook build docs
CI runs all of the above on every push and every PR.
Git pre-commit hook
The repo ships a pre-commit hook at .githooks/pre-commit that:
- Docs-sync gate: rejects the commit if production files under crates/, src/, config/, extensions/, scripts/, .github/, or Cargo.{toml,lock} are staged without anything under docs/.
- cargo fmt --all -- --check
- cargo clippy --workspace -- -D warnings
- cargo test --workspace --quiet
Enable it once per clone:
git config core.hooksPath .githooks
(./scripts/bootstrap.sh does this for you.)
Bypass tags
The docs-sync gate honors a single opt-out tag. Include it in the commit message when the change is genuinely internal and doesn't need docs:
refactor: rename private fn [no-docs]
Acceptable reasons:
- Private refactor, no change to any public API
- Test-only changes
- Dependency bumps with no behavior change
- CI-config fiddling that doesn't alter ops
Do not use [no-docs] for anything a user would notice. If in
doubt, update the docs — it's the lower-regret path.
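The gate's decision can be summarized as a small predicate. The real hook is the script at .githooks/pre-commit, and how it reads the commit message is not described here, so the Rust function below is only a sketch of the rule, with hypothetical names and sample paths.

```rust
/// Path prefixes treated as production code, taken from the hook description.
const PRODUCTION_PREFIXES: [&str; 6] =
    ["crates/", "src/", "config/", "extensions/", "scripts/", ".github/"];
/// Root-level files that also count as production changes.
const PRODUCTION_FILES: [&str; 2] = ["Cargo.toml", "Cargo.lock"];

fn is_production(path: &str) -> bool {
    PRODUCTION_PREFIXES.iter().any(|prefix| path.starts_with(*prefix))
        || PRODUCTION_FILES.iter().any(|file| *file == path)
}

/// Returns true when the commit should pass the docs-sync gate.
fn docs_sync_ok(staged: &[&str], commit_message: &str) -> bool {
    let touches_production = staged.iter().any(|p| is_production(p));
    let touches_docs = staged.iter().any(|p| p.starts_with("docs/"));
    // Pass if nothing gated is staged, docs are staged too, or the opt-out tag is set.
    !touches_production || touches_docs || commit_message.contains("[no-docs]")
}

fn main() {
    let staged = ["crates/core/src/lib.rs"];
    println!("{}", docs_sync_ok(&staged, "feat: add tool"));
    println!("{}", docs_sync_ok(&staged, "refactor: rename private fn [no-docs]"));
}
```

Note that the predicate only checks that something under docs/ is staged, not that the docs are relevant; that judgment stays with the reviewer.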
Full escape hatch
git commit --no-verify disables all hooks (fmt, clippy, tests,
docs-sync). Last resort, not a habit.
Reporting issues
Open a GitHub issue with:
- nexo-rs version / commit hash
- Rust version (rustc -V)
- OS / arch
- Relevant log lines (redact secrets)
- Minimal reproduction
License of contributions
Contributions are dual-licensed MIT OR Apache-2.0 as described in License.
Releases
Two complementary tools own the release pipeline:
| Tool | Owns |
|---|---|
| release-plz | version bumps, git tags, crates.io publish, per-crate CHANGELOG.md |
| cargo-dist | cross-target binary tarballs, curl \| sh / PowerShell installers, sha256 sidecars |
They run on the same tag (nexo-rs-v<version>) and stay independent
— no overlapping config. Phase 27 brings both online; Phase 27.2
wires the GitHub Actions workflow that combines them on tag push.
What ships
The nexo binary is the only artifact in release tarballs. Every
other binary in the workspace (driver subsystem, dispatch tools,
companion-tui, mock MCP server) carries
[package.metadata.dist] dist = false so cargo-dist excludes it.
Dev / smoke programs (browser-test, integration-browser-check,
llm_smoke) live as [[example]] entries under examples/ for
the same reason.
Build provenance — nexo version
build.rs injects four stamps captured at compile time:
- NEXO_BUILD_GIT_SHA: short git SHA of the build commit (or unknown outside a git checkout)
- NEXO_BUILD_TARGET_TRIPLE: full Rust target triple
- NEXO_BUILD_CHANNEL: opaque channel marker; defaults to source. The release workflow overrides via NEXO_BUILD_CHANNEL=apt-musl (etc.) so support tickets carry install-channel provenance.
- NEXO_BUILD_TIMESTAMP: UTC ISO8601 timestamp of the build
Operators see them with:
nexo version
# nexo 0.1.1
# git-sha: abc1234
# target: x86_64-unknown-linux-musl
# channel: apt-musl
# built-at: 2026-04-27T12:34:56Z
nexo --version (without --verbose or the subcommand) prints the
short form nexo <version>.
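For readers unfamiliar with the mechanism, a build script sets compile-time variables by printing `cargo:rustc-env=NAME=value` lines, which Cargo picks up from its stdout. The sketch below shows one way three of the four stamps could be produced. It is not the project's build.rs; the helper names are hypothetical, and the timestamp is left out because ISO8601 formatting needs a date crate or hand-written calendar math.

```rust
use std::{env, process::Command};

/// Formats the directive Cargo reads from a build script's stdout to set a
/// compile-time environment variable.
fn rustc_env(name: &str, value: &str) -> String {
    format!("cargo:rustc-env={}={}", name, value)
}

/// Short SHA of HEAD, or "unknown" when git is missing or this is not a checkout.
fn git_short_sha() -> String {
    Command::new("git")
        .args(["rev-parse", "--short", "HEAD"])
        .output()
        .ok()
        .filter(|out| out.status.success())
        .and_then(|out| String::from_utf8(out.stdout).ok())
        .map(|sha| sha.trim().to_string())
        .filter(|sha| !sha.is_empty())
        .unwrap_or_else(|| "unknown".to_string())
}

/// Channel marker: the override if present and non-empty, else "source".
fn channel(override_value: Option<String>) -> String {
    override_value
        .filter(|v| !v.is_empty())
        .unwrap_or_else(|| "source".to_string())
}

fn main() {
    // Re-run the script when the channel override changes.
    println!("cargo:rerun-if-env-changed=NEXO_BUILD_CHANNEL");
    println!("{}", rustc_env("NEXO_BUILD_GIT_SHA", &git_short_sha()));
    // Cargo sets TARGET for build scripts; fall back when run standalone.
    let target = env::var("TARGET").unwrap_or_else(|_| "unknown".to_string());
    println!("{}", rustc_env("NEXO_BUILD_TARGET_TRIPLE", &target));
    let chan = channel(env::var("NEXO_BUILD_CHANNEL").ok());
    println!("{}", rustc_env("NEXO_BUILD_CHANNEL", &chan));
}
```

On the consuming side, the binary would read each value with the standard `env!` macro, for example `env!("NEXO_BUILD_GIT_SHA")`, which fails the build if the stamp was never emitted.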
Local validation
make dist-check
Builds the host-target tarball via dist build --target $(rustc -vV | sed -n 's|host: ||p') and runs
scripts/release-check.sh.
The smoke gate verifies every present tarball contains the bin +
LICENSE-* + README.md and that the host-native --version
output matches the workspace version. Targets the local toolchain
can't satisfy emit [release-check] WARN lines instead of failing.
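The content check performed by the smoke gate amounts to looking for three kinds of entries in each archive listing. The actual gate is the shell script scripts/release-check.sh, whose logic is not reproduced here; the Rust function below is a hypothetical restatement of the rule, and the archive directory name in the example is made up.

```rust
/// Returns the required entries missing from a tarball listing:
/// the binary, at least one LICENSE-* file, and README.md.
fn missing_entries(listing: &[&str], bin_name: &str) -> Vec<String> {
    // Compare on the final path component so a top-level archive dir is ignored.
    let names: Vec<&str> = listing
        .iter()
        .map(|path| {
            let trimmed = path.trim_end_matches('/');
            trimmed.rsplit('/').next().unwrap_or(trimmed)
        })
        .collect();

    let mut missing = Vec::new();
    if !names.iter().any(|n| *n == bin_name) {
        missing.push(bin_name.to_string());
    }
    if !names.iter().any(|n| n.starts_with("LICENSE-")) {
        missing.push("LICENSE-*".to_string());
    }
    if !names.iter().any(|n| *n == "README.md") {
        missing.push("README.md".to_string());
    }
    missing
}

fn main() {
    let listing = ["nexo-x86_64/nexo", "nexo-x86_64/LICENSE-MIT", "nexo-x86_64/README.md"];
    println!("{:?}", missing_entries(&listing, "nexo"));
}
```

The version comparison against the workspace version is a separate step and is not covered by this sketch.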
Full setup notes (cargo-dist, cargo-zigbuild, zig, rustup targets):
packaging/README.md.
What's automatic vs manual
| Step | Owner |
|---|---|
| Bump version + open release PR | release-plz (CI on push to main) |
| Tag commit + crates.io publish | release-plz (on PR merge) |
| Build 2 musl tarballs (x86_64 + aarch64) | release.yml (Phase 27.2 ✅), via cargo-dist |
| Build Termux .deb (aarch64-linux-android) | release.yml (Phase 27.2 ✅), via packaging/termux/build.sh |
| Build Debian .deb (amd64 + arm64) | release.yml (Phase 27.4 ✅), via packaging/debian/build.sh |
| Build RPM (x86_64 + aarch64) | release.yml (Phase 27.4 ✅), via packaging/rpm/build.sh |
| Install-test .deb on Debian 12 / Ubuntu 22.04 / 24.04 | release.yml (Phase 27.4 ✅), via docker matrix |
| Install-test .rpm on Fedora 40 / Rocky 9 | release.yml (Phase 27.4 ✅), via docker matrix |
| Upload all tarballs + debs + rpms + sha256 sidecars | release.yml (Phase 27.2 ✅) |
| Smoke-test nexo --version + provenance stamps | release.yml (Phase 27.2 ✅) |
| Sign every asset (cosign keyless) | sign-artifacts.yml (Phase 27.3 ✅) |
| Generate CycloneDX + SPDX SBOMs | sbom.yml (Phase 27.9 🔄) |
| Apt repo publish + signed Release file | Phase 27.4.b deferred |
| Yum / dnf repo publish + RPM-GPG-KEY-nexo | Phase 27.4.b deferred |
| Termux pkg index | Phase 27.8 deferred |
| Homebrew bottle auto-PR | Phase 27.6 PARKED (Apple targets dropped) |
| nexo self-update | Phase 27.10 deferred |
Adding a new bin to the release
- Declare the [[bin]] in the appropriate crate's Cargo.toml.
- If the crate hosts the bin via [package.metadata.dist] dist = false, either remove that opt-out or move the bin to a new crate that doesn't carry it.
- Re-run make dist-check and confirm the new bin shows up under [bin] in the dist plan output.
- Update scripts/release-check.sh's per-archive content check if the new bin should be required.
Adding a new target
- Append the target triple to targets = […] in dist-workspace.toml.
- Append the matching tarball name to EXPECTED_TARBALLS in the smoke gate.
- Land the toolchain story in the GH Actions release workflow (Phase 27.2). Without that, the target builds locally only.
License
nexo-rs is dual-licensed under either:
- MIT: see LICENSE-MIT
- Apache License, Version 2.0: see LICENSE-APACHE
at your option. SPDX: MIT OR Apache-2.0.
Attribution is required
Redistributions — source, binary, modified, or unmodified — must
preserve the NOTICE
file and the copyright attribution, as required by Section 4(d) of the
Apache License.
Nexo-rs
Copyright 2026 Cristian García <informacion@cristiangarcia.co>
This product includes software developed by Cristian García.
Original project: https://github.com/lordmacu/nexo-rs
Why dual-licensed
Dual MIT / Apache-2.0 is the Rust ecosystem convention (rustc,
tokio, serde, clap, etc.). It maximizes downstream compatibility:
- MIT is compatible with GPLv2 (Apache-2.0 is not)
- Apache-2.0 grants explicit patent rights (MIT does not)
Users pick whichever fits their project.
Contributions
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in nexo-rs by you shall be dual-licensed as above, without any additional terms or conditions — per Section 5 of the Apache License.
API reference (rustdoc)
Every public type, trait, function, and module in the nexo-rs
workspace is documented via cargo doc. The CI workflow runs
cargo doc --workspace --no-deps and publishes the output under
/api/ on the same GitHub Pages deployment as this book.
Open the rustdoc
- Published site: https://lordmacu.github.io/nexo-rs/api/
- Local build:
cargo doc --workspace --no-deps --open
What's there
One rustdoc page per workspace crate:
| Crate | Contents |
|---|---|
| agent | Top-level binary, mostly wiring; see src/main.rs. |
| nexo-core | Agent trait, AgentRuntime, SessionManager, ToolRegistry, HookRegistry, agent-facing tools (memory, taskflow, self_report, delegate, workspace_git). |
| nexo-broker | Broker trait (NatsBroker, LocalBroker), disk queue, DLQ. |
| nexo-llm | LlmClient trait, MiniMax / Anthropic / OpenAI-compat / Gemini clients, retry + rate limiter. |
| nexo-memory | Short-term / long-term / vector types, LongTermMemory API. |
| nexo-config | YAML struct types, env/file placeholder resolution. |
| nexo-extensions | ExtensionManifest, ExtensionDiscovery, StdioRuntime, CLI. |
| nexo-mcp | MCP client + server primitives. |
| nexo-taskflow | Flow, FlowStore, FlowManager, WaitEngine. |
| nexo-resilience | CircuitBreaker. |
| nexo-setup | Wizard field registry, YAML patcher. |
| nexo-tunnel | Cloudflared tunnel helper. |
| nexo-auth | Per-agent credential gauntlet, resolver, audit. |
| nexo-plugin-* | Channel plugins (browser, whatsapp, telegram, email, google, gmail-poller). |
When to read rustdoc vs the book
| Goal | Start here |
|---|---|
| Understand a subsystem's purpose | this book |
| Read a specific trait's methods / signatures | rustdoc |
| Wire two subsystems together | book → rustdoc |
| Embed a crate in your own binary | rustdoc |
| Audit what's public API | rustdoc (anything not in rustdoc is internal) |
Building locally
# All crates, no dependencies:
cargo doc --workspace --no-deps
# Open the nexo-core rustdoc in a browser:
cargo doc -p nexo-core --no-deps --open
Warnings are rejected in CI (RUSTDOCFLAGS=-D warnings). Run the
same locally before pushing if you edited doc comments:
RUSTDOCFLAGS="-D warnings" cargo doc --workspace --no-deps
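For orientation, a doc comment that survives `-D warnings` usually has a one-line summary, a short explanation, and intra-doc links that resolve. The `redact` function below is hypothetical and exists only to show that shape; it is not an item from the workspace.

```rust
/// Replaces every occurrence of `secret` in `line` with `***`.
///
/// Intended for log lines pasted into bug reports. An empty `secret`
/// leaves the line unchanged. Built on [`str::replace`]; a link that
/// does not resolve would surface as a rustdoc warning.
pub fn redact(line: &str, secret: &str) -> String {
    // Replacing the empty string would insert the mask between every
    // character, so treat it as "nothing to redact".
    if secret.is_empty() {
        return line.to_string();
    }
    line.replace(secret, "***")
}

fn main() {
    println!("{}", redact("token=abc123 sent", "abc123"));
}
```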
Public-API stability
The workspace has not committed to semver-level stability yet.
Public signatures change between code phases; follow PHASES.md and
commit history when upgrading.
Cross-links
- Contributing: how to add /// docs when you touch public surface
- Architecture overview: the mental model that rustdoc fills in the fine detail for