Architecture overview

nexo-rs is a single-process multi-agent runtime. One binary (`agent`) hosts every agent, every channel plugin, every extension, and the persistence layer. Coordination between components happens over NATS (with a local tokio mpsc fallback when NATS is offline).

Why single-process: shared in-memory caches, zero IPC overhead between agent and tool invocations, simpler ops. The broker and disk queue give us the durability a multi-process layout would provide, without the coordination cost.
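
A minimal sketch of how that fallback might look, assuming the `async-nats` crate for the NATS client. The `Broker` and `Delivery` names here are illustrative, not the actual `crates/broker` API:

```rust
// Illustrative sketch only -- the real `crates/broker` API may differ.
use tokio::sync::mpsc;

/// Where a published event actually went.
enum Delivery {
    Nats,
    LocalFallback,
}

struct Broker {
    nats: Option<async_nats::Client>,           // `None` when NATS is unreachable
    local_tx: mpsc::Sender<(String, Vec<u8>)>,  // in-process fallback channel
}

impl Broker {
    /// Publish to NATS if connected, otherwise to the local mpsc fallback.
    async fn publish(&self, subject: &str, payload: Vec<u8>) -> anyhow::Result<Delivery> {
        if let Some(nats) = &self.nats {
            nats.publish(subject.to_string(), payload.into()).await?;
            return Ok(Delivery::Nats);
        }
        self.local_tx.send((subject.to_string(), payload)).await?;
        Ok(Delivery::LocalFallback)
    }
}
```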

High-level layout

```mermaid
flowchart TB
    subgraph PROC[agent process]
        direction TB

        subgraph PLUGINS[Channel plugins]
            WA[WhatsApp]
            TG[Telegram]
            MAIL[Email / Gmail poller]
            BR[Browser CDP]
            GOOG[Google APIs]
        end

        subgraph BUS[Event bus]
            NATS[(NATS)]
            LOCAL[(Local mpsc fallback)]
            DQ[(Disk queue + DLQ)]
        end

        subgraph AGENTS[Agent runtimes]
            A1[Agent: ana]
            A2[Agent: kate]
            A3[Agent: ops]
        end

        subgraph STORE[Persistence]
            STM[(Short-term sessions<br/>in-memory)]
            LTM[(Long-term memory<br/>SQLite + sqlite-vec)]
            WS[(Workspace-git<br/>per agent)]
        end

        subgraph TOOLS[Tools & integrations]
            EXT[Extensions<br/>stdio / NATS]
            MCP[MCP client / server]
            LLM[LLM providers]
        end

        PLUGINS --> BUS
        BUS --> AGENTS
        AGENTS --> BUS
        AGENTS --> STORE
        AGENTS --> TOOLS
        TOOLS --> LLM
    end

    USERS[End users] <--> PLUGINS
```

Workspace crates

The `Cargo.toml` workspace defines these member crates:

| Crate | Responsibility |
| --- | --- |
| `crates/core` | Agent runtime, agent trait, `SessionManager`, `HookRegistry`, heartbeat, tool registry |
| `crates/broker` | NATS client, local fallback, disk queue, DLQ, backpressure |
| `crates/llm` | LLM clients (MiniMax, Anthropic, OpenAI-compat, Gemini), retry, rate limiter |
| `crates/memory` | Short-term sessions, long-term SQLite, vector search via sqlite-vec |
| `crates/config` | YAML parsing, env-var resolution, secrets loading |
| `crates/extensions` | Manifest parser, discovery, stdio + NATS runtimes, watcher, CLI |
| `crates/mcp` | MCP client (stdio + HTTP), server mode, tool catalog, hot-reload |
| `crates/taskflow` | Durable flow state machine with wait/resume |
| `crates/resilience` | `CircuitBreaker` three-state machine |
| `crates/setup` | Interactive wizard, YAML patcher, pairing flows |
| `crates/tunnel` | Public HTTPS tunnel for pairing / webhooks |
| `crates/plugins/browser` | Chrome DevTools Protocol client |
| `crates/plugins/whatsapp` | Wrapper over whatsapp-rs (Signal Protocol) |
| `crates/plugins/telegram` | Bot API client |
| `crates/plugins/email` | IMAP / SMTP |
| `crates/plugins/gmail-poller` | Cron-style Gmail → broker bridge |
| `crates/plugins/google` | Gmail / Calendar / Drive / Sheets tools |
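
To make the `crates/core` row concrete, here is a rough sketch of what the agent trait and tool registry could look like. The `Agent`, `Tool`, and `ToolRegistry` shapes below are assumptions for illustration (using the `async-trait`, `serde_json`, and `anyhow` crates), not the crate's actual definitions:

```rust
// Hypothetical shapes for illustration; the real `crates/core` types may differ.
use std::collections::HashMap;

use async_trait::async_trait;

/// A tool the LLM loop can invoke (built-in, extension, or MCP-backed).
#[async_trait]
pub trait Tool: Send + Sync {
    fn name(&self) -> &str;
    async fn call(&self, args: serde_json::Value) -> anyhow::Result<serde_json::Value>;
}

/// Registry the runtime consults when the model asks for a tool.
#[derive(Default)]
pub struct ToolRegistry {
    tools: HashMap<String, Box<dyn Tool>>,
}

impl ToolRegistry {
    pub fn register(&mut self, tool: Box<dyn Tool>) {
        self.tools.insert(tool.name().to_string(), tool);
    }

    pub fn get(&self, name: &str) -> Option<&dyn Tool> {
        self.tools.get(name).map(|t| t.as_ref())
    }
}

/// The per-agent runtime loop is driven through something like this trait.
#[async_trait]
pub trait Agent: Send + Sync {
    fn id(&self) -> &str;
    async fn handle_message(&self, session_id: &str, text: &str) -> anyhow::Result<String>;
}
```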

Binaries

Defined in `Cargo.toml`:

| Binary | Entry | Purpose |
| --- | --- | --- |
| `agent` | `src/main.rs` | Main daemon; also exposes `setup`, `dlq`, `ext`, `flow`, `status` subcommands |
| `browser-test` | `src/browser_test.rs` | CDP integration smoke test |
| `integration-browser-check` | `src/integration_browser_check.rs` | End-to-end browser flow validation |
| `llm_smoke` | `src/bin/llm_smoke.rs` | LLM provider smoke test |

Runtime topology

`agent` runs a single tokio multi-thread runtime. Work is split into independent tasks:

```mermaid
flowchart LR
    MAIN[main tokio runtime]
    MAIN --> PA[Per-agent runtime task]
    MAIN --> PI[Plugin intake loops]
    MAIN --> HB[Heartbeat scheduler]
    MAIN --> MCP[MCP runtime manager]
    MAIN --> EXT[Extension stdio runtimes]
    MAIN --> MET[Metrics server :9090]
    MAIN --> HEALTH[Health server :8080]
    MAIN --> ADMIN[Admin console :9091]
    MAIN --> LOCK[Single-instance lock watcher]
```

Each agent runtime owns its own subscription to inbound topics, its own view of the session manager, and its own LLM-loop state. Agents share no mutable in-memory state; coordination between agents happens over the event bus (`agent.route.<target_id>`).
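
A condensed sketch of that startup shape, assuming `tokio::task::JoinSet` for supervising the sibling tasks; the `run_*` entry points, agent names, and port numbers mirror the diagram, but the wiring itself is illustrative rather than the actual `src/main.rs`:

```rust
// Illustrative startup sketch -- the real `src/main.rs` wiring will differ.
#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let mut tasks = tokio::task::JoinSet::new();

    // One long-lived task per agent runtime; each owns its own
    // subscriptions, session view, and LLM-loop state.
    for agent_id in ["ana", "kate", "ops"] {
        tasks.spawn(run_agent(agent_id.to_string()));
    }

    // Supporting subsystems run as sibling tasks on the same runtime.
    tasks.spawn(run_plugin_intake());
    tasks.spawn(run_heartbeat_scheduler());
    tasks.spawn(run_metrics_server(9090));
    tasks.spawn(run_health_server(8080));
    tasks.spawn(run_admin_console(9091));

    // If any task exits with an error, surface it.
    while let Some(result) = tasks.join_next().await {
        result??;
    }
    Ok(())
}

// Placeholder entry points standing in for the real subsystem loops.
async fn run_agent(_id: String) -> anyhow::Result<()> { Ok(()) }
async fn run_plugin_intake() -> anyhow::Result<()> { Ok(()) }
async fn run_heartbeat_scheduler() -> anyhow::Result<()> { Ok(()) }
async fn run_metrics_server(_port: u16) -> anyhow::Result<()> { Ok(()) }
async fn run_health_server(_port: u16) -> anyhow::Result<()> { Ok(()) }
async fn run_admin_console(_port: u16) -> anyhow::Result<()> { Ok(()) }
```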

What lives where — quick mental model

- A message arrives → lands on `plugin.inbound.<channel>` (NATS)
- Agent runtime consumes it → `SessionManager` attaches or creates a session, `HookRegistry` fires `before_message`
- LLM loop runs → tools are invoked through the registry, which calls into extensions / MCP / built-ins, each wrapped by a `CircuitBreaker`
- Tool result flows back → `after_tool_call` hooks fire, the LLM decides the next turn
- Agent emits a reply → publishes to `plugin.outbound.<channel>`
- Channel plugin delivers → the physical message goes to the user (see the sketch below)
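
The same path, compressed into a hypothetical handler; every type and method below (`SessionManager::attach_or_create`, `HookRegistry::fire_before_message`, `LlmLoop::run_turn`) is a stand-in for illustration, not the real crate API:

```rust
// Hypothetical sketch of the per-message path; all types here are stand-ins.
struct InboundMessage {
    sender_id: String,
    text: String,
    channel: String,
}

struct Session {
    id: String,
}

struct SessionManager;
impl SessionManager {
    /// Attach to an existing session for this sender, or create one.
    async fn attach_or_create(&self, sender: &str) -> Session {
        Session { id: format!("session-{sender}") }
    }
}

struct HookRegistry;
impl HookRegistry {
    /// `before_message` hooks run before the model sees the message;
    /// `after_tool_call` hooks (not shown) fire inside the LLM loop.
    async fn fire_before_message(&self, _session: &Session, _msg: &InboundMessage) {}
}

struct LlmLoop;
impl LlmLoop {
    /// Runs one model turn, invoking tools via the registry as needed.
    async fn run_turn(&self, _session: &Session, text: &str) -> String {
        format!("echo: {text}")
    }
}

/// One inbound message, end to end: session → hooks → LLM loop → outbound.
async fn handle_inbound(
    sessions: &SessionManager,
    hooks: &HookRegistry,
    llm: &LlmLoop,
    publish: impl Fn(String, String), // stand-in for the broker's publish
    msg: InboundMessage,
) {
    let session = sessions.attach_or_create(&msg.sender_id).await;
    hooks.fire_before_message(&session, &msg).await;
    let reply = llm.run_turn(&session, &msg.text).await;
    publish(format!("plugin.outbound.{}", msg.channel), reply);
}
```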

Details per subsystem: