Project tracker + multi-agent dispatch (Phase 67.A–H)

The project-tracker subsystem lets a nexo-rs agent answer "what phase is the development at?" (the original "qué fase va el desarrollo") through Telegram / WhatsApp / a shell, and lets it dispatch async programmer agents that ship phases on its behalf.

The implementation is layered:

| Layer | Crate | Responsibility |
|---|---|---|
| Project files | nexo-project-tracker | Parse PHASES.md + FOLLOWUPS.md, watch for changes, expose read tools. |
| Multi-agent state | nexo-agent-registry | DashMap + SQLite store of every in-flight goal, cap + queue + reattach. |
| Goal control | nexo-driver-loop | spawn_goal / pause_goal / resume_goal / cancel_goal per-goal. |
| Tool surface | nexo-dispatch-tools | program_phase, dispatch_followup, hook system, agent control + query, admin. |
| Capability gate | nexo-config + nexo-core | DispatchPolicy per agent / binding, ToolRegistry filter. |

Project tracker (Phase 67.A)

FsProjectTracker reads <root>/PHASES.md (required) and <root>/FOLLOWUPS.md (optional) at startup, caches parsed state behind a parking-lot RwLock with a 60 s TTL, and starts a notify watcher on the parent directory that invalidates the cache on Modify | Create | Remove events.
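The TTL-plus-invalidation pattern can be sketched with std types. Everything below is a hypothetical stand-in, not the real FsProjectTracker API (which uses a parking-lot RwLock and has the notify watcher call the invalidation path):

```rust
use std::sync::RwLock;
use std::time::{Duration, Instant};

/// Sketch of "cache behind an RwLock with a TTL, invalidated by a
/// file watcher". Names are illustrative only.
struct CachedState<T> {
    ttl: Duration,
    slot: RwLock<Option<(Instant, T)>>,
}

impl<T: Clone> CachedState<T> {
    fn new(ttl: Duration) -> Self {
        Self { ttl, slot: RwLock::new(None) }
    }

    /// Return the cached value if it is younger than the TTL,
    /// otherwise re-parse via `load` and refresh the slot.
    fn get(&self, load: impl Fn() -> T) -> T {
        if let Some((at, v)) = self.slot.read().unwrap().as_ref() {
            if at.elapsed() < self.ttl {
                return v.clone();
            }
        }
        let fresh = load();
        *self.slot.write().unwrap() = Some((Instant::now(), fresh.clone()));
        fresh
    }

    /// What the watcher runs on Modify | Create | Remove events.
    fn invalidate(&self) {
        *self.slot.write().unwrap() = None;
    }
}
```

Readers that hit a warm slot only take the read lock, so concurrent project_status calls never serialize on a re-parse.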

Read tools register through nexo_dispatch_tools::READ_TOOL_NAMES (project_status, project_phases_list, followup_detail, git_log_for_phase).

Set ${NEXO_PROJECT_ROOT} to point at a workspace other than the daemon's cwd.

Multi-agent registry (Phase 67.B)

AgentRegistry is the single source of truth for every goal the driver has admitted. Each entry holds an ArcSwap<AgentSnapshot> (turn N/M, last acceptance, last decision summary, diff_stat) so list_agents / agent_status readers never block writers.

  • admit(handle, enqueue) enforces the global cap. Beyond the cap, enqueue=true parks the goal as Queued; enqueue=false rejects.
  • release(goal_id, terminal) returns the next-up queued goal so the orchestrator can promote it via promote_queued once the worktree / binding is ready.
  • apply_attempt(AttemptResult) refreshes the live snapshot. Idempotent against out-of-order replay (lower turn_index ignored).
  • Reattach (Phase 67.B.4) walks the SQLite store at boot and rehydrates Running rows. With resume_running=false they flip to LostOnRestart and surface to the operator.

LogBuffer keeps a per-goal ring of recent driver events for the agent_logs_tail tool — bounded so a chatty goal cannot OOM the process.
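A minimal sketch of that bound, assuming a plain VecDeque ring (the real LogBuffer keys one ring per goal and stores driver events, not strings):

```rust
use std::collections::VecDeque;

/// Bounded ring: memory stays O(cap) no matter how chatty a goal is.
/// Hypothetical stand-in for the per-goal ring described above.
struct EventRing {
    cap: usize,
    buf: VecDeque<String>,
}

impl EventRing {
    fn new(cap: usize) -> Self {
        Self { cap, buf: VecDeque::with_capacity(cap) }
    }

    /// Push a line, evicting the oldest once the ring is full.
    fn push(&mut self, line: String) {
        if self.buf.len() == self.cap {
            self.buf.pop_front();
        }
        self.buf.push_back(line);
    }

    /// Last `n` lines, oldest first — the shape a logs-tail tool renders.
    fn tail(&self, n: usize) -> Vec<&str> {
        self.buf
            .iter()
            .skip(self.buf.len().saturating_sub(n))
            .map(String::as_str)
            .collect()
    }
}
```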

Persistence wiring (Phase 71)

The bin reads agent_registry.store from config/project-tracker/project_tracker.yaml and opens SqliteAgentRegistryStore when the resolved path is non-empty. Env placeholders (${NEXO_AGENT_REGISTRY_DB:-./data/agents.db}) are expanded before the open. Path open failures fall back to MemoryAgentRegistryStore with a warn so a corrupt sqlite file never bricks boot.
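Under those keys, the config file might look like this (a sketch built from the key names in this doc; the exact nesting of reattach_on_boot is an assumption):

```yaml
# config/project-tracker/project_tracker.yaml — illustrative only
agent_registry:
  store: "${NEXO_AGENT_REGISTRY_DB:-./data/agents.db}"  # empty => in-memory fallback
  reattach_on_boot: true
```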

When the registry is sqlite-backed and reattach_on_boot: true, the bin runs the reattach sweep with resume_running=false. Every prior-run Running row flips to LostOnRestart, and any notify_origin / notify_channel hook attached to that goal fires once with an [abandoned] summary so the originating chat learns the goal could not be resumed. Subprocess respawn is intentionally not attempted — restoring a Claude Code worktree the daemon no longer owns is unsafe to do silently and lives under Phase 67.C.1.

Shutdown drain (Phase 71.3)

On SIGTERM the bin runs nexo_dispatch_tools::drain_running_goals before plugin teardown so notify_origin reaches WhatsApp / Telegram while their adapters are still alive. Each Running goal's Cancelled hooks fire with a [shutdown] summary; per-hook dispatch is bounded by a 2 s timeout so a stuck publish cannot hold shutdown hostage. The row then flips to LostOnRestart so the next boot's reattach sweep does not re-fire the same notification.

[shutdown] daemon stopping — goal `<id>` was running and has
been marked abandoned. Re-dispatch with `program_phase
phase_id=<phase>` if you still need it.

SIGKILL still bypasses this — the boot-time reattach sweep is the safety net for that case.
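The per-hook bound can be illustrated with std threads and a channel. This is a sketch only — dispatch_with_deadline is a hypothetical name, and the daemon's async drain presumably enforces the 2 s limit with a tokio-style timeout rather than threads:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Run one hook publish with a hard deadline so a stuck adapter
/// cannot hold shutdown hostage.
fn dispatch_with_deadline(
    hook: impl FnOnce() + Send + 'static,
    deadline: Duration,
) -> Result<(), mpsc::RecvTimeoutError> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        hook();
        let _ = tx.send(()); // receiver may already have given up
    });
    // Err(Timeout) => abandon this hook and move to the next goal.
    rx.recv_timeout(deadline)
}
```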

Turn-level audit log (Phase 72)

Live state (AgentSnapshot) only carries the latest decision / diff / acceptance per goal. Once a turn rolls forward the previous turn's data is gone. To answer "what did the agent actually do across its 40 turns?" the runtime now writes a durable row per turn into a goal_turns table on the same agents.db:

goal_turns(
    goal_id      TEXT,
    turn_index   INTEGER,
    recorded_at  INTEGER,
    outcome      TEXT,        -- done | continue | needs_retry | …
    decision     TEXT,        -- last Decision rendered as
                              --   "<tool> (allow|deny:msg|observe:note) — rationale"
    summary      TEXT,        -- mirror of AgentSnapshot.last_progress_text
    diff_stat    TEXT,
    error        TEXT,        -- pre-rendered for needs_retry / escalate / budget
    raw_json     TEXT,        -- full AttemptResult payload
    PRIMARY KEY (goal_id, turn_index)
);

EventForwarder writes a row on every AttemptResult event, upsert-on-conflict so a replay can't dup history. The new chat tool agent_turns_tail goal_id=<uuid> [n=20] returns a markdown table of the last N rows (default 20, capped at 1000):

showing 20 of 40 turn(s) for `…`

| turn | outcome | decision | summary | error |
|---|---|---|---|---|
| 21 | continue | Edit (allow) — patch crate slack | wired Plugin trait | - |
| 22 | needs_retry | Bash (allow) — cargo build | … | E0432 in slack/src/lib.rs |
…
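Given the primary key on (goal_id, turn_index), the replay-safe write could take the SQLite ON CONFLICT shape (a sketch, not the exact statement EventForwarder issues):

```sql
-- Replay of the same (goal_id, turn_index) overwrites in place
-- instead of duplicating history.
INSERT INTO goal_turns
  (goal_id, turn_index, recorded_at, outcome, decision,
   summary, diff_stat, error, raw_json)
VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9)
ON CONFLICT (goal_id, turn_index) DO UPDATE SET
  recorded_at = excluded.recorded_at,
  outcome     = excluded.outcome,
  decision    = excluded.decision,
  summary     = excluded.summary,
  diff_stat   = excluded.diff_stat,
  error       = excluded.error,
  raw_json    = excluded.raw_json;
```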

Best-effort writes: an append failure logs a warn but never blocks the driver loop. When the registry isn't sqlite-backed (memory fallback), the tool reports "set agent_registry.store in project_tracker.yaml" rather than silently returning empty.

Async dispatch (Phase 67.C + 67.E)

DriverOrchestrator::spawn_goal(self: Arc<Self>, goal) returns a tokio::task::JoinHandle so the calling tool returns the goal id instantly without waiting for the run to finish. Per-goal pause / cancel signals (watch<bool> and CancellationToken::child_token) let pause_agent / cancel_agent target one goal without taking down the rest of the orchestrator.

program_phase_dispatch is the heart of the dispatch surface: it reads the sub-phase out of PHASES.md, runs DispatchGate::check, constructs a Goal with the dispatcher / origin metadata, asks the registry for a slot, and either spawns the goal or returns Queued / Forbidden / NotFound. dispatch_followup is the mirror that pulls the description from a FOLLOWUPS.md item.

Capability gate (Phase 67.D)

DispatchPolicy { mode, max_concurrent_per_dispatcher, allowed_phase_ids, forbidden_phase_ids } lives on AgentConfig and (as Option<DispatchPolicy>) on InboundBinding. The per-binding override fully replaces the agent-level value so an operator can be precise per channel ("asistente is none everywhere except this Telegram chat where it is full").
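A config sketch of the replace-not-merge behavior. Field names come from the DispatchPolicy struct above; the file layout, binding match fields, and mode spellings are assumptions:

```yaml
# config/agents.yaml — illustrative only
agents:
  asistente:
    dispatch_policy:
      mode: none                      # no dispatch anywhere by default
bindings:
  - plugin: telegram
    # ...binding match fields elided...
    dispatch_policy:                  # full replacement of the agent-level value
      mode: full
      max_concurrent_per_dispatcher: 2
      allowed_phase_ids: ["67.C", "67.E"]
      forbidden_phase_ids: []
```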

DispatchGate::check short-circuits in this order:

  1. capability None → CapabilityNone (every kind).
  2. ReadOnly capability + write kind → CapabilityReadOnly.
  3. write + require_trusted + !sender_trusted → SenderNotTrusted. Read tools bypass the trust gate so list_agents stays open for unpaired senders.
  4. forbidden_phase_ids match → PhaseForbidden.
  5. non-empty allowed_phase_ids + no match → PhaseNotAllowed.
  6. dispatcher / sender / global caps. A goal over the global cap with queue_when_full=true is still admitted; the orchestrator queues it. Without queueing → GlobalCapReached.
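The short-circuit order above can be written out as a self-contained sketch. Every type here is a stand-in, not the real nexo-dispatch-tools API, and the dispatcher / sender caps of step 6 are elided:

```rust
#[derive(Clone, Copy, PartialEq)]
enum Capability { None, ReadOnly, Full }

#[derive(Debug, PartialEq)]
enum Denied {
    CapabilityNone, CapabilityReadOnly, SenderNotTrusted,
    PhaseForbidden, PhaseNotAllowed, GlobalCapReached,
}

#[derive(Debug, PartialEq)]
enum Admitted { Spawn, Queue }

#[derive(Clone, Copy)]
struct Check<'a> {
    capability: Capability,
    is_write: bool,
    require_trusted: bool,
    sender_trusted: bool,
    phase_id: &'a str,
    allowed: &'a [&'a str],
    forbidden: &'a [&'a str],
    global_cap_free: bool,
    queue_when_full: bool,
}

fn gate(c: &Check) -> Result<Admitted, Denied> {
    if c.capability == Capability::None {
        return Err(Denied::CapabilityNone); // every kind
    }
    if c.capability == Capability::ReadOnly && c.is_write {
        return Err(Denied::CapabilityReadOnly);
    }
    // Read tools bypass the trust gate.
    if c.is_write && c.require_trusted && !c.sender_trusted {
        return Err(Denied::SenderNotTrusted);
    }
    if c.forbidden.contains(&c.phase_id) {
        return Err(Denied::PhaseForbidden);
    }
    if !c.allowed.is_empty() && !c.allowed.contains(&c.phase_id) {
        return Err(Denied::PhaseNotAllowed);
    }
    // Dispatcher / sender caps elided; only the global cap is modeled.
    if !c.global_cap_free {
        return if c.queue_when_full {
            Ok(Admitted::Queue)
        } else {
            Err(Denied::GlobalCapReached)
        };
    }
    Ok(Admitted::Spawn)
}
```

Ordering matters operationally: capability denials fire before phase checks, so a none-capability binding never learns which phase ids exist.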

ToolRegistry::apply_dispatch_capability(policy, is_admin) prunes the registry of dispatch tool names not allowed by the resolved policy. ToolRegistryCache::get_or_build_with_dispatch builds the per-binding filtered registry that respects both allowed_tools and dispatch_policy. Hot reload (Phase 18) constructs a fresh ToolRegistryCache per snapshot, so a new dispatch_policy lands on the next intake without restart; in-flight goals keep their pre-reload tool surface so a hot reload never preempts.

Completion hooks (Phase 67.F)

Each hook is (on: HookTrigger, action: HookAction, id). Triggers fire on Done | Failed | Cancelled | Progress { every_turns }. Actions:

  • notify_origin — publish a markdown summary to the chat that triggered the goal. No-op when origin.plugin == "console".
  • notify_channel { plugin, instance, recipient } — publish to an explicit channel different from the origin (escalate to ops).
  • dispatch_phase { phase_id, only_if } — chain another goal when only_if matches the firing transition. Implemented via a pluggable DispatchPhaseChainer so the runtime owns program_phase_dispatch plumbing.
  • nats_publish { subject } — JSON payload to a custom subject.
  • shell { cmd, timeout } — opt-in via allow_shell_hooks. The PROGRAM_PHASE_ALLOW_SHELL_HOOKS capability is registered with the setup inventory so agent doctor capabilities flags it the moment the operator exports the env var. Receives NEXO_HOOK_GOAL_ID / PHASE_ID / TRANSITION / PAYLOAD_JSON env vars.

HookIdempotencyStore (SQLite) keeps (goal_id, transition, action_kind, action_id) UNIQUE so at-least-once NATS replay or a mid-hook restart cannot fire a hook twice.
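The UNIQUE-key contract can be shown in miniature with an in-memory set (a sketch; the real store persists the key in SQLite precisely so the guarantee survives a mid-hook restart):

```rust
use std::collections::HashSet;

/// "Insert wins exactly once" over (goal_id, transition, action_kind,
/// action_id) — the same key the SQLite UNIQUE constraint enforces.
#[derive(Default)]
struct HookIdempotency {
    seen: HashSet<(String, String, String, String)>,
}

impl HookIdempotency {
    /// True only for the first claim of a key; replays are swallowed.
    fn claim(&mut self, goal: &str, transition: &str, kind: &str, id: &str) -> bool {
        self.seen.insert((
            goal.to_string(),
            transition.to_string(),
            kind.to_string(),
            id.to_string(),
        ))
    }
}
```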

HookRegistry (in-memory DashMap<GoalId, Vec<CompletionHook>>) backs add_hook / remove_hook / agent_hooks_list.

NATS subjects (Phase 67.H.2)

| Subject | Producer |
|---|---|
| agent.dispatch.spawned | program_phase_dispatch admitted |
| agent.dispatch.denied | DispatchGate::check denied |
| agent.tool.hook.dispatched | hook fired ok |
| agent.tool.hook.failed | hook attempt errored |
| agent.registry.snapshot.<goal_id> | per-goal periodic beacon |
| agent.driver.progress | every Nth completed work-turn |

Plus the existing Phase 67.0–67.9 subjects: agent.driver.{goal,attempt}.{started,completed}, agent.driver.{decision,acceptance,budget.exhausted,escalate,replay,compact}.

CLI (Phase 67.H.1)

nexo-driver-tools mirrors the chat tool surface for shell use:

nexo-driver-tools status [--phase <id> | --followups]
nexo-driver-tools dispatch <phase_id>
nexo-driver-tools agents list [--filter running|queued|...]
nexo-driver-tools agents show <goal_id>
nexo-driver-tools agents cancel <goal_id> [--reason "…"]

origin.plugin = "console" so notify_origin is a no-op (the operator sees stdout, not a chat reply).

Built-in registration (nexo daemon)

The default nexo agent binary registers every dispatch tool definition at boot via nexo_core::agent::dispatch_handlers::register_dispatch_tools_into. The LLM sees program_phase, list_agents, agent_status, etc. in its toolset; per-binding dispatch_capability (config/agents.yaml) prunes the write tools for bindings that opted out.

What's NOT bundled by default is the runtime DispatchToolContext — the orchestrator + registry + tracker references the handlers consult. Without it, a tool call returns a clean "dispatch tools require AgentContext.dispatch to be set at boot" error instead of pretending success. Two integration paths from there:

  • In-process orchestrator — boot a DriverOrchestrator alongside the agents, share one AgentRegistry. See the next section for the wiring sample.
  • NATS-based dispatch — the agent bin publishes a message to agent.driver.dispatch.request that a separate nexo-driver daemon consumes. This is the topology to use when the Claude subprocess needs hardware (a GPU box) the agent daemon doesn't have. The dispatch tool surface only changes in the registry it consults; operators can swap the in-process AgentRegistry for one that mirrors a NATS-backed registry without touching the handlers.

Boot wiring (B8)

The integrator's main.rs ties everything together. Minimal shape:

use std::sync::Arc;
use nexo_agent_registry::{AgentRegistry, MemoryAgentRegistryStore, LogBuffer};
use nexo_core::agent::{
    dispatch_handlers::{register_dispatch_tools_into, DispatchToolContext},
    tool_registry::ToolRegistry,
};
use nexo_dispatch_tools::{
    event_forwarder::EventForwarder,
    hooks::{DefaultHookDispatcher, HookRegistry, NoopNatsHookPublisher},
    policy_gate::CapSnapshot,
    NoopTelemetry,
};
use nexo_pairing::PairingAdapterRegistry;
use nexo_project_tracker::FsProjectTracker;

// 1. Project tracker.
let tracker: Arc<dyn nexo_project_tracker::ProjectTracker> =
    Arc::new(FsProjectTracker::open(std::env::current_dir().unwrap())?);

// 2. Agent registry + log buffer.
let registry = Arc::new(AgentRegistry::new(
    Arc::new(MemoryAgentRegistryStore::default()),
    4,
));
let log_buffer = Arc::new(LogBuffer::new(200));
let hook_registry = Arc::new(HookRegistry::new());

// 3. Hook dispatcher with the channel adapters that Phase 26
//    registered (whatsapp / telegram).
let pairing = PairingAdapterRegistry::new();
// pairing.register(WhatsappPairingAdapter::new(...));
// pairing.register(TelegramPairingAdapter::new(...));
let hook_dispatcher = Arc::new(DefaultHookDispatcher::new(
    pairing,
    Arc::new(NoopNatsHookPublisher),
));

// 4. Orchestrator with EventForwarder so registry / log_buffer /
//    hooks see every driver event.
let inner_sink: Arc<dyn nexo_driver_loop::DriverEventSink> =
    Arc::new(nexo_driver_loop::NoopEventSink);
let event_sink: Arc<dyn nexo_driver_loop::DriverEventSink> =
    Arc::new(EventForwarder::new(
        registry.clone(),
        log_buffer.clone(),
        hook_registry.clone(),
        hook_dispatcher.clone(),
        inner_sink,
    ));
// (orchestrator builder consumes event_sink)

// 5. Bundle for AgentContext.dispatch.
let dispatch_ctx = Arc::new(DispatchToolContext {
    tracker,
    orchestrator: orch.clone(),
    registry,
    hooks: hook_registry,
    log_buffer,
    default_caps: CapSnapshot {
        queue_when_full: true,
        ..Default::default()
    },
    require_trusted: true,
    telemetry: Arc::new(NoopTelemetry),
});

// 6. Register the handlers into the base ToolRegistry. The
//    per-binding cache prunes write tools when capability=None
//    or read_only.
let base = ToolRegistry::new();
register_dispatch_tools_into(&base);

// 7. Per-session AgentContext.with_dispatch(dispatch_ctx)
//    + .with_sender_trusted(true) + .with_inbound_origin(plugin,
//    instance, sender).

Without step 6 the handlers exist but aren't reachable by the LLM. Without step 4 the registry / log_buffer / hooks stay inert. Without step 5 the handlers return MissingDispatchCtx.

See also

  • proyecto/PHASES.md — Phase 67.A–H sub-phase status of record.
  • architecture/driver-subsystem.md — Phase 67.0–67.9 driver loop, replay + compact policies.