Benchmarks
The workspace ships criterion benchmark suites for every hot path
that runs on the data plane. CI executes them on every PR + weekly
on main so regressions are visible before merge.
Quick run
# Single crate:
cargo bench -p nexo-resilience
# Single bench within a crate:
cargo bench -p nexo-broker --bench topic_matches
# Single group within a bench:
cargo bench -p nexo-broker --bench topic_matches -- 'topic_matches/wildcard'
Output goes to target/criterion/. Open index.html under that
directory in a browser for the full HTML report.
Coverage matrix
| Crate | Bench | What it measures | Run target |
|---|---|---|---|
nexo-resilience | circuit_breaker | CircuitBreaker::allow (closed + open), on_success, on_failure, 8-task concurrent allow contention | sub-100ns per call |
nexo-broker | topic_matches | NATS-style pattern matching (exact, single-wildcard *, multi-wildcard >, 50-pattern storm) | sub-100ns per match |
nexo-broker | local_publish | End-to-end LocalBroker::publish with 0 / 1 / 10 / 50 subscribers (DashMap scan + try_send + slow-consumer drop counter) | sub-10µs at 50 subs |
nexo-llm | sse_parsers | OpenAI / Anthropic / Gemini SSE parsers, 50-chunk fixtures (typical short answer) | chunks/sec scales linearly |
nexo-taskflow | tick | WaitEngine::tick at 10 / 100 / 1 000 active waiting flows | sub-millisecond at single-host scale |
What's NOT benched yet
These are tracked under Phase 35.5 follow-up:
nexo-coretranscripts FTS search — needs SQLite fixture seed before the bench is meaningful.nexo-coreredaction pipeline — wait for the local-LLM redaction backend (Phase 68.7) so we measure the real path operators ship.nexo-mcpencode_request/parse_notification_method— cheap to add; will land alongside an MCP-stdio round-trip bench.nexo-memoryvector-search recall — needs a public dataset baseline.
Add a bench by following the patterns in crates/<x>/benches/:
[dev-dependencies]addscriterion = "0.5"(withasync_tokioif you need a runtime).[[bench]]registersname = "<bench>"andharness = false.- Bench file uses
Throughput::Elements(N)so output is ops/sec, not rawns/iter. - Each
criterion_group!covers a distinct conceptual path — don't bundle unrelated paths.
CI integration
.github/workflows/bench.yml runs the matrix on:
- every PR that touches
crates/**,Cargo.lock, orCargo.toml - weekly on Sunday 04:00 UTC against
main - manual
workflow_dispatch
Each run uploads target/criterion/ as an artifact retained 30
days. PR runs save with --save-baseline pr-<number>; main runs
save as main. Compare locally with:
# Pull the artifact for PR #42
gh run download <run-id> --name bench-nexo-broker-<run-id>
# Compare against the local main baseline
cargo bench -p nexo-broker -- --baseline main
Today the CI job is informational — a regression doesn't
fail the PR. Once we have ~10 main runs of baseline data per
crate, the workflow gates on >10% regression per group. That's
Phase 35.6 done-criteria.
Known limitations
- GitHub Actions runners are noisy. The
ubuntu-latestshared runner tier shows ±5-10% variance on microbenchmarks. This is why we don't gate on small regressions yet — the baseline noise floor is itself ~5%. - Benches don't measure cold cache.
cargo bench's warm-up phase reaches steady-state CPU caches; first-call latency on a cold runtime is not captured. Add a separatebench_cold_*group when this matters (it usually doesn't — hot path is what matters at scale). - No cross-crate end-to-end benchmark yet. Phase 35.3 (load test rig) covers that; today's suites are per-crate microbenchmarks.
Reading criterion output
A typical run prints:
publish/mixed_50_subs time: [12.347 µs 12.451 µs 12.567 µs]
thrpt: [3.9786 Melem/s 4.0153 Melem/s 4.0494 Melem/s]
change: time: [-0.4% +0.3% +1.1%] (p = 0.62 > 0.05)
thrpt: [-1.1% -0.3% +0.4%]
No change in performance detected.
timeis the per-iteration latency (lower better).thrptis throughput (higher better) — only present when the bench declaredThroughput::Elements(N).changecompares against the previous run on the same hardware.p > 0.05means the difference is within noise.
Look for change reporting "Performance has regressed" with a
red bar — that's the signal a PR introduced a regression.