Benchmarks

Honest latency, honest methodology.

We publish what we measure, and we publish how we measured it. No cherry-picked P50s, no synthetic request shapes.

Status: re-measuring for v1.4

Numbers below come from the measurement run for our last stable release. The v1.4 ContextWorker spawn-path rework (see changelog) shifts the middle of the distribution; a fresh measurement campaign is underway, and its results will be published here once they stabilize. Values not yet re-measured are marked Pending.

Methodology

  • Hardware: c7i.2xlarge (4 vCPU, 8 GiB RAM), Ubuntu 22.04, kernel 6.5.
  • Request shape: 2.1 KB JSON body, OpenAI chat-completions format, 1.2 KB response body.
  • Concurrency: 200 in-flight requests held open; upstream is a local echo service to isolate gateway overhead from provider latency.
  • Measurement tool: vegeta at 2,000 req/s for 120 seconds, warm pool.
  • Reporting: latency is gateway overhead only, i.e. the client-to-upstream-to-client round trip minus the upstream round-trip baseline.
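The overhead figure in the last bullet is a subtraction followed by a percentile. The sketch below shows one way to compute it. It is an illustration under stated assumptions, not the code in our bench scripts: the helper names are hypothetical, the baseline is assumed to be the median of direct client-to-echo round trips with no gateway in the path, and percentiles use the nearest-rank method (vegeta's own estimator may differ).

```python
import math
from statistics import median


def nearest_rank(sorted_samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not sorted_samples:
        raise ValueError("no samples")
    rank = math.ceil(pct * len(sorted_samples) / 100)  # 1-based rank
    return sorted_samples[max(rank, 1) - 1]


def gateway_overhead(total_us, direct_us, percentiles=(50, 95, 99)):
    """Hypothetical helper: overhead percentiles in microseconds.

    total_us:  client -> gateway -> echo upstream -> client latencies.
    direct_us: client -> echo upstream -> client latencies with no gateway (the baseline).
    """
    baseline = median(direct_us)  # assumption: one scalar baseline for the whole run
    overhead = sorted(t - baseline for t in total_us)
    return {p: nearest_rank(overhead, p) for p in percentiles}
```

Subtracting a single scalar baseline keeps the estimate simple; it assumes the echo service's own latency is stable across the run, which is why the upstream is a local echo rather than a real provider.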

Gateway overhead

Pipeline                                               P50     P95     P99
Bare pass-through                                      182 us  298 us  421 us
Pass-through + 1 worker (pii-scrub)                    238 us  389 us  577 us
Pass-through + 3 workers (pii-scrub, metering, audit)  312 us  501 us  742 us
Pass-through + 4 workers (+ policy)                    338 us  548 us  801 us
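The table is easier to read as increments. The sketch below copies the published P50/P95/P99 values and computes, for each pipeline, the overhead added over bare pass-through and over the previous row. The short row labels are our own shorthand for the pipelines in the table.

```python
# P50, P95, P99 gateway overhead in microseconds, copied from the gateway overhead table.
PIPELINES = {
    "bare": (182, 298, 421),
    "+1 worker": (238, 389, 577),
    "+3 workers": (312, 501, 742),
    "+4 workers": (338, 548, 801),
}


def added_over_bare(pipelines):
    """Per-percentile overhead each pipeline adds on top of bare pass-through."""
    base = pipelines["bare"]
    return {name: tuple(v - b for v, b in zip(vals, base))
            for name, vals in pipelines.items() if name != "bare"}


def marginal_steps(pipelines):
    """Per-percentile overhead each row adds on top of the row before it."""
    names = list(pipelines)  # dicts keep insertion order
    return {later: tuple(b - a for a, b in zip(pipelines[earlier], pipelines[later]))
            for earlier, later in zip(names, names[1:])}
```

At P50, the first worker adds 56 us and going from three workers to four adds 26 us, so per-stage cost is not uniform across the pipeline.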

Component overhead

Component                           P50    Notes
Router + circuit-breaker selection  14 us  O(1) hash lookup + atomic read of breaker state
WASM ContextWorker cold-spawn       90 us  Warm pool; cold path amortized over the first 4 requests
WASM worker steady-state call       46 us  Per worker, measured with pii-scrub
CEL policy evaluation               8 us   Type-checked at config load
Receipt sign (Ed25519)              22 us  Per request, one signature
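The router row describes a hash lookup followed by a read of breaker state. The sketch below illustrates that shape with a textbook closed/open/half-open circuit breaker. It is not our router: the Breaker and Router classes, the failure threshold, the cooldown, and the idea of a short ordered fallback list per route key are all hypothetical, and the single-threaded Python stands in for the atomic read a concurrent implementation would use.

```python
import time
from enum import Enum


class BreakerState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class Breaker:
    """Hypothetical breaker: opens after N consecutive failures, retries after a cooldown."""

    def __init__(self, failure_threshold=5, cooldown_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None
        self.state = BreakerState.CLOSED

    def allows(self):
        # After the cooldown, move to half-open; the next recorded result decides.
        if self.state is BreakerState.OPEN and self.clock() - self.opened_at >= self.cooldown_s:
            self.state = BreakerState.HALF_OPEN
        return self.state is not BreakerState.OPEN

    def record(self, ok):
        if ok:
            self.failures = 0
            self.state = BreakerState.CLOSED
            return
        self.failures += 1
        if self.state is BreakerState.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = BreakerState.OPEN
            self.opened_at = self.clock()


class Router:
    """Hypothetical router: route key -> short ordered list of (upstream, Breaker)."""

    def __init__(self, routes):
        self.routes = routes

    def select(self, key):
        # One dict lookup, then a scan over a short, bounded fallback list.
        for upstream, breaker in self.routes.get(key, ()):
            if breaker.allows():
                return upstream
        return None
```

Injecting the clock keeps the breaker deterministic under test; a production breaker would also limit how many probes pass while half-open.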

Throughput ceiling

Single instance (c7i.2xlarge), pass-through + metering pipeline, saturated: ~14,500 req/s sustained. On this 8 GiB instance the bottleneck is CPU, not network. Horizontal scaling is linear up to the stateful limit of the local receipt store; beyond that, use the managed ledger.
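For capacity planning, linear scaling reduces to a ceiling division. The sketch below assumes the published ~14,500 req/s ceiling holds for your pipeline and hardware; the 70% headroom default and the receipt_store_limit_rps cap (a stand-in for the receipt store's stateful limit, which this page does not quantify) are hypothetical parameters you would set yourself.

```python
import math

PER_INSTANCE_RPS = 14_500  # published ceiling: pass-through + metering, single c7i.2xlarge


def instances_needed(target_rps, per_instance_rps=PER_INSTANCE_RPS, headroom=0.7,
                     receipt_store_limit_rps=None):
    """Instances required if each runs at `headroom` of its saturated ceiling.

    Assumes linear horizontal scaling. receipt_store_limit_rps is a hypothetical cap
    for the local receipt store; above it we return None to signal that the
    managed ledger is needed instead.
    """
    if receipt_store_limit_rps is not None and target_rps > receipt_store_limit_rps:
        return None
    return math.ceil(target_rps / (per_instance_rps * headroom))
```

For example, 50,000 req/s at 70% headroom works out to 5 instances. Running below saturation leaves room for the tail: the overhead percentiles above were measured at 2,000 req/s, far from the ceiling.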

Streaming

Streaming responses add ~40 us of first-byte overhead from the transform layer that converts between inbound and upstream event shapes. Per-chunk overhead is Pending; the v1.4 benchmark refresh is scheduled for May 2026.
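To make "converts between inbound and upstream event shapes" concrete, here is a minimal sketch of such a transform as a generator. The upstream event shape ({"type": "text_delta", "text": ...} and {"type": "stop"}) and the function name are hypothetical; the output follows the OpenAI chat-completions streaming format (chat.completion.chunk objects sent as SSE data lines, terminated by data: [DONE]), trimmed to the fields needed here.

```python
import json


def to_openai_sse(upstream_events, model="upstream-model"):
    """Yield OpenAI-style SSE lines from hypothetical upstream events.

    Assumed upstream shape: {"type": "text_delta", "text": str} or {"type": "stop"}.
    Output chunks omit fields such as id and created for brevity.
    """
    for event in upstream_events:
        if event.get("type") == "text_delta":
            delta, finish = {"content": event["text"]}, None
        elif event.get("type") == "stop":
            delta, finish = {}, "stop"
        else:
            continue  # drop event types this sketch does not map
        chunk = {
            "object": "chat.completion.chunk",
            "model": model,
            "choices": [{"index": 0, "delta": delta, "finish_reason": finish}],
        }
        yield f"data: {json.dumps(chunk)}\n\n"
    yield "data: [DONE]\n\n"
```

Because the transform runs once per chunk, its cost shows up both before the first byte and on every subsequent chunk, which is why the two are measured separately.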

Reproducibility

The measurement scripts and config live in the bench directory. Clone the repository, run them, and compare. If your numbers diverge, open an issue that includes your profile and hardware details.
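When comparing, a relative difference per percentile is more useful than eyeballing. The sketch below flags any percentile that differs from the published value by more than a tolerance. The function name, the 10% default tolerance, and the pipeline keys are hypothetical; the comparison is only meaningful on the same instance type and request shape described under Methodology.

```python
LABELS = ("P50", "P95", "P99")


def divergences(measured, published, tolerance=0.10):
    """Return (pipeline, percentile, relative_diff) where |got - ref| / ref > tolerance.

    measured and published map a pipeline name to (P50, P95, P99) in microseconds.
    Pipelines missing from `measured` are skipped.
    """
    flagged = []
    for name, refs in published.items():
        if name not in measured:
            continue
        for label, ref, got in zip(LABELS, refs, measured[name]):
            rel = (got - ref) / ref
            if abs(rel) > tolerance:
                flagged.append((name, label, rel))
    return flagged
```

Tails usually move first, so a run that matches at P50 but diverges at P99 is still worth reporting.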