Status: re-measuring for v1.4

Numbers below come from the measurement run for our last stable release. The v1.4 ContextWorker spawn-path rework (see changelog) shifts the middle of the distribution. A fresh measurement campaign is underway, and its results will be published here once they stabilize. Values still awaiting re-measurement are marked Pending.

Methodology

Hardware: c7i.2xlarge (4 vCPU, 8 GiB RAM), Ubuntu 22.04, kernel 6.5.
Request shape: 2.1 KB JSON body, OpenAI chat-completions format, 1.2 KB response body.
Concurrency: 200 in-flight requests held open; upstream is a local echo service to isolate gateway overhead from provider latency.
Measurement tool: vegeta at 2,000 req/s for 120 seconds, warm pool.
Report: latency is gateway overhead only, i.e. the client to gateway to upstream and back round trip, minus a baseline round trip measured directly against the upstream.
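
The subtraction in the Report line can be sketched as follows. This is a minimal illustration, not our bench tooling. It assumes nearest-rank percentiles and subtracts at each percentile, so P99 overhead is P99(via gateway) minus P99(direct). Percentile-wise subtraction is not the same as the percentile of per-request differences, and the bench scripts may pair samples differently. The function names and sample values are hypothetical.

```python
import math
from typing import Dict, Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile, p in (0, 100]."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def gateway_overhead(
    via_gateway_us: Sequence[float],
    direct_upstream_us: Sequence[float],
    points: Sequence[float] = (50, 95, 99),
) -> Dict[str, float]:
    """Per-percentile overhead: latency through the gateway minus the direct baseline."""
    return {
        f"P{p}": percentile(via_gateway_us, p) - percentile(direct_upstream_us, p)
        for p in points
    }


# Illustrative samples only, not measurements.
via_gateway = [300, 310, 320, 330, 1000]
direct = [100, 100, 100, 100, 100]
print(gateway_overhead(via_gateway, direct))
```

With only five samples, P95 and P99 both land on the slowest request. Real runs at 2,000 req/s for 120 seconds produce about 240,000 samples, so the tail ranks are well separated.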

Gateway overhead

Pipeline	P50	P95	P99
Bare pass-through	182 us	298 us	421 us
Pass-through + 1 worker (pii-scrub)	238 us	389 us	577 us
Pass-through + 3 workers (pii-scrub, metering, audit)	312 us	501 us	742 us
Pass-through + 4 workers (+ policy)	338 us	548 us	801 us
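
To read the table as a per-worker cost, you can difference consecutive rows and divide by the number of workers added. The sketch below does that arithmetic on the published numbers. It assumes that the extra latency between rows is split evenly across the workers added in that row, so metering and audit are averaged together. Differences of percentiles are only a rough proxy for each worker's own latency distribution.

```python
from typing import List, Tuple

# (worker(s) added in this row, total workers, (P50, P95, P99) in us),
# copied from the Gateway overhead table.
ROWS = [
    ("bare pass-through", 0, (182, 298, 421)),
    ("pii-scrub", 1, (238, 389, 577)),
    ("metering + audit", 3, (312, 501, 742)),
    ("policy", 4, (338, 548, 801)),
]


def marginal_per_worker(rows) -> List[Tuple[str, Tuple[float, ...]]]:
    """Average added latency per worker between consecutive rows."""
    out = []
    for (_, n_prev, lat_prev), (name, n_cur, lat_cur) in zip(rows, rows[1:]):
        added = n_cur - n_prev
        per_worker = tuple((cur - prev) / added for prev, cur in zip(lat_prev, lat_cur))
        out.append((name, per_worker))
    return out


for name, per_worker in marginal_per_worker(ROWS):
    print(f"{name}: {per_worker}")
```

At P50, that comes to 56 us for pii-scrub, 37 us per worker for metering and audit, and 26 us for policy.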

Component overhead

Component	P50	Notes
Router + circuit-breaker selection	14 us	O(1) hash lookup + atomic read of breaker state
WASM ContextWorker cold-spawn	90 us	Warm pool; cold path amortized over first 4 requests
WASM worker steady-state call	46 us	Per-worker, measured with pii-scrub
CEL policy evaluate	8 us	Type-checked at config load
Receipt sign (Ed25519)	22 us	Per request, one signature
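
The router row describes constant-time selection: one hash lookup, then a lock-free read of the breaker state. Here is a minimal Python sketch of that shape. It is not the gateway's implementation. The primary/fallback route structure, the upstream names, and the choice to let half-open traffic through are all assumptions. A plain attribute read stands in for the atomic load the gateway performs.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class BreakerState(Enum):
    CLOSED = 0     # healthy: route traffic here
    OPEN = 1       # tripped: skip this upstream
    HALF_OPEN = 2  # probing; this sketch lets traffic through, while a real
                   # breaker would admit only a limited number of trial requests


@dataclass
class Upstream:
    name: str
    # Stand-in for an atomically published state word: the breaker's writer
    # replaces the whole value, and readers do a single lock-free read.
    state: BreakerState = BreakerState.CLOSED


@dataclass(frozen=True)
class Route:
    primary: Upstream
    fallback: Optional[Upstream] = None


def select_upstream(routes: Dict[str, Route], route_key: str) -> Optional[Upstream]:
    """One hash lookup plus at most two breaker-state reads: constant work per request."""
    route = routes.get(route_key)
    if route is None:
        return None
    for candidate in (route.primary, route.fallback):
        if candidate is not None and candidate.state is not BreakerState.OPEN:
            return candidate
    return None  # no healthy upstream: caller fails fast


# Hypothetical route table with made-up upstream names.
primary, backup = Upstream("provider-a"), Upstream("provider-b")
routes = {"chat": Route(primary, backup)}
print(select_upstream(routes, "chat").name)  # provider-a
primary.state = BreakerState.OPEN
print(select_upstream(routes, "chat").name)  # provider-b
```

Capping the candidate list at a fixed size keeps selection O(1). A design that scanned an arbitrary-length candidate list would not.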

Throughput ceiling

Single instance, c7i.2xlarge, pass-through + metering pipeline, saturated: ~14,500 req/s sustained. On this 8 GiB instance the gateway is CPU-bound, not network-bound. Horizontal scaling is linear up to the stateful limit of the local receipt store. Beyond that limit, switch to the managed ledger.
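
For capacity planning, the linear-scaling claim reduces to a ceiling division. The sketch below uses the measured 14,500 req/s ceiling. The `headroom` fraction and the `max_local_store_instances` cap are hypothetical parameters. The cap stands in for the local receipt store's stateful limit, which this page does not quantify.

```python
import math
from typing import Optional

# Measured single-instance ceiling from this page: pass-through + metering, saturated.
PER_INSTANCE_RPS = 14_500


def instances_needed(
    target_rps: float,
    per_instance_rps: float = PER_INSTANCE_RPS,
    headroom: float = 1.0,
    max_local_store_instances: Optional[int] = None,
) -> int:
    """Instance count for target_rps, assuming linear horizontal scaling.

    headroom: fraction of the measured ceiling each instance runs at (1.0 = saturated).
    max_local_store_instances: hypothetical cap standing in for the local receipt
    store's stateful limit.
    """
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    count = math.ceil(target_rps / (per_instance_rps * headroom))
    if max_local_store_instances is not None and count > max_local_store_instances:
        raise ValueError(
            f"{count} instances exceeds the local receipt store limit "
            f"({max_local_store_instances}); use the managed ledger instead"
        )
    return count
```

For example, 40,000 req/s needs 3 instances at saturation, or 6 if you plan to run each instance at 50% of its ceiling.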

Streaming

Streaming responses add ~40 us of first-byte overhead from the transform layer that converts between inbound and upstream event shapes. Per-chunk overhead is Pending; the v1.4 benchmark refresh is scheduled for May 2026.
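
To show what the transform layer does on the hot path, here is a minimal sketch that rewrites upstream stream events into OpenAI chat-completions SSE chunks. The upstream event shape (`text_delta` / `stop` dicts) is a hypothetical stand-in for whatever a provider emits. The output is simplified and omits fields such as `created` and the initial `role` delta. The function name is also an assumption, not the gateway's API.

```python
import json
from typing import Iterable, Iterator


def to_openai_chunks(
    upstream_events: Iterable[dict], completion_id: str, model: str
) -> Iterator[str]:
    """Translate hypothetical upstream events into OpenAI-style SSE lines.

    Assumed upstream shape, for illustration only:
      {"type": "text_delta", "text": "..."}  -> content chunk
      {"type": "stop", "reason": "..."}      -> final chunk with finish_reason
    """
    for event in upstream_events:
        if event["type"] == "text_delta":
            delta, finish = {"content": event["text"]}, None
        elif event["type"] == "stop":
            delta, finish = {}, event.get("reason", "stop")
        else:
            continue  # event types this sketch does not model are dropped
        chunk = {
            "id": completion_id,
            "object": "chat.completion.chunk",
            "model": model,
            "choices": [{"index": 0, "delta": delta, "finish_reason": finish}],
        }
        yield f"data: {json.dumps(chunk)}\n\n"
    yield "data: [DONE]\n\n"  # OpenAI-style end-of-stream sentinel
```

Each chunk pays for a parse, a re-shape, and a re-serialize. That per-chunk cost is the one still marked Pending above.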

Reproducibility

The measurement scripts and config live in the bench directory. Clone, run, compare. If your numbers diverge, open an issue that includes your profile output and hardware details.
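
A quick way to do the "compare" step is to compute the relative difference from the published values. The sketch below compares against the bare pass-through row. The 15% `tolerance` is a hypothetical threshold for deciding what counts as divergent, not a project policy.

```python
from typing import Dict

# Bare pass-through overhead in us, copied from the Gateway overhead table.
# Used read-only, so sharing it as a default argument is safe here.
PUBLISHED_BARE_US = {"P50": 182, "P95": 298, "P99": 421}


def divergence(
    measured: Dict[str, float], published: Dict[str, float] = PUBLISHED_BARE_US
) -> Dict[str, float]:
    """Relative difference (measured - published) / published per percentile."""
    return {k: (measured[k] - published[k]) / published[k] for k in published}


def diverges(
    measured: Dict[str, float],
    published: Dict[str, float] = PUBLISHED_BARE_US,
    tolerance: float = 0.15,
) -> bool:
    """True if any percentile is off by more than `tolerance` (hypothetical threshold)."""
    return any(abs(d) > tolerance for d in divergence(measured, published).values())


print(diverges({"P50": 190, "P95": 310, "P99": 560}))  # True: P99 is ~33% higher
```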