RelayGate · one gateway for every model · v1.x

One gateway. Every model. Every audit.

Stop juggling provider keys. RelayGate is the open-source gateway between your apps and every LLM provider: virtual keys, programmable policy, cost caps, signed audit, and inline rewrites in one place.

For solo devs juggling 4 provider keys · platform teams · compliance teams

Point your apps at RelayGate instead of each model vendor directly. The gateway decides which provider to use, which prompts need redaction, when to fall back to a cheaper model, and which requests should be blocked or rerouted before the model ever sees them.

See pricing →
request flow · live
ContextWorker: ON
request POST /v1/chat/completions inbound: OpenAI · 2.1 KB
ContextWorker cw_pii_scrub · active pre-request · 0.14 ms · 2 redactions
backend anthropic-direct claude-sonnet-4 · streaming
worker ran
worker_idcw_pii_scrub_v2
modepre-request
latency0.14 ms
fields_modified[prompt]
redactions2
receipt
idrcpt_01HY...
ed25519signed
backendanthropic-direct
status200
12+ providers one virtual key per app
·
CEL policy every rule explicit
·
$0 open-source self-host
·
1 gateway every model, every audit
§ demo 01 / 04
Routing proof · 01 / 04

Watch one request get checked, rewritten, and routed before the model sees it.

This is the core RelayGate promise in motion: one request comes in, policy runs, a ContextWorker changes what needs changing, and the signed result moves on without the app needing to know which provider handled it.

liverequest-lifecycle · /v1/chat/completions
scene 1 / 10
total duration: 1220 ms · worker overhead: illustrative · signed events: 2
Request received t = 0.00s
state
event
contextworker
Illustrative. ContextWorker latencies vary with worker implementation. Benchmarks at /benchmarks.
§ demo 02 / 04
One canonical shape · 02 / 04

OpenAI, Anthropic, or Gemini in. ChatRequest out.

RelayGate translates every inbound format to a canonical ChatRequest before routing. Your code doesn't know or care which client protocol arrived. Switch providers without rewrites.

same logical request · three wire formats
inbound translator
inbound wire formatopenai
POST /v1/chat/completions
RelayGate inbound translator
0.03 ms
internal/inbound/openai.go
canonical outputstable
ChatRequest v1
{
  "model":    "<normalized>",
  "messages": "<canonical>",
  "tools":    []
}
output is identical across all three inbound formats
Your code doesn't know which format arrived. Three translators, one internal shape.
§ demo 03 / 04
How we differ · 03 / 04

When all backends are down, we tell you.

Most gateways queue your request internally and hope. RelayGate returns 503 plus Retry-After. Your SDK's retry logic takes over. You stay in control of timing, budget, and fallback.

simulation · all backends: circuit open
idle
idle — press trigger to simulate a full-provider outage
Typical gateway under outage
all backends circuit-open · queueing enabled
silent failures
RelayGate under outage
all backends circuit-open · honest backpressure
honest backpressure
Illustrative outage simulation. Underlying circuit-breaker benchmark: ~18.7 µs, zero allocations, from /benchmarks.
§ demo 04 / 04
One engine · 04 / 04

One expression language. Routing. Policy. Rate limits. Budgets.

CEL for routing decisions. CEL for policy gates. CEL for rate-limits and budget checks. Same engine, same syntax, one thing to learn. Type-checked expressions with predictable error messages.

config · /etc/relaygate/rules.cel
CEL v1
6 / 8 rules active routing · policy · rate-limit
Permitted action surface 6 active
All rules are CEL. One engine handles all four surfaces. Type-checked at config-load.
§ demo 05 / 08
Compose · 05 / 08

Chain ContextWorkers into a pipeline.

Pick from the gallery. Drop them into the request path in any order. The pipeline executes top-to-bottom, each worker can pass, mutate, enrich, block, or branch. Drag to reorder. The total overhead is the sum.

pipelinerelaygate.pipeline.compose
total: 0 workers · 0.00 ms
REQ · as the client sent itraw input
POST /v1/chat/completions
tenant: org_01HX
model:  claude-sonnet-4
tokens_max: 4000
prompt: "Review this PR for compliance. Contact the
         author at [email protected] or 555-12-3456 with
         issues. Reference: PR #4821."
pipeline · executes top → bottomdrop workers below
empty pipeline · request passes through at 0 ms
OUT · what the backend actually receivesafter pipeline
waiting for pipeline run…
auto-loop
worker action logidle
run a pipeline to see what each worker did to the request, step by step.
Workers execute in order. Mutations are passed to the next worker. Block aborts the chain and returns to caller. Every run emits a signed receipt listing every worker hit.
§ demo 06 / 08
Integrations · 06 / 08

Five flows through integrated products. In development

RelayGate is designed to compose with R1, TrueCom, RelayOne, DeepTap, Veritize, Actium, CloudSwarm, and Heroa. The flows below are an illustrative walkthrough of the planned integrations — some hops are live today (TrueCom receipt emission), others are scaffolded and land progressively.

Status (Q3 2026 roadmap): The inline R1 ContextWorker stage, DeepTap private-corpus grounding, and Veritize post-response verification shown in the flows below are illustrative of the integration roadmap. Stages are scaffolded; production wiring lands progressively. See pipeline status →

SaaS customer, fully managed path
LLM client → RelayOne → RelayGate → model. Grounding and verification inline.
step 0 / 0
trace · eventsidle
products in this flow
§ demo 07 / 08
In production · 07 / 08

Live traffic, live decisions. Watch the edge work.

Every request passes through CEL evaluation, ContextWorker execution, routing, and receipt signing. This is a simulated stream of ~8 rps. Click a request to see the full decision tree. Inject chaos to watch the system respond.

liverelaygate.console · tail -f /edge
● 0.0 rps
request stream0 / 0
decision tree
click a request in the stream to inspect its decision tree
inject:
§ demo 08 / 08
Shape · 08 / 08

The shape of one binary.

Reveal the stack. Every capability below ships inside the same ~18 MB statically-compiled binary. No sidecars, no C dependencies, no runtime per-feature install.

inbound translators
3 formats
OpenAI · Anthropic · Gemini
backend drivers
10 providers
OpenAI · Anthropic · Google · Groq · DeepSeek · Together · Mistral · Cohere · OpenRouter · +1 slot
CEL engine
4 surfaces
routing · policy · rate-limits · budgets
ContextWorker runtime
inline exec
scripts · R1 agents · sub-millisecond spawn · ~12 µs
circuit breakers
~18.7 µs
zero allocations · per-backend
receipt signer
Ed25519
TrueCom-compatible · < 0.3 ms
install paths
6 package managers
brew · deb · rpm · tar · Docker · Helm
binary
~18 MB
CGO=0 · no runtime deps · static
Every cell is a feature that ships inside the single binary. No per-feature install, no sidecar. Total router overhead per request: ~238 µs.
Measured, not marketing

What RelayGate actually costs per request.

Published benchmarks. Updated on each release. Reproducible from the repo.

Benchmarks update with each release. Reproduce from the /benchmarks page.
View full benchmarks
What's rare here

Five things that don't exist in the rest of the AI gateway market.

Not features. Specific architectural decisions that make RelayGate a different product, not just a faster one.

01 / 05

ContextWorker: programmable middleware, not dressed-up routing.

Every AI gateway offers routing, caching, and retries. RelayGate is the only one that lets you drop a script or a full R1 agent into the middle of a request, run it inline at sub-millisecond overhead, and shape the request or response before anything reaches your backend. A PII scrub, a code-review pass, an inline tool-call round-trip: all run as ContextWorkers, not as out-of-band webhook callbacks with multi-second latency.

02 / 05

Honest backpressure, by design.

When every backend is down, most gateways queue your request internally and silently consume your timeout budget while trying to recover. RelayGate immediately returns 503 with a Retry-After header. Your SDK's retry logic takes over. You stay in control of timing, fallback, and budget. Silent failure is the feature you did not choose; honest backpressure is the one you wanted.

03 / 05

CEL for everything.

Most AI gateways invent a routing DSL, then a different policy DSL, then a separate rate-limit grammar, then a budget mini-language. RelayGate uses Common Expression Language across all four surfaces. One engine. Type-checked at config-load time, not at request time. Predictable error messages. Your platform team learns one thing.

04 / 05

One binary, CGO=0, no runtime dependencies.

RelayGate is a single statically-compiled Go binary with no C dependencies. Under twenty megabytes. Deploys on bare metal, container, Helm, Docker, or a Raspberry Pi. No glibc incompatibilities, no libstdc++ version pins, no "it worked in dev." The operations story is: ship the binary, set some environment variables, run.

05 / 05

Signed receipts per request, TrueCom-compatible.

Every request produces an Ed25519-signed receipt suitable for audit, billing, or dispute resolution. The receipt format matches TrueCom's commerce substrate, so the same receipt can land in your finance pipeline without a second integration. Compliance and finance see the same signed event.

Each claim corresponds to a measurable capability or a demo above. Benchmarks at /benchmarks.
Integrations · 9 products

RelayGate composes with adjacent products.

RelayGate stands alone. It also turns into something more when it runs alongside these tools. Click a product to see how it plugs into the request path.

products in the suitehover to inspect
RelayGate is open source at the core. Self-host is free forever. Managed and fleet features are commercial.
Pricing

Self-host is free. Managed starts when you want it to.

Full pricing and feature matrix on /pricing. Quick preview below.

Self-host
for teams running their own infrastructure
free forever · Apache 2.0 core
  • Single binary, full feature set
  • ContextWorker, CEL, all 10 backends, all 3 inbound formats
  • Community support, GitHub issues
Download binary
Enterprise
for regulated or fleet-scale deployments
Custom SLA · sovereign deployment options
  • SSO, SCIM, fleet deployment via RelayOne
  • SLA, dedicated success engineer
  • On-prem connect, sovereign deployment options
Contact sales
Pricing is request-rate based. See full schedule on /pricing.

Install RelayGate.

One binary. Every major platform. No runtime dependencies.

brew install relayone/tap/relaygate
No runtime dependencies. CGO=0. Apache 2.0 core, MIT drivers, Managed tier commercial.