RelayGate · one gateway for every model · v1.x

One gateway. Every model. Every audit.

Stop juggling provider keys. RelayGate is the open-source gateway between your apps and every LLM provider: virtual keys, programmable policy, cost caps, signed audit, and inline rewrites in one place.

For solo devs juggling 4 provider keys · platform teams · compliance teams

Point your apps at RelayGate instead of each model vendor directly. The gateway decides which provider to use, which prompts need redaction, when to fall back to a cheaper model, and which requests should be blocked or rerouted before the model ever sees them.

Self-host in 60s · See it route a request ↓

See pricing →

request flow · live

request: POST /v1/chat/completions · inbound: OpenAI · 2.1 KB

ContextWorker: cw_pii_scrub · active · pre-request · 0.14 ms · 2 redactions

backend: anthropic-direct · claude-sonnet-4 · streaming

worker ran

worker_id: cw_pii_scrub_v2

mode: pre-request

latency: 0.14 ms

fields_modified: [prompt]

redactions: 2

receipt

id: rcpt_01HY...

ed25519: signed

backend: anthropic-direct

status: 200
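The panel above reports a pre-request worker that redacted two spans in `prompt`. Here is a rough Go sketch of what such a worker might do. It assumes two simple regular expressions, one for email addresses and one for `NNN-NN-NNNN` style numbers, and it reports the redaction count. `scrubPII`, both patterns, and the `[REDACTED]` token are hypothetical and do not reflect `cw_pii_scrub_v2`'s actual logic. Production PII detection needs far more than two regexes.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical patterns: a loose email matcher and a 3-2-4 digit ID matcher.
var (
	emailRe = regexp.MustCompile(`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`)
	idRe    = regexp.MustCompile(`\b\d{3}-\d{2}-\d{4}\b`)
)

// scrubPII replaces every match with a fixed token and counts the redactions.
func scrubPII(prompt string) (string, int) {
	count := 0
	for _, re := range []*regexp.Regexp{emailRe, idRe} {
		prompt = re.ReplaceAllStringFunc(prompt, func(string) string {
			count++
			return "[REDACTED]"
		})
	}
	return prompt, count
}

func main() {
	out, n := scrubPII("Contact dev@example.com or 555-12-3456. Reference: PR #4821.")
	fmt.Println(n, out) // 2 Contact [REDACTED] or [REDACTED]. Reference: PR #4821.
}
```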

12+ providers · one virtual key per app

CEL policy · every rule explicit

$0 · open-source self-host

1 gateway · every model, every audit

§ demo 01 / 08

Routing proof · 01 / 08

Watch one request get checked, rewritten, and routed before the model sees it.

This is the core RelayGate promise in motion: one request comes in, policy runs, a ContextWorker changes what needs changing, and the signed result moves on without the app needing to know which provider handled it.

live · request-lifecycle · /v1/chat/completions

total duration: 1220 ms · worker overhead: illustrative · signed events: 2

scene 1 / 10 · Request received · t = 0.00 s

Illustrative. ContextWorker latencies vary with worker implementation. Benchmarks at /benchmarks.

§ demo 02 / 08

One canonical shape · 02 / 08

OpenAI, Anthropic, or Gemini in. ChatRequest out.

RelayGate translates every inbound format to a canonical ChatRequest before routing, so routing rules, policy, and ContextWorkers see the same shape whichever client protocol arrived. Switch providers without rewrites.

same logical request · three wire formats

inbound wire format: openai

POST /v1/chat/completions

RelayGate inbound translator · 0.03 ms

internal/inbound/openai.go

canonical output: stable

ChatRequest v1

{
  "model":    "<normalized>",
  "messages": "<canonical>",
  "tools":    []
}

output is identical across all three inbound formats

Three translators, one internal shape.

§ demo 03 / 08

How we differ · 03 / 08

When all backends are down, we tell you.

Most gateways queue your request internally and hope. RelayGate returns 503 Service Unavailable with a Retry-After header, so your SDK's retry logic takes over. You stay in control of timing, budget, and fallback.

simulation · all backends: circuit open

Typical gateway under outage

all backends circuit-open · queueing enabled → silent failures

RelayGate under outage

all backends circuit-open → honest backpressure

Illustrative outage simulation. Underlying circuit-breaker benchmark: ~18.7 µs, zero allocations, from /benchmarks.

§ demo 04 / 08

One engine · 04 / 08

One expression language. Routing. Policy. Rate limits. Budgets.

CEL for routing decisions. CEL for policy gates. CEL for rate limits and budget checks. Same engine, same syntax, one thing to learn. Expressions are type-checked, with predictable error messages.

config · /etc/relaygate/rules.cel

CEL v1

6 / 8 rules active · routing · policy · rate-limit

Permitted action surface · 6 active

All rules are CEL. One engine handles all four surfaces. Type-checked at config-load.

§ demo 05 / 08

Compose · 05 / 08

Chain ContextWorkers into a pipeline.

Pick workers from the gallery and drop them into the request path in any order. The pipeline executes top to bottom; each worker can pass, mutate, enrich, block, or branch. Drag to reorder. Total overhead is the sum of the individual worker latencies.

pipeline · relaygate.pipeline.compose

total: 0 workers · 0.00 ms

library · 10 workers

REQ · as the client sent it · raw input

POST /v1/chat/completions
tenant: org_01HX
model:  claude-sonnet-4
tokens_max: 4000
prompt: "Review this PR for compliance. Contact the
         author at [email protected] or 555-12-3456 with
         issues. Reference: PR #4821."

pipeline · executes top → bottom

empty pipeline · request passes through at 0 ms

OUT · what the backend actually receives · after pipeline

worker action log · what each worker did to the request, step by step

Workers execute in order, and each worker's mutations pass to the next. A block aborts the chain and returns to the caller. Every run emits a signed receipt listing every worker that ran.

§ demo 06 / 08

Integrations · 06 / 08

Five flows through integrated products (in development)

RelayGate is designed to compose with R1, TrueCom, RelayOne, DeepTap, Veritize, Actium, CloudSwarm, and Heroa. The flows below are an illustrative walkthrough of the planned integrations: some hops are live today (TrueCom receipt emission), while others are scaffolded and land progressively.

Status (Q3 2026 roadmap): The inline R1 ContextWorker stage, DeepTap private-corpus grounding, and Veritize post-response verification shown in the flows below are illustrative of the integration roadmap. Stages are scaffolded; production wiring lands progressively. See pipeline status →

SaaS customer, fully managed path

LLM client → RelayOne → RelayGate → model. Grounding and verification inline.


§ demo 07 / 08

In production · 07 / 08

Live traffic, live decisions. Watch the edge work.

Every request passes through CEL evaluation, ContextWorker execution, routing, and receipt signing. This is a simulated stream of ~8 rps. Click a request to see the full decision tree. Inject chaos to watch the system respond.

live · relaygate.console · tail -f /edge


§ demo 08 / 08

Shape · 08 / 08

The shape of one binary.

Every capability below ships inside the same ~18 MB statically compiled binary. No sidecars, no C dependencies, no per-feature runtime installs.

inbound translators: 3 formats (OpenAI · Anthropic · Gemini)

backend drivers: 10 providers (OpenAI · Anthropic · Google · Groq · DeepSeek · Together · Mistral · Cohere · OpenRouter · +1 slot)

CEL engine: 4 surfaces (routing · policy · rate limits · budgets)

ContextWorker runtime: inline exec (scripts · R1 agents · sub-millisecond spawn, ~12 µs)

circuit breakers: ~18.7 µs (zero allocations · per-backend)

receipt signer: Ed25519 (TrueCom-compatible · < 0.3 ms)

install paths: 6 options (brew · deb · rpm · tar · Docker · Helm)

binary: ~18 MB (CGO=0 · no runtime deps · static)

Every cell is a feature that ships inside the single binary. No per-feature install, no sidecar. Total router overhead per request: ~238 µs.

Measured, not marketing

What RelayGate actually costs per request.

Published benchmarks, updated with each release and reproducible from the repo. Full results on /benchmarks.

What's rare here

Five things that don't exist in the rest of the AI gateway market.

Not features. Specific architectural decisions that make RelayGate a different product, not just a faster one.

01 / 05

ContextWorker: programmable middleware, not dressed-up routing.

Every AI gateway offers routing, caching, and retries. RelayGate is the only one that lets you drop a script or a full R1 agent into the middle of a request, run it inline at sub-millisecond overhead, and shape the request before it reaches your backend or the response before it reaches your app. A PII scrub, a code-review pass, an inline tool-call round-trip: all run as ContextWorkers, not as out-of-band webhook callbacks with multi-second latency.

02 / 05

Honest backpressure, by design.

When every backend is down, most gateways queue your request internally and silently consume your timeout budget while trying to recover. RelayGate immediately returns 503 with a Retry-After header. Your SDK's retry logic takes over. You stay in control of timing, fallback, and budget. Silent failure is the feature you did not choose; honest backpressure is the one you wanted.

03 / 05

CEL for everything.

Most AI gateways invent a routing DSL, then a different policy DSL, then a separate rate-limit grammar, then a budget mini-language. RelayGate uses Common Expression Language across all four surfaces. One engine. Type-checked at config-load time, not at request time. Predictable error messages. Your platform team learns one thing.

04 / 05

One binary, CGO=0, no runtime dependencies.

RelayGate is a single statically compiled Go binary with no C dependencies, under twenty megabytes. It deploys on bare metal, in Docker, on Kubernetes via Helm, or on a Raspberry Pi. No glibc incompatibilities, no libstdc++ version pins, no "it worked in dev." The operations story is: ship the binary, set some environment variables, run.

05 / 05

Signed receipts per request, TrueCom-compatible.

Every request produces an Ed25519-signed receipt suitable for audit, billing, or dispute resolution. The receipt format matches TrueCom's commerce substrate, so the same receipt can land in your finance pipeline without a second integration. Compliance and finance see the same signed event.

Each claim corresponds to a measurable capability or a demo above. Benchmarks at /benchmarks.

Integrations · 9 products

RelayGate composes with adjacent products.

RelayGate stands alone, and it gains capabilities when it runs alongside these tools. Select a product to see how it plugs into the request path.

products in the suite

RelayGate is open source at the core. Self-host is free forever. Managed and fleet features are commercial.

Pricing

Self-host is free. Managed starts when you want it to.

Full pricing and feature matrix on /pricing. Quick preview below.

Self-host

for teams running their own infrastructure

free forever · Apache 2.0 core

Single binary, full feature set
ContextWorker, CEL, all 10 backends, all 3 inbound formats
Community support, GitHub issues

Download binary

Managed

for teams that want someone else to run it

Managed deployment, health monitoring, credential rotation
Managed ContextWorker library, priority support
Usage dashboards, quarterly review

Start managed trial

Enterprise

for regulated or fleet-scale deployments

Custom SLA · sovereign deployment options

SSO, SCIM, fleet deployment via RelayOne
SLA, dedicated success engineer
On-prem connect, sovereign deployment options

Contact sales

Pricing is request-rate based. See full schedule on /pricing.

Install RelayGate.

One binary. Every major platform. No runtime dependencies.

brew install relayone/tap/relaygate

CGO=0. Apache 2.0 core, MIT-licensed drivers, commercial Managed tier.

Read the docs · Browse the repo