Limits & Timeouts
The timeout budget is deliberately generous: a healthy stream is not cut off before the 60-minute absolute cap.
Timeouts
| Stage | Budget |
|---|---|
| Upstream connect | 10 s |
| Non-streaming request, total | 10 min |
| Streaming: time to first token | 5 min |
| Streaming: inter-chunk idle | 120 s |
| Streaming: absolute stream cap | 60 min |
| SSE keep-alive comments | every 15 s |
Keep-alive comments (: keep-alive) are emitted every 15 seconds so that intermediate load
balancers and proxies never close a slow-but-healthy stream as idle. Set your own
client and proxy read timeouts above 120 s for streaming requests.
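On the client, the `: keep-alive` lines arrive as SSE comment lines and must be skipped rather than treated as data. Below is a minimal sketch in Python. It assumes your HTTP client already yields the stream as decoded text lines with line endings stripped. `iter_sse_data` is a hypothetical helper, not part of any SDK, and it implements only the subset of SSE parsing needed here: comments, `data:` fields, and blank-line dispatch.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete SSE event (simplified parser).

    Assumes `lines` yields decoded lines with line endings stripped.
    Comment lines (starting with ":"), such as ": keep-alive", are skipped.
    """
    data_parts: list[str] = []
    for line in lines:
        if line.startswith(":"):
            continue  # comment / keep-alive: keeps the connection busy, carries no data
        if line == "":
            # A blank line dispatches the event accumulated so far.
            if data_parts:
                yield "\n".join(data_parts)
                data_parts = []
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # SSE strips a single leading space from the value
        if field == "data":
            data_parts.append(value)  # multiple data lines join with "\n"
    # An event not terminated by a blank line is discarded, as in the SSE spec.
```

Pass it the line iterator from your HTTP client and keep that client's read timeout above 120 s. The keep-alives stop the connection's read timer from expiring but never reach your application code.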
Exceeding a timeout returns provider_timeout (HTTP 504) in your ingress format’s
error shape — see Errors.
Request size
| Limit | Value |
|---|---|
| Request body (incl. base64 files/PDFs/images) | 50 MB |
JSON depth and size are validated before parsing. Oversized bodies fail with
payload_too_large (HTTP 413) and structurally invalid bodies with invalid_request
(HTTP 422 or 400); both are returned in your format’s native error envelope. Responses
and downloads have no size limit; streaming them is bounded only by the 60-minute stream cap.
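Base64 encodes every 3 raw bytes as 4 characters, so the 50 MB body limit holds at most about 37.5 MB of raw file data, minus the rest of the JSON. The minimal pre-flight check below assumes "50 MB" means 50 × 10^6 bytes (use 50 × 2^20 if the limit is counted in MiB). `base64_len` and `body_fits` are hypothetical helpers.

```python
import json

# Assumption: "50 MB" = 50 * 10**6 bytes; use 50 * 2**20 if the gateway counts MiB.
BODY_LIMIT_BYTES = 50 * 10**6


def base64_len(raw_len: int) -> int:
    """Length of the standard padded base64 encoding of raw_len bytes."""
    return 4 * ((raw_len + 2) // 3)


def body_fits(body: dict, limit: int = BODY_LIMIT_BYTES) -> bool:
    """Return True if the body, serialized as compact JSON, is within `limit` bytes.

    If your client serializes differently (e.g. with spaces), measure the
    bytes it actually sends instead.
    """
    encoded = json.dumps(body, separators=(",", ":")).encode("utf-8")
    return len(encoded) <= limit
```

Under that assumption, a 36,000,000-byte PDF encodes to `base64_len(36_000_000) == 48_000_000` bytes and fits with some headroom. A 38,000,000-byte file encodes to 50,666,668 bytes and would be rejected with payload_too_large.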
Rate limits
Separately from per-key spend limits, each key is also subject to an abuse guard:
| Limit | Default |
|---|---|
| Sustained | 60 requests/min |
| Burst | 120 requests |
Exceeding it returns rate_limit_exceeded (HTTP 429) with a Retry-After header.
The guard is a token bucket enforced globally across all API instances, not per instance.
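As a token bucket, the sustained rate of 60 requests/min is a refill of 1 token per second, and the burst of 120 is the bucket's capacity. A full bucket therefore allows 120 requests at once. After that, requests succeed at 1 per second, and an empty bucket takes 120 s to refill. The sketch below mirrors this on the client so you can throttle before the gateway returns 429. The `TokenBucket` class is hypothetical, and the assumption that the gateway's bucket starts full is unverified.

```python
import time
from typing import Callable


class TokenBucket:
    """Client-side token bucket: capacity `burst`, refilled at `rate_per_s` tokens/s.

    Defaults mirror the documented per-key guard (60 requests/min, burst 120).
    Starting full is an assumption about the gateway's own bucket.
    """

    def __init__(
        self,
        rate_per_s: float = 60 / 60,
        burst: float = 120,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate_per_s
        self.capacity = burst
        self.tokens = burst  # assumed to start full
        self.clock = clock
        self.updated = clock()

    def _refill(self) -> None:
        # Add tokens for the elapsed time, never exceeding capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now

    def try_acquire(self) -> bool:
        """Take one token if available; return False instead of blocking."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def wait_time(self) -> float:
        """Seconds until one token is available (0.0 if one is available now)."""
        self._refill()
        return max(0.0, (1 - self.tokens) / self.rate)
```

Because the gateway enforces the limit per key across all instances, use one bucket per key within a process. Multiple processes sharing a key would need coordination that this sketch does not provide.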
Gateway retries (on our side)
For transient upstream failures (connection errors, or a 5xx/429 response received before
any byte of a streamed response has reached you), the gateway retries at most twice (up to
three attempts in total) with jittered backoff, honoring the upstream Retry-After header.
Mid-stream failures are never retried: you receive the stream’s error event instead of
silently duplicated content.
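The gateway already retries transient upstream failures, so any retries you add run on top of those and should stay few. The minimal client-side policy below makes two assumptions: you retry only rate_limit_exceeded (429) and provider_timeout (504), and only for requests that failed before any streamed content reached you. `retry_delay` and its parameters are hypothetical. It honors a numeric Retry-After and otherwise uses exponential backoff with full jitter. Retry-After may also be an HTTP-date, which this sketch does not parse; it treats that case as if the header were absent.

```python
import random
from typing import Optional

# Assumption: only these statuses are worth retrying from the client side.
# payload_too_large (413) and invalid_request (400/422) fail the same way on resend.
RETRYABLE_STATUSES = {429, 504}  # rate_limit_exceeded, provider_timeout


def retry_delay(
    failed_attempts: int,
    status: int,
    retry_after: Optional[str] = None,
    *,
    max_attempts: int = 3,
    base_s: float = 1.0,
    cap_s: float = 30.0,
    rng: Optional[random.Random] = None,
) -> Optional[float]:
    """Seconds to wait before the next attempt, or None to give up.

    failed_attempts counts attempts that have already failed (1 after the first).
    """
    if status not in RETRYABLE_STATUSES or failed_attempts >= max_attempts:
        return None
    if retry_after is not None:
        try:
            # A numeric Retry-After (delay in seconds) takes precedence.
            return max(0.0, float(int(retry_after.strip())))
        except ValueError:
            pass  # e.g. an HTTP-date: fall back to computed backoff
    rng = rng if rng is not None else random.Random()
    # Full jitter: uniform over [0, min(cap, base * 2^(n-1))].
    ceiling = min(cap_s, base_s * 2 ** (failed_attempts - 1))
    return rng.uniform(0.0, ceiling)
```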