Limits & Timeouts

The timeout budget is deliberately generous: a healthy stream is cut only if it reaches the 60-minute absolute stream cap.

Timeouts

Stage	Budget
Upstream connect	10 s
Non-streaming request, total	10 min
Streaming: time to first token	5 min
Streaming: inter-chunk idle	120 s
Streaming: absolute stream cap	60 min
SSE keep-alive comments	every 15 s

Keep-alive comments (: keep-alive) are emitted every 15 seconds so intermediate load balancers and proxies never idle-close a slow-but-healthy stream. Configure your own client and proxy read timeouts above 120 s for streams.
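Because keep-alive comments arrive as SSE comment lines (lines beginning with a colon), a client that parses the stream by hand has to skip them rather than treat them as data. The following is a minimal sketch of such a parser, not an official client; the function name iter_sse_data is hypothetical, and it handles only the data field.

```python
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event, skipping comment lines.

    Hypothetical helper for illustration; only the "data" field is handled.
    """
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            # Comment line such as ": keep-alive"; it carries no payload.
            continue
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        field, _, value = line.partition(":")
        if field == "data":
            # SSE strips a single leading space after the colon.
            buffer.append(value[1:] if value.startswith(" ") else value)


if __name__ == "__main__":
    sample = [": keep-alive\n", "data: hello\n", "\n"]
    for payload in iter_sse_data(sample):
        print(payload)
```

In a real client, the lines would come from the HTTP response body, with the read timeout set above 120 s as described above.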

Exceeding a timeout returns provider_timeout (HTTP 504) in your ingress format’s error shape — see Errors.

Request size

Limit	Value
Request body (incl. base64 files/PDFs/images)	50 MB

JSON depth and size are validated before parsing. Oversized bodies fail with payload_too_large (HTTP 413), and structurally invalid bodies fail with invalid_request (HTTP 422 or 400), always in your format’s native error envelope. Response and download size is unbounded; a streamed response is limited only by the 60-minute absolute stream cap.
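Since the 50 MB limit counts base64-encoded attachments, it can help to estimate the encoded size before sending: base64 turns every 3 input bytes into 4 output characters. The sketch below is a rough client-side pre-check under two assumptions: that "50 MB" means 50,000,000 bytes (the page does not say whether the unit is decimal or binary), and that attachments use standard padded base64. The helper names are hypothetical.

```python
from typing import List

# Assumption: decimal megabytes. The docs only say "50 MB".
MAX_BODY_BYTES = 50 * 1000 * 1000


def base64_len(raw_bytes: int) -> int:
    """Length in bytes of standard padded base64 for raw_bytes of input."""
    return 4 * ((raw_bytes + 2) // 3)


def fits_in_body(file_sizes: List[int], json_overhead: int = 0) -> bool:
    """Estimate whether the encoded files plus other JSON fit in the limit.

    json_overhead is the caller's estimate of the non-file part of the body.
    """
    total = json_overhead + sum(base64_len(n) for n in file_sizes)
    return total <= MAX_BODY_BYTES


if __name__ == "__main__":
    # A 40 MB raw file grows to roughly 53.3 MB once encoded.
    print(base64_len(40_000_000), fits_in_body([40_000_000]))
```

In practice this means the largest single raw file that fits is about 37 MB, less whatever the rest of the JSON body uses.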

Rate limits

Distinct from per-key spend limits, an abuse guard applies per key:

Limit	Default
Sustained	60 requests/min
Burst	120 requests

Exceeding it returns rate_limit_exceeded (HTTP 429) with a Retry-After header. Limits are enforced globally across all API instances (token bucket), not per instance.
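To see how the two numbers in the table interact, here is a small token-bucket model. It is an illustration only, assuming the bucket holds up to 120 tokens (the burst value), refills at 1 token per second (60 requests/min), and starts full; the gateway's actual implementation is not described on this page. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    capacity: float = 120.0      # assumed to equal the burst value
    refill_per_sec: float = 1.0  # 60 requests/min sustained
    tokens: float = 120.0        # assumption: bucket starts full
    last: float = 0.0            # timestamp of the previous call, in seconds

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then try to spend one token."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

    def retry_after(self) -> float:
        """Seconds until one token is available (0.0 if one already is)."""
        deficit = max(0.0, 1.0 - self.tokens)
        return deficit / self.refill_per_sec


if __name__ == "__main__":
    bucket = TokenBucket()
    allowed = sum(bucket.allow(0.0) for _ in range(130))
    print(allowed, bucket.retry_after())
```

Under these assumptions, a key that has been idle can send 120 requests at once, after which requests succeed at one per second. A client that receives HTTP 429 should wait for the duration given in Retry-After rather than compute its own estimate.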

Gateway retries (on our side)

For transient upstream failures — connect errors and 5xx/429 before any byte of a streamed response — the gateway retries at most 2 times with jittered backoff, honoring upstream Retry-After. Mid-stream failures are never retried; you’ll receive the stream’s error event instead of silently duplicated content.
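The retry rule above can be read as a small decision function plus a delay calculation. The sketch below models it for illustration and is not the gateway's code: the backoff base, cap, and full-jitter strategy are assumptions (the text only says "jittered backoff"), and the function names are hypothetical.

```python
import random
from typing import Optional

MAX_RETRIES = 2  # "at most 2 times", from the text above


def should_retry(retries_done: int, status: Optional[int], bytes_streamed: int) -> bool:
    """Decide whether a failed upstream attempt is eligible for a retry.

    status is None for a connect error. Nothing is retried once any
    byte of the response has been streamed.
    """
    if retries_done >= MAX_RETRIES or bytes_streamed > 0:
        return False
    return status is None or status == 429 or 500 <= status <= 599


def backoff_delay(
    retries_done: int,
    retry_after: Optional[float] = None,
    base: float = 0.5,   # assumed base delay in seconds
    cap: float = 8.0,    # assumed upper bound in seconds
    rng: Optional[random.Random] = None,
) -> float:
    """Full-jitter exponential backoff that never undercuts Retry-After."""
    rng = rng or random.Random()
    jittered = rng.uniform(0.0, min(cap, base * (2 ** retries_done)))
    if retry_after is not None:
        return max(retry_after, jittered)
    return jittered


if __name__ == "__main__":
    print(should_retry(0, 503, 0), should_retry(0, 503, 1024))
```

The same shape works for client-side retries, with one caveat from the text: a failure after streaming has started is reported as an error event, and the application has to decide whether a fresh request is safe.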