Models & Routing
Model slugs
HyperInfer uses the same model slugs as OpenRouter — provider/model, e.g.
anthropic/claude-sonnet-4.5 or openai/gpt-4o-mini. If you are migrating from
OpenRouter, your model strings work unchanged.
The live catalog — pricing, context length, max output tokens, supported features,
datacenter locations — is published at GET /api/v1/models
(no auth required) and rendered on the Pricing page.
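As a minimal sketch of the two points above, the snippet below splits a slug into its provider and model parts and builds the unauthenticated catalog request. The host `https://gateway.example.com` is a placeholder and the helper names `split_slug` and `catalog_request` are hypothetical; only the `provider/model` slug shape and the `/api/v1/models` path come from this page. The response schema is not specified here, so the sketch does not attempt to parse it.

```python
import urllib.request

# Placeholder host (assumption): substitute your actual gateway base URL.
BASE_URL = "https://gateway.example.com"


def split_slug(slug: str) -> tuple[str, str]:
    """Split a 'provider/model' slug into (provider, model)."""
    provider, sep, model = slug.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {slug!r}")
    return provider, model


def catalog_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the catalog request; no auth header is attached."""
    return urllib.request.Request(f"{base_url}/api/v1/models", method="GET")
```

Passing the result of `catalog_request()` to `urllib.request.urlopen` would send the request once the placeholder host is replaced.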
Any format, any model
The gateway decouples the ingress format (how you talk to us) from the model (who serves the request):
Your request (any ingress format)
│
▼
canonical internal representation (IR)
│ middleware chain (server tools, metering)
▼
provider's native format ──▶ model ──▶ provider response
│
▼
translated back into YOUR ingress format (incl. the stream)

A Chat Completions request for anthropic/claude-sonnet-4.5 is translated to the
provider’s native format upstream and comes back as a well-formed Chat Completions
response. Identity translations (ingress format = provider format) pass through the
same IR and middleware path, so behavior is uniform.
What round-trips across formats
The IR preserves, per ingress format: system/developer messages, multi-turn roles, tool
definitions + calls + results, image and file inputs, reasoning content where the format
expresses it, sampling parameters (temperature, top_p, max output tokens, stop),
JSON/response-format mode, and streaming deltas.
How each capability is expressed per format:
| Capability | Chat Completions | Responses | Anthropic Messages |
|---|---|---|---|
| System prompt | system / developer message | instructions | top-level system |
| Multi-turn roles | messages[] | input[] items | messages[] |
| Tool definitions | tools[].function | tools[] | tools[] (input_schema) |
| Tool calls | assistant tool_calls | function_call output items | tool_use content blocks |
| Tool results | role: "tool" messages | function_call_output items | tool_result content blocks |
| Images | image_url content parts | input_image | image content blocks |
| Files (e.g. PDF) | file content parts | input_file | document content blocks |
| Reasoning content | not expressible — see gaps | reasoning output items | thinking content blocks |
| Max output | max_tokens / max_completion_tokens | max_output_tokens | max_tokens (required) |
| Sampling | temperature, top_p | temperature, top_p | temperature, top_p |
| Stop sequences | stop | not expressible — see gaps | stop_sequences |
| JSON mode | response_format | text.format | not expressible — see gaps |
| Streaming | chat.completion.chunk SSE | semantic SSE events | Anthropic event stream |
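
To make a few rows of the table concrete, here is a toy sketch that carries a text-only Chat Completions request through an intermediate structure and out as an Anthropic Messages request. The `IR` dataclass, the function names, and the `default_max_tokens` fallback are all assumptions made for illustration; the gateway's real IR is not described on this page. Only the field mappings (system prompt, max output, stop sequences, sampling) are taken from the table. The model string is passed through untouched, since how slugs map to upstream model names is not covered here.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class IR:
    """Toy stand-in for the canonical representation (hypothetical shape)."""
    model: str
    system: Optional[str] = None
    turns: list = field(default_factory=list)
    max_output_tokens: Optional[int] = None
    stop: list = field(default_factory=list)
    sampling: dict = field(default_factory=dict)


def chat_to_ir(req: dict[str, Any]) -> IR:
    """Chat Completions request -> IR (text content only)."""
    system_parts, turns = [], []
    for msg in req["messages"]:
        if msg["role"] in ("system", "developer"):
            system_parts.append(msg["content"])
        else:
            turns.append({"role": msg["role"], "content": msg["content"]})
    stop = req.get("stop") or []
    return IR(
        model=req["model"],
        system="\n\n".join(system_parts) or None,
        turns=turns,
        max_output_tokens=req.get("max_completion_tokens", req.get("max_tokens")),
        # Chat Completions accepts a single string or a list for `stop`.
        stop=[stop] if isinstance(stop, str) else list(stop),
        sampling={k: req[k] for k in ("temperature", "top_p") if k in req},
    )


def ir_to_messages(ir: IR, default_max_tokens: int = 1024) -> dict[str, Any]:
    """IR -> Anthropic Messages request; max_tokens is required in that format."""
    out: dict[str, Any] = {
        "model": ir.model,
        "messages": ir.turns,
        # The fallback value is an assumption, not documented gateway behavior.
        "max_tokens": ir.max_output_tokens or default_max_tokens,
        **ir.sampling,
    }
    if ir.system:
        out["system"] = ir.system  # top-level field, not a message
    if ir.stop:
        out["stop_sequences"] = ir.stop
    return out
```

Routing every request through one intermediate shape, including same-format requests, is what lets middleware be written once rather than per format pair.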
Capability gaps degrade explicitly
Where a capability cannot be expressed in the ingress format, the gap is documented and explicit — content is never silently dropped:
- Reasoning content in Chat Completions: the format has no standard field for
  reasoning output. Reasoning tokens are still counted and reported in
  usage; the reasoning text itself is only surfaced by formats that can express it (Responses, Messages).
- Stop sequences in Responses: the Responses format does not carry stop sequences;
  sending them there is rejected as invalid_request rather than ignored.
- JSON mode in Messages: the Anthropic format has no native JSON response mode; use
  a tool definition with an input_schema to enforce structured output.
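
The last two gaps can be sketched in a few lines. This is an illustration under stated assumptions, not the gateway's code: `GatewayError`, `check_responses_request`, and `json_via_tool` are hypothetical names, and the sketch assumes a client would attempt to send stop sequences under a `stop` key. The forced `tool_choice` of type `tool` follows the Anthropic Messages tool-use convention; the tool name and description are placeholders.

```python
from typing import Any


class GatewayError(Exception):
    """Hypothetical error type carrying an API-style error code."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(message)
        self.code = code


def check_responses_request(req: dict[str, Any]) -> None:
    """Reject stop sequences on Responses ingress instead of ignoring them."""
    if "stop" in req:  # field name is an assumption
        raise GatewayError(
            "invalid_request",
            "stop sequences are not expressible in the Responses format",
        )


def json_via_tool(name: str, schema: dict[str, Any]) -> dict[str, Any]:
    """Messages-format fields that force a tool call whose input follows `schema`."""
    return {
        "tools": [
            {
                "name": name,
                "description": "Return the structured result.",
                "input_schema": schema,
            }
        ],
        # Forcing this tool means the reply arrives as a tool_use block.
        "tool_choice": {"type": "tool", "name": name},
    }
```

Merging the result of `json_via_tool` into a Messages request means the structured output arrives as the `input` of a `tool_use` content block rather than as free text.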
The capability table above summarizes the translation contract. The authoritative, exhaustive mapping tables are maintained alongside the gateway’s golden-fixture test matrix and the generated OpenAPI schema.