Models & Routing

Model slugs

HyperInfer uses the same model slugs as OpenRouter — provider/model, e.g. anthropic/claude-sonnet-4.5 or openai/gpt-4o-mini. If you are migrating from OpenRouter, your model strings work unchanged.

The live catalog — pricing, context length, max output tokens, supported features, datacenter locations — is published at GET /api/v1/models (no auth required) and rendered on the Pricing page.

Any format, any model

The gateway decouples the ingress format (how you talk to us) from the model (who serves the request):


Your request (any ingress format)
        │
        ▼
  canonical internal representation (IR)
        │  middleware chain (server tools, metering)
        ▼
  provider's native format ──▶ model ──▶ provider response
        │
        ▼
  translated back into YOUR ingress format (incl. the stream)

A Chat Completions request for anthropic/claude-sonnet-4.5 is translated to the provider’s native format upstream and comes back as a well-formed Chat Completions response. Identity translations (ingress format = provider format) pass through the same IR and middleware path, so behavior is uniform.

What round-trips across formats

The IR preserves, per ingress format: system/developer messages, multi-turn roles, tool definitions + calls + results, image and file inputs, reasoning content where the format expresses it, sampling parameters (temperature, top_p, max output tokens, stop), JSON/response-format mode, and streaming deltas.

How each capability is expressed per format:

Capability	Chat Completions	Responses	Anthropic Messages
System prompt	`system` / `developer` message	`instructions`	top-level `system`
Multi-turn roles	`messages[]`	`input[]` items	`messages[]`
Tool definitions	`tools[].function`	`tools[]`	`tools[]` (`input_schema`)
Tool calls	assistant `tool_calls`	`function_call` output items	`tool_use` content blocks
Tool results	`role: "tool"` messages	`function_call_output` items	`tool_result` content blocks
Images	`image_url` content parts	`input_image`	`image` content blocks
Files (e.g. PDF)	`file` content parts	`input_file`	`document` content blocks
Reasoning content	not expressible — see gaps	`reasoning` output items	`thinking` content blocks
Max output	`max_tokens` / `max_completion_tokens`	`max_output_tokens`	`max_tokens` (required)
Sampling	`temperature`, `top_p`	`temperature`, `top_p`	`temperature`, `top_p`
Stop sequences	`stop`	not expressible — see gaps	`stop_sequences`
JSON mode	`response_format`	`text.format`	not expressible — see gaps
Streaming	`chat.completion.chunk` SSE	semantic SSE events	Anthropic event stream

Capability gaps degrade explicitly

Where a capability cannot be expressed in the ingress format, the gap is documented and explicit — content is never silently dropped:

Reasoning content in Chat Completions: the format has no standard field for reasoning output. Reasoning tokens are still counted and reported in usage; the reasoning text itself is only surfaced by formats that can express it (Responses, Messages).
Stop sequences in Responses: the Responses format does not carry stop sequences; sending them there is rejected as invalid_request rather than ignored.
JSON mode in Messages: the Anthropic format has no native JSON response mode; use a tool definition with an input_schema to enforce structured output.

This table summarizes the translation contract. The authoritative, exhaustive mapping tables are maintained alongside the gateway’s golden-fixture test matrix and the generated OpenAPI schema.