Skip to Content
Models & Routing

Models & Routing

Model slugs

HyperInfer uses the same model slugs as OpenRouterprovider/model, e.g. anthropic/claude-sonnet-4.5 or openai/gpt-4o-mini. If you are migrating from OpenRouter, your model strings work unchanged.

The live catalog — pricing, context length, max output tokens, supported features, datacenter locations — is published at GET /api/v1/models (no auth required) and rendered on the Pricing page.

Any format, any model

The gateway decouples the ingress format (how you talk to us) from the model (who serves the request):

Your request (any ingress format) canonical internal representation (IR) │ middleware chain (server tools, metering) provider's native format ──▶ model ──▶ provider response translated back into YOUR ingress format (incl. the stream)

A Chat Completions request for anthropic/claude-sonnet-4.5 is translated to the provider’s native format upstream and comes back as a well-formed Chat Completions response. Identity translations (ingress format = provider format) pass through the same IR and middleware path, so behavior is uniform.

What round-trips across formats

The IR preserves, per ingress format: system/developer messages, multi-turn roles, tool definitions + calls + results, image and file inputs, reasoning content where the format expresses it, sampling parameters (temperature, top_p, max output tokens, stop), JSON/response-format mode, and streaming deltas.

How each capability is expressed per format:

CapabilityChat CompletionsResponsesAnthropic Messages
System promptsystem / developer messageinstructionstop-level system
Multi-turn rolesmessages[]input[] itemsmessages[]
Tool definitionstools[].functiontools[]tools[] (input_schema)
Tool callsassistant tool_callsfunction_call output itemstool_use content blocks
Tool resultsrole: "tool" messagesfunction_call_output itemstool_result content blocks
Imagesimage_url content partsinput_imageimage content blocks
Files (e.g. PDF)file content partsinput_filedocument content blocks
Reasoning contentnot expressible — see gapsreasoning output itemsthinking content blocks
Max outputmax_tokens / max_completion_tokensmax_output_tokensmax_tokens (required)
Samplingtemperature, top_ptemperature, top_ptemperature, top_p
Stop sequencesstopnot expressible — see gapsstop_sequences
JSON moderesponse_formattext.formatnot expressible — see gaps
Streamingchat.completion.chunk SSEsemantic SSE eventsAnthropic event stream

Capability gaps degrade explicitly

Where a capability cannot be expressed in the ingress format, the gap is documented and explicit — content is never silently dropped:

  • Reasoning content in Chat Completions: the format has no standard field for reasoning output. Reasoning tokens are still counted and reported in usage; the reasoning text itself is only surfaced by formats that can express it (Responses, Messages).
  • Stop sequences in Responses: the Responses format does not carry stop sequences; sending them there is rejected as invalid_request rather than ignored.
  • JSON mode in Messages: the Anthropic format has no native JSON response mode; use a tool definition with an input_schema to enforce structured output.

This table summarizes the translation contract. The authoritative, exhaustive mapping tables are maintained alongside the gateway’s golden-fixture test matrix and the generated OpenAPI schema.