Chat Completions
POST /api/v1/chat/completions
OpenAI Chat Completions format. Works with every model in the catalog, including Anthropic-slugged models, via any-to-any translation.
Request body
Body — application/json (required)
model (string, required)
messages (object[], required)
  role (string, required). Possible values: system, developer, user, assistant, tool
  content (string | (text | image_url | file)[] | null). Either a string or an array of parts, where each part is one of:
    text (object)
      type ("text", required)
      text (string, required)
    image_url (object)
      type ("image_url", required)
      image_url (object, required)
        url (string, required)
        detail (string)
    file (object)
      type ("file", required)
      file (object, required)
        filename (string)
        file_data (string)
        file_url (string)
        file_id (string)
  name (string)
  reasoning (string | null)
  tool_calls (object[])
    id (string, required)
    type ("function", required)
    function (object, required)
      name (string, required)
      arguments (string, required)
  tool_call_id (string)
tools (object[])
  type ("function", required)
  function (object, required)
    name (string, required)
    description (string)
    parameters (object)
tool_choice (string | function). One of:
  string. Possible values: auto, none, required
  function (object)
    type ("function", required)
    function (object, required)
      name (string, required)
max_tokens (integer)
max_completion_tokens (integer)
temperature (number)
top_p (number)
stop (string | string[])
response_format (object)
  type (string, required). Possible values: text, json_object, json_schema
  json_schema (object)
    name (string, required)
    schema (object, required)
    strict (boolean)
reasoning_effort (string). Possible values: low, medium, high
stream (boolean)
stream_options (object)
  include_usage (boolean)
plugins (object[])
  id (string, required). Possible values: web_search, web_fetch, pdf, datetime, image
  events (boolean)
hi_tool_events (boolean)
user (string)
Generated at build time from the API's OpenAPI document (the same schemas that validate requests), so this section cannot drift from the API.
max_tokens and max_completion_tokens are aliases; either one caps the number of
generated tokens, up to the model’s max output. stream: true switches the
response to SSE (see Streaming). stream_options is accepted for compatibility,
but the final usage chunk is always emitted regardless of its value. plugins is
the HyperInfer extension for server tools.
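As a rough illustration of how the fields above fit together, the sketch below assembles a request body in Python using only the standard library. The helper name build_chat_request and the placeholder key are hypothetical; the URL and headers are the ones used in the curl example on this page. The request is built but not sent.

```python
import json
import urllib.request

# Endpoint taken from the curl example on this page.
API_URL = "https://api.hyperinfer.ai/api/v1/chat/completions"


def build_chat_request(api_key, model, messages, max_tokens=None, stream=False, plugins=None):
    """Assemble (but do not send) a POST request for the chat completions endpoint.

    Hypothetical helper for illustration; it only uses fields listed in the
    request body schema.
    """
    body = {"model": model, "messages": messages}
    if max_tokens is not None:
        # max_completion_tokens is documented as an alias, so one of the two is enough.
        body["max_tokens"] = max_tokens
    if stream:
        body["stream"] = True
    if plugins:
        # plugins is the HyperInfer extension for server tools; each entry needs an id.
        body["plugins"] = [{"id": plugin_id} for plugin_id in plugins]
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    api_key="sk-example",  # placeholder, not a real key
    model="anthropic/claude-sonnet-4.5",
    messages=[{"role": "user", "content": "In one sentence: what is speculative decoding?"}],
    max_tokens=256,
    plugins=["web_search"],
)
payload = json.loads(request.data)
```

Passing the resulting object to urllib.request.urlopen would send it; that step is left out here so the sketch runs without network access or a real key.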
Response
{
  "id": "gen-abc123",
  "object": "chat.completion",
  "created": 1767312000,
  "model": "anthropic/claude-sonnet-4.5",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello there, all five words." },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 7,
    "total_tokens": 19,
    "prompt_tokens_details": { "cached_tokens": 0 },
    "completion_tokens_details": { "reasoning_tokens": 0 }
  }
}
The id has the form gen-<request id>. Pass the request ID (also returned in the
X-Request-Id header) to GET /api/v1/generation for cost and latency metadata.
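A minimal sketch of reading such a response in Python, assuming the body has already been received as text. The function name summarize_completion is hypothetical; the sample body is the example response shown above.

```python
import json

# Sample body copied from the example response on this page.
RAW_RESPONSE = """
{
  "id": "gen-abc123",
  "object": "chat.completion",
  "created": 1767312000,
  "model": "anthropic/claude-sonnet-4.5",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello there, all five words." },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 7,
    "total_tokens": 19,
    "prompt_tokens_details": { "cached_tokens": 0 },
    "completion_tokens_details": { "reasoning_tokens": 0 }
  }
}
"""


def summarize_completion(response):
    """Extract the request ID, reply text, and token usage from a completion."""
    completion_id = response["id"]
    # Documented id format is gen-<request id>; strip the prefix if present.
    prefix = "gen-"
    request_id = completion_id[len(prefix):] if completion_id.startswith(prefix) else completion_id
    choice = response["choices"][0]
    usage = response["usage"]
    return {
        "request_id": request_id,
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": usage["total_tokens"],
        # The *_details objects are optional in the schema, so default to 0.
        "cached_tokens": usage.get("prompt_tokens_details", {}).get("cached_tokens", 0),
    }


summary = summarize_completion(json.loads(RAW_RESPONSE))
```

The extracted request_id is the value to pass to the generation endpoint when looking up cost and latency.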
Response schema
200 Response
The completion. With `stream: true`, an SSE stream instead (see the text/event-stream variant).
application/json
id (string, required): gen-<request id>. Pass the request id to GET /api/v1/generation.
object ("chat.completion", required)
created (integer, required): Unix seconds.
model (string, required)
choices (object[], required)
  index (integer)
  message (object)
    role ("assistant")
    content (string | null)
    reasoning (string)
    tool_calls (object[])
      id (string)
      type ("function")
      function (object)
        name (string)
        arguments (string)
  finish_reason (string). Possible values: stop, length, tool_calls, content_filter
  logprobs (null)
usage (object, required)
  prompt_tokens (integer, required)
  completion_tokens (integer, required)
  total_tokens (integer, required)
  prompt_tokens_details (object)
    cached_tokens (integer)
  completion_tokens_details (object)
    reasoning_tokens (integer)
  cost (number): HyperInfer usage-accounting extension: total cost in credits (USD).
text/event-stream: SSE stream
SSE stream of `chat.completion.chunk` objects (`data:` frames): a role frame, then content/reasoning/tool_call deltas, a finish_reason frame, a final usage-only chunk (always emitted, with stream_options.include_usage semantics; 002-R8), and `data: [DONE]`. Keep-alive comments every 15 s.
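The frame sequence described for the SSE variant can be consumed roughly as follows. This is a sketch under assumptions: the source does not spell out the chunk layout, so the code assumes the usual OpenAI streaming shape (choices[].delta.content, plus an empty choices array on the usage-only chunk). The function name collect_stream and the sample frames are hypothetical and hand-written, not captured from the API.

```python
import json


def collect_stream(lines):
    """Fold SSE lines into the final text, finish_reason, and usage."""
    text_parts = []
    finish_reason = None
    usage = None
    for raw_line in lines:
        line = raw_line.strip()
        # Blank lines separate frames; lines starting with ":" are SSE comments,
        # which is how keep-alives arrive.
        if not line or line.startswith(":"):
            continue
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        if chunk.get("usage"):
            usage = chunk["usage"]
        # Assumed OpenAI-style delta layout; the usage-only chunk has no choices.
        for choice in chunk.get("choices") or []:
            delta = choice.get("delta") or {}
            if delta.get("content"):
                text_parts.append(delta["content"])
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    return "".join(text_parts), finish_reason, usage


# Hand-written sample frames for illustration only.
SAMPLE_STREAM = [
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    ": keep-alive",
    "",
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "Hello"}}]}',
    "",
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": " there"}}]}',
    "",
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}',
    "",
    'data: {"object": "chat.completion.chunk", "choices": [], "usage": {"prompt_tokens": 12, "completion_tokens": 2, "total_tokens": 14}}',
    "",
    "data: [DONE]",
]

text, finish_reason, usage = collect_stream(SAMPLE_STREAM)
```

Reasoning and tool_call deltas are ignored here to keep the sketch short; a real client would accumulate them alongside content.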
error: Error envelope (any non-2xx status)
Error in this ingress format's native envelope, with the stable taxonomy `code` (002-R7). See the ErrorCode schema for the code → HTTP status mapping.
error (object, required)
  message (string, required)
  type (string, required): OpenAI-compatible error class (e.g. invalid_request_error).
  code (string, required): Stable error taxonomy (002-R7), identical across ingress formats. Possible values and the HTTP status for each:
    provider_auth             502
    provider_rate_limit       429
    provider_overloaded       529
    context_length_exceeded   400
    content_filter            400
    provider_timeout          504
    provider_unavailable      502
    insufficient_credits      402
    key_limit_exceeded        402
    model_not_allowed         403
    invalid_api_key           401
    workspace_locked          403
    rate_limit_exceeded       429
    invalid_request           400
    payload_too_large         413
    internal_error            500
  param (string | null)
  request_id (string)
Generated at build time from the API's OpenAPI document (the same schemas that validate requests), so this section cannot drift from the API.
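The code-to-status mapping above translates directly into a lookup table. The sketch below copies the documented values into a Python dict; the names ERROR_STATUS, expected_status, and codes_for_status are hypothetical. It also shows why branching on code is more precise than branching on the HTTP status alone: several statuses are shared by more than one code.

```python
# Values copied from the documented code -> HTTP status list.
ERROR_STATUS = {
    "provider_auth": 502,
    "provider_rate_limit": 429,
    "provider_overloaded": 529,
    "context_length_exceeded": 400,
    "content_filter": 400,
    "provider_timeout": 504,
    "provider_unavailable": 502,
    "insufficient_credits": 402,
    "key_limit_exceeded": 402,
    "model_not_allowed": 403,
    "invalid_api_key": 401,
    "workspace_locked": 403,
    "rate_limit_exceeded": 429,
    "invalid_request": 400,
    "payload_too_large": 413,
    "internal_error": 500,
}


def expected_status(code):
    """Return the documented HTTP status for a taxonomy code, or None if unknown."""
    return ERROR_STATUS.get(code)


def codes_for_status(status):
    """Return every taxonomy code that maps to the given HTTP status, sorted."""
    return sorted(code for code, mapped in ERROR_STATUS.items() if mapped == status)


shared_402 = codes_for_status(402)
shared_429 = codes_for_status(429)
```

For example, a 429 can mean either the upstream provider or the caller's own rate limit was hit, which only the code field distinguishes.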
Examples
curl
curl https://api.hyperinfer.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $HYPERINFER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4.5",
    "messages": [
      { "role": "system", "content": "You are a concise assistant." },
      { "role": "user", "content": "In one sentence: what is speculative decoding?" }
    ],
    "max_tokens": 256
  }'
Errors
OpenAI error object with the stable taxonomy, e.g. HTTP 402:
{ "error": { "message": "…", "type": "insufficient_credits", "code": "insufficient_credits", "param": null } }
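One way a client might surface this envelope is to turn it into an exception that carries the documented fields. The sketch below is illustrative only: ChatCompletionError and raise_for_error are hypothetical names, and the sample body is the 402 example above with its elided message left as-is.

```python
import json


class ChatCompletionError(Exception):
    """Hypothetical exception type carrying the documented error fields."""

    def __init__(self, status, error):
        super().__init__(f"{status} {error.get('code')}: {error.get('message')}")
        self.status = status
        self.code = error.get("code")
        self.type = error.get("type")
        self.param = error.get("param")
        # request_id is optional in the schema, so it may be absent.
        self.request_id = error.get("request_id")


def raise_for_error(status, body_text):
    """Raise ChatCompletionError for any non-2xx status; return None otherwise."""
    if 200 <= status < 300:
        return None
    envelope = json.loads(body_text)
    raise ChatCompletionError(status, envelope["error"])


# The 402 example from this page.
SAMPLE_BODY = (
    '{ "error": { "message": "…", "type": "insufficient_credits", '
    '"code": "insufficient_credits", "param": null } }'
)

try:
    raise_for_error(402, SAMPLE_BODY)
    caught = None
except ChatCompletionError as exc:
    caught = exc
```

Callers can then branch on caught.code, which is stable across ingress formats, instead of parsing the message text.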