Chat Completions

POST /api/v1/chat/completions

OpenAI Chat Completions format. Works with every model in the catalog — including Anthropic-slugged models — via any-to-any translation.

Request body

Body — application/json (required)

model: string, required
messages: object[], required
  role: string, required
    Possible values: system, developer, user, assistant, tool
  content: string | (text | image_url | file)[] | null
    string
    (text | image_url | file)[], each element one of:
      text: object
        type: "text", required
        text: string, required
      image_url: object
        type: "image_url", required
        image_url: object, required
          url: string, required
          detail: string
      file: object
        type: "file", required
        file: object, required
          filename: string
          file_data: string
          file_url: string
          file_id: string
  name: string
  reasoning: string | null
  tool_calls: object[]
    id: string, required
    type: "function", required
    function: object, required
      name: string, required
      arguments: string, required
  tool_call_id: string
tools: object[]
  type: "function", required
  function: object, required
    name: string, required
    description: string
    parameters: object
tool_choice: string | function
  string
    Possible values: auto, none, required
  function: object
    type: "function", required
    function: object, required
      name: string, required
max_tokens: integer
max_completion_tokens: integer
temperature: number
top_p: number
stop: string | string[]
response_format: object
  type: string, required
    Possible values: text, json_object, json_schema
  json_schema: object
    name: string, required
    schema: object, required
    strict: boolean
reasoning_effort: string
  Possible values: low, medium, high
stream: boolean
stream_options: object
  include_usage: boolean
plugins: object[]
  id: string, required
    Possible values: web_search, web_fetch, pdf, datetime, image
  events: boolean
hi_tool_events: boolean
user: string

Generated at build time from the API's OpenAPI document — the same schemas that validate requests, so this section cannot drift from the API.
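As an illustration of the schema above, the following Python sketch assembles a request body that uses typed content parts and a json_schema response format. The helper name build_body, the schema name "caption", and the example image URL are hypothetical, and whether a given model accepts image input is not stated here.

```python
import json

def build_body(model: str, question: str, image_url: str) -> dict:
    """Assemble a request body following the field shapes listed above."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a concise assistant."},
            {
                "role": "user",
                # content may be a plain string or a list of typed parts
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            },
        ],
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "caption",  # hypothetical schema name
                "schema": {
                    "type": "object",
                    "properties": {"caption": {"type": "string"}},
                    "required": ["caption"],
                },
                "strict": True,
            },
        },
    }

body = build_body(
    "anthropic/claude-sonnet-4.5",
    "Describe this image.",
    "https://example.com/cat.png",  # placeholder URL
)
payload = json.dumps(body)  # string to send as the application/json body
```

The resulting payload string can be sent with any HTTP client as the body of POST /api/v1/chat/completions.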

max_tokens and max_completion_tokens are aliases — either caps generated tokens at the model’s max output. stream: true switches to SSE — see Streaming; stream_options is accepted for compatibility (the final usage chunk is always emitted). plugins is the HyperInfer extension for server tools.
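A minimal Python sketch of how a client might apply these notes: it sends only one of the two token-cap aliases and attaches plugins entries by id. The helper name with_server_tools is hypothetical, and dropping max_completion_tokens in favour of max_tokens is a choice made for this sketch, not documented API behaviour.

```python
from typing import List

# Plugin ids listed in the request schema.
ALLOWED_PLUGIN_IDS = {"web_search", "web_fetch", "pdf", "datetime", "image"}

def with_server_tools(body: dict, plugin_ids: List[str], max_out: int) -> dict:
    """Return a copy of body with a single token cap and a plugins list."""
    unknown = set(plugin_ids) - ALLOWED_PLUGIN_IDS
    if unknown:
        raise ValueError(f"unknown plugin ids: {sorted(unknown)}")
    out = dict(body)
    # The two fields are aliases, so this sketch sends only one of them.
    out.pop("max_completion_tokens", None)
    out["max_tokens"] = max_out
    out["plugins"] = [{"id": pid} for pid in plugin_ids]
    return out

base = {
    "model": "anthropic/claude-sonnet-4.5",
    "messages": [{"role": "user", "content": "What changed in the news today?"}],
    "max_completion_tokens": 128,
}
request_body = with_server_tools(base, ["web_search", "datetime"], 256)
```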

Response

{
  "id": "gen-abc123",
  "object": "chat.completion",
  "created": 1767312000,
  "model": "anthropic/claude-sonnet-4.5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello there, all five words."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 7,
    "total_tokens": 19,
    "prompt_tokens_details": { "cached_tokens": 0 },
    "completion_tokens_details": { "reasoning_tokens": 0 }
  }
}

The id is gen-<request id> — pass the request ID (also returned in the X-Request-Id header) to GET /generation for cost and latency metadata.
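A small Python sketch of recovering the request ID from a completion's id, assuming the gen- prefix is always present as described. The function name request_id_from_completion is hypothetical, and the exact way GET /generation accepts the ID is not shown here.

```python
def request_id_from_completion(completion: dict) -> str:
    """Strip the gen- prefix from the completion id to get the request ID."""
    completion_id = completion["id"]
    prefix = "gen-"
    if not completion_id.startswith(prefix):
        raise ValueError(f"unexpected completion id: {completion_id!r}")
    return completion_id[len(prefix):]

request_id = request_id_from_completion({"id": "gen-abc123"})
```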

Response schema

200 Response

The completion. With `stream: true`, an SSE stream instead (see the text/event-stream variant).

application/json

id: string, required
  gen-<request id>; pass the request id to GET /api/v1/generation.
object: "chat.completion", required
created: integer, required
  Unix seconds.
model: string, required
choices: object[], required
  index: integer
  message: object
    role: "assistant"
    content: string | null
    reasoning: string
    tool_calls: object[]
      id: string
      type: "function"
      function: object
        name: string
        arguments: string
  finish_reason: string
    Possible values: stop, length, tool_calls, content_filter
  logprobs: null
usage: object, required
  prompt_tokens: integer, required
  completion_tokens: integer, required
  total_tokens: integer, required
  prompt_tokens_details: object
    cached_tokens: integer
  completion_tokens_details: object
    reasoning_tokens: integer
  cost: number
    HyperInfer usage-accounting extension: total cost in credits (USD).
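Because function.arguments is typed as a string in the schema above, a client has to decode it before use. The Python sketch below assumes the string holds a JSON object (as in the OpenAI format this endpoint follows); the helper name extract_tool_calls and the get_weather tool are hypothetical.

```python
import json
from typing import List, Tuple

def extract_tool_calls(completion: dict) -> List[Tuple[str, str, dict]]:
    """Return (call id, function name, decoded arguments) for each tool call."""
    calls = []
    for choice in completion.get("choices", []):
        message = choice.get("message") or {}
        for call in message.get("tool_calls") or []:
            fn = call["function"]
            # arguments arrives as a string; assumed here to be JSON-encoded
            calls.append((call["id"], fn["name"], json.loads(fn["arguments"])))
    return calls

sample_completion = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": None,
                "tool_calls": [
                    {
                        "id": "call_1",
                        "type": "function",
                        "function": {
                            "name": "get_weather",  # hypothetical tool
                            "arguments": "{\"city\": \"Oslo\"}",
                        },
                    }
                ],
            },
            "finish_reason": "tool_calls",
        }
    ]
}
tool_calls = extract_tool_calls(sample_completion)
```

Each result can then be answered with a message of role tool whose tool_call_id matches the call id.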

text/event-stream: SSE stream

SSE stream of `chat.completion.chunk` objects (`data:` frames): a role frame, then content/reasoning/tool_call deltas, a finish_reason frame, a final usage-only chunk (always emitted — stream_options.include_usage semantics, 002-R8), and `data: [DONE]`. Keep-alive comments every 15 s.
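The frame sequence described above can be consumed with a small line-based parser. This Python sketch makes several assumptions that are labeled in the comments: each data: frame fits on one line, deltas live under choices[].delta as in the OpenAI chunk format, and the final usage-only chunk carries a usage object. The sample frames are hand-written for illustration, not captured API output.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple

def iter_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded chat.completion.chunk objects from SSE lines."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith(":"):
            continue  # blank separator or keep-alive comment
        if not line.startswith("data:"):
            continue  # other SSE fields are ignored in this sketch
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end-of-stream sentinel
        yield json.loads(data)  # assumes one JSON object per data: line

def collect(lines: Iterable[str]) -> Tuple[str, Optional[dict]]:
    """Concatenate content deltas and capture the final usage object."""
    parts = []
    usage = None
    for chunk in iter_chunks(lines):
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}  # assumed OpenAI-style delta field
            if delta.get("content"):
                parts.append(delta["content"])
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(parts), usage

sample_stream = [
    ": keep-alive",
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"}}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"lo"}}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    'data: {"choices":[],"usage":{"prompt_tokens":12,"completion_tokens":2,"total_tokens":14}}',
    "data: [DONE]",
]
text, usage = collect(sample_stream)
```

In a real client the lines would come from the HTTP response body read line by line instead of a list.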

error: Error envelope (any non-2xx status)

Error in this ingress format's native envelope, with the stable taxonomy `code` (002-R7). See the ErrorCode schema for the code → HTTP status mapping.

error: object, required
  message: string, required
  type: string, required
    OpenAI-compatible error class (e.g. invalid_request_error).
  code: string, required
    Stable error taxonomy (002-R7), identical across ingress formats. HTTP status per code: provider_auth=502, provider_rate_limit=429, provider_overloaded=529, context_length_exceeded=400, content_filter=400, provider_timeout=504, provider_unavailable=502, insufficient_credits=402, key_limit_exceeded=402, model_not_allowed=403, invalid_api_key=401, workspace_locked=403, rate_limit_exceeded=429, invalid_request=400, payload_too_large=413, internal_error=500.
    Possible values: provider_auth, provider_rate_limit, provider_overloaded, context_length_exceeded, content_filter, provider_timeout, provider_unavailable, insufficient_credits, key_limit_exceeded, model_not_allowed, invalid_api_key, workspace_locked, rate_limit_exceeded, invalid_request, payload_too_large, internal_error
  param: string | null
  request_id: string
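The code to HTTP status mapping listed above can be carried into client code as a lookup table. In the Python sketch below, the table is copied from that list; the RETRYABLE_CODES set and the classify_error helper are assumptions of this sketch, not part of the API, and a real client may want a different retry policy.

```python
from typing import Optional, Tuple

# Copied from the code -> HTTP status list in the error schema.
HTTP_STATUS_BY_CODE = {
    "provider_auth": 502,
    "provider_rate_limit": 429,
    "provider_overloaded": 529,
    "context_length_exceeded": 400,
    "content_filter": 400,
    "provider_timeout": 504,
    "provider_unavailable": 502,
    "insufficient_credits": 402,
    "key_limit_exceeded": 402,
    "model_not_allowed": 403,
    "invalid_api_key": 401,
    "workspace_locked": 403,
    "rate_limit_exceeded": 429,
    "invalid_request": 400,
    "payload_too_large": 413,
    "internal_error": 500,
}

# Assumption: codes that look transient and may be worth retrying with backoff.
RETRYABLE_CODES = {
    "provider_rate_limit",
    "provider_overloaded",
    "provider_timeout",
    "provider_unavailable",
    "rate_limit_exceeded",
    "internal_error",
}

def classify_error(envelope: dict) -> Tuple[str, Optional[int], bool]:
    """Return (code, expected HTTP status, retryable?) for an error envelope."""
    code = envelope["error"]["code"]
    return code, HTTP_STATUS_BY_CODE.get(code), code in RETRYABLE_CODES

result = classify_error(
    {"error": {"message": "…", "type": "insufficient_credits",
               "code": "insufficient_credits", "param": None}}
)
```

Branching on code instead of the HTTP status keeps the handling the same across ingress formats, since the taxonomy is described as identical across them.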

Examples

curl https://api.hyperinfer.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $HYPERINFER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "anthropic/claude-sonnet-4.5",
  "messages": [
    {
      "role": "system",
      "content": "You are a concise assistant."
    },
    {
      "role": "user",
      "content": "In one sentence: what is speculative decoding?"
    }
  ],
  "max_tokens": 256
}'

Errors

OpenAI error object with the stable taxonomy, e.g. HTTP 402:

{
  "error": {
    "message": "…",
    "type": "insufficient_credits",
    "code": "insufficient_credits",
    "param": null
  }
}

Playground

POST /api/v1/chat/completions
Request as curl
curl https://api.hyperinfer.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $HYPERINFER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "deepseek/deepseek-v4-flash",
  "messages": [
    {
      "role": "system",
      "content": "You are a concise assistant."
    },
    {
      "role": "user",
      "content": "In one sentence: what is speculative decoding?"
    }
  ],
  "max_tokens": 256,
  "temperature": 0.7
}'