
Messages

POST /api/v1/messages

Anthropic Messages format. Works with every model in the catalog — including OpenAI-slugged models — via any-to-any translation.

Authentication uses the Authorization: Bearer header, as on every HyperInfer endpoint. With the Anthropic SDKs, set the authToken option instead of x-api-key. No anthropic-version header is required.
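As a rough illustration of the header requirements above, the following sketch builds the request with Python's standard library only. The URL is the one used in the curl example on this page; the helper names `build_request` and `send` are made up for this example and are not part of any SDK.

```python
import json
import os
import urllib.request

# URL taken from the curl example on this page.
MESSAGES_URL = "https://api.hyperinfer.ai/api/v1/messages"


def build_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the Messages endpoint."""
    return urllib.request.Request(
        MESSAGES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Bearer auth only: no x-api-key and no anthropic-version header.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(payload: dict) -> dict:
    """Send the request and decode the JSON body (performs network I/O)."""
    req = build_request(payload, os.environ["HYPERINFER_API_KEY"])
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Keeping `build_request` separate from `send` makes the headers easy to inspect without touching the network.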

Request body

Body — application/json (required)

model: string, required
max_tokens: integer, required
messages: object[], required
  role: string, required. Possible values: user, assistant
  content: string | (text | thinking | image | document | tool_use | tool_result)[], required
    Variant 1: string
    Variant 2: array of content blocks, each one of:
      text: object
        type: "text", required
        text: string, required
      thinking: object
        type: "thinking", required
        thinking: string, required
        signature: string
      image: object
        type: "image", required
        source: base64 | url, required
          base64: object
            type: "base64", required
            media_type: string, required
            data: string, required
          url: object
            type: "url", required
            url: string, required
      document: object
        type: "document", required
        source: base64 | url, required
          base64: object
            type: "base64", required
            media_type: string, required
            data: string, required
          url: object
            type: "url", required
            url: string, required
        title: string
      tool_use: object
        type: "tool_use", required
        id: string, required
        name: string, required
        input: object, required
      tool_result: object
        type: "tool_result", required
        tool_use_id: string, required
        content: string | (text | image)[]
          Variant 1: string
          Variant 2: array of blocks, each one of:
            text: object
              type: "text", required
              text: string, required
            image: object
              type: "image", required
              source: base64 | url, required
                base64: object
                  type: "base64", required
                  media_type: string, required
                  data: string, required
                url: object
                  type: "url", required
                  url: string, required
        is_error: boolean
system: string | object[]
  Variant 1: string
  Variant 2: object[]
    type: "text", required
    text: string, required
tools: object[]
  name: string, required
  description: string
  input_schema: object, required
tool_choice: object
  type: string, required. Possible values: auto, any, tool, none
  name: string
temperature: number
top_p: number
stop_sequences: string[]
stream: boolean
thinking: object
  type: string, required. Possible values: enabled, disabled
  budget_tokens: integer
metadata: object
plugins: object[]
  id: string, required. Possible values: web_search, web_fetch, pdf, datetime, image
  events: boolean
hi_tool_events: boolean

Generated at build time from the API's OpenAPI document — the same schemas that validate requests, so this section cannot drift from the API.
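To make the content-block variants in the schema concrete, here is a small sketch that builds an image block with a base64 source and a tool round trip (a tool_use block answered by a tool_result block). The field names come from the schema above; the helper names, the `get_weather` tool, and the `toolu_example` id are placeholders invented for this example.

```python
import base64


def image_block_from_bytes(raw: bytes, media_type: str) -> dict:
    """Image block with a base64 source (type, media_type, data)."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(raw).decode("ascii"),
        },
    }


def tool_result_block(tool_use_id: str, content: str, is_error: bool = False) -> dict:
    """tool_result block answering an earlier tool_use block by its id."""
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": content,
        "is_error": is_error,
    }


# Hypothetical round trip: the assistant requested a tool call, and the
# next user turn carries the result back under the same id.
messages = [
    {"role": "user", "content": "What is the weather in Paris?"},
    {
        "role": "assistant",
        "content": [
            {
                "type": "tool_use",
                "id": "toolu_example",   # placeholder id
                "name": "get_weather",   # hypothetical tool
                "input": {"city": "Paris"},
            }
        ],
    },
    {"role": "user", "content": [tool_result_block("toolu_example", "18 C, clear")]},
]
```

The tool_result block references the tool_use block through `tool_use_id`, which is why the two ids must match.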

max_tokens is required by this format. system is the top-level system prompt (not a message). stream: true switches to the Anthropic event stream — see Streaming. plugins is the HyperInfer extension for server tools.
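The points above can be sketched as a small payload builder: max_tokens is mandatory, system sits at the top level rather than inside messages, and stream and plugins are optional. `build_payload` is a hypothetical helper written for this page; the plugin ids are the ones listed in the request schema.

```python
# Plugin ids listed in the request schema on this page.
PLUGIN_IDS = {"web_search", "web_fetch", "pdf", "datetime", "image"}


def build_payload(model, user_text, *, max_tokens, system=None,
                  stream=False, plugins=()):
    """Assemble a Messages request body from a single user turn."""
    payload = {
        "model": model,
        "max_tokens": max_tokens,  # required by this format
        "messages": [{"role": "user", "content": user_text}],
    }
    if system is not None:
        payload["system"] = system  # top-level field, not a message
    if stream:
        payload["stream"] = True
    if plugins:
        unknown = set(plugins) - PLUGIN_IDS
        if unknown:
            raise ValueError(f"unknown plugin ids: {sorted(unknown)}")
        payload["plugins"] = [{"id": plugin_id} for plugin_id in plugins]
    return payload


payload = build_payload(
    "openai/gpt-4o-mini",
    "In one sentence: what is speculative decoding?",
    max_tokens=256,
    system="You are a concise assistant.",
    plugins=["web_search"],
)
```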

JSON mode is not expressible in this format. To enforce structured output, use a tool definition instead (see capability gaps).
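One way to apply this, sketched under assumptions: declare a single tool whose input_schema is the JSON Schema you want back, force it with tool_choice, and read the result from the tool_use block. The tool name `emit_result`, the default max_tokens, and both helper names are hypothetical. Forcing a named tool with `{"type": "tool", "name": ...}` follows the usual Anthropic Messages semantics and is assumed to carry over here.

```python
def structured_output_payload(model, prompt, schema, tool_name="emit_result"):
    """Request body that asks the model to answer via one forced tool."""
    return {
        "model": model,
        "max_tokens": 512,  # arbitrary value for this sketch
        "messages": [{"role": "user", "content": prompt}],
        "tools": [
            {
                "name": tool_name,
                "description": "Return the answer as structured data.",
                "input_schema": schema,
            }
        ],
        # Assumed semantics: type "tool" plus a name forces that tool.
        "tool_choice": {"type": "tool", "name": tool_name},
    }


def extract_tool_input(response, tool_name="emit_result"):
    """Return the input object of the first matching tool_use block."""
    for block in response.get("content", []):
        if block.get("type") == "tool_use" and block.get("name") == tool_name:
            return block["input"]
    raise ValueError(f"no tool_use block named {tool_name!r} in response")
```

The returned object is whatever the model produced as tool input, so it is still worth validating against the schema on the client side.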

Response

{
  "id": "msg_ghi789",
  "type": "message",
  "role": "assistant",
  "model": "openai/gpt-4o-mini",
  "content": [
    {
      "type": "text",
      "text": "Speculative decoding drafts tokens with a small model…"
    }
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 24,
    "cache_read_input_tokens": 0,
    "cache_creation_input_tokens": 0,
    "output_tokens": 31
  }
}

Reasoning models emit thinking content blocks ahead of the text. The id is msg_<request id> — pass the request ID (also returned in the X-Request-Id header) to GET /generation.
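A minimal sketch of handling such a response: separate thinking blocks from text blocks, and strip the msg_ prefix to recover the request ID. Both function names are illustrative, not part of any SDK.

```python
def split_blocks(response):
    """Return (thinking, text) concatenated from the content blocks."""
    content = response.get("content", [])
    thinking = "".join(
        block.get("thinking", "") for block in content
        if block.get("type") == "thinking"
    )
    text = "".join(
        block.get("text", "") for block in content
        if block.get("type") == "text"
    )
    return thinking, text


def request_id_from(message_id):
    """Recover the request ID from an id of the form msg_<request id>."""
    prefix = "msg_"
    if not message_id.startswith(prefix):
        raise ValueError(f"unexpected message id: {message_id!r}")
    return message_id[len(prefix):]
```

For the sample response above, `request_id_from("msg_ghi789")` yields `"ghi789"`, which is the value to pass to GET /generation.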

Response schema

200: Response

The message. With `stream: true`, an SSE stream instead.

application/json

id: string, required. msg_<request id>.
type: "message", required
role: "assistant", required
model: string, required
content: object[], required. Content blocks: `text`, `thinking`, and `tool_use`.
stop_reason: string, required. Possible values: end_turn, max_tokens, tool_use, refusal
stop_sequence: string | null
usage: object, required. Anthropic semantics: input_tokens excludes cache reads.
  input_tokens: integer
  output_tokens: integer
  cache_read_input_tokens: integer
  cache_creation_input_tokens: integer

text/event-stream: SSE stream

Anthropic content-block SSE protocol (`event:` + `data:` frames): `message_start`, then `content_block_start` / `content_block_delta` / `content_block_stop` per block (text_delta, thinking_delta, input_json_delta), `message_delta` with stop_reason + usage (002-R8), and `message_stop`. Keep-alive comments every 15 s.
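The frame sequence above can be consumed with a small parser. This is a sketch, not the official client: it assumes the delta payloads follow the published Anthropic shapes (`delta.text` on a text_delta, `delta.stop_reason` on message_delta), and it operates on already-decoded lines so it can be tried without a network connection.

```python
import json


def iter_sse_events(lines):
    """Yield (event, data) pairs from an iterable of decoded SSE lines."""
    event, data_parts = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            continue  # comment frame, e.g. the 15 s keep-alive
        if line == "":
            # A blank line terminates the current frame.
            if data_parts:
                yield event, json.loads("\n".join(data_parts))
            event, data_parts = None, []
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "event":
            event = value
        elif field == "data":
            data_parts.append(value)
    if data_parts:  # stream ended without a trailing blank line
        yield event, json.loads("\n".join(data_parts))


def collect_text(lines):
    """Accumulate text deltas; return (text, stop_reason)."""
    parts, stop_reason = [], None
    for event, data in iter_sse_events(lines):
        if event == "content_block_delta":
            delta = data.get("delta", {})
            if delta.get("type") == "text_delta":
                parts.append(delta.get("text", ""))
        elif event == "message_delta":
            stop_reason = data.get("delta", {}).get("stop_reason", stop_reason)
        elif event == "message_stop":
            break
    return "".join(parts), stop_reason
```

thinking_delta and input_json_delta frames are ignored here; a fuller client would accumulate them per content-block index.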

error: Error envelope (any non-2xx status)

Error in this ingress format's native envelope, with the stable taxonomy `code` (002-R7). See the ErrorCode schema for the code → HTTP status mapping.

type: "error", required
error: object, required
  type: string, required. Anthropic-compatible error class (e.g. invalid_request_error).
  code: string, required. Stable error taxonomy (002-R7), identical across ingress formats.
    Possible values and the HTTP status for each:
      provider_auth = 502
      provider_rate_limit = 429
      provider_overloaded = 529
      context_length_exceeded = 400
      content_filter = 400
      provider_timeout = 504
      provider_unavailable = 502
      insufficient_credits = 402
      key_limit_exceeded = 402
      model_not_allowed = 403
      invalid_api_key = 401
      workspace_locked = 403
      rate_limit_exceeded = 429
      invalid_request = 400
      payload_too_large = 413
      internal_error = 500
  message: string, required
request_id: string
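A client can lean on the stable code rather than on the HTTP status alone. The sketch below copies the code → status table from the schema above and wraps an error body in an exception. Which codes count as retryable is this example's own assumption, not something the API documents, and `HyperInferError` and `parse_error` are hypothetical names.

```python
import json

# Code -> HTTP status, copied from the error schema on this page.
STATUS_BY_CODE = {
    "provider_auth": 502, "provider_rate_limit": 429,
    "provider_overloaded": 529, "context_length_exceeded": 400,
    "content_filter": 400, "provider_timeout": 504,
    "provider_unavailable": 502, "insufficient_credits": 402,
    "key_limit_exceeded": 402, "model_not_allowed": 403,
    "invalid_api_key": 401, "workspace_locked": 403,
    "rate_limit_exceeded": 429, "invalid_request": 400,
    "payload_too_large": 413, "internal_error": 500,
}

# Assumption of this sketch: these codes describe transient conditions.
RETRYABLE_CODES = {
    "provider_rate_limit", "provider_overloaded", "provider_timeout",
    "provider_unavailable", "rate_limit_exceeded",
}


class HyperInferError(Exception):
    """Hypothetical exception wrapping the error envelope."""

    def __init__(self, code, message, status, request_id=None):
        super().__init__(f"{code} ({status}): {message}")
        self.code = code
        self.status = status
        self.request_id = request_id

    @property
    def retryable(self):
        return self.code in RETRYABLE_CODES


def parse_error(body, status):
    """Turn a non-2xx JSON body into a HyperInferError."""
    envelope = json.loads(body)
    err = envelope.get("error", {})
    # Fall back to error.type if a body arrives without a code.
    code = err.get("code") or err.get("type", "internal_error")
    return HyperInferError(
        code, err.get("message", ""), status, envelope.get("request_id")
    )
```

A retry loop would then back off only when `retryable` is true and surface everything else, such as insufficient_credits or invalid_api_key, to the caller immediately.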


Examples

curl https://api.hyperinfer.ai/api/v1/messages \
  -H "Authorization: Bearer $HYPERINFER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "openai/gpt-4o-mini",
  "max_tokens": 256,
  "system": "You are a concise assistant.",
  "messages": [
    {
      "role": "user",
      "content": "In one sentence: what is speculative decoding?"
    }
  ]
}'

Errors

Anthropic error envelope with the stable taxonomy, e.g. HTTP 402:

{
  "type": "error",
  "error": {
    "type": "insufficient_credits",
    "message": "…"
  }
}
