Messages
POST /api/v1/messages
Anthropic Messages format. Works with every model in the catalog, including OpenAI-slugged models, via any-to-any translation.
Authentication uses Authorization: Bearer, like every HyperInfer endpoint.
With the Anthropic SDKs, use the authToken option instead of
x-api-key. No anthropic-version header is required.
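As a rough illustration of the header rules above, the following sketch builds a request with the Python standard library. The URL is the one used in the curl example on this page; the key value and the helper names build_request and send are hypothetical, and nothing is sent unless send is called.

```python
import json
import urllib.request

# Endpoint taken from the curl example on this page.
API_URL = "https://api.hyperinfer.ai/api/v1/messages"


def build_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request authenticated with a Bearer token."""
    headers = {
        "Authorization": f"Bearer {api_key}",  # Bearer token, not x-api-key
        "Content-Type": "application/json",
        # No anthropic-version header: the text above says it is not required.
    }
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(API_URL, data=data, headers=headers, method="POST")


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON reply (performs a network call)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical key and a minimal body; nothing is sent here.
example = build_request("sk-example", {
    "model": "openai/gpt-4o-mini",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Hello"}],
})
```

Calling send(example) with a real key would perform the request; the sketch stops short of that so it can be read and run offline.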
Request body
Body: application/json (required)

model (string, required)
max_tokens (integer, required)
messages (object[], required)
  role (string, required). Possible values: user, assistant
  content (string | (text | thinking | image | document | tool_use | tool_result)[], required)
    string
    (text | thinking | image | document | tool_use | tool_result)[]
      text (object)
        type ("text", required)
        text (string, required)
      thinking (object)
        type ("thinking", required)
        thinking (string, required)
        signature (string)
      image (object)
        type ("image", required)
        source (base64 | url, required)
          base64 (object)
            type ("base64", required)
            media_type (string, required)
            data (string, required)
          url (object)
            type ("url", required)
            url (string, required)
      document (object)
        type ("document", required)
        source (base64 | url, required)
          base64 (object)
            type ("base64", required)
            media_type (string, required)
            data (string, required)
          url (object)
            type ("url", required)
            url (string, required)
        title (string)
      tool_use (object)
        type ("tool_use", required)
        id (string, required)
        name (string, required)
        input (object, required)
      tool_result (object)
        type ("tool_result", required)
        tool_use_id (string, required)
        content (string | (text | image)[])
          string
          (text | image)[]
            text (object)
              type ("text", required)
              text (string, required)
            image (object)
              type ("image", required)
              source (base64 | url, required)
                base64 (object)
                  type ("base64", required)
                  media_type (string, required)
                  data (string, required)
                url (object)
                  type ("url", required)
                  url (string, required)
        is_error (boolean)
system (string | object[])
  string
  object[]
    type ("text", required)
    text (string, required)
tools (object[])
  name (string, required)
  description (string)
  input_schema (object, required)
tool_choice (object)
  type (string, required). Possible values: auto, any, tool, none
  name (string)
temperature (number)
top_p (number)
stop_sequences (string[])
stream (boolean)
thinking (object)
  type (string, required). Possible values: enabled, disabled
  budget_tokens (integer)
metadata (object)
plugins (object[])
  id (string, required). Possible values: web_search, web_fetch, pdf, datetime, image
  events (boolean)
hi_tool_events (boolean)

Generated at build time from the API's OpenAPI document, the same schemas that validate requests, so this section cannot drift from the API.
max_tokens is required by this format. system is the top-level system prompt
(not a message). stream: true switches to the Anthropic event stream — see
Streaming. plugins is the HyperInfer extension for
server tools.
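The notes above can be sketched as a small body builder. This is only an illustration: build_body is a hypothetical helper, the positive-integer check on max_tokens is an assumption (the schema only says integer, required), and the plugin IDs are the ones listed in the request schema.

```python
from typing import List, Optional


def build_body(
    model: str,
    max_tokens: int,
    user_text: str,
    system: Optional[str] = None,
    plugins: Optional[List[str]] = None,
    stream: bool = False,
) -> dict:
    """Assemble a Messages request body following the notes above."""
    # max_tokens is required by this format; rejecting non-positive values
    # is an assumption of this sketch, not documented behaviour.
    if not isinstance(max_tokens, int) or max_tokens <= 0:
        raise ValueError("max_tokens is required and must be a positive integer")

    body = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": user_text}],
    }
    if system is not None:
        body["system"] = system  # top-level field, not a message
    if plugins:
        body["plugins"] = [{"id": plugin_id} for plugin_id in plugins]
    if stream:
        body["stream"] = True  # switches the reply to the event stream
    return body


body = build_body(
    "openai/gpt-4o-mini",
    256,
    "What changed in the latest release?",
    system="You are a concise assistant.",
    plugins=["web_search"],
)
```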
JSON mode is not expressible in this format; use a tool definition to enforce structured output (see Capability gaps).
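One way to apply this, sketched under the assumption that tool_choice with type "tool" and a name forces that tool (as in the Anthropic Messages format): define a tool whose input_schema is the desired output shape, then read the tool_use block's input from the response. The tool name record_answer and both helper functions are hypothetical.

```python
from typing import Optional


def structured_output_body(model: str, prompt: str, schema: dict,
                           tool_name: str = "record_answer") -> dict:
    """Request body that asks the model to answer through a single tool."""
    return {
        "model": model,
        "max_tokens": 512,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [{
            "name": tool_name,
            "description": "Record the answer in the required structure.",
            "input_schema": schema,  # JSON Schema describing the output
        }],
        # Assumed to force the named tool, per Anthropic semantics.
        "tool_choice": {"type": "tool", "name": tool_name},
    }


def extract_tool_input(response: dict, tool_name: str) -> Optional[dict]:
    """Return the input of the first matching tool_use block, if any."""
    for block in response.get("content", []):
        if block.get("type") == "tool_use" and block.get("name") == tool_name:
            return block.get("input")
    return None


schema = {
    "type": "object",
    "properties": {"summary": {"type": "string"}},
    "required": ["summary"],
}
request_body = structured_output_body("openai/gpt-4o-mini", "Summarize X.", schema)

# A made-up response shape for illustration only.
fake_response = {
    "content": [
        {"type": "tool_use", "id": "toolu_1", "name": "record_answer",
         "input": {"summary": "X in one line."}},
    ],
    "stop_reason": "tool_use",
}
answer = extract_tool_input(fake_response, "record_answer")
```

The returned input is still worth validating against the schema on the client side, since this sketch does not check it.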
Response
{
  "id": "msg_ghi789",
  "type": "message",
  "role": "assistant",
  "model": "openai/gpt-4o-mini",
  "content": [
    { "type": "text", "text": "Speculative decoding drafts tokens with a small model…" }
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 24,
    "cache_read_input_tokens": 0,
    "cache_creation_input_tokens": 0,
    "output_tokens": 31
  }
}

Reasoning models emit thinking content blocks ahead of the text. The id is
msg_<request id>; pass the request ID (also returned in the X-Request-Id header)
to GET /generation.
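A minimal sketch of reading such a response: it joins the text blocks, skips any thinking blocks, and recovers the request ID by stripping the msg_ prefix. The function names are hypothetical, and how the ID is passed to GET /generation is not shown because this page does not specify it.

```python
def response_text(message: dict) -> str:
    """Concatenate the text blocks, ignoring thinking and tool_use blocks."""
    return "".join(
        block.get("text", "")
        for block in message.get("content", [])
        if block.get("type") == "text"
    )


def request_id(message: dict) -> str:
    """Strip the msg_ prefix from the message id to get the request ID."""
    message_id = message["id"]
    prefix = "msg_"
    return message_id[len(prefix):] if message_id.startswith(prefix) else message_id


# The sample response above, with a thinking block added for illustration.
sample = {
    "id": "msg_ghi789",
    "type": "message",
    "role": "assistant",
    "content": [
        {"type": "thinking", "thinking": "Recall the definition first."},
        {"type": "text", "text": "Speculative decoding drafts tokens with a small model…"},
    ],
    "stop_reason": "end_turn",
}

text = response_text(sample)
rid = request_id(sample)
```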
Response schema
200 Response
The message. With `stream: true`, an SSE stream instead.

application/json
id (string, required): msg_<request id>.
type ("message", required)
role ("assistant", required)
model (string, required)
content (object[], required): Content blocks: `text`, `thinking`, and `tool_use`.
stop_reason (string, required). Possible values: end_turn, max_tokens, tool_use, refusal
stop_sequence (string | null)
usage (object, required): Anthropic semantics: input_tokens excludes cache reads.
  input_tokens (integer)
  output_tokens (integer)
  cache_read_input_tokens (integer)
  cache_creation_input_tokens (integer)

text/event-stream: SSE stream
Anthropic content-block SSE protocol (`event:` + `data:` frames): `message_start`, then `content_block_start` / `content_block_delta` / `content_block_stop` per block (text_delta, thinking_delta, input_json_delta), `message_delta` with stop_reason + usage (002-R8), and `message_stop`. Keep-alive comments every 15 s.
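The event stream described above can be consumed with a small line-based parser. The sketch below assumes the payload shapes of the Anthropic streaming format (a delta object carrying type and text on content_block_delta, and delta.stop_reason on message_delta); those field paths are not spelled out on this page, and the function names are hypothetical.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event name, decoded JSON data) for each SSE frame."""
    event: Optional[str] = None
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current frame.
            if data_lines:
                yield (event or "message", json.loads("\n".join(data_lines)))
            event, data_lines = None, []
        elif line.startswith(":"):
            continue  # comment line, e.g. a keep-alive
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
    if data_lines:  # stream ended without a trailing blank line
        yield (event or "message", json.loads("\n".join(data_lines)))


def collect_text(events: Iterable[Tuple[str, dict]]) -> Tuple[str, Optional[str]]:
    """Accumulate text_delta chunks and capture the final stop_reason."""
    parts, stop_reason = [], None
    for name, data in events:
        if name == "content_block_delta":
            delta = data.get("delta", {})
            if delta.get("type") == "text_delta":
                parts.append(delta.get("text", ""))
        elif name == "message_delta":
            stop_reason = data.get("delta", {}).get("stop_reason", stop_reason)
    return "".join(parts), stop_reason


# Hand-written frames for illustration; not captured from the API.
stream_lines = [
    'event: message_start',
    'data: {"type":"message_start","message":{"id":"msg_x"}}',
    '',
    ': keep-alive',
    '',
    'event: content_block_delta',
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"thinking_delta","thinking":"hm"}}',
    '',
    'event: content_block_delta',
    'data: {"type":"content_block_delta","index":1,"delta":{"type":"text_delta","text":"Hel"}}',
    '',
    'event: content_block_delta',
    'data: {"type":"content_block_delta","index":1,"delta":{"type":"text_delta","text":"lo"}}',
    '',
    'event: message_delta',
    'data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":2}}',
    '',
    'event: message_stop',
    'data: {"type":"message_stop"}',
    '',
]

streamed_text, streamed_stop = collect_text(iter_sse_events(stream_lines))
```

In practice the lines would come from the HTTP response body decoded as UTF-8; the parser only needs an iterable of strings.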
error: Error envelope (any non-2xx status)
Error in this ingress format's native envelope, with the stable taxonomy `code` (002-R7). See the ErrorCode schema for the code → HTTP status mapping.

type ("error", required)
error (object, required)
  type (string, required): Anthropic-compatible error class (e.g. invalid_request_error).
  code (string, required): Stable error taxonomy (002-R7), identical across ingress formats. HTTP status per code:
    provider_auth = 502
    provider_rate_limit = 429
    provider_overloaded = 529
    context_length_exceeded = 400
    content_filter = 400
    provider_timeout = 504
    provider_unavailable = 502
    insufficient_credits = 402
    key_limit_exceeded = 402
    model_not_allowed = 403
    invalid_api_key = 401
    workspace_locked = 403
    rate_limit_exceeded = 429
    invalid_request = 400
    payload_too_large = 413
    internal_error = 500
    Possible values: provider_auth, provider_rate_limit, provider_overloaded, context_length_exceeded, content_filter, provider_timeout, provider_unavailable, insufficient_credits, key_limit_exceeded, model_not_allowed, invalid_api_key, workspace_locked, rate_limit_exceeded, invalid_request, payload_too_large, internal_error
  message (string, required)
request_id (string)
Examples
curl
curl https://api.hyperinfer.ai/api/v1/messages \
  -H "Authorization: Bearer $HYPERINFER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "max_tokens": 256,
    "system": "You are a concise assistant.",
    "messages": [
      { "role": "user", "content": "In one sentence: what is speculative decoding?" }
    ]
  }'

Errors
Anthropic error envelope with the stable taxonomy, e.g. HTTP 402:
{ "type": "error", "error": { "type": "insufficient_credits", "message": "…" } }
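A hedged sketch of handling this envelope on the client side. The status table copies the code-to-status mapping from the response schema on this page; parse_error is a hypothetical helper, and falling back to error.type when no code field is present is an assumption made so that the sample above also parses.

```python
import json

# Code -> HTTP status, as listed in the response schema on this page.
STATUS_BY_CODE = {
    "provider_auth": 502,
    "provider_rate_limit": 429,
    "provider_overloaded": 529,
    "context_length_exceeded": 400,
    "content_filter": 400,
    "provider_timeout": 504,
    "provider_unavailable": 502,
    "insufficient_credits": 402,
    "key_limit_exceeded": 402,
    "model_not_allowed": 403,
    "invalid_api_key": 401,
    "workspace_locked": 403,
    "rate_limit_exceeded": 429,
    "invalid_request": 400,
    "payload_too_large": 413,
    "internal_error": 500,
}


def parse_error(body_text: str) -> dict:
    """Decode an error envelope into a flat dict for logging or branching."""
    payload = json.loads(body_text)
    if payload.get("type") != "error":
        raise ValueError("not an error envelope")
    error = payload.get("error", {})
    # Assumption: prefer the taxonomy code, fall back to error.type.
    code = error.get("code") or error.get("type")
    return {
        "code": code,
        "error_type": error.get("type"),
        "message": error.get("message", ""),
        "request_id": payload.get("request_id"),
        "expected_status": STATUS_BY_CODE.get(code),  # None if unknown
    }


parsed = parse_error(
    '{ "type": "error", "error": { "type": "insufficient_credits", "message": "…" } }'
)
```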