Responses


POST /api/v1/responses

OpenAI Responses format. Works with every model in the catalog via any-to-any translation.

Request body

Body — application/json (required)

model: string, required

input: string | (message | function_call | function_call_output | reasoning)[], required
  Variant 1: string
  Variant 2: array of items, each one of:
    message: object
      type: "message"
      role: string, required. Possible values: system, developer, user, assistant
      content: string | (input_text | output_text | input_image | input_file)[], required
        Variant 1: string
        Variant 2: array of items, each one of:
          input_text: object
            type: "input_text", required
            text: string, required
          output_text: object
            type: "output_text", required
            text: string, required
          input_image: object
            type: "input_image", required
            image_url: string, required
            detail: string
          input_file: object
            type: "input_file", required
            filename: string
            file_data: string
            file_url: string
    function_call: object
      type: "function_call", required
      call_id: string, required
      name: string, required
      arguments: string, required
    function_call_output: object
      type: "function_call_output", required
      call_id: string, required
      output: string, required
    reasoning: object
      type: "reasoning", required
      summary: object[]. Default: []
        type: "summary_text", required
        text: string, required

instructions: string | null

tools: (function | object)[]
  Each entry is one of:
    function: object
      type: "function", required
      name: string, required
      description: string | null
      parameters: object | null
      strict: boolean | null
    object: object
      type: string, required. Possible values: web_search, web_fetch, pdf, datetime, image
      events: boolean

tool_choice: string | function
  Variant 1: string. Possible values: auto, none, required
  Variant 2: function: object
    type: "function", required
    name: string, required

max_output_tokens: integer | null

temperature: number | null

top_p: number | null

stream: boolean

text: object
  format: object
    type: string, required. Possible values: text, json_object, json_schema
    name: string
    schema: object
    strict: boolean

reasoning: object | null
  effort: string | null. Possible values: low, medium, high

previous_response_id: string | null

store: boolean

hi_tool_events: boolean

Generated at build time from the API's OpenAPI document — the same schemas that validate requests, so this section cannot drift from the API.
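As a rough illustration of how the request fields combine, the sketch below builds a body with one function tool and one server tool. The `get_weather` function, its parameter schema, and the helper names `function_tool` and `server_tool` are hypothetical; only the field names and the server tool types come from the schema above.

```python
import json

# Server tool types listed under "Possible values" in the tools schema.
SERVER_TOOL_TYPES = {"web_search", "web_fetch", "pdf", "datetime", "image"}


def function_tool(name, parameters, description=None, strict=None):
    """Build a `function` entry for the `tools` array (hypothetical helper)."""
    tool = {"type": "function", "name": name, "parameters": parameters}
    if description is not None:
        tool["description"] = description
    if strict is not None:
        tool["strict"] = strict
    return tool


def server_tool(tool_type):
    """Build a server tool entry; rejects types the schema does not list."""
    if tool_type not in SERVER_TOOL_TYPES:
        raise ValueError(f"unknown server tool type: {tool_type}")
    return {"type": tool_type}


body = {
    "model": "anthropic/claude-sonnet-4.5",
    "input": "What is the weather in Paris right now?",
    "tools": [
        # `get_weather` is an illustrative function, not part of the API.
        function_tool(
            "get_weather",
            {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
            description="Look up current weather for a city.",
        ),
        server_tool("web_search"),
    ],
    "tool_choice": "auto",
    "max_output_tokens": 256,
}

print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent as the JSON body of POST /api/v1/responses, or its keys passed as keyword arguments to `client.responses.create`.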

input accepts plain text or a list of input items. Setting stream: true switches the response to the semantic SSE event protocol (see Streaming). This endpoint is stateless: previous_response_id is rejected, so send the full input with every request. Server tools are enabled by listing them as built-in tool types in tools (see server tools).
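Because the endpoint is stateless, a tool-calling loop has to replay the earlier items itself. The following is a minimal sketch of one way to assemble the next request's input; the helper name `followup_input`, the `get_weather` call, and its result are hypothetical, while the item shapes follow the request schema.

```python
import json


def followup_input(prior_input, function_calls, results):
    """Return the full `input` list for the next stateless request.

    prior_input:    the string or item list sent in the previous request
    function_calls: `function_call` items taken from the previous response
    results:        mapping of call_id -> JSON-serializable tool result
    """
    if isinstance(prior_input, str):
        # Promote plain text to an explicit user message item.
        items = [{"type": "message", "role": "user", "content": prior_input}]
    else:
        items = list(prior_input)

    for call in function_calls:
        # Replay the model's call, then attach our output for the same call_id.
        items.append({
            "type": "function_call",
            "call_id": call["call_id"],
            "name": call["name"],
            "arguments": call["arguments"],
        })
        items.append({
            "type": "function_call_output",
            "call_id": call["call_id"],
            "output": json.dumps(results[call["call_id"]]),
        })
    return items


# Illustrative values only.
calls = [{
    "type": "function_call",
    "call_id": "call_1",
    "name": "get_weather",
    "arguments": "{\"city\": \"Paris\"}",
}]
next_input = followup_input(
    "What is the weather in Paris?", calls, {"call_1": {"temp_c": 18}}
)
print(json.dumps(next_input, indent=2))
```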

Stop sequences are not expressible in this format and are rejected as invalid_request — see the capability gaps.
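One possible client-side workaround, which is not something the API provides, is to cut the returned text at the first stop sequence after the response arrives. The helper name `truncate_at_stop` is hypothetical. The model still generates the text past the stop sequence, so this does not reduce output tokens.

```python
def truncate_at_stop(text, stop_sequences):
    """Return `text` cut just before the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        if not stop:
            continue  # an empty stop sequence would match at index 0
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


print(truncate_at_stop("Answer: 42\nQ: next question", ["\nQ:", "###"]))
```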

Response


{
  "id": "resp_def456",
  "object": "response",
  "created_at": 1767312000,
  "status": "completed",
  "model": "anthropic/claude-sonnet-4.5",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [{ "type": "output_text", "text": "Speculative decoding drafts tokens…" }]
    }
  ],
  "usage": {
    "input_tokens": 21,
    "output_tokens": 34,
    "total_tokens": 55,
    "input_tokens_details": { "cached_tokens": 0 },
    "output_tokens_details": { "reasoning_tokens": 0 }
  }
}

Reasoning models emit reasoning output items ahead of the message. The id is resp_<request id> — pass the request ID (also returned in the X-Request-Id header) to GET /generation.
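To make the output layout concrete, here is a small sketch that recovers the request ID from the response `id` and separates the output items by type. The helper names `request_id_from` and `split_output` are hypothetical; the sample is the response shown above, trimmed to the fields the helpers read.

```python
def request_id_from(response_id):
    """Strip the `resp_` prefix to recover the request ID."""
    prefix = "resp_"
    if not response_id.startswith(prefix):
        raise ValueError(f"unexpected response id: {response_id!r}")
    return response_id[len(prefix):]


def split_output(response):
    """Group output items into reasoning summaries, message text, and tool calls."""
    reasoning, text_parts, calls = [], [], []
    for item in response["output"]:
        kind = item.get("type")
        if kind == "reasoning":
            reasoning.extend(
                s["text"] for s in item.get("summary", [])
                if s.get("type") == "summary_text"
            )
        elif kind == "message":
            text_parts.extend(
                part["text"] for part in item.get("content", [])
                if part.get("type") == "output_text"
            )
        elif kind == "function_call":
            calls.append(item)
    return {
        "reasoning": reasoning,
        "text": "".join(text_parts),
        "function_calls": calls,
    }


sample = {
    "id": "resp_def456",
    "output": [
        {
            "type": "message",
            "role": "assistant",
            "content": [
                {"type": "output_text", "text": "Speculative decoding drafts tokens…"}
            ],
        }
    ],
}

print(request_id_from(sample["id"]))
print(split_output(sample)["text"])
```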

Response schema

200 Response

The response object. With `stream: true`, an SSE stream instead.

application/json

id: string, required
  resp_<request id>.
object: "response", required
created_at: integer, required
  Unix seconds.
status: string, required. Possible values: completed, incomplete
incomplete_details: object | null
  reason: string. Possible values: max_output_tokens, content_filter
model: string, required
output: object[], required
  Output items in order: optional `reasoning` (summary_text), `message` (output_text content), and one `function_call` per tool call.
usage: object, required
  input_tokens: integer
  input_tokens_details: object
    cached_tokens: integer
  output_tokens: integer
  output_tokens_details: object
    reasoning_tokens: integer
  total_tokens: integer

text/event-stream: SSE stream

Semantic SSE event protocol (`event:` + `data:` frames). The stream opens with `response.created` and `response.in_progress`, continues with per-item events (`response.output_item.added`, `response.output_text.delta`, `response.function_call_arguments.delta`, and their matching `*.done` events), and ends with `response.completed`, which carries the full response including usage (002-R8). Keep-alive comments are sent every 15 s.
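A minimal sketch of consuming such a stream, assuming the lines have already been decoded to text. The helper names are hypothetical, and two payload details are assumptions carried over from OpenAI's Responses streaming format rather than stated here: that `response.output_text.delta` carries the text fragment in a `delta` field, and that `response.completed` carries the final object in a `response` field.

```python
import json


def iter_sse_events(lines):
    """Yield (event_name, payload) pairs from an iterable of decoded SSE lines."""
    event, data = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current frame.
            if data:
                yield event or "message", json.loads("\n".join(data))
            event, data = None, []
        elif line.startswith(":"):
            continue  # comment line, e.g. a keep-alive
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data.append(value[1:] if value.startswith(" ") else value)


def collect_text(lines):
    """Accumulate text deltas and return (text, final_response_or_None)."""
    chunks, final = [], None
    for name, payload in iter_sse_events(lines):
        if name == "response.output_text.delta":
            chunks.append(payload.get("delta", ""))  # assumed field name
        elif name == "response.completed":
            final = payload.get("response")  # assumed field name
    return "".join(chunks), final


# Hand-written sample frames for illustration, not captured API output.
sample_stream = [
    "event: response.created", 'data: {"type": "response.created"}', "",
    ": keep-alive", "",
    "event: response.output_text.delta", 'data: {"delta": "Hel"}', "",
    "event: response.output_text.delta", 'data: {"delta": "lo"}', "",
    "event: response.completed", 'data: {"response": {"status": "completed"}}', "",
]

text, final = collect_text(sample_stream)
print(text, final)
```

Skipping lines that begin with `:` is what keeps the 15 s keep-alive comments from being mistaken for events.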

error: Error envelope (any non-2xx status)

Error in this ingress format's native envelope, with the stable taxonomy `code` (002-R7). See the ErrorCode schema for the code → HTTP status mapping.

error: object, required
  message: string, required
  type: string, required
    OpenAI-compatible error class (e.g. invalid_request_error).
  code: string, required
    Stable error taxonomy (002-R7), identical across ingress formats. Possible values and the HTTP status for each:
      provider_auth             502
      provider_rate_limit       429
      provider_overloaded       529
      context_length_exceeded   400
      content_filter            400
      provider_timeout          504
      provider_unavailable      502
      insufficient_credits      402
      key_limit_exceeded        402
      model_not_allowed         403
      invalid_api_key           401
      workspace_locked          403
      rate_limit_exceeded       429
      invalid_request           400
      payload_too_large         413
      internal_error            500
  param: string | null
  request_id: string


Examples

curl


curl https://api.hyperinfer.ai/api/v1/responses \
  -H "Authorization: Bearer $HYPERINFER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4.5",
    "instructions": "You are a concise assistant.",
    "input": "In one sentence: what is speculative decoding?",
    "max_output_tokens": 256
  }'

TypeScript


import OpenAI from "openai";
 
const client = new OpenAI({
  baseURL: "https://api.hyperinfer.ai/api/v1",
  apiKey: process.env.HYPERINFER_API_KEY,
});
 
const response = await client.responses.create({
  model: "anthropic/claude-sonnet-4.5",
  instructions: "You are a concise assistant.",
  input: "In one sentence: what is speculative decoding?",
  max_output_tokens: 256,
});
 
console.log(response.output_text);

Python


import os
from openai import OpenAI
 
client = OpenAI(
    base_url="https://api.hyperinfer.ai/api/v1",
    api_key=os.environ["HYPERINFER_API_KEY"],
)
 
response = client.responses.create(
    model="anthropic/claude-sonnet-4.5",
    instructions="You are a concise assistant.",
    input="In one sentence: what is speculative decoding?",
    max_output_tokens=256,
)
 
print(response.output_text)

Errors

Errors are returned as an OpenAI-style error object whose code field carries the stable taxonomy.
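A sketch of how a client might interpret the error envelope. The code-to-status table is copied from the response schema. The `RETRYABLE` set and the helper name `classify_error` are assumptions made for illustration, not documented retry guidance, and the sample error body is hand-written.

```python
# HTTP status per taxonomy code, as listed in the error schema.
STATUS_BY_CODE = {
    "provider_auth": 502,
    "provider_rate_limit": 429,
    "provider_overloaded": 529,
    "context_length_exceeded": 400,
    "content_filter": 400,
    "provider_timeout": 504,
    "provider_unavailable": 502,
    "insufficient_credits": 402,
    "key_limit_exceeded": 402,
    "model_not_allowed": 403,
    "invalid_api_key": 401,
    "workspace_locked": 403,
    "rate_limit_exceeded": 429,
    "invalid_request": 400,
    "payload_too_large": 413,
    "internal_error": 500,
}

# Assumption: codes that look transient. Adjust to your own retry policy.
RETRYABLE = {
    "provider_rate_limit",
    "provider_overloaded",
    "provider_timeout",
    "provider_unavailable",
    "rate_limit_exceeded",
    "internal_error",
}


def classify_error(body):
    """Summarize a parsed error envelope for logging and retry decisions."""
    err = body["error"]
    code = err["code"]
    return {
        "code": code,
        "expected_status": STATUS_BY_CODE.get(code),
        "retryable": code in RETRYABLE,
        "request_id": err.get("request_id"),
    }


# Hand-written example envelope; the message text is illustrative.
sample_error = {
    "error": {
        "message": "Stop sequences are not supported in this format.",
        "type": "invalid_request_error",
        "code": "invalid_request",
        "param": None,
        "request_id": "def456",
    }
}

print(classify_error(sample_error))
```

Branching on `code` rather than on the HTTP status or message text keeps the handling stable, since the taxonomy is identical across ingress formats.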
