Usage & Credits
Usage accounting
Every request — streamed or not — reports token usage in the ingress format’s native usage object, normalized across providers:
- Prompt tokens, broken into cached vs. uncached (cached prompt tokens are billed at the model’s discounted cached_prompt rate where available),
- Completion tokens,
- Reasoning tokens (counted and billed as completion-side output for reasoning models, reported separately).
Chat Completions
{
"usage": {
"prompt_tokens": 2145,
"completion_tokens": 312,
"total_tokens": 2457,
"prompt_tokens_details": { "cached_tokens": 2048 },
"completion_tokens_details": { "reasoning_tokens": 128 }
}
}
In streams, the final usage chunk is always emitted; see Streaming. If your client aborts a stream, the tokens generated up to the abort are still billed and recorded.
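As a minimal sketch, the usage object above can be split into the categories that are billed at different rates. The helper name `split_usage` and the `UsageBreakdown` type are hypothetical, and the sketch assumes that `cached_tokens` is a subset of `prompt_tokens` and `reasoning_tokens` a subset of `completion_tokens`, as the sample numbers suggest.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageBreakdown:
    uncached_prompt: int  # prompt tokens billed at the regular prompt rate
    cached_prompt: int    # prompt tokens billed at the discounted cached rate
    completion: int       # all output tokens, reasoning included (assumption)
    reasoning: int        # reported separately, already part of `completion`


def split_usage(usage: dict) -> UsageBreakdown:
    """Split a Chat Completions style usage object into billing categories."""
    prompt = usage.get("prompt_tokens", 0)
    cached = (usage.get("prompt_tokens_details") or {}).get("cached_tokens", 0)
    completion = usage.get("completion_tokens", 0)
    reasoning = (usage.get("completion_tokens_details") or {}).get("reasoning_tokens", 0)
    # Assumption: cached tokens are counted inside prompt_tokens.
    return UsageBreakdown(prompt - cached, cached, completion, reasoning)


usage = {
    "prompt_tokens": 2145,
    "completion_tokens": 312,
    "total_tokens": 2457,
    "prompt_tokens_details": {"cached_tokens": 2048},
    "completion_tokens_details": {"reasoning_tokens": 128},
}
breakdown = split_usage(usage)
print(breakdown)
```

The detail objects are read defensively because a provider or model without caching or reasoning may omit them.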
Cost of a request
cost = (prompt_tokens − cached_prompt_tokens) × prompt rate ($/Mtok)
+ cached_prompt_tokens × cached rate ($/Mtok)
+ output tokens (incl. reasoning) × completion rate ($/Mtok)
+ server-tool executions (itemized separately)
Per-model rates are on the Pricing page, sourced from the same catalog the API serves at /models.
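The formula can be worked through for the sample usage object from the Usage accounting section. This is only a sketch: the function name `estimate_cost` and the three rates are made-up placeholders (real rates come from the Pricing page), and it assumes cached tokens are counted inside `prompt_tokens`, so only the uncached remainder is charged at the full prompt rate. Rates are in $/Mtok, so each term is divided by 1,000,000.

```python
from decimal import Decimal

MTOK = Decimal(1_000_000)  # rates are quoted per million tokens


def estimate_cost(usage: dict, rates: dict, tool_charges: Decimal = Decimal("0")) -> Decimal:
    """Estimate the USD cost of one request from its usage object."""
    prompt = usage["prompt_tokens"]
    cached = (usage.get("prompt_tokens_details") or {}).get("cached_tokens", 0)
    completion = usage["completion_tokens"]  # assumed to include reasoning tokens
    return (
        Decimal(prompt - cached) * rates["prompt"] / MTOK
        + Decimal(cached) * rates["cached_prompt"] / MTOK
        + Decimal(completion) * rates["completion"] / MTOK
        + tool_charges  # server-tool executions, itemized separately
    )


# Hypothetical rates in $/Mtok, for illustration only.
rates = {
    "prompt": Decimal("3.00"),
    "cached_prompt": Decimal("0.30"),
    "completion": Decimal("15.00"),
}
usage = {
    "prompt_tokens": 2145,
    "completion_tokens": 312,
    "prompt_tokens_details": {"cached_tokens": 2048},
}
cost = estimate_cost(usage, rates)
print(cost)
```

With these placeholder rates the three terms are $0.000291 (97 uncached prompt tokens), $0.0006144 (2048 cached prompt tokens) and $0.00468 (312 output tokens), for a total of $0.0055854. `Decimal` is used so the result does not pick up binary floating-point rounding.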
Post-hoc metadata: GET /generation
Every response carries a request ID (the id field of the response object). Query it
for post-hoc metadata — model, token breakdown, cost in credits, latency:
curl "https://api.hyperinfer.ai/api/v1/generation?id=gen-abc123" \
  -H "Authorization: Bearer $HYPERINFER_API_KEY"
See the GET /generation reference for the response shape, and try it in the playground there.
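The same lookup can be sketched in Python with only the standard library. The URL and the Bearer header mirror the curl call above; the function names are hypothetical, and the response is returned as parsed JSON without assuming any particular field names, since the response shape is defined in the GET /generation reference.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.hyperinfer.ai/api/v1"


def build_generation_request(generation_id: str, api_key: str) -> urllib.request.Request:
    """Build the GET /generation request for a given request ID."""
    query = urllib.parse.urlencode({"id": generation_id})
    return urllib.request.Request(
        f"{BASE_URL}/generation?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def fetch_generation(generation_id: str, api_key: str) -> dict:
    """Send the request and return the decoded JSON body."""
    request = build_generation_request(generation_id, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Example call (performs a network request, so it is left commented out):
# import os
# print(fetch_generation("gen-abc123", os.environ["HYPERINFER_API_KEY"]))
```

Building the request separately from sending it keeps the URL and header logic easy to check without network access.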
The credits system
Credits are workspace-scoped, prepaid, denominated in USD, and stored as an append-only ledger: top-ups add, usage and server-tool charges subtract. Your balance is always the exact ledger sum — every charge of any kind is a visible ledger item.
- A request is admitted only while the workspace balance is positive.
- Single-response overshoot: if one response ends up costing more than your remaining balance, it is still delivered and billed in full — your balance goes negative, and no further requests are admitted until you top up. Top-ups clear the negative amount first, then credit beyond zero.
- Credits never expire.
- Top-ups: card (5.5% service fee) or crypto/USDC (5% total fee), minimum $5 — exact math on the Pricing page.
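The admission and overshoot rules above can be illustrated with a toy append-only ledger. This is a sketch of the described behavior, not the service's implementation: the `CreditLedger` class and its method names are hypothetical, and top-up amounts here are the credits actually added, with service fees left out because the exact fee math lives on the Pricing page.

```python
from decimal import Decimal


class CreditLedger:
    """Toy append-only ledger: the balance is always the sum of all entries."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, Decimal]] = []

    @property
    def balance(self) -> Decimal:
        return sum((amount for _, amount in self.entries), Decimal("0"))

    def top_up(self, credited: Decimal) -> None:
        # `credited` is the amount added to the balance (fees not modeled).
        self.entries.append(("top_up", credited))

    def admits_request(self) -> bool:
        # A request is admitted only while the balance is positive.
        return self.balance > 0

    def charge(self, kind: str, cost: Decimal) -> None:
        # An admitted response is billed in full, even past zero.
        self.entries.append((kind, -cost))


ledger = CreditLedger()
ledger.top_up(Decimal("5.00"))
admitted_first = ledger.admits_request()    # balance is 5.00
ledger.charge("usage", Decimal("5.25"))     # single-response overshoot
balance_after_overshoot = ledger.balance    # -0.25
admitted_second = ledger.admits_request()   # blocked until a top-up
ledger.top_up(Decimal("5.00"))              # clears -0.25 first, then credits
balance_after_top_up = ledger.balance       # 4.75
```

Because nothing is ever edited or deleted, the negative balance needs no special handling: the next top-up is just another entry, and the sum takes care of clearing the deficit first.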
Enterprise workspaces can be switched to monthly invoicing: no prepaid credits — consumption accrues through the calendar month (UTC) and is invoiced on the 1st. Contact us to enable it.