Usage & Credits

Usage accounting

Every request — streamed or not — reports token usage in the ingress format’s native usage object, normalized across providers:

  • Prompt tokens, broken into cached vs. uncached (cached prompt tokens are billed at the model’s discounted cached_prompt rate where available),
  • Completion tokens,
  • Reasoning tokens (counted and billed as completion-side output for reasoning models, reported separately).
{
  "usage": {
    "prompt_tokens": 2145,
    "completion_tokens": 312,
    "total_tokens": 2457,
    "prompt_tokens_details": { "cached_tokens": 2048 },
    "completion_tokens_details": { "reasoning_tokens": 128 }
  }
}
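As a rough illustration of reading this object on the client side, the sketch below splits the prompt count into cached and uncached parts. It assumes that cached_tokens is a subset of prompt_tokens and that reasoning_tokens is already included in completion_tokens; the sample is consistent with both (total_tokens equals prompt_tokens plus completion_tokens), but the text above does not state either explicitly. The helper name split_usage is hypothetical.

```python
import json

# The sample response body shown above.
SAMPLE = """
{
  "usage": {
    "prompt_tokens": 2145,
    "completion_tokens": 312,
    "total_tokens": 2457,
    "prompt_tokens_details": { "cached_tokens": 2048 },
    "completion_tokens_details": { "reasoning_tokens": 128 }
  }
}
"""


def split_usage(usage: dict) -> dict:
    """Break a usage object into the buckets described in the list above.

    Assumption: cached_tokens is counted inside prompt_tokens, and
    reasoning_tokens is counted inside completion_tokens.
    """
    prompt = usage["prompt_tokens"]
    cached = usage.get("prompt_tokens_details", {}).get("cached_tokens", 0)
    completion = usage["completion_tokens"]
    reasoning = usage.get("completion_tokens_details", {}).get("reasoning_tokens", 0)
    return {
        "uncached_prompt": prompt - cached,
        "cached_prompt": cached,
        "completion": completion,
        "reasoning": reasoning,
    }


buckets = split_usage(json.loads(SAMPLE)["usage"])
print(buckets)
```

The detail objects are read with defaults because a response may plausibly omit them when nothing was cached or no reasoning took place; that, too, is an assumption rather than documented behavior.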

In streams, the final usage chunk is always emitted — see Streaming. If your client aborts a stream, the tokens generated up to the abort are still billed and recorded.

Cost of a request

cost = prompt_tokens × prompt rate ($/Mtok)
     + cached_prompt_tokens × cached rate ($/Mtok)
     + output tokens (incl. reasoning) × completion rate ($/Mtok)
     + server-tool executions (itemized separately)
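A minimal worked sketch of this formula follows. The rates are made up for illustration and are not real prices, and $/Mtok is read as USD per one million tokens. It also assumes that the prompt term covers only the uncached part of the prompt, since cached prompt tokens are billed at the cached rate. The token counts come from the sample usage object earlier on this page (2145 prompt tokens, 2048 of them cached, 312 completion tokens). The function name request_cost_usd is hypothetical.

```python
from decimal import Decimal

TOKENS_PER_MTOK = Decimal(1_000_000)


def request_cost_usd(
    uncached_prompt_tokens: int,
    cached_prompt_tokens: int,
    output_tokens: int,  # completion tokens, reasoning included
    prompt_rate: Decimal,  # USD per million tokens
    cached_rate: Decimal,  # USD per million tokens
    completion_rate: Decimal,  # USD per million tokens
    server_tool_usd: Decimal = Decimal("0"),
) -> Decimal:
    """Apply the cost formula above; rates are expressed per million tokens."""
    token_cost = (
        uncached_prompt_tokens * prompt_rate
        + cached_prompt_tokens * cached_rate
        + output_tokens * completion_rate
    ) / TOKENS_PER_MTOK
    return token_cost + server_tool_usd


# Hypothetical rates, chosen only to make the arithmetic easy to follow.
cost = request_cost_usd(
    uncached_prompt_tokens=2145 - 2048,  # 97 uncached prompt tokens
    cached_prompt_tokens=2048,
    output_tokens=312,
    prompt_rate=Decimal("3.00"),
    cached_rate=Decimal("0.30"),
    completion_rate=Decimal("15.00"),
)
print(cost)
```

Decimal is used instead of float so that small per-request amounts add up without binary rounding drift. The authoritative figure for any real request is the cost in credits reported by GET /generation.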

Per-model rates are on the Pricing page, sourced from the same catalog the API serves at /models.

Post-hoc metadata: GET /generation

Every response carries a request ID (the id field of the response object). Query it for post-hoc metadata — model, token breakdown, cost in credits, latency:

curl "https://api.hyperinfer.ai/api/v1/generation?id=gen-abc123" \
  -H "Authorization: Bearer $HYPERINFER_API_KEY"

See the GET /generation reference for the response shape, and try it in the playground there.
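The same lookup can be sketched in Python using only the standard library. The URL, query parameter, and header mirror the curl command above. The helper names (build_generation_request, fetch_generation) are hypothetical, and the response is returned as parsed JSON without assuming any field names, since the response shape is documented in the reference rather than here.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.hyperinfer.ai/api/v1"


def build_generation_request(generation_id: str, api_key: str) -> urllib.request.Request:
    """Build the GET /generation request for one request ID (sends nothing)."""
    query = urllib.parse.urlencode({"id": generation_id})
    return urllib.request.Request(
        f"{BASE_URL}/generation?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def fetch_generation(generation_id: str, api_key: str, timeout: float = 10.0) -> dict:
    """Send the request and return the parsed JSON body as-is."""
    request = build_generation_request(generation_id, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example usage (performs a network call, so it is left commented out):
#   import os
#   metadata = fetch_generation("gen-abc123", os.environ["HYPERINFER_API_KEY"])
#   print(metadata)
```

Building the request separately from sending it keeps the URL and header construction easy to check without network access.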

The credits system

Credits are workspace-scoped, prepaid, denominated in USD, and stored as an append-only ledger: top-ups add, usage and server-tool charges subtract. Your balance is always the exact ledger sum — every charge of any kind is a visible ledger item.

  • A request is admitted only while the workspace balance is positive.
  • Single-response overshoot: if one response ends up costing more than your remaining balance, it is still delivered and billed in full — your balance goes negative, and no further requests are admitted until you top up. Top-ups clear the negative amount first, then credit beyond zero.
  • Credits never expire.
  • Top-ups: card (5.5% service fee) or crypto/USDC (5% total fee), minimum $5 — exact math on the Pricing page.
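The admission and overshoot rules above can be modeled with a toy append-only ledger. This is only an illustrative sketch of the described behavior, not the actual implementation: the Ledger class and its method names are hypothetical, and top-up amounts are taken as the credited value after fees, since the fee math lives on the Pricing page.

```python
from decimal import Decimal


class Ledger:
    """Toy append-only credit ledger; the balance is always the sum of entries."""

    def __init__(self) -> None:
        self._entries: list[tuple[str, Decimal]] = []

    def balance(self) -> Decimal:
        return sum((amount for _, amount in self._entries), Decimal("0"))

    def top_up(self, credited_usd: Decimal) -> None:
        # Assumption: credited_usd is the amount credited after fees.
        self._entries.append(("top_up", credited_usd))

    def can_admit(self) -> bool:
        # A request is admitted only while the balance is positive.
        return self.balance() > 0

    def charge(self, kind: str, cost_usd: Decimal) -> None:
        # Charges are always recorded in full, even if the balance goes negative.
        self._entries.append((kind, -cost_usd))


ledger = Ledger()
ledger.top_up(Decimal("5.00"))
ledger.charge("usage", Decimal("4.90"))      # balance is now 0.10
admitted_before = ledger.can_admit()         # still positive, so admitted
ledger.charge("usage", Decimal("0.25"))      # overshoot: balance is now -0.15
admitted_after = ledger.can_admit()          # blocked until a top-up
ledger.top_up(Decimal("5.00"))               # clears -0.15 first, leaving 4.85
print(admitted_before, admitted_after, ledger.balance())
```

Because the balance is derived from the entries rather than stored separately, a top-up clears a negative balance first with no special handling: it is ordinary addition over the ledger.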

Enterprise workspaces can be switched to monthly invoicing: no prepaid credits — consumption accrues through the calendar month (UTC) and is invoiced on the 1st. Contact us to enable it.