
Errors & limits

Errors are returned in the native envelope of the surface you called, so your existing SDK’s error handling keeps working.

Any model id is accepted. Legacy, renamed, and not-yet-released ids all resolve: the id is treated as a statement of intent, the request is served according to its shape, and the id you sent is echoed back unchanged in the response. Code written against a model that no longer exists keeps running instead of failing with a 404.

Each surface returns its own error envelope, as shown below. Error bodies never name the serving model, candidate, or provider: where a provider_name field appears, its value is always direct-inference.

OpenAI-compatible
```json
{
  "error": {
    "code": 429,
    "message": "Provider returned error",
    "metadata": { "provider_name": "direct-inference" }
  }
}
```
Anthropic Messages
```json
{
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "message": "max_tokens: field required"
  }
}
```
Gemini
```json
{
  "error": {
    "code": 401,
    "message": "Missing or invalid API key.",
    "status": "UNAUTHENTICATED"
  }
}
```
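Because the three envelopes differ in shape, a client that talks to more than one surface may want a single place to read them. The sketch below is a minimal, hypothetical helper (`GatewayError` and `parse_error_body` are illustrative names, not part of any SDK) that tells the envelopes apart using only the fields shown in the examples above. It assumes the body has already been decoded from JSON. The Anthropic envelope carries no numeric code, so in practice the HTTP status would come from the response itself.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class GatewayError:
    """Surface-agnostic view of an error body (hypothetical helper type)."""
    surface: str            # "openai", "anthropic" or "gemini"
    status: Optional[int]   # numeric code, if the body carries one
    kind: Optional[str]     # error type or status string, if the body carries one
    message: str


def parse_error_body(body: dict[str, Any]) -> GatewayError:
    """Classify an error body by its shape, using only the fields in the examples."""
    err = body.get("error")
    if not isinstance(err, dict):
        raise ValueError("not an error envelope")
    message = str(err.get("message", ""))

    # Anthropic Messages: top-level "type": "error", typed inner error, no numeric code.
    if body.get("type") == "error":
        return GatewayError("anthropic", None, err.get("type"), message)

    # Gemini: numeric "code" plus a string "status" such as "UNAUTHENTICATED".
    if "status" in err:
        return GatewayError("gemini", err.get("code"), err.get("status"), message)

    # OpenAI-compatible: numeric "code", optional "metadata".
    return GatewayError("openai", err.get("code"), None, message)
```

Checking the shape of the body, rather than which SDK raised the error, keeps the helper usable when the same code path handles responses from several surfaces.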
| Code | Meaning |
| --- | --- |
| 400 | Invalid request: a malformed body, or an unsupported feature such as a non-PDF or URL document source. |
| 401 | Missing or invalid API key for the surface you called. |
| 402 | Payment required: a spend cap was reached, or the balance is exhausted. Raise the cap or top up, then retry. |
| 429 | Rate limited. Passed through from the serving endpoint; retry with backoff. |
| 5xx | Transient error on the endpoint serving your request. Safe to retry. |
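One way to act on this table in client code is to map each status to a next step. This is a minimal sketch: `Action` and `action_for_status` are hypothetical names, and treating any 4xx status not listed in the table like 400/401 (fix the request rather than retry) is an assumption, not documented behavior.

```python
from enum import Enum


class Action(Enum):
    FIX_REQUEST = "fix the request or key; retrying unchanged will not help"
    RAISE_CAP_OR_TOP_UP = "raise the spend cap or top up, then retry"
    RETRY_WITH_BACKOFF = "transient; retry with backoff"


def action_for_status(status: int) -> Action:
    """Map an HTTP error status to a client-side next step (hypothetical helper)."""
    if status == 402:
        # Deliberate stop: will not clear until the cap is raised or the balance topped up.
        return Action.RAISE_CAP_OR_TOP_UP
    if status == 429 or 500 <= status <= 599:
        return Action.RETRY_WITH_BACKOFF
    if 400 <= status <= 499:
        # 400 and 401, plus any other 4xx the table does not list (an assumption).
        return Action.FIX_REQUEST
    raise ValueError(f"not an error status: {status}")
```

Keeping 402 separate from the retryable group matters: a generic "retry every 4xx/5xx" loop would spin on an exhausted balance.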

A 402 means a hard spending cap was reached or your balance ran out. It is a deliberate stop, not a transient failure, and it will not clear on retry until you raise the cap or top up. The exact envelopes and how caps are configured are documented in Spend & limits.

A 429 reflects pressure on the endpoint serving your request rather than a fixed per-key quota. Treat it as transient and retry with exponential backoff and jitter. Separately, for reasoning-heavy or document requests, set a non-zero max_tokens that leaves enough room for the full reply, so it is not truncated mid-thought.
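A minimal sketch of that retry advice, using the "full jitter" variant of exponential backoff: each wait is drawn uniformly between zero and a ceiling that doubles per attempt. `RetryableError` and `call_with_backoff` are hypothetical names; the client is assumed to raise `RetryableError` only for 429 and 5xx responses (never for 402), and the delay defaults are illustrative, not values the service documents. The `sleep` parameter is injectable so the loop can be exercised without actually waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical exception a client raises for a 429 or 5xx response."""

    def __init__(self, status: int):
        super().__init__(f"retryable HTTP status {status}")
        self.status = status


def call_with_backoff(
    request: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,   # illustrative default, in seconds
    max_delay: float = 30.0,   # illustrative cap on a single wait, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `request` on RetryableError using exponential backoff with full jitter."""
    for attempt in range(max_attempts):
        try:
            return request()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Ceiling grows as base * 2^attempt up to max_delay; the actual wait is
            # uniform in [0, ceiling] so concurrent clients do not retry in lockstep.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, ceiling))
    raise RuntimeError("max_attempts must be at least 1")


# Usage: a stand-in request that fails twice with 429, then succeeds.
calls = []


def flaky_request() -> str:
    calls.append(1)
    if len(calls) < 3:
        raise RetryableError(429)
    return "ok"


result = call_with_backoff(flaky_request, sleep=lambda seconds: None)
```

Full jitter trades a slightly longer average wait for much less synchronized retry traffic, which suits errors that reflect shared pressure on a serving endpoint rather than a per-key limit.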