Errors & limits
Errors are returned in the native envelope of the surface you called, so your existing SDK’s error handling keeps working.
Unknown model ids never error
Section titled “Unknown model ids never error”Any model id is accepted. Legacy, renamed, and not-yet-released ids all resolve — they are treated as intent and served by request shape, then echoed back unchanged. Code written against a model that no longer exists keeps running without a 404.
Error shapes
Section titled “Error shapes”Each surface returns its own error envelope. The serving model, candidate, and provider are never named — provider_name is always direct-inference.
{ "error": { "code": 429, "message": "Provider returned error", "metadata": { "provider_name": "direct-inference" } }}{ "type": "error", "error": { "type": "invalid_request_error", "message": "max_tokens: field required" }}{ "error": { "code": 401, "message": "Missing or invalid API key.", "status": "UNAUTHENTICATED" }}Status codes
Section titled “Status codes”| Code | Meaning |
|---|---|
400 | Invalid request — a malformed body, or an unsupported feature such as a non-PDF or URL document source. |
401 | Missing or invalid API key for the surface you called. |
402 | Payment required — a spend cap was reached, or the balance is exhausted. Raise the cap or top up, then retry. |
429 | Rate limited. Passed through from the serving endpoint — retry with backoff. |
5xx | Transient error on the endpoint serving your request. Safe to retry. |
Spend limits (402)
Section titled “Spend limits (402)”A 402 means a hard spending cap was reached or your balance ran out — a deliberate stop, not a transient failure. It will not clear on retry until you raise the cap or top up. The exact envelopes and how caps are configured live in Spend & limits.
Rate limits & retries
Section titled “Rate limits & retries”A 429 reflects pressure on the endpoint serving your request rather than a fixed per-key quota. Treat it as transient: retry with exponential backoff and jitter, and prefer a non-zero max_tokens for reasoning-heavy or document requests so a reply is not truncated mid-thought.