# Errors & limits

Errors are returned in the native envelope of the surface you called, so your existing SDK's error handling keeps working.

## Unknown model ids never error

**Any** model id is accepted. Legacy, renamed, and not-yet-released ids all resolve — they are treated as intent and served by request shape, then echoed back unchanged. Code written against a model that no longer exists keeps running without a `404`.

:::tip[No model allowlist to maintain]
You never have to track which model names are valid. Send what your application already sends; classification adapts behind the endpoint. It is also what makes [switching](https://docs.directinference.com/migrate/) painless — your existing ids just work.
:::

## Error shapes

Each surface returns its own error envelope. The serving model, candidate, and provider are never named — `provider_name` is always `direct-inference`.

```json title="OpenAI-compatible"
{
  "error": {
    "code": 429,
    "message": "Provider returned error",
    "metadata": { "provider_name": "direct-inference" }
  }
}
```

```json title="Anthropic Messages"
{
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "message": "max_tokens: field required"
  }
}
```

```json title="Gemini"
{
  "error": {
    "code": 401,
    "message": "Missing or invalid API key.",
    "status": "UNAUTHENTICATED"
  }
}
```

## Status codes

| Code | Meaning |
| --- | --- |
| `400` | Invalid request — a malformed body, or an unsupported feature such as a non-PDF or URL document source. |
| `401` | Missing or invalid API key for the surface you called. |
| `402` | Payment required — a spend cap was reached, or the balance is exhausted. Raise the cap or top up, then retry. |
| `429` | Rate limited. Passed through from the serving endpoint — retry with backoff. |
| `5xx` | Transient error on the endpoint serving your request. Safe to retry. |

## Spend limits (402)

A `402` means a hard spending cap was reached or your balance ran out — a deliberate stop, not a transient failure. It will not clear on retry until you raise the cap or top up. The exact envelopes and how caps are configured live in [Spend & limits](https://docs.directinference.com/spend/).

## Rate limits & retries

A `429` reflects pressure on the endpoint serving your request rather than a fixed per-key quota. Treat it as transient: retry with exponential backoff and jitter, and prefer a non-zero `max_tokens` for reasoning-heavy or document requests so a reply is not truncated mid-thought.