# Effort

Effort is a single optional setting that biases any call toward lower latency, lower cost, or higher quality, with no model swap and no code rewrite.

## Setting effort

The preferred form is the `X-DI-Effort` header, which you can set once on the client so it applies to every request. A `?effort=` query parameter works too.

```python
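from openai import OpenAI

# Set the effort header once; it is sent with every request from this client.
client = OpenAI(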
    api_key="llm_live_...",
    base_url="https://app.directinference.com/di/v1",
    default_headers={"X-DI-Effort": "high"},
)
```

```typescript
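import OpenAI from "openai";

// Set the effort header once; it is sent with every request from this client.
const client = new OpenAI({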
  apiKey: "llm_live_...",
  baseURL: "https://app.directinference.com/di/v1",
  defaultHeaders: { "X-DI-Effort": "high" },
});
```

```bash
curl https://app.directinference.com/di/v1/chat/completions \
  -H "Authorization: Bearer llm_live_..." \
  -H "X-DI-Effort: high" \
  -H "Content-Type: application/json" \
  -d '{ "model": "gpt-5.5-mini", "messages": [{ "role": "user", "content": "..." }] }'

# or, without a custom header:
# POST .../di/v1/chat/completions?effort=high
```

```go
client := openai.NewClient(
	option.WithAPIKey("llm_live_..."),
	option.WithBaseURL("https://app.directinference.com/di/v1"),
	option.WithHeader("X-DI-Effort", "high"),
)
```
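
When only some calls need a different effort, the same header or query parameter can be attached per request instead of on the client. The following is a minimal sketch that only builds the request pieces with the Python standard library; `effort_headers` and `effort_url` are hypothetical helper names, not part of any SDK, and the base URL is the one shown above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

BASE_URL = "https://app.directinference.com/di/v1"


def effort_headers(level: str) -> dict:
    """Build the header form of the effort setting (hypothetical helper)."""
    return {"X-DI-Effort": level}


def effort_url(path: str, level: str) -> str:
    """Build the query-parameter form, ?effort=<level> (hypothetical helper)."""
    parts = urlsplit(BASE_URL + path)
    # Keep any query parameters already present and add or replace `effort`.
    query = dict(parse_qsl(parts.query))
    query["effort"] = level
    return urlunsplit(parts._replace(query=urlencode(query)))


print(effort_headers("low"))
print(effort_url("/chat/completions", "high"))
```

With the OpenAI Python SDK, per-request values like these are typically passed through the `extra_headers` or `extra_query` arguments of `client.chat.completions.create(...)`.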

## Levels

Lower levels favor lower cost and latency; higher levels favor quality by adjusting model order, retries, and repair budget. Effort tunes how the chosen request type is handled; it does not pick the type.

`fast` → `minimal` → `low` → `medium` → `high` → `xhigh` → `max`

_cheaper / faster → higher quality_

| Level | Bias |
| --- | --- |
| `auto` | Inferred from model intent and use case (the default). |
| `fast` | Lowest latency and cost; can keep simple work on cheaper models. |
| `minimal` | Minimal spend; trims optional steps. |
| `low` | Leans cheaper and faster. |
| `medium` | Balanced cost and quality. |
| `high` | Leans toward quality through model order and retries. |
| `xhigh` | Stronger quality bias; more repair budget. |
| `max` | Maximum quality bias regardless of cost. |
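
Because the levels form an ordered scale, client code can treat "raise" and "lower" as steps along it. This is a minimal sketch under that assumption; `SCALE`, `is_valid`, and `step` are hypothetical names, and `auto` is kept off the scale because the table describes it as inferred rather than as a position.

```python
# Ordered scale from this page: cheaper / faster -> higher quality.
SCALE = ["fast", "minimal", "low", "medium", "high", "xhigh", "max"]


def is_valid(level: str) -> bool:
    """True for `auto` or any level on the ordered scale."""
    return level == "auto" or level in SCALE


def step(level: str, delta: int) -> str:
    """Move `delta` steps along the scale, clamping at both ends.

    Raises ValueError for `auto` or unknown levels, which have no position.
    """
    index = SCALE.index(level)
    clamped = max(0, min(len(SCALE) - 1, index + delta))
    return SCALE[clamped]


print(step("medium", 1))   # one step toward quality
print(step("medium", -1))  # one step toward cost and latency
```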

## Where effort comes from

If several effort signals are present, the first one found wins, in this order:

1. `X-DI-Effort` request header
2. `?effort=` query parameter
3. OpenAI `reasoning.effort` or `reasoning_effort`
4. Gemini thinking config
5. Anthropic `thinking.budget_tokens`
6. Model suffix, e.g. `claude-sonnet-4-6@xhigh` or `gpt-5:fast`
7. Auto inference from model intent and use case
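
The first-match rule above can be sketched as a lookup over an ordered tuple of sources. This is an illustration only, not the service's implementation: the source labels, `resolve_source`, and `split_model_suffix` are hypothetical, and how provider-specific values (such as a token budget) map to a level is not covered here.

```python
from typing import Mapping, Optional, Tuple

SCALE = ("fast", "minimal", "low", "medium", "high", "xhigh", "max")

# Hypothetical labels for the explicit sources, in the documented order.
PRECEDENCE = (
    "header",              # X-DI-Effort request header
    "query",               # ?effort= query parameter
    "openai_reasoning",    # reasoning.effort or reasoning_effort
    "gemini_thinking",     # Gemini thinking config
    "anthropic_thinking",  # thinking.budget_tokens
    "model_suffix",        # e.g. claude-sonnet-4-6@xhigh or gpt-5:fast
)


def resolve_source(signals: Mapping[str, Optional[str]]) -> Tuple[str, Optional[str]]:
    """Return (source, value) for the first signal present, else ("auto", None)."""
    for name in PRECEDENCE:
        value = signals.get(name)
        if value is not None:
            return name, value
    return "auto", None


def split_model_suffix(model: str) -> Tuple[str, Optional[str]]:
    """Split `name@level` or `name:level`; leave other model names untouched."""
    for sep in ("@", ":"):
        base, found, suffix = model.rpartition(sep)
        if found and base and suffix in SCALE:
            return base, suffix
    return model, None


print(resolve_source({"query": "low", "model_suffix": "xhigh"}))
print(split_model_suffix("claude-sonnet-4-6@xhigh"))
```

In this sketch a suffix is treated as an effort level only when it names a level on the scale, so a model name that merely contains `:` or `@` passes through unchanged.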

:::caution[Capability requirements always win]
Effort never overrides a capability. A PDF request at `fast` still uses `document`; image input still uses `vision`; oversized input still uses `long`.
:::

:::note[Used to a cost/quality slider?]
Effort is the single dial that replaces it: `auto` by default, and one value to raise when you want more quality or lower when you want a cheaper, faster call. If you're moving over from another service, see [Migrate to DirectInference](https://docs.directinference.com/migrate/).
:::