Effort
One optional knob biases any call toward latency, cost, or quality — no model swap and no code rewrite.
Setting effort
The preferred form is the X-DI-Effort header, which you can set once on the client so it applies to every request. A ?effort= query parameter works too.
```python
from openai import OpenAI

client = OpenAI(
    api_key="llm_live_...",
    base_url="https://app.directinference.com/di/v1",
    default_headers={"X-DI-Effort": "high"},
)
```

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "llm_live_...",
  baseURL: "https://app.directinference.com/di/v1",
  defaultHeaders: { "X-DI-Effort": "high" },
});
```

```shell
curl https://app.directinference.com/di/v1/chat/completions \
  -H "Authorization: Bearer llm_live_..." \
  -H "X-DI-Effort: high" \
  -H "Content-Type: application/json" \
  -d '{ "model": "gpt-5.5-mini", "messages": [{ "role": "user", "content": "..." }] }'

# or, without a custom header:
# POST .../di/v1/chat/completions?effort=high
```

```go
client := openai.NewClient(
	option.WithAPIKey("llm_live_..."),
	option.WithBaseURL("https://app.directinference.com/di/v1"),
	option.WithHeader("X-DI-Effort", "high"),
)
```

Levels
Lower levels bias toward cost and latency; higher levels bias toward quality through model order, retries, and repair budget. Effort tunes how the chosen request type is handled; it does not pick the type.
fast → minimal → low → medium → high → xhigh → max
cheaper / faster → higher quality
| Level | Bias |
|---|---|
| auto | Inferred from model intent and use case (the default). |
| fast | Lowest latency and cost; can keep simple work on cheaper models. |
| minimal | Minimal spend; trims optional steps. |
| low | Leans cheaper and faster. |
| medium | Balanced cost and quality. |
| high | Leans toward quality through model order and retries. |
| xhigh | Stronger quality bias; more repair budget. |
| max | Maximum quality bias regardless of cost. |
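
Because the levels form an ordered scale, a client may want to validate a level, or cap it, before sending the header. The following is a minimal sketch under that assumption. The names `EFFORT_SCALE`, `is_valid_effort`, and `cap_effort` are hypothetical client-side helpers, not part of the service or any SDK, and `auto` is treated as valid but outside the ordered scale.

```python
# Hypothetical client-side helpers; not part of the service's API.
# Ordered from cheapest/fastest to highest quality, as listed above.
EFFORT_SCALE = ["fast", "minimal", "low", "medium", "high", "xhigh", "max"]


def is_valid_effort(level: str) -> bool:
    """Return True for "auto" or any level on the documented scale."""
    return level == "auto" or level in EFFORT_SCALE


def cap_effort(level: str, ceiling: str) -> str:
    """Return `level`, lowered to `ceiling` if it sits higher on the scale.

    "auto" is passed through unchanged because it has no fixed position.
    """
    if not is_valid_effort(level) or ceiling not in EFFORT_SCALE:
        raise ValueError(f"unknown effort level: {level!r} / {ceiling!r}")
    if level == "auto":
        return level
    capped = min(EFFORT_SCALE.index(level), EFFORT_SCALE.index(ceiling))
    return EFFORT_SCALE[capped]
```

The result of `cap_effort(requested, "high")` could then be used as the X-DI-Effort header value, for example to keep a batch job from ever requesting `xhigh` or `max`.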
Where effort comes from
If several effort signals are present, the first one found wins, in this order:
- X-DI-Effort request header
- ?effort= query parameter
- OpenAI reasoning.effort or reasoning_effort
- Gemini thinking config
- Anthropic thinking.budget_tokens
- Model suffix, e.g. claude-sonnet-4-6@xhigh or gpt-5:fast
- Auto inference from model intent and use case
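
The precedence above can be illustrated with a small first-match resolver. This is only a sketch of the documented order, not the service's implementation. The function name `resolve_effort`, the source labels, and the Gemini key name `thinkingConfig` are assumptions. How provider-specific values such as a thinking budget map onto effort levels is not described here, so the sketch returns the raw value it finds.

```python
import re
from typing import Any, Mapping, Tuple

LEVELS = {"auto", "fast", "minimal", "low", "medium", "high", "xhigh", "max"}
# Matches a trailing "@level" or ":level" on a model name.
_SUFFIX = re.compile(r"[@:]([a-z]+)$")


def resolve_effort(
    headers: Mapping[str, str],
    query: Mapping[str, str],
    body: Mapping[str, Any],
) -> Tuple[str, Any]:
    """Return (source, raw_value) for the first effort signal found."""
    # 1. Request header (header names are case-insensitive).
    for name, value in headers.items():
        if name.lower() == "x-di-effort":
            return "header", value
    # 2. Query parameter.
    if "effort" in query:
        return "query", query["effort"]
    # 3. OpenAI-style fields: reasoning.effort or reasoning_effort.
    reasoning = body.get("reasoning")
    if isinstance(reasoning, dict) and "effort" in reasoning:
        return "openai", reasoning["effort"]
    if "reasoning_effort" in body:
        return "openai", body["reasoning_effort"]
    # 4. Gemini thinking config. The key name here is an assumption.
    if "thinkingConfig" in body:
        return "gemini", body["thinkingConfig"]
    # 5. Anthropic thinking.budget_tokens.
    thinking = body.get("thinking")
    if isinstance(thinking, dict) and "budget_tokens" in thinking:
        return "anthropic", thinking["budget_tokens"]
    # 6. Model suffix such as "claude-sonnet-4-6@xhigh" or "gpt-5:fast".
    match = _SUFFIX.search(str(body.get("model", "")))
    if match and match.group(1) in LEVELS:
        return "model_suffix", match.group(1)
    # 7. Nothing explicit: fall back to auto inference.
    return "auto", None
```

For example, a request that carries both the header and a model suffix resolves to the header value, because the header is checked first.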