Effort

One optional knob biases any call toward lower latency, lower cost, or higher quality, with no model swap and no code rewrite.

Setting effort

The preferred form is the X-DI-Effort header, which you can set once on the client so it applies to every request. A ?effort= query parameter works too.

from openai import OpenAI

# Every request made through this client carries X-DI-Effort: high.
client = OpenAI(
    api_key="llm_live_...",
    base_url="https://app.directinference.com/di/v1",
    default_headers={"X-DI-Effort": "high"},
)

import OpenAI from "openai";

// Every request made through this client carries X-DI-Effort: high.
const client = new OpenAI({
  apiKey: "llm_live_...",
  baseURL: "https://app.directinference.com/di/v1",
  defaultHeaders: { "X-DI-Effort": "high" },
});

curl https://app.directinference.com/di/v1/chat/completions \
  -H "Authorization: Bearer llm_live_..." \
  -H "X-DI-Effort: high" \
  -H "Content-Type: application/json" \
  -d '{ "model": "gpt-5.5-mini", "messages": [{ "role": "user", "content": "..." }] }'

# or, without a custom header:
# POST .../di/v1/chat/completions?effort=high

import (
  "github.com/openai/openai-go"
  "github.com/openai/openai-go/option"
)

client := openai.NewClient(
  option.WithAPIKey("llm_live_..."),
  option.WithBaseURL("https://app.directinference.com/di/v1"),
  option.WithHeader("X-DI-Effort", "high"),
)
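When a custom header is not an option, the ?effort= query parameter can be appended to the request URL. The helper below is a minimal sketch of that, using only the Python standard library. The function name with_effort is hypothetical, and the sketch assumes that auto may be passed explicitly; the page only documents it as the default.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Level names as listed in the Levels table on this page.
EFFORT_LEVELS = ("auto", "fast", "minimal", "low", "medium", "high", "xhigh", "max")


def with_effort(url: str, effort: str) -> str:
    """Return url with effort=<effort> in its query string (hypothetical helper)."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    parts = urlsplit(url)
    # Keep every other query parameter; drop any effort value already present.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "effort"
    ]
    query.append(("effort", effort))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_effort("https://app.directinference.com/di/v1/chat/completions", "high"))
# https://app.directinference.com/di/v1/chat/completions?effort=high
```

Validating the level on the client side catches typos before a request is sent; how the service itself treats an unknown value is not stated here.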

Levels

Lower levels bias toward lower cost and latency; higher levels bias toward quality. The levers are model order, retries, and repair budget. Effort tunes how the chosen request type is handled; it does not pick the type.

fast → minimal → low → medium → high → xhigh → max
(cheaper and faster on the left, higher quality on the right)

Level	Bias
`auto`	Inferred from model intent and use case (the default).
`fast`	Lowest latency and cost; can keep simple work on cheaper models.
`minimal`	Minimal spend; trims optional steps.
`low`	Leans cheaper and faster.
`medium`	Balanced cost and quality.
`high`	Leans toward quality through model order and retries.
`xhigh`	Stronger quality bias; more repair budget.
`max`	Maximum quality bias regardless of cost.
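Because the named levels form an ordered scale, a caller can compare them on the client side, for example to cap the effort that a cost-sensitive code path is allowed to request. The following is a minimal sketch of that idea; cap_effort is a hypothetical helper, not part of any SDK, and passing auto through unchanged is an assumption made here because auto has no fixed position on the scale.

```python
# Ordered from cheapest and fastest to highest quality, as listed on this page.
EFFORT_SCALE = ("fast", "minimal", "low", "medium", "high", "xhigh", "max")


def cap_effort(requested: str, ceiling: str) -> str:
    """Return requested, lowered to ceiling if it sits higher on the scale."""
    if requested == "auto":
        # Assumption: auto is not on the scale, so it is passed through as-is.
        return requested
    rank = EFFORT_SCALE.index  # raises ValueError for an unknown level name
    return requested if rank(requested) <= rank(ceiling) else ceiling


print(cap_effort("max", "high"))  # high
print(cap_effort("low", "high"))  # low
```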

Where effort comes from

If several effort signals are present, the first one found wins, in this order:

1. X-DI-Effort request header
2. ?effort= query parameter
3. OpenAI reasoning.effort or reasoning_effort
4. Gemini thinking config
5. Anthropic thinking.budget_tokens
6. Model suffix, e.g. claude-sonnet-4-6@xhigh or gpt-5:fast
7. Auto inference from model intent and use case
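The precedence above can be illustrated with a small client-side sketch that reports which signal would be found first for a given request. This is not the service's implementation, only a reading of the documented order. The function name first_effort_signal, the source labels it returns, and the thinking_config body key used for the Gemini case are all assumptions; the sketch also returns the raw value without mapping it to a level, since that mapping is not described here.

```python
import re
from typing import Any, Mapping, Tuple

LEVELS = {"fast", "minimal", "low", "medium", "high", "xhigh", "max"}
# Matches a trailing "@level" or ":level" on a model name.
_SUFFIX = re.compile(r"[@:]([a-z]+)$")


def _dig(obj: Any, *keys: str) -> Any:
    """Walk nested mappings; return None if any key is missing."""
    for key in keys:
        if not isinstance(obj, Mapping) or key not in obj:
            return None
        obj = obj[key]
    return obj


def first_effort_signal(
    headers: Mapping[str, str],
    query: Mapping[str, str],
    body: Mapping[str, Any],
) -> Tuple[str, Any]:
    """Return (source, raw_value) for the first effort signal present."""
    lowered = {name.lower(): value for name, value in headers.items()}
    suffix = _SUFFIX.search(body.get("model") or "")
    candidates = [
        ("header", lowered.get("x-di-effort")),
        ("query", query.get("effort")),
        ("openai", _dig(body, "reasoning", "effort") or body.get("reasoning_effort")),
        ("gemini", _dig(body, "thinking_config")),  # hypothetical key name
        ("anthropic", _dig(body, "thinking", "budget_tokens")),
        ("model_suffix", suffix.group(1) if suffix and suffix.group(1) in LEVELS else None),
    ]
    for source, value in candidates:
        if value is not None:
            return source, value
    return "auto", None  # nothing explicit: left to auto inference


# The header wins even though a query parameter and a model suffix are present.
print(first_effort_signal({"X-DI-Effort": "high"}, {"effort": "low"}, {"model": "gpt-5:fast"}))
# ('header', 'high')
```

A check like this can help when debugging why a request ran at an unexpected level, for instance when a client-wide header silently overrides a per-request model suffix.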