Migrate to DirectInference

Already calling a model vendor directly, or through another service? Switching is a base-URL change — your SDK, model ids, and request code stay exactly as they are.

DirectInference speaks the OpenAI, Anthropic, and Gemini wire formats, so you keep your existing client. Point it at the matching base URL and authenticate with your llm_live_… key.

| Your client | Base URL |
| --- | --- |
| OpenAI-compatible: openai, LangChain, LiteLLM, @ai-sdk/openai | https://app.directinference.com/di/v1 |
| Anthropic: anthropic, @anthropic-ai/sdk, @ai-sdk/anthropic | https://app.directinference.com/di (the SDK appends /v1/messages itself) |
| Gemini: google-genai, @google/genai | https://app.directinference.com/di (the SDK appends /v1beta/models/… itself) |
| Gemini: Vercel @ai-sdk/google | https://app.directinference.com/di/v1beta (this provider already includes /v1beta) |
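
To keep the table's mapping in one place, a small lookup helper can be useful. The sketch below is only an illustration: the URLs are copied from the table, while the dictionary keys (`openai`, `anthropic`, `gemini`, `gemini-ai-sdk`) and the function name `base_url_for` are hypothetical labels chosen for this example, not official identifiers.

```python
# Base URLs copied from the table above. The keys are hypothetical
# labels invented for this sketch, not official surface names.
BASE_URLS = {
    "openai": "https://app.directinference.com/di/v1",
    "anthropic": "https://app.directinference.com/di",
    "gemini": "https://app.directinference.com/di",
    "gemini-ai-sdk": "https://app.directinference.com/di/v1beta",
}


def base_url_for(surface: str) -> str:
    """Return the base URL to configure for a given client family."""
    try:
        return BASE_URLS[surface]
    except KeyError:
        known = ", ".join(sorted(BASE_URLS))
        raise ValueError(
            f"unknown surface {surface!r}; expected one of: {known}"
        ) from None


print(base_url_for("anthropic"))
```

Note that the Anthropic and google-genai entries share the same base URL because those SDKs append their own versioned path, whereas the Vercel Gemini provider needs `/v1beta` included.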

The OpenAI-compatible swap, in full — the diff is two lines:

from openai import OpenAI

client = OpenAI(
    api_key="llm_live_...",  # your DirectInference key
    base_url="https://app.directinference.com/di/v1",  # the only required change
)

# Everything below is untouched — same model id, same parameters.
resp = client.chat.completions.create(
    model="gpt-5.5-mini",
    messages=[{"role": "user", "content": "Ship it."}],
)
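
Because only the key and the base URL change, one option is to read both from the environment so the swap becomes a configuration change rather than a code change. This is a minimal sketch under stated assumptions: the variable names `DI_API_KEY` and `DI_BASE_URL` and the helper `client_settings` are hypothetical, not something DirectInference defines.

```python
import os


def client_settings(env=os.environ):
    """Collect the two values that change during migration.

    DI_API_KEY and DI_BASE_URL are hypothetical variable names
    chosen for this sketch.
    """
    api_key = env.get("DI_API_KEY")
    if not api_key:
        raise RuntimeError("DI_API_KEY is not set")
    # Fall back to the OpenAI-compatible base URL from the table above.
    base_url = env.get("DI_BASE_URL", "https://app.directinference.com/di/v1")
    return {"api_key": api_key, "base_url": base_url}


# Demonstrated with an explicit mapping so the sketch runs anywhere.
settings = client_settings({"DI_API_KEY": "llm_live_..."})
print(settings["base_url"])
```

With the `openai` package installed, the result could be passed straight to the constructor as `OpenAI(**client_settings())`, leaving the rest of the request code unchanged.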

Per-surface details (auth headers, streaming, tools, documents) live under API surfaces.

The migration is deliberately boring. None of your application logic has to move.

The difference from a transparent service is the point: you stop managing models and start getting outcomes.

| A typical service or provider | DirectInference |
| --- | --- |
| Returns the model it picked; you keep tracking model slugs. | Echoes your id back. The model stays hidden and can change at any time; your code never does. |
| You build model-selection logic or assemble a model pool. | Nothing to build. Each call is classified from its shape automatically. |
| A 0–10 cost-vs-quality slider to tune. | One optional effort knob (auto by default). Capability needs always win. |
| Pin a session to a model to preserve its cache. | The endpoint adapts per request; caching works with no session to pin. |
| A separate key and bill per provider. | One key, one balance, across every surface. |
| Maintain an allowlist of valid model names. | Any id resolves (legacy, renamed, or not-yet-released) and never 404s. |

  1. Issue a key on the API Keys page.
  2. Point your client’s base URL at the surface you use (table above).
  3. Swap in the new key; keep every model id your app already sends.
  4. Optional: set X-DI-Effort to bias cost vs. quality — see Effort.
  5. Optional: set X-Title to segment usage by app — see Usage & analytics.
  6. Set a spend cap and you’re live.
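
The two optional headers from steps 4 and 5 can be assembled once and attached to every request. The sketch below is an assumption-laden illustration: the helper name `extra_headers` and the title `checkout-service` are hypothetical, and `auto` is the only effort value this page names, so any other value should be checked against the Effort page.

```python
def extra_headers(effort=None, title=None):
    """Build the optional DirectInference headers from steps 4 and 5.

    Only headers that are explicitly requested are included, so the
    defaults described above stay in effect otherwise.
    """
    headers = {}
    if effort is not None:
        headers["X-DI-Effort"] = effort  # cost-vs-quality bias
    if title is not None:
        headers["X-Title"] = title  # segments usage by app
    return headers


# "checkout-service" is a made-up app name for illustration.
headers = extra_headers(effort="auto", title="checkout-service")
print(headers)
```

With the OpenAI Python SDK, a dictionary like this can be passed as `default_headers=` when constructing the client, so each call carries the headers without further changes.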