Introduction

DirectInference is a drop-in endpoint. Point your existing OpenAI, Anthropic, or Gemini client at it, keep the model string you already send, and each request is served by the model best suited to its shape.

There is one model: di. You never pick a backing model. Every call is classified into a request type by its shape, and each request type is routed to the model best suited to it. The request type is visible to you as information about your call; the model, candidate, and provider that serve it always stay private.

Endpoints

Client	Base URL
OpenAI-compatible	`https://app.directinference.com/di/v1`
Anthropic SDK	`https://app.directinference.com/di` — the SDK appends /v1/messages itself
Gemini (Google GenAI SDK)	`https://app.directinference.com/di` — the SDK appends /v1beta/models/…

Each surface is documented in full under API surfaces. The authentication header differs per client — see Authentication.
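
One consequence of the table is easy to miss: the OpenAI-compatible base URL already ends in /v1, while the Anthropic and Gemini base URLs stop at /di because those SDKs add their own version prefix. The sketch below only concatenates strings to show the URL each client ends up calling. The dictionary and function names are hypothetical, the /chat/completions path is taken from the curl example on this page, and the Gemini path is kept as the prefix shown in the table because the rest of it is elided there.

```python
# Base URLs copied from the Endpoints table. The names BASE_URLS,
# APPENDED_PATHS, and request_url are hypothetical, for illustration only.
BASE_URLS = {
    "openai": "https://app.directinference.com/di/v1",
    "anthropic": "https://app.directinference.com/di",
    "gemini": "https://app.directinference.com/di",
}

# Path each client appends to its base URL. The Gemini entry is only the
# prefix given in the table; the remainder of the path is not filled in here.
APPENDED_PATHS = {
    "openai": "/chat/completions",
    "anthropic": "/v1/messages",
    "gemini": "/v1beta/models/",
}


def request_url(client: str) -> str:
    """Return the URL (a URL prefix for Gemini) that a client ends up calling."""
    return BASE_URLS[client] + APPENDED_PATHS[client]


for name in BASE_URLS:
    print(f"{name:10s} {request_url(name)}")
```

If a client returns 404s after a migration, a doubled or missing /v1 segment in the base URL is a plausible first thing to check.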

Hello, DirectInference

The smallest possible call over the OpenAI-compatible surface. Only two things change from a normal integration: the base URL and the API key.

from openai import OpenAI

client = OpenAI(
    api_key="llm_live_...",
    base_url="https://app.directinference.com/di/v1",
)

resp = client.chat.completions.create(
    model="gpt-5.5-mini",                      # keep the id your app already sends
    messages=[{"role": "user", "content": "Say hello from DirectInference."}],
)

print(resp.choices[0].message.content)
print(resp.model)                              # -> "gpt-5.5-mini" (echoed back)

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "llm_live_...",
  baseURL: "https://app.directinference.com/di/v1",
});

const resp = await client.chat.completions.create({
  model: "gpt-5.5-mini",                       // keep the id your app already sends
  messages: [{ role: "user", content: "Say hello from DirectInference." }],
});

console.log(resp.choices[0].message.content);
console.log(resp.model);                       // -> "gpt-5.5-mini" (echoed back)

curl https://app.directinference.com/di/v1/chat/completions \
  -H "Authorization: Bearer llm_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5-mini",
    "messages": [{ "role": "user", "content": "Say hello from DirectInference." }]
  }'

package main

import (
  "context"
  "fmt"

  "github.com/openai/openai-go"
  "github.com/openai/openai-go/option"
)

func main() {
  client := openai.NewClient(
    option.WithAPIKey("llm_live_..."),
    option.WithBaseURL("https://app.directinference.com/di/v1"),
  )

  resp, err := client.Chat.Completions.New(context.TODO(), openai.ChatCompletionNewParams{
    Model: "gpt-5.5-mini", // keep the id your app already sends
    Messages: []openai.ChatCompletionMessageParamUnion{
      openai.UserMessage("Say hello from DirectInference."),
    },
  })
  if err != nil {
    panic(err)
  }

  fmt.Println(resp.Choices[0].Message.Content)
  fmt.Println(resp.Model) // -> "gpt-5.5-mini" (echoed back)
}

Compatibility guarantees

Your model id is echoed back

Responses carry the exact model string you sent. Logging, dashboards, and eval pipelines keep working unchanged.
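
A migration smoke test might check this guarantee explicitly. The following is a minimal sketch, assuming an OpenAI-style JSON body with a top-level model field as in the examples on this page. The helper name is hypothetical, and the sample body is a hand-written stand-in, not captured output.

```python
def model_was_echoed(requested_model: str, response_body: dict) -> bool:
    """Return True when the response's "model" field equals the id that was sent."""
    return response_body.get("model") == requested_model


# Hand-written stand-in for a parsed response body; not real output.
sample_body = {
    "model": "gpt-5.5-mini",
    "choices": [{"message": {"role": "assistant", "content": "..."}}],
}

print(model_was_echoed("gpt-5.5-mini", sample_body))
```

In a real check, sample_body would be replaced by the parsed response from your own client, and the requested id by whatever model string your application already sends.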

Unknown ids never error

Legacy, renamed, and not-yet-released model ids are all accepted. Code written against a model that no longer exists keeps running.

No model identity leaks

The routing decision is hidden. Which model, candidate, and provider served a request stays private by design.

Three SDK shapes, one endpoint

OpenAI chat completions, the Anthropic Messages API, and Gemini generateContent all work. You never commit to a single vendor.

Explore the platform

Beyond the three API surfaces, the portal handles request classification, caching, usage analytics, and cost control — all without changing how you call the endpoint.

Migrate to DirectInference

Switch an existing OpenAI, Anthropic, or Gemini app by base URL.

AI coding agents

Point Cursor, Claude Code, and CLIs at DI, plus llms.txt.

Prompt caching

Cut cost and latency by reusing a stable prompt prefix.

Usage & analytics

Spend, request-type mix, traces, and per-application attribution.

Spend & limits

Hard caps, balance, top-ups, and the 402 at the limit.

Playground

Try DI, sweep effort, and compare against your own endpoint.

Ready to send a request?

Issue a key, then walk through your first call: API Keys · Quickstart.