Gemini API

Speak the Google Gemini generateContent protocol. The Google GenAI SDKs, the Vercel AI SDK’s Google provider, and raw REST callers all work by pointing the base URL here.

generateContent

The model and method live in the URL path — models/{model}:generateContent — not the body. The id you send is echoed back as modelVersion.

from google import genai
from google.genai import types

client = genai.Client(
    api_key="llm_live_...",
    http_options=types.HttpOptions(base_url="https://app.directinference.com/di"),
)

resp = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Name three uses for a paperclip.",
)

print(resp.text)

import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({
  apiKey: "llm_live_...",
  httpOptions: { baseUrl: "https://app.directinference.com/di" },
});

const resp = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: "Name three uses for a paperclip.",
});

console.log(resp.text);

curl "https://app.directinference.com/di/v1beta/models/gemini-2.5-flash:generateContent" \
  -H "x-goog-api-key: llm_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{ "parts": [{ "text": "Name three uses for a paperclip." }] }]
  }'

Streaming

Use :streamGenerateContent. Over raw HTTP, add ?alt=sse for Server-Sent Events; the SDKs handle this for you.

for chunk in client.models.generate_content_stream(
    model="gemini-2.5-flash",
    contents="Stream a haiku about latency.",
):
    print(chunk.text, end="", flush=True)

const stream = await ai.models.generateContentStream({
  model: "gemini-2.5-flash",
  contents: "Stream a haiku about latency.",
});

for await (const chunk of stream) {
  process.stdout.write(chunk.text ?? "");
}

curl "https://app.directinference.com/di/v1beta/models/gemini-2.5-flash:streamGenerateContent?alt=sse" \
  -H "x-goog-api-key: llm_live_..." \
  -H "Content-Type: application/json" \
  -d '{ "contents": [{ "parts": [{ "text": "Stream a haiku about latency." }] }] }'

Function calling

Declare functions with functionDeclarations; the reply carries functionCall parts. Custom functions pass through and map to the code request type.

from google.genai import types

weather = types.FunctionDeclaration(
    name="get_weather",
    description="Get the current weather for a city.",
    parameters={
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
)

resp = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What is the weather in Paris?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(function_declarations=[weather])],
    ),
)

for part in resp.candidates[0].content.parts:
    if part.function_call:
        print(part.function_call.name, dict(part.function_call.args))

const resp = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: "What is the weather in Paris?",
  config: {
    tools: [{
      functionDeclarations: [{
        name: "get_weather",
        description: "Get the current weather for a city.",
        parameters: {
          type: "object",
          properties: { city: { type: "string" } },
          required: ["city"],
        },
      }],
    }],
  },
});

console.log(resp.functionCalls);

curl "https://app.directinference.com/di/v1beta/models/gemini-2.5-flash:generateContent" \
  -H "x-goog-api-key: llm_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{ "parts": [{ "text": "What is the weather in Paris?" }] }],
    "tools": [{
      "functionDeclarations": [{
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
          "type": "object",
          "properties": { "city": { "type": "string" } },
          "required": ["city"]
        }
      }]
    }]
  }'

Model ids & limits

Any gemini-* id is accepted on this surface and served by intent — flash variants lean toward the flash request type, the rest toward pro. Vision (inlineData), thinking (thinkingConfig.includeThoughts), and JSON mode (responseMimeType / responseSchema) all translate.

Response headers

Every response carries the classified request type in the X-DI-Request-Type header — see Response headers. For how prompt caching is billed and reported across surfaces, see Prompt caching.