Gemini API
Speak the Google Gemini generateContent protocol. The Google GenAI SDKs, the Vercel AI SDK’s Google provider, and raw REST callers all work by pointing the base URL here.
generateContent
Section titled “generateContent”The model and method live in the URL path — models/{model}:generateContent — not the body. The id you send is echoed back as modelVersion.
from google import genaifrom google.genai import types
client = genai.Client( api_key="llm_live_...", http_options=types.HttpOptions(base_url="https://app.directinference.com/di"),)
resp = client.models.generate_content( model="gemini-2.5-flash", contents="Name three uses for a paperclip.",)
print(resp.text)import { GoogleGenAI } from "@google/genai";
const ai = new GoogleGenAI({ apiKey: "llm_live_...", httpOptions: { baseUrl: "https://app.directinference.com/di" },});
const resp = await ai.models.generateContent({ model: "gemini-2.5-flash", contents: "Name three uses for a paperclip.",});
console.log(resp.text);curl "https://app.directinference.com/di/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: llm_live_..." \ -H "Content-Type: application/json" \ -d '{ "contents": [{ "parts": [{ "text": "Name three uses for a paperclip." }] }] }'Streaming
Section titled “Streaming”Use :streamGenerateContent. Over raw HTTP, add ?alt=sse for Server-Sent Events; the SDKs handle this for you.
for chunk in client.models.generate_content_stream( model="gemini-2.5-flash", contents="Stream a haiku about latency.",): print(chunk.text, end="", flush=True)const stream = await ai.models.generateContentStream({ model: "gemini-2.5-flash", contents: "Stream a haiku about latency.",});
for await (const chunk of stream) { process.stdout.write(chunk.text ?? "");}curl "https://app.directinference.com/di/v1beta/models/gemini-2.5-flash:streamGenerateContent?alt=sse" \ -H "x-goog-api-key: llm_live_..." \ -H "Content-Type: application/json" \ -d '{ "contents": [{ "parts": [{ "text": "Stream a haiku about latency." }] }] }'Function calling
Section titled “Function calling”Declare functions with functionDeclarations; the reply carries functionCall parts. Custom functions pass through and map to the code request type.
from google.genai import types
weather = types.FunctionDeclaration( name="get_weather", description="Get the current weather for a city.", parameters={ "type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"], },)
resp = client.models.generate_content( model="gemini-2.5-flash", contents="What is the weather in Paris?", config=types.GenerateContentConfig( tools=[types.Tool(function_declarations=[weather])], ),)
for part in resp.candidates[0].content.parts: if part.function_call: print(part.function_call.name, dict(part.function_call.args))const resp = await ai.models.generateContent({ model: "gemini-2.5-flash", contents: "What is the weather in Paris?", config: { tools: [{ functionDeclarations: [{ name: "get_weather", description: "Get the current weather for a city.", parameters: { type: "object", properties: { city: { type: "string" } }, required: ["city"], }, }], }], },});
console.log(resp.functionCalls);curl "https://app.directinference.com/di/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: llm_live_..." \ -H "Content-Type: application/json" \ -d '{ "contents": [{ "parts": [{ "text": "What is the weather in Paris?" }] }], "tools": [{ "functionDeclarations": [{ "name": "get_weather", "description": "Get the current weather for a city.", "parameters": { "type": "object", "properties": { "city": { "type": "string" } }, "required": ["city"] } }] }] }'Model ids & limits
Section titled “Model ids & limits”Any gemini-* id is accepted on this surface and served by intent — flash variants lean toward the flash request type, the rest toward pro. Vision (inlineData), thinking (thinkingConfig.includeThoughts), and JSON mode (responseMimeType / responseSchema) all translate.
Response headers
Section titled “Response headers”Every response carries the classified request type in the X-DI-Request-Type header — see Response headers. For how prompt caching is billed and reported across surfaces, see Prompt caching.