Skip to content

Gemini API

Speak the Google Gemini generateContent protocol. The Google GenAI SDKs, the Vercel AI SDK’s Google provider, and raw REST callers all work by pointing the base URL here.

The model and method live in the URL path — models/{model}:generateContent — not the body. The id you send is echoed back as modelVersion.

from google import genai
from google.genai import types
client = genai.Client(
api_key="llm_live_...",
http_options=types.HttpOptions(base_url="https://app.directinference.com/di"),
)
resp = client.models.generate_content(
model="gemini-2.5-flash",
contents="Name three uses for a paperclip.",
)
print(resp.text)

Use :streamGenerateContent. Over raw HTTP, add ?alt=sse for Server-Sent Events; the SDKs handle this for you.

for chunk in client.models.generate_content_stream(
model="gemini-2.5-flash",
contents="Stream a haiku about latency.",
):
print(chunk.text, end="", flush=True)

Declare functions with functionDeclarations; the reply carries functionCall parts. Custom functions pass through and map to the code request type.

from google.genai import types
weather = types.FunctionDeclaration(
name="get_weather",
description="Get the current weather for a city.",
parameters={
"type": "object",
"properties": {"city": {"type": "string"}},
"required": ["city"],
},
)
resp = client.models.generate_content(
model="gemini-2.5-flash",
contents="What is the weather in Paris?",
config=types.GenerateContentConfig(
tools=[types.Tool(function_declarations=[weather])],
),
)
for part in resp.candidates[0].content.parts:
if part.function_call:
print(part.function_call.name, dict(part.function_call.args))

Any gemini-* id is accepted on this surface and served by intent — flash variants lean toward the flash request type, the rest toward pro. Vision (inlineData), thinking (thinkingConfig.includeThoughts), and JSON mode (responseMimeType / responseSchema) all translate.

Every response carries the classified request type in the X-DI-Request-Type header — see Response headers. For how prompt caching is billed and reported across surfaces, see Prompt caching.