# Gemini API

Speak the Google Gemini generateContent protocol. The Google GenAI SDKs, the Vercel AI SDK's Google provider, and raw REST callers all work by pointing the base URL here.

:::caution[Which base URL your SDK wants]
The Google GenAI SDKs (`google-genai`, `@google/genai`) take `https://app.directinference.com/di` and append `/v1beta/models/…` themselves. The Vercel `@ai-sdk/google` provider already includes `/v1beta` in its base URL, so it wants `https://app.directinference.com/di/v1beta`. Authenticate with `x-goog-api-key`.
:::

## generateContent

The model and method live in the URL path — `models/{model}:generateContent` — not the body. The id you send is echoed back as `modelVersion`.

```python
from google import genai
from google.genai import types

client = genai.Client(
    api_key="llm_live_...",
    http_options=types.HttpOptions(base_url="https://app.directinference.com/di"),
)

resp = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Name three uses for a paperclip.",
)

print(resp.text)
```

```typescript
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({
  apiKey: "llm_live_...",
  httpOptions: { baseUrl: "https://app.directinference.com/di" },
});

const resp = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: "Name three uses for a paperclip.",
});

console.log(resp.text);
```

```bash
curl "https://app.directinference.com/di/v1beta/models/gemini-2.5-flash:generateContent" \
  -H "x-goog-api-key: llm_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{ "parts": [{ "text": "Name three uses for a paperclip." }] }]
  }'
```

## Streaming

Use `:streamGenerateContent`. Over raw HTTP, add `?alt=sse` for Server-Sent Events; the SDKs handle this for you.

```python
for chunk in client.models.generate_content_stream(
    model="gemini-2.5-flash",
    contents="Stream a haiku about latency.",
):
    print(chunk.text, end="", flush=True)
```

```typescript
const stream = await ai.models.generateContentStream({
  model: "gemini-2.5-flash",
  contents: "Stream a haiku about latency.",
});

for await (const chunk of stream) {
  process.stdout.write(chunk.text ?? "");
}
```

```bash
curl "https://app.directinference.com/di/v1beta/models/gemini-2.5-flash:streamGenerateContent?alt=sse" \
  -H "x-goog-api-key: llm_live_..." \
  -H "Content-Type: application/json" \
  -d '{ "contents": [{ "parts": [{ "text": "Stream a haiku about latency." }] }] }'
```

## Function calling

Declare functions with `functionDeclarations`; the reply carries `functionCall` parts. Custom functions pass through and map to the `code` request type.

```python
from google.genai import types

weather = types.FunctionDeclaration(
    name="get_weather",
    description="Get the current weather for a city.",
    parameters={
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
)

resp = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What is the weather in Paris?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(function_declarations=[weather])],
    ),
)

for part in resp.candidates[0].content.parts:
    if part.function_call:
        print(part.function_call.name, dict(part.function_call.args))
```

```typescript
const resp = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: "What is the weather in Paris?",
  config: {
    tools: [{
      functionDeclarations: [{
        name: "get_weather",
        description: "Get the current weather for a city.",
        parameters: {
          type: "object",
          properties: { city: { type: "string" } },
          required: ["city"],
        },
      }],
    }],
  },
});

console.log(resp.functionCalls);
```

```bash
curl "https://app.directinference.com/di/v1beta/models/gemini-2.5-flash:generateContent" \
  -H "x-goog-api-key: llm_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{ "parts": [{ "text": "What is the weather in Paris?" }] }],
    "tools": [{
      "functionDeclarations": [{
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
          "type": "object",
          "properties": { "city": { "type": "string" } },
          "required": ["city"]
        }
      }]
    }]
  }'
```

## Model ids & limits

Any `gemini-*` id is accepted on this surface and served by intent — flash variants lean toward the `flash` request type, the rest toward `pro`. Vision (`inlineData`), thinking (`thinkingConfig.includeThoughts`), and JSON mode (`responseMimeType` / `responseSchema`) all translate.

:::note[Intentional omissions]
`:countTokens` is advisory and excluded from usage rollups. Server-side tools (`googleSearch`, `codeExecution`, `urlContext`) and `cachedContent` are not available and return a clear error or are dropped with a diagnostic. Embeddings and batch methods are out of scope. Request-handling details live in [Request types](https://docs.directinference.com/request-types/).
:::

## Response headers

Every response carries the classified request type in the `X-DI-Request-Type` header — see [Response headers](https://docs.directinference.com/headers/). For how prompt caching is billed and reported across surfaces, see [Prompt caching](https://docs.directinference.com/caching/).