Web search
Web search is a tool, not an endpoint. All three completion
endpoints — /chat/completions, /responses, and
/anthropic/messages — can carry a web-search tool, but the wire
shape and the citation format differ per provider.
This page maps each (provider, endpoint) cell to the tool descriptor
you send and the citation field you read back.
Provider × Endpoint matrix
| Provider | Chat Completions | Responses | Anthropic Messages |
|---|---|---|---|
| OpenAI | gpt-*-search-preview model + web_search_options | tools: [{ type: "web_search" }] | n/a (route via Responses or use Anthropic provider) |
| Anthropic | tools: [{ type: "web_search_20250305", name: "web_search" }] | (converts via Responses-side adapter) | tools: [{ type: "web_search_20250305", name: "web_search" }] |
| Gemini | tools: [{ googleSearch: {} }] (auto-routes to native @google/genai) | (converts via Chat Completions) | (converts via Chat Completions) |
| Perplexity | built-in (every Perplexity model) — no tool descriptor needed | (converts via Chat Completions) | (converts via Chat Completions) |
Streaming + Gemini googleSearch: the native @google/genai path doesn't yet wire streaming for googleSearch requests. VM-X returns a clean
400 (gemini_native_streaming_unsupported) if you set stream: true together with a googleSearch tool; drop streaming on those calls until the native path covers it.
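Until then, one option is to strip stream client-side before a googleSearch request goes out. A minimal sketch (prepare_gemini_payload is a hypothetical helper, not part of VM-X):

```python
def prepare_gemini_payload(payload: dict) -> dict:
    """Drop `stream` when a googleSearch tool is present, so the request
    doesn't trip the gemini_native_streaming_unsupported 400 guard."""
    tools = payload.get("tools") or []
    uses_google_search = any(
        isinstance(tool, dict) and "googleSearch" in tool for tool in tools
    )
    if uses_google_search and payload.get("stream"):
        # Return a copy without the stream flag; leave other fields intact.
        payload = {k: v for k, v in payload.items() if k != "stream"}
    return payload

# A streaming request that carries googleSearch loses its `stream` flag.
safe = prepare_gemini_payload({
    "model": "my-gemini-resource",
    "stream": True,
    "tools": [{"googleSearch": {}}],
})
print("stream" in safe)  # → False
```

Requests without a googleSearch tool pass through untouched, so the helper is safe to apply to every outgoing payload.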
Where citations land
| Provider | Endpoint | Citation location |
|---|---|---|
| OpenAI | Chat Completions | choices[0].message.annotations[] with type: 'url_citation' |
| OpenAI | Responses | output[].content[].annotations[] with type: 'url_citation' |
| Anthropic | Anthropic Messages | content[] includes a web_search_tool_result block; subsequent text blocks carry citations[] |
| Gemini | (any) | choices[0].message.grounding_metadata (camelCase: searchEntryPoint, groundingChunks[], groundingSupports[]) — forwarded verbatim from @google/genai |
| Perplexity | Chat Completions | citations[] on the top-level response object (one entry per cited URL) |
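If you route several providers behind one client, the table above collapses into a small normalizer. A minimal sketch over raw JSON response bodies (the citation_urls helper and its provider keys are illustrative, not part of VM-X; field paths follow the table above):

```python
def citation_urls(provider: str, body: dict) -> list[str]:
    """Collect cited URLs from a raw JSON response body, per provider."""
    urls: list[str] = []
    if provider == "perplexity":
        # Top-level citations[] array, one entry per cited URL.
        urls = list(body.get("citations") or [])
    elif provider == "gemini":
        # grounding_metadata on the message; camelCase inner fields.
        meta = body["choices"][0]["message"].get("grounding_metadata") or {}
        urls = [c["web"]["uri"] for c in meta.get("groundingChunks", []) if c.get("web")]
    elif provider == "anthropic":
        # citations[] on text content blocks.
        for block in body.get("content", []):
            for cit in block.get("citations") or []:
                urls.append(cit["url"])
    else:
        # OpenAI Chat Completions: url_citation annotations on the message.
        for ann in body["choices"][0]["message"].get("annotations") or []:
            if ann.get("type") == "url_citation":
                urls.append(ann["url_citation"]["url"])
    return urls
```

OpenAI's Responses endpoint nests annotations one level deeper (inside output[].content[]), so it would need its own branch.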
OpenAI — Chat Completions (gpt-*-search-preview)
OpenAI's Chat Completions web search is a model variant, not a
tool. Use a *-search-preview model name and (optionally) configure
recency / context size via web_search_options.
- Python
- TypeScript
- cURL
response = client.chat.completions.create(
model="my-search-resource", # resolves to e.g. gpt-4o-mini-search-preview
messages=[
{"role": "user", "content": "What's the latest TypeScript release?"},
],
web_search_options={"search_context_size": "medium"},
)
# Inspect annotations on the assistant message.
for ann in response.choices[0].message.annotations or []:
    if ann.type == "url_citation":
        print(ann.url_citation.url, ann.url_citation.title)
const response = await client.chat.completions.create({
model: 'my-search-resource',
messages: [{ role: 'user', content: "What's the latest TypeScript release?" }],
// @ts-expect-error web_search_options is preview-model only
web_search_options: { search_context_size: 'medium' },
});
for (const ann of response.choices[0].message.annotations ?? []) {
if (ann.type === 'url_citation') {
console.log(ann.url_citation.url, ann.url_citation.title);
}
}
curl http://localhost:3000/v1/completion/<workspace>/<environment>/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <vmx-api-key>" \
-d '{
"model": "my-search-resource",
"messages": [{"role":"user","content":"Latest TypeScript release?"}],
"web_search_options": {"search_context_size": "medium"}
}'
OpenAI — Responses
On the Responses endpoint, web search is a hosted tool. Add
{ type: 'web_search' } to tools[]. Pin the resource to a
search-capable Responses model (OpenAI lists which models support the
tool); routes that resolve to OpenAI-compat upstreams without hosted
tool support (Gemini / Groq / Perplexity) get a clean
400 responses_unsupported_tool_type from the per-provider dispatch.
- Python
- TypeScript
- cURL
response = client.responses.create(
model="my-resource",
input="Latest TypeScript release? Cite sources.",
tools=[{"type": "web_search"}],
)
# Citations attach to text content parts inside output[].
for item in response.output:
if item.type != "message":
continue
for part in item.content:
if part.type == "output_text":
        for ann in (part.annotations or []):
            if ann.type == "url_citation":
                print(ann.url_citation.url)
const response = await client.responses.create({
model: 'my-resource',
input: 'Latest TypeScript release? Cite sources.',
tools: [{ type: 'web_search' }],
});
for (const item of response.output) {
if (item.type !== 'message') continue;
for (const part of item.content) {
if (part.type !== 'output_text') continue;
for (const ann of (part as { annotations?: { type: string; url_citation?: { url: string } }[] }).annotations ?? []) {
if (ann.type === 'url_citation') console.log(ann.url_citation?.url);
}
}
}
curl http://localhost:3000/v1/completion/<workspace>/<environment>/responses \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <vmx-api-key>" \
-d '{
"model": "my-resource",
"input": "Latest TypeScript release? Cite sources.",
"tools": [{"type":"web_search"}]
}'
Anthropic — Anthropic Messages
Anthropic's web search is a server tool — the model invokes it
autonomously without a function definition. Cap the number of searches
per call with max_uses.
- Python (Anthropic SDK)
- TypeScript
- cURL
import anthropic
client = anthropic.Anthropic(
api_key="<vmx-api-key>",
base_url="http://localhost:3000/v1/completion/<workspace>/<environment>/anthropic",
)
message = client.messages.create(
model="my-claude-resource",
max_tokens=2048,
messages=[{"role": "user", "content": "Latest TypeScript release? Cite sources."}],
tools=[
{
"type": "web_search_20250305",
"name": "web_search",
"max_uses": 3,
}
],
)
# The response contains web_search_tool_result blocks plus text blocks
# whose `citations[]` reference the search results.
for block in message.content:
if block.type == "text":
        for citation in (block.citations or []):
            print(citation.url, "—", citation.title)
import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic({
apiKey: '<vmx-api-key>',
baseURL: 'http://localhost:3000/v1/completion/<workspace>/<environment>/anthropic',
});
const message = await client.messages.create({
model: 'my-claude-resource',
max_tokens: 2048,
messages: [{ role: 'user', content: 'Latest TypeScript release? Cite sources.' }],
tools: [
{
type: 'web_search_20250305',
name: 'web_search',
max_uses: 3,
},
],
});
for (const block of message.content) {
if (block.type === 'text') {
for (const citation of (block as { citations?: { url: string; title?: string }[] }).citations ?? []) {
console.log(citation.url, '—', citation.title);
}
}
}
curl http://localhost:3000/v1/completion/<workspace>/<environment>/anthropic/messages \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <vmx-api-key>" \
-d '{
"model": "my-claude-resource",
"max_tokens": 2048,
"messages": [{"role":"user","content":"Latest TypeScript release? Cite sources."}],
"tools": [{
"type": "web_search_20250305",
"name": "web_search",
"max_uses": 3
}]
}'
Cross-endpoint: Anthropic web search via /responses
You can also use Anthropic's web search through the /responses
endpoint. VM-X carries the Anthropic-only tool descriptor on the
private __vmx_passthrough.anthropic.server_tools envelope through
the conversion path, so the request reaches the Anthropic provider
unchanged.
- Python
response = client.responses.create(
model="my-claude-resource",
input="Latest TypeScript release? Cite sources.",
extra_body={
"__vmx_passthrough": {
"anthropic": {
"server_tools": [
{
"type": "web_search_20250305",
"name": "web_search",
"max_uses": 3,
}
]
}
}
},
)
For most use cases, prefer /anthropic/messages directly when
you need Anthropic web search — it's clearer and avoids the
passthrough plumbing.
Gemini — googleSearch
Gemini's googleSearch is a Gemini-only tool that Google's
OpenAI-compat endpoint rejects. VM-X auto-routes any request carrying
a googleSearch/googleSearchRetrieval/urlContext/codeExecution/fileSearch
tool to Google's native @google/genai SDK, then maps the response
back to the OpenAI Chat Completions / Responses / Anthropic Messages
shape your client expects.
Grounding metadata lands on grounding_metadata (top-level
on the response object and mirrored on the message object for
clients that read either).
- Python (OpenAI SDK)
- TypeScript
- cURL
response = client.chat.completions.create(
model="my-gemini-resource",
messages=[
{"role": "user", "content": "Latest news about Anthropic from this week?"},
],
tools=[{"googleSearch": {}}],
)
# Grounding metadata lands on the assistant message as `grounding_metadata`,
# forwarded verbatim from @google/genai. Field names are camelCase.
msg = response.choices[0].message.model_dump()
grounding = msg.get("grounding_metadata") or {}
for chunk in grounding.get("groundingChunks", []):
print(chunk.get("web", {}).get("uri"), "—", chunk.get("web", {}).get("title"))
# `groundingSupports[]` ties each cited segment of the assistant text
# back to the grounding chunk indices, e.g.:
# { "segment": { "startIndex": 80, "endIndex": 197, "text": "…" },
# "groundingChunkIndices": [0, 1] }
for support in grounding.get("groundingSupports", []):
seg = support.get("segment", {})
print(seg.get("startIndex"), seg.get("endIndex"), support.get("groundingChunkIndices"))
const response = await client.chat.completions.create({
model: 'my-gemini-resource',
messages: [{ role: 'user', content: 'Latest news about Anthropic from this week?' }],
// @ts-expect-error Gemini-only tool descriptor
tools: [{ googleSearch: {} }],
});
type GroundingMetadata = {
searchEntryPoint?: { renderedContent?: string };
groundingChunks?: { web?: { uri: string; title?: string } }[];
groundingSupports?: {
segment?: { startIndex?: number; endIndex?: number; text?: string };
groundingChunkIndices?: number[];
}[];
};
const msg = response.choices[0].message as unknown as {
grounding_metadata?: GroundingMetadata;
};
const grounding = msg.grounding_metadata;
for (const chunk of grounding?.groundingChunks ?? []) {
console.log(chunk.web?.uri, '—', chunk.web?.title);
}
curl http://localhost:3000/v1/completion/<workspace>/<environment>/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <vmx-api-key>" \
-d '{
"model": "my-gemini-resource",
"messages": [{"role":"user","content":"Latest news about Anthropic from this week?"}],
"tools": [{"googleSearch": {}}]
}'
Native path quirks: the Gemini native dispatch drops a few features the OpenAI-compat path supports (multi-modal parts, function-tool round-trips,
responseSchema). Keep googleSearch calls free of those features, or call without googleSearch first and re-issue with it. See gemini/native.helpers.ts for the full supported subset.
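groundingSupports[] makes it possible to render Gemini answers with inline footnote markers. A minimal sketch, assuming the camelCase field names shown above (add_citation_markers is a hypothetical helper, not part of VM-X):

```python
def add_citation_markers(text: str, grounding: dict) -> str:
    """Insert [n] markers after each grounded segment, working backwards
    through the text so earlier insertions don't shift later offsets."""
    supports = sorted(
        grounding.get("groundingSupports", []),
        key=lambda s: s.get("segment", {}).get("endIndex", 0),
        reverse=True,
    )
    for support in supports:
        end = support.get("segment", {}).get("endIndex")
        if end is None:
            continue
        # One marker per grounding chunk index, 1-based for display.
        marks = "".join(f"[{i + 1}]" for i in support.get("groundingChunkIndices", []))
        text = text[:end] + marks + text[end:]
    return text

grounding = {
    "groundingSupports": [
        {"segment": {"startIndex": 0, "endIndex": 14}, "groundingChunkIndices": [0]},
    ]
}
print(add_citation_markers("TS 5.5 is out. It is fast.", grounding))
# → TS 5.5 is out.[1] It is fast.
```

Pair the [n] markers with a source list built from groundingChunks[] (index n-1) to get a complete attribution UI.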
Perplexity — built-in
Perplexity searches the web on every request — there's no tool
descriptor to send. Cited sources land on the top-level citations[]
array; tune the search via vmx.providerArgs:
| vmx.providerArgs field | Effect |
|---|---|
| search_recency_filter | one of 'day', 'week', 'month', 'year': recency cap on retrieved sources |
| search_domain_filter | string[]: allow-list of domains |
| search_after_date_filter | 'YYYY-MM-DD': only retrieve sources after this date |
| search_before_date_filter | 'YYYY-MM-DD': only retrieve sources before this date |
- Python
- TypeScript
- cURL
response = client.chat.completions.create(
model="my-perplexity-resource",
messages=[{"role": "user", "content": "Latest TypeScript release?"}],
extra_body={
"vmx": {
"providerArgs": {
"search_recency_filter": "week",
"search_domain_filter": ["github.com", "typescriptlang.org"],
}
}
},
)
# `citations` is a top-level field on Perplexity's Chat Completions response.
for url in getattr(response, "citations", []) or []:
print(url)
const response = await client.chat.completions.create({
model: 'my-perplexity-resource',
messages: [{ role: 'user', content: 'Latest TypeScript release?' }],
// @ts-expect-error vmx envelope
vmx: {
providerArgs: {
search_recency_filter: 'week',
search_domain_filter: ['github.com', 'typescriptlang.org'],
},
},
});
for (const url of (response as unknown as { citations?: string[] }).citations ?? []) {
console.log(url);
}
curl http://localhost:3000/v1/completion/<workspace>/<environment>/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <vmx-api-key>" \
-d '{
"model": "my-perplexity-resource",
"messages": [{"role":"user","content":"Latest TypeScript release?"}],
"vmx": {
"providerArgs": {
"search_recency_filter": "week",
"search_domain_filter": ["github.com", "typescriptlang.org"]
}
}
}'
Practical tips
- Phrase the prompt to elicit the tool. Models won't call web search on questions they can answer from training data. Time-sensitive prompts ("today's", "this week's", "latest version of …") fire it reliably.
- max_uses (Anthropic) caps cost. Without it, the model can run multiple searches in a single response. Start at 3 and tune.
- Citations vs grounding metadata. OpenAI and Anthropic emit per-text-span citations; Gemini emits grounding metadata at the message/response level (one block referencing all sources). Treat them as functionally equivalent for source-attribution UIs.
- Streaming. OpenAI Chat Completions search-preview models stream the same as any other Chat Completions request. Anthropic streams web-search tool use as content_block_start/content_block_delta events on web_search_tool_result blocks. Gemini's googleSearch path is non-streaming today (T5: clean 400 if you set stream: true).
- vmx audit. Search-driven calls show up in the Audit page like any other call; the resolved provider/model is on x-vmx-provider/x-vmx-model, and per-call recency/domain filters set via vmx.providerArgs round-trip into providerRequestPayload, so the audit row reflects the real wire request.
Next steps
- Chat Completions — full Chat Completions reference
- Responses — typed-event streaming + reasoning
- Anthropic Messages — full Anthropic Messages reference
- VM-X envelope — providerArgs deep dive (Perplexity recency, etc.)