The vmx envelope
Every endpoint accepts an opt-in vmx extension on the request body.
It's how you attach metadata, group multi-step calls, override resource
config, cap runtime, and inject provider-native fields, all without
leaving the SDK shape.
The envelope works identically on `/chat/completions`,
`/anthropic/messages`, and `/responses`. Reach for it whenever the
underlying SDK doesn't have a typed field for what you need.
Quick example
{
  "model": "your-resource-name",
  "messages": [{ "role": "user", "content": "..." }],
  "vmx": {
    "correlationId": "agent-run-2026-05-06-abc123",
    "metadata": { "team": "growth", "feature": "summarizer" },
    "timeoutMs": 30000,
    "providerArgs": { "search_recency_filter": "week" },
    "secondaryModelIndex": 0,
    "resourceConfigOverrides": {
      "model": {
        "provider": "openai",
        "model": "gpt-4o-mini",
        "connectionName": "my-openai-prod"
      }
    }
  }
}
Field reference
| Field | Purpose |
|---|---|
| `correlationId` | Free-form string that groups related calls (e.g., a multi-step agent run). Surfaces as a column on the Audit page so you can pull every call from one trace. |
| `metadata` | `Record<string, string>`. Any keys you like — the Audit page's filter autocomplete picks them up automatically. Indexed for filtering and group-by aggregation on the Usage page. |
| `timeoutMs` | Per-request abort cap (clamped to 10 minutes). VM-X builds an `AbortSignal.timeout` that propagates all the way down to the provider SDK, so a slow upstream stops costing you tokens. Composes with the per-model `timeoutMs` (whichever fires first wins). |
| `providerArgs` | Provider-native fields that override the parsed body. Useful when the OpenAI / Anthropic compatible shape can't express a feature your provider offers — Perplexity `search_recency_filter`, Anthropic `top_k`, Gemini `safetySettings`. The merge order is `defaultArgs` < parsed body < `providerArgs`, so this wins over both — even on `messages`. |
| `secondaryModelIndex` | Skip the resource's primary model and use the Nth secondary instead (0-based). Useful for A/B tests, blue/green rollouts, and per-call model pinning without leaving the resource API. |
| `resourceConfigOverrides` | Override the resource's primary model / connection / args / routing for this request only. The merge is deep — caller-supplied keys win, the resource fills in the rest. Inside any model config you can address the connection by either `connectionId` (UUID) or `connectionName` (human-readable name); see Addressing a connection by name. |
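The two timeout sources compose as "whichever fires first wins", i.e. an effective minimum under the 10-minute cap. A minimal sketch of that composition (hypothetical helper for illustration, not VM-X source):

```python
def effective_timeout_ms(request_ms=None, model_ms=None, cap_ms=600_000):
    """Whichever timeout fires first wins; vmx.timeoutMs is clamped to 10 minutes."""
    candidates = [t for t in (request_ms, model_ms) if t is not None]
    # With no explicit timeouts, only the hard cap applies.
    return min(candidates + [cap_ms])
```

So a `vmx.timeoutMs` of 30 s on a model configured with 15 s still aborts at 15 s, and nothing can push past the cap.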
How to attach vmx from each SDK
The OpenAI and Anthropic SDKs are typed against their respective APIs
and don't know about the `vmx` field. Use the SDK's "extra body" escape
hatch (`extra_body` in Python; an untyped extra property, via a cast or
`// @ts-expect-error`, in TypeScript) to pass it through.
OpenAI SDK (Chat Completions or Responses)
- Python
- TypeScript
from openai import OpenAI

client = OpenAI(
    api_key="<vmx-api-key>",
    base_url="http://localhost:3000/v1/completion/<workspace>/<environment>",
)

response = client.chat.completions.create(
    model="my-resource",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={
        "vmx": {
            "correlationId": "agent-run-1",
            "metadata": {"team": "growth", "user_id": "u_42"},
            "timeoutMs": 15_000,
        }
    },
)
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: '<vmx-api-key>',
  baseURL: 'http://localhost:3000/v1/completion/<workspace>/<environment>',
});

const completion = await client.chat.completions.create({
  model: 'my-resource',
  messages: [{ role: 'user', content: 'Hello!' }],
  // The OpenAI SDK passes through unknown top-level fields; vmx is one.
  // @ts-expect-error custom extra
  vmx: {
    correlationId: 'agent-run-1',
    metadata: { team: 'growth', user_id: 'u_42' },
    timeoutMs: 15_000,
  },
});
Anthropic SDK (Anthropic Messages)
- Python
- TypeScript
import anthropic

client = anthropic.Anthropic(
    api_key="<vmx-api-key>",
    base_url="http://localhost:3000/v1/completion/<workspace>/<environment>/anthropic",
)

message = client.messages.create(
    model="my-resource",
    max_tokens=512,
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={
        "vmx": {
            "correlationId": "agent-run-1",
            "metadata": {"team": "growth"},
        }
    },
)
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  apiKey: '<vmx-api-key>',
  baseURL: 'http://localhost:3000/v1/completion/<workspace>/<environment>/anthropic',
});

const message = await client.messages.create({
  model: 'my-resource',
  max_tokens: 512,
  messages: [{ role: 'user', content: 'Hello!' }],
  // @ts-expect-error custom extra
  vmx: { correlationId: 'agent-run-1', metadata: { team: 'growth' } },
});
cURL (any endpoint)
vmx is just a top-level JSON field; nothing special:
curl http://localhost:3000/v1/completion/<workspace>/<environment>/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <vmx-api-key>" \
  -d '{
    "model": "my-resource",
    "messages": [{"role":"user","content":"Hello!"}],
    "vmx": {
      "correlationId": "agent-run-1",
      "metadata": {"team": "growth"}
    }
  }'
Addressing a connection by name
Every model config in `resourceConfigOverrides` (the primary model,
each `fallbackModels[*]`, each `secondaryModels[*]`, and every
`routing.conditions[*].then`) accepts either `connectionId` (a UUID)
or `connectionName` (a human-readable name unique within the
workspace + environment). Use `connectionName` when you don't want to
look up the UUID first; the gateway resolves it before dispatch.
{
  "model": "my-resource",
  "messages": [{ "role": "user", "content": "..." }],
  "vmx": {
    "resourceConfigOverrides": {
      "model": {
        "provider": "openai",
        "model": "gpt-4o",
        "connectionName": "openai-prod"
      },
      "fallbackModels": [
        {
          "provider": "anthropic",
          "model": "claude-haiku-4-5",
          "connectionName": "anthropic-prod"
        }
      ]
    }
  }
}
Rules:

- Exactly one of `connectionId` / `connectionName` must be set on every model config. If both are set, `connectionId` wins (the UUID is unambiguous and cannot drift if a connection is renamed later).
- An unknown `connectionName` returns a clean `400 invalid_request` with an actionable error message; the request is rejected before the provider call.
- The same rule applies to persisted resources: when you create or update an AI Resource via `POST`/`PATCH /ai-resource/...`, the body can carry `connectionName` on any model slot. The service resolves it to a UUID and stores `connectionId` only, so the database is never affected by future renames.
- For ad-hoc, resource-less calls, you can also use the `connection-name/model` shortcut directly in the request `model` field (e.g. `"model": "openai-prod/gpt-4o-mini"`), which builds an ephemeral resource on the fly. `resourceConfigOverrides` is the more flexible path when you also need to set fallbacks, routing, `defaultArgs`, etc.
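For instance, assuming a connection named `openai-prod` exists in the workspace + environment, the shortcut request body (no `vmx` envelope, no persisted resource) would look like:

```json
{
  "model": "openai-prod/gpt-4o-mini",
  "messages": [{ "role": "user", "content": "Hello!" }]
}
```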
providerArgs — your escape hatch
When a provider has a feature the OpenAI / Anthropic compatible shape
can't express, drop it on vmx.providerArgs. The gateway merges it on
top of the parsed body so the field reaches the upstream SDK
unchanged.
| Provider | Common providerArgs |
|---|---|
| Perplexity | `{ "search_recency_filter": "week" }`, `{ "search_domain_filter": [...] }`, `{ "search_after_date_filter": "2024-01-01" }` |
| Anthropic | `{ "top_k": 10 }`, `{ "thinking": { "type": "enabled", "budget_tokens": 5000 } }` (note: also expressible natively on `/anthropic/messages`) |
| Gemini | `{ "safetySettings": [...] }`, `{ "responseMimeType": "application/json" }` |
| OpenAI | `{ "service_tier": "scale" }`, `{ "user": "user-7e9..." }` |
| Groq | `{ "service_tier": "on_demand" }` |
providerArgs wins over both defaultArgs (resource-level) and the
parsed request body — even for structured fields like messages and
tools. If you set providerArgs.messages, that completely replaces
the parsed messages array.
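That merge order can be pictured as a shallow dict merge in which later sources win (an illustrative sketch, not the gateway's actual code):

```python
def merge_upstream_body(default_args, parsed_body, provider_args):
    # defaultArgs < parsed body < providerArgs: later sources win.
    # The merge is shallow, so providerArgs["messages"] replaces the
    # parsed messages array wholesale instead of merging into it.
    return {**default_args, **parsed_body, **provider_args}
```

Shallow replacement is what makes `providerArgs` a reliable escape hatch: the value you set is exactly what the upstream SDK sees.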
Example: Perplexity recency filter
- Python
- TypeScript
- cURL
response = client.chat.completions.create(
    model="my-perplexity-resource",
    messages=[{"role": "user", "content": "Latest TypeScript releases"}],
    extra_body={
        "vmx": {
            "providerArgs": {
                "search_recency_filter": "week",
                "search_domain_filter": ["github.com"],
            }
        }
    },
)
const completion = await client.chat.completions.create({
  model: 'my-perplexity-resource',
  messages: [{ role: 'user', content: 'Latest TypeScript releases' }],
  // @ts-expect-error custom extra
  vmx: {
    providerArgs: {
      search_recency_filter: 'week',
      search_domain_filter: ['github.com'],
    },
  },
});
curl http://localhost:3000/v1/completion/<workspace>/<environment>/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <vmx-api-key>" \
  -d '{
    "model": "my-perplexity-resource",
    "messages": [{"role":"user","content":"Latest TypeScript releases"}],
    "vmx": {
      "providerArgs": {
        "search_recency_filter": "week",
        "search_domain_filter": ["github.com"]
      }
    }
  }'
__vmx_passthrough — cross-format field carrier
When you send a request in one format that a fallback provider in a
different format would otherwise drop (e.g., an OpenAI Chat Completions
request that ends up routed to Anthropic where you'd want
cache_control to take effect), VM-X stows the foreign-shape fields on
a private __vmx_passthrough envelope. The right per-pair converter
re-attaches them when the request reaches a provider that understands
them.
You generally don't write __vmx_passthrough by hand — VM-X populates
it during request conversion. But the carrier is part of the wire shape
and shows up in providerRequestPayload audit rows, so it's useful to
know what's there.
{
  "model": "my-multi-provider-resource",
  "messages": [{ "role": "user", "content": "..." }],
  "__vmx_passthrough": {
    "anthropic": {
      "cache_control": { "type": "ephemeral" },
      "thinking": { "type": "enabled", "budget_tokens": 1000 },
      "top_k": 10,
      "service_tier": "auto",
      "metadata": { "user_id": "u_42" }
    }
  }
}
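Conceptually, the re-attach step looks something like the sketch below. This is illustrative only; the real per-pair converters live inside VM-X, and the function name here is made up:

```python
def reattach_passthrough(body, target_provider):
    """Fold __vmx_passthrough fields for the target provider back into the body."""
    body = dict(body)  # work on a copy; don't mutate the caller's request
    carrier = body.pop("__vmx_passthrough", {})
    # Only the slice addressed to this provider is re-attached; slices
    # for other providers are dropped along with the envelope itself.
    return {**body, **carrier.get(target_provider, {})}
```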
For the full carrier shape per direction, see the conversion matrix contributor doc.
correlationId and metadata in the audit / usage UIs
Both fields are first-class on the Audit and Usage pages:
- Audit — filter rows by `correlationId` to see every call in one agent run; group by any `metadata.<key>` to slice traffic by team, feature, user, environment, etc.
- Usage — group time-series by `metadata.<key>` to break cost, tokens, requests, or latency down across any dimension you tag.
The metadata filter autocomplete observes which keys are in use
across recent rows; you don't need to register them in advance.
correlationId vs x-request-id
| Header / Field | Set by | Use for |
|---|---|---|
| `vmx.correlationId` | Caller (this field) | Group multiple calls in your application (one agent run, one job). |
| `x-request-id` (header) | Upstream provider | Tie a single call back to provider-side support tickets. |
| `x-vmx-event-count` (response header) | VM-X | Count of audit events emitted on this request (routing, fallback, ...). |
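If you want to surface these in your own logs, a small helper over the response headers is enough (hypothetical sketch; assumes lower-cased header keys):

```python
def vmx_audit_info(headers):
    """Pull VM-X tracing fields out of a response-header mapping."""
    return {
        "request_id": headers.get("x-request-id"),
        "event_count": int(headers.get("x-vmx-event-count", 0)),
    }
```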
Next steps
- Chat Completions — `/chat/completions` reference + examples
- Anthropic Messages — `/anthropic/messages` reference + examples
- Responses — `/responses` reference + examples
- AI Resources — what `model` resolves to and how routing/fallback compose