Chat Completions
The OpenAI Chat Completions endpoint is VM-X's broadest surface — every supported provider speaks it (natively or via conversion). Reach for it when:
- Your application already uses the OpenAI SDK.
- You want maximum provider portability (one shape works for OpenAI, Anthropic, Gemini, Groq, Perplexity, AWS Bedrock).
- You don't need the typed-event streaming shape of /responses or the Anthropic-specific features (cache_control, extended thinking, server tools) of /anthropic/messages.
Endpoint
POST /v1/completion/{workspaceId}/{environmentId}/chat/completions
Headers:
Content-Type: application/json
Authorization: Bearer <vmx-api-key>
Request shape: standard OpenAI Chat Completions body, plus an optional vmx envelope. Use the VM-X resource name in model, not the upstream model id.
Quick start
- Python (OpenAI SDK)
- TypeScript (OpenAI SDK)
- cURL
from openai import OpenAI
client = OpenAI(
api_key="<vmx-api-key>",
base_url="http://localhost:3000/v1/completion/<workspace>/<environment>",
)
response = client.chat.completions.create(
model="my-resource",
messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
import OpenAI from 'openai';
const client = new OpenAI({
apiKey: '<vmx-api-key>',
baseURL: 'http://localhost:3000/v1/completion/<workspace>/<environment>',
});
const completion = await client.chat.completions.create({
model: 'my-resource',
messages: [{ role: 'user', content: 'Hello!' }],
});
console.log(completion.choices[0].message.content);
curl http://localhost:3000/v1/completion/<workspace>/<environment>/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <vmx-api-key>" \
-d '{
"model": "my-resource",
"messages": [{"role":"user","content":"Hello!"}]
}'
Ad-hoc model addressing — <connection_name>/<model>
If you don't want to pre-create an AI Resource, pass
<connection_name>/<model> in the model field. VM-X looks up the
connection by name in this workspace/environment and dispatches
directly to the upstream model on it. Useful for scratch work and
one-off calls that don't need routing or a fallback chain.
- Python (OpenAI SDK)
- TypeScript (OpenAI SDK)
- cURL
# "openai-prod" is the AI Connection name; "gpt-4o-mini" is the
# upstream OpenAI model id. No resource record required.
response = client.chat.completions.create(
model="openai-prod/gpt-4o-mini",
messages=[{"role": "user", "content": "Hello!"}],
)
const completion = await client.chat.completions.create({
model: 'openai-prod/gpt-4o-mini',
messages: [{ role: 'user', content: 'Hello!' }],
});
curl http://localhost:3000/v1/completion/<workspace>/<environment>/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <vmx-api-key>" \
-d '{
"model": "openai-prod/gpt-4o-mini",
"messages": [{"role":"user","content":"Hello!"}]
}'
The first / is the separator; anything after it is the upstream
model id verbatim — so
bedrock-prod/anthropic.claude-3-5-sonnet-20241022-v2:0 works as you'd
expect, including the trailing :0. If no connection with that name
exists, VM-X falls back to looking up the literal string as a resource
name, so resource names that legitimately contain / still resolve.
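For example, assuming a connection named bedrock-prod exists in this workspace/environment, an ad-hoc call in Python looks like this (sketch; no AI Resource is involved):
# Ad-hoc dispatch straight to the upstream Bedrock model on the
# "bedrock-prod" connection; the trailing ":0" is kept verbatim.
response = client.chat.completions.create(
    model="bedrock-prod/anthropic.claude-3-5-sonnet-20241022-v2:0",
    messages=[{"role": "user", "content": "Hello!"}],
)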
Trade-off: ad-hoc addressing skips the resource layer, which means no resource-level routing, fallback, or capacity. Connection-level capacity still applies and the request is still audited. For routing / fallback / per-resource capacity, define an AI Resource and pass its name instead.
Examples
System prompt + multi-turn conversation
The system message goes first; subsequent messages alternate
user / assistant.
- Python
- TypeScript
- cURL
response = client.chat.completions.create(
model="my-resource",
messages=[
{"role": "system", "content": "You are a concise senior engineer."},
{"role": "user", "content": "My name is Lucas."},
{"role": "assistant", "content": "Got it, Lucas."},
{"role": "user", "content": "What's my name?"},
],
)
const completion = await client.chat.completions.create({
model: 'my-resource',
messages: [
{ role: 'system', content: 'You are a concise senior engineer.' },
{ role: 'user', content: 'My name is Lucas.' },
{ role: 'assistant', content: 'Got it, Lucas.' },
{ role: 'user', content: "What's my name?" },
],
});
curl http://localhost:3000/v1/completion/<workspace>/<environment>/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <vmx-api-key>" \
-d '{
"model": "my-resource",
"messages": [
{"role":"system","content":"You are a concise senior engineer."},
{"role":"user","content":"My name is Lucas."},
{"role":"assistant","content":"Got it, Lucas."},
{"role":"user","content":"What is my name?"}
]
}'
Tool calling
Define your tools in the OpenAI function-calling shape; the assistant
responds with tool_calls when it wants to invoke one. Send the tool
result back as a tool role message keyed by tool_call_id.
- Python
- TypeScript
- cURL
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a city.",
"parameters": {
"type": "object",
"properties": {"location": {"type": "string"}},
"required": ["location"],
},
},
}
]
# 1. Model emits a tool_call
first = client.chat.completions.create(
model="my-resource",
messages=[{"role": "user", "content": "Weather in Tokyo?"}],
tools=tools,
tool_choice="required",
)
tc = first.choices[0].message.tool_calls[0]
# 2. Run the tool locally...
weather = {"temp_c": 22, "conditions": "clear"}
# 3. Send the result back
final = client.chat.completions.create(
model="my-resource",
messages=[
{"role": "user", "content": "Weather in Tokyo?"},
first.choices[0].message, # the assistant turn that emitted the tool_call
{"role": "tool", "tool_call_id": tc.id, "content": str(weather)},
],
tools=tools,
)
print(final.choices[0].message.content)
const tools = [
{
type: 'function' as const,
function: {
name: 'get_weather',
description: 'Get the current weather for a city.',
parameters: {
type: 'object',
properties: { location: { type: 'string' } },
required: ['location'],
},
},
},
];
const first = await client.chat.completions.create({
model: 'my-resource',
messages: [{ role: 'user', content: 'Weather in Tokyo?' }],
tools,
tool_choice: 'required',
});
const tc = first.choices[0].message.tool_calls![0];
const weather = { temp_c: 22, conditions: 'clear' };
const final = await client.chat.completions.create({
model: 'my-resource',
messages: [{ role: 'user', content: 'Weather in Tokyo?' }, first.choices[0].message, { role: 'tool', tool_call_id: tc.id, content: JSON.stringify(weather) }],
tools,
});
curl http://localhost:3000/v1/completion/<workspace>/<environment>/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <vmx-api-key>" \
-d '{
"model": "my-resource",
"messages": [{"role":"user","content":"Weather in Tokyo?"}],
"tools": [{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a city.",
"parameters": {
"type": "object",
"properties": {"location": {"type":"string"}},
"required": ["location"]
}
}
}],
"tool_choice": "required"
}'
Streaming
Set stream: true. The wire format is OpenAI Server-Sent Events: each
chunk on its own data: line, terminated by data: [DONE].
To get token usage on the final chunk, set
stream_options: { include_usage: true } (VM-X also adds this
automatically when stream: true is set).
- Python
- TypeScript
- cURL
stream = client.chat.completions.create(
model="my-resource",
messages=[{"role": "user", "content": "Stream a poem."}],
stream=True,
)
for chunk in stream:
delta = chunk.choices[0].delta.content or ""
print(delta, end="", flush=True)
const stream = await client.chat.completions.create({
model: 'my-resource',
messages: [{ role: 'user', content: 'Stream a poem.' }],
stream: true,
});
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}
curl http://localhost:3000/v1/completion/<workspace>/<environment>/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <vmx-api-key>" \
-N -d '{
"model": "my-resource",
"messages": [{"role":"user","content":"Stream a poem."}],
"stream": true
}'
The -N flag disables curl's output buffering so you see chunks as
they arrive. Each line is data: <json>\n\n until the final
data: [DONE].
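To surface that usage in code, a minimal Python sketch (the usage chunk carries an empty choices list, so guard before indexing):
stream = client.chat.completions.create(
    model="my-resource",
    messages=[{"role": "user", "content": "Stream a poem."}],
    stream=True,
    stream_options={"include_usage": True},
)
for chunk in stream:
    if chunk.choices:
        # Normal delta chunks carry the text.
        print(chunk.choices[0].delta.content or "", end="", flush=True)
    if chunk.usage:
        # Final chunk: token counts for the whole request.
        print(f"\nprompt={chunk.usage.prompt_tokens} completion={chunk.usage.completion_tokens}")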
Multi-modal — images via image_url
Send images as a data: URL or a public URL in a user message's
content array.
- Python
- TypeScript
- cURL
response = client.chat.completions.create(
model="my-vision-resource",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "Describe this image."},
{
"type": "image_url",
"image_url": {
"url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA..."
},
},
],
}
],
)
const completion = await client.chat.completions.create({
model: 'my-vision-resource',
messages: [
{
role: 'user',
content: [
{ type: 'text', text: 'Describe this image.' },
{
type: 'image_url',
image_url: {
url: 'data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA...',
},
},
],
},
],
});
curl http://localhost:3000/v1/completion/<workspace>/<environment>/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <vmx-api-key>" \
-d '{
"model": "my-vision-resource",
"messages": [{
"role": "user",
"content": [
{"type": "text", "text": "Describe this image."},
{"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA..."}}
]
}]
}'
Bedrock-Invoke note: Bedrock-Invoke (Anthropic-on-AWS) can't fetch external image URLs server-side. Use base64 data: URLs instead, or route through Bedrock-Converse, which does fetch URLs. VM-X surfaces a clean 400 with code aws_bedrock_invoke_image_url_unsupported if you hit this.
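A minimal Python sketch for building such a data: URL from a local file (the filename is hypothetical):
import base64

# Encode a local PNG into a data: URL suitable for the image_url content part.
with open("photo.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()
data_url = f"data:image/png;base64,{b64}"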
JSON mode and JSON Schema
Pin the response to JSON via response_format.
- Python (json_object)
- Python (json_schema)
- TypeScript (json_schema)
- cURL
response = client.chat.completions.create(
model="my-resource",
messages=[
{"role": "system", "content": "Respond ONLY in valid JSON."},
{"role": "user", "content": "Give me a 3-key object describing TypeScript."},
],
response_format={"type": "json_object"},
)
response = client.chat.completions.create(
model="my-resource",
messages=[{"role": "user", "content": "Pick a country: Brazil."}],
response_format={
"type": "json_schema",
"json_schema": {
"name": "country",
"strict": True,
"schema": {
"type": "object",
"properties": {
"city": {"type": "string"},
"country_code": {"type": "string"},
},
"required": ["city", "country_code"],
"additionalProperties": False,
},
},
},
)
const completion = await client.chat.completions.create({
model: 'my-resource',
messages: [{ role: 'user', content: 'Pick a country: Brazil.' }],
response_format: {
type: 'json_schema',
json_schema: {
name: 'country',
strict: true,
schema: {
type: 'object',
properties: {
city: { type: 'string' },
country_code: { type: 'string' },
},
required: ['city', 'country_code'],
additionalProperties: false,
},
},
},
});
curl http://localhost:3000/v1/completion/<workspace>/<environment>/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <vmx-api-key>" \
-d '{
"model": "my-resource",
"messages": [{"role":"user","content":"Pick a country: Brazil."}],
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "country",
"strict": true,
"schema": {
"type": "object",
"properties": {
"city": {"type":"string"},
"country_code": {"type":"string"}
},
"required": ["city","country_code"],
"additionalProperties": false
}
}
}
}'
Anthropic note: Anthropic doesn't have a native response_format field. VM-X synthesises a tool call internally so a Chat-Completions JSON-schema request still works when the resource resolves to Anthropic. The model's response is unwrapped back into message.content as a JSON string.
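Either way, the result arrives in message.content as a JSON string; a minimal sketch of consuming the json_schema response above:
import json

# Parse the JSON string returned for the "country" schema.
payload = json.loads(response.choices[0].message.content)
print(payload["city"], payload["country_code"])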
Attaching vmx metadata
- Python
- TypeScript
- cURL
response = client.chat.completions.create(
model="my-resource",
messages=[{"role": "user", "content": "Summarise: ..."}],
extra_body={
"vmx": {
"correlationId": "summarizer-job-2026-05-10-abc",
"metadata": {
"team": "growth",
"feature": "summarizer",
"user_id": "u_42",
},
"timeoutMs": 20_000,
}
},
)
const completion = await client.chat.completions.create({
model: 'my-resource',
messages: [{ role: 'user', content: 'Summarise: ...' }],
// @ts-expect-error custom extra
vmx: {
correlationId: 'summarizer-job-2026-05-10-abc',
metadata: { team: 'growth', feature: 'summarizer', user_id: 'u_42' },
timeoutMs: 20_000,
},
});
curl http://localhost:3000/v1/completion/<workspace>/<environment>/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <vmx-api-key>" \
-d '{
"model": "my-resource",
"messages": [{"role":"user","content":"Summarise: ..."}],
"vmx": {
"correlationId": "summarizer-job-2026-05-10-abc",
"metadata": {
"team": "growth",
"feature": "summarizer",
"user_id": "u_42"
},
"timeoutMs": 20000
}
}'
See the full vmx envelope reference for providerArgs,
secondaryModelIndex, and resourceConfigOverrides.
Provider compatibility
| Provider | Native passthrough? | Notes |
|---|---|---|
| OpenAI | ✅ Yes | Direct dispatch via client.chat.completions.create. |
| Anthropic | ✅ Yes | Anthropic accepts the OpenAI-compat shape natively. cache_control / thinking work via vmx.providerArgs or /anthropic/messages. |
| Gemini | ✅ Yes | Via Google's OpenAI-compat endpoint. Auto-routes to the native @google/genai SDK when the request carries googleSearch/urlContext/codeExecution/fileSearch tools. |
| Groq | ✅ Yes | Via Groq's OpenAI-compat endpoint. |
| Perplexity | ✅ Yes | Via Perplexity's OpenAI-compat endpoint. Web search is built into every model. |
| AWS Bedrock-Converse | Convert | Body converted to Converse shape; cache_control from the __vmx_passthrough envelope re-applied as cachePoint blocks. |
| AWS Bedrock-Invoke | Convert | OpenAI → Anthropic → Bedrock-Invoke wire shape (Anthropic on AWS). |
When the request format doesn't match the upstream's native shape, VM-X
converts. Fields like cache_control and thinking ride on the
private __vmx_passthrough envelope so a fallback can re-apply them
end-to-end. See the conversion matrix
for the per-pair details.
Errors
See the endpoint overview for the full error catalog.
On streaming requests, errors that fire after the first chunk are
emitted as a final data: { "error": {...} } frame followed by
data: [ERROR]\n\n (the [ERROR] sentinel — distinct from the
successful-end [DONE] sentinel — lets clients distinguish a clean
finish from a truncated one).
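If you consume the SSE stream without an SDK, a minimal Python sketch (using requests; placeholders as in the examples above) for telling the two sentinels apart:
import json
import requests

resp = requests.post(
    "http://localhost:3000/v1/completion/<workspace>/<environment>/chat/completions",
    headers={"Authorization": "Bearer <vmx-api-key>"},
    json={"model": "my-resource", "messages": [{"role": "user", "content": "Hello!"}], "stream": True},
    stream=True,
)
for line in resp.iter_lines():
    if not line or not line.startswith(b"data: "):
        continue
    data = line[len(b"data: "):]
    if data == b"[DONE]":
        break  # clean finish
    if data == b"[ERROR]":
        break  # stream ended after an error frame
    event = json.loads(data)
    if "error" in event:
        print("stream error:", event["error"])
    elif event.get("choices"):
        print(event["choices"][0]["delta"].get("content") or "", end="")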
Next steps
- VM-X envelope — correlationId, metadata, providerArgs, …
- Anthropic Messages — cache_control, extended thinking, server tools
- Web search — provider-by-provider web search guide
- AI Resources — how model resolves to a provider + model