Chat Completions

The OpenAI Chat Completions endpoint is VM-X's broadest surface — every supported provider speaks it (natively or via conversion). Reach for it when:

  • Your application already uses the OpenAI SDK.
  • You want maximum provider portability (one shape works for OpenAI, Anthropic, Gemini, Groq, Perplexity, AWS Bedrock).
  • You don't need the typed-event streaming shape of /responses or the Anthropic-specific features (cache_control, extended thinking, server tools) of /anthropic/messages.

Endpoint

POST /v1/completion/{workspaceId}/{environmentId}/chat/completions

Headers:

Content-Type: application/json
Authorization: Bearer <vmx-api-key>

Request shape: standard OpenAI Chat Completions body, plus an optional vmx envelope. Use the VM-X resource name in model, not the upstream model id.
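
The SDK hides the path and headers, but the same request can be posted directly. A rough sketch with the requests library, using the localhost host from the quick start below (all placeholders are yours to fill in):

import requests

url = "http://localhost:3000/v1/completion/<workspace>/<environment>/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer <vmx-api-key>",
}
body = {
    "model": "my-resource",  # VM-X resource name, not the upstream model id
    "messages": [{"role": "user", "content": "Hello!"}],
}

resp = requests.post(url, headers=headers, json=body)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])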

Quick start

from openai import OpenAI

client = OpenAI(
    api_key="<vmx-api-key>",
    base_url="http://localhost:3000/v1/completion/<workspace>/<environment>",
)

response = client.chat.completions.create(
    model="my-resource",
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response.choices[0].message.content)

Ad-hoc model addressing — <connection_name>/<model>

If you don't want to pre-create an AI Resource, pass <connection_name>/<model> in the model field. VM-X looks up the connection by name in this workspace/environment and dispatches directly to the upstream model on it. Useful for scratch work and one-off calls that don't need routing or a fallback chain.

# "openai-prod" is the AI Connection name; "gpt-4o-mini" is the
# upstream OpenAI model id. No resource record required.
response = client.chat.completions.create(
model="openai-prod/gpt-4o-mini",
messages=[{"role": "user", "content": "Hello!"}],
)

The first / is the separator; everything after it is passed through verbatim as the upstream model id — so bedrock-prod/anthropic.claude-3-5-sonnet-20241022-v2:0 works as you'd expect, including the trailing :0. If no connection of that name exists, VM-X falls back to looking up the literal string as a resource name, so resource names that legitimately contain / still resolve.
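
For example, the Bedrock model id above can be addressed ad hoc from Python (the connection name bedrock-prod is taken from that example and stands in for your own):

# The full string after the first "/" is forwarded verbatim as the upstream
# model id, trailing ":0" included.
response = client.chat.completions.create(
    model="bedrock-prod/anthropic.claude-3-5-sonnet-20241022-v2:0",
    messages=[{"role": "user", "content": "Hello!"}],
)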

Trade-off: ad-hoc addressing skips the resource layer, which means no resource-level routing, fallback, or capacity. Connection-level capacity still applies and the request is still audited. For routing / fallback / per-resource capacity, define an AI Resource and pass its name instead.

Examples

System prompt + multi-turn conversation

The system message goes first; subsequent messages alternate user / assistant.

response = client.chat.completions.create(
    model="my-resource",
    messages=[
        {"role": "system", "content": "You are a concise senior engineer."},
        {"role": "user", "content": "My name is Lucas."},
        {"role": "assistant", "content": "Got it, Lucas."},
        {"role": "user", "content": "What's my name?"},
    ],
)

Tool calling

Define your tools in the OpenAI function-calling shape; the assistant responds with tool_calls when it wants to invoke one. Send the tool result back as a tool role message keyed by tool_call_id.

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
                "required": ["location"],
            },
        },
    }
]

# 1. Model emits a tool_call
first = client.chat.completions.create(
    model="my-resource",
    messages=[{"role": "user", "content": "Weather in Tokyo?"}],
    tools=tools,
    tool_choice="required",
)
tc = first.choices[0].message.tool_calls[0]

# 2. Run the tool locally...
weather = {"temp_c": 22, "conditions": "clear"}

# 3. Send the result back
final = client.chat.completions.create(
    model="my-resource",
    messages=[
        {"role": "user", "content": "Weather in Tokyo?"},
        first.choices[0].message,  # the assistant turn that emitted the tool_call
        {"role": "tool", "tool_call_id": tc.id, "content": str(weather)},
    ],
    tools=tools,
)
print(final.choices[0].message.content)
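
In a real handler, step 2 would parse the call's JSON arguments rather than hard-coding the result; a minimal sketch, assuming a local get_weather function of your own:

import json

# tc.function.arguments is a JSON string, e.g. '{"location": "Tokyo"}'
args = json.loads(tc.function.arguments)
weather = get_weather(**args)  # hypothetical local implementation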

Streaming

Set stream: true. The wire format is OpenAI Server-Sent Events: each chunk on its own data: line, terminated by data: [DONE].

To get token usage on the final chunk, set stream_options: { include_usage: true } (VM-X also adds this automatically when stream: true is set).

stream = client.chat.completions.create(
    model="my-resource",
    messages=[{"role": "user", "content": "Stream a poem."}],
    stream=True,
)

for chunk in stream:
    if not chunk.choices:  # the final usage-only chunk carries no choices
        continue
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)
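
To capture the usage figures mentioned above, read the usage field that arrives on the final chunk; a minimal sketch:

stream = client.chat.completions.create(
    model="my-resource",
    messages=[{"role": "user", "content": "Stream a poem."}],
    stream=True,
    stream_options={"include_usage": True},
)

usage = None
for chunk in stream:
    if chunk.usage:  # only present on the final chunk
        usage = chunk.usage
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="", flush=True)

if usage:
    print(f"\nprompt={usage.prompt_tokens} completion={usage.completion_tokens}")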

Multi-modal — images via image_url

Send images as a data: URL or a public URL on a user message's content array.

response = client.chat.completions.create(
    model="my-vision-resource",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA..."
                    },
                },
            ],
        }
    ],
)

Bedrock-Invoke note: Bedrock-Invoke (Anthropic-on-AWS) can't fetch external image URLs server-side. Use base64 data: URLs instead, or route through Bedrock-Converse which does fetch URLs. VM-X surfaces a clean 400 with code aws_bedrock_invoke_image_url_unsupported if you hit this.
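
If the image lives on disk, encoding it as a base64 data: URL is straightforward; a minimal sketch (the file path is a placeholder):

import base64

with open("photo.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("ascii")

data_url = f"data:image/png;base64,{b64}"
# Pass data_url as the image_url.url value shown above.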

JSON mode and JSON Schema

Pin the response to JSON via response_format.

response = client.chat.completions.create(
    model="my-resource",
    messages=[
        {"role": "system", "content": "Respond ONLY in valid JSON."},
        {"role": "user", "content": "Give me a 3-key object describing TypeScript."},
    ],
    response_format={"type": "json_object"},
)

Anthropic note: Anthropic doesn't have a native response_format field. VM-X synthesises a tool call internally so a Chat-Completions JSON-schema request still works when the resource resolves to Anthropic. The model's response is unwrapped back into message.content as a JSON string.
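
A schema-constrained request uses the standard OpenAI json_schema form of response_format and, per the note above, works whether the resource resolves to OpenAI or Anthropic; a minimal sketch with an illustrative schema:

import json

response = client.chat.completions.create(
    model="my-resource",
    messages=[{"role": "user", "content": "Describe TypeScript."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "language_summary",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "paradigm": {"type": "string"},
                    "first_released": {"type": "integer"},
                },
                "required": ["name", "paradigm", "first_released"],
                "additionalProperties": False,
            },
        },
    },
)

print(json.loads(response.choices[0].message.content))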

Attaching vmx metadata

response = client.chat.completions.create(
    model="my-resource",
    messages=[{"role": "user", "content": "Summarise: ..."}],
    extra_body={
        "vmx": {
            "correlationId": "summarizer-job-2026-05-10-abc",
            "metadata": {
                "team": "growth",
                "feature": "summarizer",
                "user_id": "u_42",
            },
            "timeoutMs": 20_000,
        }
    },
)

See the full vmx envelope reference for providerArgs, secondaryModelIndex, and resourceConfigOverrides.

Provider compatibility

Provider             | Native passthrough? | Notes
---------------------|---------------------|------
OpenAI               | ✅ Yes              | Direct dispatch via client.chat.completions.create.
Anthropic            | ✅ Yes              | Anthropic accepts the OpenAI-compat shape natively. cache_control / thinking work via vmx.providerArgs or /anthropic/messages.
Gemini               | ✅ Yes              | Via Google's OpenAI-compat endpoint. Auto-routes to the native @google/genai SDK when the request carries googleSearch/urlContext/codeExecution/fileSearch tools.
Groq                 | ✅ Yes              | Via Groq's OpenAI-compat endpoint.
Perplexity           | ✅ Yes              | Via Perplexity's OpenAI-compat endpoint. Web search is built into every model.
AWS Bedrock-Converse | Convert             | Body converted to Converse shape; cache_control from the __vmx_passthrough envelope re-applied as cachePoint blocks.
AWS Bedrock-Invoke   | Convert             | OpenAI → Anthropic → Bedrock-Invoke wire shape (Anthropic on AWS).

When the request format doesn't match the upstream's native shape, VM-X converts. Fields like cache_control and thinking ride on the private __vmx_passthrough envelope so a fallback can re-apply them end-to-end. See the conversion matrix for the per-pair details.

Errors

See the endpoint overview for the full error catalog. On streaming requests, errors that fire after the first chunk are emitted as a final data: { "error": {...} } frame followed by data: [ERROR]\n\n (the [ERROR] sentinel — distinct from the successful-end [DONE] sentinel — lets clients distinguish a clean finish from a truncated one).
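
If you consume the SSE stream yourself rather than through the SDK, you can watch for both sentinels directly; a minimal sketch with the requests library, reusing the url, headers, and body from the raw-HTTP sketch near the top of this page:

import json
import requests

resp = requests.post(url, headers=headers, json={**body, "stream": True}, stream=True)

for raw in resp.iter_lines(decode_unicode=True):
    if not raw or not raw.startswith("data: "):
        continue
    payload = raw[len("data: "):]
    if payload == "[DONE]":
        break  # clean finish
    if payload == "[ERROR]":
        raise RuntimeError("stream terminated by an error frame")
    event = json.loads(payload)
    if "error" in event:
        print("error:", event["error"])  # this frame precedes the [ERROR] sentinel
    elif event.get("choices"):
        print(event["choices"][0]["delta"].get("content") or "", end="", flush=True)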

Next steps