
Responses

OpenAI's Responses API is a typed-event interface built for agentic loops. It takes input (a string or an array of typed items) instead of messages, streams typed events where every event carries its own event: line, and adds first-class support for reasoning, hosted tools, and a top-level instructions field for the system prompt.

Reach for /responses when:

  • You're using OpenAI's agentic SDK (client.responses.create).
  • You need typed event streaming with named events (response.output_text.delta, response.completed, …).
  • You want OpenAI's hosted tools — web_search, code_interpreter, file_search, computer_use.
  • You're using reasoning effort (reasoning: { effort: '...' }) on o-series or Opus models.

Endpoint

POST /v1/completion/{workspaceId}/{environmentId}/responses

Headers:

Content-Type: application/json
Authorization: Bearer <vmx-api-key>

Request shape: standard OpenAI Responses body, plus an optional vmx envelope. Use the VM-X resource name in model.
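
If you're not going through an SDK, the same body can be posted directly. A minimal sketch with requests (host, resource name, and correlationId are illustrative, matching the quick start below):

import requests

resp = requests.post(
    "http://localhost:3000/v1/completion/<workspace>/<environment>/responses",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer <vmx-api-key>",
    },
    json={
        "model": "my-resource",  # VM-X resource name
        "input": "Hello!",
        # Optional VM-X envelope rides alongside the OpenAI fields.
        "vmx": {"correlationId": "demo-run-1"},
    },
)
print(resp.json())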

Quick start

from openai import OpenAI

client = OpenAI(
    api_key="<vmx-api-key>",
    base_url="http://localhost:3000/v1/completion/<workspace>/<environment>",
)

response = client.responses.create(
    model="my-resource",
    input="Hello!",
    instructions="Be concise.",
)

# Pull the assistant text out of the typed `output[]`.
for item in response.output:
    if item.type == "message":
        for part in item.content:
            if part.type == "output_text":
                print(part.text)

Ad-hoc model addressing — <connection_name>/<model>

If you don't want to pre-create an AI Resource, pass <connection_name>/<model> in the model field. VM-X looks up the connection by name in this workspace/environment and dispatches directly to the upstream model on it. Useful for scratch work and one-off calls that don't need routing or a fallback chain.

# "openai-prod" is the AI Connection name; "gpt-4o-mini" is the
# upstream OpenAI model id. No resource record required.
response = client.responses.create(
    model="openai-prod/gpt-4o-mini",
    input="Hello!",
)

The first / is the separator; anything after it is the upstream model id verbatim — so bedrock-prod/anthropic.claude-3-5-sonnet-20241022-v2:0 works as you'd expect, including the trailing :0. If no connection of that name exists, VM-X falls back to looking the literal string up as a resource name, so resource names that legitimately contain / still resolve.
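
For instance, a Bedrock-style model id with a : suffix passes through unchanged (the connection name bedrock-prod here is illustrative):

# Everything after the first "/" is forwarded upstream verbatim,
# including the ":0" version suffix.
response = client.responses.create(
    model="bedrock-prod/anthropic.claude-3-5-sonnet-20241022-v2:0",
    input="Hello!",
)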

Trade-off: ad-hoc addressing skips the resource layer, which means no resource-level routing, fallback, or capacity. Connection-level capacity still applies and the request is still audited. For routing / fallback / per-resource capacity, define an AI Resource and pass its name instead.

Shape primer — what's different from Chat Completions

Field              Chat Completions        Responses
Conversation       messages[]              input (string or input[] of typed items)
System prompt      messages[role:system]   instructions (top-level)
Tokens cap         max_tokens              max_output_tokens
Tool definition    tools[].function        tools[] (type: 'function' | 'web_search')
Reasoning control  n/a                     reasoning: { effort: 'low' | 'medium' | 'high' }
Streaming events   data: {chunk}           event: <type>\ndata: {…}\n\n per event
Stop reason        finish_reason           status (completed / incomplete / …)
Output             choices[].message       output[] (typed items: message, function_call, reasoning)
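
A minimal sketch of the same request in both shapes, assuming the gateway also exposes a Chat Completions endpoint at the same base URL:

# Chat Completions shape
chat = client.chat.completions.create(
    model="my-resource",
    messages=[
        {"role": "system", "content": "Be concise."},
        {"role": "user", "content": "Hello!"},
    ],
    max_tokens=100,
)
print(chat.choices[0].message.content)

# Responses shape: the system prompt and token cap move to
# `instructions` and `max_output_tokens`.
resp = client.responses.create(
    model="my-resource",
    instructions="Be concise.",
    input="Hello!",
    max_output_tokens=100,
)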

Examples

Multi-message input items

When you need a multi-turn conversation, send input as an array of message items:

response = client.responses.create(
    model="my-resource",
    input=[
        {
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": "My name is Lucas."}],
        },
        {
            "type": "message",
            "role": "assistant",
            "content": [
                {"type": "output_text", "text": "Hello, Lucas. How can I help?"}
            ],
        },
        {
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": "What's my name?"}],
        },
    ],
)

Assistant content part type: assistant messages use output_text (not input_text), even on the input side. VM-X normalises this for you when an OpenAI Responses request lands on a non-OpenAI provider.

Function tools

Responses-shape tools are flatter than Chat Completions — name and parameters live at the top of the tool object.

response = client.responses.create(
    model="my-resource",
    input="Weather in Tokyo?",
    tools=[
        {
            "type": "function",
            "name": "get_weather",
            "description": "Get current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
                "required": ["location"],
            },
        }
    ],
    tool_choice="required",
)

# tool calls land as function_call items in output[]
for item in response.output:
    if item.type == "function_call":
        print(item.name, item.arguments)

Tool result round-trip

Send the function's output back as a function_call_output item keyed by call_id:

# Continuing from the previous example...
fc = next(o for o in response.output if o.type == "function_call")

final = client.responses.create(
    model="my-resource",
    input=[
        {
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": "Weather in Tokyo?"}],
        },
        fc,  # the function_call from the previous turn
        {
            "type": "function_call_output",
            "call_id": fc.call_id,
            "output": '{"temp_c": 22, "conditions": "clear"}',
        },
    ],
)

Reasoning effort (o-series, Opus)

Set reasoning: { effort: 'low' | 'medium' | 'high' }. The model allocates more or less time to reasoning before producing the final output.

response = client.responses.create(
    model="my-reasoning-resource",  # e.g. an o4-mini or claude-opus-4-x resource
    input="Prove there are infinitely many primes.",
    reasoning={"effort": "high"},
    max_output_tokens=2000,
)

Cross-provider note: when a Responses request resolves to Anthropic, reasoning.effort maps to Anthropic's thinking.budget_tokens tier (low → 1k, medium → 4k, high → 12k tokens). The reasoning content comes back as reasoning items in output[].
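
A sketch of pulling that content back out, assuming reasoning items expose a summary list of text parts as in OpenAI's Responses schema:

for item in response.output:
    if item.type == "reasoning":
        # Summary parts carry the surfaced reasoning text.
        for part in item.summary:
            print(part.text)
    elif item.type == "message":
        for part in item.content:
            if part.type == "output_text":
                print(part.text)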

Web search (hosted tool)

Web search is a built-in server-side tool — no function definition needed. Add { type: 'web_search' } to tools[]. The model VM-X dispatches to must support the tool natively (OpenAI Responses-capable search models, Anthropic Claude with the web_search_20250305 server tool, etc.). For routes through OpenAI-compat upstreams that don't speak hosted tools (Gemini / Groq / Perplexity), the gateway returns 400 responses_unsupported_tool_type.

response = client.responses.create(
    model="my-resource",
    input="What's the latest version of TypeScript? Cite sources.",
    tools=[{"type": "web_search"}],
)

See the dedicated web search guide for citations, recency filters, and provider-by-provider behaviour.

Streaming — typed events

Responses streams are typed: every event has its own event: name followed by a data: JSON frame.

stream = client.responses.create(
    model="my-resource",
    input="Stream a poem.",
    stream=True,
)

for event in stream:
    # event.type discriminates: response.created,
    # response.output_text.delta, response.completed, etc.
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
    elif event.type == "response.completed":
        print()  # final newline

Common event types VM-X forwards:

Event                                    When
response.created                         Response object is registered (with id).
response.in_progress                     Generation started.
response.output_item.added               A new top-level output item begins (message / function_call / reasoning).
response.content_part.added              A content part begins inside a message item.
response.output_text.delta               Streaming text delta.
response.function_call_arguments.delta   Streaming JSON args delta on a function call.
response.reasoning_summary_text.delta    Streaming reasoning text delta.
response.output_item.done                An output item finished.
response.completed                       Stream done; final response object on the event.
error                                    Mid-stream error frame.
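
The same discriminator pattern extends to tool calls. A sketch that accumulates streamed function-call arguments, assuming a stream created with tools attached (field names per the OpenAI SDK's typed streaming events):

from collections import defaultdict

args = defaultdict(str)
for event in stream:
    if event.type == "response.function_call_arguments.delta":
        # Argument deltas are keyed by the output item they belong to.
        args[event.item_id] += event.delta
    elif event.type == "response.output_item.done" and event.item.type == "function_call":
        # The finished item also carries the complete arguments string.
        print(event.item.name, args[event.item.id])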

Attaching vmx metadata

response = client.responses.create(
    model="my-resource",
    input="Pick a number 1-10.",
    extra_body={
        "vmx": {
            "correlationId": "agent-run-2026-05-10-abc",
            "metadata": {"team": "growth", "experiment": "exp_42"},
            "timeoutMs": 30_000,
        }
    },
)

Provider compatibility

Provider              Native passthrough?   Notes
OpenAI                ✅ Yes                 Direct dispatch via client.responses.create.
Anthropic             Convert (D5)          Direct Responses↔Anthropic adapter; no internal pivot through Chat Completions. Reasoning effort → thinking.budget_tokens; reasoning content comes back as reasoning items.
AWS Bedrock-Converse  Convert               Direct Responses↔Converse adapter.
AWS Bedrock-Invoke    Convert               Responses → Anthropic (canonical adapter) → Bedrock-Invoke wire shape.
Gemini                Convert               Via Chat Completions on Google's OpenAI-compat endpoint.
Groq                  Convert               Via Chat Completions.
Perplexity            Convert               Via Chat Completions.

For the per-pair conversion details (which Responses fields survive each conversion path), see the conversion matrix.

Errors

See the endpoint overview for the full error catalog. On streaming requests, mid-stream errors are emitted as a single typed event:

event: error
data: {"error": {"message": "...", "code": "..."}}

There is no trailing [DONE] sentinel — the stream simply terminates after the error frame. Clients consuming with the OpenAI SDK pick this up via the typed event stream's error discriminator.
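
A minimal sketch of handling that frame, reusing a stream like the one in the streaming example above:

for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
    elif event.type == "error":
        # Typed error frame; the stream terminates after this event.
        print(f"\nstream error [{event.code}]: {event.message}")
        break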

Next steps