API Endpoints

VM-X exposes three completion endpoints under each workspace/environment. All three feed the same routing / fallback / capacity / audit pipeline — so you get the same guarantees regardless of which API shape your client speaks.

| Endpoint | Path | Request shape | Native client |
| --- | --- | --- | --- |
| Chat Completions (most common) | POST /v1/completion/{workspaceId}/{environmentId}/chat/completions | OpenAI Chat Completions | openai SDK |
| Anthropic Messages | POST /v1/completion/{workspaceId}/{environmentId}/anthropic/messages | Anthropic Messages | @anthropic-ai/sdk |
| Responses | POST /v1/completion/{workspaceId}/{environmentId}/responses | OpenAI Responses | openai SDK |

Pick the endpoint that matches the SDK your application already uses. You don't need to pull in an extra dependency just to talk to VM-X.
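
For example, here's a minimal sketch pointing the official openai Node SDK at VM-X; the gateway host (api.vm-x.example) and the workspace/environment ids are placeholders. The SDK appends /chat/completions to baseURL, which lands on the VM-X path:

```typescript
import OpenAI from "openai";

// Placeholder host and ids; substitute your own deployment's values.
const client = new OpenAI({
  baseURL: "https://api.vm-x.example/v1/completion/my-workspace/production",
  apiKey: process.env.VMX_API_KEY, // a VM-X API key, not an OpenAI key
});

const completion = await client.chat.completions.create({
  model: "my-resource", // VM-X AI Resource name (see "Resource-name model" below)
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(completion.choices[0].message.content);
```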

Which endpoint should I use?

Already using the OpenAI SDK?
├── Yes → /chat/completions (most common, broadest provider compatibility)
│         …or /responses if you need typed event streaming, agentic loops,
│         or OpenAI-native server tools (web_search, etc.)
└── No
    ├── Already using @anthropic-ai/sdk? → /anthropic/messages
    │   (full access to cache_control, extended thinking, server tools,
    │   service_tier, top_k, etc.)
    └── Building from scratch? → /chat/completions for the broadest
        ecosystem; /responses if you'll lean into OpenAI's agentic
        loop primitives.

Why three endpoints?

Different teams standardise on different SDKs:

  • The OpenAI SDK is the most widely used and the one most tutorials show.
  • Anthropic's @anthropic-ai/sdk is the canonical client for Claude features that don't have an OpenAI equivalent — cache_control, extended thinking, server tools (web_search, code_execution, bash, text_editor, computer), service_tier, refusal stop details, and so on.
  • The Responses API is OpenAI's newer event-typed shape used by their agentic loops.

VM-X accepts all three on the wire so you can integrate without rewriting the call site, and routes the request through the same gateway pipeline regardless of the format.

Format passthrough — what survives end-to-end

When the request format matches the upstream provider's native shape, VM-X sends the body verbatim with no conversion. Anthropic-only features that have no OpenAI equivalent (cache_control, extended thinking, top_k, server tools, service_tier, …) round-trip losslessly:

| Endpoint | Native passthrough providers |
| --- | --- |
| Chat Completions | OpenAI, Anthropic-via-OpenAI-compat, Gemini, Groq, Perplexity |
| Anthropic Messages | Anthropic (native SDK), AWS Bedrock-Invoke (Claude on AWS) |
| Responses | OpenAI |

When the request format does not match the upstream's native shape (e.g., an Anthropic Messages request routed to Gemini), the gateway converts to OpenAI Chat Completions internally for dispatch and converts the response back on the way out. Fields the conversion can't express (cache markers, server tools, …) are stowed in a private __vmx_passthrough envelope so a fallback to a native-format provider later in the chain can re-attach them; fields that are strictly OpenAI-specific are dropped (and logged), never silently corrupted.

See the AI provider architecture contributor doc for the full conversion matrix.
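
For illustration, here's roughly what a request carrying one of those Anthropic-only fields looks like on the wire, using plain fetch; the host, ids, and resource name are placeholders. When the upstream is a native passthrough provider (see the table above), the cache_control marker survives verbatim:

```typescript
// Placeholder host and ids; substitute your own deployment's values.
const res = await fetch(
  "https://api.vm-x.example/v1/completion/my-workspace/production/anthropic/messages",
  {
    method: "POST",
    headers: {
      "content-type": "application/json",
      "x-api-key": process.env.VMX_API_KEY!, // a VM-X key, not an Anthropic key
    },
    body: JSON.stringify({
      model: "my-claude-resource", // VM-X AI Resource name
      max_tokens: 1024,
      system: [
        {
          type: "text",
          text: "You are a code reviewer. <long style guide here>",
          // Anthropic-only field: passed through verbatim to native-format
          // providers; otherwise stowed in __vmx_passthrough for fallbacks.
          cache_control: { type: "ephemeral" },
        },
      ],
      messages: [{ role: "user", content: "Review this diff: ..." }],
    }),
  },
);

console.log(await res.json());
```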

Authentication

All three endpoints accept either of two auth header forms:

Authorization: Bearer <vmx-api-key>

…or:

x-api-key: <vmx-api-key>

The API key is scoped to a workspace + environment and an allow-list of AI Resources. Find it under Security → API Keys in the dashboard.
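
Both forms hit the same endpoint, so pick whichever your HTTP stack prefers. A quick sketch with fetch (placeholder host and ids):

```typescript
const url =
  "https://api.vm-x.example/v1/completion/my-workspace/production/chat/completions";
const body = JSON.stringify({
  model: "my-resource",
  messages: [{ role: "user", content: "ping" }],
});

// Form 1: standard Bearer token.
await fetch(url, {
  method: "POST",
  headers: {
    "content-type": "application/json",
    authorization: `Bearer ${process.env.VMX_API_KEY}`,
  },
  body,
});

// Form 2: x-api-key header, the form Anthropic-native clients send by default.
await fetch(url, {
  method: "POST",
  headers: {
    "content-type": "application/json",
    "x-api-key": process.env.VMX_API_KEY!,
  },
  body,
});
```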

Resource-name model

Every endpoint takes the AI Resource name in the model field — not the upstream provider's model id. The resource decides which provider, which model, what routing rules, what fallback chain.

{
  "model": "my-resource", // ← VM-X resource name, NOT 'gpt-4o-mini'
  "messages": [{ "role": "user", "content": "..." }]
}

Ad-hoc addressing — <connection_name>/<model>

If you don't want to pre-create an AI Resource, pass <connection_name>/<model> in the model field. The gateway looks up the connection by name in this workspace/environment and dispatches the request directly to that connection + upstream model — no resource record needed.

{
  // ← uses the "openai-prod" connection, gpt-4o-mini model
  "model": "openai-prod/gpt-4o-mini",
  "messages": [{ "role": "user", "content": "..." }]
}

The first / is the separator; anything after it is the upstream model id verbatim (so bedrock-prod/anthropic.claude-3-5-sonnet-20241022-v2:0 works as you'd expect, including the :0 suffix). If no connection of that name exists, VM-X falls back to looking the literal string up as a resource name — so resource names containing / still resolve normally.
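
If you build model strings dynamically, the resolution rule is easy to mirror client-side. A hypothetical helper (not part of any VM-X SDK) that applies the same first-slash split:

```typescript
// Illustrative only: mirrors the documented "split on the first /" rule.
function splitAdHocModel(
  model: string,
): { connection: string; upstreamModel: string } | null {
  const i = model.indexOf("/");
  if (i === -1) return null; // plain resource name, no ad-hoc addressing
  return { connection: model.slice(0, i), upstreamModel: model.slice(i + 1) };
}

splitAdHocModel("openai-prod/gpt-4o-mini");
// → { connection: "openai-prod", upstreamModel: "gpt-4o-mini" }

splitAdHocModel("bedrock-prod/anthropic.claude-3-5-sonnet-20241022-v2:0");
// → everything after the first "/" is kept verbatim, including the ":0" suffix
```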

Trade-off: ad-hoc addressing bypasses resource-level routing, fallback, and capacity (connection-level capacity still applies, and every request is still audited). Use it for one-off calls, scratch work, or when you've intentionally chosen to skip the resource layer. For routing and fallback chains, define an AI Resource and pass its name instead.

To override an existing resource's model on a single request without the / shortcut, see vmx.resourceConfigOverrides on the vmx envelope page.

Headers VM-X adds to every response

| Header | Value |
| --- | --- |
| x-vmx-model | The model that actually ran (after routing + fallback resolution). |
| x-vmx-provider | Provider id (openai / anthropic / aws-bedrock / …). |
| x-vmx-connection-id | UUID of the AI Connection used. |
| x-vmx-gate-duration-ms | Time the gate took to evaluate capacity + prioritization (ms). |
| x-vmx-routing-duration-ms | Time the routing service took to pick a model (ms). Absent when no routing was evaluated. |
| x-vmx-event-count | Number of audit events emitted on the request (routing, fallback, …). |
| x-vmx-metadata-<key> | Echo of every vmx.metadata entry, lower-cased and CRLF-stripped. |
| x-request-id | Forwarded from the upstream provider when present. |
| x-ratelimit-* | Rate-limit headers from the upstream, normalised to OpenAI's shape. |
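
With the openai SDK you can read these from the raw response via .withResponse(), which returns the parsed body alongside the underlying Response. A sketch, reusing the client configured earlier:

```typescript
// `client` is the OpenAI client pointed at VM-X, as configured above.
const { data, response } = await client.chat.completions
  .create({
    model: "my-resource",
    messages: [{ role: "user", content: "Hello!" }],
  })
  .withResponse();

console.log(data.choices[0].message.content);
console.log(response.headers.get("x-vmx-model"));         // model that actually ran
console.log(response.headers.get("x-vmx-provider"));      // e.g. "openai"
console.log(response.headers.get("x-vmx-connection-id")); // connection UUID
```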

Errors

The gateway's error responses follow the OpenAI error shape:

{
  "error": {
    "message": "Resource has reached the limit of requests",
    "code": "resource_exhausted"
  }
}

Two flavours of error.code show up on the wire:

  • OpenAI-compatible codes (lower-case, set on CompletionError.openAICompatibleError) for completion-time failures — these are the codes you'll match on in client retry logic.
  • VM-X service-error codes (UPPER_SNAKE_CASE, the ErrorCode enum) for resource / auth / lookup failures raised before dispatch.

Common codes:

| HTTP status | error.code | Meaning |
| --- | --- | --- |
| 400 | invalid_request | Malformed upstream-provider body, unsupported parameter, or unsafe outbound URL (url-safety guard). |
| 400 | blocked_by_routing_condition | A routing rule with action: BLOCK matched. |
| 400 | aws_bedrock_invoke_image_url_unsupported | Bedrock-Invoke can't fetch external image URLs server-side; use base64 data: URLs. |
| 400 | WORKSPACE_NOT_MEMBER | The authenticated principal isn't a member of the addressed workspace. |
| 401 / 403 | (NestJS guard message) | API key missing / invalid, or principal lacks RoleGuard permission. No error.code is set. |
| 404 | AI_RESOURCE_NOT_FOUND | Resource name doesn't exist in this workspace/environment. |
| 404 | API_KEY_NOT_FOUND | The API key id wasn't found. |
| 404 | API_KEY_RESOURCE_NOT_AUTHORIZED | The API key isn't allow-listed for the requested resource. |
| 429 | resource_exhausted | Capacity gate denied (RPM/TPM cap hit) — see the Retry-After header on the response. |
| 429 | prioritization_gate_denied | Prioritization gate denied because the pool was over its share. |
| 5xx | (provider-specific) | Upstream provider error — the gateway propagates the upstream's status + body. |
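
A minimal retry sketch against the 429 codes above, using plain fetch so the Retry-After header stays visible. This is a sketch, not a prescribed client; a production implementation would add jitter and a backoff ceiling:

```typescript
async function completeWithRetry(
  url: string,
  init: RequestInit,
  maxAttempts = 3,
): Promise<Response> {
  for (let attempt = 1; ; attempt++) {
    const res = await fetch(url, init);
    if (res.status !== 429 || attempt === maxAttempts) return res;

    // code is "resource_exhausted" or "prioritization_gate_denied"
    const body = await res.clone().json().catch(() => null);
    const code = body?.error?.code;
    const retryAfterSec = Number(res.headers.get("retry-after") ?? "1");

    console.warn(`429 (${code}), retrying in ${retryAfterSec}s`);
    await new Promise((resolve) => setTimeout(resolve, retryAfterSec * 1000));
  }
}
```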

Errors that occur after streaming has started are emitted in the endpoint's native streaming envelope:

| Endpoint | Mid-stream error frame |
| --- | --- |
| Chat Completions | `data: { "error": {...} }\n\n` followed by `data: [ERROR]\n\n` |
| Responses | `event: error\ndata: { "error": {...} }\n\n` (no `[DONE]` tail) |
| Anthropic Messages | `event: error\ndata: { "error": {...} }\n\n` (no `[DONE]` tail) |
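
For Chat Completions streams that means watching for an error frame before the usual [DONE]. A simplified sketch of the read loop; real SSE frames can split across chunk boundaries, so a production client should buffer more defensively:

```typescript
// Simplified SSE loop for a Chat Completions stream response.
async function readChatStream(res: Response): Promise<void> {
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  for (;;) {
    const { done, value } = await reader.read();
    if (done) return;
    buffer += decoder.decode(value, { stream: true });

    let nl: number;
    while ((nl = buffer.indexOf("\n")) !== -1) {
      const line = buffer.slice(0, nl).trim();
      buffer = buffer.slice(nl + 1);
      if (!line.startsWith("data: ")) continue;

      const payload = line.slice("data: ".length);
      if (payload === "[DONE]" || payload === "[ERROR]") return; // terminal sentinels

      const frame = JSON.parse(payload);
      if (frame.error) throw new Error(frame.error.message); // mid-stream error frame
      process.stdout.write(frame.choices?.[0]?.delta?.content ?? "");
    }
  }
}
```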
