Anthropic Messages

Anthropic's Messages API is the canonical surface for Claude features that have no OpenAI equivalent — cache_control for prompt caching, extended thinking, the full server-tools suite (web_search_*, code_execution_*, bash_*, text_editor_*, computer_*), service_tier, top_k, and refusal stop details.

Reach for /anthropic/messages when:

  • Your application already uses @anthropic-ai/sdk.
  • You want native access to Anthropic-only features without going through a passthrough envelope.
  • You want every typed streaming event with its event: line so you can drive an Anthropic-shaped client end-to-end.

Endpoint

POST /v1/completion/{workspaceId}/{environmentId}/anthropic/messages

Headers:

Content-Type: application/json
Authorization: Bearer <vmx-api-key>

Request shape: standard Anthropic Messages body, plus an optional vmx envelope. Use the VM-X resource name in model.

max_tokens is required. Unlike OpenAI Chat Completions, where max_tokens is optional, Anthropic always requires it. The gateway enforces this at the validation boundary — a request without max_tokens returns a 400.
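You can only trigger this with a raw HTTP call, since the SDK makes max_tokens a required argument. A minimal sketch, assuming httpx as the HTTP client:

import httpx

resp = httpx.post(
    "http://localhost:3000/v1/completion/<workspace>/<environment>/anthropic/messages",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer <vmx-api-key>",
    },
    json={
        "model": "my-resource",
        # max_tokens deliberately omitted
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
print(resp.status_code)  # 400: rejected at the validation boundary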

Quick start

import anthropic

client = anthropic.Anthropic(
    api_key="<vmx-api-key>",
    base_url="http://localhost:3000/v1/completion/<workspace>/<environment>/anthropic",
)

message = client.messages.create(
    model="my-resource",
    max_tokens=512,
    messages=[{"role": "user", "content": "Hello!"}],
)

# content is an array of typed blocks (text, tool_use, thinking, …)
for block in message.content:
    if block.type == "text":
        print(block.text)

Ad-hoc model addressing — <connection_name>/<model>

If you don't want to pre-create an AI Resource, pass <connection_name>/<model> in the model field. VM-X looks up the connection by name in this workspace/environment and dispatches directly to the upstream model on it. Useful for scratch work and one-off calls that don't need routing or a fallback chain.

# "anthropic-prod" is the AI Connection name; the rest is the
# upstream Anthropic model id verbatim. No resource record required.
message = client.messages.create(
    model="anthropic-prod/claude-3-5-sonnet-20241022",
    max_tokens=512,
    messages=[{"role": "user", "content": "Hello!"}],
)

The first / is the separator; anything after it is the upstream model id verbatim — so bedrock-prod/anthropic.claude-3-5-sonnet-20241022-v2:0 works as you'd expect on a Bedrock-Invoke connection (including the trailing :0). If no connection of that name exists, VM-X falls back to looking the literal string up as a resource name, so resource names that legitimately contain / still resolve.
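The same form works unchanged against other connection types. A sketch with the Bedrock-Invoke example from above (the connection name is illustrative):

# "bedrock-prod" is a hypothetical Bedrock-Invoke connection name; the
# :0 suffix is part of the upstream model id and passes through verbatim.
message = client.messages.create(
    model="bedrock-prod/anthropic.claude-3-5-sonnet-20241022-v2:0",
    max_tokens=512,
    messages=[{"role": "user", "content": "Hello!"}],
)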

Trade-off: ad-hoc addressing skips the resource layer, which means no resource-level routing, fallback, or capacity. Connection-level capacity still applies and the request is still audited. For routing / fallback / per-resource capacity, define an AI Resource and pass its name instead.

Examples

Top-level system prompt

System prompts go on the top level (not inside messages[]).

message = client.messages.create(
    model="my-resource",
    max_tokens=256,
    system="You are a concise senior engineer.",
    messages=[{"role": "user", "content": "Why are mutexes hard?"}],
)

Multi-turn conversation

As with Chat Completions, pass alternating user / assistant messages.

message = client.messages.create(
    model="my-resource",
    max_tokens=128,
    messages=[
        {"role": "user", "content": "My name is Lucas."},
        {"role": "assistant", "content": "Hello, Lucas."},
        {"role": "user", "content": "What's my name?"},
    ],
)

Tool use round-trip

Anthropic tools have name, description, and input_schema (JSON Schema). The model emits a tool_use content block; you respond with a user message whose content includes a tool_result block keyed by tool_use_id.

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather in a city",
        "input_schema": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    }
]

# 1. Model emits a tool_use block.
first = client.messages.create(
    model="my-resource",
    max_tokens=512,
    tools=tools,
    tool_choice={"type": "any"},
    messages=[{"role": "user", "content": "Weather in Tokyo?"}],
)

# Find the tool_use block.
tu = next(b for b in first.content if b.type == "tool_use")

# 2. Run the tool locally...
result = {"temp_c": 22, "conditions": "clear"}

# 3. Send the result back as a tool_result on a user turn.
final = client.messages.create(
    model="my-resource",
    max_tokens=512,
    tools=tools,
    messages=[
        {"role": "user", "content": "Weather in Tokyo?"},
        {"role": "assistant", "content": first.content},  # the assistant's full reply
        {
            "role": "user",
            "content": [
                {
                    "type": "tool_result",
                    "tool_use_id": tu.id,
                    "content": str(result),
                }
            ],
        },
    ],
)

tool_choice accepts:

  • { "type": "auto" } — model decides (default when tools is set).
  • { "type": "any" } — model must use a tool, but picks which one.
  • { "type": "tool", "name": "get_weather" } — force a specific tool.
  • { "type": "none" } — model must not use tools. Native Anthropic and Bedrock-Invoke pass this through verbatim. On Bedrock-Converse (which has no equivalent), VM-X strips the tools array from the wire body so the model can't call them (T11).

Prompt caching with cache_control

Mark a content block with cache_control: { type: 'ephemeral' } so Anthropic can cache the prefix and skip re-tokenising on subsequent calls. Cacheable on system, tools, and messages.

SYSTEM = "You are answering questions about a single, large document. " * 200

# First call writes the cache. Look at usage.cache_creation_input_tokens.
first = client.messages.create(
    model="my-resource",
    max_tokens=128,
    system=[
        {
            "type": "text",
            "text": SYSTEM,
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Who's it about?"}],
)
print("Wrote:", first.usage.cache_creation_input_tokens)

# Second call hits the cache. Look at usage.cache_read_input_tokens.
second = client.messages.create(
    model="my-resource",
    max_tokens=128,
    system=[
        {
            "type": "text",
            "text": SYSTEM,
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Three keywords?"}],
)
print("Read:", second.usage.cache_read_input_tokens)

Cross-provider caching: when an Anthropic Messages request lands on AWS Bedrock-Converse, VM-X translates cache_control blocks to Bedrock's cachePoint blocks. The cache hit/write tokens come back on usage.cache_creation_input_tokens / cache_read_input_tokens in both directions.

Extended thinking

Set thinking: { type: 'adaptive' } on Opus 4.6+ / Sonnet 4.6+ — the model decides how much to think. For older Claude versions, use thinking: { type: 'enabled', budget_tokens: N } (where budget_tokens < max_tokens, minimum 1024).

message = client.messages.create(
    model="my-claude-opus-resource",
    max_tokens=4096,
    thinking={"type": "adaptive"},
    messages=[{"role": "user", "content": "Prove there are infinitely many primes."}],
)

# Inspect the thinking block in content[]
for block in message.content:
    if block.type == "thinking":
        print("Thinking:", block.thinking[:200], "…")
    elif block.type == "text":
        print("Final:", block.text)

thinking blocks include a signature that the model uses to verify continuity across turns. When you echo a prior assistant reply back as a messages[].content array, keep the thinking block (with its signature) intact — Anthropic validates it server-side.
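A sketch of a follow-up turn, continuing the example above and echoing message.content back untouched:

followup = client.messages.create(
    model="my-claude-opus-resource",
    max_tokens=4096,
    thinking={"type": "adaptive"},
    messages=[
        {"role": "user", "content": "Prove there are infinitely many primes."},
        # Echo the full prior reply; thinking block and signature intact.
        {"role": "assistant", "content": message.content},
        {"role": "user", "content": "Now give the one-paragraph version."},
    ],
)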

Server tools (web search, code execution, …)

Anthropic's hosted tools run on Anthropic's side; you don't implement the function. Add the tool definition to tools[] and the model uses it autonomously.

message = client.messages.create(
    model="my-resource",
    max_tokens=2048,
    messages=[{"role": "user", "content": "Latest TypeScript release? Cite sources."}],
    tools=[
        {
            "type": "web_search_20250305",
            "name": "web_search",
            "max_uses": 3,
        }
    ],
)

# The response contains web_search_tool_result blocks + text with citations.

See the web search guide for citation handling and the full provider matrix.

Streaming — typed events with event: lines

Anthropic's stream envelope tags every event with its event: name on its own line, followed by a data: JSON frame. VM-X forwards the exact wire format.

with client.messages.stream(
    model="my-resource",
    max_tokens=512,
    messages=[{"role": "user", "content": "Stream a poem."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
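If you need the typed events themselves rather than just text deltas, iterate the stream directly. A sketch using the Python SDK's event iteration; event.type mirrors the wire's event: names, and the SDK also yields synthetic convenience events:

with client.messages.stream(
    model="my-resource",
    max_tokens=512,
    messages=[{"role": "user", "content": "Stream a poem."}],
) as stream:
    for event in stream:
        # Raw wire events: message_start, content_block_start,
        # content_block_delta, content_block_stop, message_delta, message_stop.
        if event.type == "content_block_delta" and event.delta.type == "text_delta":
            print(event.delta.text, end="", flush=True)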

Mid-stream errors are emitted as a typed event: error frame followed by stream termination — clients consuming with the Anthropic SDK pick this up automatically.

betas array (beta-feature opt-in)

Anthropic's beta-features header (anthropic-beta) takes a comma-separated list of feature flags. The Anthropic SDK exposes this as a betas: string[] field on the request; VM-X lifts it off the body and emits it as the anthropic-beta HTTP header before dispatching to Anthropic's API. (Anthropic's native API rejects betas as a body field, but Bedrock-Invoke accepts it on the body, so VM-X preserves the body shape and adapts at the wire layer.)

message = client.messages.create(
    model="my-resource",
    max_tokens=4096,
    thinking={"type": "enabled", "budget_tokens": 2000},
    messages=[{"role": "user", "content": "Reason about this..."}],
    extra_body={"betas": ["interleaved-thinking-2025-05-14"]},
)

Attaching vmx metadata

message = client.messages.create(
    model="my-resource",
    max_tokens=512,
    messages=[{"role": "user", "content": "Summarise: ..."}],
    extra_body={
        "vmx": {
            "correlationId": "summarizer-2026-05-10",
            "metadata": {"team": "growth", "user_id": "u_42"},
            "timeoutMs": 25_000,
        }
    },
)

Provider compatibility

Provider · Native passthrough? · Notes

  • Anthropic · ✅ yes (native) · True end-to-end passthrough — cache_control, thinking, server tools, service_tier, refusal stop details all round-trip.
  • AWS Bedrock-Invoke · ✅ yes (native) · Claude on AWS — same wire shape, plus the Bedrock anthropic_version discriminator. External image URLs are rejected up-front (aws_bedrock_invoke_image_url_unsupported); use base64 sources.
  • AWS Bedrock-Converse · Convert · Direct Anthropic↔Converse adapter — cache_control → cachePoint, server tools mapped to Converse equivalents where supported.
  • OpenAI · Convert (D5) · Direct Anthropic↔Responses adapter (no internal pivot through Chat Completions). thinking → reasoning.effort, tool_use → function_call.
  • Gemini · Convert · Via Chat Completions on Google's OpenAI-compat endpoint.
  • Groq · Convert · Via Chat Completions.
  • Perplexity · Convert · Via Chat Completions.

For the per-pair conversion details (which Anthropic fields survive each conversion path), see the conversion matrix.

Errors

See the endpoint overview for the full error catalog. On streaming, the gateway emits a typed event: error frame (event: error\ndata: { "error": {...} }\n\n) and terminates the stream — there is no trailing [DONE] sentinel; Anthropic's SDK MessageStream picks the error event up by name. Long-running streams also receive periodic event: ping heartbeats every ~10s (T3) so idle proxies don't close the connection during slow tool use.

The gateway maps Anthropic's anthropic-ratelimit-* response headers to OpenAI's x-ratelimit-* shape so your rate-limit accounting code doesn't need to know which provider it just talked to.
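One way to read those mapped headers without leaving the SDK is the raw-response wrapper. A sketch; the header name shown assumes OpenAI's usual x-ratelimit-remaining-requests spelling:

raw = client.messages.with_raw_response.create(
    model="my-resource",
    max_tokens=64,
    messages=[{"role": "user", "content": "ping"}],
)
# Mapped by the gateway from anthropic-ratelimit-*.
print(raw.headers.get("x-ratelimit-remaining-requests"))
message = raw.parse()  # the usual Message object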

Next steps