Anthropic Messages

Anthropic's Messages API is the canonical surface for Claude features that have no OpenAI equivalent — cache_control for prompt caching, extended thinking, the full server-tools suite (web_search_*, code_execution_*, bash_*, text_editor_*, computer_*), service_tier, top_k, and refusal stop details.

Reach for /anthropic/messages when:

  • Your application already uses @anthropic-ai/sdk.
  • You want native access to Anthropic-only features without going through a passthrough envelope.
  • You want every typed streaming event with its event: line so you can drive an Anthropic-shaped client end-to-end.

Endpoint

POST /v1/completion/{workspaceId}/{environmentId}/anthropic/messages

Headers:

Content-Type: application/json
Authorization: Bearer <vmx-api-key>

Request shape: standard Anthropic Messages body, plus an optional vmx envelope. Use the VM-X resource name in model.

max_tokens is required. Unlike OpenAI Chat Completions, where max_tokens is optional, Anthropic always requires it. The gateway enforces this at the validation boundary — a request without max_tokens returns a 400.
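You can only trigger this with a raw HTTP call, since the SDK makes max_tokens a required argument. A minimal sketch, assuming httpx as the HTTP client:

import httpx

resp = httpx.post(
    "http://localhost:3000/v1/completion/<workspace>/<environment>/anthropic/messages",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer <vmx-api-key>",
    },
    json={
        "model": "my-resource",
        # max_tokens deliberately omitted
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
print(resp.status_code)  # 400: rejected at the validation boundary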

Quick start

import anthropic

client = anthropic.Anthropic(
    api_key="<vmx-api-key>",
    base_url="http://localhost:3000/v1/completion/<workspace>/<environment>/anthropic",
)

message = client.messages.create(
    model="my-resource",
    max_tokens=512,
    messages=[{"role": "user", "content": "Hello!"}],
)

# content is an array of typed blocks (text, tool_use, thinking, …)
for block in message.content:
    if block.type == "text":
        print(block.text)

Ad-hoc model addressing — <connection_name>/<model>

If you don't want to pre-create an AI Resource, pass <connection_name>/<model> in the model field. VM-X looks up the connection by name in this workspace/environment and dispatches directly to the upstream model on it. Useful for scratch work and one-off calls that don't need routing or a fallback chain.

# "anthropic-prod" is the AI Connection name; the rest is the
# upstream Anthropic model id verbatim. No resource record required.
message = client.messages.create(
    model="anthropic-prod/claude-3-5-sonnet-20241022",
    max_tokens=512,
    messages=[{"role": "user", "content": "Hello!"}],
)

The first / is the separator; anything after it is the upstream model id verbatim — so bedrock-prod/anthropic.claude-3-5-sonnet-20241022-v2:0 works as you'd expect on a Bedrock-Invoke connection (including the trailing :0). If no connection of that name exists, VM-X falls back to looking the literal string up as a resource name, so resource names that legitimately contain / still resolve.
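The same form works unchanged against other connection types. A sketch with the Bedrock-Invoke example from above (the connection name is illustrative):

# "bedrock-prod" is a hypothetical Bedrock-Invoke connection name; the
# :0 suffix is part of the upstream model id and passes through verbatim.
message = client.messages.create(
    model="bedrock-prod/anthropic.claude-3-5-sonnet-20241022-v2:0",
    max_tokens=512,
    messages=[{"role": "user", "content": "Hello!"}],
)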

Trade-off: ad-hoc addressing skips the resource layer, which means no resource-level routing, fallback, or capacity. Connection-level capacity still applies and the request is still audited. For routing / fallback / per-resource capacity, define an AI Resource and pass its name instead.

Examples

Top-level system prompt

System prompts go on the top level (not inside messages[]).

message = client.messages.create(
    model="my-resource",
    max_tokens=256,
    system="You are a concise senior engineer.",
    messages=[{"role": "user", "content": "Why are mutexes hard?"}],
)

Multi-turn conversation

As with Chat Completions, pass alternating user / assistant messages.

message = client.messages.create(
    model="my-resource",
    max_tokens=128,
    messages=[
        {"role": "user", "content": "My name is Lucas."},
        {"role": "assistant", "content": "Hello, Lucas."},
        {"role": "user", "content": "What's my name?"},
    ],
)

Tool use round-trip

Anthropic tools have name, description, and input_schema (JSON Schema). The model emits a tool_use content block; you respond with a user message whose content includes a tool_result block keyed by tool_use_id.

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather in a city",
        "input_schema": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    }
]

# 1. Model emits a tool_use block.
first = client.messages.create(
    model="my-resource",
    max_tokens=512,
    tools=tools,
    tool_choice={"type": "any"},
    messages=[{"role": "user", "content": "Weather in Tokyo?"}],
)

# Find the tool_use block.
tu = next(b for b in first.content if b.type == "tool_use")

# 2. Run the tool locally...
result = {"temp_c": 22, "conditions": "clear"}

# 3. Send the result back as a tool_result on a user turn.
final = client.messages.create(
    model="my-resource",
    max_tokens=512,
    tools=tools,
    messages=[
        {"role": "user", "content": "Weather in Tokyo?"},
        {"role": "assistant", "content": first.content},  # the assistant's full reply
        {
            "role": "user",
            "content": [
                {
                    "type": "tool_result",
                    "tool_use_id": tu.id,
                    "content": str(result),
                }
            ],
        },
    ],
)

tool_choice accepts:

  • { "type": "auto" } — model decides (default when tools is set).
  • { "type": "any" } — model must use a tool, but picks which one.
  • { "type": "tool", "name": "get_weather" } — force a specific tool.
  • { "type": "none" } — model must not use tools. Native Anthropic and Bedrock-Invoke pass this through verbatim. On Bedrock-Converse (which has no equivalent), VM-X strips the tools array from the wire body so the model can't call them (T11).

Prompt caching with cache_control

Mark a content block with cache_control: { type: 'ephemeral' } so Anthropic can cache the prefix and skip re-tokenising on subsequent calls. Cacheable on system, tools, and messages.

SYSTEM = "You are answering questions about a single, large document. " * 200

# First call writes the cache. Look at usage.cache_creation_input_tokens.
first = client.messages.create(
    model="my-resource",
    max_tokens=128,
    system=[
        {
            "type": "text",
            "text": SYSTEM,
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Who's it about?"}],
)
print("Wrote:", first.usage.cache_creation_input_tokens)

# Second call hits the cache. Look at usage.cache_read_input_tokens.
second = client.messages.create(
    model="my-resource",
    max_tokens=128,
    system=[
        {
            "type": "text",
            "text": SYSTEM,
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Three keywords?"}],
)
print("Read:", second.usage.cache_read_input_tokens)

Cross-provider caching: when an Anthropic Messages request lands on AWS Bedrock-Converse, VM-X translates cache_control blocks to Bedrock's cachePoint blocks. The cache hit/write tokens come back on usage.cache_creation_input_tokens / cache_read_input_tokens in both directions.

Extended thinking

Set thinking: { type: 'adaptive' } on Opus 4.6+ / Sonnet 4.6+ — the model decides how much to think. For older Claude versions, use thinking: { type: 'enabled', budget_tokens: N } (where budget_tokens < max_tokens, minimum 1024).

message = client.messages.create(
    model="my-claude-opus-resource",
    max_tokens=4096,
    thinking={"type": "adaptive"},
    messages=[{"role": "user", "content": "Prove there are infinitely many primes."}],
)

# Inspect the thinking block in content[]
for block in message.content:
    if block.type == "thinking":
        print("Thinking:", block.thinking[:200], "…")
    elif block.type == "text":
        print("Final:", block.text)

thinking blocks include a signature that the model uses to verify continuity across turns. When you echo a prior assistant reply back as a messages[].content array, keep the thinking block (with its signature) intact — Anthropic validates it server-side.
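A sketch of a follow-up turn, continuing the example above and echoing message.content back untouched:

followup = client.messages.create(
    model="my-claude-opus-resource",
    max_tokens=4096,
    thinking={"type": "adaptive"},
    messages=[
        {"role": "user", "content": "Prove there are infinitely many primes."},
        # Echo the full prior reply; thinking block and signature intact.
        {"role": "assistant", "content": message.content},
        {"role": "user", "content": "Now give the one-paragraph version."},
    ],
)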

Server tools (web search, code execution, …)

Anthropic's hosted tools run on Anthropic's side; you don't implement the function. Add the tool definition to tools[] and the model uses it autonomously.

message = client.messages.create(
    model="my-resource",
    max_tokens=2048,
    messages=[{"role": "user", "content": "Latest TypeScript release? Cite sources."}],
    tools=[
        {
            "type": "web_search_20250305",
            "name": "web_search",
            "max_uses": 3,
        }
    ],
)

# The response contains web_search_tool_result blocks + text with citations.

See the web search guide for citation handling and the full provider matrix.

Streaming — typed events with event: lines

Anthropic's stream envelope tags every event with its event: name on its own line, followed by a data: JSON frame. VM-X forwards the exact wire format.

with client.messages.stream(
    model="my-resource",
    max_tokens=512,
    messages=[{"role": "user", "content": "Stream a poem."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
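If you need the typed events themselves rather than just text deltas, iterate the stream directly. A sketch using the Python SDK's event iteration; event.type mirrors the wire's event: names, and the SDK also yields synthetic convenience events:

with client.messages.stream(
    model="my-resource",
    max_tokens=512,
    messages=[{"role": "user", "content": "Stream a poem."}],
) as stream:
    for event in stream:
        # Raw wire events: message_start, content_block_start,
        # content_block_delta, content_block_stop, message_delta, message_stop.
        if event.type == "content_block_delta" and event.delta.type == "text_delta":
            print(event.delta.text, end="", flush=True)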

Mid-stream errors are emitted as a typed event: error frame followed by stream termination — clients consuming with the Anthropic SDK pick this up automatically.

betas array (beta-feature opt-in)

Anthropic's beta-features header (anthropic-beta) takes a comma-separated list of feature flags. The Anthropic SDK exposes this as a betas: string[] field on the request; VM-X lifts it off the body and emits it as the anthropic-beta HTTP header before dispatching to Anthropic's API. (Anthropic's native API rejects betas as a body field, but Bedrock-Invoke accepts it on the body, so VM-X preserves the body shape and adapts at the wire layer.)

message = client.messages.create(
    model="my-resource",
    max_tokens=4096,
    thinking={"type": "enabled", "budget_tokens": 2000},
    messages=[{"role": "user", "content": "Reason about this..."}],
    extra_body={"betas": ["interleaved-thinking-2025-05-14"]},
)

Attaching vmx metadata

message = client.messages.create(
    model="my-resource",
    max_tokens=512,
    messages=[{"role": "user", "content": "Summarise: ..."}],
    extra_body={
        "vmx": {
            "correlationId": "summarizer-2026-05-10",
            "metadata": {"team": "growth", "user_id": "u_42"},
            "timeoutMs": 25_000,
        }
    },
)

Provider compatibility

Provider · Native passthrough? · Notes

  • Anthropic · ✅ yes (native) · True end-to-end passthrough — cache_control, thinking, server tools, service_tier, refusal stop details all round-trip.
  • AWS Bedrock-Invoke · ✅ yes (native) · Claude on AWS — same wire shape, plus the Bedrock anthropic_version discriminator. External image URLs are rejected up-front (aws_bedrock_invoke_image_url_unsupported); use base64 sources.
  • AWS Bedrock-Converse · Convert · Direct Anthropic↔Converse adapter — cache_control → cachePoint, server tools mapped to Converse equivalents where supported.
  • OpenAI · Convert (D5) · Direct Anthropic↔Responses adapter (no internal pivot through Chat Completions). thinking → reasoning.effort, tool_use → function_call.
  • Gemini · Convert · Via Chat Completions on Google's OpenAI-compat endpoint.
  • Groq · Convert · Via Chat Completions.
  • Perplexity · Convert · Via Chat Completions.

For the per-pair conversion details (which Anthropic fields survive each conversion path), see the conversion matrix.

Errors

See the endpoint overview for the full error catalog. On streaming, the gateway emits a typed event: error frame (event: error\ndata: { "error": {...} }\n\n) and terminates the stream — there is no trailing [DONE] sentinel; Anthropic's SDK MessageStream picks the error event up by name. Long-running streams also receive periodic event: ping heartbeats every ~10s (T3) so idle proxies don't close the connection during slow tool use.

The gateway maps Anthropic's anthropic-ratelimit-* response headers to OpenAI's x-ratelimit-* shape so your rate-limit accounting code doesn't need to know which provider it just talked to.
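One way to read those mapped headers without leaving the SDK is the raw-response wrapper. A sketch; the header name shown assumes OpenAI's usual x-ratelimit-remaining-requests spelling:

raw = client.messages.with_raw_response.create(
    model="my-resource",
    max_tokens=64,
    messages=[{"role": "user", "content": "ping"}],
)
# Mapped by the gateway from anthropic-ratelimit-*.
print(raw.headers.get("x-ratelimit-remaining-requests"))
message = raw.parse()  # the usual Message object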

Next steps