Dynamic Routing

Routing allows you to dynamically select different models based on request characteristics. This enables intelligent request distribution, cost optimization, and performance tuning.

Dynamic Routing Configuration

Overview

Dynamic routing evaluates conditions for each request and selects the most appropriate model based on:

  • Token count: Route small requests to faster/cheaper models
  • Error rates: Automatically switch providers when error rates are high
  • Tool usage: Route tool-enabled requests to models that support tools
  • Content analysis: Route based on message content or patterns
  • Traffic splitting: A/B test models or gradually roll out new models

Basic Routing

Route based on simple conditions. This example demonstrates token-based routing, where requests with fewer than 100 input tokens are routed to a faster, cost-effective model (Groq with openai/gpt-oss-20b), while larger requests use the default primary model.

Use Case: Optimize costs and latency by routing small, simple queries to faster models while reserving more powerful models for complex requests.

How it works: The routing condition evaluates tokens.input using the LESS_THAN comparator with a value of 100. When a request has fewer than 100 input tokens, it automatically routes to the specified Groq connection and model instead of the primary model.
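As a sketch, a token-based route could be expressed like this. The condition field (tokens.input), comparator (LESS_THAN), and action (CALL_MODEL) are taken from this page; the surrounding JSON shape and key names are illustrative assumptions and may not match your gateway's exact configuration schema:

```json
{
  "routing": {
    "enabled": true,
    "routes": [
      {
        "conditions": [
          { "field": "tokens.input", "comparator": "LESS_THAN", "value": 100 }
        ],
        "action": "CALL_MODEL",
        "then": {
          "provider": "groq",
          "model": "openai/gpt-oss-20b"
        }
      }
    ]
  }
}
```

Requests with 100 or more input tokens fall through this route and are served by the resource's primary model.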

Basic Routing Configuration

Routing Based on Error Rate

Automatically switch to a different provider when error rates exceed a threshold. This example monitors the error rate over the last 5 minutes and routes to Groq with openai/gpt-oss-20b if the error rate exceeds 10%.

Use Case: Maintain high availability by automatically failing over to a backup provider when the primary provider experiences issues. This is especially useful for production workloads where uptime is critical.

How it works: The routing condition uses errorRate(5) to calculate the error percentage over the last 5 minutes. When this percentage exceeds 10% (using the GREATER_THAN comparator), all subsequent requests are routed to the specified Groq connection and model until the error rate drops below the threshold.

Benefits:

  • Automatic failover without manual intervention
  • Reduces downtime during provider outages
  • Helps maintain service reliability
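A failover route of this kind might be sketched as follows. The errorRate(5) expression, GREATER_THAN comparator, and 10% threshold are from this page; the JSON shape itself is an illustrative assumption:

```json
{
  "conditions": [
    { "field": "errorRate(5)", "comparator": "GREATER_THAN", "value": 10 }
  ],
  "action": "CALL_MODEL",
  "then": {
    "provider": "groq",
    "model": "openai/gpt-oss-20b"
  }
}
```

Because the error rate is recomputed per request, traffic returns to the primary model automatically once the 5-minute window recovers.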

Error Rate Routing Configuration

Routing Based on Tools Usage

Route requests that include function calling or tool usage to models that support these features. This example routes any request with tools to Groq with openai/gpt-oss-20b, which supports tool/function calling.

Use Case: Ensure requests requiring function calling or tool usage are handled by models with robust tool support, while simpler requests can use more cost-effective models.

How it works: The routing condition checks request.toolsCount using the GREATER_THAN comparator with a value of 0. When a request includes one or more tools (toolsCount > 0), it routes to the specified Groq connection and model.

Benefits:

  • Guarantees tool-enabled requests use compatible models
  • Prevents tool-related errors caused by sending tool calls to incompatible models
  • Optimizes costs by only using premium models when needed
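A tools-based route could look like the sketch below. The request.toolsCount field and GREATER_THAN comparator are documented above; the JSON structure is an illustrative assumption:

```json
{
  "conditions": [
    { "field": "request.toolsCount", "comparator": "GREATER_THAN", "value": 0 }
  ],
  "action": "CALL_MODEL",
  "then": {
    "provider": "groq",
    "model": "openai/gpt-oss-20b"
  }
}
```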

Tools-Based Routing Configuration

Available Routing Fields and Expressions

Routing conditions evaluate against a small set of request-shaped variables. The OpenAI-shape body is always available under request.* regardless of which endpoint the caller used (Chat Completions, Responses, or Anthropic Messages) — Responses and Anthropic requests are converted to OpenAI shape before routing runs.

Token-Based Conditions

  • tokens.input: Number of input tokens in the request
    • Example: Route to Groq if input tokens < 100

Request Conditions

  • request.model: The resource name the caller targeted (the request's model field).
  • request.messagesCount: Number of messages in the request.
  • request.toolsCount: Number of tools declared on the request. Use GREATER_THAN 0 to check whether the request uses tools at all.
  • request.firstMessage / request.lastMessage: First and last message objects. Read .content for the text body. Both are undefined when the request has no messages.
  • request.allMessagesContent: All message contents joined into a single string. Supports CONTAINS and PATTERN. Length-based routing uses request.allMessagesContent.length with a numeric comparator.
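For instance, a pair of content-based conditions might be sketched like this. The fields (request.allMessagesContent, its .length) and comparators come from the list above; the JSON shape and the example values ("refund", 4000) are hypothetical:

```json
[
  { "field": "request.allMessagesContent", "comparator": "CONTAINS", "value": "refund" },
  { "field": "request.allMessagesContent.length", "comparator": "GREATER_THAN", "value": 4000 }
]
```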

Format and Native-Body Conditions

Routing rules can also branch on the input format the client used and read fields the OpenAI conversion would otherwise drop:

  • request.format: One of "openai", "responses", "anthropic". Lets a single resource apply different rules depending on which endpoint was hit.
  • request.nativeBody: The original request body, before format conversion. Use this to read provider-native fields that the OpenAI shape doesn't model — for example request.nativeBody.thinking (Anthropic extended thinking), request.nativeBody.cache_control (Anthropic prompt caching), request.nativeBody.instructions or request.nativeBody.reasoning (Responses-only fields).

These fields are evaluated through EJS, so they appear in advanced-mode routing expressions like <%= request.format === 'anthropic' && request.nativeBody.thinking %>.
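An advanced-mode route using such an expression might be sketched as follows. The EJS expression is the one shown above; the JSON wrapper, the "expression" key, and the target provider/model names are hypothetical placeholders, not a confirmed schema:

```json
{
  "expression": "<%= request.format === 'anthropic' && request.nativeBody.thinking %>",
  "action": "CALL_MODEL",
  "then": {
    "provider": "example-provider",
    "model": "example-thinking-capable-model"
  }
}
```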

Error-Rate Function

  • errorRate(windowMinutes) — async function returning the error-rate percentage for the resource's primary connection/model over the last windowMinutes. Defaults to 10 minutes when called with no argument. Supports the numeric comparators (GREATER_THAN, LESS_THAN, …).
    • Example: Switch providers when errorRate(5) GREATER_THAN 10 (more than 10% errors in the last 5 minutes).

Available Comparators

The full set of comparators. Each applies to numeric or string fields as appropriate:

  • Equality: EQUAL, NOT_EQUAL
  • Numeric: GREATER_THAN, GREATER_THAN_OR_EQUAL, LESS_THAN, LESS_THAN_OR_EQUAL
  • String: CONTAINS, NOT_CONTAINS, STARTS_WITH, ENDS_WITH, PATTERN (regex)
  • Membership: IN, NOT_IN (against a comma-delimited list or JSON array value)
  • Existence: EXISTS (truthy check)
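As a sketch, the membership and regex comparators could be combined like this. The fields and comparators are documented above; the JSON shape and example values are illustrative assumptions:

```json
[
  { "field": "request.model", "comparator": "IN", "value": "model-a,model-b" },
  { "field": "request.lastMessage.content", "comparator": "PATTERN", "value": "translate|summarize" }
]
```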

Routing Actions

Each route declares an action. There are two:

  • CALL_MODEL — when the route matches, the request is dispatched to the configured then model (provider, connection, model, and optional per-model maxRetries / timeoutMs). This is the default action used by token-based, error-rate-based, and traffic-splitting routes.
  • BLOCK — when the route matches, the gateway short-circuits and returns 400 Bad Request to the caller without calling any provider. Use this to enforce policy at the routing layer (for example, block requests whose prompt matches a known prompt-injection probe).
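A policy-enforcing BLOCK route might be sketched as below. The BLOCK action, the PATTERN comparator, and request.allMessagesContent are from this page; the JSON shape and the example regex are hypothetical:

```json
{
  "conditions": [
    {
      "field": "request.allMessagesContent",
      "comparator": "PATTERN",
      "value": "ignore (all )?previous instructions"
    }
  ],
  "action": "BLOCK"
}
```

When this route matches, the caller receives 400 Bad Request and no provider is called.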

Traffic Splitting

Use traffic splitting for A/B testing, gradual rollouts, or canary deployments. This example routes 50% of requests matching the condition (input tokens > 0) to Groq with openai/gpt-oss-20b, while the other 50% use the default primary model.

Use Case:

  • A/B Testing: Compare performance, quality, or cost between different models
  • Gradual Rollouts: Safely introduce new models by starting with a small percentage of traffic
  • Canary Deployments: Test new models in production with limited exposure

How it works: The routing condition matches any request, but the traffic field limits this route to 50% of matching requests. The remaining 50% of matching requests continue to use the primary model. This creates a controlled split where you can monitor and compare both models' performance.

Best Practices:

  • Start with a low percentage (10-20%) when testing new models
  • Monitor error rates, latency, and quality metrics for both routes
  • Gradually increase the percentage as confidence grows
  • Use audit logs to track which route each request took

The traffic field is set on the route's then model config and specifies the percentage (0-100) of matching requests that should use this route. Routes are evaluated in declared order; on a match without traffic, the first matching route wins. With traffic, the gateway performs a random draw per request; when the draw does not select the route, evaluation continues to the next route, so a downstream route can still pick up the remaining percentage. If no route matches, the resource's primary model is used.
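A 50/50 split route could be sketched like this. The traffic field on the then model config, the condition (input tokens > 0), and the target model are from this page; the JSON shape is an illustrative assumption:

```json
{
  "conditions": [
    { "field": "tokens.input", "comparator": "GREATER_THAN", "value": 0 }
  ],
  "action": "CALL_MODEL",
  "then": {
    "provider": "groq",
    "model": "openai/gpt-oss-20b",
    "traffic": 50
  }
}
```

Requests not selected by the 50% draw fall through to any later routes, or to the primary model.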

Traffic Splitting Configuration

Best Practices

1. Start with Simple Conditions

Begin with basic routing:

  • Token-based routing (small vs. large requests)
  • Tool-based routing (requests with/without tools)
  • Error rate-based routing (fallback when errors are high)

2. Test Routing Conditions

Before deploying:

  • Test routing conditions with sample requests
  • Verify routing logic works as expected
  • Monitor routing decisions in audit logs

3. Use Traffic Splitting for Rollouts

Gradually roll out new models:

  • Start with low traffic percentage (10-20%)
  • Monitor performance and errors
  • Gradually increase traffic percentage
  • Fully switch when confident

4. Monitor Routing Decisions

Regularly review:

  • Which routes are being used most
  • Routing decision patterns
  • Performance differences between routes
  • Error rates per route

Troubleshooting

Routing Not Working

  1. Check Routing Enabled: Ensure routing is enabled in the resource configuration
  2. Verify Conditions: Check routing conditions are correct and match your use case
  3. Review Logs: Check audit logs for routing decisions to see which conditions are being evaluated
  4. Test Conditions: Test routing conditions with sample requests to verify they work as expected

Wrong Model Selected

  1. Check Condition Order: Routing conditions are evaluated in order; ensure they are ordered correctly
  2. Verify Expressions: Check that expressions match the request characteristics
  3. Review Traffic Splitting: If using traffic splitting, verify the percentage is set correctly
  4. Check Connection Availability: Ensure the selected connection and model are available and configured correctly

Next Steps