Core Components
VM-X AI is built around several fundamental concepts that work together to provide a complete AI management solution. Understanding these components is essential for effectively using the platform.
Workspaces and Environments
VM-X AI uses a hierarchical structure for organization and isolation:
Workspaces
A Workspace is the top-level isolation layer that groups a set of environments. Workspaces provide:
- Multi-tenancy: Complete isolation between different organizations or teams
- Access Control: Workspace-level permissions and member management
- Resource Organization: Logical grouping of related environments
Environments
An Environment is an isolation layer within a workspace that groups resources. Environments provide:
- Resource Isolation: AI Connections, AI Resources, API Keys, and Usage data are scoped to environments
- Environment-based Routing: Different environments can have different configurations
- Deployment Separation: Separate environments for development, staging, and production
Workspace Members
Each workspace can have two types of members:
- Owner: Can do anything in the workspace, including deleting the workspace
- Member: Can create environments, AI connections, and resources, but cannot delete workspaces
Users and Roles
VM-X AI provides fine-grained access control through Users and Roles. For detailed information, see the Security section:
- Roles - Role management and default roles
- Policy - Comprehensive policy guide
- Users - User management
Users
Users represent individuals who can access VM-X AI. Users can:
- Be assigned to workspaces as members or owners
- Have roles assigned for fine-grained permissions
- Access resources based on their permissions
- Authenticate via local credentials or OIDC federated login (SSO)
OIDC Federated Login
VM-X AI supports OIDC (OpenID Connect) federated login for enterprise single sign-on (SSO). This allows users to authenticate using their organization's identity provider (e.g., Okta, Azure AD, Google Workspace).
Configuration:
Set the following environment variables to enable OIDC:
- OIDC_FEDERATED_ISSUER: The OIDC issuer URL (required)
- OIDC_FEDERATED_CLIENT_ID: The OIDC client ID (required)
- OIDC_FEDERATED_CLIENT_SECRET: The OIDC client secret (optional, depending on provider)
- OIDC_FEDERATED_SCOPE: OIDC scopes (default: openid profile email)
- OIDC_FEDERATED_DEFAULT_ROLE: Default role assigned to federated users (default: power-user)
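For example, a deployment federating against a generic provider might set the following. The issuer URL, client ID, and secret below are placeholders; substitute the values from your identity provider:
OIDC_FEDERATED_ISSUER=https://login.example.com/oauth2/default
OIDC_FEDERATED_CLIENT_ID=vmx-sso-client
OIDC_FEDERATED_CLIENT_SECRET=replace-with-client-secret
OIDC_FEDERATED_SCOPE="openid profile email"
OIDC_FEDERATED_DEFAULT_ROLE=power-user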
When OIDC is configured, the login page displays an "SSO Login" button that redirects users to the identity provider for authentication. After successful authentication, users are automatically created in VM-X AI (if they don't exist) and assigned the default role.
Roles
Roles manage permissions using granular policies. Each role defines:
- Actions: What operations can be performed (e.g., ai-connection:create, workspace:delete)
- Resources: What resources can be accessed (e.g., workspace:*, environment:production)
- Effect: Whether to allow or deny the action (ALLOW or DENY)
Roles support wildcards for flexible permission management:
- * matches any value
- ? matches a single character
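As an illustration, a role that can fully manage AI connections but is explicitly denied workspace deletion might combine actions, resources, and effects roughly as follows. The policy field names here are illustrative and mirror the concepts above rather than an exact schema:
{
  "name": "connection-manager",
  "policies": [
    {
      "effect": "ALLOW",
      "actions": ["ai-connection:*"],
      "resources": ["*"]
    },
    {
      "effect": "DENY",
      "actions": ["workspace:delete"],
      "resources": ["*"]
    }
  ]
}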
Default Roles
VM-X AI includes three default roles:
- admin: Full access to everything (*:* on *)
- power-user: Can create workspaces, environments, connections, and resources, but cannot manage roles or users
- read-only: Can only read/list resources (*:get, *:list on *)
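Expressed in the same illustrative policy shape used above, the built-in read-only role corresponds roughly to:
{
  "name": "read-only",
  "policies": [
    {
      "effect": "ALLOW",
      "actions": ["*:get", "*:list"],
      "resources": ["*"]
    }
  ]
}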
AI Connections and AI Resources
VM-X AI is built around two fundamental concepts: AI Connections and AI Resources.
AI Connections
An AI Connection represents a connection to a specific AI provider with its credentials and capacity configuration.
What is an AI Connection?
An AI Connection encapsulates:
- Provider: The AI provider (OpenAI, Anthropic, Google Gemini, Groq, Perplexity, AWS Bedrock Converse, or AWS Bedrock Invoke)
- Credentials: Encrypted API keys or authentication tokens
- Capacity: Custom capacity limits (e.g., 100 RPM, 100,000 TPM)
- Discovered Capacity: Automatically discovered rate limits from the provider
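Put together, a connection is conceptually a record like the sketch below. The name and credential field names are placeholders for illustration and vary by provider; this shows the shape of a connection, not the exact schema:
{
  "name": "openai-prod",
  "provider": "openai",
  "credentials": {
    "apiKey": "sk-your-key-here"
  },
  "capacity": [
    {
      "period": "minute",
      "requests": 100,
      "tokens": 100000
    }
  ]
}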
Key Features
🔐 Secure Credential Storage
Credentials are encrypted at rest using either:
- AWS KMS: For production environments (recommended)
- Libsodium: For local development and small deployments
Credentials are never exposed in API responses or logs.
📊 Capacity Management
Define custom capacity limits per connection:
{
"capacity": [
{
"period": "minute",
"requests": 100,
"tokens": 100000
},
{
"period": "hour",
"requests": 5000,
"tokens": 5000000
},
{
"period": "day",
"requests": 100000,
"tokens": 100000000
}
]
}
🔍 Discovered Capacity
VM-X AI automatically discovers rate limits from provider responses and stores them as "discovered capacity". This helps you understand actual provider limits and optimize your usage.
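For example, OpenAI returns rate-limit headers such as x-ratelimit-limit-requests and x-ratelimit-limit-tokens on its responses. Conceptually, VM-X AI records what it observes in a structure along these lines (illustrative only; the actual field names may differ):
{
  "discoveredCapacity": [
    {
      "period": "minute",
      "requests": 10000,
      "tokens": 2000000
    }
  ]
}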
AI Resources
An AI Resource represents a logical endpoint that your applications use to make AI requests. It defines which provider/model to use, routing rules, fallback behavior, and capacity allocation.
What is an AI Resource?
An AI Resource is the abstraction your applications interact with. It includes:
- Primary Model: The default provider/model to use
- Routing Rules: Conditions for dynamically selecting different models
- Fallback Models: Alternative models to use if the primary fails
- Capacity: Resource-level capacity limits
- API Key Assignment: Which API keys can access this resource
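Putting these together, a resource definition looks roughly like the sketch below. The nested field names are taken from the feature examples that follow; the name field and the model wrapper around the primary model are assumptions for illustration, not a complete schema:
{
  "name": "chat-default",
  "model": {
    "provider": "openai",
    "connectionId": "openai-connection-id",
    "model": "gpt-4o"
  },
  "routing": {
    "enabled": false,
    "conditions": []
  },
  "useFallback": true,
  "fallbackModels": [
    {
      "provider": "groq",
      "connectionId": "groq-connection-id",
      "model": "llama-3.1-70b-versatile"
    }
  ],
  "capacity": [
    {
      "period": "minute",
      "requests": 50,
      "tokens": 50000
    }
  ],
  "enforceCapacity": true,
  "assignApiKeys": ["api-key-id-1"]
}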
Key Features
🎯 Dynamic Routing
Route requests to different models based on conditions. VM-X AI provides a comprehensive set of routing conditions:
Available Routing Expressions:
- tokens.input - Number of input tokens in the request
- request.allMessagesContent.length - Total character length of all messages
- request.lastMessage.content - Content of the last user message
- request.allMessagesContent - Combined content of all messages
- request.toolsCount - Number of tools in the request
- errorRate(5) - Error rate in the last 5 minutes
- errorRate(10) - Error rate in the last 10 minutes
Available Comparators:
- LESS_THAN - Field is less than value
- GREATER_THAN - Field is greater than value
- CONTAINS - Field contains value (for strings)
- PATTERN - Field matches regex pattern (for strings)
connectionId or connectionName: both work
The examples below reference connections by connectionId (the connection's UUID), but every model config slot, including the primary model, each routing then block, fallbackModels[], and secondaryModels[], also accepts a connectionName field. This lets you reference a connection by its human-readable name (e.g. "openai-prod") instead of its UUID, which is convenient for configs checked into source control or shared across environments. The gateway resolves the name to a UUID within the request's environment before dispatch, so name-only configs work end-to-end:
{
  "provider": "groq",
  "connectionName": "groq-primary",
  "model": "llama-3.1-70b-versatile"
}
If both fields are set, connectionId wins and no name lookup happens. A connectionName that doesn't match any connection in the workspace/environment returns a 400 before dispatch.
Example: Route based on input token count
{
"routing": {
"enabled": true,
"conditions": [
{
"description": "Use Groq for small requests",
"expression": "tokens.input",
"comparator": "LESS_THAN",
"value": {
"type": "NUMBER",
"value": 100
},
"then": {
"provider": "groq",
"connectionId": "groq-connection-id",
"model": "llama-3.1-70b-versatile"
}
}
]
}
}
Example: Route based on error rate
{
"routing": {
"enabled": true,
"conditions": [
{
"description": "Switch to Anthropic if error rate is high",
"expression": "errorRate(10)",
"comparator": "GREATER_THAN",
"value": {
"type": "NUMBER",
"value": 10
},
"then": {
"provider": "anthropic",
"connectionId": "anthropic-connection-id",
"model": "claude-3-5-sonnet-20241022"
}
}
]
}
}
Example: Route based on tools usage
{
"routing": {
"enabled": true,
"conditions": [
{
"description": "Use GPT-4 for requests with tools",
"expression": "request.toolsCount",
"comparator": "GREATER_THAN",
"value": {
"type": "NUMBER",
"value": 0,
"readOnly": true
},
"then": {
"provider": "openai",
"connectionId": "openai-connection-id",
"model": "gpt-4o"
}
}
]
}
}
Example: Route based on message content
{
"routing": {
"enabled": true,
"conditions": [
{
"description": "Route urgent requests to GPT-4",
"expression": "request.lastMessage.content",
"comparator": "CONTAINS",
"value": {
"type": "STRING",
"value": "urgent"
},
"then": {
"provider": "openai",
"connectionId": "openai-connection-id",
"model": "gpt-4o"
}
}
]
}
}
Example: Route based on character length
{
"routing": {
"enabled": true,
"conditions": [
{
"description": "Use Groq for short prompts",
"expression": "request.allMessagesContent.length",
"comparator": "LESS_THAN",
"value": {
"type": "NUMBER",
"value": 500
},
"then": {
"provider": "groq",
"connectionId": "groq-connection-id",
"model": "llama-3.1-70b-versatile"
}
}
]
}
}
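The PATTERN comparator follows the same shape, matching a string field against a regular expression. The condition below is a sketch modeled on the examples above; the regex and target model are illustrative:
Example: Route based on a regex pattern
{
  "routing": {
    "enabled": true,
    "conditions": [
      {
        "description": "Route order-lookup requests to GPT-4",
        "expression": "request.lastMessage.content",
        "comparator": "PATTERN",
        "value": {
          "type": "STRING",
          "value": "ORD-[0-9]{6}"
        },
        "then": {
          "provider": "openai",
          "connectionId": "openai-connection-id",
          "model": "gpt-4o"
        }
      }
    ]
  }
}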
🔄 Automatic Fallback
Configure fallback models that are automatically used if the primary model fails:
{
"useFallback": true,
"fallbackModels": [
{
"provider": "bedrock",
"connectionId": "bedrock-connection-id",
"model": "anthropic.claude-3-5-sonnet-20241022-v2:0"
},
{
"provider": "openai",
"connectionId": "openai-connection-id",
"model": "gpt-4o-mini"
}
]
}
Fallback happens automatically on:
- Provider errors (5xx status codes)
- Rate limit errors (429)
- Timeout errors
- Network failures
📊 Resource-Level Capacity
Define capacity limits specific to a resource:
{
"capacity": [
{
"period": "minute",
"requests": 50,
"tokens": 50000
}
],
"enforceCapacity": true
}
This allows you to:
- Limit usage per resource independently
- Control costs by resource
- Implement tiered access levels
🔑 API Key Assignment
Assign API keys to resources to control access:
{
"assignApiKeys": ["api-key-id-1", "api-key-id-2"]
}
Only requests with assigned API keys can access the resource.
Relationship Between Components
Roles are global and not scoped to workspaces or environments. A role's permissions apply across all workspaces and environments in the system. Users are assigned roles globally, and those roles determine what actions they can perform throughout the entire VM-X AI instance.
How They Work Together
- Application makes a request to an AI Resource using an API key
- AI Resource evaluates routing conditions to select a model
- AI Resource uses the appropriate AI Connection to make the request
- If the primary model fails, AI Resource automatically tries fallback models
- Capacity is checked at both the connection and resource levels
- All requests are logged for audit and metrics
Best Practices
AI Connections
- One connection per provider account: Create separate connections for different provider accounts or regions
- Set realistic capacity: Base capacity limits on provider quotas and your usage patterns
- Use discovered capacity: Monitor discovered capacity to understand actual provider limits
AI Resources
- Start simple: Begin with a single primary model, add routing and fallback as needed
- Test routing conditions: Verify routing logic works as expected before deploying
- Configure fallback chains: Always have at least one fallback model for critical resources
- Set resource capacity: Use resource-level capacity to control costs and ensure fair usage
- Use API keys for access control: Assign API keys to resources to implement access control