Core Components
VM-X AI is built around several fundamental concepts that work together to provide a complete AI management solution. Understanding these components is essential for effectively using the platform.
Workspaces and Environments
VM-X AI uses a hierarchical structure for organization and isolation:
Workspaces
A Workspace is the top-level isolation layer that groups a set of environments. Workspaces provide:
- Multi-tenancy: Complete isolation between different organizations or teams
- Access Control: Workspace-level permissions and member management
- Resource Organization: Logical grouping of related environments
Environments
An Environment is an isolation layer within a workspace that groups resources. Environments provide:
- Resource Isolation: AI Connections, AI Resources, API Keys, and Usage data are scoped to environments
- Environment-based Routing: Different environments can have different configurations
- Deployment Separation: Separate environments for development, staging, and production
Workspace Members
Each workspace can have two types of members:
- Owner: Can do anything in the workspace, including deleting the workspace
- Member: Can create environments, AI connections, and resources, but cannot delete workspaces
Users and Roles
VM-X AI provides fine-grained access control through Users and Roles. For detailed information, see the Security section:
- Roles - Role management and default roles
- Policy - Comprehensive policy guide
- Users - User management
Users
Users represent individuals who can access VM-X AI. Users can:
- Be assigned to workspaces as members or owners
- Have roles assigned for fine-grained permissions
- Access resources based on their permissions
- Authenticate via local credentials or OIDC federated login (SSO)
OIDC Federated Login
VM-X AI supports OIDC (OpenID Connect) federated login for enterprise single sign-on (SSO). This allows users to authenticate using their organization's identity provider (e.g., Okta, Azure AD, Google Workspace).
Configuration:
Set the following environment variables to enable OIDC:
- OIDC_FEDERATED_ISSUER: The OIDC issuer URL (required)
- OIDC_FEDERATED_CLIENT_ID: The OIDC client ID (required)
- OIDC_FEDERATED_CLIENT_SECRET: The OIDC client secret (optional, depending on provider)
- OIDC_FEDERATED_SCOPE: OIDC scopes (default: openid profile email)
- OIDC_FEDERATED_DEFAULT_ROLE: Default role assigned to federated users (default: power-user)
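For example, a deployment federating against a generic provider might set the following. The issuer URL, client ID, and secret below are placeholders; substitute the values from your identity provider:
OIDC_FEDERATED_ISSUER=https://login.example.com/oauth2/default
OIDC_FEDERATED_CLIENT_ID=vmx-sso-client
OIDC_FEDERATED_CLIENT_SECRET=replace-with-client-secret
OIDC_FEDERATED_SCOPE="openid profile email"
OIDC_FEDERATED_DEFAULT_ROLE=power-user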
When OIDC is configured, the login page displays an "SSO Login" button that redirects users to the identity provider for authentication. After successful authentication, users are automatically created in VM-X AI (if they don't exist) and assigned the default role.
Roles
Roles manage permissions using granular policies. Each role defines:
- Actions: What operations can be performed (e.g., ai-connection:create, workspace:delete)
- Resources: What resources can be accessed (e.g., workspace:*, environment:production)
- Effect: Whether to allow or deny the action (ALLOW or DENY)
Roles support wildcards for flexible permission management:
- * matches any value
- ? matches a single character
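As an illustration, a role that can fully manage AI connections but is explicitly denied workspace deletion might combine actions, resources, and effects roughly as follows. The policy field names here are illustrative and mirror the concepts above rather than an exact schema:
{
  "name": "connection-manager",
  "policies": [
    {
      "effect": "ALLOW",
      "actions": ["ai-connection:*"],
      "resources": ["*"]
    },
    {
      "effect": "DENY",
      "actions": ["workspace:delete"],
      "resources": ["*"]
    }
  ]
}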
Default Roles
VM-X AI includes three default roles:
- admin: Full access to everything (*:* on *)
- power-user: Can create workspaces, environments, connections, and resources, but cannot manage roles or users
- read-only: Can only read/list resources (*:get, *:list on *)
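Expressed in the same illustrative policy shape used above, the built-in read-only role corresponds roughly to:
{
  "name": "read-only",
  "policies": [
    {
      "effect": "ALLOW",
      "actions": ["*:get", "*:list"],
      "resources": ["*"]
    }
  ]
}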
AI Connections and AI Resources
VM-X AI is built around two fundamental concepts: AI Connections and AI Resources.
AI Connections
An AI Connection represents a connection to a specific AI provider with its credentials and capacity configuration.
What is an AI Connection?
An AI Connection encapsulates:
- Provider: The AI provider (OpenAI, Anthropic, Google Gemini, Groq, Perplexity, AWS Bedrock Converse, or AWS Bedrock Invoke)
- Credentials: Encrypted API keys or authentication tokens
- Capacity: Custom capacity limits (e.g., 100 RPM, 100,000 TPM)
- Discovered Capacity: Automatically discovered rate limits from the provider
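Put together, a connection is conceptually a record like the sketch below. The name and credential field names are placeholders for illustration and vary by provider; this shows the shape of a connection, not the exact schema:
{
  "name": "openai-prod",
  "provider": "openai",
  "credentials": {
    "apiKey": "sk-your-key-here"
  },
  "capacity": [
    {
      "period": "minute",
      "requests": 100,
      "tokens": 100000
    }
  ]
}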
Key Features
🔐 Secure Credential Storage
Credentials are encrypted at rest using either:
- AWS KMS: For production environments (recommended)
- Libsodium: For local development and small deployments
Credentials are never exposed in API responses or logs.
📊 Capacity Management
Define custom capacity limits per connection:
{
"capacity": [
{
"period": "minute",
"requests": 100,
"tokens": 100000
},
{
"period": "hour",
"requests": 5000,
"tokens": 5000000
},
{
"period": "day",
"requests": 100000,
"tokens": 100000000
}
]
}
🔍 Discovered Capacity
VM-X AI automatically discovers rate limits from provider responses and stores them as "discovered capacity". This helps you understand actual provider limits and optimize your usage.
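For example, OpenAI returns rate-limit headers such as x-ratelimit-limit-requests and x-ratelimit-limit-tokens on its responses. Conceptually, VM-X AI records what it observes in a structure along these lines (illustrative only; the actual field names may differ):
{
  "discoveredCapacity": [
    {
      "period": "minute",
      "requests": 10000,
      "tokens": 2000000
    }
  ]
}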
AI Resources
An AI Resource represents a logical endpoint that your applications use to make AI requests. It defines which provider/model to use, routing rules, fallback behavior, and capacity allocation.
What is an AI Resource?
An AI Resource is the abstraction your applications interact with. It includes:
- Primary Model: The default provider/model to use
- Routing Rules: Conditions for dynamically selecting different models
- Fallback Models: Alternative models to use if the primary fails
- Capacity: Resource-level capacity limits
- API Key Assignment: Which API keys can access this resource
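Putting these together, a resource definition looks roughly like the sketch below. The nested field names are taken from the feature examples that follow; the name field and the model wrapper around the primary model are assumptions for illustration, not a complete schema:
{
  "name": "chat-default",
  "model": {
    "provider": "openai",
    "connectionId": "openai-connection-id",
    "model": "gpt-4o"
  },
  "routing": {
    "enabled": false,
    "conditions": []
  },
  "useFallback": true,
  "fallbackModels": [
    {
      "provider": "groq",
      "connectionId": "groq-connection-id",
      "model": "llama-3.1-70b-versatile"
    }
  ],
  "capacity": [
    {
      "period": "minute",
      "requests": 50,
      "tokens": 50000
    }
  ],
  "enforceCapacity": true,
  "assignApiKeys": ["api-key-id-1"]
}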
Key Features
🎯 Dynamic Routing
Route requests to different models based on conditions. VM-X AI provides a comprehensive set of routing conditions:
Available Routing Expressions:
- tokens.input - Number of input tokens in the request
- request.allMessagesContent.length - Total character length of all messages
- request.lastMessage.content - Content of the last user message
- request.allMessagesContent - Combined content of all messages
- request.toolsCount - Number of tools in the request
- errorRate(5) - Error rate in the last 5 minutes
- errorRate(10) - Error rate in the last 10 minutes
Available Comparators:
- LESS_THAN - Field is less than value
- GREATER_THAN - Field is greater than value
- CONTAINS - Field contains value (for strings)
- PATTERN - Field matches regex pattern (for strings)
connectionId or connectionName: both work
The examples below reference connections by connectionId (the connection's UUID), but every model config slot, including the primary model, each routing then block, fallbackModels[], and secondaryModels[], also accepts a connectionName field. This lets you reference a connection by its human-readable name (e.g. "openai-prod") instead of its UUID, which is convenient for configs checked into source control or shared across environments. The gateway resolves the name to a UUID within the request's environment before dispatch, so name-only configs work end-to-end:
{
  "provider": "groq",
  "connectionName": "groq-primary",
  "model": "llama-3.1-70b-versatile"
}
If both fields are set, connectionId wins and no name lookup happens. A connectionName that doesn't match any connection in the workspace/environment returns a 400 before dispatch.
Example: Route based on input token count
{
"routing": {
"enabled": true,
"conditions": [
{
"description": "Use Groq for small requests",
"expression": "tokens.input",
"comparator": "LESS_THAN",
"value": {
"type": "NUMBER",
"value": 100
},
"then": {
"provider": "groq",
"connectionId": "groq-connection-id",
"model": "llama-3.1-70b-versatile"
}
}
]
}
}
Example: Route based on error rate
{
"routing": {
"enabled": true,
"conditions": [
{
"description": "Switch to Anthropic if error rate is high",
"expression": "errorRate(10)",
"comparator": "GREATER_THAN",
"value": {
"type": "NUMBER",
"value": 10
},
"then": {
"provider": "anthropic",
"connectionId": "anthropic-connection-id",
"model": "claude-3-5-sonnet-20241022"
}
}
]
}
}
Example: Route based on tools usage
{
"routing": {
"enabled": true,
"conditions": [
{
"description": "Use GPT-4 for requests with tools",
"expression": "request.toolsCount",
"comparator": "GREATER_THAN",
"value": {
"type": "NUMBER",
"value": 0,
"readOnly": true
},
"then": {
"provider": "openai",
"connectionId": "openai-connection-id",
"model": "gpt-4o"
}
}
]
}
}
Example: Route based on message content
{
"routing": {
"enabled": true,
"conditions": [
{
"description": "Route urgent requests to GPT-4",
"expression": "request.lastMessage.content",
"comparator": "CONTAINS",
"value": {
"type": "STRING",
"value": "urgent"
},
"then": {
"provider": "openai",
"connectionId": "openai-connection-id",
"model": "gpt-4o"
}
}
]
}
}
Example: Route based on character length
{
"routing": {
"enabled": true,
"conditions": [
{
"description": "Use Groq for short prompts",
"expression": "request.allMessagesContent.length",
"comparator": "LESS_THAN",
"value": {
"type": "NUMBER",
"value": 500
},
"then": {
"provider": "groq",
"connectionId": "groq-connection-id",
"model": "llama-3.1-70b-versatile"
}
}
]
}
}
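The PATTERN comparator follows the same shape, matching a string field against a regular expression. The condition below is a sketch modeled on the examples above; the regex and target model are illustrative:
Example: Route based on a regex pattern
{
  "routing": {
    "enabled": true,
    "conditions": [
      {
        "description": "Route order-lookup requests to GPT-4",
        "expression": "request.lastMessage.content",
        "comparator": "PATTERN",
        "value": {
          "type": "STRING",
          "value": "ORD-[0-9]{6}"
        },
        "then": {
          "provider": "openai",
          "connectionId": "openai-connection-id",
          "model": "gpt-4o"
        }
      }
    ]
  }
}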
🔄 Automatic Fallback
Configure fallback models that are automatically used if the primary model fails:
{
"useFallback": true,
"fallbackModels": [
{
"provider": "bedrock",
"connectionId": "bedrock-connection-id",
"model": "anthropic.claude-3-5-sonnet-20241022-v2:0"
},
{
"provider": "openai",
"connectionId": "openai-connection-id",
"model": "gpt-4o-mini"
}
]
}
Fallback happens automatically on:
- Provider errors (5xx status codes)
- Rate limit errors (429)
- Timeout errors
- Network failures
📊 Resource-Level Capacity
Define capacity limits specific to a resource:
{
"capacity": [
{
"period": "minute",
"requests": 50,
"tokens": 50000
}
],
"enforceCapacity": true
}
This allows you to:
- Limit usage per resource independently
- Control costs by resource
- Implement tiered access levels
🔑 API Key Assignment
Assign API keys to resources to control access:
{
"assignApiKeys": ["api-key-id-1", "api-key-id-2"]
}
Only requests with assigned API keys can access the resource.
Relationship Between Components
Roles are global and not scoped to workspaces or environments. A role's permissions apply across all workspaces and environments in the system. Users are assigned roles globally, and those roles determine what actions they can perform throughout the entire VM-X AI instance.
How They Work Together
- Application makes a request to an AI Resource using an API key
- AI Resource evaluates routing conditions to select a model
- AI Resource uses the appropriate AI Connection to make the request
- If the primary model fails, AI Resource automatically tries fallback models
- Capacity is checked at both the connection and resource levels
- All requests are logged for audit and metrics
Best Practices
AI Connections
- One connection per provider account: Create separate connections for different provider accounts or regions
- Set realistic capacity: Base capacity limits on provider quotas and your usage patterns
- Use discovered capacity: Monitor discovered capacity to understand actual provider limits
AI Resources
- Start simple: Begin with a single primary model, add routing and fallback as needed
- Test routing conditions: Verify routing logic works as expected before deploying
- Configure fallback chains: Always have at least one fallback model for critical resources
- Set resource capacity: Use resource-level capacity to control costs and ensure fair usage
- Use API keys for access control: Assign API keys to resources to implement access control