Deploying to AWS ECS

This guide shows you how to deploy VM-X AI to Amazon ECS (Elastic Container Service) using AWS CDK with Fargate.

What gets deployed

VM-X AI itself only requires four runtime components: the API, the UI, PostgreSQL, and Redis. Usage analytics and cost tracking are served directly from the request_audit table in Postgres, so there is no separate time-series database.

The ECS example wraps those four with a production-grade AWS footprint:

Required for VM-X AI to run

ECS Fargate Cluster running the vmxai/api and vmxai/ui containers
VPC with multi-AZ networking
Aurora PostgreSQL as the primary database (also stores request_audit)
ElastiCache Serverless (Valkey) as the Redis-compatible cluster
Application Load Balancers in front of the API and UI services
AWS KMS key (used by the API for envelope-encrypting AI provider credentials)

Optional, deployed by default

OpenTelemetry Collector sidecar on each task: provides application observability for the gateway and is not required by VM-X AI's product features
AWS X-Ray for distributed traces and CloudWatch EMF for metrics (exporters wired into the OTEL collector)
CloudWatch Logs for centralized container logs

Prerequisites

Before you begin, ensure you have:

AWS CLI configured with appropriate credentials
AWS CDK CLI installed (npm install -g aws-cdk)
Node.js 18+ and pnpm (or npm/yarn)
AWS Permissions to create:
- ECS clusters and services
- VPCs, subnets, and networking resources
- RDS Aurora clusters
- ElastiCache serverless caches
- KMS keys
- IAM roles and policies
- Security groups
- Application Load Balancers
- CloudWatch Log Groups
- SSM Parameters

Quick Start

1. Get the ECS Example

The ECS example is available in the examples/aws-cdk-ecs directory.

If you have the repository cloned, navigate to:

cd examples/aws-cdk-ecs

Otherwise, download or clone the repository to access the example.

2. Install Dependencies

pnpm install

3. Bootstrap CDK (First Time Only)

If this is your first time using CDK in this AWS account/region:

pnpm cdk bootstrap

4. Deploy the Stack

pnpm cdk deploy

This will:

Create the VPC and networking infrastructure
Provision the ECS Fargate cluster
Create the Aurora PostgreSQL database
Create the ElastiCache serverless cache
Create the KMS encryption key
Deploy the API and UI services (each with an OTEL collector sidecar)
Configure all IAM roles and policies
Set up Application Load Balancers

Deployment typically takes 15-30 minutes.

5. Get Application URLs

After deployment, retrieve the application URLs:

aws cloudformation describe-stacks \
  --stack-name vm-x-ai-ecs-example \
  --query 'Stacks[0].Outputs' \
  --output table

The two outputs you need are listed below. The same DNS names also appear on each Load Balancer in the AWS Console:

ApiUrl: API Load Balancer DNS name
UiUrl: UI Load Balancer DNS name
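
If you script against the deployment, the two URLs can be pulled out of the describe-stacks result programmatically. The following is a minimal sketch that assumes the command is run with --output json instead of --output table and that the output keys are ApiUrl and UiUrl as listed above; the function names are illustrative and not part of the example.

```typescript
// Shape of one entry in the `Stacks[0].Outputs` array returned by the AWS CLI.
interface StackOutput {
  OutputKey: string;
  OutputValue: string;
}

// Index the outputs by key so individual values are easy to look up.
function outputsByKey(outputs: StackOutput[]): { [key: string]: string } {
  const byKey: { [key: string]: string } = {};
  for (const output of outputs) {
    byKey[output.OutputKey] = output.OutputValue;
  }
  return byKey;
}

// Parse the CLI's JSON text and return the two application URLs.
function applicationUrls(jsonText: string): { api: string; ui: string } {
  const byKey = outputsByKey(JSON.parse(jsonText) as StackOutput[]);
  const api = byKey['ApiUrl'];
  const ui = byKey['UiUrl'];
  if (!api || !ui) {
    throw new Error('ApiUrl or UiUrl is missing from the stack outputs');
  }
  return { api, ui };
}
```

Throwing on a missing key makes a half-finished or failed deployment visible immediately instead of passing an undefined URL to later steps.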

Architecture

The stack creates the resources listed under What gets deployed. The sections below describe how each one is defined.

CDK Stack Overview

The ECS stack is defined in examples/aws-cdk-ecs/lib/ecs-stack.ts. Here's a breakdown of the key components:

VPC Configuration

The stack creates a VPC with public subnets:

const vpc = new Vpc(this, 'VPC', {
  vpcName: 'vm-x-ai-example-vpc',
  ipAddresses: IpAddresses.cidr('10.0.0.0/16'),
  maxAzs: 3,
  subnetConfiguration: [
    {
      cidrMask: 24,
      name: 'Public',
      subnetType: SubnetType.PUBLIC,
    },
  ],
});

Key Points:

CIDR: 10.0.0.0/16 provides 65,536 IP addresses
Availability Zones: 3 AZs for high availability
Subnets: Public subnets only (add private subnets with NAT Gateway for production)
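
The address counts above follow from prefix arithmetic. As a rough sketch (the function names are illustrative and not part of the example stack), the following computes the size of a CIDR block, the addresses left in a subnet after the five that AWS reserves in every subnet, and how many subnets of a given size fit in the VPC:

```typescript
// Number of IPv4 addresses in a CIDR block with the given prefix length.
function addressCount(prefixLength: number): number {
  if (Math.floor(prefixLength) !== prefixLength || prefixLength < 0 || prefixLength > 32) {
    throw new RangeError('invalid prefix length: ' + prefixLength);
  }
  return Math.pow(2, 32 - prefixLength);
}

// AWS reserves five addresses in every subnet (the first four and the last one).
const AWS_RESERVED_PER_SUBNET = 5;

// Addresses that tasks, databases, and load balancers can actually use.
function usableSubnetAddresses(prefixLength: number): number {
  return Math.max(addressCount(prefixLength) - AWS_RESERVED_PER_SUBNET, 0);
}

// How many subnets with `subnetPrefix` fit inside a VPC with `vpcPrefix`.
function subnetCapacity(vpcPrefix: number, subnetPrefix: number): number {
  if (subnetPrefix < vpcPrefix) {
    throw new RangeError('a subnet cannot be larger than its VPC');
  }
  return Math.pow(2, subnetPrefix - vpcPrefix);
}
```

For the values in this stack, addressCount(16) is 65,536, a /24 subnet has 251 usable addresses, and the /16 has room for 256 such subnets, so adding private subnets later does not require a larger CIDR.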

Aurora PostgreSQL Database

The stack creates an Aurora PostgreSQL cluster:

const database = new DatabaseCluster(this, 'Database', {
  engine: DatabaseClusterEngine.auroraPostgres({
    version: AuroraPostgresEngineVersion.VER_17_6,
  }),
  vpc,
  clusterIdentifier: 'vm-x-ai-rds-cluster',
  vpcSubnets: {
    subnetType: SubnetType.PUBLIC, // Production: Use PRIVATE_WITH_EGRESS
  },
  writer: ClusterInstance.provisioned('writer', {
    publiclyAccessible: true, // Production: Set to false
    instanceType: InstanceType.of(InstanceClass.BURSTABLE3, InstanceSize.MEDIUM),
  }),
  credentials: Credentials.fromGeneratedSecret('vmxai', {
    secretName: 'vm-x-ai-database-secret',
  }),
  defaultDatabaseName: 'vmxai',
});

Key Points:

Engine: Aurora PostgreSQL 17.6
Instance Type: db.t3.medium (burstable performance)
Credentials: Auto-generated and stored in AWS Secrets Manager
Network: Publicly accessible for development (use private subnets in production)

ElastiCache Serverless (Valkey)

The stack creates a serverless Valkey (Redis-compatible) cache running in cluster mode:

const redisSecurityGroup = new SecurityGroup(this, 'ElastiCacheSecurityGroup', {
  vpc,
  allowAllOutbound: true,
  description: 'ElastiCache Security Group',
});

const redisCluster = new CfnServerlessCache(this, 'ServerlessCache', {
  engine: 'valkey',
  serverlessCacheName: 'vm-x-ai-valkey-serverless-cache',
  securityGroupIds: [redisSecurityGroup.securityGroupId],
  subnetIds: vpc.publicSubnets.map((subnet) => subnet.subnetId),
  majorEngineVersion: '7',
});

Key Points:

Engine: Valkey (Redis-compatible)
Mode: Serverless cluster — the API connects with REDIS_MODE=cluster and REDIS_TLS=true
Network: Public subnets (use private subnets in production)

ECS Fargate Cluster

The stack creates an ECS Fargate cluster:

const cluster = new Cluster(this, 'Cluster', {
  clusterName: 'vm-x-ai-cluster',
  vpc,
});

Key Points:

Launch Type: Fargate (serverless containers)
No EC2 Management: Fargate handles infrastructure

Application Load Balancers

The stack creates separate ALBs for API and UI:

const apiLoadBalancer = new ApplicationLoadBalancer(this, 'API/LoadBalancer', {
  vpc,
  loadBalancerName: 'vm-x-ai-api',
  internetFacing: true,
  vpcSubnets: {
    subnetType: SubnetType.PUBLIC,
  },
  http2Enabled: true,
});

const uiLoadBalancer = new ApplicationLoadBalancer(this, 'UI/LoadBalancer', {
  vpc,
  loadBalancerName: 'vm-x-ai-ui',
  internetFacing: true,
  vpcSubnets: {
    subnetType: SubnetType.PUBLIC,
  },
  http2Enabled: true,
});

Key Points:

Separate ALBs: One for API, one for UI
HTTP/2: Enabled for better performance
Internet-facing: Public access (use internal ALBs in production)

Fargate Task Definitions

The stack creates task definitions for API and UI:

const apiTaskDefinition = new FargateTaskDefinition(this, 'API/TaskDef', {
  memoryLimitMiB: 1024,
  cpu: 512,
  family: 'vm-x-ai-api-task-definition',
});

const uiTaskDefinition = new FargateTaskDefinition(this, 'UI/TaskDef', {
  memoryLimitMiB: 1024,
  cpu: 512,
  family: 'vm-x-ai-ui-task-definition',
});

Key Points:

Memory: 1024 MiB per task
CPU: 512 CPU units (0.5 vCPU)
Containers: Each task includes the application container plus an optional OTEL collector sidecar

Container Configuration

The API container is configured with environment variables and secrets. The variables below cover what the API requires at boot, plus a few optional settings that are marked as such in the comments. The required ones map directly to the schema validated in packages/api/src/config/schema.ts:

apiTaskDefinition.addContainer('API/Container', {
  image: ContainerImage.fromRegistry('vmxai/api:latest'),
  portMappings: [{ containerPort: 3000 }],
  containerName: 'api',
  environment: {
    LOG_LEVEL: 'info',
    NODE_ENV: 'production',
    PORT: '3000',
    BASE_URL: `http://${apiLoadBalancer.loadBalancerDnsName}`,
    // BASE_PATH: '/_api',                  // set if API and UI share a host
    UI_BASE_URL: `http://${uiLoadBalancer.loadBalancerDnsName}`,

    // Database — DATABASE_RO_HOST is required and points at the Aurora reader endpoint.
    // DATABASE_HOST/PORT/DB_NAME/USER/PASSWORD come in via `secrets` below.
    DATABASE_RO_HOST: database.clusterReadEndpoint.hostname,
    DATABASE_SSL: 'true',

    // Redis cluster (ElastiCache Serverless / Valkey)
    REDIS_HOST: redisCluster.attrEndpointAddress,
    REDIS_PORT: redisCluster.attrEndpointPort,
    REDIS_MODE: 'cluster',
    REDIS_TLS: 'true',

    // Encryption — AWS KMS in production, libsodium for local/dev
    ENCRYPTION_PROVIDER: 'aws-kms',
    AWS_KMS_KEY_ID: encryptionKey.keyArn,
    AWS_REGION: this.region,

    // Optional: federated SSO via OIDC
    // OIDC_FEDERATED_ISSUER: 'https://accounts.google.com/.well-known/openid-configuration',
    // OIDC_FEDERATED_CLIENT_ID: '...',
    // OIDC_FEDERATED_CLIENT_SECRET: '...',  // pull from Secrets Manager in production

    // Optional: OpenTelemetry sidecar
    OTEL_ENABLED: 'true',
    OTEL_EXPORTER_OTLP_ENDPOINT: 'http://localhost:4318',
  },
  secrets: {
    DATABASE_HOST: ECSSecret.fromSecretsManager(database.secret!, 'host'),
    DATABASE_PORT: ECSSecret.fromSecretsManager(database.secret!, 'port'),
    DATABASE_DB_NAME: ECSSecret.fromSecretsManager(database.secret!, 'dbname'),
    DATABASE_USER: ECSSecret.fromSecretsManager(database.secret!, 'username'),
    DATABASE_PASSWORD: ECSSecret.fromSecretsManager(database.secret!, 'password'),
  },
});

Key Points:

Image: Uses the published vmxai/api:latest image
Database: write-host/port/user/password/dbname injected from the auto-generated Secrets Manager secret; read host wired explicitly to the Aurora reader endpoint via DATABASE_RO_HOST (the API uses split read/write pools)
Redis: cluster mode against ElastiCache Serverless, TLS on
Encryption: ENCRYPTION_PROVIDER=aws-kms with AWS_KMS_KEY_ID (use libsodium + LIBSODIUM_ENCRYPTION_KEY for non-AWS or local setups)
OpenTelemetry: optional — enable when you want gateway traces/metrics; the sidecar collector exports to AWS X-Ray and CloudWatch EMF
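
A quick pre-flight check can catch a missing variable before the task fails at boot. The sketch below is hypothetical: the list of required names is read off the container definition in this guide and is an assumption, not a copy of the API's actual schema, and missingVars is an illustrative helper name.

```typescript
// Map of environment variable names to values, for example process.env in Node.js.
type EnvMap = { [name: string]: string | undefined };

// Assumption: this list mirrors the container definition shown in this guide.
// The authoritative list lives in the API's own config schema and may differ.
const ALWAYS_REQUIRED: string[] = [
  'BASE_URL',
  'UI_BASE_URL',
  'DATABASE_HOST',
  'DATABASE_PORT',
  'DATABASE_DB_NAME',
  'DATABASE_USER',
  'DATABASE_PASSWORD',
  'DATABASE_RO_HOST',
  'REDIS_HOST',
  'REDIS_PORT',
  'ENCRYPTION_PROVIDER',
];

// Returns the names of required variables that are unset or empty.
function missingVars(env: EnvMap): string[] {
  const required = ALWAYS_REQUIRED.slice();
  const provider = env['ENCRYPTION_PROVIDER'];
  if (provider === 'aws-kms') {
    // The KMS provider needs the key and the region it lives in.
    required.push('AWS_KMS_KEY_ID', 'AWS_REGION');
  } else if (provider === 'libsodium') {
    // The libsodium provider needs its own key material instead.
    required.push('LIBSODIUM_ENCRYPTION_KEY');
  }
  return required.filter((name) => !env[name]);
}
```

In a Node.js script you could call missingVars(process.env) and exit with a non-zero status when the returned list is not empty.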

Fargate Services

The stack creates Fargate services:

const apiService = new FargateService(this, 'API/Service', {
  cluster,
  serviceName: 'vm-x-ai-api',
  enableExecuteCommand: true,
  desiredCount: 1,
  vpcSubnets: {
    subnetType: SubnetType.PUBLIC, // Production: Use PRIVATE_WITH_EGRESS
  },
  taskDefinition: apiTaskDefinition,
  assignPublicIp: true, // Production: Set to false
});

Key Points:

Desired Count: 1 task (can be scaled)
Public IP: Enabled for development (disable in production)
Execute Command: Enabled for debugging

Load Balancer Targets

The stack configures ALB target groups:

apiListener.addTargets('API/Target', {
  targetGroupName: 'vm-x-ai-api-target-group',
  port: 3000,
  targets: [
    apiService.loadBalancerTarget({
      containerName: 'api',
      containerPort: 3000,
    }),
  ],
  healthCheck: {
    path: '/healthcheck',
    interval: cdk.Duration.seconds(30),
    healthyHttpCodes: '200',
  },
});

Key Points:

Health Checks: Configured on /healthcheck endpoint
Port: 3000 for API, 3001 for UI
Protocol: HTTP (add HTTPS in production)

Complete Example

For the complete CDK stack implementation, see the ECS example directory (examples/aws-cdk-ecs).

The example includes:

Complete CDK stack code
All infrastructure components
IAM roles and policies
Task definitions and services
Load balancer configuration

Configuration

Task Resources

Default task configuration:

API Task: 1024 MiB memory, 512 CPU units
UI Task: 1024 MiB memory, 512 CPU units
OTEL Collector (optional sidecar): 512 MiB memory, 256 CPU units

To change these values, edit the task definition in lib/ecs-stack.ts:

const apiTaskDefinition = new FargateTaskDefinition(this, 'API/TaskDef', {
  memoryLimitMiB: 2048, // Increase memory
  cpu: 1024, // Increase CPU
  family: 'vm-x-ai-api-task-definition',
});
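
Fargate only accepts certain CPU and memory pairs at the task level, so a resize has to land on a supported combination. The helper below is a minimal sketch of that check. The names are hypothetical, and the table covers only sizes up to 4 vCPU based on the published Fargate task size limits at the time of writing, so confirm against the current AWS documentation before relying on it.

```typescript
// Inclusive list of memory values from `from` to `to` in steps of `step` MiB.
function memoryRange(from: number, to: number, step: number): number[] {
  const values: number[] = [];
  for (let mib = from; mib <= to; mib += step) {
    values.push(mib);
  }
  return values;
}

// Memory values (MiB) that Fargate accepts for a given task-level CPU setting.
function allowedMemoryMiB(cpuUnits: number): number[] {
  switch (cpuUnits) {
    case 256:
      return [512, 1024, 2048];
    case 512:
      return memoryRange(1024, 4096, 1024);
    case 1024:
      return memoryRange(2048, 8192, 1024);
    case 2048:
      return memoryRange(4096, 16384, 1024);
    case 4096:
      return memoryRange(8192, 30720, 1024);
    default:
      return []; // larger sizes are left out of this sketch
  }
}

// True when the CPU and memory pair is one this sketch knows to be valid.
function isValidFargateSize(cpuUnits: number, memoryMiB: number): boolean {
  return allowedMemoryMiB(cpuUnits).indexOf(memoryMiB) !== -1;
}
```

Both the default size (512 CPU units, 1024 MiB) and the larger one shown above (1024 CPU units, 2048 MiB) pass this check. The task-level values are shared by every container in the task, including the OTEL collector sidecar when it is enabled.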

Service Desired Count

The default is 1 task per service. To change it, set desiredCount:

const apiService = new FargateService(this, 'API/Service', {
  // ...
  desiredCount: 2, // Scale to 2 tasks
});

OpenTelemetry Configuration (optional)

The OpenTelemetry collector sidecar is purely for gateway observability — VM-X AI's product features (usage analytics, cost tracking, audit) work fine with it disabled. If you don't want it, drop the sidecar container from the task definitions and unset the OTEL_* env vars on the API container.

When enabled, the collector configuration is stored in ecs-otel-config.yaml and uploaded to SSM Parameter Store. Customize by editing the file.

The configuration file defines receivers, processors, and exporters for traces and metrics:

extensions:
  health_check:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  awsxray:
    endpoint: 0.0.0.0:2000
    transport: udp
  statsd:
    endpoint: 0.0.0.0:8125
    aggregation_interval: 60s

processors:
  batch/traces:
    timeout: 1s
    send_batch_size: 50
  batch/metrics:
    timeout: 60s

exporters:
  awsxray:
  awsemf:
    namespace: ECS/OTEL/VM-X-AI
    log_group_name: '/aws/ecs/application/metrics'

service:
  pipelines:
    traces:
      receivers: [otlp, awsxray]
      processors: [batch/traces]
      exporters: [awsxray]
    metrics:
      receivers: [otlp, statsd]
      processors: [batch/metrics]
      exporters: [awsemf]

  extensions: [health_check]

Key Configuration Points:

Receivers: OTLP (gRPC and HTTP), AWS X-Ray, and StatsD
Processors: Batch processing for traces and metrics
Exporters: AWS X-Ray for traces, CloudWatch EMF for metrics
Namespace: Metrics exported to ECS/OTEL/VM-X-AI namespace in CloudWatch

CloudWatch Metric Costs

OpenTelemetry can generate a large number of metrics with multiple dimensions (labels/tags). CloudWatch charges $0.30 per metric per month, and each unique combination of metric name and dimension values counts as a separate metric.

High-cardinality metrics (metrics with many unique dimension combinations) can quickly become expensive. For example:

A metric with 3 dimensions, each with 10 possible values = up to 1,000 unique metrics
At $0.30/metric/month, this could cost $300/month for a single metric type
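
The arithmetic behind that example can be sketched as follows. The function names are illustrative, the price is the $0.30 figure quoted above (check current CloudWatch pricing for your region), and the result is a worst case that assumes every combination of dimension values actually occurs.

```typescript
// Price per custom metric per month, in cents, to keep the arithmetic exact.
const PRICE_CENTS_PER_METRIC = 30;

// Worst-case number of billed metrics for one metric name: the product of
// the number of distinct values each dimension can take.
function maxSeriesCount(valuesPerDimension: number[]): number {
  return valuesPerDimension.reduce((total, count) => total * count, 1);
}

// Monthly cost in US dollars for the given number of billed metrics.
function monthlyMetricCostUsd(metricCount: number): number {
  return (metricCount * PRICE_CENTS_PER_METRIC) / 100;
}
```

For three dimensions with ten values each, maxSeriesCount([10, 10, 10]) gives 1,000 and monthlyMetricCostUsd(1000) gives 300. Dropping one dimension cuts the worst case to 100 metrics, or $30 per month.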

Recommendations:

Monitor your CloudWatch metric count regularly
Consider reducing metric dimensions if costs become high
Use metric filtering or aggregation to reduce cardinality
Review and disable unnecessary metrics in your OpenTelemetry configuration
Set up CloudWatch billing alarms to track metric costs

You can check your current metric count:

aws cloudwatch list-metrics --namespace ECS/OTEL/VM-X-AI --query 'length(Metrics)'

Accessing Services

Application

Access the application at the Load Balancer DNS names:

UI: http://<ui-alb-dns-name>
API: http://<api-alb-dns-name>

Default credentials:

Username: admin
Password: admin

CloudWatch Logs

View logs for all services:

# API logs
aws logs tail /aws/ecs/vm-x-ai-api --follow

# UI logs
aws logs tail /aws/ecs/vm-x-ai-ui --follow

# Collector logs
aws logs tail /aws/ecs/vm-x-ai-collector --follow

AWS X-Ray

If the OTEL sidecar is enabled, traces are sent to AWS X-Ray. View them in the AWS X-Ray console or via CLI:

aws xray get-trace-summaries \
  --start-time $(date -u -d '1 hour ago' +%s) \
  --end-time $(date -u +%s)
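
Note that date -d is a GNU extension, so the command above fails with the BSD date that ships on macOS. If you are scripting this in Node.js anyway, the same one-hour window can be computed portably. This is a small sketch and lastHourWindow is an illustrative name:

```typescript
// Start and end of the hour leading up to `now`, as Unix timestamps in seconds.
function lastHourWindow(now: Date): { start: number; end: number } {
  const end = Math.floor(now.getTime() / 1000);
  return { start: end - 3600, end: end };
}
```

Pass the start and end values to --start-time and --end-time, which accept Unix timestamps in seconds.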

Secrets Management

The stack uses AWS Secrets Manager and SSM Parameter Store:

Database Credentials: Stored in Secrets Manager (vm-x-ai-database-secret)
- Automatically generated when Aurora cluster is created
- Contains: host, port, dbname, username, password
- Pulled into the task as DATABASE_HOST / DATABASE_PORT / DATABASE_DB_NAME / DATABASE_USER / DATABASE_PASSWORD
UI Auth Secret: Stored in Secrets Manager (vm-x-ai-ui-auth-secret)
- Auto-generated 32-character secret
OpenTelemetry Config: Stored in SSM Parameter Store (vm-x-ai-otel-config)
- Contains collector configuration from ecs-otel-config.yaml
KMS Key: Referenced by ARN via AWS_KMS_KEY_ID (no Secrets Manager entry needed)

The task execution role needs secretsmanager:GetSecretValue and ssm:GetParameters on these resources. The API task role separately needs kms:Decrypt, kms:Encrypt, and kms:GenerateDataKey on the encryption key. The CDK example wires those policies for you.
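
If you manage IAM outside the CDK example, the two sets of permissions translate into policy documents along these lines. This is a sketch built from the actions named above; the function names are hypothetical, and the ARNs you pass in are your own resources.

```typescript
// Minimal shapes for an IAM policy document.
interface IamStatement {
  Effect: 'Allow';
  Action: string[];
  Resource: string[];
}

interface IamPolicy {
  Version: '2012-10-17';
  Statement: IamStatement[];
}

// Task execution role: read the secrets and the collector config at task start.
function executionRolePolicy(secretArns: string[], parameterArns: string[]): IamPolicy {
  return {
    Version: '2012-10-17',
    Statement: [
      { Effect: 'Allow', Action: ['secretsmanager:GetSecretValue'], Resource: secretArns },
      { Effect: 'Allow', Action: ['ssm:GetParameters'], Resource: parameterArns },
    ],
  };
}

// API task role: envelope encryption of provider credentials with the KMS key.
function apiTaskRolePolicy(keyArn: string): IamPolicy {
  return {
    Version: '2012-10-17',
    Statement: [
      {
        Effect: 'Allow',
        Action: ['kms:Encrypt', 'kms:Decrypt', 'kms:GenerateDataKey'],
        Resource: [keyArn],
      },
    ],
  };
}
```

Keeping the two policies separate reflects how ECS uses the roles: the execution role is used by the ECS agent to start the task, while the task role is what the running API container assumes.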

Monitoring and Observability

The stack includes:

CloudWatch Logs: All container logs (always on)
AWS X-Ray: Distributed tracing for API requests (only when the OTEL sidecar is enabled)
CloudWatch Metrics: Custom metrics via the OpenTelemetry EMF exporter (only when the OTEL sidecar is enabled)
Health Checks: ALB health checks on /healthcheck endpoints (always on)

CloudWatch Metric Costs

OpenTelemetry metrics exported to CloudWatch can generate high costs due to metric cardinality. Each unique combination of metric name and dimension values is billed as a separate metric at $0.30 per metric per month. Monitor your metric count and consider reducing dimensions if costs become high.

Viewing Metrics

Metrics are exported to CloudWatch under namespace ECS/OTEL/VM-X-AI:

aws cloudwatch list-metrics --namespace ECS/OTEL/VM-X-AI

To check the total number of metrics (which affects billing):

aws cloudwatch list-metrics --namespace ECS/OTEL/VM-X-AI --query 'length(Metrics)'

Cost Considerations

Estimated monthly costs for a minimal production setup:

ECS Fargate: ~$30-50/month ($0.04/vCPU-hour + $0.004/GB-hour)
Application Load Balancers: ~$32/month (2 ALBs × $0.0225/hour)
Aurora PostgreSQL: $100-200/month (db.t3.medium)
ElastiCache Serverless: $10-30/month (pay-per-use)
Data Transfer: ~$0.09/GB for outbound
CloudWatch Logs: ~$0.50/GB ingested, $0.03/GB stored
CloudWatch Metrics (only with OTEL sidecar): $0.30 per metric per month (can be significant with high-cardinality OpenTelemetry metrics)

Total: $170-310/month (excluding CloudWatch metrics, which can add $50-500+ depending on metric cardinality)
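
To see where the Fargate and load balancer figures come from, the estimate can be reproduced with a few lines of arithmetic. The sketch below uses the hourly rates quoted in the list above and assumes 730 hours per month; the function names are illustrative, and actual prices vary by region and exclude load balancer capacity unit charges.

```typescript
// Average number of hours in a month (8,760 hours per year divided by 12).
const HOURS_PER_MONTH = 730;

interface FargateTaskSize {
  vcpu: number; // for example 0.5 for 512 CPU units
  memoryGb: number; // for example 1 for 1024 MiB
}

// Monthly Fargate cost for `taskCount` always-on tasks of the given size.
function fargateMonthlyUsd(
  task: FargateTaskSize,
  taskCount: number,
  vcpuHourUsd: number = 0.04,
  gbHourUsd: number = 0.004,
): number {
  const hourly = task.vcpu * vcpuHourUsd + task.memoryGb * gbHourUsd;
  return hourly * HOURS_PER_MONTH * taskCount;
}

// Monthly fixed cost for `albCount` Application Load Balancers.
function albMonthlyUsd(albCount: number, albHourUsd: number = 0.0225): number {
  return albCount * albHourUsd * HOURS_PER_MONTH;
}
```

With the defaults in this stack (two tasks at 0.5 vCPU and 1 GB each), fargateMonthlyUsd({ vcpu: 0.5, memoryGb: 1 }, 2) comes to about $35, and albMonthlyUsd(2) to about $33, which matches the ranges listed above.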

CloudWatch Metrics Cost

CloudWatch metrics can become a significant cost driver, especially with OpenTelemetry. Each unique combination of metric name and dimension values is billed separately. For example:

100 unique metrics = $30/month
1,000 unique metrics = $300/month
10,000 unique metrics = $3,000/month

Monitor your metric count and consider:

Reducing metric dimensions
Filtering or aggregating metrics
Disabling unnecessary metrics
Setting up CloudWatch billing alarms

To reduce costs:

Use smaller task sizes (reduce CPU/memory)
Reduce desired count to 0 when not in use
Use Aurora Serverless v2 for variable workloads
Disable the OTEL sidecar if you don't need application observability
Use single-AZ deployment (not recommended for production)

Troubleshooting

Check Task Status

# List tasks
aws ecs list-tasks --cluster vm-x-ai-cluster --service-name vm-x-ai-api

# Describe task
aws ecs describe-tasks --cluster vm-x-ai-cluster --tasks <task-arn>

# Task logs
aws logs tail /aws/ecs/vm-x-ai-api --follow

Check Service Status

# Describe service
aws ecs describe-services \
  --cluster vm-x-ai-cluster \
  --services vm-x-ai-api vm-x-ai-ui

Check Load Balancer

# Describe load balancer
aws elbv2 describe-load-balancers --names vm-x-ai-api vm-x-ai-ui

# Check target health
aws elbv2 describe-target-health \
  --target-group-arn <target-group-arn>

Common Issues

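Tasks stuck in Pending: Check security group rules and VPC configuration, and confirm that the task's subnets have a route to the internet (or VPC endpoints) so the container image and secrets can be pulled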
Tasks failing health checks: Verify health check path and container configuration
Database connection failures: Check security group rules and VPC configuration; verify both DATABASE_HOST (writer) and DATABASE_RO_HOST (reader) resolve from the task's subnets
Secrets not accessible: Verify that the task execution role can read the Secrets Manager secrets and SSM parameters
API boots but rejects KMS calls: Make sure AWS_REGION is set on the task and the task role has kms:Encrypt / kms:Decrypt / kms:GenerateDataKey on the encryption key

Cleanup

To destroy all resources:

pnpm cdk destroy

Warning: This will delete all resources including the database and cache. Make sure you have backups if needed.

Note: The ElastiCache serverless cache may take several minutes to delete after the stack tears down.

Customization

Add Auto Scaling

Add Application Auto Scaling to automatically scale services:

// autoScaleTaskCount is a method on FargateService, so no extra import is needed.

const scalableTarget = apiService.autoScaleTaskCount({
  minCapacity: 1,
  maxCapacity: 10,
});

scalableTarget.scaleOnCpuUtilization('CpuScaling', {
  targetUtilizationPercent: 70,
});

Use Private Subnets (Production)

For production, move resources to private subnets:

vpcSubnets: {
  subnetType: SubnetType.PRIVATE_WITH_EGRESS,  // Instead of PUBLIC
},
assignPublicIp: false,  // Instead of true

Security Best Practices

For production deployments:

Private Subnets: Move all resources to private subnets with NAT Gateway
Security Groups: Implement least-privilege security group rules
Secrets Rotation: Enable automatic secret rotation in Secrets Manager
Encryption: Ensure all data at rest is encrypted
Backup: Enable automated backups for Aurora
Monitoring: Set up CloudWatch alarms for service health
Access Control: Use least-privilege IAM policies
HTTPS: Configure SSL/TLS certificates for load balancers
VPC Endpoints: Use VPC endpoints for AWS services
Network ACLs: Implement network ACLs for additional security

Production Checklist

Before deploying to production, review each item under Security Best Practices above.

Next Steps

AWS EKS Deployment - Alternative AWS deployment option
Minikube Deployment - Local Kubernetes deployment
ECS Example README - Detailed example documentation
