AI Platform Architecture
Overview
This platform follows a Control Plane / Data Plane (CP/DP) architecture for multi-tenant AI agent deployment. Separating the two planes provides strong isolation, independent scaling, and precise metering for each tenant.
Core Security Guarantees
| Guarantee | Description |
|---|---|
| Never runs client code | CP only orchestrates, plans, and enforces policies |
| Never sees secrets | Client API keys, OAuth tokens remain in DP only |
| Stateless policy enforcement | All decisions based on signed tokens and policies |
| Multi-tenant isolation | Per-tenant data stores, network isolation, and policy configs |
Trust Boundaries
| Boundary | Guarantee |
|---|---|
| Control to Data | Signed execution plans only (cryptographically verified) |
| Data to Control | Outcome metadata only (no PII, no prompts, no responses) |
| Secrets | Never cross boundary (remain in DP always) |
| Metering | Control Plane only (usage accounting, quotas, approvals) |
| Execution | Data Plane only (client code, LLM calls, external APIs) |
Terminology
| Term | Definition |
|---|---|
| Workflow | A workflow runtime instance running in the Data Plane. The actual execution unit. |
| Agent | A user-facing concept representing an AI automation capability. Implemented as workflow runtime instances. |
| Tenant | A customer organization with dedicated infrastructure and isolated resources. |
| Execution | A single invocation of a workflow, triggered via chat command or API call. |
| Outcome | Execution result metadata (success/failure, duration, tokens used) sent from DP to CP. No PII. |
Workflow Execution Modes
| Mode | Description | User Experience | Documentation |
|---|---|---|---|
| Static | User uploads pre-built workflow JSON (BYO) | Full workflow runtime features, manual creation | Implementation details vary by deployment |
| Dynamic | AI generates workflows from natural language intent | No workflow runtime knowledge needed, limited to approved tools | Implementation details vary by deployment |
High-Level Architecture Diagram
The Control Plane performs planning, policy enforcement, and metering. The Data Plane performs execution and holds tenant secrets and data.
AI-Powered Planning
The Planner service uses an LLM with comprehensive context injection to generate data-driven execution plans. It gathers historical performance data, approval patterns, and tenant configurations to make informed predictions.
Context Gathering Flow
Example: Outcome Context Data
The Outcome Aggregator returns aggregated performance metrics for each workflow. The aggregated context includes:
- Total executions, success/failure counts, success rate
- Duration metrics: average, p50, p99
- Token usage: average and trends
- Error distribution by type
- Success rate trends (improving/degrading)
- Performance stability indicators
- Last error timestamp and type
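As a rough illustration of how such a summary could be derived from raw outcome records, the sketch below computes several of the listed metrics. The `Outcome` record, the `aggregate` function, the output key names, and the nearest-rank percentile method are all assumptions made for this example; the Outcome Aggregator's actual schema and method are not specified in this document.

```python
import math
from collections import Counter
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass(frozen=True)
class Outcome:
    """Hypothetical outcome record; fields mirror the outcome metadata."""
    success: bool
    duration_ms: int
    tokens_used: int
    error_code: Optional[str] = None


def percentile(sorted_values, pct):
    """Nearest-rank percentile over an ascending, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def aggregate(outcomes):
    """Summarize a non-empty list of outcomes into planner context."""
    if not outcomes:
        raise ValueError("at least one outcome is required")
    durations = sorted(o.duration_ms for o in outcomes)
    successes = sum(1 for o in outcomes if o.success)
    errors = Counter(o.error_code for o in outcomes if o.error_code)
    return {
        "total": len(outcomes),
        "successes": successes,
        "failures": len(outcomes) - successes,
        "success_rate": successes / len(outcomes),
        "avg_duration_ms": mean(durations),
        "p50_duration_ms": percentile(durations, 50),
        "p99_duration_ms": percentile(durations, 99),
        "avg_tokens": mean(o.tokens_used for o in outcomes),
        "errors_by_type": dict(errors),
    }
```

Trend and stability indicators are left out of the sketch because they depend on how the aggregation window is defined, which this document does not state.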
How AI Uses Context
- Time prediction: Uses historical duration samples to predict expected runtime and quantify uncertainty.
- Auto-approval decision: Combines approval history, recent outcomes, and policy configuration to decide whether human review is required.
- Risk assessment: Uses success rate, recent errors, and performance stability to inform the approval posture.
- Usage estimation: Produces token and duration estimates to support quota enforcement and operational planning.
Result: Decisions are based on observed patterns and explicit policy constraints, not static heuristics alone.
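To make the auto-approval inputs concrete, here is a minimal rule-based sketch of how approval history, recent outcomes, and policy configuration might combine. It is a deterministic stand-in for illustration, not the Planner's LLM-driven logic. The `PlanContext` fields, the `requires_human_review` function, and both default thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanContext:
    """Hypothetical inputs gathered for one planning request."""
    policy_allows_auto_approval: bool  # from tenant policy configuration
    prior_approvals: int               # from approval history
    prior_rejections: int              # from approval history
    recent_success_rate: float         # from recent outcomes, 0.0 to 1.0


def requires_human_review(ctx, min_approvals=5, min_success_rate=0.95):
    """Return True when a human should review the plan.

    The thresholds are illustrative defaults, not documented values.
    """
    # An explicit policy constraint always wins over observed patterns.
    if not ctx.policy_allows_auto_approval:
        return True
    # Require a clean and sufficiently long approval history.
    if ctx.prior_rejections > 0 or ctx.prior_approvals < min_approvals:
        return True
    # Fall back to review when recent executions have been unreliable.
    return ctx.recent_success_rate < min_success_rate
```

Checking the policy flag first reflects the statement above that decisions respect explicit policy constraints before any observed pattern is considered.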
Component Details
Control Plane Components
| Component | Technology | Purpose |
|---|---|---|
| API Gateway | HTTP API | Request routing, throttling |
| Planner | Compute Service + LLM | Generate execution plans, time prediction |
| Policy Engine | Compute Service + Database | ALLOW/DENY/APPROVE decisions |
| Token Service | Authentication Service + ES256 | Issue & validate JWTs (5min access, 4h refresh) |
| Metering Collector | Compute Service | Receive heartbeats, detect violations |
| Prompt Library | Compute Service + Database | Versioned prompt management |
| Workflow Manager | Compute Service + Database | Manage workflow enable/disable per tenant |
| Admin Panel | Web Application + CDN | Tenant management, Storage Browser, analytics |
| Database | NoSQL Database | 8+ tables (executions, policies, prompts, etc.) |
| Shared ALB | Application Load Balancer | Host-based routing for all tenants |
Data Plane Components
| Component | Technology | Purpose |
|---|---|---|
| Workflow runtime | Compute Instance (spot) | Workflow execution engine (cost-optimized compute) |
| Metering Sidecar | Go + DCGM | GPU/usage reporting, heartbeat |
| Secret Vault | Secrets Manager | Client API keys, OAuth tokens (never leaves DP) |
| vLLM | GPU Compute Instance | LLM inference (Enterprise/GPU Pro tiers) |
Security Model
Token Types
| Type | TTL | Purpose |
|---|---|---|
| Access Token | 5 min | Execute specific workflow |
| Refresh Token | 4 hours | Renew access tokens |
| Metering Token | 30 min | Report usage metrics |
| Admin Token | 60 min | Tenant management |
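The TTLs in the table can be expressed as a small lookup, sketched below under stated assumptions. The token type keys (`access`, `refresh`, `metering`, `admin`) and both helper functions are names chosen for this example; the Token Service's real identifiers are not given here.

```python
from datetime import datetime, timedelta, timezone

# TTLs copied from the Token Types table; the keys are hypothetical.
TOKEN_TTL = {
    "access": timedelta(minutes=5),
    "refresh": timedelta(hours=4),
    "metering": timedelta(minutes=30),
    "admin": timedelta(minutes=60),
}


def expires_at(token_type, issued_at):
    """Compute the expiration instant for a token of the given type."""
    return issued_at + TOKEN_TTL[token_type]


def is_expired(token_type, issued_at, now):
    """A token is treated as expired once its TTL has fully elapsed."""
    return now >= expires_at(token_type, issued_at)


issued = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(expires_at("access", issued).isoformat())
```

An unknown token type raises `KeyError`, which seems the safer default for a sketch like this than silently falling back to some TTL.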
Token Claims
Token Properties:
- Algorithm: ES256 (ECDSA with SHA-256)
- Standard claims: Issuer, subject, audience, expiration, issued-at, JTI (unique ID)
- Custom claims: Tenant ID, token type, agent/session identifiers
- Expiration: 5 minutes to 4 hours, depending on token type (see Token Types)
Token Security
- Tokens are hashed (SHA-256) before storage and never stored in plaintext
- Revocation is instant via a database flag
- 1-hour hard timeout (no extensions)
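The hash-before-storage and revocation-flag properties can be sketched as follows. An in-memory dictionary stands in for the database, and the `TokenStore` class with its method names is hypothetical; only the use of SHA-256 and a revocation flag comes from this document.

```python
import hashlib


class TokenStore:
    """Hypothetical store that keeps SHA-256 digests, never raw tokens."""

    def __init__(self):
        # digest -> record; a dict stands in for the database table.
        self._records = {}

    @staticmethod
    def _digest(token):
        return hashlib.sha256(token.encode("utf-8")).hexdigest()

    def register(self, token):
        """Persist only the digest of a newly issued token."""
        self._records[self._digest(token)] = {"revoked": False}

    def revoke(self, token):
        """Set the revocation flag; returns False for unknown tokens."""
        record = self._records.get(self._digest(token))
        if record is None:
            return False
        record["revoked"] = True
        return True

    def is_active(self, token):
        """A token is active only if it is known and not revoked."""
        record = self._records.get(self._digest(token))
        return record is not None and not record["revoked"]

    def stored_keys(self):
        """Expose stored keys so callers can confirm no plaintext is kept."""
        return set(self._records)
```

Because lookups go through the digest, a leaked copy of the stored records does not reveal usable tokens, and revocation takes effect on the next lookup. Signature and expiration checks are out of scope for this sketch.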
CIDR Allocation
| Component | CIDR Range |
|---|---|
| Control Plane | 10.10.0.0/16 |
| Tenant 1 | 10.100.0.0/16 |
| Tenant 2 | 10.101.0.0/16 |
| Tenant N | 10.(99+N).0.0/16 |
Max tenants: 55 (10.100 - 10.154)
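The allocation scheme can be checked with Python's standard `ipaddress` module. The sketch below follows the table rows, where Tenant 1 maps to 10.100.0.0/16 and the last range is 10.154.0.0/16. The function and constant names are assumptions made for the example.

```python
import ipaddress

CONTROL_PLANE = ipaddress.ip_network("10.10.0.0/16")
FIRST_TENANT_OCTET = 100  # Tenant 1 -> 10.100.0.0/16
LAST_TENANT_OCTET = 154   # last range listed -> 10.154.0.0/16
MAX_TENANTS = LAST_TENANT_OCTET - FIRST_TENANT_OCTET + 1


def tenant_cidr(tenant_number):
    """Return the /16 network for a 1-based tenant number."""
    if not 1 <= tenant_number <= MAX_TENANTS:
        raise ValueError(f"tenant number must be in 1..{MAX_TENANTS}")
    second_octet = FIRST_TENANT_OCTET + tenant_number - 1
    return ipaddress.ip_network(f"10.{second_octet}.0.0/16")


for n in (1, 2, MAX_TENANTS):
    print(n, tenant_cidr(n))
```

Each tenant range is a distinct /16, so no tenant network overlaps another or the Control Plane range.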
Network Peering
- One-way: DP to CP only (security isolation)
- Route tables configured in DP to reach CP
- No routes from CP to DP
Workflow Management
Workflow Registry
All workflows are stored in data-plane/workflows/ and registered in the database:
| Workflow | Description | Required Tier |
|---|---|---|
| marketing_content_agent | Social media content generation | starter |
| lead_intake_agent | Lead qualification and routing | starter |
| appointment_scheduler_agent | Calendar management | professional |
| kpi_report_agent | Business metrics reporting | professional |
| rag_assistant_agent | RAG-based document Q&A | enterprise |
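A tier check against this registry might look like the sketch below. It assumes the tiers are ordered starter < professional < enterprise, which the table suggests but does not state. The GPU Pro tier mentioned elsewhere in this document is omitted because its position in that ordering is not given. `TIER_RANK`, `REQUIRED_TIER`, and `can_enable` are hypothetical names.

```python
# Assumed ordering; higher rank means a more capable tier.
TIER_RANK = {"starter": 0, "professional": 1, "enterprise": 2}

# Required tiers copied from the Workflow Registry table.
REQUIRED_TIER = {
    "marketing_content_agent": "starter",
    "lead_intake_agent": "starter",
    "appointment_scheduler_agent": "professional",
    "kpi_report_agent": "professional",
    "rag_assistant_agent": "enterprise",
}


def can_enable(workflow, tenant_tier):
    """Return True if the tenant's tier meets the workflow's required tier."""
    required = REQUIRED_TIER[workflow]
    return TIER_RANK[tenant_tier] >= TIER_RANK[required]


def available_workflows(tenant_tier):
    """List the registry workflows a tenant on the given tier could enable."""
    return sorted(w for w in REQUIRED_TIER if can_enable(w, tenant_tier))
```

Unknown workflows or tiers raise `KeyError` rather than defaulting to allowed, which keeps the check fail-closed.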
Workflow Lifecycle
Metering & Usage
Heartbeat Protocol
- The sidecar sends a heartbeat every 60 seconds
- Each heartbeat contains GPU utilization (%), memory, CPU, and active workflows
- The Control Plane stores each heartbeat in the database
- A violation is raised after 5 minutes of missed heartbeats
- A violation triggers auto-suspend via EventBridge
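The violation rule can be sketched as a comparison between the last stored heartbeat and the current time. The function names and the inclusive 5-minute boundary are assumptions made for this example. The EventBridge integration is not shown.

```python
from datetime import datetime, timedelta, timezone

HEARTBEAT_INTERVAL = timedelta(seconds=60)
VIOLATION_AFTER = timedelta(minutes=5)


def missed_heartbeats(last_seen, now):
    """Whole heartbeat intervals elapsed since the last heartbeat."""
    return max(0, (now - last_seen) // HEARTBEAT_INTERVAL)


def tenants_in_violation(last_seen_by_tenant, now):
    """Tenants whose most recent heartbeat is at least 5 minutes old."""
    return sorted(
        tenant
        for tenant, last_seen in last_seen_by_tenant.items()
        if now - last_seen >= VIOLATION_AFTER
    )


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last_seen = {
    "tenant-a": now - timedelta(seconds=90),
    "tenant-b": now - timedelta(minutes=12),
}
print(tenants_in_violation(last_seen, now))
```

A periodic job could run a check like this and emit a suspend event for each tenant it returns.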
Outcome Feedback Loop
Execution outcomes are collected asynchronously from Data Planes (no PII) and fed back to improve the Planner:
OUTCOME PAYLOAD (No PII):
- execution_id
- workflow_id, tenant_id
- success: boolean
- duration_ms, tokens_used
- gpu_seconds_used
- embedding_vectors
- error_code
- timestamp
EXCLUDED FIELDS:
- user prompts
- generated content
- PII / user data
- API keys / credentials
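One way to enforce this boundary is an allow-list applied in the Data Plane before an outcome is sent, so that any field not explicitly permitted is dropped. The sketch below assumes the payload is a flat dictionary. The function names and the excluded keys in the usage example (`prompt`, `api_key`) are hypothetical.

```python
# Field names copied from the outcome payload list above.
ALLOWED_OUTCOME_FIELDS = frozenset({
    "execution_id",
    "workflow_id",
    "tenant_id",
    "success",
    "duration_ms",
    "tokens_used",
    "gpu_seconds_used",
    "embedding_vectors",
    "error_code",
    "timestamp",
})


def sanitize_outcome(raw):
    """Keep only allow-listed fields; everything else stays in the DP."""
    return {k: v for k, v in raw.items() if k in ALLOWED_OUTCOME_FIELDS}


def dropped_fields(raw):
    """Names of fields that would be removed, useful for audit logging."""
    return sorted(set(raw) - ALLOWED_OUTCOME_FIELDS)


raw = {
    "execution_id": "exec-1",
    "success": True,
    "duration_ms": 1200,
    "prompt": "hypothetical user prompt",
    "api_key": "hypothetical-secret",
}
print(sanitize_outcome(raw))
print(dropped_fields(raw))
```

An allow-list fails closed: a field added to the runtime later is excluded by default until someone adds it to the set deliberately.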
Technology Stack
Control Plane
- Compute: Serverless functions
- API: API Gateway HTTP API
- Database: NoSQL database (8+ tables with indexes, streams, TTL)
- Storage: Object storage (per-tenant buckets with lifecycle policies)
- Auth: Identity provider, JWT tokens (ES256 signing)
- Scheduling: EventBridge (timeout checker every 5min)
- Queue: SQS with DLQ (completion callback retries)
- AI: LLM (planning, time prediction)
- IaC: Infrastructure as Code
Data Plane
- Compute: Spot compute instances via workflow runtime
- Orchestration: workflow runtime (containerized)
- Networking: Network peering (DP to CP communication)
- Load Balancing: Shared ALB with host-based routing
- Metering: Go sidecar service (heartbeat, tamper detection)
- Monitoring: Metrics and log aggregation
Admin Panel
- Framework: Web application framework
- Storage UI: Object storage browser interface
- Auth: STS temporary credentials (1-hour)
Deployment Flow
Monitoring & Observability
- Metrics: Function metrics, API Gateway latency
- X-Ray: Distributed tracing across CP/DP
- Logging: Centralized logging
- EventBridge: Async event processing
- SNS: Alerts for violations and errors
Future Enhancements
- ML Model for Predictions
  - Train custom model on prediction_error data
  - Replace initial LLM planner for efficiency/latency optimization
  - Track confidence calibration
- Multi-Region Deployment
  - Active-active across us-east-1, eu-west-1
  - Route53 geo-routing for low latency
  - Cross-region database replication
- Public API (Phase 2)
  - REST API for direct workflow invocation
  - API key management
  - Rate limiting per API key
  - Usage metering integration
- Advanced Analytics
  - Real-time execution dashboards
  - Resource attribution by workflow/tenant
  - Anomaly detection for unusual patterns
  - Predictive capacity planning