AI Platform Architecture

Overview

This platform follows a Control Plane / Data Plane (CP/DP) architecture for multi-tenant AI agent deployment. Separating the two planes provides strong isolation, independent scaling, and precise metering for each tenant.

Core Security Guarantees

Guarantee	Description
Never runs client code	CP only orchestrates, plans, and enforces policies
Never sees secrets	Client API keys, OAuth tokens remain in DP only
Stateless policy enforcement	All decisions based on signed tokens and policies
Multi-tenant isolation	Per-tenant data stores, network isolation, and policy configs

Trust Boundaries

Boundary	Guarantee
Control to Data	Signed execution plans only (cryptographically verified)
Data to Control	Outcome metadata only (no PII, no prompts, no responses)
Secrets	Never cross boundary (remain in DP always)
Metering	Control Plane only (usage accounting, quotas, approvals)
Execution	Data Plane only (client code, LLM calls, external APIs)

Terminology

Term	Definition
Workflow	A workflow runtime instance running in the Data Plane. The actual execution unit.
Agent	A user-facing concept representing an AI automation capability. Implemented as workflow runtime instances.
Tenant	A customer organization with dedicated infrastructure and isolated resources.
Execution	A single invocation of a workflow, triggered via chat command or API call.
Outcome	Execution result metadata (success/failure, duration, tokens used) sent from DP to CP. No PII.

Workflow Execution Modes

Mode	Description	User Experience	Documentation
Static	User uploads pre-built workflow JSON (BYO)	Full workflow runtime features, manual creation	Implementation details vary by deployment
Dynamic	AI generates workflows from natural language intent	No workflow runtime knowledge needed, limited to approved tools	Implementation details vary by deployment

High-Level Architecture Diagram

The Control Plane performs planning, policy enforcement, and metering. The Data Plane performs execution and holds tenant secrets and data.

AI-Powered Planning

The Planner service uses an LLM to generate execution plans. It gathers historical performance data, approval patterns, and tenant configurations and injects them into the LLM context, so that its predictions are grounded in observed data.

Context Gathering Flow

Example: Outcome Context Data

The Outcome Aggregator returns aggregated performance metrics, including:

Total executions, success/failure counts, success rate
Duration metrics: average, p50, p99
Token usage: average and trends
Error distribution by type
Success rate trends (improving/degrading)
Performance stability indicators
Last error timestamp and type
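
As an illustration of how the simpler metrics in this list could be derived from raw outcomes, here is a minimal sketch. The `Outcome` class and `aggregate` function are hypothetical names, not part of the platform, and the percentile method (inclusive interpolation from Python's `statistics` module) is an assumption. Trend and stability indicators are left out.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean, quantiles
from typing import Optional


@dataclass
class Outcome:
    """One execution outcome as reported by a Data Plane (no PII)."""
    success: bool
    duration_ms: int
    tokens_used: int
    error_code: Optional[str] = None


def aggregate(outcomes: list[Outcome]) -> dict:
    """Summarize outcomes into counts, rates, and duration percentiles."""
    if not outcomes:
        raise ValueError("need at least one outcome")
    durations = [o.duration_ms for o in outcomes]
    if len(durations) >= 2:
        # quantiles(n=100) returns 99 cut points; index 49 is p50, 98 is p99.
        cuts = quantiles(durations, n=100, method="inclusive")
        p50, p99 = cuts[49], cuts[98]
    else:
        # A single sample is its own median and p99.
        p50 = p99 = float(durations[0])
    successes = sum(1 for o in outcomes if o.success)
    return {
        "total": len(outcomes),
        "successes": successes,
        "failures": len(outcomes) - successes,
        "success_rate": successes / len(outcomes),
        "duration_ms": {"avg": mean(durations), "p50": p50, "p99": p99},
        "avg_tokens": mean(o.tokens_used for o in outcomes),
        "errors_by_type": dict(
            Counter(o.error_code for o in outcomes if o.error_code)
        ),
    }
```

In a real deployment this aggregation would more likely run as a database query or stream job than in memory.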

How AI Uses Context

Time prediction: Uses historical duration samples to predict expected runtime and quantify uncertainty.
Auto-approval decision: Combines approval history, recent outcomes, and policy configuration to decide whether human review is required.
Risk assessment: Uses success rate, recent errors, and performance stability to inform the approval posture.
Usage estimation: Produces token and duration estimates to support quota enforcement and operational planning.

Result: Decisions are based on observed patterns and explicit policy constraints, not static heuristics alone.
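
The time prediction and auto-approval steps above can be sketched with a simple statistical stand-in. This is not the LLM-based planner itself. The function names, the use of mean and standard deviation for uncertainty, and the thresholds `min_success_rate` and `min_approvals` are all assumptions made for illustration.

```python
from statistics import mean, stdev


def predict_duration_ms(samples: list[float]) -> tuple[float, float]:
    """Return (expected runtime, uncertainty) from historical durations.

    Uncertainty is the sample standard deviation, or 0.0 for one sample.
    """
    if not samples:
        raise ValueError("need at least one historical duration")
    expected = mean(samples)
    spread = stdev(samples) if len(samples) > 1 else 0.0
    return expected, spread


def needs_human_review(
    success_rate: float,
    prior_approvals: int,
    policy_allows_auto: bool,
    min_success_rate: float = 0.95,  # hypothetical threshold
    min_approvals: int = 3,          # hypothetical threshold
) -> bool:
    """Decide whether a human must approve the execution."""
    # Explicit policy always wins over observed patterns.
    if not policy_allows_auto:
        return True
    # Require review when history is weak or too short to trust.
    return success_rate < min_success_rate or prior_approvals < min_approvals
```

Note the ordering: the policy check comes first, so a strong track record can never override a tenant policy that forbids auto-approval.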

Component Details

Control Plane Components

Component	Technology	Purpose
API Gateway	HTTP API	Request routing, throttling
Planner	Compute Service + LLM	Generate execution plans, time prediction
Policy Engine	Compute Service + Database	ALLOW/DENY/APPROVE decisions
Token Service	Authentication Service + ES256	Issue & validate JWTs (5min access, 4h refresh)
Metering Collector	Compute Service	Receive heartbeats, detect violations
Prompt Library	Compute Service + Database	Versioned prompt management
Workflow Manager	Compute Service + Database	Manage workflow enable/disable per tenant
Admin Panel	Web Application + CDN	Tenant management, Storage Browser, analytics
Database	NoSQL Database	8+ tables (executions, policies, prompts, etc.)
Shared ALB	Application Load Balancer	Host-based routing for all tenants

Data Plane Components

Component	Technology	Purpose
Workflow runtime	Compute Instance (spot)	Workflow execution engine (cost-optimized compute)
Metering Sidecar	Go + DCGM	GPU/usage reporting, heartbeat
Secret Vault	Secrets Manager	Client API keys, OAuth tokens (never leaves DP)
vLLM	GPU Compute Instance	LLM inference (Enterprise/GPU Pro tiers)

Security Model

Token Types

Type	TTL	Purpose
Access Token	5 min	Execute specific workflow
Refresh Token	4 hours	Renew access tokens
Metering Token	30 min	Report usage metrics
Admin Token	60 min	Tenant management

Token Claims

Token Properties:

Algorithm: ES256 (ECDSA with SHA-256)
Standard claims: Issuer, subject, audience, expiration, issued-at, JTI (unique ID)
Custom claims: Tenant ID, token type, agent/session identifiers
Expiration: 5 minutes to 4 hours depending on token type (see Token Types)
Storage: Hashed (SHA-256) before database storage
Revocation: Instant via database flag
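
To make the claim set concrete, the sketch below builds an unsigned claims dictionary using the TTLs from the Token Types table. The custom claim names `tenant_id` and `token_type` and the default issuer and audience values are hypothetical. ES256 signing is omitted, since it would be done by a JWT library.

```python
import time
import uuid
from typing import Optional

# TTLs in seconds, taken from the Token Types table.
TOKEN_TTL_SECONDS = {
    "access": 5 * 60,
    "refresh": 4 * 60 * 60,
    "metering": 30 * 60,
    "admin": 60 * 60,
}


def build_claims(
    tenant_id: str,
    token_type: str,
    subject: str,
    issuer: str = "control-plane",   # hypothetical value
    audience: str = "data-plane",    # hypothetical value
    now: Optional[int] = None,
) -> dict:
    """Build the (unsigned) claim set for one token."""
    if token_type not in TOKEN_TTL_SECONDS:
        raise ValueError(f"unknown token type: {token_type}")
    issued_at = int(time.time()) if now is None else now
    return {
        "iss": issuer,
        "sub": subject,
        "aud": audience,
        "iat": issued_at,
        "exp": issued_at + TOKEN_TTL_SECONDS[token_type],
        "jti": str(uuid.uuid4()),  # unique ID, used for revocation lookups
        "tenant_id": tenant_id,    # hypothetical custom claim name
        "token_type": token_type,  # hypothetical custom claim name
    }
```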

Token Security

Tokens hashed (SHA-256) before storage
Instant revocation via database flag
1-hour hard timeout (no extensions)
Never stored in plaintext
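
The hash-before-storage and revocation-flag properties can be sketched as follows. The in-memory dictionary stands in for the database, and the `TokenStore` class and its method names are hypothetical.

```python
import hashlib


class TokenStore:
    """Stores only SHA-256 digests of tokens, never the tokens themselves."""

    def __init__(self) -> None:
        # digest -> record; a real deployment would use a database table.
        self._records: dict[str, dict] = {}

    @staticmethod
    def _digest(token: str) -> str:
        return hashlib.sha256(token.encode("utf-8")).hexdigest()

    def save(self, token: str) -> None:
        """Record a newly issued token by its digest."""
        self._records[self._digest(token)] = {"revoked": False}

    def revoke(self, token: str) -> None:
        """Flip the revocation flag; takes effect on the next lookup."""
        record = self._records.get(self._digest(token))
        if record is not None:
            record["revoked"] = True

    def is_active(self, token: str) -> bool:
        """True only for known tokens that have not been revoked."""
        record = self._records.get(self._digest(token))
        return record is not None and not record["revoked"]
```

Because only digests are stored, a leaked table does not reveal usable tokens, yet a presented token can still be checked by hashing it again.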

CIDR Allocation

Component	CIDR Range
Control Plane	10.10.0.0/16
Tenant 1	10.100.0.0/16
Tenant 2	10.101.0.0/16
Tenant N	10.(99+N).0.0/16

Max tenants: 55 (10.100.0.0/16 through 10.154.0.0/16)
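
The allocation scheme can be expressed as a small helper, assuming Tenant 1 maps to 10.100.0.0/16 as in the table and the limit of 55 tenants holds. The function and constant names are hypothetical.

```python
import ipaddress

CONTROL_PLANE_CIDR = ipaddress.IPv4Network("10.10.0.0/16")
FIRST_TENANT_OCTET = 100  # Tenant 1 uses 10.100.0.0/16
MAX_TENANTS = 55          # last tenant uses 10.154.0.0/16


def tenant_cidr(n: int) -> ipaddress.IPv4Network:
    """Return the /16 range for tenant number n (1-based)."""
    if not 1 <= n <= MAX_TENANTS:
        raise ValueError(f"tenant number must be in 1..{MAX_TENANTS}")
    return ipaddress.IPv4Network(f"10.{FIRST_TENANT_OCTET + n - 1}.0.0/16")
```

For example, `tenant_cidr(2)` yields `10.101.0.0/16`, and no tenant range overlaps the Control Plane range.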

Network Peering

One-way: DP to CP only (security isolation)
Route tables configured in DP to reach CP
No routes from CP to DP

Workflow Management

Workflow Registry

All workflows are stored in data-plane/workflows/ and registered in the database:

Workflow	Description	Required Tier
marketing_content_agent	Social media content generation	starter
lead_intake_agent	Lead qualification and routing	starter
appointment_scheduler_agent	Calendar management	professional
kpi_report_agent	Business metrics reporting	professional
rag_assistant_agent	RAG-based document Q&A	enterprise
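
A tier check against this registry might look like the sketch below. It assumes the tiers are ordered starter < professional < enterprise and that a higher tier includes every workflow of the lower tiers; the source does not state either. The GPU Pro tier mentioned elsewhere is left out because its position in the ordering is not given. `can_enable` is a hypothetical name.

```python
# Assumed ordering: a higher rank includes all lower-tier workflows.
TIER_RANK = {"starter": 0, "professional": 1, "enterprise": 2}

# Required tier per workflow, copied from the registry table.
REQUIRED_TIER = {
    "marketing_content_agent": "starter",
    "lead_intake_agent": "starter",
    "appointment_scheduler_agent": "professional",
    "kpi_report_agent": "professional",
    "rag_assistant_agent": "enterprise",
}


def can_enable(workflow: str, tenant_tier: str) -> bool:
    """Return True if a tenant on tenant_tier may enable the workflow."""
    if workflow not in REQUIRED_TIER:
        raise KeyError(f"unregistered workflow: {workflow}")
    if tenant_tier not in TIER_RANK:
        raise KeyError(f"unknown tier: {tenant_tier}")
    return TIER_RANK[tenant_tier] >= TIER_RANK[REQUIRED_TIER[workflow]]
```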

Workflow Lifecycle

Metering & Usage

Heartbeat Protocol

Sidecar sends a heartbeat every 60 seconds
Each heartbeat contains: GPU %, memory, CPU, active workflows
Control Plane stores each heartbeat in the database
A violation is raised after 5 minutes of missed heartbeats
Auto-suspend is triggered via EventBridge
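
The violation rule can be sketched as a periodic check over the last heartbeat time per tenant. This assumes "5 minutes of missed heartbeats" means 300 seconds or more since the last heartbeat was received. The function name and the shape of `last_seen` are hypothetical, and the suspend action itself is out of scope.

```python
HEARTBEAT_INTERVAL_S = 60     # sidecar reporting period
VIOLATION_AFTER_S = 5 * 60    # silence tolerated before a violation


def find_violations(last_seen: dict[str, float], now: float) -> list[str]:
    """Return tenant IDs whose last heartbeat is too old.

    last_seen maps tenant ID to the Unix timestamp of its latest heartbeat.
    """
    return sorted(
        tenant_id
        for tenant_id, timestamp in last_seen.items()
        if now - timestamp >= VIOLATION_AFTER_S
    )
```

With a 60-second interval, the 5-minute threshold tolerates several consecutive lost heartbeats before a tenant is flagged, which avoids suspending on a single dropped packet.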

Outcome Feedback Loop

Execution outcomes are collected asynchronously from DPs (no PII) and used to improve the planner:

OUTCOME PAYLOAD (No PII):
- execution_id
- workflow_id, tenant_id
- success: boolean
- duration_ms, tokens_used
- gpu_seconds_used
- embedding_vectors
- error_code
- timestamp

EXCLUDED FIELDS:
- user prompts
- generated content
- PII / user data
- API keys / credentials
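
One way to enforce this payload contract on the Data Plane side is an allowlist filter applied before anything is sent to the Control Plane. This is a sketch under that assumption; `sanitize_outcome` is a hypothetical name, and the field names are copied from the payload list above.

```python
# Only these fields may cross from the Data Plane to the Control Plane.
ALLOWED_FIELDS = frozenset({
    "execution_id",
    "workflow_id",
    "tenant_id",
    "success",
    "duration_ms",
    "tokens_used",
    "gpu_seconds_used",
    "embedding_vectors",
    "error_code",
    "timestamp",
})


def sanitize_outcome(raw: dict) -> dict:
    """Drop every field that is not explicitly allowed."""
    return {key: value for key, value in raw.items() if key in ALLOWED_FIELDS}
```

An allowlist is used rather than a blocklist of excluded fields, so that a new field added to the runtime later is dropped by default instead of leaking.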

Technology Stack

Control Plane

Compute: Serverless functions
API: API Gateway HTTP API
Database: NoSQL database (8+ tables with indexes, streams, TTL)
Storage: Object storage (per-tenant buckets with lifecycle policies)
Auth: Identity provider, JWT tokens (ES256 signing)
Scheduling: EventBridge (timeout checker every 5min)
Queue: SQS with DLQ (completion callback retries)
AI: LLM (planning, time prediction)
IaC: Infrastructure as Code

Data Plane

Compute: Spot compute instances running the workflow runtime
Orchestration: workflow runtime (containerized)
Networking: Network peering (DP to CP communication)
Load Balancing: Shared ALB with host-based routing
Metering: Go sidecar service (heartbeat, tamper detection)
Monitoring: Metrics and log aggregation

Admin Panel

Framework: Web application framework
Storage UI: Object storage browser interface
Auth: STS temporary credentials (1-hour)

Deployment Flow

Monitoring & Observability

Metrics: Function metrics, API Gateway latency
X-Ray: Distributed tracing across CP/DP
Logging: Centralized logging
EventBridge: Async event processing
SNS: Alerts for violations and errors

Future Enhancements

ML Model for Predictions
- Train custom model on prediction_error data
- Replace initial LLM planner for efficiency/latency optimization
- Track confidence calibration

Multi-Region Deployment
- Active-active across us-east-1, eu-west-1
- Route53 geo-routing for low latency
- Cross-region database replication

Public API (Phase 2)
- REST API for direct workflow invocation
- API key management
- Rate limiting per API key
- Usage metering integration

Advanced Analytics
- Real-time execution dashboards
- Resource attribution by workflow/tenant
- Anomaly detection for unusual patterns
- Predictive capacity planning
