Service Level Agreement (SLA)
This document defines service level commitments, support tiers, and guarantees for the platform.
Service Tiers
Service tiers describe operational posture and support commitments. Commercial terms and pricing are intentionally excluded from this repository; capacity limits and contract terms are deployment-specific.
| Capability | Starter | Professional | Enterprise |
|---|---|---|---|
| Intended use | Evaluation and prototypes | Production workloads | Mission-critical and regulated workloads |
| Uptime target | 99.0% | 99.9% | 99.95% |
| Support channels | Community forum and issue tracker | Email/ticketing and shared chat | 24/7 on-call, dedicated channel, named contacts |
| Response targets | Best effort | Targeted (business hours) | Targeted (24/7) |
| Disaster recovery | Single region | Single region | Multi-region options |
| Data retention | Short | Medium | Extended / configurable |
| Service credits | Not applicable | Contract-defined | Contract-defined |
Uptime Guarantees
Definition of "Uptime"
Uptime = Percentage of time the API is available and responsive over a calendar month.
Availability Calculation:
Uptime % = (Total Minutes in Month - Downtime Minutes) / Total Minutes in Month × 100
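The sketch below applies the formula to a specific calendar month. Total minutes depend on the month's length, from 40,320 minutes for a 28-day month to 44,640 for a 31-day month, so the sketch takes the month from the calendar instead of using a fixed value. The function names are illustrative and are not part of any platform API.

```python
import calendar


def minutes_in_month(year: int, month: int) -> int:
    """Total minutes in the given calendar month (handles leap years)."""
    days_in_month = calendar.monthrange(year, month)[1]
    return days_in_month * 24 * 60


def uptime_percent(total_minutes: int, downtime_minutes: float) -> float:
    """Uptime % = (total - downtime) / total * 100."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    return (total_minutes - downtime_minutes) / total_minutes * 100


april = minutes_in_month(2024, 4)  # 30 days -> 43,200 minutes
print(f"{uptime_percent(april, 60):.3f}%")  # 99.861%
```

In April 2024, 60 minutes of downtime gives about 99.861% uptime. That misses the 99.9% Professional target but clears the 99.0% Starter target.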
Uptime Targets by Tier
| Tier | Monthly Uptime Target | Downtime Allowed (Monthly) | Downtime Allowed (Weekly) |
|---|---|---|---|
| Starter | 99.0% | 7.3 hours (438 minutes) | 1.68 hours (101 minutes) |
| Professional | 99.9% | 43.8 minutes | 10.1 minutes |
| Enterprise | 99.95% | 21.9 minutes | 5.04 minutes |
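The table values follow from the targets: a period's allowed downtime is (100 - target) / 100 of its length. The monthly column assumes an average month of 365.25 / 12 = 30.4375 days (43,830 minutes), and the weekly column assumes a 10,080-minute week. Any specific month's budget is therefore a little smaller or larger. A small sketch that reproduces the figures (names are illustrative):

```python
AVG_MONTH_MINUTES = 365.25 / 12 * 24 * 60  # 43,830 minutes (average month)
WEEK_MINUTES = 7 * 24 * 60                 # 10,080 minutes


def downtime_budget(target_pct: float, period_minutes: float) -> float:
    """Minutes of downtime allowed in a period while still meeting target_pct."""
    return (100 - target_pct) / 100 * period_minutes


for tier, target in (("Starter", 99.0), ("Professional", 99.9), ("Enterprise", 99.95)):
    monthly = downtime_budget(target, AVG_MONTH_MINUTES)
    weekly = downtime_budget(target, WEEK_MINUTES)
    print(f"{tier:<12} {target}%  {monthly:6.1f} min/month  {weekly:6.2f} min/week")
```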
What Counts as "Downtime"
Downtime includes:
- API returns 500, 502, 503, 504 errors
- Requests that receive no response within 30 seconds (timeout)
- Complete regional outage
- Database unavailability
Downtime excludes:
- Scheduled maintenance (with advance notice)
- Customer configuration errors
- Quota exceeded (customer exceeded agreed limits)
- Third-party API failures
- Force majeure events
Uptime Monitoring
Monitoring Method: Synthetic checks from 3 geographically distributed regions
Health Check Frequency: Every 60 seconds
Failure Criteria: Downtime starts after 3 consecutive failed checks (180 seconds)
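The failure criterion can be expressed as a scan over check results. This sketch makes two assumptions that this document does not specify. First, each entry is one 60-second check result that already combines the three monitoring regions. Second, once 3 consecutive checks fail, the entire failing run counts as downtime, including the first two intervals. Both assumptions and the function name are illustrative.

```python
from typing import Sequence


def downtime_minutes(check_ok: Sequence[bool], threshold: int = 3, interval_s: int = 60) -> float:
    """Downtime (minutes) from ordered health-check results, one per interval.

    Assumption: a run of `threshold` or more consecutive failures counts as
    downtime in full; shorter runs (transient blips) are ignored.
    """
    down_intervals = 0
    run = 0
    for ok in list(check_ok) + [True]:  # trailing sentinel closes a final failing run
        if ok:
            if run >= threshold:
                down_intervals += run
            run = 0
        else:
            run += 1
    return down_intervals * interval_s / 60


# A 2-check blip is ignored; the 4-check outage counts as 4 minutes.
print(downtime_minutes([True, False, False, True, False, False, False, False, True]))  # 4.0
```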
Support Response Times
Support Channels by Tier
| Channel | Starter | Professional | Enterprise |
|---|---|---|---|
| Community forum | ✅ | ✅ | ✅ |
| Issue tracker | ✅ | ✅ | ✅ |
| Email support | ❌ | ✅ | ✅ |
| Shared chat channel | ❌ | ✅ | ✅ |
| Phone support | ❌ | ❌ | ✅ (24/7) |
| Dedicated chat channel | ❌ | ❌ | ✅ |
| Customer success manager | ❌ | ❌ | ✅ |
Response Time SLAs
| Severity | Starter | Professional | Enterprise |
|---|---|---|---|
| P0 (Critical) | N/A | 4 hours | 15 minutes |
| P1 (High) | N/A | 4 hours | 1 hour |
| P2 (Medium) | Best effort | 24 hours | 4 hours |
| P3 (Low) | Best effort | 48 hours | 24 hours |
Response time = time to first human reply, NOT time to resolution.
Support Coverage Hours
| Tier | Coverage |
|---|---|
| Starter | Community-driven (no guaranteed hours) |
| Professional | 9 AM - 6 PM ET, Monday-Friday |
| Enterprise | 24/7/365 (on-call rotation) |
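The tier table describes Professional response targets as business-hours targets. If that means the response clock runs only during the coverage hours above (9 AM - 6 PM ET, Monday-Friday), a response deadline can be computed as shown below. The sketch ignores public holidays and expects naive datetimes already in ET local time. The function names are hypothetical.

```python
from datetime import datetime, timedelta

OPEN_HOUR = 9    # 9 AM ET
CLOSE_HOUR = 18  # 6 PM ET


def _at(day: datetime, hour: int) -> datetime:
    """Return `day` with the clock set to hour:00:00."""
    return day.replace(hour=hour, minute=0, second=0, microsecond=0)


def add_business_hours(start: datetime, hours: float) -> datetime:
    """Return the moment `hours` of business time have elapsed after `start`.

    Assumptions: `start` is naive ET local time, business hours are
    09:00-18:00 Monday-Friday, and holidays are ignored.
    """
    remaining = timedelta(hours=hours)
    current = start
    while True:
        if current.weekday() >= 5:  # Saturday (5) or Sunday (6): jump to Monday 9 AM
            current = _at(current + timedelta(days=7 - current.weekday()), OPEN_HOUR)
            continue
        opens, closes = _at(current, OPEN_HOUR), _at(current, CLOSE_HOUR)
        if current < opens:
            current = opens  # before opening: the clock starts at 9 AM
        elif current >= closes:
            current = _at(current + timedelta(days=1), OPEN_HOUR)  # after close: next day
            continue
        available = closes - current
        if remaining <= available:
            return current + remaining
        remaining -= available
        current = _at(current + timedelta(days=1), OPEN_HOUR)


# Professional P2 (24-hour target) opened Monday 2024-03-04 at 9 AM ET.
print(add_business_hours(datetime(2024, 3, 4, 9, 0), 24))  # 2024-03-06 15:00:00
```

Under these assumptions, the P2 ticket is due Wednesday at 3 PM ET: 9 hours on Monday, 9 on Tuesday, and 6 on Wednesday.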
Performance Targets
API Latency (P95)
| Endpoint | Professional | Enterprise |
|---|---|---|
| POST /chat/commands | < 500ms | < 300ms |
| POST /webhooks/:tenant_id/:workflow_id | < 800ms | < 500ms |
| GET /tenants | < 200ms | < 100ms |
| POST /outcomes/:execution_id | < 300ms | < 200ms |
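This document does not say how P95 is computed, for example which aggregation window or percentile method is used. One simple option is the nearest-rank method: sort the samples and take the value at rank ceil(0.95 × n). The sketch below uses that method and compares the result to a table target with the table's strict "<". The function names are illustrative.

```python
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100  # ceil(pct / 100 * n), integer arithmetic
    return ordered[rank - 1]


def meets_p95_target(samples_ms: Sequence[float], target_ms: float) -> bool:
    """True if the observed P95 is strictly below the target."""
    return percentile_nearest_rank(samples_ms, 95) < target_ms


# Example: GET /tenants on Enterprise has a < 100ms target.
latencies_ms = [40] * 96 + [150] * 4
print(meets_p95_target(latencies_ms, 100))  # True
```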
Workflow Execution Time
| Workflow Type | Professional | Enterprise |
|---|---|---|
| Simple (no external APIs) | < 5 seconds | < 3 seconds |
| Complex (multiple APIs) | < 30 seconds | < 20 seconds |
| Async (human-in-the-loop) | Up to 24 hours | Up to 24 hours |
Throughput Limits
| Tier | Requests/Second (per tenant) | Burst Capacity |
|---|---|---|
| Starter | 10 req/s | 50 req/s for 10 seconds |
| Professional | 100 req/s | 500 req/s for 30 seconds |
| Enterprise | 1,000 req/s | 5,000 req/s for 60 seconds |
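This document does not state how these limits are enforced. A token bucket is one common way to combine a sustained rate with a burst allowance, and the sketch below assumes one. For a tenant to send at the burst rate for the stated duration while tokens refill at the sustained rate, the bucket must hold at least (burst rate - sustained rate) × duration tokens. Under this interpretation, that is 400 tokens for Starter, 12,000 for Professional, and 240,000 for Enterprise. Class and function names are hypothetical.

```python
class TokenBucket:
    """Token-bucket limiter: refills `rate` tokens/s, stores up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so a burst is available immediately
        self.last = now

    def allow(self, now: float) -> bool:
        """Consume one token at time `now` (seconds); False means reject with 429."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def burst_headroom(rate: float, burst_rate: float, burst_seconds: float) -> float:
    """Tokens needed to sustain `burst_rate` for `burst_seconds` on top of `rate`."""
    return (burst_rate - rate) * burst_seconds


starter = TokenBucket(rate=10, capacity=burst_headroom(10, 50, 10))  # 400 tokens
accepted = sum(starter.allow(0.0) for _ in range(401))
print(accepted)  # 400: the 401st instantaneous request is rejected
```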
Rate Limiting Behavior:
- Requests exceeding the limit receive 429 Too Many Requests
- The Retry-After header indicates how many seconds to wait before retrying
- No executions are lost (requests can be retried)
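Rate-limited requests are rejected rather than executed, so a client can retry them after the indicated wait. The minimal client-side sketch below assumes a hypothetical send callable that returns a status code and a header mapping. It reads Retry-After as a number of seconds, as described above, and does not handle the header's HTTP-date form.

```python
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, response headers)


def _retry_after_seconds(headers: Mapping[str, str]) -> Optional[float]:
    """Read Retry-After as seconds (case-insensitive header lookup)."""
    for name, value in headers.items():
        if name.lower() == "retry-after":
            try:
                return max(0.0, float(value))
            except ValueError:
                return None  # e.g. the HTTP-date form, not parsed in this sketch
    return None


def send_with_retry(
    send: Callable[[], Response],
    sleep: Callable[[float], None],
    max_attempts: int = 5,
    default_wait: float = 1.0,
) -> Response:
    """Call `send`, waiting and retrying while the server answers 429."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        status, headers = send()
        if status != 429 or attempt == max_attempts:
            return status, headers
        wait = _retry_after_seconds(headers)
        sleep(default_wait if wait is None else wait)
    raise RuntimeError("unreachable")
```

In real use, sleep would be time.sleep and send would wrap the HTTP client. Injecting both keeps the retry logic testable without network access or real delays.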
Service Credits
Service credits, where applicable, are defined in the executed contract/order form.
Eligibility
Service credits may be issued when:
- Monthly uptime falls below the applicable SLA target
- API latency (P95) materially exceeds the stated target
- Support response time targets are missed
Exclusions
Service credits are not issued for:
- Events excluded from the SLA
- Customer-caused issues or misconfiguration
- Quota exceeded
- Third-party outages outside the platform's reasonable control
Scheduled Maintenance
Maintenance Windows
Default Schedule:
- Day: Tuesday
- Time: 02:00 - 04:00 UTC (Monday 9 PM - 11 PM ET during standard time; 10 PM - midnight ET during daylight saving time)
- Duration: Up to 2 hours
- Frequency: As needed (typically 1-2x per month)
Advance Notice: 7 days via:
- Email to all tenant admins
- Status page
- Chat/workspace notifications (for Professional/Enterprise)
Maintenance Types
| Type | Activities | Impact | Notice | Counts Toward SLA |
|---|---|---|---|---|
| Low-Impact | Function updates, config changes | None | 7 days | No |
| High-Impact | Database migrations, API updates | 5-10 min downtime | 7 days | No |
| Emergency | Critical security patches | Variable | Best effort | Yes |
Incident Severity Definitions
| Severity | Definition | Examples |
|---|---|---|
| P0 (Critical) | Complete platform outage affecting all tenants | API completely unavailable, data loss |
| P1 (High) | Major feature unavailable affecting many tenants | Workflow execution failing, auth broken |
| P2 (Medium) | Feature degraded or slow for some tenants | High latency, intermittent errors |
| P3 (Low) | Minor issue with workaround | UI glitch, documentation error |
Exclusions
The SLA does not apply to:
- Beta/Preview Features: Experimental features marked as beta
- Customer-Caused Issues: Misconfiguration, invalid API usage
- Quota Exceeded: Customer exceeded agreed limits
- Third-Party Failures: External APIs, chat platforms
- Force Majeure: Natural disasters, war, pandemics
- Scheduled Maintenance: With 7-day advance notice
- Customer Network Issues: ISP outages, firewall blocks
Monitoring & Reporting
Real-Time Monitoring
- Status Page: Public status page with real-time health
- Monitoring Dashboards: Internal metrics for all services
- Synthetic Monitoring: External checks every 60 seconds
Monthly Reports (Enterprise)
Enterprise customers receive monthly reports including:
- Uptime percentage
- P95 latency by endpoint
- Support ticket summary
- Incident post-mortems (if any)
- Usage statistics
Incident Communication
| Severity | Initial Update | Ongoing Updates | Post-Mortem |
|---|---|---|---|
| P0 | 15 minutes | Every 30 minutes | Within 5 business days |
| P1 | 1 hour | Every 2 hours | Within 10 business days |
| P2 | 4 hours | Daily | As needed |
| P3 | 24 hours | Weekly | Not required |