Error Budget Calculator
Calculate and understand your service's error budget
🎯 What is an Error Budget?
An error budget is the acceptable amount of downtime your service can have while still meeting your Service Level Objective (SLO). It answers the question: "How much failure can we afford?"
Key Concept
- SLO = 99.9% means you can afford 43.8 minutes of downtime per month
- This is your "budget" to spend on experiments, deployments, and maintenance
- When you've used up your budget, focus on stability instead of new features
📊 Understanding the Calculator
Main Components
1. SLO Target Selection
Choose from preset SLO targets:
- 99% - Large consumer services (okay with ~7 hours downtime/month)
- 99.5% - Most enterprise services
- 99.9% - Critical business services
- 99.95% - Very critical services
- 99.99% - Ultra-critical infrastructure
2. Key Metrics Displayed
| Metric | What It Means | Use Case |
|---|---|---|
| Allowed Downtime | Minutes you can be down per month | Budget planning |
| In Hours | Same metric in hours | Planning maintenance windows |
| Failure Rate | Percentage of time the service may be down (100% minus the SLO) | Understanding error rate |
3. Progress Bar
Visual representation of your SLO target (100% = perfect uptime)
🔥 Burn Rate: The Critical Metric
Burn Rate = How fast you're consuming your error budget
Burn Rate Categories
| Burn Rate | What It Means | Action |
|---|---|---|
| < 2% | Slow burn (healthy) | ✅ Safe to deploy and experiment |
| 2% to < 5% | Medium burn (caution) | ⚠️ Be thoughtful about deployments |
| 5% to < 10% | Fast burn (alert) | 🔴 Minimize risky changes |
| ≥ 10% | Critical (emergency) | 🚨 Focus on stability only |
How to Calculate Burn Rate
Burn Rate = (Downtime This Month / Monthly Budget) × 100
Example:
- Your SLO: 99.9% = 43.8 minutes/month budget
- Downtime this month: 5 minutes
- Burn Rate: (5 / 43.8) × 100 = 11.4% (Critical Burn - focus on stability!)
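The calculation and the category table can be sketched in a few lines of Python. This is a minimal illustration rather than the calculator's actual code: the names `burn_rate_percent` and `burn_category` are hypothetical, and treating exactly 5% as "fast" is an assumption based on the "5% to < 10%" row.

```python
def burn_rate_percent(downtime_minutes: float, budget_minutes: float) -> float:
    """Share of the monthly error budget already consumed, in percent."""
    if budget_minutes <= 0:
        raise ValueError("budget_minutes must be positive")
    return downtime_minutes / budget_minutes * 100.0


def burn_category(rate_percent: float) -> str:
    """Map a burn rate to the categories in the table above."""
    if rate_percent < 2:
        return "slow"
    if rate_percent < 5:
        return "medium"
    if rate_percent < 10:  # assumption: exactly 5% counts as "fast"
        return "fast"
    return "critical"


# Worked example from the text: 5 minutes down against a 43.8-minute budget
rate = burn_rate_percent(downtime_minutes=5, budget_minutes=43.8)
print(f"{rate:.1f}% -> {burn_category(rate)}")  # 11.4% -> critical
```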
💡 Using Error Budgets for Deployment Decisions
Decision Framework
Is Burn Rate Low? (< 5%)
├─ YES → Deploy frequently, experiment safely
└─ NO → Hold off on risky changes, focus on stability
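As a rough sketch, the same two-way decision could be encoded as a small Python helper. The function name `deployment_decision` and the constant name are hypothetical; only the 5% threshold and the two outcomes come from the framework above.

```python
LOW_BURN_THRESHOLD_PERCENT = 5.0  # "low" threshold from the decision framework


def deployment_decision(burn_rate_percent: float) -> str:
    """Return the framework's recommendation for a given burn rate."""
    if burn_rate_percent < LOW_BURN_THRESHOLD_PERCENT:
        return "Deploy frequently, experiment safely"
    return "Hold off on risky changes, focus on stability"


print(deployment_decision(4.6))
print(deployment_decision(11.4))
```

Keeping the threshold in a named constant makes it easy to tighten or relax the policy without touching the logic.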
Practical Examples
Scenario 1: Healthy Budget
- SLO: 99.9% (43.8 min/month)
- Downtime this month: 2 minutes
- Burn Rate: 4.6% (Medium Burn)
- Decision: ✅ Okay to deploy, but be careful
Scenario 2: Budget Nearly Exhausted
- SLO: 99.9% (43.8 min/month)
- Downtime this month: 40 minutes
- Burn Rate: 91% (Critical Burn)
- Decision: 🚨 Stop deployments, focus on stability
Scenario 3: Budget Warning
- SLO: 99.99% (4.38 min/month)
- Downtime this month: 0.5 minutes
- Burn Rate: 11% (Critical Burn)
- Decision: 🚨 Focus on stability only (at 99.99%, a 30-second outage already lands in the critical band)
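The burn rates in the three scenarios can be reproduced end to end, from SLO to budget to burn rate. The sketch below assumes an average month of 43,800 minutes (365 days / 12); the `Scenario` class and its field names are hypothetical.

```python
from dataclasses import dataclass

MINUTES_PER_MONTH = 365 * 24 * 60 / 12  # 43,800 minutes in an average month


@dataclass
class Scenario:
    name: str
    slo_percent: float
    downtime_minutes: float

    @property
    def budget_minutes(self) -> float:
        """Monthly error budget implied by the SLO."""
        return (100.0 - self.slo_percent) / 100.0 * MINUTES_PER_MONTH

    @property
    def burn_rate_percent(self) -> float:
        """Share of the monthly budget consumed so far."""
        return self.downtime_minutes / self.budget_minutes * 100.0


scenarios = [
    Scenario("Scenario 1", slo_percent=99.9, downtime_minutes=2),
    Scenario("Scenario 2", slo_percent=99.9, downtime_minutes=40),
    Scenario("Scenario 3", slo_percent=99.99, downtime_minutes=0.5),
]

for s in scenarios:
    print(f"{s.name}: budget {s.budget_minutes:.2f} min, "
          f"burn rate {s.burn_rate_percent:.1f}%")
```

Run as written, this gives burn rates of 4.6%, 91.3%, and 11.4% for the three scenarios.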
🧮 The Math Behind It
Calculate Downtime from SLO
Minutes Per Month = (Error Rate / 100) × (365 × 24 × 60) / 12
Example: 99.9% SLO
- Error Rate: 100 - 99.9 = 0.1%
- Minutes Per Month: (0.1 / 100) × 525600 / 12 = 43.8 minutes
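A minimal Python version of this formula might look like the following. The function name is hypothetical, and the code keeps the document's convention of a 365-day year divided into 12 equal months.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def allowed_downtime_minutes_per_month(slo_percent: float) -> float:
    """Monthly error budget in minutes for an SLO given in percent."""
    if not 0 < slo_percent <= 100:
        raise ValueError("slo_percent must be in (0, 100]")
    error_rate_percent = 100.0 - slo_percent
    return (error_rate_percent / 100.0) * MINUTES_PER_YEAR / 12


# Budgets for the calculator's preset targets
for slo in (99.0, 99.5, 99.9, 99.95, 99.99):
    budget = allowed_downtime_minutes_per_month(slo)
    print(f"{slo}% -> {budget:.2f} min/month")
```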
Common SLO Reference Table
| SLO | Per Year | Per Month | Per Week | Per Day |
|---|---|---|---|---|
| 99% | 3.65 days | 7.3 hours | 1.7 hours | 14.4 min |
| 99.5% | 1.83 days | 3.65 hours | 50 min | 7.2 min |
| 99.9% | 8.76 hours | 43.8 min | 10 min | 86 sec |
| 99.95% | 4.38 hours | 21.9 min | 5 min | 43 sec |
| 99.99% | 52.6 min | 4.38 min | 60 sec | 8.6 sec |
| 99.999% | 5.26 min | 26.3 sec | 6 sec | 0.86 sec |
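A table like this can be regenerated from the same formula by swapping the length of the period. The sketch below is one way to do it. The helper names are hypothetical, and the unit thresholds in `humanize` are a presentation choice, not part of the calculator.

```python
MINUTES_PER_PERIOD = {
    "year": 365 * 24 * 60,
    "month": 365 * 24 * 60 / 12,
    "week": 7 * 24 * 60,
    "day": 24 * 60,
}


def allowed_downtime_minutes(slo_percent: float, period: str) -> float:
    """Error budget in minutes for the given SLO and period."""
    return (100.0 - slo_percent) / 100.0 * MINUTES_PER_PERIOD[period]


def humanize(minutes: float) -> str:
    """Render a duration in a readable unit (thresholds are arbitrary)."""
    if minutes < 2:
        return f"{minutes * 60:.1f} sec"
    if minutes < 120:
        return f"{minutes:.1f} min"
    if minutes < 48 * 60:
        return f"{minutes / 60:.2f} hours"
    return f"{minutes / (24 * 60):.2f} days"


for slo in (99.0, 99.5, 99.9, 99.95, 99.99, 99.999):
    cells = [humanize(allowed_downtime_minutes(slo, p)) for p in MINUTES_PER_PERIOD]
    print(f"| {slo}% | " + " | ".join(cells) + " |")
```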
🎓 Best Practices
✅ DO
- Track burn rate daily - Make it visible to the team
- Use budget wisely - Spend on meaningful improvements
- Reset monthly - Start fresh each month
- Communicate openly - Share burn rate status with stakeholders
- Link to deployment decisions - Let burn rate guide your deployment cadence
❌ DON'T
- Ignore your budget - It exists for a reason
- Save the entire budget - You're leaving productivity on the table
- Wait until 100% burned - Start focusing on stability at ~70%
- Blame ops for missing SLO - It's a team responsibility
- Set SLO too high - Be realistic about your infrastructure
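To make "track burn rate daily" concrete, here is a small sketch that turns a per-day downtime log into a cumulative share of the monthly budget and flags the ~70% point mentioned above. The downtime figures are invented for illustration, and the function and constant names are hypothetical.

```python
from itertools import accumulate

STABILITY_THRESHOLD_PERCENT = 70.0  # "start focusing on stability at ~70%"


def cumulative_burn_percent(daily_downtime_minutes, budget_minutes):
    """Share of the monthly budget consumed after each day, in percent."""
    return [
        total / budget_minutes * 100.0
        for total in accumulate(daily_downtime_minutes)
    ]


# Hypothetical downtime log (minutes per day) for a 99.9% SLO (43.8 min budget)
daily_downtime = [0, 3, 0, 0, 12, 0, 20]
burn = cumulative_burn_percent(daily_downtime, budget_minutes=43.8)

for day, pct in enumerate(burn, start=1):
    flag = "stability focus" if pct >= STABILITY_THRESHOLD_PERCENT else "ok"
    print(f"day {day}: {pct:.1f}% of budget used ({flag})")
```

Resetting monthly then amounts to starting a new, empty downtime log on the first day of each month.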
🚀 Advanced Usage
Multi-Service Strategy
If you have multiple services:
| Service | SLO | Budget/Month | Typical Burn Rate |
|---|---|---|---|
| API | 99.99% | 4.38 min | Slow (1-2%) |
| Web UI | 99.9% | 43.8 min | Medium (5%) |
| Background Jobs | 99% | 7.3 hrs | Fast (8-10%) |
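The per-service budgets follow from the same monthly formula, so a multi-service view can be computed in one pass. In the sketch below the SLOs come from the table, while the downtime figures are hypothetical values chosen only to illustrate the calculation. `budget_report` is an assumed helper name.

```python
MINUTES_PER_MONTH = 365 * 24 * 60 / 12  # 43,800


def monthly_budget_minutes(slo_percent: float) -> float:
    """Monthly error budget in minutes for an SLO given in percent."""
    return (100.0 - slo_percent) / 100.0 * MINUTES_PER_MONTH


def budget_report(slos: dict, downtime_minutes: dict) -> list:
    """Return (service, budget_minutes, burn_rate_percent) per service."""
    report = []
    for service, slo in slos.items():
        budget = monthly_budget_minutes(slo)
        burn = downtime_minutes.get(service, 0.0) / budget * 100.0
        report.append((service, budget, burn))
    return report


slos = {"API": 99.99, "Web UI": 99.9, "Background Jobs": 99.0}
# Hypothetical downtime so far this month, in minutes
downtime_so_far = {"API": 0.05, "Web UI": 2.0, "Background Jobs": 40.0}

for service, budget, burn in budget_report(slos, downtime_so_far):
    print(f"{service}: budget {budget:.2f} min, burn rate {burn:.1f}%")
```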
Monthly Planning
Week 1: Review last month's burn
- Did we miss SLO? Why?
- What caused outages?
- Start month fresh
Weeks 2-3: Aggressive deployment phase
- Burn rate is low
- Deploy new features safely
- Run experiments
Week 4: Stability phase
- Burn rate is rising
- Focus on bug fixes
- Stabilize system
📋 Implementation Checklist
🔗 Related Concepts
SLO (Service Level Objective)
The target (e.g., 99.9%)
SLI (Service Level Indicator)
The measurement (e.g., request success rate)
SLA (Service Level Agreement)
The contract (e.g., refund if we miss SLO)
MTTD (Mean Time To Detection)
How fast we notice problems
MTTR (Mean Time To Recovery)
How fast we fix them
💬 Common Questions
Q: Should we always target 99.99%?
A: No. Higher SLO = more operational burden. Target what your customers need, not what's theoretically possible.
Q: What if we never hit our SLO?
A: Either your SLO is too strict, or you need more engineering effort. Consider lowering the SLO or investing in reliability.
Q: Can we use error budget to skip incidents?
A: No. Incidents still happen. Error budget lets you decide when to deploy risky changes, not when to ignore problems.
Q: How do we track burn rate?
A: Calculate: (Downtime This Month / Monthly Budget) × 100. Display it on dashboards, Slack, or status pages.
Q: What's a "good" burn rate?
A: Ideally 0-5% per month. This means you have room to experiment while maintaining reliability.
Last Updated: February 2026
Created For: SRE Teams & Engineering Leaders
Status: Ready to Use