Error Budget Calculator

Calculate and understand your service's error budget

🧮 Calculate Your Error Budget

SLO Target Progress: 99.9%
  • Allowed Downtime: 43.8 minutes/month
  • In Hours: 0.73 hours/month
  • Failure Rate: 0.1% per month

Calculate Burn Rate

Enter this month's actual downtime (in minutes) to calculate your burn rate.

🎯 What is an Error Budget?

An error budget is the acceptable amount of downtime your service can have while still meeting your Service Level Objective (SLO). It answers the question: "How much failure can we afford?"

Key Concept

  • SLO = 99.9% means you can afford 43.8 minutes of downtime per month
  • This is your "budget" to spend on experiments, deployments, and maintenance
  • When you've used up your budget, focus on stability instead of new features

📊 Understanding the Calculator

Main Components

1. SLO Target Selection

Choose from preset SLO targets:

  • 99% - Large consumer services (okay with ~7 hours downtime/month)
  • 99.5% - Most enterprise services
  • 99.9% - Critical business services
  • 99.95% - Very critical services
  • 99.99% - Ultra-critical infrastructure

2. Key Metrics Displayed

| Metric | What It Means | Use Case |
|---|---|---|
| Allowed Downtime | Minutes you can be down per month | Budget planning |
| In Hours | The same allowance expressed in hours | Planning maintenance windows |
| Failure Rate | Share of the month you can be down (100% - SLO) | Understanding error rate |

3. Progress Bar

Visual representation of your SLO target (100% = perfect uptime)

🔥 Burn Rate: The Critical Metric

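Burn Rate = How fast you're consuming your error budget. In this calculator it is expressed as the percentage of the monthly budget already used this month.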

Burn Rate Categories

| Burn Rate | What It Means | Action |
|---|---|---|
| < 2% | Slow burn (healthy) | ✅ Safe to deploy and experiment |
| 2% to < 5% | Medium burn (caution) | ⚠️ Be thoughtful about deployments |
| 5% to < 10% | Fast burn (alert) | 🔴 Minimize risky changes |
| ≥ 10% | Critical (emergency) | 🚨 Focus on stability only |
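The categories above can be expressed as a simple lookup. The sketch below is a minimal Python illustration, not the calculator's own code; it assumes each band includes its lower bound and excludes its upper bound, so every burn rate lands in exactly one category. The function name `classify_burn_rate` is hypothetical.

```python
# Hypothetical helper mirroring the burn rate categories table.
# Assumption: bands are half-open (lower bound inclusive, upper bound exclusive).
BANDS = [
    (2.0, "Slow burn (healthy)", "Safe to deploy and experiment"),
    (5.0, "Medium burn (caution)", "Be thoughtful about deployments"),
    (10.0, "Fast burn (alert)", "Minimize risky changes"),
]


def classify_burn_rate(burn_rate_pct: float) -> tuple[str, str]:
    """Return (category, action) for a burn rate given in percent."""
    if burn_rate_pct < 0:
        raise ValueError("burn rate cannot be negative")
    for upper, category, action in BANDS:
        if burn_rate_pct < upper:
            return category, action
    # Everything at or above 10% is treated as critical.
    return "Critical (emergency)", "Focus on stability only"


for pct in (1.5, 4.6, 5.0, 11.4):
    category, action = classify_burn_rate(pct)
    print(f"{pct:>5.1f}% -> {category}: {action}")
```

Making the bands half-open avoids the ambiguity of a value such as exactly 5%, which would otherwise match two rows.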

How to Calculate Burn Rate

Monthly Burn Rate = (Minutes Down This Month / Total Monthly Budget) × 100
Example:
  • Your SLO: 99.9% = 43.8 minutes/month budget
  • Downtime this month: 5 minutes
  • Burn Rate: (5 / 43.8) × 100 = 11.4% (Critical Burn - focus on stability!)
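As a minimal Python sketch, the formula is a one-line function. `burn_rate_pct` is a hypothetical name, and both arguments are assumed to be in minutes.

```python
def burn_rate_pct(downtime_min: float, budget_min: float) -> float:
    """Share of the monthly error budget used so far, in percent.

    Both arguments are assumed to be in minutes.
    """
    if budget_min <= 0:
        raise ValueError("monthly budget must be positive")
    return downtime_min / budget_min * 100


# Worked example from the text: 99.9% SLO (43.8 min budget), 5 min down.
print(f"{burn_rate_pct(5, 43.8):.1f}%")  # 11.4%
```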

💡 Using Error Budgets for Deployment Decisions

Decision Framework

Is Burn Rate Low? (< 5%)
├─ YES → Deploy frequently, experiment safely
└─ NO → Hold off on risky changes, focus on stability

Practical Examples

Scenario 1: Healthy Budget

  • SLO: 99.9% (43.8 min/month)
  • Downtime this month: 2 minutes
  • Burn Rate: 4.6% (Medium Burn)
  • Decision: ✅ Okay to deploy, but be careful

Scenario 2: Budget Nearly Exhausted

  • SLO: 99.9% (43.8 min/month)
  • Downtime this month: 40 minutes
  • Burn Rate: 91% (Critical Burn)
  • Decision: 🚨 Stop deployments, focus on stability

Scenario 3: Tight Budget

  • SLO: 99.99% (4.38 min/month)
  • Downtime this month: 0.5 minutes
  • Burn Rate: 11% (Critical Burn)
  • Decision: 🚨 Pause risky deployments, focus on stability
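To double-check the three scenarios, the sketch below recomputes each budget from its SLO, applies the burn rate formula, and maps the result onto the burn rate categories and the < 5% threshold from the decision framework. It is a self-contained Python illustration; the helper names are hypothetical, and it assumes the same 365-day year split into 12 equal months used elsewhere on this page.

```python
MINUTES_PER_MONTH = 365 * 24 * 60 / 12  # 43,800: average month in a 365-day year


def monthly_budget_min(slo_pct: float) -> float:
    """Allowed downtime per month, in minutes, for an SLO given in percent."""
    return (100 - slo_pct) / 100 * MINUTES_PER_MONTH


def category(burn_pct: float) -> str:
    # Half-open bands matching the burn rate categories table (assumption).
    if burn_pct < 2:
        return "Slow"
    if burn_pct < 5:
        return "Medium"
    if burn_pct < 10:
        return "Fast"
    return "Critical"


# (label, SLO %, downtime so far this month in minutes), taken from the scenarios.
scenarios = [
    ("Scenario 1", 99.9, 2.0),
    ("Scenario 2", 99.9, 40.0),
    ("Scenario 3", 99.99, 0.5),
]

for label, slo, down in scenarios:
    budget = monthly_budget_min(slo)
    burn = down / budget * 100
    # Decision framework: a burn rate below 5% counts as "low".
    decision = "deploy" if burn < 5 else "hold off on risky changes"
    print(f"{label}: budget {budget:.2f} min, burn {burn:.1f}% "
          f"({category(burn)}) -> {decision}")
```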

🧮 The Math Behind It

Calculate Downtime from SLO

Error Rate = 100 - SLO
Minutes Per Month = (Error Rate / 100) × (365 × 24 × 60) / 12
Example: 99.9% SLO
  • Error Rate: 100 - 99.9 = 0.1%
  • Minutes Per Month: (0.1 / 100) × 525600 / 12 = 43.8 minutes
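The same arithmetic drives the three numbers the calculator displays. A minimal Python sketch (the function name `error_budget` is hypothetical), assuming a 365-day year divided into 12 equal months, which is where the 525,600 and 43,800 minute figures come from:

```python
MINUTES_PER_YEAR = 365 * 24 * 60           # 525,600
MINUTES_PER_MONTH = MINUTES_PER_YEAR / 12  # 43,800 (average month)


def error_budget(slo_pct: float) -> dict:
    """Return the calculator's three metrics for an SLO given in percent."""
    if not 0 < slo_pct < 100:
        raise ValueError("SLO must be strictly between 0 and 100")
    error_rate_pct = 100 - slo_pct
    minutes = error_rate_pct / 100 * MINUTES_PER_MONTH
    # Rounding is for display only; it hides floating-point noise such as 43.79999...
    return {
        "failure_rate_pct": round(error_rate_pct, 4),
        "minutes_per_month": round(minutes, 2),
        "hours_per_month": round(minutes / 60, 2),
    }


for slo in (99, 99.5, 99.9, 99.95, 99.99):
    print(slo, error_budget(slo))
```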

Common SLO Reference Table

| SLO | Per Year | Per Month | Per Week | Per Day |
|---|---|---|---|---|
| 99% | 3.65 days | 7.3 hours | 1.7 hours | 14.4 min |
| 99.5% | 1.83 days | 3.65 hours | 50.4 min | 7.2 min |
| 99.9% | 8.76 hours | 43.8 min | 10 min | 86 sec |
| 99.95% | 4.38 hours | 21.9 min | 5 min | 43 sec |
| 99.99% | 52.6 min | 4.38 min | 60.5 sec | 8.6 sec |
| 99.999% | 5.26 min | 26.3 sec | 6 sec | 0.86 sec |
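Each cell in the table is (100 - SLO)% of the period's length. The Python sketch below regenerates the rows; it assumes a 365-day year and 12 equal months, and it formats values to three significant figures, so a few cells (such as 1.68 hours versus 1.7 hours) round differently from the table. The helper names are hypothetical.

```python
# Period lengths in seconds; assumes a 365-day year split into 12 equal months.
PERIOD_SECONDS = {
    "Per Year": 365 * 24 * 3600,
    "Per Month": 365 * 24 * 3600 / 12,
    "Per Week": 7 * 24 * 3600,
    "Per Day": 24 * 3600,
}


def humanize(seconds: float) -> str:
    """Format a duration using the largest unit that keeps the value >= 1."""
    for unit, size in (("days", 86400), ("hours", 3600), ("min", 60)):
        if seconds >= size:
            return f"{seconds / size:.3g} {unit}"
    return f"{seconds:.3g} sec"


def allowed_downtime(slo_pct: float) -> dict[str, str]:
    """Allowed downtime for each period, as human-readable strings."""
    error_fraction = (100 - slo_pct) / 100
    return {period: humanize(error_fraction * secs) for period, secs in PERIOD_SECONDS.items()}


for slo in (99, 99.5, 99.9, 99.95, 99.99, 99.999):
    print(f"{slo}%", allowed_downtime(slo))
```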

🎓 Best Practices

✅ DO

  • Track burn rate daily - Make it visible to the team
  • Use budget wisely - Spend on meaningful improvements
  • Reset monthly - Start fresh each month
  • Communicate openly - Share burn rate status with stakeholders
  • Link to deployment decisions - Let burn rate guide your deployment cadence

❌ DON'T

  • Ignore your budget - It exists for a reason
  • Save the entire budget - You're leaving productivity on the table
  • Wait until 100% burned - Start focusing on stability at ~70%
  • Blame ops for missing SLO - It's a team responsibility
  • Set SLO too high - Be realistic about your infrastructure

🚀 Advanced Usage

Multi-Service Strategy

If you have multiple services:

| Service | SLO | Budget/Month | Typical Burn Rate |
|---|---|---|---|
| API | 99.99% | 4.38 min | Slow (< 2%) |
| Web UI | 99.9% | 43.8 min | Medium (2% to < 5%) |
| Background Jobs | 99% | 7.3 hours | Fast (5% to < 10%) |
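Per-service budgets also show why the same incident can matter very differently across services. The Python sketch below computes each service's monthly budget from the table and the burn rate that a hypothetical 3-minute outage would cause for each; the outage length and the helper name `monthly_budgets` are illustrative assumptions.

```python
MINUTES_PER_MONTH = 365 * 24 * 60 / 12  # 43,800: average month in a 365-day year

# SLOs from the multi-service table above.
SERVICES = {"API": 99.99, "Web UI": 99.9, "Background Jobs": 99.0}


def monthly_budgets(services: dict[str, float]) -> dict[str, float]:
    """Map each service name to its monthly error budget in minutes."""
    return {name: (100 - slo) / 100 * MINUTES_PER_MONTH for name, slo in services.items()}


OUTAGE_MIN = 3.0  # hypothetical outage length, for illustration only

for name, budget in monthly_budgets(SERVICES).items():
    burn = OUTAGE_MIN / budget * 100
    print(f"{name:<16} budget {budget:8.2f} min ({budget / 60:.2f} h), "
          f"3-min outage burns {burn:.1f}%")
```

Under these assumptions, a 3-minute outage consumes roughly two thirds of the API's monthly budget but well under 1% of the background jobs' budget.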

Monthly Planning

Week 1: Review last month's burn

  • Did we miss SLO? Why?
  • What caused outages?
  • Start month fresh

Week 2-3: Aggressive deployment phase

  • Burn rate is low
  • Deploy new features safely
  • Run experiments

Week 4: Stability phase

  • Burn rate is rising
  • Focus on bug fixes
  • Stabilize system


🔗 Related Concepts

SLO (Service Level Objective)

The target (e.g., 99.9%)

SLI (Service Level Indicator)

The measurement (e.g., request success rate)

SLA (Service Level Agreement)

The contract with customers (e.g., a refund or service credit if the agreed service level is missed)

MTTD (Mean Time To Detection)

How fast we notice problems

MTTR (Mean Time To Recovery)

How fast we fix them

💬 Common Questions

Q: Should we always target 99.99%?

A: No. Higher SLO = more operational burden. Target what your customers need, not what's theoretically possible.

Q: What if we never hit our SLO?

A: Either your SLO is too strict, or you need more engineering effort. Consider lowering the SLO or investing in reliability.

Q: Can we use the error budget as a reason to ignore incidents?

A: No. Incidents still happen and still need a response. The error budget helps you decide when to ship risky changes, not when to ignore problems.

Q: How do we track burn rate?

A: Calculate: (Downtime This Month / Monthly Budget) × 100. Display it on dashboards, Slack, or status pages.
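One way to produce that number for a dashboard or chat message is to sum the downtime of this month's incidents. The Python sketch below assumes a hypothetical `Incident` record whose durations you would pull from your own monitoring or incident tooling; it is an illustration, not an integration with any specific product.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    # Hypothetical record type; real durations would come from your own tooling.
    description: str
    downtime_min: float


def burn_rate_status(incidents: list[Incident], slo_pct: float) -> str:
    """One-line burn rate summary for the current month."""
    budget_min = (100 - slo_pct) / 100 * (365 * 24 * 60 / 12)  # monthly budget in minutes
    used = sum(i.downtime_min for i in incidents)
    burn = used / budget_min * 100
    remaining = max(budget_min - used, 0)  # never report a negative remainder
    return (f"SLO {slo_pct}%: {used:.1f} of {budget_min:.1f} min used "
            f"(burn rate {burn:.1f}%, {remaining:.1f} min left)")


incidents = [Incident("deploy rollback", 3.0), Incident("DNS blip", 2.0)]
print(burn_rate_status(incidents, 99.9))
```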

Q: What's a "good" burn rate?

A: Ideally 0-5% per month. This means you have room to experiment while maintaining reliability.

Last Updated: February 2026

Created For: SRE Teams & Engineering Leaders

Status: Ready to Use