Circuit Breaker Pattern
Stop calling failing services before they take down your entire system.
Tiny Summary
Circuit Breaker automatically stops calling a failing service to prevent cascading failures. Like a house circuit breaker, it "trips" when detecting failures, gives the system time to recover, then carefully tests if it's safe to resume.
The Problem
Your app calls a payment processor. The processor goes down. What happens next?
Without Circuit Breaker:
- Request to payment processor times out (30 seconds)
- Your server waits, blocking a thread
- More requests come in, all wait
- All threads blocked, your server can't handle ANY requests
- Your entire app is down because ONE dependency failed
This is a cascading failure. One service kills everything.
How Circuit Breakers Work
Three states: Closed (working), Open (failing), Half-Open (testing recovery).
Normal Operation (Closed)
→ Failures exceed threshold
→ Open (reject requests immediately)
→ Wait timeout period
→ Half-Open (try 1 request)
→ Success? → Closed
→ Failure? → Open
Closed (Normal):
- Requests flow through normally
- Track failures (50% failure rate over the last 100 requests? Trip!)
Open (Failing):
- Immediately reject requests without calling service
- Fail fast (return an error in milliseconds, not after a 30-second timeout)
- Give failing service time to recover
Half-Open (Testing):
- After timeout (e.g., 60 seconds), try ONE request
- Success → back to Closed, resume normal traffic
- Failure → back to Open, wait longer (a code sketch of this state machine follows)
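Here is a minimal sketch of that state machine in plain JavaScript. The class name SimpleBreaker, its defaults, and the rolling-window bookkeeping are illustrative assumptions for this article, not a production library (real libraries are listed under Implementation below):

// A minimal, illustrative circuit breaker; names and defaults are assumptions for this sketch.
class SimpleBreaker {
  constructor(fn, { failureThreshold = 0.5, windowSize = 100, minSamples = 10, resetTimeoutMs = 60000 } = {}) {
    this.fn = fn;                      // the protected call (e.g., an HTTP request)
    this.failureThreshold = failureThreshold;
    this.windowSize = windowSize;      // judge failure rate over this many recent requests
    this.minSamples = minSamples;      // don't trip on the first stray error
    this.resetTimeoutMs = resetTimeoutMs;
    this.state = 'CLOSED';
    this.results = [];                 // rolling window: true = success, false = failure
    this.openedAt = 0;
  }

  async call(...args) {
    if (this.state === 'OPEN') {
      if (Date.now() - this.openedAt < this.resetTimeoutMs) {
        throw new Error('Circuit open: failing fast');   // reject in milliseconds, no network call
      }
      this.state = 'HALF_OPEN';                          // timeout elapsed: allow one test request
    }
    try {
      const result = await this.fn(...args);
      this.record(true);
      if (this.state === 'HALF_OPEN') this.state = 'CLOSED';   // test succeeded: resume traffic
      return result;
    } catch (err) {
      this.record(false);
      const enoughData = this.results.length >= this.minSamples;
      if (this.state === 'HALF_OPEN' || (enoughData && this.failureRate() >= this.failureThreshold)) {
        this.state = 'OPEN';                             // trip and start the recovery timer
        this.openedAt = Date.now();
      }
      throw err;
    }
  }

  record(ok) {
    this.results.push(ok);
    if (this.results.length > this.windowSize) this.results.shift();
  }

  failureRate() {
    return this.results.filter(ok => !ok).length / this.results.length;
  }
}

Usage is a single wrap around the real call, for example const stripe = new SimpleBreaker(callStripeAPI) and then await stripe.call(order); both function names here are placeholders.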
Real Example
Stripe payment processing:
10:00 AM - Circuit Closed
Requests: ✅✅✅✅✅ (all succeed)
10:15 AM - Stripe starts failing
Requests: ✅✅❌❌❌❌❌ (50% failure rate)
Circuit trips → OPEN
10:15-10:16 AM - Circuit Open
All payment requests are instantly rejected with a friendly error:
"Payment processing temporarily unavailable, try again in a moment"
(Your app stays up, users can browse the catalog)
10:16 AM - Circuit tries Half-Open
Test request: ✅ Success!
Circuit → CLOSED, resume normal operation
The difference:
- Without a breaker: 30-second timeouts, all threads blocked, site down
- With a breaker: instant failures, users see an error but the app keeps working, site up
Configuration
Failure threshold: 50% of the last 100 requests failing → trip
Timeout period: 60 seconds before trying Half-Open
Success threshold: 1 successful request in Half-Open → close circuit
What counts as a failure (sketched in code after this list):
- Timeouts (request took over 5 seconds)
- 500 errors from service
- Network errors
- NOT 400 errors (those are client errors, not service failures)
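As a sketch of how these settings and the failure classification could be wired up, here is an illustrative config object plus a hypothetical isCircuitFailure helper. The option names mirror the SimpleBreaker sketch above, and the error fields checked are assumptions about your HTTP client, not any library's API:

// Illustrative configuration mirroring the values above.
const breakerConfig = {
  failureThreshold: 0.5,   // trip when 50% of the rolling window has failed
  windowSize: 100,         // judge failure rate over the last 100 requests
  resetTimeoutMs: 60000,   // wait 60 seconds before the Half-Open test request
  callTimeoutMs: 5000      // treat calls slower than 5 seconds as failures
};

// Decide whether an error should count against the circuit.
// Client errors (4xx) are the caller's fault and should not trip the breaker.
function isCircuitFailure(err) {
  if (err.name === 'TimeoutError') return true;                               // slow call
  if (err.code === 'ECONNREFUSED' || err.code === 'ECONNRESET') return true;  // network error
  if (typeof err.statusCode === 'number') {
    return err.statusCode >= 500;                                             // 5xx trips, 4xx does not
  }
  return true;                                                                // unknown errors count as failures
}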
When to Use
Essential for:
- External API calls (payment processors, email services, SMS)
- Microservice communication (Service A calls Service B)
- Database queries (if database struggles, fail fast)
Not needed for:
- Reading from cache (already fast)
- Local computations (no network involved)
- Critical operations that MUST be attempted (use retries instead)
Graceful Degradation
Circuit breaker enables fallback behavior.
Payment processing down:
- Don't: Block checkout completely
- Do: Queue payment for later, email invoice, let user continue
Recommendation engine down:
- Don't: Show error on homepage
- Do: Show popular items instead of personalized recommendations
Email service down:
- Don't: Fail user signup
- Do: Create account, queue welcome email for later
Users don't notice the degradation — they get a working app with slightly reduced features.
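For example, the recommendation fallback could look like the sketch below, where recommendationBreaker and getPopularItems are hypothetical names for your wrapped service call and your cached fallback:

// Personalized recommendations when the service is healthy, popular items otherwise.
async function recommendationsFor(userId) {
  try {
    // recommendationBreaker wraps the remote recommendation service
    // (e.g., new SimpleBreaker(fetchRecommendations) from the sketch above).
    return await recommendationBreaker.call(userId);
  } catch (err) {
    // Circuit open or call failed: serve a cheap, cached list instead of an error page.
    return getPopularItems();
  }
}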
Implementation
Popular libraries:
- Node.js: opossum
- Python: pybreaker
- Java: Resilience4j, Hystrix
- Go: gobreaker
Basic setup (Node.js):
const CircuitBreaker = require('opossum');

// callStripeAPI is your own function that makes the protected network call;
// alertTeam is your own alerting hook.
const breaker = new CircuitBreaker(callStripeAPI, {
  timeout: 5000,                 // calls slower than 5 seconds count as failures
  errorThresholdPercentage: 50,  // trip when 50% of requests fail
  resetTimeout: 60000            // try a Half-Open test request after 60 seconds
});

breaker.on('open', () => {
  console.log('Circuit opened! Stripe is down.');
  alertTeam('Payment processing degraded');
});

breaker.on('close', () => {
  console.log('Circuit closed! Stripe recovered.');
});
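Requests then go through the breaker instead of calling the API directly. opossum's fire and fallback methods are real library calls; processPayment and queuePaymentForLater are illustrative names for your own code:

// Degraded path used automatically when the circuit is open or the call fails.
breaker.fallback((order) => queuePaymentForLater(order));

async function processPayment(order) {
  // Resolves with callStripeAPI(order) on success, or with the fallback's result.
  return breaker.fire(order);
}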
Common Mistakes
Opening circuit too aggressively
Don't trip on 1 error. Allow some failures (networks are unreliable). Trip at sustained failure rates (50% over 100 requests).
Not monitoring circuit state
If your circuit is open, you need to know. Alert on state changes. Log to metrics.
No fallback behavior
Circuit opens, requests fail... then what? Always have a fallback: cached data, degraded features, queue for later.
Using retries AND circuit breaker without coordination
Retries fight circuit breakers. If the circuit is open, don't retry: accept the failure and use the fallback (see the sketch below).
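One way to coordinate the two, sketched against the SimpleBreaker class above (the helper name, attempt count, and backoff values are illustrative):

// Retry transient failures, but stop as soon as the circuit opens.
async function retryWithBreaker(breaker, args, { attempts = 3, delayMs = 200 } = {}) {
  for (let i = 0; i < attempts; i++) {
    try {
      return await breaker.call(...args);
    } catch (err) {
      if (breaker.state === 'OPEN') throw err;   // circuit open: use your fallback, don't hammer the service
      if (i === attempts - 1) throw err;         // out of attempts
      await new Promise(resolve => setTimeout(resolve, delayMs * (i + 1)));  // simple linear backoff
    }
  }
}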
Key Insights
Circuit breakers prevent your dependencies from taking down your app. One failing service shouldn't kill everything.
Fail fast is better than slow timeouts. Users prefer a quick error over a 30-second loading spinner.
Monitor circuit state — if circuits are frequently open, your dependencies are unreliable. Fix or find alternatives.