Rate Limiting
Stop attackers from running up your API bill and crashing your servers.
Tiny Summary
Rate limiting restricts how many requests a user/IP can make in a time window. It prevents abuse, protects infrastructure, and controls costs. Without it, one attacker (or buggy client) can take down your entire service.
The Problem
Without rate limiting:
A bot hits your API 10,000 times per second:
- Your servers melt from load
- Your database crashes
- Legitimate users can't access the app
- Your AWS bill explodes to $50,000
- Your business is offline
With rate limiting:
Same bot hits your API:
- First 100 requests/minute: ✅ Accepted
- Everything after that: ❌ Rejected (429 Too Many Requests)
- Your servers handle normal load
- Legitimate users unaffected
- AWS bill stays normal
- Your business stays online
Real Attacks Prevented
Credential stuffing:
- Attacker tries 1M username/password combos
- Without rate limit: All attempts go through, accounts get compromised
- With rate limit: 10 attempts per IP per minute, attack useless
API abuse:
- Someone scrapes your entire product catalog
- Without rate limit: They download everything, resell your data
- With rate limit: Can only fetch 100 products/hour, scraping impractical
DDoS:
- Botnet sends 100k requests per second
- Without rate limit: Servers crash, site goes down
- With rate limit: Excess requests rejected instantly, site stays up
Cost control:
- Your mobile app has a bug, makes infinite API calls
- Without rate limit: $10k AWS bill from one user's broken app
- With rate limit: User hits limit, you get alerted, fix bug before damage
Rate Limiting Strategies
1. Fixed Window
Simple but has edge case issues.
Time windows: 10:00-10:01, 10:01-10:02, etc.
Limit: 100 requests per minute
10:00:00 - 10:00:59: 100 requests ✅
10:01:00 - 10:01:59: 100 requests ✅
Edge case problem:
10:00:30 - 10:00:59: 100 requests ✅
10:01:00 - 10:01:29: 100 requests ✅
= 200 requests in 60 seconds (double the limit!)
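A minimal in-memory sketch of a fixed-window counter (single process, for illustration; production code would typically keep these counters in something shared like Redis, with expiring keys):

```python
import time
from collections import defaultdict

LIMIT = 100   # requests allowed per window
WINDOW = 60   # window length in seconds

# (client_id, window_number) -> count. Old keys are never evicted here;
# real code would expire them (e.g. Redis keys with a TTL).
counters = defaultdict(int)

def allow_fixed_window(client_id: str) -> bool:
    window = int(time.time()) // WINDOW   # which fixed window we're in
    key = (client_id, window)
    if counters[key] >= LIMIT:
        return False   # over the limit -> respond 429
    counters[key] += 1
    return True
```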
2. Sliding Window
More accurate, prevents edge case.
Limit: 100 requests per 60-second window
Track timestamps of all requests.
At any moment, count requests in last 60 seconds.
10:00:30: Check requests since 09:59:30
10:01:00: Check requests since 10:00:00
No edge case — always enforces true rate.
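A sketch of the sliding-window-log variant described above, keeping a timestamp per request (memory-hungry at scale; counter-based approximations trade accuracy for space):

```python
import time
from collections import defaultdict, deque

LIMIT = 100     # requests per window
WINDOW = 60.0   # seconds

# client_id -> timestamps of that client's recent requests
request_log = defaultdict(deque)

def allow_sliding_window(client_id: str) -> bool:
    now = time.time()
    log = request_log[client_id]
    # Evict timestamps that fell out of the last 60 seconds
    while log and log[0] <= now - WINDOW:
        log.popleft()
    if len(log) >= LIMIT:
        return False
    log.append(now)
    return True
```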
3. Token Bucket
Most flexible, allows bursts.
Bucket holds 100 tokens, refills at 10 tokens/second.
Request comes in:
- Has token? → Accept, remove token
- No token? → Reject (429)
Allows bursts (use 50 tokens instantly if available)
Then throttles to refill rate (10/sec)
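A sketch of a token bucket with the numbers above (100-token capacity, 10 tokens/second refill). Refilling lazily on each request avoids a background timer:

```python
import time

class TokenBucket:
    def __init__(self, capacity: float = 100.0, refill_rate: float = 10.0):
        self.capacity = capacity        # bucket size = maximum burst
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity          # start full
        self.last_refill = time.time()

    def allow(self) -> bool:
        now = time.time()
        # Refill proportionally to elapsed time, never above capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1   # spend a token -> accept
            return True
        return False           # bucket empty -> reject (429)
```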
4. Leaky Bucket
Smooths out traffic, no bursts.
Requests go into queue (bucket).
Leak out at fixed rate (10/sec).
If bucket full → reject new requests
Guarantees smooth output rate.
Use when you need predictable load.
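A leaky-bucket sketch under the same assumptions (lazy draining on each call instead of a background worker; in real code a worker would pull from the queue and handle the requests):

```python
import time
from collections import deque

class LeakyBucket:
    def __init__(self, capacity: int = 100, leak_rate: float = 10.0):
        self.capacity = capacity    # max queued requests
        self.leak_rate = leak_rate  # requests drained per second
        self.queue = deque()
        self.last_leak = time.time()

    def offer(self, request) -> bool:
        self._leak()
        if len(self.queue) >= self.capacity:
            return False            # bucket overflows -> reject
        self.queue.append(request)  # queued; drained at the fixed rate
        return True

    def _leak(self):
        now = time.time()
        if not self.queue:
            self.last_leak = now    # nothing waiting, nothing to drain
            return
        drained = int((now - self.last_leak) * self.leak_rate)
        for _ in range(min(drained, len(self.queue))):
            self.queue.popleft()    # a worker would actually handle these
        if drained:
            self.last_leak = now
```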
Implementation
Where to implement:
- Application level (easy, per-server limits)
- API Gateway (AWS API Gateway, Kong, Nginx)
- CDN/Proxy (Cloudflare, Fastly; best for DDoS protection)
What to limit by:
- IP address (prevent one attacker from spamming)
- User ID (logged-in users, fair usage)
- API key (third-party integrations)
- Endpoint (expensive endpoints get lower limits)
Common limits:
- Anonymous users: 10 requests/minute
- Logged-in users: 100 requests/minute
- Premium users: 1000 requests/minute
- Admin endpoints: 5 requests/minute (more sensitive)
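A sketch of how "what to limit by" and "common limits" combine into a single lookup. The fields `request.user`, `request.remote_addr`, and `is_premium` are hypothetical stand-ins for whatever your framework provides:

```python
LIMITS = {  # requests per minute, mirroring the tiers above
    "anonymous": 10,
    "logged_in": 100,
    "premium": 1000,
}

def limit_for(request):
    """Pick the rate-limit key and quota for one incoming request."""
    if request.user is None:
        # Anonymous traffic: limit by IP address
        return f"ip:{request.remote_addr}", LIMITS["anonymous"]
    tier = "premium" if request.user.is_premium else "logged_in"
    # Logged-in traffic: key by user ID so users behind one NAT aren't lumped together
    return f"user:{request.user.id}", LIMITS[tier]
```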
Response to Rate Limit
Return 429 status:
HTTP/1.1 429 Too Many Requests
Retry-After: 60
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1642345678
{
"error": "Rate limit exceeded. Try again in 60 seconds."
}
Headers to include:
- X-RateLimit-Limit: Total allowed requests
- X-RateLimit-Remaining: Requests left in window
- X-RateLimit-Reset: Timestamp when limit resets
- Retry-After: Seconds until they can retry
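As one possible implementation, a minimal Flask error handler producing the response above (the limit values are hardcoded for illustration; real code would pull them from the limiter):

```python
import time
from flask import Flask, jsonify

app = Flask(__name__)

@app.errorhandler(429)
def rate_limited(e):
    retry_after = 60  # illustrative; derive from the limiter in real code
    resp = jsonify(error=f"Rate limit exceeded. Try again in {retry_after} seconds.")
    resp.status_code = 429
    resp.headers["Retry-After"] = str(retry_after)
    resp.headers["X-RateLimit-Limit"] = "100"
    resp.headers["X-RateLimit-Remaining"] = "0"
    resp.headers["X-RateLimit-Reset"] = str(int(time.time()) + retry_after)
    return resp
```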
Layered Rate Limiting
Different limits for different severity.
Layer 1: Global (per IP)
- 1000 requests/minute across all endpoints
- Catches broad attacks
Layer 2: Per Endpoint
- Login: 5 attempts/minute (prevent brute force)
- Search: 100 requests/minute
- Create order: 10 requests/minute
- Profile view: 200 requests/minute
Layer 3: Per User
- Free tier: 100 API calls/day
- Pro tier: 10,000 API calls/day
- Enterprise: Unlimited
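A sketch tying the layers together: one shared sliding-window `allow()` keyed by different strings gives all three layers. The specific endpoint limits and the ordering are assumptions for illustration:

```python
import time
from collections import defaultdict, deque

logs = defaultdict(deque)   # limit key -> recent request timestamps

def allow(key: str, limit: int, window: float = 60.0) -> bool:
    now = time.time()
    log = logs[key]
    while log and log[0] <= now - window:
        log.popleft()
    if len(log) >= limit:
        return False
    log.append(now)
    return True

ENDPOINT_LIMITS = {"/login": 5, "/search": 100, "/orders": 10, "/profile": 200}

def allow_request(ip: str, user_id: str, path: str, daily_quota: int) -> bool:
    return (
        allow(f"ip:{ip}", 1000)                                    # layer 1: global per IP
        and allow(f"{path}:{ip}", ENDPOINT_LIMITS.get(path, 100))  # layer 2: per endpoint
        and allow(f"user:{user_id}", daily_quota, window=86_400)   # layer 3: daily per-user quota
    )
```

Note the short-circuit: a request rejected at layer 2 has already consumed a slot at layer 1. That's usually what you want, since attack traffic should count against the global limit anyway.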
Real-World Example
Stripe API:
Rate limits:
- Default: 100 requests/second
- Bursts: Up to 1000 requests in short window
- Per endpoint: Some endpoints have lower limits
- Response headers show remaining quota
Twitter API:
Rate limits:
- 15 requests per 15-minute window (user timeline)
- 900 requests per 15-minute window (search)
- Different limits for different tiers (free vs. paid)
Monitoring and Alerting
Track these metrics:
- Requests hitting rate limit (high % = too restrictive or under attack)
- Top IPs by request volume (identify attackers or bugs)
- Rejected requests per endpoint (tune limits)
Alert when:
- >10% of requests hitting rate limit (maybe under attack)
- Single IP hits limit repeatedly (likely attacker, block at firewall)
- Legitimate users complaining (limits too strict, need to raise)
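An in-process sketch of these counters (a real system would emit them to a metrics backend like Prometheus or Datadog rather than keep them in memory):

```python
from collections import Counter

requests_by_ip = Counter()       # top IPs by volume
rejected_by_endpoint = Counter() # where limits bite, for tuning
totals = {"requests": 0, "rejected": 0}

def record(ip: str, path: str, allowed: bool) -> None:
    totals["requests"] += 1
    requests_by_ip[ip] += 1
    if not allowed:
        totals["rejected"] += 1
        rejected_by_endpoint[path] += 1

def should_alert() -> bool:
    # Fire when more than 10% of traffic is being rate limited
    return totals["requests"] > 0 and totals["rejected"] / totals["requests"] > 0.10
```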
Common Mistakes
No rate limiting at all
"We're too small to be attacked" → Wrong. Bots scan all IPs, you'll be hit eventually.
Limits too strict
Legitimate users hit limits → bad experience → churn
Limits too loose
Attackers still get through → defeats the purpose
Not differentiating users
Treat premium customers same as anonymous users → premium customers annoyed
Not counting failed requests
If failed logins (bad password) don't count against the limit, attackers get unlimited password guesses. Count failed logins against the limit; that's exactly what stops brute force.
Key Insights
Rate limiting is mandatory for production apps. Without it, you're one attack away from a $50k AWS bill or total downtime.
Start conservative, monitor, adjust. Better to have limits too strict and raise them than too loose and get exploited.
Layer your limits: global (prevent DDoS), per-endpoint (protect sensitive operations), per-user (fair usage).
Use the simulation to see how different rate limiting strategies perform under attack!