
Rate Limiting

Stop attackers from running up your API bill and crashing your servers.

Tiny Summary

Rate limiting restricts how many requests a user/IP can make in a time window. It prevents abuse, protects infrastructure, and controls costs. Without it, one attacker (or buggy client) can take down your entire service.


The Problem

Without rate limiting:

A bot hits your API 10,000 times per second:

  • Your servers melt from load
  • Your database crashes
  • Legitimate users can't access the app
  • Your AWS bill explodes to $50,000
  • Your business is offline

With rate limiting:

Same bot hits your API:

  • First 100 requests each minute: ✅ Accepted
  • Everything beyond that: ❌ Rejected (429 Too Many Requests)
  • Your servers handle normal load
  • Legitimate users unaffected
  • AWS bill stays normal
  • Your business stays online

Real Attacks Prevented

Credential stuffing:

  • Attacker tries 1M username/password combos
  • Without rate limit: All attempts go through, accounts get compromised
  • With rate limit: 10 attempts per IP per minute, attack useless

API abuse:

  • Someone scrapes your entire product catalog
  • Without rate limit: They download everything, resell your data
  • With rate limit: Can only fetch 100 products/hour, scraping impractical

DDoS:

  • Botnet sends 100k requests per second
  • Without rate limit: Servers crash, site goes down
  • With rate limit: Excess requests rejected instantly, site stays up

Cost control:

  • Your mobile app has a bug, makes infinite API calls
  • Without rate limit: $10k AWS bill from one user's broken app
  • With rate limit: User hits limit, you get alerted, fix bug before damage

Rate Limiting Strategies

1. Fixed Window

Simple but has edge case issues.

Time windows: 10:00-10:01, 10:01-10:02, etc.
Limit: 100 requests per minute

10:00:00 - 10:00:59: 100 requests ✅
10:01:00 - 10:01:59: 100 requests ✅

Edge case problem:
10:00:30 - 10:00:59: 100 requests ✅
10:01:00 - 10:01:29: 100 requests ✅
= 200 requests in 60 seconds (double the limit!)
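The fixed-window counter above can be sketched in a few lines. This is a minimal in-memory version (names are illustrative); a production deployment would keep counts in shared storage such as Redis and expire old windows:

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Fixed window: one counter per (key, window) pair, reset at each boundary."""

    def __init__(self, limit=100, window=60):
        self.limit = limit               # max requests per window
        self.window = window             # window length in seconds
        self.counts = defaultdict(int)   # (key, window index) -> request count

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        bucket = (key, int(now // self.window))
        if self.counts[bucket] >= self.limit:
            return False                 # over the limit for this window
        self.counts[bucket] += 1
        return True
```

The edge case shows up directly in this sketch: with a limit of 2, two requests just before a window boundary and two just after all pass, because each pair lands in a different window.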

2. Sliding Window

More accurate, prevents edge case.

Limit: 100 requests per 60-second window

Track timestamps of all requests.
At any moment, count requests in last 60 seconds.

10:00:30: Check requests since 09:59:30
10:01:00: Check requests since 10:00:00

No edge case — always enforces true rate.
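A sliding-window log can be sketched by keeping the request timestamps themselves (memory grows with the limit, which is why large deployments often approximate this with a weighted count over two fixed windows):

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Sliding window log: count only requests in the last `window` seconds."""

    def __init__(self, limit=100, window=60.0):
        self.limit = limit
        self.window = window
        self.log = defaultdict(deque)    # key -> timestamps of accepted requests

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        q = self.log[key]
        while q and q[0] <= now - self.window:
            q.popleft()                  # drop requests that aged out of the window
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True
```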

3. Token Bucket

Most flexible, allows bursts.

Bucket holds 100 tokens, refills at 10 tokens/second.

Request comes in:
- Has token? → Accept, remove token
- No token? → Reject (429)

Allows bursts (use 50 tokens instantly if available)
Then throttles to refill rate (10/sec)
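A token-bucket sketch with the numbers from above (capacity 100, refill 10/sec). The `now` parameter exists only to make the example testable; a real limiter would just use the clock:

```python
import time

class TokenBucket:
    """Token bucket: capacity allows bursts, refill rate caps sustained throughput."""

    def __init__(self, capacity=100, refill_rate=10.0, now=None):
        self.capacity = capacity
        self.refill_rate = refill_rate   # tokens added per second
        self.tokens = float(capacity)    # start full: a burst is allowed immediately
        self.last = time.time() if now is None else now

    def allow(self, now=None):
        now = time.time() if now is None else now
        # Refill for the elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```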

4. Leaky Bucket

Smooths out traffic, no bursts.

Requests go into queue (bucket).
Leak out at fixed rate (10/sec).

If bucket full → reject new requests

Guarantees smooth output rate.
Use when you need predictable load.
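The leaky bucket can be sketched as a meter: the level rises by one per request and drains at the fixed leak rate, and a full bucket rejects. (A queue-based variant would hold excess requests instead of rejecting them.)

```python
class LeakyBucket:
    """Leaky bucket (meter variant): drains at a fixed rate, rejects when full."""

    def __init__(self, capacity=100, leak_rate=10.0, now=0.0):
        self.capacity = capacity
        self.leak_rate = leak_rate    # requests drained per second
        self.level = 0.0
        self.last = now

    def allow(self, now):
        # Drain for the time elapsed since the previous request.
        self.level = max(0.0, self.level - (now - self.last) * self.leak_rate)
        self.last = now
        if self.level + 1.0 > self.capacity:
            return False              # bucket full: reject
        self.level += 1.0
        return True
```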

Implementation

Where to implement:

  • Application level (easiest to add, but limits are per-server unless counters live in shared storage such as Redis)
  • API Gateway (AWS API Gateway, Kong, Nginx)
  • CDN/Proxy (Cloudflare, Fastly — best for DDoS)

What to limit by:

  • IP address (prevent one attacker from spamming)
  • User ID (logged-in users, fair usage)
  • API key (third-party integrations)
  • Endpoint (expensive endpoints get lower limits)

Common limits:

  • Anonymous users: 10 requests/minute
  • Logged-in users: 100 requests/minute
  • Premium users: 1000 requests/minute
  • Admin endpoints: 5 requests/minute (more sensitive)
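Tier-based limits like these usually reduce to a lookup before the limiter runs. A minimal sketch (the tier names and the `premium` field are illustrative, not from any particular framework):

```python
# Per-minute limits mirroring the tiers above (illustrative values).
LIMITS = {"anonymous": 10, "logged_in": 100, "premium": 1000}

def limit_for(user):
    """Pick the per-minute request limit for a caller; `user` is None when anonymous."""
    if user is None:
        return LIMITS["anonymous"]
    return LIMITS["premium"] if user.get("premium") else LIMITS["logged_in"]
```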

Response to Rate Limit

Return 429 status:

HTTP/1.1 429 Too Many Requests
Retry-After: 60
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1642345678

{
  "error": "Rate limit exceeded. Try again in 60 seconds."
}

Headers to include:

  • X-RateLimit-Limit: Total allowed requests
  • X-RateLimit-Remaining: Requests left in window
  • X-RateLimit-Reset: Timestamp when limit resets
  • Retry-After: Seconds until they can retry
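Assembling that response is mostly bookkeeping. A framework-agnostic sketch (the function name is illustrative; real frameworks wrap this in their own response objects):

```python
import json
import time

def rate_limit_response(limit, reset_ts, now=None):
    """Build a 429 response as (status, headers, body) with the headers listed above."""
    now = time.time() if now is None else now
    retry_after = max(0, int(reset_ts - now))
    headers = {
        "Retry-After": str(retry_after),
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": "0",
        "X-RateLimit-Reset": str(int(reset_ts)),
    }
    body = json.dumps(
        {"error": f"Rate limit exceeded. Try again in {retry_after} seconds."}
    )
    return 429, headers, body
```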

Layered Rate Limiting

Different limits for different severity.

Layer 1: Global (per IP)

  • 1000 requests/minute across all endpoints
  • Catches broad attacks

Layer 2: Per Endpoint

  • Login: 5 attempts/minute (prevent brute force)
  • Search: 100 requests/minute
  • Create order: 10 requests/minute
  • Profile view: 200 requests/minute

Layer 3: Per User

  • Free tier: 100 API calls/day
  • Pro tier: 10,000 API calls/day
  • Enterprise: Unlimited
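The three layers compose naturally: run the broadest check first and stop at the first rejection. A sketch using a minimal fixed-window counter for each layer (all names and limits here are illustrative):

```python
from collections import defaultdict

class WindowLimiter:
    """Minimal fixed-window counter used by each layer."""

    def __init__(self, limit, window=60):
        self.limit, self.window = limit, window
        self.counts = defaultdict(int)

    def allow(self, key, now):
        bucket = (key, int(now // self.window))
        if self.counts[bucket] >= self.limit:
            return False
        self.counts[bucket] += 1
        return True

def check_layers(ip, user, endpoint, now, global_l, endpoint_l, user_l):
    """A request must pass the global, per-endpoint, and per-user layers, in order."""
    if not global_l.allow(ip, now):
        return False, "global"
    if not endpoint_l.allow((ip, endpoint), now):
        return False, "endpoint"
    if not user_l.allow(user, now):
        return False, "user"
    return True, None
```

Returning the name of the rejecting layer makes it easy to log which layer fired, which feeds directly into the monitoring described below.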

Real-World Example

Stripe API:

Rate limits:
- Default: 100 requests/second
- Bursts: Up to 1000 requests in short window
- Per endpoint: Some endpoints have lower limits
- Response headers show remaining quota

Twitter API:

Rate limits:
- 15 requests per 15-minute window (user timeline)
- 900 requests per 15-minute window (search)
- Different limits for different tiers (free vs. paid)

Monitoring and Alerting

Track these metrics:

  • Requests hitting rate limit (high % = too restrictive or under attack)
  • Top IPs by request volume (identify attackers or bugs)
  • Rejected requests per endpoint (tune limits)

Alert when:

  • More than ~10% of requests hit the rate limit (limits too strict, or you're under attack)
  • A single IP hits the limit repeatedly (likely an attacker, block at the firewall)
  • Legitimate users complain (limits too strict, raise them)

Common Mistakes

No rate limiting at all

"We're too small to be attacked" → Wrong. Bots scan all IPs; you'll be hit eventually.

Limits too strict

Legitimate users hit limits → bad experience → churn

Limits too loose

Attackers still get through → defeats the purpose

Not differentiating users

Treat premium customers same as anonymous users → premium customers annoyed

Not counting failed requests

If failed logins (bad password) don't count against the limit, an attacker gets unlimited password attempts. Always count failed logins toward the limit: that is exactly what stops brute force.


Key Insights

Rate limiting is mandatory for production apps. Without it, you're one attack away from a $50k AWS bill or total downtime.

Start conservative, monitor, adjust. Better to have limits too strict and raise them than too loose and get exploited.

Layer your limits: global (prevent DDoS), per-endpoint (protect sensitive operations), per-user (fair usage).
