Back to all topics
system-design

Database Indexes

The difference between a 5ms query and a 5-second query that crashes your app.

Tiny Summary

Database indexes are like a book's index — instead of reading every page to find a word, you look it up instantly. Without indexes, databases scan every row (slow). With indexes, they jump directly to matches (fast).


The Problem

You have 1 million users. Someone logs in with email user@example.com.

Without index on email:

Database reads row 1: email = "a@test.com" → no match
Database reads row 2: email = "b@test.com" → no match
Database reads row 3: email = "c@test.com" → no match
... (reads ALL 1 million rows)
Database reads row 847,234: email = "user@example.com" → MATCH!

Time: 5 seconds (full table scan)

With index on email:

Database checks index → jumps to row 847,234
Database reads row 847,234: email = "user@example.com" → MATCH!

Time: 5 milliseconds (index lookup)

That's 1000x faster.

How Indexes Work

An index is a separate data structure (usually a B-tree) that maintains a sorted copy of a column.

Without index (Table Scan):

Users Table (unsorted):
ID    Email
1     zack@test.com
2     alice@test.com
3     mike@test.com
... (scan all rows to find anything)

With index (B-Tree):

Email Index (sorted):
alice@test.com  → Row 2
mike@test.com   → Row 3
zack@test.com   → Row 1

Query for mike@test.com:
1. Binary search index → finds "mike@test.com" instantly
2. Index says "Row 3"
3. Jump to Row 3 in table
4. Return data

Lookups: O(log n) instead of O(n)

When to Add Indexes

Index columns you query in WHERE clauses:

-- This query scans full table without index:
SELECT * FROM users WHERE email = 'user@example.com';

-- Add index:
CREATE INDEX idx_users_email ON users(email);

-- Now query is instant

Common columns to index:

  • Primary keys (auto-indexed)
  • Foreign keys (user_id, product_id, etc.)
  • Email addresses (for login)
  • Timestamps (created_at for sorting/filtering)
  • Status fields (WHERE status = 'active')
  • Any column used in JOIN conditions

The Tradeoff

Indexes aren't free. They cost disk space and slow down writes.

Reads (Queries):

  • Without index: Slow ❌
  • With index: Fast ✅

Writes (INSERT/UPDATE/DELETE):

  • Without index: Fast ✅
  • With index: Slower ❌ (must update index too)

Rule of thumb:

  • High read, low write → index heavily
  • High write, low read → index sparingly
  • Most apps read >> write → index common queries

Real Performance Impact

Scenario: User login

1M users, no index on email:

  • Query time: 3-5 seconds
  • Timeouts on every login
  • App feels broken

1M users, with index on email:

  • Query time: 3-5 milliseconds
  • Instant login
  • Happy users

At scale:

  • 10M rows without index: Query takes 30+ seconds (or crashes)
  • 10M rows with index: Query still takes 5ms

Indexes scale logarithmically. Table scans scale linearly. Indexes win.


Composite Indexes

Index multiple columns together for complex queries.

-- Query:
SELECT * FROM orders
WHERE user_id = 123
  AND status = 'pending'
ORDER BY created_at DESC;

-- Needs composite index:
CREATE INDEX idx_orders_user_status_date
ON orders(user_id, status, created_at);

Column order matters:

  • Index (user_id, status, created_at) helps queries filtering by:
    • user_id alone ✅
    • user_id + status ✅
    • user_id + status + created_at ✅
    • status alone ❌ (index not used)

Left-to-right rule: Index columns in order of query filters.


Common Mistakes

No indexes on foreign keys

-- Slow without index:
SELECT * FROM orders WHERE user_id = 123;

-- Always index foreign keys:
CREATE INDEX idx_orders_user_id ON orders(user_id);

Indexing low-cardinality columns

Don't index boolean flags or status fields with only 2-3 values:

  • is_active (only true/false) → bad index candidate
  • status (only 'pending', 'completed', 'failed') → bad index
  • email (millions of unique values) → great index

Over-indexing

Don't create an index on every column. Each index:

  • Uses disk space
  • Slows down writes
  • Costs memory

Index only what you actually query.

Not monitoring slow queries

Enable slow query logs. Find queries taking over 100ms. Add indexes for them.


How to Find Missing Indexes

PostgreSQL:

-- Find queries doing table scans:
EXPLAIN ANALYZE
SELECT * FROM users WHERE email = 'test@example.com';

-- Look for "Seq Scan" → needs index
-- Look for "Index Scan" → has index ✅

MySQL:

-- Enable slow query log:
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 0.1; -- log queries over 100ms

-- Check execution plan:
EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';

-- Look for "type: ALL" → full table scan, needs index
-- Look for "type: ref" → using index ✅

Index Maintenance

Indexes need occasional rebuilding:

Over time, indexes get fragmented (data inserts/updates create gaps).

-- PostgreSQL:
REINDEX INDEX idx_users_email;

-- MySQL:
OPTIMIZE TABLE users;

When to rebuild:

  • Database gets slow over time
  • After bulk imports
  • Major schema changes

Key Insights

Indexes are the easiest performance win. Adding the right index can make queries 1000x faster.

Most production slowness comes from missing indexes. Before adding servers, check slow query logs and add indexes.

Don't guess — measure. Use EXPLAIN to see if queries use indexes. Index columns you actually query, not every column.

Use the simulation to see how indexes transform query performance as data grows!