Hash Map
How constant-time O(1) lookups actually work under the hood.
Tiny Summary
Hash maps provide O(1) average-time inserts and lookups by using a hash function to map keys directly to array indices.
Collisions can occur when multiple keys map to the same index, and performance degrades if the hash function is poor or the table becomes too crowded.
How a Hash Map Actually Works
At its core, a Hash Map is just an array with a clever hashing/indexing trick.
How it works:
- You have an array of some size (maybe 10 slots available)
- To insert "apple", you run "apple" through a hash function (which converts the key to a number, the same number every time)
- The hash function outputs a number (the hashcode): hash("apple") = 47823
- Take the hashcode and mod (%) it by the size of the array to find which index to store it at: 47823 % 10 = 3
- Now we store the value at array[3]
What we did above:
key = "apple"
hashcode = hash("apple") = 47823
index = 47823 % 10 = 3
array[3] = ("apple", 5)
This is why lookups are O(1): you compute the index directly from the key using the hash function. No searching through the array is needed.
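A minimal Python sketch of that insert-and-lookup path (the table size of 10, the slots list, and the put/get names are assumptions for illustration; Python's built-in hash() stands in for the hash function):

```python
TABLE_SIZE = 10                      # assumed toy size, matching the example above
slots = [None] * TABLE_SIZE          # the underlying array

def put(key, value):
    hashcode = hash(key)             # key -> big number
    index = hashcode % TABLE_SIZE    # big number -> slot index in the array
    slots[index] = (key, value)      # store the key and value (ignoring collisions for now)

def get(key):
    index = hash(key) % TABLE_SIZE   # recompute the same index from the key
    entry = slots[index]
    return entry[1] if entry is not None and entry[0] == key else None

put("apple", 5)
print(get("apple"))  # 5
```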
One Problem That Happens: Collisions
What happens when two keys hash to the same index?
hash("apple") = 47823 → index 3
hash("grape") = 10003 → index 3 // Collision!
Both want array[3]. Now what?
Collisions are inevitable because:
- Pigeonhole Principle: More possible keys than buckets means some must share
- Birthday paradox: With just 23 people there's a ~50% chance two share a birthday, so collisions show up much sooner than intuition suggests
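A quick sketch of how a collision happens, using a toy hash function (summing character codes is an assumption for illustration, far weaker than a real hash function):

```python
def toy_hash(key):
    # deliberately simplistic: just sum the character codes
    return sum(ord(ch) for ch in key)

TABLE_SIZE = 10
print(toy_hash("kiwi") % TABLE_SIZE)  # 6
print(toy_hash("plum") % TABLE_SIZE)  # 6  -> both keys want the same slot
```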
Two Collision Resolution Strategies
1. Separate Chaining
Each array slot stores a linked list of all items that hash to it.
array[3]: → ("apple", 5) → ("grape", 2) → null
array[5]: → ("banana", 7) → null
Lookup: Hash the key → jump to array index → walk the linked list
Performance:
- Best: O(1) - key is first in chain
- Worst: O(n) - all keys collide into one bucket
- Average: O(1 + α), where α = average items per bucket (the load factor)
Typically, hash maps resize when α gets too big, so on average you can treat inserts and lookups as O(1).
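A rough Python sketch of separate chaining, using a plain list per bucket instead of a linked list (an assumption for brevity; the walk-the-chain behavior is the same), with class and method names made up for illustration:

```python
class ChainedHashMap:
    def __init__(self, buckets=8):
        self.buckets = [[] for _ in range(buckets)]   # each slot holds a list of (key, value) pairs

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):           # key already present: overwrite it
            if k == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))                   # otherwise append to the chain

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:   # walk the chain at that slot
            if k == key:
                return v
        return None

m = ChainedHashMap()
m.put("apple", 5)
m.put("grape", 2)       # even if both land in the same bucket, both survive
print(m.get("grape"))   # 2
```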
2. Linear Probing
When a collision happens, probe forward to find the next empty slot.
Try array[hash(key)]
If full, try array[hash(key) + 1]
If full, try array[hash(key) + 2]
... until you find empty space
Lookup: Hash the key → check array index → if wrong key, keep probing
Performance:
- Best: O(1) - key in ideal position
- Worst: O(n) - probe through many slots
- Gets worse as the table fills up (clustering)
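A rough sketch of linear probing under similar assumptions (fixed-size table, no resizing or deletion handling, and it assumes the table never completely fills, otherwise the probe loop would not terminate):

```python
class ProbingHashMap:
    def __init__(self, size=8):
        self.slots = [None] * size            # each slot is None or a (key, value) pair

    def put(self, key, value):
        i = hash(key) % len(self.slots)
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) % len(self.slots)     # collision: step to the next slot, wrapping around
        self.slots[i] = (key, value)

    def get(self, key):
        i = hash(key) % len(self.slots)
        while self.slots[i] is not None:
            if self.slots[i][0] == key:
                return self.slots[i][1]
            i = (i + 1) % len(self.slots)     # wrong key here: keep probing
        return None                           # hit an empty slot: key is not present

m = ProbingHashMap()
m.put("apple", 5)
m.put("grape", 2)       # if these collide, "grape" lands in the next free slot
print(m.get("grape"))   # 2
```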
Load Factor & Resizing
Load Factor (α) = items / buckets: a measure of how full the hash map is. A lower load factor means faster operations, but keeping it low costs more memory.
When α hits ~0.75, performance degrades:
- Chaining: longer linked lists
- Linear probing: more collisions, more probing
Solution: Double the array size and rehash everything (resizing).
8 buckets → 16 buckets
Rehash all items into new positions
This takes O(n) time and extra space, but it happens rarely, so it amortizes to O(1) per insert.
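A rough sketch of the resize step, reusing the chaining layout from the earlier sketch (the 0.75 threshold and the doubling factor are common choices, not requirements; the function name is made up for illustration):

```python
LOAD_FACTOR_LIMIT = 0.75

def maybe_resize(buckets, count):
    """Double the bucket array and rehash every entry when the load factor gets too high."""
    if count / len(buckets) < LOAD_FACTOR_LIMIT:
        return buckets                                   # still roomy enough, keep as-is
    new_buckets = [[] for _ in range(len(buckets) * 2)]  # e.g. 8 buckets -> 16 buckets
    for bucket in buckets:
        for key, value in bucket:
            new_buckets[hash(key) % len(new_buckets)].append((key, value))  # rehash into new position
    return new_buckets
```

Doubling (rather than adding a fixed number of buckets each time) is what lets the occasional O(n) rehash average out to O(1) per insert.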