Here's a problem: you have an array of a million user IDs and you need to check whether ID "usr_847291" exists. With an array, that's a linear scan — O(n). A million comparisons for one lookup.

Now imagine you could jump directly to where that ID should be, in one step. What would you need to make that possible?

Think about it before reading on. What information would you need to compute a location from a key?

The Trick

A function. One that takes any key and outputs an integer — an index into an array. That function is called a hash function, and the structure built around it is a hash map.

index = hash(key) % buckets.length

Hash the key. Mod by the array size. Store there. Lookup follows the same path. O(1) average.
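That formula can be sketched directly. The hash function below is a hypothetical stand-in (an FNV-1a-style string hash, chosen only for illustration); real runtimes use more carefully engineered hashes, but the shape is the same:

```typescript
// Hypothetical string hash, FNV-1a style: mix each character into a 32-bit value.
// For illustration only -- real engines use more robust hash functions.
function hash(key: string): number {
  let h = 2166136261;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 16777619);
  }
  return h >>> 0; // force to unsigned 32-bit
}

const buckets = new Array(16);
const index = hash("usr_847291") % buckets.length;
// index is a direct slot in the array: one computation, no scan through a million IDs.
```

The million-element scan became a single arithmetic expression. Everything else about hash maps is managing the consequences of that shortcut.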

But here's where the intuition breaks: what happens when two different keys hash to the same index?

Collisions

This is the question that separates "I've used a hash map" from "I understand a hash map." Two keys, same bucket. Now what?

Chaining: The bucket holds a linked list. Multiple key-value pairs coexist at the same index, and lookup scans the list. This is the strategy Java's HashMap uses.

Open addressing: On collision, probe forward through the array until an empty slot appears. This is what Python's CPython dict actually uses. It's more cache-friendly, but degrades badly under high load.
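A minimal chaining map can be sketched in a few lines. ChainedMap and its internals are illustrative names, not any runtime's real implementation, and a real map would also resize itself as it fills:

```typescript
// Minimal chaining hash map: each bucket is an array of [key, value] pairs.
// Sketch only -- real implementations track load factor and resize.
class ChainedMap<V> {
  private buckets: Array<Array<[string, V]>>;

  constructor(capacity = 16) {
    this.buckets = Array.from({ length: capacity }, () => []);
  }

  private indexFor(key: string): number {
    // Simple polynomial string hash; any decent hash works here.
    let h = 0;
    for (let i = 0; i < key.length; i++) {
      h = (Math.imul(h, 31) + key.charCodeAt(i)) >>> 0;
    }
    return h % this.buckets.length;
  }

  set(key: string, value: V): void {
    const bucket = this.buckets[this.indexFor(key)];
    for (const entry of bucket) {
      if (entry[0] === key) { entry[1] = value; return; } // key exists: overwrite
    }
    bucket.push([key, value]); // new key (or collision): the chain grows by one
  }

  get(key: string): V | undefined {
    // Lookup scans only one chain, not the whole table.
    for (const [k, v] of this.buckets[this.indexFor(key)]) {
      if (k === key) return v;
    }
    return undefined;
  }
}
```

Notice that two keys landing in the same bucket is not an error. It just makes one chain slightly longer, which is why chain length, not table size, is what governs lookup cost.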

With a good hash function and a load factor under 0.75, collisions are rare and chains stay short. O(1) average holds. But a terrible hash function — one that maps everything to the same bucket — turns your hash map into a linked list. O(n) worst case. The data structure is only as good as the function.

Here's a question: if the worst case is O(n), why do we say hash maps are O(1)? What assumption are we making, and when does it break?

The Space-Time Trade-off

Hash maps are O(n) extra memory. Every O(1) lookup is paid for with space proportional to the data you stored. This is the purest form of a principle you'll use constantly: burn memory to save time.

Is it worth it? In almost every interview context, yes. Memory is cheap; time complexity is what interviewers grade. But sit with the question — can you think of a scenario where O(n) extra space is actually the bottleneck?
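The trade-off shows up cleanly in duplicate detection. Both functions below are hypothetical sketches of the two ends of the spectrum: one spends no extra memory and compares every pair, the other spends O(n) memory to remember what it has seen:

```typescript
// O(1) extra space, O(n^2) time: compare every pair of IDs.
function hasDuplicateNested(ids: string[]): boolean {
  for (let i = 0; i < ids.length; i++) {
    for (let j = i + 1; j < ids.length; j++) {
      if (ids[i] === ids[j]) return true;
    }
  }
  return false;
}

// O(n) extra space, O(n) time: a hash set remembers everything seen so far.
function hasDuplicateSet(ids: string[]): boolean {
  const seen = new Set<string>();
  for (const id of ids) {
    if (seen.has(id)) return true; // "have I seen this before?" in O(1) average
    seen.add(id);
  }
  return false;
}
```

Same answer, wildly different cost curves: at a million IDs, the nested version does up to half a trillion comparisons while the set version does a million hash lookups.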

Frequency Counting

Before I show you the pattern, try this: given two strings, determine if they're anagrams. What's your approach?

Most people's first instinct is to sort both strings and compare. That's O(n log n). Can you do it in O(n)?

function isAnagram(s: string, t: string): boolean {
  if (s.length !== t.length) return false;
  const count = new Map<string, number>();
  // Pass 1: count every character in s.
  for (const c of s) count.set(c, (count.get(c) ?? 0) + 1);
  // Pass 2: decrement for each character in t.
  for (const c of t) {
    if (!count.has(c)) return false;       // character t has that s doesn't
    count.set(c, count.get(c)! - 1);
    if (count.get(c)! < 0) return false;   // t uses a character more times than s
  }
  return true;
}

Count characters in the first string. Decrement for the second. If anything goes negative or is missing, not an anagram. One pass each. O(n).

The pattern — count in the first pass, validate in the second — is the bread and butter of hash map problems. It solves anagrams, duplicate detection, two-sum variants, and anything where the core question is "have I seen this before?"
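The same "have I seen this before?" shape powers the classic two-sum, with one twist: the map stores indices as values instead of counts. A sketch:

```typescript
// Return the indices of two numbers that sum to target, or null if none exist.
// One pass: for each number, ask whether its complement was already seen.
function twoSum(nums: number[], target: number): [number, number] | null {
  const seen = new Map<number, number>(); // value -> index where it appeared
  for (let i = 0; i < nums.length; i++) {
    const complement = target - nums[i];
    const j = seen.get(complement);
    if (j !== undefined) return [j, i]; // the complement was seen earlier
    seen.set(nums[i], i);
  }
  return null;
}
```

The hash map eliminates the O(n) inner loop a brute-force solution would need, which is exactly the space-for-time trade described above.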

The Reflex

After enough problems, reaching for a hash map becomes instinctive. But instinct without understanding is dangerous. Before you new Map(), ask yourself:

- What am I using as the key? Is it hashable?
- What am I storing as the value? Counts? Indices? Lists?
- Am I trading O(n) space to eliminate an O(n) inner loop?
- Could a sorted array + binary search work here with O(1) space?

That last question is the one most people skip. A hash map is usually right. But knowing why it's right — and when it's not — is what makes the difference.

Reflection

Without looking up: explain why hash map lookup is O(1) average but O(n) worst case. What determines which one you get? If you can explain the role of the hash function, load factor, and collision strategy in one breath, you own this topic.

Original article: [jamesperalta.com](https://jamesperalta.com/learn/dsa/hash-maps)