Hash Maps & Sets
Leverage constant-time lookups for frequency counting and duplicate detection
Introduction: Why Hash Maps & Sets Are Essential for Array/String Problems
You've just submitted your solution to a LeetCode problem. It works perfectly on the sample test cases, and you feel that rush of accomplishment. Then you click "Submit," and your heart sinks: Time Limit Exceeded. Your solution works, but it's too slow for the larger test cases with thousands or millions of elements. Sound familiar?
This is the exact moment where understanding hash maps and hash sets transforms you from someone who can solve problems to someone who can solve them efficiently. These data structures are the secret weapon that professional developers use to optimize solutions, and mastering them will dramatically improve your performance on coding interviews. In this lesson, you'll discover why these structures are essential, and you'll get free flashcards to help cement these concepts in your memory.
But first, let me ask you a question: What if I told you that the difference between a solution that takes 3 hours to run and one that completes in 3 seconds often comes down to a single strategic choice: replacing a nested loop with a hash map lookup?
The Performance Problem: When Nested Loops Betray You
Let's start with a problem you've probably encountered: finding if an array contains duplicate values. Your first instinct might be something like this:
def contains_duplicate(nums):
# Compare each element with every other element
for i in range(len(nums)):
for j in range(i + 1, len(nums)):
if nums[i] == nums[j]:
return True
return False
## Test it out
print(contains_duplicate([1, 2, 3, 4, 5])) # False
print(contains_duplicate([1, 2, 3, 1])) # True
This solution is intuitive and correct. It works beautifully for small arrays. But here's where things get ugly: this approach has O(n²) time complexity. What does that actually mean in practice?
💡 Real-World Example: Imagine you're building a feature for a social media platform that needs to check for duplicate usernames in a database. With 100 users, your nested loop performs 10,000 comparisons (100²). Not too bad. With 10,000 users? That's 100,000,000 comparisons. With a million users? You're looking at 1,000,000,000,000 comparisons: a trillion operations. Your server would grind to a halt.
The problem with nested loops is that they scale quadratically. Double your input size, and your runtime quadruples. This is why LeetCode will hit you with that dreaded "Time Limit Exceeded" error. The test cases deliberately include large datasets to separate solutions that merely work from solutions that work efficiently.
🤔 Did you know? Many developers can solve 60-70% of LeetCode problems but struggle with the remaining 30% simply because they don't recognize when to apply hash-based optimizations. This one skill can be the difference between landing your dream job and another rejection email.
The Hash Table Revolution: O(1) Lookup Changes Everything
Now, let me show you the same problem solved with a hash set:
def contains_duplicate_optimized(nums):
seen = set() # Create an empty hash set
for num in nums:
# Check if we've seen this number before (O(1) operation)
if num in seen:
return True
# Add the number to our set (O(1) operation)
seen.add(num)
return False
## Same tests, dramatically faster execution
print(contains_duplicate_optimized([1, 2, 3, 4, 5])) # False
print(contains_duplicate_optimized([1, 2, 3, 1])) # True
This solution has O(n) time complexity. We make a single pass through the array, and for each element, we perform constant-time operations. With a million users, instead of a trillion comparisons, we make just one million lookups. That's the difference between a solution that never finishes and one that completes in milliseconds.
🎯 Key Principle: Hash-based data structures trade space for time. By using extra memory to store a hash table, we achieve near-instantaneous lookups that would otherwise require scanning through entire collections.
But how does this magic work? At their core, hash maps (also called dictionaries or hash tables) and hash sets use a hash function to convert your data into an index position in an underlying array. When you want to check if an element exists, the hash function computes where it would be stored, and you can check that exact location, with no scanning required.
Here's a simplified mental model:
Imagine a library with 1000 books arranged randomly on shelves.
Without hash tables (nested loop approach):
┌─────────────────────────────────────┐
│ To find "Moby Dick":                │
│  → Check shelf 1, book by book      │
│  → Check shelf 2, book by book      │
│  → Check shelf 3, book by book      │
│  ... potentially check all 1000     │
└─────────────────────────────────────┘
Time: O(n) - proportional to collection size
With hash tables:
┌─────────────────────────────────────┐
│ To find "Moby Dick":                │
│  → Apply hash function to the title │
│  → Get shelf number: 347            │
│  → Go directly to shelf 347         │
│  → Book is there (or it's not)      │
└─────────────────────────────────────┘
Time: O(1) - constant time, regardless of size
💡 Mental Model: Think of a hash table as a filing system with an incredibly smart filing clerk. Tell them what you're looking for, and they instantly know exactly which drawer to check; they never need to search.
LeetCode Patterns: When Hash Structures Become Game-Changers
Now that you understand the performance implications, let's explore the common patterns where hash maps and sets transform impossible problems into elegant solutions. Recognizing these patterns is the key to interview success.
Pattern 1: The "Have I Seen This Before?" Pattern
Any time you need to track whether elements have appeared previously, a hash set is your best friend. Beyond the duplicate detection example above, this pattern applies to:
🔧 Finding the first non-repeating character in a string
🔧 Detecting cycles in linked lists (using a set to track visited nodes)
🔧 Finding the intersection of two arrays (see the sketch below)
🔧 Validating if a string contains all unique characters
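As one illustration, here is a minimal sketch of the array-intersection case (the function name and structure are illustrative, not a prescribed solution):

def intersection(nums1, nums2):
    # Build a set once, then test membership in O(1) per element
    set1 = set(nums1)
    # Deduplicate nums2 so each common value appears only once
    return [num for num in set(nums2) if num in set1]

## Example
print(intersection([1, 2, 2, 1], [2, 2]))  # [2]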
Pattern 2: The "Count and Track" Pattern
When you need to count occurrences or track frequencies, a hash map (dictionary) is essential. The key stores the element, and the value stores the count or associated data.
def most_frequent_element(nums):
# Hash map to track element frequencies
frequency = {}
# Count occurrences of each element
for num in nums:
frequency[num] = frequency.get(num, 0) + 1
# Find the element with maximum frequency
max_freq = 0
most_frequent = None
for num, count in frequency.items():
if count > max_freq:
max_freq = count
most_frequent = num
return most_frequent, max_freq
result = most_frequent_element([1, 3, 2, 3, 4, 3, 5, 3])
print(f"Element {result[0]} appears {result[1]} times") # Element 3 appears 4 times
This pattern appears constantly in LeetCode problems:
🔧 Finding anagrams (grouping words by character frequency)
🔧 Top K frequent elements
🔧 Subarray sum problems (tracking cumulative sums)
🔧 Longest substring without repeating characters
Pattern 3: The "Two Sum" Pattern (Complement Lookup)
This is perhaps the most famous hash map pattern, thanks to LeetCode problem #1. The insight: instead of checking every pair of numbers to find two that sum to a target (O(n²)), we can use a hash map to check if each number's complement exists (O(n)).
❌ Wrong thinking: "I need to check every possible pair of numbers to find which two sum to the target."
✅ Correct thinking: "For each number, I can calculate what its complement would need to be, then check if I've already seen that complement."
This pattern extends to:
🔧 Three sum problems (with modifications)
🔧 Finding pairs with specific differences (see the sketch below)
🔧 Checking if an array can be split into pairs meeting certain criteria
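For instance, the "pairs with a specific difference" variation can be sketched like this (an illustrative function, assuming we only need to report whether such a pair exists):

def has_pair_with_difference(nums, diff):
    seen = set()
    for num in nums:
        # Either num is the larger or the smaller member of the pair
        if (num - diff) in seen or (num + diff) in seen:
            return True
        seen.add(num)
    return False

## Example
print(has_pair_with_difference([1, 5, 3, 4, 2], 3))  # True (4 - 1 == 3)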
Pattern 4: The "Index Tracking" Pattern
Sometimes you need to remember not just if you've seen an element, but where you saw it. Hash maps excel at storing element-to-index mappings:
🔧 Finding the first repeating element
🔧 Longest substring problems (tracking last seen position of characters; a last-seen-index sketch follows below)
🔧 Subarray problems (storing cumulative sum indices)
🔧 Verifying if one string is a rotation of another
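To make the idea concrete, here is a small illustrative sketch (names are my own) that finds the minimum distance between any two equal elements by remembering each value's last-seen index:

def min_duplicate_distance(nums):
    last_seen = {}  # value -> most recent index
    best = float('inf')
    for i, num in enumerate(nums):
        if num in last_seen:
            best = min(best, i - last_seen[num])
        last_seen[num] = i  # always remember the latest position
    return best if best != float('inf') else -1

## Example
print(min_duplicate_distance([3, 1, 4, 1, 5, 3]))  # 2 (the two 1s are 2 apart)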
💡 Pro Tip: When a problem asks about "subarrays" or "substrings," there's often a hash map solution that tracks cumulative values or last-seen indices to avoid nested loops.
Hash Maps vs Hash Sets: Choosing Your Weapon
By now you're convinced these structures are powerful, but when should you reach for a hash set versus a hash map?
Choose a Hash Set When:
🎯 You only need to track existence (yes/no, true/false)
🎯 You're checking for duplicates or unique elements
🎯 You need to quickly verify membership in a collection
🎯 Order doesn't matter and you want automatic deduplication
Visual Representation:
Hash Set Structure:
┌─────────────────────┐
│   Collection of     │
│  Unique Elements    │
├─────────────────────┤
│ "apple"             │
│ "banana"            │
│ "cherry"            │
│ "date"              │
└─────────────────────┘
Operations:
✓ add(element)
✓ remove(element)
✓ contains(element)
Choose a Hash Map When:
🎯 You need to associate values with keys (key-value pairs)
🎯 You're counting frequencies or occurrences
🎯 You need to track additional information about elements (not just presence)
🎯 You're building lookups or caches
Visual Representation:
Hash Map Structure:
┌──────────────┬───────────────┐
│ Key          │ Value         │
├──────────────┼───────────────┤
│ "apple"      │ 3             │
│ "banana"     │ 7             │
│ "cherry"     │ 2             │
│ "date"       │ 5             │
└──────────────┴───────────────┘
Operations:
✓ put(key, value)
✓ get(key)
✓ remove(key)
✓ containsKey(key)
🧠 Mnemonic: SET for Simple Existence Test: if you only care whether something exists, use a set. MAP for Maintaining Associated Pairs: if you need to link data together, use a map.
📋 Quick Reference Card: Hash Set vs Hash Map
| Criterion | 🔵 Hash Set | 🟢 Hash Map |
|---|---|---|
| 🎯 Purpose | Track unique elements | Store key-value associations |
| 💾 Storage | Just the element | Element (key) + associated data (value) |
| 🔍 Lookup | Check if element exists | Get value associated with key |
| ⚡ Common Use | Deduplication, membership testing | Counting, caching, indexing |
| 📝 Example | {1, 2, 3, 5, 8} | {"a": 1, "b": 2, "c": 3} |
| 🎮 LeetCode Pattern | "Contains Duplicate" | "Two Sum", "Group Anagrams" |
⚠️ Common Mistake: Using a hash map when a hash set would suffice. If you're mapping elements to True or to dummy values, you probably just need a set. ⚠️
Example of overcomplicating:
## Unnecessarily complex (hash map)
seen = {}
for num in nums:
seen[num] = True
if target in seen:
# ...
## Simpler and clearer (hash set)
seen = set()
for num in nums:
seen.add(num)
if target in seen:
# ...
The Complexity Trade-Off: Space vs Time
Before we move forward, it's crucial to understand that hash tables aren't free. They deliver incredible speed, but at a cost:
Time Complexity Benefits:
- Lookup: O(1) average case vs O(n) for arrays/lists
- Insertion: O(1) average case
- Deletion: O(1) average case
Space Complexity Cost:
- O(n) extra space for the hash table itself
- Additional overhead for hash table internals (typically 2-3x the raw data size)
💡 Remember: In 95% of coding interviews, this trade-off is absolutely worth it. Modern systems have abundant memory, and the speed gains are dramatic. However, be prepared to discuss this trade-off if your interviewer asks about space complexity.
🤔 Did you know? Some LeetCode problems have a follow-up question: "Can you solve this in O(1) space?" This is asking you to find a solution without using a hash table. These are typically more complex algorithmic challenges that use techniques like two pointers, bit manipulation, or mathematical properties.
Real-World Impact: Beyond LeetCode
You might be thinking, "This is great for interviews, but when would I actually use this in real development?" The answer: constantly.
💡 Real-World Example: Database indexing is fundamentally based on hash table principles. When you create an index on a database column, you're essentially building a hash map that allows O(1) average-case lookups instead of O(n) table scans. The same optimization you're learning for LeetCode powers queries on databases with billions of records.
Other real-world applications:
📌 Caching systems (Redis, Memcached) use hash tables to provide microsecond lookups
📌 Compilers and interpreters use hash maps for symbol tables
📌 Web frameworks use hash maps for routing (matching URLs to handlers)
📌 Browser rendering engines use hash tables for CSS selector matching
📌 Security systems use hash sets for checking passwords against breached databases
Setting Yourself Up for Success
As you progress through this lesson, you'll develop an intuition for when to reach for hash-based structures. Here are the key questions to ask yourself when approaching any array or string problem:
Decision Tree:
Start: New Array/String Problem
        |
        v
[Need to find/check something?]
        |
        ├── Yes: Does it involve comparing elements?
        │        ├── Comparing pairs?      → Consider hash map (Two Sum pattern)
        │        └── Checking duplicates?  → Consider hash set
        │
        └── Yes: Does it involve counting?
                 ├── Counting frequencies? → Hash map for sure
                 └── Tracking positions?   → Hash map (element → index)

If current approach is O(n²) or worse:
        ↓
[Can I eliminate inner loop with O(1) lookups?]
        ↓
YES → Hash table probably helps
🎯 Key Principle: The moment you catch yourself writing or even thinking about a nested loop for searching or counting, pause and ask: "Could a hash table eliminate this inner loop?"
In the sections that follow, we'll dive deep into the mechanics of how hash tables work, explore specific implementation patterns with detailed code examples, and walk through real LeetCode problems step by step. You'll see these patterns in action and develop the muscle memory to recognize them instantly.
The difference between struggling with LeetCode and breezing through medium and hard problems often comes down to this one insight: hash tables turn many O(n²) problems into O(n) problems. Once you internalize this principle and learn to recognize the patterns, you'll wonder how you ever solved problems without them.
Let's continue to the next section where we'll uncover exactly how hash functions work their magic and explore the internal mechanics that make these incredible performance gains possible.
Core Concepts: Hash Maps and Hash Sets Fundamentals
To truly master hash-based data structures in LeetCode problems, you need to understand what's happening under the hood. When you write map[key] = value, a fascinating sequence of operations occurs that determines whether your algorithm runs in O(1) or degrades to O(n). Let's build that understanding from the ground up.
The Magic of Hash Functions
At the heart of every hash map lies a hash function: a mathematical transformation that converts your key (whether it's a string, number, or object) into an integer index. Think of it as a sophisticated filing system where the hash function is your librarian, instantly knowing exactly which shelf holds your book.
Here's the fundamental process:
Key → Hash Function → Hash Code → Modulo (% array_size) → Bucket Index
"apple" → hash() → 98765432 → 98765432 % 16 → bucket[8]
The hash code is typically a large integer, but our underlying array has limited size. That's why we use the modulo operation to map that large number into a valid array index. If our internal array has 16 buckets, we compute hash_code % 16 to get an index between 0 and 15.
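You can observe this mapping directly in Python. A small sketch follows; note that Python salts string hashes per interpreter run (PYTHONHASHSEED), so the exact bucket numbers will vary:

capacity = 16
for key in ["apple", "banana", "cherry"]:
    # hash() produces a large integer; modulo folds it into a bucket index
    print(key, "->", hash(key) % capacity)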
🎯 Key Principle: A good hash function distributes keys uniformly across all buckets. If all keys hash to the same bucket, your O(1) lookup becomes O(n).
Let's visualize how keys map to buckets:
Hash Table (capacity = 8)
Keys and their buckets:
"apple"  → bucket [2]
"banana" → bucket [3]
"cherry" → bucket [5]
"date"   → bucket [6]
"grape"  → bucket [6]   ← collision with "date"!

Bucket array:
[0] empty
[1] empty
[2] "apple"
[3] "banana"
[4] empty
[5] "cherry"
[6] "date", "grape"   (collision!)
[7] empty
💡 Mental Model: Think of a hash table as an apartment building. The hash function is the address calculation that tells you which floor (bucket) to go to. Ideally, each apartment has one resident, but sometimes (collisions) multiple people need to share.
Understanding Collisions: When Two Keys Want the Same Bucket
No matter how good your hash function, collisions are inevitable. When two different keys produce the same bucket index, we need a strategy to handle them. This is where hash table implementations diverge into two main approaches.
Collision Strategy 1: Chaining (Separate Chaining)
Chaining treats each bucket as a pointer to a linked list (or other data structure) that holds all key-value pairs hashing to that bucket. When you insert a key that collides, you simply append it to the list.
Bucket Array with Chaining:
[0] → null
[1] → null
[2] → ["apple":5] → null
[3] → ["banana":3] → ["blueberry":7] → null   (collision chain)
[4] → null
[5] → ["cherry":2] → null
[6] → ["date":9] → ["dragonfruit":1] → ["durian":4] → null   (long chain!)
[7] → null
Here's a simplified implementation showing how chaining works:
class HashMapChaining:
def __init__(self, capacity=16):
self.capacity = capacity
# Each bucket is a list of (key, value) tuples
self.buckets = [[] for _ in range(capacity)]
self.size = 0
def _hash(self, key):
"""Simple hash function using Python's built-in hash"""
return hash(key) % self.capacity
def put(self, key, value):
"""Insert or update a key-value pair"""
bucket_idx = self._hash(key)
bucket = self.buckets[bucket_idx]
# Check if key already exists in the chain
for i, (k, v) in enumerate(bucket):
if k == key:
bucket[i] = (key, value) # Update existing
return
# Key doesn't exist, append to chain
bucket.append((key, value))
self.size += 1
def get(self, key):
"""Retrieve value for a key"""
bucket_idx = self._hash(key)
bucket = self.buckets[bucket_idx]
# Search through the chain
for k, v in bucket:
if k == key:
return v
raise KeyError(f"Key '{key}' not found")
def delete(self, key):
"""Remove a key-value pair"""
bucket_idx = self._hash(key)
bucket = self.buckets[bucket_idx]
for i, (k, v) in enumerate(bucket):
if k == key:
bucket.pop(i)
self.size -= 1
return
raise KeyError(f"Key '{key}' not found")
## Usage example
hash_map = HashMapChaining()
hash_map.put("apple", 5)
hash_map.put("banana", 3)
print(hash_map.get("apple")) # Output: 5
⚠️ Common Mistake #1: Assuming hash table operations are always O(1). If all keys collide into the same bucket, your chain becomes a linked list, degrading lookup to O(n). ⚠️
Collision Strategy 2: Open Addressing
Open addressing stores all entries directly in the bucket arrayโno linked lists. When a collision occurs, the algorithm probes for the next available slot using a deterministic sequence.
The most common probing strategies:
🔧 Linear Probing: Check the next bucket (index + 1, index + 2, index + 3...)
🔧 Quadratic Probing: Check at increasing intervals (index + 1², index + 2², index + 3²...)
🔧 Double Hashing: Use a second hash function to determine the probe sequence
Open Addressing with Linear Probing:
Insert "apple" โ hash = 3
[0] empty
[1] empty
[2] empty
[3] "apple":5 โ
[4] empty
Insert "avocado" โ hash = 3 (collision!)
[0] empty
[1] empty
[2] empty
[3] "apple":5 โ occupied, probe next
[4] "avocado":8 โ found empty slot
Insert "apricot" โ hash = 3 (collision!)
[0] empty
[1] empty
[2] empty
[3] "apple":5 โ occupied
[4] "avocado":8 โ occupied
[5] "apricot":2 โ found empty slot after probing
💡 Pro Tip: Open addressing is more cache-friendly than chaining because data is stored contiguously in memory. However, it requires careful management of the load factor (ratio of entries to bucket count) to avoid long probe sequences.
Time and Space Complexity: The Reality Check
You've probably heard that hash tables offer O(1) operations. That's true, with important caveats. Let's break down the complexity analysis with brutal honesty.
Time Complexity
Average Case (with good hash function and low load factor):
- Insert: O(1)
- Lookup: O(1)
- Delete: O(1)
Worst Case (all keys collide to same bucket):
- Insert: O(n)
- Lookup: O(n)
- Delete: O(n)
The worst case occurs when:
- Your hash function is poorly designed and creates clustering
- An adversary deliberately chooses keys that collide (hash collision attacks)
- The load factor becomes too high (too many entries, too few buckets)
🎯 Key Principle: Most modern hash table implementations (like Python's dict, Java's HashMap, JavaScript's Map) use sophisticated hash functions and automatic resizing to maintain average-case O(1) performance. In practice, you can rely on O(1) for LeetCode problems.
Space Complexity
Hash tables always use O(n) space where n is the number of key-value pairs stored. However, the actual memory usage includes:
- The bucket array itself (size depends on capacity, not just number of entries)
- Additional overhead for collision handling (linked list nodes for chaining, or empty slots for open addressing)
- The load factor determines wasted space: typical load factor is 0.75, meaning the array is 25% larger than needed
Space Usage Example:
Storing 75 entries with load factor 0.75:
- Requires bucket array of size 100
- With chaining: 100 bucket pointers + 75 linked list nodes
- With open addressing: 100 array slots (75 filled, 25 empty)
💡 Real-World Example: Python dictionaries automatically resize when the load factor exceeds 2/3. During resize, Python creates a new array (typically 4x larger for small dicts, 2x for large) and rehashes all entries. This is why occasionally an insert might take O(n) time, but amortized across many operations, it's still O(1).
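To connect this to the HashMapChaining class above, a resize step might look like the following sketch (the 0.75 threshold and doubling factor are illustrative; real implementations tune these differently):

def _resize(self):
    """Double the bucket array and rehash every entry - O(n), but rare."""
    old_buckets = self.buckets
    self.capacity *= 2
    self.buckets = [[] for _ in range(self.capacity)]
    for bucket in old_buckets:
        for key, value in bucket:
            # Recompute each bucket index against the new capacity
            self.buckets[hash(key) % self.capacity].append((key, value))

## Inside put(), after incrementing self.size:
## if self.size / self.capacity > 0.75:
##     self._resize()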
Hash Maps vs Hash Sets: Cousins with Different Purposes
A hash map (also called dictionary, associative array, or map) stores key-value pairs, allowing you to associate data with unique keys. A hash set stores only keys (or values, depending on perspective), primarily for membership testing.
Structural Differences:
| Feature | Hash Map | Hash Set |
|---|---|---|
| 📦 Stores | Key-value pairs | Keys only |
| 🎯 Primary Use | Associate data, count frequencies | Check existence, remove duplicates |
| 📊 Internal Storage | (key, value) tuples | Single values |
| 💾 Memory | More (stores both) | Less (stores one) |
| 🔧 Typical Operations | map[key] = value, map.get(key) | set.add(key), key in set |
Here's when to use each:
## Use HASH MAP when you need to associate information
def character_frequency(s):
"""Count how many times each character appears"""
freq_map = {}  # char -> count
for char in s:
freq_map[char] = freq_map.get(char, 0) + 1
return freq_map
print(character_frequency("hello"))
## Output: {'h': 1, 'e': 1, 'l': 2, 'o': 1}
## Use HASH SET when you only need to track existence
def contains_duplicate(nums):
"""Check if array has duplicates"""
seen = set()
for num in nums:
if num in seen:
return True # Found duplicate!
seen.add(num)
return False
print(contains_duplicate([1, 2, 3, 1])) # Output: True
⚠️ Common Mistake #2: Using a hash map when a hash set suffices. If you find yourself storing dummy values like map[key] = true or map[key] = 1 just to mark existence, you should be using a set instead. ⚠️
Basic Operations: Code Examples Across Languages
Let's see how to perform fundamental hash table operations in the most common languages for LeetCode problems.
Python: Dictionaries and Sets
## Hash Map (Dictionary) Operations
frequency = {} # Create empty hash map
## INSERT / UPDATE
frequency['apple'] = 5
frequency['banana'] = frequency.get('banana', 0) + 1 # Safe increment
## LOOKUP
if 'apple' in frequency: # Check existence (O(1))
count = frequency['apple'] # Get value
count = frequency.get('grape', 0) # Safe lookup with default
## DELETE
if 'banana' in frequency:
del frequency['banana'] # Remove key-value pair
removed_value = frequency.pop('apple', None) # Remove and return value
## ITERATE
for key in frequency: # Iterate keys
print(f"{key}: {frequency[key]}")
for key, value in frequency.items(): # Iterate key-value pairs
print(f"{key}: {value}")
## Hash Set Operations
unique_nums = set() # Create empty hash set
## INSERT
unique_nums.add(42)
unique_nums.add(17)
## LOOKUP
if 42 in unique_nums: # Membership test (O(1))
print("Found!")
## DELETE
unique_nums.discard(42) # Remove if exists (no error if not found)
unique_nums.remove(17) # Remove (raises KeyError if not found)
## SET OPERATIONS
set_a = {1, 2, 3}
set_b = {2, 3, 4}
intersection = set_a & set_b # {2, 3}
union = set_a | set_b # {1, 2, 3, 4}
difference = set_a - set_b # {1}
💡 Pro Tip: Python's dict.get(key, default) is your best friend in LeetCode. It prevents KeyError exceptions and makes counting patterns elegant: count[x] = count.get(x, 0) + 1.
Java: HashMap and HashSet
import java.util.*;
// Hash Map Operations
HashMap<String, Integer> frequency = new HashMap<>();
// INSERT / UPDATE
frequency.put("apple", 5);
frequency.put("banana", frequency.getOrDefault("banana", 0) + 1);
// LOOKUP
if (frequency.containsKey("apple")) { // Check existence
int count = frequency.get("apple"); // Get value
}
int count = frequency.getOrDefault("grape", 0); // Safe lookup
// DELETE
frequency.remove("banana"); // Remove key-value pair
Integer removedValue = frequency.remove("apple"); // Remove and return
// ITERATE
for (String key : frequency.keySet()) { // Iterate keys
System.out.println(key + ": " + frequency.get(key));
}
for (Map.Entry<String, Integer> entry : frequency.entrySet()) {
System.out.println(entry.getKey() + ": " + entry.getValue());
}
// Hash Set Operations
HashSet<Integer> uniqueNums = new HashSet<>();
// INSERT
uniqueNums.add(42);
uniqueNums.add(17);
// LOOKUP
if (uniqueNums.contains(42)) { // Membership test
System.out.println("Found!");
}
// DELETE
uniqueNums.remove(42); // Remove element
// SIZE
int size = uniqueNums.size();
🤔 Did you know? Java's HashMap uses separate chaining with linked lists that automatically convert to balanced tree structures (red-black trees) when a chain exceeds 8 elements. This ensures O(log n) worst-case performance instead of O(n).
JavaScript: Map and Set
// Hash Map (Map) Operations
const frequency = new Map(); // Use Map, not plain objects!
// INSERT / UPDATE
frequency.set('apple', 5);
frequency.set('banana', (frequency.get('banana') || 0) + 1);
// LOOKUP
if (frequency.has('apple')) { // Check existence
const count = frequency.get('apple'); // Get value
}
const count = frequency.get('grape') || 0; // Safe lookup with default
// DELETE
frequency.delete('banana'); // Remove key-value pair (returns boolean)
// ITERATE
for (const key of frequency.keys()) { // Iterate keys
console.log(`${key}: ${frequency.get(key)}`);
}
for (const [key, value] of frequency.entries()) { // Iterate pairs
console.log(`${key}: ${value}`);
}
// Hash Set Operations
const uniqueNums = new Set();
// INSERT
uniqueNums.add(42);
uniqueNums.add(17);
// LOOKUP
if (uniqueNums.has(42)) { // Membership test
console.log('Found!');
}
// DELETE
uniqueNums.delete(42); // Remove element
// CONVERT Array to Set (remove duplicates)
const arr = [1, 2, 2, 3, 3, 3];
const unique = [...new Set(arr)]; // [1, 2, 3]
⚠️ Common Mistake #3: Using plain JavaScript objects {} as hash maps. While this works for string keys, Map is superior because: (1) it preserves key insertion order, (2) keys can be any type (including objects), (3) it has a proper size property, and (4) it's optimized for frequent additions/deletions. ⚠️
Load Factor and Dynamic Resizing
The load factor is the ratio of stored entries to bucket capacity. It's the critical metric determining hash table performance:
Load Factor = Number of Entries / Number of Buckets
Example: 75 entries in 100 buckets → load factor = 0.75
🎯 Key Principle: When the load factor exceeds a threshold (typically 0.75), the hash table automatically resizes by:
- Allocating a new, larger bucket array (usually 2x size)
- Rehashing all existing entries into the new array
- Deallocating the old array
This resizing operation is O(n) but happens infrequently enough that the amortized cost per insertion remains O(1).
Hash Table Growth Timeline:
Capacity: 4  → Insert 4th item  → Load = 4/4 = 1.0    → RESIZE!
Capacity: 8  → Insert 7th item  → Load = 7/8 = 0.875  → RESIZE!
Capacity: 16 → Insert 13th item → Load = 13/16 = 0.81 → RESIZE!
Capacity: 32 → ... and so on
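The same growth pattern can be simulated in a few lines (the 0.75 threshold and doubling are assumptions for illustration; actual implementations vary):

capacity, size = 4, 0
for i in range(1, 21):
    size += 1
    if size / capacity > 0.75:  # load factor exceeded -> grow
        capacity *= 2
        print(f"insert #{i}: resized to capacity {capacity}")
## Prints resizes at inserts #4, #7, #13 ... matching the timeline above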
💡 Real-World Example: In a LeetCode problem processing 10,000 elements, your hash map might resize about 10 times (growing from 2⁴ = 16 to 2¹⁴ = 16,384 buckets). Each resize takes progressively longer, but the total cost across all resizes is still O(n), making the amortized cost per insertion O(1).
Choosing the Right Hash Function
While most programming languages provide excellent built-in hash functions, understanding what makes a good hash function helps you avoid pitfalls when implementing custom objects as keys.
Properties of a Good Hash Function:
✅ Deterministic: Same key always produces same hash code
✅ Uniform Distribution: Keys spread evenly across buckets
✅ Fast Computation: Hashing shouldn't be the bottleneck
✅ Avalanche Effect: Small change in key → large change in hash code
Properties of a Bad Hash Function:
❌ Clustering: Many keys map to few buckets
❌ Predictable Collisions: Easy to deliberately cause collisions
❌ High Computational Cost: Slows down every operation
For LeetCode problems, you rarely need to implement custom hash functions. However, if you're using custom objects as keys, ensure they're:
- Immutable: Hash code shouldn't change after insertion
- Properly Implemented: Override both hashCode() (Java) or __hash__() (Python) AND the corresponding equality method
class Point:
def __init__(self, x, y):
self.x = x
self.y = y
def __hash__(self):
# Combine x and y into single hash code
return hash((self.x, self.y))
def __eq__(self, other):
# Two points are equal if coordinates match
return self.x == other.x and self.y == other.y
## Now Point can be used as dictionary key
point_data = {}
point_data[Point(1, 2)] = "Location A"
point_data[Point(3, 4)] = "Location B"
print(point_data[Point(1, 2)]) # Output: "Location A"
💡 Remember: In 95% of LeetCode problems, you'll use primitive types (integers, strings) or tuples as keys, all of which have excellent built-in hash functions. Custom hashing is an edge case.
Wrapping Up the Fundamentals
You now understand the machinery behind hash maps and sets: from hash functions that convert keys to bucket indices, through collision resolution strategies that handle inevitable overlaps, to the dynamic resizing that maintains performance. This foundation prepares you to recognize when hash-based solutions will optimize your algorithms and to debug performance issues when they arise.
The key insights to carry forward:
🔧 Hash tables trade space for time: O(n) memory enables O(1) lookups
🔧 Average case vs worst case matters: Real-world performance is typically O(1), but understanding degradation helps you write robust code
🔧 Choose hash map vs hash set based on your data: Key-value associations need maps; existence checking needs sets
🔧 Language-specific implementations differ: Know the idioms (get() in Python, getOrDefault() in Java, || in JavaScript) for clean, bug-free code
With these fundamentals solidified, you're ready to learn the specific patterns that make hash maps so powerful for solving LeetCode's array and string problemsโwhich we'll explore in the next section.
Implementation Patterns: Common LeetCode Hash Map Techniques
Now that we understand the fundamentals of hash maps and sets, let's explore the five essential patterns that appear repeatedly in LeetCode problems. Mastering these patterns will transform your problem-solving ability, allowing you to recognize opportunities to apply hash-based solutions instantly.
Pattern 1: Frequency Counting
The frequency counting pattern is perhaps the most fundamental hash map technique. Whenever you need to track "how many times" something appears in a collection, reach for a hash map. This pattern transforms counting operations from O(n²) nested loops into elegant O(n) single-pass solutions.
🎯 Key Principle: A hash map acts as a dynamic counter where keys represent elements and values represent their occurrence counts.
Let's examine how this works in practice:
def character_frequency(s):
"""
Count frequency of each character in a string.
Returns a dictionary mapping characters to counts.
"""
freq_map = {}
for char in s:
# The get() method provides default value 0 if key doesn't exist
freq_map[char] = freq_map.get(char, 0) + 1
return freq_map
## Example usage
text = "programming"
freq = character_frequency(text)
## Result: {'p': 1, 'r': 2, 'o': 1, 'g': 2, 'a': 1, 'm': 2, 'i': 1, 'n': 1}
## Find the most common character
most_common = max(freq, key=freq.get)
print(f"Most common: '{most_common}' appears {freq[most_common]} times")
This pattern extends beyond simple character counting. You can count array elements, words in text, patterns in sequences, or any discrete items. The beauty lies in the hash map's ability to dynamically create new keys as you encounter new elements.
💡 Pro Tip: Python's collections.Counter is a specialized hash map for frequency counting, but understanding the manual implementation helps you adapt the pattern to more complex scenarios.
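For reference, the Counter version of the same count (in CPython, most_common breaks ties by first-seen order):

from collections import Counter

freq = Counter("programming")
print(freq["g"])            # 2
print(freq.most_common(2))  # [('r', 2), ('g', 2)]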
Here's a more sophisticated applicationโfinding the first unique character in a string:
def first_unique_character(s):
"""
Find index of first character that appears only once.
Returns -1 if no unique character exists.
"""
# Step 1: Build frequency map
freq = {}
for char in s:
freq[char] = freq.get(char, 0) + 1
# Step 2: Find first character with frequency 1
for i, char in enumerate(s):
if freq[char] == 1:
return i
return -1
## Example
result = first_unique_character("leetcode") # Returns 0 ('l')
result = first_unique_character("loveleetcode") # Returns 2 ('v')
⚠️ Common Mistake 1: Forgetting to handle the case where a key doesn't exist yet. Always use .get(key, default) or check with if key in map before accessing. ⚠️
Pattern 2: Two-Sum Pattern (Complement Search)
The two-sum pattern revolutionized how we think about pair-finding problems. Instead of using nested loops to check every pair (O(n²)), we use a hash map to find complements in a single pass (O(n)).
🧠 Mental Model: As you iterate through elements, the hash map remembers what you've seen. For each new element, you ask: "Have I seen the complement that would complete my target?"
Here's the visualization of the process:
Target = 9, Array = [2, 7, 11, 15]
Step 1: Element = 2
Complement needed = 9 - 2 = 7
Is 7 in map? No
Add 2 to map: {2: 0}
Step 2: Element = 7
Complement needed = 9 - 7 = 2
Is 2 in map? YES! ✓
Return indices [0, 1]
The implementation is remarkably clean:
def two_sum(nums, target):
"""
Find two numbers that add up to target.
Returns their indices.
"""
seen = {} # Maps value -> index
for i, num in enumerate(nums):
complement = target - num
# Check if we've seen the complement before
if complement in seen:
return [seen[complement], i]
# Store current number and its index
seen[num] = i
return [] # No solution found
💡 Real-World Example: This pattern appears in financial applications that need to find transactions summing to a specific amount, or in recommendation systems finding complementary products.
The two-sum pattern extends to many variations:
🔧 Two-sum variations:
- Three-sum: Combine one loop with two-sum for O(n²) (see the sketch after this list)
- Two-difference: Find pairs with specific difference (target = a - b)
- K-sum problems: Generalize to find k numbers that sum to target
- Pair counting: Count how many valid pairs exist
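As a sketch of the three-sum variation (existence-only, with illustrative naming): fix one element, then run the two-sum complement search over the rest:

def three_sum_exists(nums, target):
    for i in range(len(nums)):
        seen = set()
        remaining = target - nums[i]
        # Two-sum pass over the elements after index i
        for num in nums[i + 1:]:
            if remaining - num in seen:
                return True
            seen.add(num)
    return False

## Example
print(three_sum_exists([2, 7, 11, 15], 20))  # True (2 + 7 + 11)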
⚠️ Common Mistake 2: Using the same element twice. Always check that seen[complement] has a different index than the current position. ⚠️
Pattern 3: Grouping and Categorization
The grouping pattern uses hash maps to organize elements into categories based on shared characteristics. This pattern shines when solving anagram problems, pattern matching, and data categorization tasks.
🎯 Key Principle: The hash map key represents the category identifier, while the value holds all elements belonging to that category (typically a list).
Consider the classic "Group Anagrams" problem:
def group_anagrams(words):
"""
Group words that are anagrams of each other.
Returns a list of grouped anagram lists.
"""
from collections import defaultdict
# Key: sorted characters, Value: list of words
groups = defaultdict(list)
for word in words:
# Create a canonical form (sorted characters)
key = ''.join(sorted(word))
groups[key].append(word)
return list(groups.values())
## Example
words = ["eat", "tea", "tan", "ate", "nat", "bat"]
result = group_anagrams(words)
## Result: [["eat", "tea", "ate"], ["tan", "nat"], ["bat"]]
The grouping process visualized:
Input: ["eat", "tea", "tan", "ate", "nat", "bat"]
Hash Map (sorted chars -> words):
┌──────────┬───────────────────┐
│ Key      │ Value             │
├──────────┼───────────────────┤
│ "aet"    │ [eat, tea, ate]   │
│ "ant"    │ [tan, nat]        │
│ "abt"    │ [bat]             │
└──────────┴───────────────────┘
Output: [[eat, tea, ate], [tan, nat], [bat]]
🤔 Did you know? For anagram grouping, you can also use character frequency tuples as keys instead of sorted strings. For example, "eat" becomes (1,0,0,0,1,0,...,1,0,0,0) representing counts of 'a' through 'z'.
Another powerful application is pattern matching:
def find_isomorphic_strings(s, t):
"""
Check if two strings follow the same pattern.
Example: "egg" and "add" are isomorphic (e->a, g->d)
"""
if len(s) != len(t):
return False
# Two maps: s->t mapping and t->s mapping (bijection)
map_s_to_t = {}
map_t_to_s = {}
for char_s, char_t in zip(s, t):
# Check s -> t mapping
if char_s in map_s_to_t:
if map_s_to_t[char_s] != char_t:
return False
else:
map_s_to_t[char_s] = char_t
# Check t -> s mapping (ensure bijection)
if char_t in map_t_to_s:
if map_t_to_s[char_t] != char_s:
return False
else:
map_t_to_s[char_t] = char_s
return True
💡 Pro Tip: When grouping, collections.defaultdict(list) eliminates the need to check if a key exists before appending; it automatically creates an empty list for new keys.
Pattern 4: Tracking Indices for Position-Aware Problems
Many problems require knowing not just what elements you've seen, but where you saw them. The index tracking pattern stores positions as hash map values, enabling efficient lookups for position-based queries.
🎯 Key Principle: Store element -> index mappings when you need to retrieve positions later, or element -> list of indices when elements can appear multiple times.
Here's a practical exampleโfinding the longest substring without repeating characters:
def length_of_longest_substring(s):
"""
Find length of longest substring without repeating characters.
Uses sliding window with hash map to track last seen index.
"""
char_index = {} # Maps character -> its most recent index
max_length = 0
start = 0 # Start of current window
for end, char in enumerate(s):
# If character seen before AND within current window
if char in char_index and char_index[char] >= start:
# Move start to position after previous occurrence
start = char_index[char] + 1
# Update character's position
char_index[char] = end
# Update max length
max_length = max(max_length, end - start + 1)
return max_length
## Example
result = length_of_longest_substring("abcabcbb") # Returns 3 ("abc")
result = length_of_longest_substring("pwwkew") # Returns 3 ("wke")
Visualization of the sliding window process:
String: "abcabcbb"
Step Window char_index max_length
──────────────────────────────────────────────────────
0 [a] {a:0} 1
1 [ab] {a:0, b:1} 2
2 [abc] {a:0, b:1, c:2} 3
3 [bca] {a:3, b:1, c:2} 3
(start moves from 0 to 1 due to 'a')
4 [cab] {a:3, b:4, c:2} 3
(start moves from 1 to 2 due to 'b')
5 [abc] {a:3, b:4, c:5} 3
(start moves from 2 to 3 due to 'c')
Another common application tracks indices for range queries:
def find_all_anagram_indices(s, p):
"""
Find all start indices of p's anagrams in s.
Returns list of starting indices.
"""
from collections import Counter
if len(p) > len(s):
return []
result = []
p_count = Counter(p)
window_count = Counter(s[:len(p)-1]) # Initialize window
for i in range(len(p)-1, len(s)):
# Add new character to window
window_count[s[i]] += 1
# Check if window matches pattern
if window_count == p_count:
result.append(i - len(p) + 1)
# Remove leftmost character for next iteration
left_char = s[i - len(p) + 1]
window_count[left_char] -= 1
if window_count[left_char] == 0:
del window_count[left_char]
return result
⚠️ Common Mistake 3: When tracking indices in a sliding window, forgetting to check if the previously seen index is actually within the current window bounds. ⚠️
Pattern 5: Prefix Sum Maps for Subarray Problems
The prefix sum pattern combines cumulative sums with hash maps to solve subarray problems in linear time. This pattern is particularly powerful for finding subarrays with specific sum properties.
🧠 Mental Model: A prefix sum at position i represents the sum of all elements from index 0 to i. If prefix_sum[j] - prefix_sum[i] = target, then the subarray from i+1 to j has sum equal to target.
Mathematically:
If prefix[j] - prefix[i] = target
Then prefix[i] = prefix[j] - target
So we store prefix sums and check:
"Have I seen (current_sum - target) before?"
Here's the implementation for finding subarrays that sum to a target:
def subarray_sum_equals_k(nums, k):
"""
Count number of continuous subarrays that sum to k.
Uses prefix sum with hash map.
"""
count = 0
current_sum = 0
# Maps prefix_sum -> frequency of that sum
prefix_sums = {0: 1} # Base case: empty prefix
for num in nums:
current_sum += num
# Check if there's a prefix sum that gives us target
# current_sum - previous_sum = k
# previous_sum = current_sum - k
if (current_sum - k) in prefix_sums:
count += prefix_sums[current_sum - k]
# Add current sum to map
prefix_sums[current_sum] = prefix_sums.get(current_sum, 0) + 1
return count
## Example
nums = [1, 2, 3]
result = subarray_sum_equals_k(nums, 3)
## Result: 2 (subarrays: [3] and [1,2])
Visualization of the process:
Array: [1, 2, 3], Target k = 3
Step num current_sum looking_for prefix_sums count
────────────────────────────────────────────────────────────
0 - 0 - {0: 1} 0
1 1 1 1-3=-2 {0:1, 1:1} 0
2 2 3 3-3=0 {0:1, 1:1, 3:1} 1 ✓
3 3 6 6-3=3 {0:1, 1:1, 3:1, 6:1} 2 ✓
Subarrays found:
[1,2] at indices 0-1: prefix[1] = 3, and we saw prefix 0 before
[3] at index 2: prefix[2] = 6, and we saw prefix 3 before
💡 Pro Tip: The base case {0: 1} handles subarrays that start from index 0. Without it, we'd miss subarrays like [1, 2] in the example above.
This pattern extends to binary arrays (finding equal 0s and 1s) by treating 0 as -1:
def find_max_length_binary_subarray(nums):
"""
Find longest subarray with equal number of 0s and 1s.
Treats 0 as -1, then finds subarray with sum = 0.
"""
max_length = 0
current_sum = 0
# Maps sum -> earliest index where this sum occurred
sum_index = {0: -1} # Base case
for i, num in enumerate(nums):
# Treat 0 as -1
current_sum += 1 if num == 1 else -1
if current_sum in sum_index:
# Found a subarray with equal 0s and 1s
length = i - sum_index[current_sum]
max_length = max(max_length, length)
else:
# First time seeing this sum
sum_index[current_sum] = i
return max_length
🎯 Key Principle: For "longest" subarray problems, store the earliest index where each prefix sum appears. For "count all" problems, store the frequency of each prefix sum.
Pattern Integration: Combining Techniques
Real interview problems often require combining multiple patterns. Understanding how these patterns work together is crucial for tackling complex challenges.
📋 Quick Reference Card: Pattern Selection Guide
| 🎯 Problem Type | 🔧 Pattern to Use | ⏱️ Complexity |
|---|---|---|
| 🔢 Count occurrences | Frequency counting | O(n) time, O(k) space |
| 🎲 Find pairs summing to target | Two-sum complement | O(n) time, O(n) space |
| 📂 Group similar items | Grouping/categorization | O(n·m) time* |
| 📍 Track positions | Index tracking | O(n) time, O(n) space |
| ➕ Subarray sum queries | Prefix sum map | O(n) time, O(n) space |
*where m is the cost of computing the grouping key (e.g., sorting for anagrams)
✅ Correct thinking: "This problem asks about subarrays with a sum property, so I should consider prefix sums. But I also need to count occurrences, so I'll track prefix sum frequencies."
❌ Wrong thinking: "I need a hash map, so I'll just try different things until something works."
💡 Remember: The best solution often emerges from recognizing which pattern(s) the problem naturally maps to, not from memorizing specific problem solutions.
These five patterns form the foundation of hash map problem-solving. As you practice, you'll develop intuition for recognizing when each pattern applies, and you'll learn to combine them fluently to tackle increasingly complex challenges. The key is understanding why each pattern works, not just how to implement it.
Practical Application: Solving Real LeetCode Problems
Now that we understand the patterns, let's roll up our sleeves and solve real LeetCode problems from start to finish. Each problem demonstrates a distinct hash map technique that you'll encounter repeatedly in interviews. We'll walk through the thought process, build the solution step by step, and analyze why it works.
Problem 1: Two Sum - The Foundation
Two Sum is perhaps the most famous hash map problem on LeetCode. Given an array of integers and a target sum, find two numbers that add up to the target and return their indices.
Let's think through this problem systematically. The brute force approach would check every pair of numbers, requiring nested loops with O(n²) time complexity. But we can do better by recognizing a key insight: for each number we examine, we know exactly what its complement must be (target - current number). If we've seen that complement before, we have our answer.
🎯 Key Principle: When you need to find pairs that satisfy a condition, store what you're looking for in a hash map as you iterate.
Here's how the algorithm unfolds:
def two_sum(nums, target):
"""
Find two numbers that add up to target and return their indices.
Time Complexity: O(n) - single pass through array
Space Complexity: O(n) - hash map stores up to n elements
"""
# Map to store: number -> its index
seen = {}
for i, num in enumerate(nums):
complement = target - num
# Check if we've seen the complement before
if complement in seen:
return [seen[complement], i]
# Store current number and its index
seen[num] = i
return [] # No solution found
## Example walkthrough
nums = [2, 7, 11, 15]
target = 9
print(two_sum(nums, target)) # Output: [0, 1]
Let's trace through the execution visually:
Iteration 0: num=2, complement=7
seen = {}
7 not in seen
Store: seen = {2: 0}
Iteration 1: num=7, complement=2
seen = {2: 0}
2 in seen! ✓
Return [0, 1]
⚠️ Common Mistake 1: Storing the index before checking for the complement. This can cause issues when the same element appears twice and you need both occurrences. The order matters! ⚠️
💡 Pro Tip: Notice we store seen[num] = i AFTER checking for the complement. This prevents using the same element twice (e.g., if target=6 and num=3, we don't want to return the same index twice).
Problem 2: Group Anagrams - Custom Hash Keys
Group Anagrams asks us to group strings that are anagrams of each other. For example, ["eat", "tea", "tan", "ate", "nat", "bat"] should become [["eat","tea","ate"], ["tan","nat"], ["bat"]].
The challenge here is choosing the right hash key. Anagrams contain the same characters in different orders, so we need a key that's identical for all anagrams. Two excellent options emerge:
- Sorted string: Sort the characters alphabetically ("eat" → "aet")
- Character count tuple: Count each character's frequency
from collections import defaultdict
def group_anagrams(strs):
"""
Group strings that are anagrams of each other.
Time Complexity: O(n * k log k) where n = number of strings, k = max string length
Space Complexity: O(n * k) for storing all strings in groups
"""
# Map from sorted string -> list of original strings
anagram_groups = defaultdict(list)
for string in strs:
# Create a canonical key by sorting characters
key = ''.join(sorted(string))
anagram_groups[key].append(string)
return list(anagram_groups.values())
## Example
words = ["eat", "tea", "tan", "ate", "nat", "bat"]
print(group_anagrams(words))
## Output: [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
Let's visualize how the hash map builds up:
Processing "eat":
key = "aet"
anagram_groups = {"aet": ["eat"]}
Processing "tea":
key = "aet"
anagram_groups = {"aet": ["eat", "tea"]}
Processing "tan":
key = "ant"
anagram_groups = {"aet": ["eat", "tea"],
"ant": ["tan"]}
Processing "ate":
key = "aet"
anagram_groups = {"aet": ["eat", "tea", "ate"],
"ant": ["tan"]}
... and so on
For an alternative approach using character counts, we can create a tuple of 26 integers representing the frequency of each letter:
def group_anagrams_by_count(strs):
"""
Alternative approach using character frequency as key.
Time Complexity: O(n * k) where n = number of strings, k = max string length
Space Complexity: O(n * k)
"""
anagram_groups = defaultdict(list)
for string in strs:
# Create frequency count as tuple (tuples are hashable)
count = [0] * 26
for char in string:
count[ord(char) - ord('a')] += 1
# Use tuple as key
key = tuple(count)
anagram_groups[key].append(string)
return list(anagram_groups.values())
🤔 Did you know? The character count approach is actually faster (O(n * k) versus O(n * k log k)) because counting is linear while sorting adds a logarithmic factor. However, sorted strings are often more intuitive, and the performance difference rarely matters for typical string lengths.
💡 Mental Model: Think of the hash key as a "fingerprint" that uniquely identifies a group. All anagrams share the same fingerprint, whether it's sorted characters or frequency counts.
Problem 3: Longest Substring Without Repeating Characters - Sliding Window with Hash Set
This problem asks us to find the length of the longest substring without repeating characters. For example, in "abcabcbb", the answer is 3 ("abc").
This problem combines two powerful techniques: the sliding window pattern and a hash set to track seen characters. The window expands by moving the right pointer and contracts by moving the left pointer when we encounter duplicates.
def length_of_longest_substring(s):
"""
Find length of longest substring without repeating characters.
Time Complexity: O(n) - each character visited at most twice (once by right, once by left)
Space Complexity: O(min(n, m)) where m = size of character set (typically O(1) for ASCII)
"""
char_set = set()
left = 0
max_length = 0
for right in range(len(s)):
# If character already in window, shrink from left
while s[right] in char_set:
char_set.remove(s[left])
left += 1
# Add current character to window
char_set.add(s[right])
# Update max length
max_length = max(max_length, right - left + 1)
return max_length
## Example walkthrough
print(length_of_longest_substring("abcabcbb")) # Output: 3
print(length_of_longest_substring("bbbbb")) # Output: 1
print(length_of_longest_substring("pwwkew")) # Output: 3 ("wke")
Let's trace through "abcabcbb" step by step:
Window State:
[a]bcabcbb      left=0, right=0, char_set={'a'}, max_length=1
[ab]cabcbb      left=0, right=1, char_set={'a','b'}, max_length=2
[abc]abcbb      left=0, right=2, char_set={'a','b','c'}, max_length=3
Duplicate 'a' found!
a[bca]bcbb      left=1, right=3, char_set={'b','c','a'}, max_length=3
Duplicate 'b' found!
ab[cab]cbb      left=2, right=4, char_set={'c','a','b'}, max_length=3
Duplicate 'c' found!
abc[abc]bb      left=3, right=5, char_set={'a','b','c'}, max_length=3
... continues
🎯 Key Principle: The sliding window expands optimistically (moving right) and contracts when constraints are violated (moving left). The hash set provides O(1) lookups to detect duplicates instantly.
⚠️ Common Mistake 2: Using an if statement instead of a while loop when shrinking the window. You need while because multiple characters might need removal before the duplicate is gone. ⚠️
💡 Pro Tip: An optimization exists using a hash map instead of a set. Map characters to their most recent index, allowing you to jump the left pointer directly to the duplicate's position + 1, avoiding the while loop entirely.
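That optimization is the same last-seen-index approach shown in Pattern 4 earlier; restated compactly as a sketch:

def length_of_longest_substring_v2(s):
    last_index = {}  # char -> most recent index
    left = 0
    max_length = 0
    for right, char in enumerate(s):
        # Only jump if the previous occurrence is inside the current window
        if char in last_index and last_index[char] >= left:
            left = last_index[char] + 1
        last_index[char] = right
        max_length = max(max_length, right - left + 1)
    return max_length

print(length_of_longest_substring_v2("abcabcbb"))  # 3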
Problem 4: Subarray Sum Equals K - Prefix Sums with Hash Maps
Subarray Sum Equals K is a beautiful problem that showcases prefix sum technique with hash maps. Given an array of integers and a target sum k, count how many contiguous subarrays sum to k.
The insight here is profound: if we know the cumulative sum up to index i is sum_i, and we've seen a prefix with sum sum_i - k before, then the subarray between those positions sums to k.
❌ Wrong thinking: "I need to check every possible subarray (nested loops)." This leads to O(n²).
✅ Correct thinking: "I need to track prefix sums and check if (current_sum - k) exists in my hash map." This achieves O(n).
Here's the mathematical foundation:
If prefix_sum[j] - prefix_sum[i] = k
Then prefix_sum[i] = prefix_sum[j] - k
So we look for how many times (current_sum - k) appeared before!
from collections import defaultdict
def subarray_sum(nums, k):
"""
Count subarrays that sum to k using prefix sums.
Time Complexity: O(n) - single pass through array
Space Complexity: O(n) - hash map stores prefix sums
"""
# Map from prefix_sum -> count of times seen
prefix_sums = defaultdict(int)
prefix_sums[0] = 1 # Empty prefix (important for subarrays starting at index 0)
current_sum = 0
count = 0
for num in nums:
current_sum += num
# Check if (current_sum - k) exists
# If it does, we found subarray(s) that sum to k
target = current_sum - k
if target in prefix_sums:
count += prefix_sums[target]
# Record current prefix sum
prefix_sums[current_sum] += 1
return count
## Examples
print(subarray_sum([1, 1, 1], 2)) # Output: 2 ([1,1] at positions 0-1 and 1-2)
print(subarray_sum([1, 2, 3], 3)) # Output: 2 ([1,2] and [3])
print(subarray_sum([1, -1, 1, 1, 1], 3)) # Output: 2 ([1,-1,1,1,1] and [1,1,1])
Let's trace through [1, 2, 3] with k=3:
Initial state:
prefix_sums = {0: 1}
current_sum = 0, count = 0
Processing 1:
current_sum = 1
target = 1 - 3 = -2 (not in map)
prefix_sums = {0: 1, 1: 1}
Processing 2:
current_sum = 3
target = 3 - 3 = 0 (found! count += 1)
count = 1 # subarray [1,2] sums to 3
prefix_sums = {0: 1, 1: 1, 3: 1}
Processing 3:
current_sum = 6
target = 6 - 3 = 3 (found! count += 1)
count = 2 # subarray [3] sums to 3
prefix_sums = {0: 1, 1: 1, 3: 1, 6: 1}
Final count: 2
🧠 Mnemonic: "Prefix sum - k = prefix we need." If you've seen that prefix before, you've found a valid subarray.
⚠️ Common Mistake 3: Forgetting to initialize prefix_sums[0] = 1. This handles subarrays that start from index 0. Without it, you'll miss cases where the entire prefix equals k. ⚠️
💡 Real-World Example: This pattern appears in many forms: counting subarrays with sum divisible by k, finding subarrays with equal 0s and 1s, or calculating running averages. The prefix sum + hash map combo is incredibly versatile.
Complexity Analysis Summary
Let's consolidate our understanding of the time and space complexity patterns we've seen:
📋 Quick Reference Card: Problem Complexity Analysis
| Problem | Time Complexity | Space Complexity | Hash Structure | Key Insight |
|---|---|---|---|---|
| 🎯 Two Sum | O(n) | O(n) | Hash Map | Store complement |
| 🎯 Group Anagrams | O(n·k log k) | O(n·k) | Hash Map | Sorted key |
| 🎯 Longest Substring | O(n) | O(min(n,m)) | Hash Set | Sliding window |
| 🎯 Subarray Sum | O(n) | O(n) | Hash Map | Prefix sums |
Pattern Recognition Guide
Now that you've seen these solutions, let's identify when to apply each pattern:
Use hash map for lookup when:
- 🔧 You need to find pairs/complements (Two Sum pattern)
- 🔧 You need to group items by some property (Group Anagrams pattern)
- 🔧 You need to count occurrences or track frequencies (Subarray Sum pattern)
Use hash set for existence when:
- 🔧 You only care about presence/absence, not counts
- 🔧 You need to detect duplicates quickly (Longest Substring pattern)
- 🔧 You're implementing a sliding window with uniqueness constraints
Combine with sliding window when:
- 🔧 The problem involves contiguous subarrays/substrings
- 🔧 You can expand/contract a window based on constraints
- 🔧 You need to track "current valid state" efficiently
💡 Remember: The hash map's O(1) lookup transforms many O(n²) nested loop problems into elegant O(n) single-pass solutions. When you catch yourself writing nested loops over arrays, pause and ask: "Could a hash map remember what I'm looking for?"
Building Your Problem-Solving Intuition
These four problems represent foundational patterns you'll encounter repeatedly. Two Sum teaches the complement lookup technique. Group Anagrams demonstrates custom key design. Longest Substring shows hash structures working with sliding windows. Subarray Sum reveals the power of prefix sums with hash maps.
🎯 Key Principle: Master these patterns deeply rather than memorizing specific solutions. Interview problems are variations on these themes, and recognizing which pattern applies is half the battle.
As you practice more problems, you'll develop an intuition for when hash maps are the right tool. You'll start to see the "shape" of hash map problems: they often involve searching, counting, or grouping operations that benefit from constant-time lookups. The investment in understanding these four solutions will pay dividends across dozens of similar problems.
In the next section, we'll explore common pitfalls and best practices to help you avoid mistakes and write production-quality hash map code that performs well even under challenging conditions.
Common Pitfalls and Best Practices
Hash maps and sets are powerful tools, but with great power comes the potential for subtle bugs and performance issues. After reviewing thousands of LeetCode submissions, certain mistakes appear repeatedly, even among experienced developers. Understanding these pitfalls will help you write more robust, efficient code and avoid debugging headaches during interviews.
Pitfall 1: Choosing the Wrong Data Structure
One of the most fundamental mistakes is selecting the wrong data structure for the problem at hand. The choice between a hash map, hash set, and array significantly impacts both code clarity and performance.
⚠️ Common Mistake 1: Using a hash map when a set suffices ⚠️
Many developers default to hash maps even when they only need to track presence, not frequency or associations.
## ❌ Unnecessarily complex - wasting space
def has_duplicates(nums):
    seen = {}
    for num in nums:
        if num in seen:
            return True
        seen[num] = True  # The value serves no purpose
    return False
## ✅ Simpler and more memory efficient
def has_duplicates(nums):
    seen = set()
    for num in nums:
        if num in seen:
            return True
        seen.add(num)
    return False
## ✅ Even better - Pythonic one-liner
def has_duplicates(nums):
    return len(nums) != len(set(nums))
🎯 Key Principle: Use a set when you only care about membership ("have I seen this before?"). Use a map when you need to associate values ("how many times have I seen this?" or "what index did this appear at?").
⚠️ Common Mistake 2: Using hash structures when order matters ⚠️
Hash maps and sets don't preserve insertion order in all languages (though Python 3.7+ and modern JavaScript do). If you need to maintain or check order, consider alternatives.
// ❌ Wrong thinking: comparing de-duplicated characters can check anagrams
function isAnagram(s, t) {
    // Broken: compares only unique characters in first-seen order, ignoring
    // counts - e.g. "aab" vs "abb" would wrongly compare as equal
    return Array.from(new Set(s)).join('') === Array.from(new Set(t)).join('');
}
// ✅ Correct thinking: Use appropriate structure for the task
function isAnagram(s, t) {
    if (s.length !== t.length) return false;
    const freq = new Map();
    for (let char of s) {
        freq.set(char, (freq.get(char) || 0) + 1);
    }
    for (let char of t) {
        if (!freq.has(char)) return false;
        freq.set(char, freq.get(char) - 1);
        if (freq.get(char) < 0) return false;
    }
    return true;
}
💡 Pro Tip: When the input size is guaranteed to be small (like the 26 lowercase letters), a fixed-size array often outperforms hash structures:
## For problems with limited character sets
def character_frequency(s):
    # Array approach - O(1) lookup, better cache locality
    freq = [0] * 26
    for char in s:
        freq[ord(char) - ord('a')] += 1
    return freq
📋 Quick Reference Card: Structure Selection Guide
| Need | Use | Example Problem |
|---|---|---|
| 🔍 Check existence only | Set | Contains Duplicate |
| 🔢 Count frequencies | Map | Valid Anagram |
| 🎯 Store key-value pairs | Map | Two Sum |
| 📜 Maintain order | Array/List + Map | Top K Frequent |
| 🔤 Fixed small domain | Array (26 letters) | Ransom Note |
| ⚡ Need sorted iteration | TreeMap/Sorted structures | Sorted frequency |
Pitfall 2: Hash Key Design Mistakes
Hash key design is critical for correctness. Poor key choices lead to bugs that can be maddeningly difficult to debug.
⚠️ Common Mistake 3: Using mutable objects as keys ⚠️
In languages like Python, using mutable objects (lists, sets) as dictionary keys will raise an error. In JavaScript, objects used as Map keys are compared by reference, not value.
## ❌ This raises TypeError: unhashable type: 'list'
tried = {}
key = [1, 2, 3]
tried[key] = True  # Error!
## ✅ Convert to an immutable tuple
tried = {}
key = tuple([1, 2, 3])
tried[key] = True  # Works!
## Real example: Group Anagrams
def group_anagrams(strs):
    groups = {}
    for s in strs:
        # ❌ sorted(s) returns a list - it can't be a key
        # key = sorted(s)  # This would fail!
        # ✅ Convert to a tuple or string instead
        key = tuple(sorted(s))
        if key not in groups:
            groups[key] = []
        groups[key].append(s)
    return list(groups.values())
⚠️ Common Mistake 4: Incorrect composite key construction ⚠️
When combining multiple values into a single key, naive string concatenation can cause collisions.
## ❌ Wrong: String concatenation can create collisions
def store_coordinate(x, y):
    seen = {}
    key = str(x) + str(y)  # (1, 23) and (12, 3) both become "123"!
    seen[key] = True
## ✅ Better: Use a tuple (automatically handles hashing)
def store_coordinate(x, y):
    seen = {}
    key = (x, y)  # Tuples are compared element-wise
    seen[key] = True
## ✅ Alternative: Use a delimiter if you must use strings
def store_coordinate(x, y):
    seen = {}
    key = f"{x},{y}"  # Clear separation
    seen[key] = True
💡 Real-World Example: In "Island Perimeter" and other grid-based problems, you often need to track visited cells:
def island_perimeter(grid):
    visited = set()
    def dfs(row, col):
        # Composite key for 2D coordinates
        if (row, col) in visited:  # Tuple key is clean and collision-free
            return 0
        visited.add((row, col))
        # ... rest of DFS logic
🤔 Did you know? In some languages like Go, you can use structs as map keys as long as all fields are comparable. This makes composite keys very clean and type-safe.
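Python has a close analogue worth knowing: a dataclass declared with frozen=True is immutable and hashable, so it can serve as a self-documenting composite key. A small sketch (the Cell class is purely illustrative):
from dataclasses import dataclass
@dataclass(frozen=True)
class Cell:
    row: int
    col: int
visited = set()
visited.add(Cell(2, 3))       # frozen dataclasses define __hash__ and __eq__
print(Cell(2, 3) in visited)  # True - compared by field values, not identity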
Pitfall 3: Not Handling Edge Cases
Edge case handling separates working code from production-ready code. Hash structures introduce specific edge cases that are easy to overlook.
⚠️ Common Mistake 5: Forgetting to check for key existence ⚠️
// ❌ This will throw undefined errors or give wrong results
function twoSum(nums, target) {
    const map = new Map();
    for (let i = 0; i < nums.length; i++) {
        const complement = target - nums[i];
        // If the complement doesn't exist, this returns undefined
        return [map.get(complement), i]; // Bug: returns on the first iteration!
    }
}
// ✅ Proper existence check
function twoSum(nums, target) {
    const map = new Map();
    for (let i = 0; i < nums.length; i++) {
        const complement = target - nums[i];
        if (map.has(complement)) { // Explicit check
            return [map.get(complement), i];
        }
        map.set(nums[i], i);
    }
    return []; // Handle the case when no solution exists
}
⚠️ Common Mistake 6: Improper handling of duplicates ⚠️
Duplicates in input data can break algorithms if not considered carefully.
## Problem: Find all unique pairs that sum to target
## ❌ This counts duplicates multiple times
def count_pairs_wrong(nums, target):
    seen = set()
    count = 0
    for num in nums:
        complement = target - num
        if complement in seen:
            count += 1  # If nums has [2,2,2] and target=4, this overcounts!
        seen.add(num)
    return count
## ✅ Track pairs to avoid double-counting
def count_pairs_correct(nums, target):
    seen = set()
    pairs = set()
    for num in nums:
        complement = target - num
        if complement in seen:
            # Store the pair in sorted order so (a, b) and (b, a) collapse to one entry
            pairs.add(tuple(sorted([num, complement])))
        seen.add(num)
    return len(pairs)
⚠️ Common Mistake 7: Empty input edge cases ⚠️
## ❌ Crashes on empty input
def most_frequent(nums):
    freq = {}
    for num in nums:
        freq[num] = freq.get(num, 0) + 1
    return max(freq, key=freq.get)  # max() on an empty dict raises ValueError!
## ✅ Handle empty input
def most_frequent(nums):
    if not nums:  # Guard clause
        return None
    freq = {}
    for num in nums:
        freq[num] = freq.get(num, 0) + 1
    return max(freq, key=freq.get)
💡 Remember: Always ask yourself during interviews: "What if the input is empty? What if all elements are the same? What if there are duplicates?"
Pitfall 4: Space Complexity Traps
Space complexity is often neglected in favor of time optimization, but hash structures can consume significant memory. Knowing when to avoid them is crucial.
⚠️ Common Mistake 8: Using hash maps when space is constrained ⚠️
## Problem: Find the duplicate among n+1 integers where each nums[i] is in range [1, n]
## ❌ O(n) space - works but uses extra memory
def find_duplicate_hash(nums):
    seen = set()
    for num in nums:
        if num in seen:
            return num
        seen.add(num)
    return -1
## ✅ O(1) space - uses array indices as the hash (requires these constraints; modifies the input)
def find_duplicate_optimal(nums):
    # Use the array itself as a hash table: value v is "marked" at index v
    for num in nums:
        idx = abs(num)
        if nums[idx] < 0:  # Already marked - it's a duplicate
            return idx
        nums[idx] = -nums[idx]  # Mark as seen
    return -1
🎯 Key Principle: Before reaching for a hash map, check if the problem constraints allow for a more space-efficient solution:
- Limited range (1 to n): Use array indices
- Sorted input: Two-pointer approach
- Binary properties: Bit manipulation
- Read-only requirement: Floyd's cycle detection
💡 Pro Tip: Interview question: "Can you solve this with O(1) space?" often means there's a clever way to use the input itself as storage.
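Taking that idea one step further: for the classic version of the problem above (n + 1 integers in the range [1, n], input treated as read-only), Floyd's cycle detection finds the duplicate in O(1) space without even modifying the array. A sketch under those assumptions:
def find_duplicate_readonly(nums):
    # Treat the array as a linked list where index i points to nums[i].
    # A duplicate value means two indices point to the same "node",
    # so the walk contains a cycle whose entry point is the duplicate.
    slow = fast = nums[0]
    while True:  # Phase 1: find a meeting point inside the cycle
        slow = nums[slow]
        fast = nums[nums[fast]]
        if slow == fast:
            break
    slow = nums[0]  # Phase 2: advance both one step at a time to the cycle entry
    while slow != fast:
        slow = nums[slow]
        fast = nums[fast]
    return slow
print(find_duplicate_readonly([1, 3, 4, 2, 2]))  # 2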
⚠️ Common Mistake 9: Building unnecessary intermediate structures ⚠️
## Problem: Check if a string has all unique characters
## ❌ Builds the full set even when an early return is possible
def is_unique_wrong(s):
    return len(s) == len(set(s))  # Creates the entire set before comparing
## ✅ Early termination saves space and time
def is_unique_better(s):
    seen = set()
    for char in s:
        if char in seen:
            return False  # Exit immediately
        seen.add(char)
    return True
## ✅ Best: O(1) extra space by exploiting the ASCII constraint
def is_unique_optimal(s):
    if len(s) > 128:  # ASCII has only 128 chars - pigeonhole guarantees a repeat
        return False
    char_set = [False] * 128
    for char in s:
        val = ord(char)
        if char_set[val]:
            return False
        char_set[val] = True
    return True
Pitfall 5: Language-Specific Gotchas
Each programming language has unique quirks in how it implements hash structures. Being aware of these can prevent subtle bugs.
JavaScript: Object vs Map
// ⚠️ Objects have prototype properties that can interfere
const count = {};
count['toString'] = 5; // Overwrites the built-in method!
console.log(count.toString); // 5 - and calling count.toString() now throws a TypeError
// ❌ Object keys are always strings
const obj = {};
obj[1] = 'one';
obj['1'] = 'ONE';
console.log(obj[1]); // "ONE" - both keys are the same!
// ✅ Map preserves key types and has no prototype pollution
const map = new Map();
map.set(1, 'one');
map.set('1', 'ONE');
console.log(map.get(1)); // "one" - different keys!
// ✅ Safe iteration without prototype properties
for (const [key, value] of map) {
    console.log(key, value); // Only your data, no inherited properties
}
💡 Pro Tip: In JavaScript interviews, prefer Map over plain objects for hash tables unless you specifically need JSON compatibility.
Python: dict vs defaultdict vs Counter
from collections import defaultdict, Counter
## Standard dict - requires manual initialization
freq = {}
for char in "hello":
    if char not in freq:
        freq[char] = 0
    freq[char] += 1
## defaultdict - automatic initialization
freq = defaultdict(int)
for char in "hello":
    freq[char] += 1  # No need to check existence!
## Counter - specialized for frequency counting
freq = Counter("hello")  # One-liner!
print(freq.most_common(2))  # [('l', 2), ('h', 1)] - sorted by frequency
## ⚠️ Gotcha: defaultdict returns the default even for non-existent keys
freq = defaultdict(int)
print(freq['z'])    # 0 - but 'z' is now in the dict!
print('z' in freq)  # True - accessing created the key!
## ✅ Use a regular dict if you need to distinguish missing vs zero
freq = {}
print(freq.get('z', 0))  # 0 - and 'z' is NOT added to the dict
print('z' in freq)       # False
🧠 Mnemonic: "Default Dictionary Doesn't Distinguish" - accessing a defaultdict key creates it, even for reads.
Python: Dictionary Iteration Pitfalls
## ⚠️ Can't modify dict size during iteration
freq = {'a': 1, 'b': 2, 'c': 3}
for key in freq:
    if freq[key] > 1:
        del freq[key]  # RuntimeError: dictionary changed size during iteration
## ✅ Iterate over a copy of the keys
for key in list(freq.keys()):
    if freq[key] > 1:
        del freq[key]  # Safe!
## ✅ Or build a new dictionary
freq = {k: v for k, v in freq.items() if v <= 1}
Best Practices Checklist
Before submitting your solution, run through this mental checklist:
🔧 Data Structure Selection
- Am I tracking existence only? → Use Set
- Do I need key-value associations? → Use Map
- Is the domain small and fixed? → Consider Array
- Do I need ordering? → Consider TreeMap or sorted structures
🔧 Key Design
- Are my keys immutable?
- If using composite keys, am I avoiding collisions?
- Am I using tuples for multi-dimensional keys?
🔧 Edge Cases
- What if input is empty?
- What if all elements are identical?
- What if there are duplicates?
- What if the key doesn't exist?
- What about negative numbers or zero?
🔧 Space Optimization
- Can I use the input array itself as storage?
- Can I solve this with two pointers instead?
- Am I building unnecessary intermediate structures?
- Can I use early termination to save space?
🔧 Language-Specific
- JavaScript: Am I using Map instead of Object?
- Python: Should I use defaultdict or Counter?
- Am I aware of iteration modification rules?
Performance Monitoring Pattern
When optimizing, be mindful of the actual performance characteristics rather than just Big-O notation:
## For small inputs (n < 100), a simple array scan might beat a hash map
def two_sum_small(nums, target):
    # O(n²) but fast for small n due to cache locality
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                return [i, j]
    return []
## For large inputs, the hash map wins
def two_sum_large(nums, target):
    # O(n) with hash map overhead
    seen = {}
    for i, num in enumerate(nums):
        if target - num in seen:
            return [seen[target - num], i]
        seen[num] = i
    return []
💡 Real-World Example: In production systems, consider the typical input size. If you're processing user input that's rarely over 50 elements, a simple linear scan might outperform a hash map due to better cache performance and no hash function overhead.
The Golden Rule
✅ Correct thinking: "Let me understand the problem constraints, consider the trade-offs, and choose the simplest solution that meets the requirements."
❌ Wrong thinking: "Hash maps are always faster, so I'll use them everywhere."
Hash maps are powerful tools, but they're not universal solutions. The best code is correct first, clear second, and clever third. A simple, readable solution that handles edge cases will always beat a complex, optimized solution that fails on unexpected input.
🎯 Key Principle: KISS (Keep It Simple, Stupid) - Start with the simplest approach that works, then optimize only if needed. During interviews, clearly communicate your thought process: "I'm choosing a hash map here because we need O(1) lookup and the space trade-off is acceptable given that n ≤ 10⁴."
Mastering these pitfalls and best practices will make you not just a better LeetCode solver, but a better engineer who writes production-ready code that handles the messy reality of real-world data.
Summary and Quick Reference Guide
Congratulations! You've journeyed through the essential world of hash maps and sets, transforming from someone who might have written nested loops for every problem to a developer who can recognize hash-based optimization opportunities instantly. Before this lesson, you might have struggled to identify when a hash map could reduce O(n²) to O(n) time complexity, or fumbled with the subtle differences between when to use a hash map versus a hash set. Now you have a comprehensive mental framework for tackling array and string problems efficiently.
This reference guide consolidates everything you've learned into quick-access tools you can use during coding interviews and real-world development. Let's create your go-to resource for hash-based problem solving.
🎯 Decision Flowchart: Choosing the Right Data Structure
Knowing when to use hash maps versus other structures is often more valuable than knowing how to use them. Here's your decision tree:
START: Analyzing the Problem
│
└─ Do I need to look up values by a key/identifier quickly?
   │
   ├─ NO  → Consider: • Array  • Stack  • Queue
   │
   └─ YES → Do I need to store associated values?
            │
            ├─ NO  → HASH SET. Use when:
            │        • checking existence
            │        • removing duplicates
            │        • finding unique elements
            │
            └─ YES → Do I need ordering?
                     │
                     ├─ NO  → HASH MAP. Use when:
                     │        • counting  • mapping
                     │        • caching   • indexing
                     │
                     └─ YES → Consider:
                              • TreeMap/SortedDict
                              • OrderedDict
                              • Heap + HashMap
🎯 Key Principle: If you find yourself writing `for i in range(len(arr)): for j in range(len(arr)):` to search for something, stop and ask: "Could a hash map eliminate this inner loop?"
💡 Pro Tip: In 80% of LeetCode array/string problems where the naive solution is O(n²), a hash map can reduce it to O(n). Always consider hash-based solutions first for these problems.
📊 Complexity Reference Table
Here's your comprehensive cheat sheet for time and space complexity of hash-based operations:
| Operation | Hash Map | Hash Set | Array | Notes |
|---|---|---|---|---|
| 🔍 Lookup/Search | O(1) avg, O(n) worst | O(1) avg, O(n) worst | O(n) | Hash structures excel here |
| ➕ Insert | O(1) avg, O(n) worst | O(1) avg, O(n) worst | O(1) end, O(n) middle | O(n) worst case during resize |
| 🗑️ Delete | O(1) avg, O(n) worst | O(1) avg, O(n) worst | O(n) | Deletion by value in array is O(n) |
| 🔁 Iteration | O(n) | O(n) | O(n) | No ordering guarantee in hash |
| 💾 Space | O(n) | O(n) | O(n) | Hash has overhead for buckets |
| ✅ Check Existence | O(1) avg | O(1) avg | O(n) | Primary use case for hash set |
| 📈 Get Min/Max | O(n) | O(n) | O(n) unsorted | Use heap/tree for O(log n) |
| 🎯 Ordered Iteration | ❌ Not guaranteed | ❌ Not guaranteed | ✅ Yes | Use TreeMap or sorted list |
⚠️ Remember: The O(n) worst case occurs with many hash collisions (rare with good hash functions) or during table resizing. In practice, assume O(1) for interview discussions unless asked about edge cases.
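If you want to see that worst case with your own eyes, you can force every key into the same bucket by giving a class a constant hash. This is a toy demonstration only, never something to ship:
class BadHash:
    def __init__(self, value):
        self.value = value
    def __hash__(self):
        return 42  # every instance collides into the same bucket
    def __eq__(self, other):
        return isinstance(other, BadHash) and self.value == other.value
# With n colliding keys, each lookup degrades to a linear scan of the bucket:
table = {BadHash(i): i for i in range(1000)}
print(table[BadHash(999)])  # 999, but found only after many equality checks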
🤔 Did you know? Python dictionaries (hash maps) maintain insertion order as of Python 3.7, and since then it's guaranteed by the language, but it's a property of Python's dict, not of hash maps in general. Don't rely on it when porting a solution to another language!
🧠 Pattern Recognition Checklist
Use this checklist to instantly identify when a problem calls for hash maps or sets:
✅ Hash Map Indicators:
Problem mentions or requires:
- 🎯 "Count occurrences/frequency" → Use `Counter` or `defaultdict(int)`
- 🎯 "Find pairs that sum to target" → Store complements: `target - num`
- 🎯 "Group by property" → Use property as key, list as value
- 🎯 "Cache/memoize results" → Store computed values by input
- 🎯 "Map indices to values" → Create value→index lookup
- 🎯 "Find first unique/non-repeating" → Count then scan
- 🎯 "Anagram grouping" → Sort string as key
✅ Hash Set Indicators:
Problem mentions or requires:
- 🔍 "Check if exists" → O(1) membership testing
- 🔍 "Remove duplicates" → Set automatically handles this
- 🔍 "Find unique elements" → Set operations (union, intersection)
- 🔍 "Detect cycles/visited nodes" → Track seen elements
- 🔍 "Difference between arrays" → Set subtraction
- 🔍 "Common elements" → Set intersection
❌ NOT a Hash Map Problem:
When you see:
- 🛑 "Find kth smallest/largest" → Use heap or quickselect
- 🛑 "Maintain sorted order" → Use tree-based structures
- 🛑 "Range queries" → Use segment tree or prefix sums
- 🛑 "Stack operations (LIFO)" → Use actual stack
- 🛑 "Queue operations (FIFO)" → Use actual queue
- 🛑 "Sliding window with order" → Use deque
💡 Mental Model: If the problem involves finding, counting, grouping, or checking existence → think hash first. If it involves ordering, ranking, or LIFO/FIFO → think other structures.
💻 Code Template Library
Here are battle-tested templates you can adapt for interviews. Commit these patterns to muscle memory:
Template 1: Frequency Counter Pattern
from collections import defaultdict, Counter
def frequency_counter_template(arr):
    """
    Use when: counting occurrences, finding duplicates,
    checking character/element frequency
    Time: O(n), Space: O(k) where k = unique elements
    """
    # Method 1: Manual dict (interview-safe)
    freq = {}
    for item in arr:
        freq[item] = freq.get(item, 0) + 1
    # Method 2: Using defaultdict (more control)
    freq = defaultdict(int)
    for item in arr:
        freq[item] += 1
    # Method 3: Using Counter (most concise) - kept last so the
    # Counter-only operations below actually work
    freq = Counter(arr)
    # Common operations:
    most_common = freq.most_common(1)  # Returns [(item, count)]
    unique_items = len(freq)  # Number of unique elements
    items_appearing_once = [k for k, v in freq.items() if v == 1]
    return freq
## Example usage: Find first non-repeating character
def first_unique_char(s):
    freq = Counter(s)
    for i, char in enumerate(s):
        if freq[char] == 1:
            return i
    return -1
Template 2: Complement/Two-Pointer Alternative Pattern
def two_sum_template(nums, target):
    """
    Use when: finding pairs, checking complements,
    avoiding nested loops
    Time: O(n), Space: O(n)
    Pattern: Store what you need, check if current satisfies it
    """
    seen = {}  # Maps value → index (or just use a set if the index isn't needed)
    for i, num in enumerate(nums):
        complement = target - num
        # Check if we've seen the complement
        if complement in seen:
            return [seen[complement], i]  # Found the pair!
        # Store current number for future complements
        seen[num] = i
    return []  # No pair found
## Variation: Check if pair exists (return boolean)
def has_pair_with_sum(nums, target):
    seen = set()
    for num in nums:
        if target - num in seen:
            return True
        seen.add(num)
    return False
Template 3: Grouping/Categorization Pattern
from collections import defaultdict
def grouping_template(items, key_function):
    """
    Use when: grouping anagrams, grouping by property,
    categorizing elements
    Time: O(n * k) where k = cost of key_function
    Space: O(n)
    """
    groups = defaultdict(list)
    for item in items:
        # Generate key based on grouping criteria
        key = key_function(item)
        groups[key].append(item)
    return groups
## Example: Group anagrams
def group_anagrams(strs):
    groups = defaultdict(list)
    for s in strs:
        # Sorted string as key (anagrams have same sorted form)
        key = ''.join(sorted(s))
        groups[key].append(s)
    return list(groups.values())
## Example: Group by first letter
def group_by_first_letter(words):
    groups = defaultdict(list)
    for word in words:
        key = word[0].lower() if word else ''
        groups[key].append(word)
    return dict(groups)
💡 Pro Tip: During interviews, verbalize which template you're using: "This is a classic complement pattern, so I'll use a hash map to store elements I've seen and check for target minus current." This demonstrates pattern recognition.
🔧 Quick Implementation Decision Guide
Python developers:
| Need | Use This | Example |
|---|---|---|
| 🔢 Simple counting | `Counter(iterable)` | `Counter([1,2,2,3])` → `{1: 1, 2: 2, 3: 1}` |
| 🎯 Default values | `defaultdict(type)` | `defaultdict(list)` for grouping |
| 🔍 Existence checking | `set()` | `if item in my_set:` |
| 📜 Ordered dict | `dict()` (3.7+) | Maintains insertion order |
| 🎲 Remove duplicates, preserving order | `dict.fromkeys(list)` | `list(dict.fromkeys([1,2,1,3]))` |
JavaScript developers:
| Need | Use This | Example |
|---|---|---|
| 🔢 Counting | `Map()` or object | `map.get(key) \|\| 0` |
| 🔍 Existence | `Set()` | `set.has(item)` |
| 📜 Ordered keys | `Map()` | Preserves insertion order |
| 🎯 String keys only | Plain object `{}` | Slightly faster for strings |
Java developers:
| Need | Use This | Example |
|---|---|---|
| 🔢 Counting | `HashMap<K,Integer>` | `map.getOrDefault(key, 0)` |
| 🔍 Existence | `HashSet<T>` | `set.contains(item)` |
| 📜 Ordered iteration | `LinkedHashMap<K,V>` | Insertion order |
| 📜 Sorted keys | `TreeMap<K,V>` | O(log n) operations |
⚡ Performance Optimization Quick Reference
When your hash map solution is too slow:
🔑 Check your key type:
- ✅ Use immutable types (strings, tuples, numbers)
- ❌ Avoid mutable types (lists, sets) as keys
- 💡 Convert `list` → `tuple` if you must use one as a key
🎯 Minimize hash function calls:
# ❌ Slower: computes the key's hash twice
if key in my_dict:
    value = my_dict[key]
# ✅ Computes the hash once
value = my_dict.get(key)
# OR
try:
    value = my_dict[key]
except KeyError:
    pass
💾 Consider space-time tradeoffs:
- If space is limited, consider two-pointer or sorting (O(n log n)) instead
- Hash maps trade space for time: worth it for O(n²) → O(n) improvements
🔧 Use appropriate size initialization:
# Python dicts don't expose a capacity hint; if you know every key up front,
# you can pre-create them (values default to None - overwrite before use)
my_dict = dict.fromkeys(range(1000))
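These micro-optimizations are workload-dependent; for cheap-to-hash keys like small ints, the attribute lookup and method call behind .get() can cost more than the second hash, so measure before committing. A quick sketch with the standard timeit module:
import timeit
d = {i: i for i in range(1000)}
in_then_index = timeit.timeit("d[500] if 500 in d else None", globals={"d": d})
get_once = timeit.timeit("d.get(500)", globals={"d": d})
print(f"'in' + index: {in_then_index:.3f}s   .get(): {get_once:.3f}s")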
🎓 What You Now Know
Let's recap your transformation:
❌ Before this lesson, you might have:
- Written nested loops for every search problem
- Confused when to use hash maps versus hash sets
- Struggled with time complexity analysis
- Missed obvious hash-based optimizations in interviews
- Not recognized common patterns like complement checking
✅ After this lesson, you can:
- Recognize hash map opportunities instantly using the pattern checklist
- Choose between hash map, hash set, and other structures confidently
- Implement frequency counting, complement finding, and grouping patterns from memory
- Explain time/space complexity tradeoffs articulately
- Avoid common pitfalls like using mutable keys or unnecessary hash recomputation
- Reduce O(n²) solutions to O(n) by eliminating inner loops
| Concept | Before 😰 | After 🎯 |
|---|---|---|
| Two Sum problem | Nested loops O(n²) | Hash map O(n) |
| Finding duplicates | Sort then compare O(n log n) | Hash set O(n) |
| Anagram grouping | Compare all pairs O(n²k) | Hash map with sorted keys O(nk log k) |
| First unique character | Count array + rescan O(n) | Counter + single pass O(n) |
| Subarray sum equals k | Brute force O(n²) | Prefix sum + hash map O(n) |
⚠️ Critical Final Points:
- Hash maps in general don't guarantee ordering (Python 3.7+ dicts and JS Maps are the exceptions)
- Average O(1) assumes good hash distribution - worst case is O(n)
- Space complexity matters - hash maps use O(n) extra space
- Not all problems need optimization - for small inputs, simple solutions are fine
🚀 Next Steps: Advanced Techniques
You've mastered the fundamentals. Here's where to go next:
🔥 Advanced Hash-Based Techniques:
LRU Cache Implementation
- Combines hash map + doubly linked list
- O(1) get and put operations
- Essential for system design interviews
- Next practice: LeetCode #146 - LRU Cache
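As a preview, here is a minimal LRU sketch that leans on Python's OrderedDict, which packages the hash map and the recency list into one structure (the hand-rolled interview version wires up the doubly linked list explicitly):
from collections import OrderedDict
class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()  # maintains recency order
    def get(self, key):
        if key not in self.data:
            return -1
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]
    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used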
Rolling Hash for String Matching
- Rabin-Karp algorithm
- Efficient substring search
- Pattern: maintain hash of sliding window
- Next practice: LeetCode #28 - Find Index of First Occurrence
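The core idea fits in a few lines. Below is a compact, illustrative Rabin-Karp sketch; the base and modulus are arbitrary choices here, and the final slice comparison guards against hash collisions:
def rabin_karp(text, pattern):
    # Returns the index of the first occurrence of pattern in text, else -1
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    if m > n:
        return -1
    base, mod = 256, 1_000_000_007
    high = pow(base, m - 1, mod)  # weight of the window's leading character
    target = window = 0
    for i in range(m):
        target = (target * base + ord(pattern[i])) % mod
        window = (window * base + ord(text[i])) % mod
    for i in range(n - m + 1):
        # Verify with a direct comparison to rule out hash collisions
        if window == target and text[i:i + m] == pattern:
            return i
        if i < n - m:
            # Slide the window: drop text[i], append text[i + m]
            window = ((window - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return -1
print(rabin_karp("leetcode", "code"))  # 4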
Prefix Sum + Hash Map
- Subarray sum problems
- Count subarrays with property
- Next practice: LeetCode #560 - Subarray Sum Equals K
Union-Find with Hash Map
- Connected components in graphs
- Group membership tracking
- Next practice: LeetCode #547 - Number of Provinces
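A sketch of the hash-map-backed variant: storing parents in a dict means the elements can be arbitrary hashable labels rather than integers 0..n-1 (function and variable names are illustrative):
def count_components(edges):
    parent = {}  # label -> parent label; roots point to themselves
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression by halving
            x = parent[x]
        return x
    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    for a, b in edges:
        union(a, b)
    return len({find(x) for x in parent})
print(count_components([("a", "b"), ("b", "c"), ("d", "e")]))  # 2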
📚 Related Topics to Explore:
- Two Pointers: Alternative to hash maps for sorted arrays
- Sliding Window: For contiguous subarray problems
- Trie (Prefix Tree): For advanced string matching beyond hash maps
- Bloom Filters: Space-efficient probabilistic hash sets
- Consistent Hashing: Distributed systems application
🎯 Practical Applications:
Web Development:
- Session management (hash map: sessionId → userData)
- Caching (memoization of API responses)
- Rate limiting (hash map: userId → requestCount)
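For instance, a toy fixed-window rate limiter is just a counter map keyed by user and time bucket. The class below is hypothetical and ignores real-world concerns like sliding windows and entry expiry:
import time
from collections import defaultdict
class FixedWindowLimiter:
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counts = defaultdict(int)  # (user_id, window index) -> request count
    def allow(self, user_id):
        bucket = (user_id, int(time.time() // self.window))
        self.counts[bucket] += 1
        return self.counts[bucket] <= self.limit
limiter = FixedWindowLimiter(limit=3, window_seconds=60)
print([limiter.allow("alice") for _ in range(4)])  # [True, True, True, False]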
Data Engineering:
- Deduplication pipelines (hash sets for seen records)
- Join operations (hash joins in databases)
- Aggregation queries (grouping with hash maps)
System Design:
- Load balancer routing (consistent hashing)
- Database indexing (hash indexes for equality lookups)
- In-memory caches (Redis, Memcached)
💡 Real-World Example: At companies like Netflix, hash maps power the recommendation engine's collaborative filtering, mapping users to their viewing history and quickly finding similar users for recommendations, all in O(1) average time per lookup.
🎯 Your Interview Game Plan
When you see a problem in an interview:
- ⏱️ First 30 seconds: Listen for keywords ("count," "find pair," "duplicate," "group")
- 🧠 Next 1 minute: Think "Can I eliminate a loop with a hash map?"
- 📋 Next 2 minutes: Choose template (frequency/complement/grouping)
- 💻 Next 10-15 minutes: Code while explaining your approach
- 🔍 Final 5 minutes: Test with edge cases (empty input, duplicates, no solution)
🧠 Mnemonic for hash map mastery: "FETCH"
- Frequency counting
- Existence checking
- Two-sum / complement patterns
- Caching / memoization
- Hashable grouping by property
You now have everything you need to confidently tackle hash map and set problems. The patterns you've learned appear in approximately 30-40% of LeetCode medium problems and form the foundation for more advanced techniques.
Keep practicing, and remember: The best developers don't memorize solutions - they recognize patterns. You've built that pattern recognition muscle. Now go apply it! 🚀
📋 One-Page Cheat Sheet
┌──────────────────────────────────────────────────────────┐
│              HASH MAP & SET QUICK REFERENCE              │
├──────────────────────────────────────────────────────────┤
│ WHEN TO USE                                              │
│  ✓ Need O(1) lookup     ✓ Counting/frequency             │
│  ✓ Check existence      ✓ Finding pairs/complements      │
│  ✓ Remove duplicates    ✓ Grouping by property           │
│                                                          │
│ COMPLEXITY (Average Case)                                │
│  • Lookup: O(1)         • Space:  O(n)                   │
│  • Insert: O(1)         • Delete: O(1)                   │
│                                                          │
│ PYTHON QUICK PICKS                                       │
│  Counter(list)      → frequency counting                 │
│  defaultdict(list)  → grouping                           │
│  set()              → existence checks                   │
│  dict.fromkeys()    → deduplication with order           │
│                                                          │
│ COMMON PATTERNS                                          │
│  1. Complement: seen[target - num] exists?               │
│  2. Frequency:  Counter(arr)[item]                       │
│  3. Grouping:   groups[key].append(item)                 │
│  4. Dedup:      set(arr) or dict.fromkeys(arr)           │
│                                                          │
│ PITFALLS TO AVOID                                        │
│  ✗ Mutable keys (lists)    ✗ Assuming order              │
│  ✗ Forgetting space cost   ✗ Redundant hash calls        │
└──────────────────────────────────────────────────────────┘
Print this, keep it by your desk, and refer to it before every practice session. Within weeks, you won't need it anymore - the patterns will be second nature. Good luck! 🎯