
Heaps & Priority Queues

Manage dynamic datasets requiring constant access to extreme values

Introduction to Heaps & Priority Queues

Imagine you're building a hospital emergency room system. Patients arrive constantly, but you can't treat them in first-come, first-served order—someone with a heart attack needs immediate attention over someone with a minor cut, even if they arrived later. How do you efficiently manage a queue where priority matters more than arrival time? This is exactly the problem that heaps and priority queues solve, and they're fundamental to countless real-world systems, from operating system schedulers to Google's search algorithms. In this lesson, we'll explore these powerful data structures, and you'll discover why they're essential tools in your LeetCode arsenal.

You've likely encountered situations where you need to repeatedly find the "most important" item from a collection—whether that's the smallest number, the largest value, or the highest-priority task. The naive approach of sorting the entire collection every time is wasteful. What if there was a data structure that could give you the most important element in constant time and add or remove elements in logarithmic time? That's the promise of heaps, and understanding them will unlock an entire category of optimization problems that would otherwise seem insurmountable.

What Exactly Is a Heap?

A heap is a specialized tree-based data structure that satisfies the heap property. Before we dive deeper, let's be crystal clear about what this means. A heap is always a complete binary tree—a binary tree where every level is completely filled except possibly the last level, which is filled from left to right. This structural requirement is crucial because it allows us to efficiently represent the tree using a simple array.

The heap property comes in two flavors:

🎯 Key Principle: In a min-heap, every parent node is smaller than or equal to its children. The smallest element is always at the root. In a max-heap, every parent node is larger than or equal to its children. The largest element is always at the root.

Let's visualize this with a simple example:

Min-Heap Example:              Max-Heap Example:
       1                              100
      / \                            /   \
     3   2                          50    80
    / \  /                         /  \   /
   7  5 9                        10  25 60

Notice how in the min-heap, each parent is smaller than its children (1 < 3 and 1 < 2, 3 < 7 and 3 < 5). In the max-heap, each parent is larger than its children. This property must hold for every parent-child relationship in the tree.

💡 Mental Model: Think of a min-heap like a corporate hierarchy where the most junior person (smallest number) is at the top, and each person only knows they're more junior than their direct reports. They don't need to know about everyone in the company—just their immediate team.

🤔 Did you know? The heap data structure has nothing to do with "heap memory" in programming. They just share the same English word. Heap memory refers to dynamically allocated memory, while heap data structures are about maintaining sorted properties efficiently.

Priority Queues: The Abstract Interface

While a heap is a concrete data structure with a specific tree-based implementation, a priority queue is an abstract data type (ADT)—it's an interface that defines what operations should be available, not how to implement them. Think of it like this: a priority queue is the "what" and a heap is the "how."

A priority queue supports these essential operations:

  • 🔧 insert(element, priority): Add an element with a given priority
  • 🔧 extractMin() or extractMax(): Remove and return the highest-priority element
  • 🔧 peek(): View the highest-priority element without removing it
  • 🔧 decreaseKey(): Raise an element's priority (decrease its key in a min-heap)
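
To make the interface concrete, here's a minimal sketch of that ADT backed by Python's heapq (the implementation details come later in this lesson). Note that heapq has no direct decreaseKey; that operation is usually emulated with the lazy deletion technique covered in the advanced section.

import heapq

class PriorityQueue:
    """Minimal min-priority-queue sketch on top of heapq."""
    def __init__(self):
        self._heap = []

    def insert(self, element, priority):
        # Tuples compare by priority first
        heapq.heappush(self._heap, (priority, element))

    def extract_min(self):
        priority, element = heapq.heappop(self._heap)
        return element

    def peek(self):
        return self._heap[0][1]

## Emergency room example: lower number = more urgent
pq = PriorityQueue()
pq.insert("minor cut", 5)
pq.insert("heart attack", 1)
print(pq.peek())         # heart attack
print(pq.extract_min())  # heart attack - treated first despite arriving later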

You could implement a priority queue using a simple array or linked list, but the performance would be terrible. Here's why heaps are the gold standard:

📋 Quick Reference Card: Time Complexity Comparison

Operation          🐌 Unsorted Array   🏃 Sorted Array   🚀 Binary Heap
Insert             O(1)                O(n)              O(log n)
Extract Min/Max    O(n)                O(1)              O(log n)
Peek               O(n)                O(1)              O(1)
Search             O(n)                O(log n)          O(n)

The heap strikes the perfect balance for priority queue operations. We get logarithmic time for insertions and deletions—which is nearly as good as constant time for reasonably sized datasets—and constant time for peeking at the top element.

Real-World Applications: Where Heaps Shine

Understanding the theory is one thing, but seeing where these structures actually matter in real systems will help you recognize when to reach for them on LeetCode. Let's explore some compelling applications:

1. Operating System Task Scheduling

💡 Real-World Example: Your computer's operating system manages hundreds of processes, each with different priorities. System-critical processes need CPU time before background tasks. The OS uses a priority queue (typically implemented with heaps) to decide which process runs next. When a process completes or a higher-priority task arrives, the scheduler uses extractMax() to get the next most important task in just O(log n) time.

2. Dijkstra's Shortest Path Algorithm

If you've studied graph algorithms, you've encountered Dijkstra's algorithm for finding the shortest path between nodes. The algorithm repeatedly needs to select the unvisited node with the smallest distance—a perfect use case for a min-heap! The heap stores nodes with their current shortest distance as the priority.

import heapq

def dijkstra(graph, start):
    """Find shortest paths from start to all other nodes."""
    # Priority queue stores (distance, node) tuples
    pq = [(0, start)]  # (distance, node)
    distances = {start: 0}
    visited = set()
    
    while pq:
        # Extract minimum distance node - O(log n)
        current_dist, current_node = heapq.heappop(pq)
        
        if current_node in visited:
            continue
        visited.add(current_node)
        
        # Check all neighbors
        for neighbor, weight in graph[current_node]:
            distance = current_dist + weight
            
            # If we found a shorter path, update and insert
            if neighbor not in distances or distance < distances[neighbor]:
                distances[neighbor] = distance
                heapq.heappush(pq, (distance, neighbor))  # O(log n)
    
    return distances

Without a heap, we'd need O(n) time to find the minimum distance node each iteration, making the algorithm much slower.

3. Finding K-th Largest/Smallest Elements

One of the most common LeetCode patterns involves finding the k-th largest element, top k elements, or median of a stream. These problems scream for heap solutions:

def findKthLargest(nums, k):
    """
    Find the k-th largest element in an array.
    Solution: Use a min-heap of size k.
    """
    import heapq
    
    # Maintain a min-heap of the k largest elements seen so far
    heap = []
    
    for num in nums:
        heapq.heappush(heap, num)  # Add element
        
        # If heap exceeds size k, remove the smallest
        if len(heap) > k:
            heapq.heappop(heap)
    
    # The root of the min-heap is the k-th largest element
    return heap[0]

## Example: [3, 2, 1, 5, 6, 4], k=2
## Result: 5 (the 2nd largest element)

💡 Pro Tip: For finding the k-th largest, use a min-heap of size k. For finding the k-th smallest, use a max-heap of size k. This seems counterintuitive at first, but think about it: in a min-heap of k elements, the smallest element (root) is exactly the k-th largest overall, because we've kept the k largest elements and discarded everything smaller.
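
Here's the mirrored version as a quick sketch: k-th smallest with a max-heap of size k, using the negation trick (Python's heapq is min-only; the trick is covered in detail later).

import heapq

def findKthSmallest(nums, k):
    """Find the k-th smallest element using a max-heap of size k."""
    heap = []
    for num in nums:
        heapq.heappush(heap, -num)  # negate to simulate a max-heap
        if len(heap) > k:
            heapq.heappop(heap)     # evict the largest of the k kept
    return -heap[0]                 # root = largest of the k smallest

print(findKthSmallest([3, 2, 1, 5, 6, 4], 2))  # 2 (the 2nd smallest)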

4. Merge K Sorted Lists

Imagine you're merging search results from k different databases, each returning results in sorted order. You need to merge them efficiently into a single sorted sequence:

import heapq
from typing import List, Optional

class ListNode:
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next
    
    # Make ListNode comparable for heap operations
    def __lt__(self, other):
        return self.val < other.val

def mergeKLists(lists: List[Optional[ListNode]]) -> Optional[ListNode]:
    """
    Merge k sorted linked lists into one sorted list.
    Time: O(N log k) where N is total nodes, k is number of lists
    """
    heap = []
    
    # Initialize heap with the first node from each list - O(k log k)
    for i, node in enumerate(lists):
        if node:
            heapq.heappush(heap, (node.val, i, node))
    
    dummy = ListNode(0)
    current = dummy
    
    # While heap has elements
    while heap:
        # Extract minimum - O(log k)
        val, i, node = heapq.heappop(heap)
        
        # Add to result
        current.next = node
        current = current.next
        
        # If this list has more elements, add the next one - O(log k)
        if node.next:
            heapq.heappush(heap, (node.next.val, i, node.next))
    
    return dummy.next

The heap keeps track of the smallest unprocessed element from each list. At each step, we extract the minimum and add the next element from that same list. Beautiful!

When to Recognize Heap Problems on LeetCode

Developing pattern recognition is crucial for tackling LeetCode efficiently. Here are the telltale signs that a problem might benefit from a heap:

Correct thinking: "The problem asks for the k-th largest/smallest element" → Use a heap of size k

Correct thinking: "I need to repeatedly access the minimum or maximum element" → Use a heap for O(log n) updates

Correct thinking: "The problem involves merging sorted sequences" → Use a heap to track current minimums

Correct thinking: "I need to process elements in priority order, but they arrive dynamically" → Priority queue with heap

Wrong thinking: "I need to find all elements greater than x" → Heap doesn't help with range queries

Wrong thinking: "I need to find if an element exists" → Heap search is O(n), use a hash set instead

Wrong thinking: "I need elements in sorted order" → If you need complete sorted order and no dynamic updates, just sort the array

🎯 Key Principle: Heaps excel when you need partial ordering (just the top k elements or just the min/max) with dynamic updates (elements being added/removed). They're not for complete sorting or membership queries.

Here's a quick decision tree:

Do you need the k largest/smallest elements (k < n)?
├─ YES → Consider a heap of size k
└─ NO ↓

Do you need to repeatedly extract min/max while new elements arrive?
├─ YES → Consider a priority queue (heap)
└─ NO ↓

Are you merging k sorted sequences?
├─ YES → Consider a heap of size k
└─ NO ↓

Do you need complete sorted order once?
└─ YES → Just use sorting, O(n log n)

Understanding the Logarithmic Advantage

Why is O(log n) such a big deal? Let's put this in perspective with concrete numbers:

📋 Quick Reference Card: Logarithmic Scaling

📊 Dataset Size (n)   🐌 O(n) Operations   🚀 O(log n) Operations   ⚡ Speedup
1,000                 1,000                ~10                      ~100x faster
1,000,000             1,000,000            ~20                      ~50,000x faster
1,000,000,000         1,000,000,000        ~30                      ~33,000,000x faster

Logarithmic time complexity means that doubling your dataset size only adds one more operation. This is nearly as good as constant time for practical purposes. When you're doing millions of insertions and extractions, the difference between O(n) and O(log n) per operation is the difference between your code running instantly versus timing out.

💡 Remember: The logarithm base doesn't matter for Big-O notation because different bases only differ by a constant factor. A binary heap has O(log₂ n) height, but we just write O(log n).
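
You can verify those operation counts directly:

import math

for n in (10**3, 10**6, 10**9):
    print(f"n = {n:>13,}  ->  ~{math.ceil(math.log2(n))} operations")
## n =         1,000  ->  ~10 operations
## n =     1,000,000  ->  ~20 operations
## n = 1,000,000,000  ->  ~30 operations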

The Array Representation: Why Heaps Are Memory Efficient

One of the most elegant aspects of heaps is how they're actually stored in memory. Despite being conceptually a tree, heaps are implemented using a simple array with no need for node objects or pointers. This makes them incredibly memory-efficient compared to other tree structures.

The mapping is beautifully simple for a node at index i:

🧠 Key Relationships:

  • 🔼 Parent: index (i - 1) // 2
  • 🔽 Left child: index 2 * i + 1
  • 🔽 Right child: index 2 * i + 2

Array:  [1, 3, 2, 7, 5, 9]
Index:   0  1  2  3  4  5

Visualizes as:
       1 (index 0)
      / \
     3   2 (indices 1, 2)
    / \  /
   7  5 9 (indices 3, 4, 5)

🧠 Mnemonic: "PILlow - Parent, then I-left, then L-right" - the parent comes before both children in the array, left child before right.

This array representation is why heaps are cache-friendly and fast in practice. No pointer chasing, no memory fragmentation—just sequential array access.
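
Those index formulas also make it trivial to check whether an array is a valid heap. Here's a short sketch:

def is_min_heap(arr):
    """Verify the min-heap property for every parent-child pair."""
    n = len(arr)
    for i in range(n):
        for child in (2 * i + 1, 2 * i + 2):
            if child < n and arr[i] > arr[child]:
                return False
    return True

print(is_min_heap([1, 3, 2, 7, 5, 9]))  # True - the heap pictured above
print(is_min_heap([2, 1, 3]))           # False - parent 2 exceeds child 1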

Building Intuition: The Mental Model

⚠️ Common Mistake 1: Thinking heaps are fully sorted ⚠️

Many beginners assume that if a heap maintains the minimum element at the root, the entire structure must be sorted. This is false! Only the heap property is maintained: each parent is smaller (or larger) than its children. Siblings have no ordering relationship.

Consider this valid min-heap:

       1
      / \
     5   3
    / \
   9   7

Notice that 5 (left child of root) is actually larger than 3 (right child of root). That's perfectly fine! The heap property only governs parent-child relationships, not siblings.

💡 Mental Model: A heap is like a tournament bracket where winners (smaller/larger) always beat their direct opponents and move up, but you can't compare teams from different parts of the bracket without looking at their paths.

⚠️ Common Mistake 2: Using heaps for searching ⚠️

Heaps are not efficient for searching for arbitrary elements. Finding whether an element exists requires O(n) time in the worst case because the partial ordering doesn't help narrow down the search. If you need fast lookup, use a hash set or binary search tree instead.

Correct thinking: "I need to quickly find and remove the minimum element 1000 times" → Heap (O(log n) per operation)

Wrong thinking: "I need to check if element x exists in my collection 1000 times" → Don't use heap (O(n) per check), use HashSet (O(1))

Preparing for the Journey Ahead

Now that you understand what heaps and priority queues are and why they matter, you're ready to dive into the mechanics. In the next sections, we'll explore:

  • 🔧 How to implement heap operations from scratch (heapify, push, pop)
  • 🔧 The most common LeetCode patterns with detailed code solutions
  • 🔧 Advanced techniques like custom comparators and lazy deletion
  • 🔧 How to avoid the pitfalls that trip up even experienced developers

The beauty of heaps is that once you understand the core concepts, a whole category of seemingly difficult problems becomes straightforward. That "find median from data stream" problem that looked impossible? You'll be able to solve it with two heaps. That "meeting rooms II" problem about scheduling? A priority queue makes it elegant.

🎯 Key Principle: Heaps are about maintaining just enough order to efficiently answer "what's the most important element?" repeatedly. They don't waste time sorting everything when you only care about the extremes.

As you continue through this lesson, remember that pattern recognition is your most valuable skill. When you see words like "k-th largest," "top k," "median," "merge sorted," or "minimum cost," your heap-senses should tingle. With the foundation we've built here, you're now equipped to recognize these opportunities and transform complex problems into elegant, efficient solutions.

Let's move forward to understand exactly how these magical data structures work under the hood!

Heap Fundamentals and Implementation

Now that we understand what heaps are and why they matter, let's dive deep into how they actually work. Understanding the mechanics of heaps will transform them from a mysterious "magic" data structure into a tool you can confidently implement and modify for any problem.

The Array Representation: Elegant Simplicity

One of the most beautiful aspects of heaps is how they leverage a simple array to represent a complete binary tree. Unlike traditional tree structures that require node objects with left and right pointers, a heap stores its elements in a contiguous array where the tree structure is implicit in the array indices themselves.

🎯 Key Principle: In a heap's array representation, the parent-child relationships are defined purely by mathematical formulas, not pointers.

Here's how the magic works. If we use 0-based indexing (common in most modern languages):

  • Parent of node at index i: (i - 1) // 2
  • Left child of node at index i: 2 * i + 1
  • Right child of node at index i: 2 * i + 2

If you're working with 1-based indexing (sometimes cleaner for heap logic):

  • Parent of node at index i: i // 2
  • Left child of node at index i: 2 * i
  • Right child of node at index i: 2 * i + 1

💡 Mental Model: Think of it like a family tree where every generation is perfectly packed from left to right. The array index tells you exactly where to find any node's relatives.

Let's visualize a min-heap with the array [3, 5, 8, 9, 6, 12, 15]:

         3           Array: [3, 5, 8, 9, 6, 12, 15]
       /   \         Index:  0  1  2  3  4   5   6
      5     8
     / \   / \
    9   6 12  15

Index 0 (value 3): left child at 1 (value 5), right child at 2 (value 8)
Index 1 (value 5): parent at 0 (value 3), left child at 3 (value 9), right child at 4 (value 6)
Index 2 (value 8): parent at 0 (value 3), left child at 5 (value 12), right child at 6 (value 15)
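
A few lines of Python confirm those relationships. Note the bounds check before touching a child, which matters for the warning below:

arr = [3, 5, 8, 9, 6, 12, 15]
n = len(arr)

for i in range(n):
    left, right = 2 * i + 1, 2 * i + 2
    children = [arr[c] for c in (left, right) if c < n]  # bounds check!
    print(f"index {i} (value {arr[i]}): children {children}")
## index 0 (value 3): children [5, 8]
## index 1 (value 5): children [9, 6]
## index 2 (value 8): children [12, 15]
## index 3 (value 9): children []  ...and so on for the other leaves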

⚠️ Common Mistake: Forgetting to check bounds when accessing children! Always verify 2*i + 1 < heap_size before accessing a child node. ⚠️

The Two Fundamental Operations: Bubble-Up and Bubble-Down

Every heap operation—insertion, deletion, heap construction—relies on two core mechanisms that maintain the heap property: bubble-up (also called sift-up or heapify-up) and bubble-down (also called sift-down or heapify-down).

Bubble-Up: Restoring Order from Below

Bubble-up is used when we add a new element to the heap or when a node's value decreases (in a min-heap) or increases (in a max-heap). The operation compares a node with its parent and swaps them if they violate the heap property, then repeats this process moving upward through the tree.

Here's a detailed implementation for a min-heap:

class MinHeap:
    def __init__(self):
        self.heap = []
    
    def _bubble_up(self, index):
        """
        Move element at index up until heap property is satisfied.
        Time Complexity: O(log n) - at most the height of the tree
        """
        while index > 0:
            parent_index = (index - 1) // 2
            
            # If heap property is satisfied, we're done
            if self.heap[index] >= self.heap[parent_index]:
                break
            
            # Swap with parent and continue
            self.heap[index], self.heap[parent_index] = \
                self.heap[parent_index], self.heap[index]
            index = parent_index
    
    def insert(self, value):
        """
        Add a new element to the heap.
        1. Append to end of array (maintains complete tree property)
        2. Bubble up to restore heap property
        """
        self.heap.append(value)
        self._bubble_up(len(self.heap) - 1)

Let's trace through inserting 4 into our heap [3, 5, 8, 9, 6, 12, 15]:

Step 1: Append 4 (index 7,        Step 2: 4 < 9, swap with parent
        left child of 9)
         3                                 3
       /   \                             /   \
      5     8                           5     8
     / \   / \                         / \   / \
    9   6 12  15                      4   6 12  15
   /                                 /
  4                                 9

Step 3: 4 < 5, swap with parent    Step 4: Compare with parent (3)
         3                         4 > 3, heap property satisfied; stop.
       /   \
      4     8
     / \   / \
    5   6 12  15
   /
  9

Final: [3, 4, 8, 5, 6, 12, 15, 9]

Bubble-Down: Restoring Order from Above

Bubble-down is used when we remove the root (extract-min or extract-max) or when a node's value increases (in a min-heap) or decreases (in a max-heap). The operation compares a node with its children and swaps with the smaller child (in a min-heap), repeating this process moving downward.

def _bubble_down(self, index):
    """
    Move element at index down until heap property is satisfied.
    Time Complexity: O(log n) - at most the height of the tree
    """
    n = len(self.heap)
    
    while True:
        smallest = index
        left = 2 * index + 1
        right = 2 * index + 2
        
        # Check if left child exists and is smaller
        if left < n and self.heap[left] < self.heap[smallest]:
            smallest = left
        
        # Check if right child exists and is smaller
        if right < n and self.heap[right] < self.heap[smallest]:
            smallest = right
        
        # If current node is smallest, we're done
        if smallest == index:
            break
        
        # Swap with smallest child and continue
        self.heap[index], self.heap[smallest] = \
            self.heap[smallest], self.heap[index]
        index = smallest

def extract_min(self):
    """
    Remove and return the minimum element (root).
    1. Swap root with last element
    2. Remove last element
    3. Bubble down new root to restore heap property
    """
    if not self.heap:
        raise IndexError("Heap is empty")
    
    # Swap first and last elements
    self.heap[0], self.heap[-1] = self.heap[-1], self.heap[0]
    
    # Remove and store the minimum value
    min_val = self.heap.pop()
    
    # Restore heap property if heap is not empty
    if self.heap:
        self._bubble_down(0)
    
    return min_val

💡 Pro Tip: When bubble-down encounters two children, always choose the smaller child (in a min-heap) or larger child (in a max-heap) to swap with. This ensures the heap property is maintained for both children.

Building a Heap: The O(n) Magic

One of the most counterintuitive results in heap theory is that you can build a heap from an unsorted array in O(n) time, not O(n log n) as you might expect. This is a powerful technique that appears in several LeetCode problems.

❌ Wrong thinking: "Each bubble-down is O(log n) and we do it n times, so building a heap is O(n log n)."

✅ Correct thinking: "We only need to heapify the non-leaf nodes, and most nodes are near the bottom where bubble-down is cheap. The math works out to O(n)."

🤔 Did you know? The reason it's O(n) is that most nodes are at the bottom of the tree (half are leaves!) and don't need much work. Only a few nodes at the top need to bubble down many levels, and this weighted average gives us linear time.
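
Here's the back-of-the-envelope sum behind that claim: at height h there are at most ⌈n / 2^(h+1)⌉ nodes, and each bubbles down at most h levels, so:

total work ≤ n · Σ h / 2^(h+1)    (summing over h = 0, 1, 2, ...)
           = (n/2) · Σ h / 2^h
           = (n/2) · 2
           = n                    →  O(n)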

The heapify algorithm works bottom-up:

def build_min_heap(arr):
    """
    Convert an unsorted array into a min-heap in-place.
    Time Complexity: O(n) - NOT O(n log n)!
    Space Complexity: O(1) - in-place operation
    
    Strategy: Start from the last non-leaf node and bubble-down each node.
    Leaf nodes (second half of array) are already valid heaps of size 1.
    """
    n = len(arr)
    heap = MinHeap()
    heap.heap = arr  # Use the array directly
    
    # Last non-leaf node is at index (n//2 - 1)
    # Work backwards to the root
    for i in range(n // 2 - 1, -1, -1):
        heap._bubble_down(i)
    
    return heap

## Example usage
arr = [9, 5, 6, 2, 3, 7, 1, 4, 8]
heap = build_min_heap(arr)
print(heap.heap)  # Output: [1, 2, 6, 4, 3, 7, 9, 5, 8]

Let's visualize building a heap from [9, 5, 6, 2, 3]:

Initial array:    [9, 5, 6, 2, 3]

Step 1: Bubble down index 1 (value 5)
         9              9
       /   \          /   \
      5     6   →    2     6     (smaller child is 2, so swap 5 and 2)
     / \            / \
    2   3          5   3

Step 2: Bubble down index 0 (value 9)
         9              2              2
       /   \          /   \          /   \
      2     6   →    9     6   →    3     6
     / \            / \            / \
    5   3          5   3          5   9
    (smaller child   (9 sinks again:   (9 is now a leaf,
     is 2; swap)      swap with 3)      done)

Final heap: [2, 3, 6, 5, 9]

Core Heap Operations: Complete Implementation

Let's bring it all together with a complete, production-ready min-heap implementation:

class MinHeap:
    def __init__(self):
        self.heap = []
    
    def peek(self):
        """
        Return the minimum element without removing it.
        Time Complexity: O(1)
        """
        if not self.heap:
            raise IndexError("Heap is empty")
        return self.heap[0]
    
    def size(self):
        """Return the number of elements in the heap."""
        return len(self.heap)
    
    def is_empty(self):
        """Check if heap is empty."""
        return len(self.heap) == 0
    
    def insert(self, value):
        """Add element to heap. O(log n)"""
        self.heap.append(value)
        self._bubble_up(len(self.heap) - 1)
    
    def extract_min(self):
        """Remove and return minimum element. O(log n)"""
        if not self.heap:
            raise IndexError("Heap is empty")
        
        self.heap[0], self.heap[-1] = self.heap[-1], self.heap[0]
        min_val = self.heap.pop()
        
        if self.heap:
            self._bubble_down(0)
        
        return min_val
    
    def delete(self, index):
        """
        Delete element at specific index. O(log n)
        Useful for implementing decrease-key operation.
        """
        if index >= len(self.heap):
            raise IndexError("Index out of bounds")
        
        # Replace with last element
        self.heap[index] = self.heap[-1]
        self.heap.pop()
        
        if index < len(self.heap):
            # The replacement value may violate the heap property in either
            # direction; compare with the parent to decide which way to fix
            parent_index = (index - 1) // 2
            if index > 0 and self.heap[index] < self.heap[parent_index]:
                self._bubble_up(index)
            else:
                self._bubble_down(index)
    
    def _bubble_up(self, index):
        while index > 0:
            parent_index = (index - 1) // 2
            if self.heap[index] >= self.heap[parent_index]:
                break
            self.heap[index], self.heap[parent_index] = \
                self.heap[parent_index], self.heap[index]
            index = parent_index
    
    def _bubble_down(self, index):
        n = len(self.heap)
        while True:
            smallest = index
            left = 2 * index + 1
            right = 2 * index + 2
            
            if left < n and self.heap[left] < self.heap[smallest]:
                smallest = left
            if right < n and self.heap[right] < self.heap[smallest]:
                smallest = right
            
            if smallest == index:
                break
            
            self.heap[index], self.heap[smallest] = \
                self.heap[smallest], self.heap[index]
            index = smallest

📋 Quick Reference Card: Heap Operation Complexities

🎯 Operation       ⏱️ Time Complexity   📝 Notes
peek/top           O(1)                 Just return the first element
insert             O(log n)             Append, then bubble-up
extract-min/max    O(log n)             Swap with last, remove, bubble-down
delete at index    O(log n)             Replace with last, then bubble up or down
build heap         O(n)                 Heapify bottom-up
search             O(n)                 No ordering except parent-child

Language-Specific Implementations

While understanding heap internals is crucial, in practice you'll often use built-in heap implementations. Let's explore the most common ones across popular languages.

Python: heapq Module

Python's heapq module provides a min-heap implementation that operates on regular Python lists. It doesn't provide a heap class; instead, it offers functions that maintain the heap invariant on a list.

import heapq

## Create an empty heap (just a list)
heap = []

## Insert elements - O(log n) per insertion
heapq.heappush(heap, 5)
heapq.heappush(heap, 3)
heapq.heappush(heap, 7)
heapq.heappush(heap, 1)

## Peek at minimum - O(1)
min_val = heap[0]  # Returns 1

## Extract minimum - O(log n)
min_val = heapq.heappop(heap)  # Returns 1

## Build heap from existing list - O(n)
nums = [9, 5, 6, 2, 3, 7, 1, 4, 8]
heapq.heapify(nums)  # Converts list to heap in-place

## Get k smallest/largest elements - O(n log k), where k is the count requested
smallest_3 = heapq.nsmallest(3, [5, 1, 9, 2, 7])  # [1, 2, 5]
largest_3 = heapq.nlargest(3, [5, 1, 9, 2, 7])    # [9, 7, 5]

⚠️ Common Mistake: Python's heapq only provides min-heaps. For a max-heap, negate your values when inserting and negate again when extracting! ⚠️

💡 Pro Tip: For max-heap behavior in Python, use the negation trick:

## Max-heap using negation
max_heap = []
heapq.heappush(max_heap, -5)  # Insert 5
heapq.heappush(max_heap, -10) # Insert 10
max_val = -heapq.heappop(max_heap)  # Extract 10

Alternatively, use tuples with negated first elements:

heap = []
heapq.heappush(heap, (-priority, value))  # Higher priority = larger number

Java: PriorityQueue

Java's PriorityQueue is more feature-rich, providing both min and max heaps through custom comparators.

import java.util.PriorityQueue;
import java.util.Collections;

// Min-heap (default)
PriorityQueue<Integer> minHeap = new PriorityQueue<>();

// Max-heap (using reverse order comparator)
PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Collections.reverseOrder());

// Insert elements - O(log n)
minHeap.offer(5);
minHeap.offer(3);
minHeap.offer(7);

// Peek - O(1)
Integer min = minHeap.peek();  // Returns 3

// Extract - O(log n)
Integer extracted = minHeap.poll();  // Returns and removes 3

// Size and empty check
int size = minHeap.size();
boolean isEmpty = minHeap.isEmpty();

// Custom comparator for objects
PriorityQueue<Task> taskQueue = new PriorityQueue<>(
    (a, b) -> Integer.compare(a.priority, b.priority)  // Compare by priority; avoids int overflow of a.priority - b.priority
);

C++: priority_queue

C++'s priority_queue is a max-heap by default (opposite of Python and Java).

#include <queue>
#include <vector>

// Max-heap (default)
std::priority_queue<int> maxHeap;

// Min-heap (using greater comparator)
std::priority_queue<int, std::vector<int>, std::greater<int>> minHeap;

// Insert - O(log n)
maxHeap.push(5);
maxHeap.push(3);
maxHeap.push(7);

// Peek - O(1)
int max = maxHeap.top();  // Returns 7

// Extract - O(log n)
maxHeap.pop();  // Removes 7

// Size and empty check
int size = maxHeap.size();
bool isEmpty = maxHeap.empty();

// Custom comparator with lambda
auto cmp = [](const Task& a, const Task& b) { 
    return a.priority < b.priority;  // Max-heap by priority
};
std::priority_queue<Task, std::vector<Task>, decltype(cmp)> taskQueue(cmp);

🧠 Mnemonic: "Python's Polite, C++ is Confident" - Python defaults to min (polite, gives you smallest first), C++ defaults to max (confident, gives you biggest first).

Understanding Heap Invariants

The heap invariant (or heap property) is what makes heaps work. For every operation, we must maintain this invariant:

🎯 Key Principle: In a min-heap, every parent must be less than or equal to its children. In a max-heap, every parent must be greater than or equal to its children.

This property has several important implications:

🔧 No global ordering: Unlike a sorted array, heap elements aren't fully ordered. The only guarantee is the parent-child relationship. Elements at the same level have no ordering constraint.

🔧 Efficient extremum access: The minimum (min-heap) or maximum (max-heap) is always at the root, giving O(1) access.

🔧 Efficient insertion and deletion: We only need to fix violations along a single path from root to leaf (or leaf to root), which is O(log n).

💡 Real-World Example: Think of a tournament bracket. The winner of each match (parent) must have beaten both competitors from the previous round (children). But we don't know the relative ranking of all participants—only that each parent beat their specific children.

This partial ordering is precisely what makes heaps efficient. They provide just enough structure to guarantee constant-time access to the min/max element, while keeping insertions and deletions logarithmic.
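
You can see this partial ordering in a few lines of Python:

import heapq

nums = [9, 5, 6, 2, 3, 7, 1]
heapq.heapify(nums)
print(nums)     # [1, 2, 6, 5, 3, 7, 9] - a valid min-heap, clearly not sorted
print(nums[0])  # 1 - but the minimum is always at index 0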

Practical Implementation Tips

When implementing heap solutions for LeetCode problems:

1️⃣ Start with built-in libraries: Unless the problem specifically asks you to implement a heap, use your language's standard library. They're well-tested and optimized.

2️⃣ Tuple/pair trick for complex comparisons: When you need to track multiple values, use tuples/pairs where the first element is the comparison key:

import heapq

## Track (distance, node_id) - automatically compares by distance first
heap = []
heapq.heappush(heap, (0, start_node))
dist, node = heapq.heappop(heap)

3️⃣ Remember edge cases: Always check if the heap is empty before peeking or popping.

4️⃣ Consider space-time tradeoffs: Building a heap is O(n), but inserting n elements is O(n log n). Choose based on your use case.
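
An informal way to feel that tradeoff on your own machine (exact timings will vary, and the gap grows with n):

import heapq
import random
import time

data = [random.random() for _ in range(10**6)]

## Option A: heapify an existing list - O(n)
a = list(data)
t0 = time.perf_counter()
heapq.heapify(a)
print(f"heapify:       {time.perf_counter() - t0:.3f}s")

## Option B: push one element at a time - O(n log n)
b = []
t0 = time.perf_counter()
for x in data:
    heapq.heappush(b, x)
print(f"repeated push: {time.perf_counter() - t0:.3f}s")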

With these fundamentals mastered, you're now equipped to implement and manipulate heaps with confidence. In the next section, we'll explore common LeetCode patterns where heaps are the key to elegant, efficient solutions.

Common LeetCode Patterns with Heaps

Once you understand how heaps work mechanically, the real power emerges when you recognize recurring patterns in algorithm problems. Heaps aren't just data structures—they're problem-solving tools that unlock elegant solutions to otherwise complex challenges. In this section, we'll explore five fundamental patterns that appear repeatedly in coding interviews and competitive programming. Mastering these patterns will transform how you approach optimization problems.

Pattern 1: Top K Elements

The Top K Elements pattern is perhaps the most frequently encountered heap pattern in interviews. The core insight is deceptively simple: when you need to find the k largest or smallest elements from a dataset, you don't need to sort the entire collection. Instead, you maintain a heap of size k, which gives you O(n log k) time complexity instead of O(n log n).

🎯 Key Principle: To find the k largest elements, use a min-heap of size k. To find the k smallest elements, use a max-heap of size k. This feels counterintuitive at first, but the logic is elegant.

When finding the k largest elements with a min-heap, the smallest of your "top k" sits at the heap's root. As you process new elements, if an element is larger than your current minimum (the root), it deserves a spot in your top k, so you evict the current minimum. The min-heap efficiently maintains which element should be evicted next.

import heapq

def find_k_largest(nums, k):
    """
    Find k largest elements using a min-heap of size k.
    Time: O(n log k), Space: O(k)
    """
    if k == 0:
        return []
    
    # Build min-heap with first k elements
    min_heap = nums[:k]
    heapq.heapify(min_heap)  # O(k)
    
    # Process remaining elements
    for num in nums[k:]:
        if num > min_heap[0]:  # num is larger than smallest in top-k
            heapq.heapreplace(min_heap, num)  # Pop min, push num
    
    return min_heap

## Example: Find 3 largest numbers
nums = [3, 2, 1, 5, 6, 4]
print(find_k_largest(nums, 3))  # [4, 5, 6] (order may vary)

💡 Pro Tip: Use heapreplace() instead of separate heappop() and heappush() operations—it's more efficient as it performs both operations in one step.

Let's visualize why this works:

Finding 3 largest from [3, 2, 1, 5, 6, 4]

Step 1: Heapify the first 3 elements
Heap (min-heap): [1, 2, 3]
                  1
                 / \
                2   3

Step 2: Process 5 (5 > 1, so replace)
Heap: [2, 5, 3]
       2
      / \
     5   3

Step 3: Process 6 (6 > 2, so replace)
Heap: [3, 5, 6]
       3
      / \
     5   6

Step 4: Process 4 (4 > 3, so replace)
Heap: [4, 5, 6]
       4
      / \
     5   6

Result: Top 3 elements are [4, 5, 6]

⚠️ Common Mistake 1: Using a max-heap of size k to find k largest elements. This doesn't work efficiently because you can't easily determine which elements to keep versus discard. ⚠️

Real-world applications of this pattern include finding the k most frequent items, k closest points to origin, and k highest-rated products. The pattern extends beyond simple numbers to any comparable objects.

Pattern 2: K-Way Merge

The K-way merge pattern elegantly solves problems involving multiple sorted sequences. Imagine you have k sorted arrays and need to merge them into one sorted result. The naive approach of concatenating and sorting takes O(n log n) where n is the total number of elements, but we can do better by exploiting the fact that inputs are already sorted.

The insight: at any moment, the next smallest element must be at the beginning of one of the k arrays. Instead of checking all k positions repeatedly, we use a min-heap to track the smallest current element from each array.

import heapq

def merge_k_sorted_lists(lists):
    """
    Merge k sorted lists into one sorted list.
    Time: O(n log k) where n = total elements, k = number of lists
    Space: O(k) for the heap
    """
    min_heap = []
    result = []
    
    # Initialize heap with first element from each list
    # Store (value, list_index, element_index)
    for i, lst in enumerate(lists):
        if lst:  # Check if list is not empty
            heapq.heappush(min_heap, (lst[0], i, 0))
    
    # Extract min and add next element from same list
    while min_heap:
        value, list_idx, elem_idx = heapq.heappop(min_heap)
        result.append(value)
        
        # If current list has more elements, add next one
        if elem_idx + 1 < len(lists[list_idx]):
            next_value = lists[list_idx][elem_idx + 1]
            heapq.heappush(min_heap, (next_value, list_idx, elem_idx + 1))
    
    return result

## Example: Merge 3 sorted lists
lists = [
    [1, 4, 7],
    [2, 5, 8],
    [3, 6, 9]
]
print(merge_k_sorted_lists(lists))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

Let's trace through the algorithm:

Initial state:
Lists: [1,4,7], [2,5,8], [3,6,9]
Heap: [(1,0,0), (2,1,0), (3,2,0)]
       1
      / \
     2   3

Step 1: Pop 1, add next from list 0
Result: [1]
Heap: [(2,1,0), (3,2,0), (4,0,1)]

Step 2: Pop 2, add next from list 1
Result: [1, 2]
Heap: [(3,2,0), (4,0,1), (5,1,1)]

... continues until all elements processed

🤔 Did you know? This pattern is the foundation of external sorting algorithms used when data doesn't fit in memory. Database systems use k-way merge to combine sorted runs from disk.

💡 Mental Model: Think of the heap as a "tournament bracket" where k competitors (one from each list) compete to be the smallest. The winner is removed, and the next competitor from the same list enters the tournament.

This pattern appears in problems like:

  • Merge K Sorted Lists (LeetCode #23)
  • Find K Pairs with Smallest Sums (LeetCode #373)
  • Smallest Range Covering Elements from K Lists (LeetCode #632)

Pattern 3: Two Heaps (Median Maintenance)

The Two Heaps pattern is a sophisticated technique for problems requiring quick access to the median or maintaining a balanced split of data. The core idea: maintain two heaps that divide your dataset into two halves.

🎯 Key Principle: Use a max-heap for the smaller half and a min-heap for the larger half. The median is always at the roots of these heaps.

Here's the elegant insight:

Data divided into two halves:

Smaller half (max-heap)  |  Larger half (min-heap)
      Max at root        |       Min at root
         ↓               |           ↓
    [smaller values]     |     [larger values]
                         |
              median is at the boundary

The max-heap's root is the largest of the small values, and the min-heap's root is the smallest of the large values. If the heaps are balanced, the median is the average of these two roots (for even-length data) or the root of the larger heap (for odd-length data).

import heapq

class MedianFinder:
    """
    Maintain running median of a stream of numbers.
    All operations are O(log n) time.
    """
    def __init__(self):
        # Max-heap for smaller half (negate values for max behavior)
        self.small = []  # Python only has min-heap, so negate values
        # Min-heap for larger half
        self.large = []
    
    def addNum(self, num):
        """Add a number to the data structure."""
        # Add to small heap (max-heap) first
        heapq.heappush(self.small, -num)
        
        # Balance rule 1: largest in small should be ≤ smallest in large
        if self.small and self.large and (-self.small[0] > self.large[0]):
            val = -heapq.heappop(self.small)
            heapq.heappush(self.large, val)
        
        # Balance rule 2: size difference should be at most 1
        if len(self.small) > len(self.large) + 1:
            val = -heapq.heappop(self.small)
            heapq.heappush(self.large, val)
        
        if len(self.large) > len(self.small) + 1:
            val = heapq.heappop(self.large)
            heapq.heappush(self.small, -val)
    
    def findMedian(self):
        """Return the median of all elements."""
        if len(self.small) > len(self.large):
            return -self.small[0]
        elif len(self.large) > len(self.small):
            return self.large[0]
        else:
            return (-self.small[0] + self.large[0]) / 2.0

## Example usage
mf = MedianFinder()
mf.addNum(1)    # data: [1], median: 1
mf.addNum(2)    # data: [1, 2], median: 1.5
print(mf.findMedian())  # 1.5
mf.addNum(3)    # data: [1, 2, 3], median: 2
print(mf.findMedian())  # 2.0

⚠️ Common Mistake 2: Forgetting to negate values when using Python's heapq as a max-heap. Python only provides min-heap, so you must negate values: push -num and pop/peek as -heap[0]. ⚠️

Let's visualize the state after adding numbers [1, 2, 3, 4, 5]:

After adding [1, 2, 3, 4, 5]:

Small (max-heap)          Large (min-heap)
  Holds: {1, 2}             Holds: {3, 4, 5}
  Stored: [-2, -1]          Stored: [3, 4, 5]
     -2                        3
    /                         / \
  -1                         4   5

Median = smallest(large) = 3
(large has one extra element, so use its root)

💡 Real-World Example: Streaming services use this pattern to maintain statistics like median watch time without storing all data. As users watch content, the median is updated in O(log n) time.

Applications of the two heaps pattern:

  • Find Median from Data Stream (LeetCode #295)
  • Sliding Window Median (LeetCode #480)
  • IPO (LeetCode #502) - maximizing capital with project selection

Pattern 4: Heap + HashMap Combination

Many problems require custom priority handling where priority isn't just based on a single value but on a combination of factors, or where you need to quickly look up whether an element is in the heap. The Heap + HashMap pattern combines the priority queue's ordering with the hash map's O(1) lookup.

A classic example is finding the k most frequent elements. Here, priority is based on frequency (which requires counting), not the element values themselves.

import heapq
from collections import Counter

def top_k_frequent(nums, k):
    """
    Find k most frequent elements.
    Time: O(n log k), Space: O(n)
    """
    # Step 1: Build frequency map - O(n)
    freq_map = Counter(nums)
    
    # Step 2: Use min-heap of size k, ordered by frequency
    # Heap stores (frequency, element) tuples
    min_heap = []
    
    for num, freq in freq_map.items():
        heapq.heappush(min_heap, (freq, num))
        if len(min_heap) > k:
            heapq.heappop(min_heap)  # Remove least frequent
    
    # Step 3: Extract elements (discard frequencies)
    return [num for freq, num in min_heap]

## Example
nums = [1, 1, 1, 2, 2, 3, 4, 4, 4, 4]
print(top_k_frequent(nums, 2))  # [1, 4] - most frequent elements

🧠 Mnemonic: CHOP - Count, Heap, Order by Priority. First count/compute your priority metric, then use a heap to maintain top k by that priority.

The HashMap enables the counting phase, while the heap provides efficient "top k" selection. This division of responsibility is powerful:

HashMap responsibility:        Heap responsibility:
- Store auxiliary data         - Maintain ordering
- Enable O(1) lookups          - Efficient min/max access
- Track metadata               - Quick insertion/deletion

Variations and extensions:

  1. Priority with ties: When multiple elements have the same priority, you might need a secondary sorting criterion. Use tuples: (primary_priority, secondary_priority, element).

  2. Dynamic priority updates: If priorities change, you might need "lazy deletion" (covered in advanced techniques) since heaps don't support efficient priority updates.

  3. Task scheduling: Combine timestamps with task IDs to schedule tasks efficiently.

💡 Pro Tip: When using tuples in heaps, Python compares element-by-element. For custom objects, implement __lt__ (less-than) method to define comparison logic.
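
As a sketch of that tip, here's an illustrative class (the name and ordering rules are ours) in the style of Top K Frequent Words, where ties in frequency break alphabetically. Notice how __lt__ is written so that the min-heap evicts exactly the entry we want gone:

import heapq

class WordFreq:
    """Order by frequency, breaking ties reverse-alphabetically, so a
    min-heap of size k keeps the k most frequent words and, on ties,
    the alphabetically smaller ones."""
    def __init__(self, word, freq):
        self.word = word
        self.freq = freq

    def __lt__(self, other):
        if self.freq != other.freq:
            return self.freq < other.freq  # less frequent = "smaller" = evicted first
        return self.word > other.word      # on ties, alphabetically later = evicted first

heap = []
for word, freq in [("apple", 3), ("banana", 3), ("cherry", 1)]:
    heapq.heappush(heap, WordFreq(word, freq))
    if len(heap) > 2:
        heapq.heappop(heap)

print(sorted(w.word for w in heap))  # ['apple', 'banana']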

Common problems using this pattern:

  • Top K Frequent Elements (LeetCode #347)
  • Top K Frequent Words (LeetCode #692)
  • Reorganize String (LeetCode #767)
  • Task Scheduler (LeetCode #621)

Pattern 5: Scheduling and Interval Problems

The Scheduling pattern uses priority queues with custom comparators to solve problems involving time intervals, resource allocation, and event processing. The key is choosing the right priority metric for your specific problem.

🎯 Key Principle: Sort intervals by start time, then use a min-heap ordered by end time to track active/ongoing intervals.

Consider the classic "Meeting Rooms II" problem: given meeting time intervals, find the minimum number of conference rooms required.

import heapq

def min_meeting_rooms(intervals):
    """
    Find minimum meeting rooms needed.
    Time: O(n log n), Space: O(n)
    
    intervals: list of [start, end] times
    """
    if not intervals:
        return 0
    
    # Sort meetings by start time
    intervals.sort(key=lambda x: x[0])
    
    # Min-heap to track end times of ongoing meetings
    rooms = []  # stores end times
    heapq.heappush(rooms, intervals[0][1])  # track the first meeting's end time
    
    for start, end in intervals[1:]:
        # If earliest ending meeting finishes before this one starts,
        # we can reuse that room
        if rooms[0] <= start:
            heapq.heappop(rooms)
        
        # Add current meeting's end time
        heapq.heappush(rooms, end)
    
    # Number of rooms = size of heap
    return len(rooms)

## Example
meetings = [[0, 30], [5, 10], [15, 20]]
print(min_meeting_rooms(meetings))  # 2
## Meeting [0,30] and [5,10] overlap, and [0,30] and [15,20] overlap

Let's trace through the algorithm:

Meetings (sorted): [0,30], [5,10], [15,20]

Step 1: Add first meeting
Rooms heap: [30]
Active rooms: 1

     Timeline: [========30

Step 2: Process [5,10]
- Check: rooms[0]=30 > 5 (earliest room not free yet)
- Need new room
Rooms heap: [10, 30]
Active rooms: 2

     Timeline: [========30
                   [=10

Step 3: Process [15,20]
- Check: rooms[0]=10 ≤ 15 (room is free!)
- Reuse that room, remove 10, add 20
Rooms heap: [20, 30]
Active rooms: 2

     Timeline: [========30
                   [=10 [===20

Result: Need 2 rooms minimum

❌ Wrong thinking: "I need to track which specific room each meeting uses." ✅ Correct thinking: "I only need to count the maximum overlap. The heap's size represents current overlap."

Why this works: The heap always contains the end times of currently active meetings. When we process a new meeting, if the earliest-ending active meeting finishes before the new one starts, we can reuse that room (pop it). Otherwise, we need an additional room. The maximum heap size throughout the process is the answer.

💡 Mental Model: Think of the heap as a parking lot. Each element represents a car that will leave at a certain time. When a new car arrives, if a spot is about to free up (earliest end time ≤ new start time), we can wait and reuse it. Otherwise, we need a new spot.

Other scheduling problems:

Problem Type             Heap Priority         What the Heap Tracks
🏢 Meeting Rooms         End time              Active meetings
🚗 Car Pooling           Drop-off location     Passengers on board
💻 CPU Tasks             Next available time   CPU cores
📦 Package Delivery      Deadline              Pending deliveries
🎮 Single-threaded CPU   Arrival + duration    Waiting tasks

⚠️ Common Mistake 3: Forgetting to sort intervals by start time first. The heap-based approach only works correctly when processing intervals in chronological order. ⚠️

Common problems using this pattern:

  • Meeting Rooms II (LeetCode #253)
  • Task Scheduler (LeetCode #621)
  • Single-Threaded CPU (LeetCode #1834)
  • Car Pooling (LeetCode #1094)
  • Course Schedule III (LeetCode #630)

Recognizing the Pattern

As you practice, pattern recognition becomes intuitive. Here's a quick decision tree:

Does the problem mention "k largest/smallest/most frequent"?
    ↓ YES
  Use TOP K ELEMENTS pattern
    ↓ NO
    
Does it involve k sorted sequences?
    ↓ YES
  Use K-WAY MERGE pattern
    ↓ NO
    
Does it need the median or balance split?
    ↓ YES
  Use TWO HEAPS pattern
    ↓ NO
    
Does priority depend on auxiliary data?
    ↓ YES
  Use HEAP + HASHMAP pattern
    ↓ NO
    
Does it involve intervals or scheduling?
    ↓ YES
  Use SCHEDULING pattern

📋 Quick Reference Card:

Pattern          🎯 When to Use               🔧 Structure                    ⏱️ Complexity
Top K Elements   Find k best/worst            Min or max heap of size k       O(n log k)
K-Way Merge      Merge k sorted sequences     Min-heap with (value, source)   O(n log k)
Two Heaps        Find median, split data      Max-heap + min-heap             O(log n) per op
Heap + HashMap   Custom priority, frequency   HashMap + heap                  O(n log k)
Scheduling       Intervals, resources         Sort + min-heap by end time     O(n log n)

💡 Remember: These patterns often combine. A complex problem might use heap + hashmap for frequency tracking AND the two heaps pattern for median maintenance. Don't be afraid to mix patterns when the problem demands it.

Mastering these five patterns will equip you to tackle the majority of heap-related problems in interviews. The key is recognizing which pattern applies, then adapting the template to the specific constraints of your problem. With practice, this recognition becomes second nature, and you'll start seeing heap opportunities in problems that might initially seem unrelated.

Advanced Techniques and Optimization

Once you've mastered the fundamentals of heaps, the real power emerges when you learn to bend them to solve complex competitive programming challenges. This section explores advanced techniques that separate good heap solutions from exceptional ones—techniques that can transform an acceptable solution into one that's both elegant and lightning-fast.

Custom Comparators: Beyond Simple Min and Max

Most heap problems start with simple numeric comparisons, but real-world LeetCode challenges often require custom comparators that define priority based on complex criteria. A custom comparator is a function that tells the heap how to compare two elements, allowing you to create sophisticated priority schemes.

Consider the classic "Meeting Rooms II" problem where you need to find the minimum number of conference rooms required. You might need to prioritize meetings by end time, not by a simple numeric value. Or imagine tracking the k closest points to the origin—you need to compare based on calculated distance, not the raw coordinates.

import heapq
from typing import List, Tuple

class Task:
    def __init__(self, name: str, priority: int, deadline: int):
        self.name = name
        self.priority = priority
        self.deadline = deadline
    
    # Custom comparison: higher priority wins, ties broken by earlier deadline
    def __lt__(self, other):
        if self.priority != other.priority:
            return self.priority > other.priority  # Higher priority first (max heap behavior)
        return self.deadline < other.deadline  # Earlier deadline wins ties
    
    def __repr__(self):
        return f"Task({self.name}, p={self.priority}, d={self.deadline})"

## Using the custom comparator
tasks = [
    Task("Debug prod issue", priority=9, deadline=2),
    Task("Write tests", priority=5, deadline=10),
    Task("Code review", priority=9, deadline=5),
    Task("Deploy", priority=7, deadline=3)
]

heapq.heapify(tasks)

print("Tasks in priority order:")
while tasks:
    print(heapq.heappop(tasks))
## Output:
## Task(Debug prod issue, p=9, d=2)
## Task(Code review, p=9, d=5)
## Task(Deploy, p=7, d=3)
## Task(Write tests, p=5, d=10)

🎯 Key Principle: In Python, define __lt__ (less than) to create custom comparison logic. In C++, pass a custom comparator function or lambda to priority_queue. In Java, implement Comparator<T> or make your class implement Comparable<T>.

💡 Pro Tip: When you want a max heap in Python (which natively only supports min heaps), you can negate numeric values or reverse your comparison logic in __lt__. For complex objects, reversing the comparison is cleaner than trying to negate multiple fields.

Lazy Deletion: The Art of Deferred Cleanup

Lazy deletion is a powerful technique where you mark elements as deleted without immediately removing them from the heap. This avoids the expensive O(n) operation of searching through the heap to remove an arbitrary element. Instead, you maintain a separate data structure (usually a hash map or set) to track which elements are "logically deleted."

Here's the mental model:

Heap State:          [5, 10, 15, 20, 25]
Deleted Set:         {10, 20}
Logical View:        [5, 15, 25]  (what the heap "really" contains)

When you pop from the heap, you check if the element is in the deleted set. If it is, discard it and pop again. This transforms expensive deletions into cheap lookups.

import heapq
from collections import defaultdict

class LazyDeletionHeap:
    def __init__(self):
        self.heap = []
        self.deleted = set()  # Track deleted elements
        self.size = 0  # Track logical size
    
    def push(self, val):
        heapq.heappush(self.heap, val)
        self.size += 1
    
    def lazy_delete(self, val):
        """Mark element as deleted without removing it"""
        if val not in self.deleted:
            self.deleted.add(val)
            self.size -= 1
    
    def top(self):
        """Get minimum element, cleaning deleted items lazily"""
        self._cleanup()
        return self.heap[0] if self.heap else None
    
    def pop(self):
        """Remove and return minimum element"""
        self._cleanup()
        if self.heap:
            self.size -= 1
            val = heapq.heappop(self.heap)
            return val
        return None
    
    def _cleanup(self):
        """Remove deleted elements from top of heap"""
        while self.heap and self.heap[0] in self.deleted:
            deleted_val = heapq.heappop(self.heap)
            self.deleted.remove(deleted_val)  # Clean up the set too
    
    def __len__(self):
        return self.size

## Example usage: Finding median with removals
heap = LazyDeletionHeap()
for num in [5, 10, 15, 20, 25]:
    heap.push(num)

print(f"Top: {heap.top()}")  # 5
heap.lazy_delete(5)
heap.lazy_delete(15)
print(f"Top after deletions: {heap.top()}")  # 10
print(f"Logical size: {len(heap)}")  # 3

⚠️ Common Mistake: Forgetting to track logical size separately. With lazy deletion, the physical heap size and the logical size diverge, which can break algorithms that depend on knowing the true number of valid elements. Maintain a separate size counter instead of calling len() on the underlying heap. ⚠️

💡 Real-World Example: LeetCode problem "Find Median from Data Stream" with element removal becomes much more tractable with lazy deletion. Without it, you'd need to rebuild the heap after each removal—an O(n) operation that quickly makes the solution too slow.

Space Optimization: Heaps vs Sorting

One of the most powerful optimization decisions is choosing between heap-based solutions and sorting-based solutions. The choice often comes down to whether you need all elements sorted or just the top k elements.

🎯 Key Principle: When you only need the k smallest/largest elements from n items where k << n (k is much smaller than n), a heap of size k uses O(k) space while full sorting requires O(n) space. The time complexity difference is equally dramatic: O(n log k) vs O(n log n).

Consider finding the k largest elements in a stream:

Sorting approach:
- Store all n elements: O(n) space
- Sort everything: O(n log n) time
- Take top k: O(k) time
- Total: O(n log n) time, O(n) space

Heap approach:
- Maintain min heap of size k: O(k) space
- For each element: O(log k) time to maintain heap
- Total: O(n log k) time, O(k) space

Wrong thinking: "Sorting is simpler, so I'll just sort and take the first k elements."

Correct thinking: "I only need k elements. If k is small relative to n (like k=100, n=1,000,000), a size-k heap saves massive space and time."

import heapq
from typing import List

def kth_largest_element_optimal(nums: List[int], k: int) -> int:
    """
    Find kth largest element using a min heap of size k.
    Time: O(n log k), Space: O(k)
    
    Why min heap? We keep the k largest elements. The smallest of these
    k largest elements (top of min heap) is the kth largest overall.
    """
    if k > len(nums):
        return None
    
    # Maintain min heap of size k
    heap = []
    
    for num in nums:
        if len(heap) < k:
            heapq.heappush(heap, num)
        elif num > heap[0]:  # num is larger than smallest in our top-k
            heapq.heapreplace(heap, num)  # Pop smallest, push num (atomic operation)
    
    return heap[0]  # Smallest of the k largest = kth largest

## Compare with sorting approach
def kth_largest_element_sorting(nums: List[int], k: int) -> int:
    """
    Time: O(n log n), Space: O(n) for sorted array
    """
    return sorted(nums, reverse=True)[k-1]

## Example: find 3rd largest in a million numbers
import random
nums = [random.randint(1, 1000000) for _ in range(1000000)]
k = 3

## Heap approach: maintains only 3 elements in memory during processing
result = kth_largest_element_optimal(nums, k)
print(f"3rd largest (heap): {result}")

## Sorting approach: creates a sorted array of 1 million elements
result = kth_largest_element_sorting(nums, k)
print(f"3rd largest (sort): {result}")

💡 Pro Tip: Use heapq.heapreplace(heap, item) instead of separate heappop and heappush calls. It's more efficient and atomic—it pops and pushes in a single operation, maintaining heap invariants throughout.

🤔 Did you know? For finding the median of a stream, you need two heaps (max heap for lower half, min heap for upper half), which uses O(n) space. For a fixed-size dataset, any percentile reduces to a kth-largest query (the 95th percentile of n elements is roughly the (n/20)-th largest), so the same size-k heap trick applies. Note that k scales with n for a fixed percentile, so the space savings are largest when the percentile is extreme.
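
To make that concrete, here's a minimal sketch that treats a percentile as a kth-largest query (percentile_via_heap is a hypothetical helper, pct is assumed to be an integer, and the "kth largest" convention for percentiles is an assumption):

import heapq

def percentile_via_heap(nums, pct):
    """Sketch: the pct-th percentile of a fixed list as a kth-largest query."""
    n = len(nums)
    k = max(1, n - pct * n // 100)  # e.g. pct=95, n=100 -> k = 5
    heap = []  # min heap holding the k largest values seen so far
    for num in nums:
        if len(heap) < k:
            heapq.heappush(heap, num)
        elif num > heap[0]:
            heapq.heapreplace(heap, num)
    return heap[0]  # smallest of the k largest = pct-th percentile

print(percentile_via_heap(list(range(1, 101)), 95))  # 96 (5th largest of 1..100)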

Combining Heaps with Hash Maps and Sets

The most sophisticated heap problems often require combining heaps with hash-based data structures. This combination gives you the best of both worlds: O(1) lookups from hash maps and O(log n) ordered access from heaps.

Common patterns include:

🔧 Pattern 1: Heap + HashMap for Frequency Tracking

  • Use case: "Top K Frequent Elements"
  • HashMap tracks element → frequency
  • Heap orders by frequency
  • Enables efficient "most/least frequent" queries

🔧 Pattern 2: Heap + HashSet for Lazy Deletion

  • Use case: "Sliding Window Median" with removals
  • HashSet marks deleted elements
  • Heap maintains order of valid elements
  • Clean deleted elements lazily when accessing heap top

🔧 Pattern 3: Heap + HashMap for Multi-criteria Priority

  • Use case: "Design Twitter" (recent tweets from followed users)
  • HashMap stores user → tweet list
  • Heap merges k sorted lists (k = number of followed users)
  • Track which list each heap element came from

Here's a complete example combining these techniques:

import heapq
from collections import defaultdict
from typing import List

class TopKFrequent:
    """
    Maintain top k most frequent elements with updates.
    Demonstrates: Heap + HashMap + Lazy Deletion
    """
    def __init__(self, k: int):
        self.k = k
        self.freq = defaultdict(int)  # element -> frequency
        self.heap = []  # min heap of (frequency, element)
        self.in_heap = set()  # elements currently in heap
        self.deleted = set()  # elements marked for deletion
    
    def add(self, val: int):
        """Add an occurrence of val"""
        self.freq[val] += 1
        
        # If val is in heap, mark old entry as deleted
        if val in self.in_heap:
            self.deleted.add((self.freq[val] - 1, val))
        
        # Add new entry with updated frequency
        entry = (self.freq[val], val)
        heapq.heappush(self.heap, entry)
        self.in_heap.add(val)
        
        # Maintain at most k distinct elements. Every element in in_heap has
        # exactly one live (non-deleted) entry, so the logical size is simply
        # len(self.in_heap)
        self._cleanup()
        while len(self.in_heap) > self.k:
            self._cleanup()  # ensure the top entry is live before popping
            removed = heapq.heappop(self.heap)
            self.in_heap.discard(removed[1])
    
    def _cleanup(self):
        """Remove deleted entries from heap top"""
        while self.heap and self.heap[0] in self.deleted:
            deleted_entry = heapq.heappop(self.heap)
            self.deleted.remove(deleted_entry)
    
    def get_top_k(self) -> List[int]:
        """Return the k most frequent elements"""
        self._cleanup()
        # Return elements in heap (all are top-k by construction)
        valid_entries = [(f, e) for f, e in self.heap if (f, e) not in self.deleted]
        return [element for freq, element in sorted(valid_entries, reverse=True)]

## Example usage
top_k = TopKFrequent(k=2)
for num in [1, 1, 1, 2, 2, 3]:
    top_k.add(num)

print(f"Top 2 frequent: {top_k.get_top_k()}")  # [1, 2]

top_k.add(3)
top_k.add(3)
top_k.add(3)
print(f"Top 2 frequent: {top_k.get_top_k()}")  # [1, 3] (3 now appears 4 times)

⚠️ Common Mistake: Creating new heap entries for the same element without tracking and invalidating the old ones. The heap then contains stale frequency information that no longer matches the hash map. Always maintain the invariant that each element's live heap entry reflects its current frequency. ⚠️

Time-Space Tradeoffs: O(n log k) vs O(n log n)

Understanding when a heap-based O(n log k) solution beats an O(n log n) sorting solution requires thinking carefully about the relationship between k and n.

Let's break down the mathematics:

Heap approach: O(n log k)
Sort approach: O(n log n)

When is n log k < n log n?
=> log k < log n
=> k < n, so the heap never loses asymptotically; the margin is the ratio log n / log k

But how much faster?

If k = n/2:      log(n/2) vs log(n) → similar performance
If k = 100:      log(100) ≈ 6.6 vs log(1000000) ≈ 20 → 3x faster
If k = 10:       log(10) ≈ 3.3 vs log(1000000) ≈ 20 → 6x faster

🎯 Key Principle: The heap approach's advantage grows as k becomes smaller relative to n. When k = O(1) or k = O(log n), the difference is dramatic. When k = O(n), sorting might actually be simpler and comparable in performance.

💡 Mental Model: Think of the heap approach as "selective attention"—you only maintain awareness of the k most interesting elements. Sorting is "total awareness"—you must organize everything even if you only care about a small portion.

📋 Quick Reference Card: Heap vs Sort Decision Matrix

Scenario          | 📊 k vs n | 🎯 Best Choice | 💭 Reasoning
Top 10 from 1M    | k << n    | 🏆 Heap        | O(1M × log 10) vs O(1M × log 1M)
Top 500K from 1M  | k ≈ n/2   | 🤷 Similar     | Heap has overhead; sort is simpler
Need all sorted   | k = n     | 🏆 Sort        | Heap gives no advantage
Streaming data    | k << n    | 🏆 Heap        | Can't sort unknown n elements
One-time query    | k << n    | 🤷 Either      | For single query, constants matter
Repeated queries  | Any k     | 🏆 Heap        | Amortizes heap maintenance cost

🧠 Mnemonic: "Keep Kount Kompact"—when you need only k elements, keep your data structure compact (heap of size k) rather than expanding to full size (sorting all n).

Advanced Pattern: Dual Heap for Dynamic Median

One of the most elegant advanced heap patterns uses two heaps working in tandem to maintain a dynamic median. This pattern appears in multiple LeetCode hard problems and demonstrates sophisticated heap orchestration.

The idea:

  • Max heap stores the smaller half of numbers (largest of small numbers on top)
  • Min heap stores the larger half of numbers (smallest of large numbers on top)
  • Keep heaps balanced (sizes differ by at most 1)
  • Median is either the top of one heap or average of both tops

Numbers: [1, 3, 5, 7, 9]
Max heap (small half): [3, 1]  (top = 3)
Min heap (large half): [5, 7, 9]  (top = 5)
Median = 5 (top of the larger heap)

After adding 4 and rebalancing:
Max heap (small half): [4, 1, 3]  (top = 4)
Min heap (large half): [5, 7, 9]  (top = 5)
Median = (4 + 5) / 2 = 4.5 (average of the two tops)

This dual-heap pattern maintains O(log n) insert time while providing O(1) median access—far better than re-sorting the entire dataset after each insertion.

⚠️ Common Mistake: Not carefully maintaining the balance invariant. Allowing the heap sizes to differ by more than 1, or forgetting to rebalance after insertions, breaks the median property. ⚠️

💡 Remember: The dual-heap median pattern generalizes to finding any percentile. For the 25th percentile, keep the lower heap at one-third the size of the upper heap (25% of elements sit below the percentile, 75% above). For the 90th percentile, keep the lower heap at 9x the size of the upper heap.
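
Here's a minimal sketch of that generalization, assuming integer percentiles and a nearest-rank convention (PercentileTracker is a hypothetical name, not a standard class):

import heapq

class PercentileTracker:
    """Sketch of the generalized dual-heap pattern for the pct-th percentile."""
    def __init__(self, pct):
        self.pct = pct   # integer percentile, e.g. 90
        self.lower = []  # max heap (negated values): the lowest ~pct% of elements
        self.upper = []  # min heap: the remaining highest elements

    def add(self, num):
        if not self.lower or num <= -self.lower[0]:
            heapq.heappush(self.lower, -num)
        else:
            heapq.heappush(self.upper, num)
        # Rebalance so the lower heap holds ceil(pct% of total) elements
        total = len(self.lower) + len(self.upper)
        target = max(1, (self.pct * total + 99) // 100)  # integer ceiling
        while len(self.lower) > target:
            heapq.heappush(self.upper, -heapq.heappop(self.lower))
        while len(self.lower) < target:
            heapq.heappush(self.lower, -heapq.heappop(self.upper))

    def percentile(self):
        return -self.lower[0]  # largest value in the lower pct% = the percentile

## Example usage
tracker = PercentileTracker(90)
for num in range(1, 11):  # stream 1..10
    tracker.add(num)
print(tracker.percentile())  # 9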

Practical Optimization: When to Build vs Insert

A subtle but important optimization involves choosing between heapify (building a heap from scratch) and incremental insertion.

Building heap from array: O(n) using heapify
Inserting n elements one-by-one: O(n log n)

If you have all elements upfront, heapify is asymptotically faster:

import heapq

items = [9, 4, 7, 1, 2, 6, 3]  # example data, all available upfront

## Slower: O(n log n), n separate sift-up operations
heap = []
for item in items:
    heapq.heappush(heap, item)

## Faster: O(n), single bottom-up pass
heap = items.copy()  # Don't modify original
heapq.heapify(heap)

💡 Pro Tip: In competitive programming, this optimization can mean the difference between TLE (Time Limit Exceeded) and AC (Accepted) on large inputs. When you see all n elements available at once, always prefer heapify over incremental insertion.

🤔 Did you know? The O(n) heapify works by starting from the bottom of the heap tree and "sinking" elements down. This is faster than the top-down approach because most elements are near the bottom (a complete binary tree has n/2 leaves) and don't need to move far.
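
To make the bottom-up construction concrete, here's a minimal from-scratch sketch (build_min_heap is a hypothetical helper; in practice heapq.heapify does this job for you):

def build_min_heap(arr):
    """Sketch of bottom-up heapify: sink each internal node, starting from
    the last one. Leaves (indices >= n // 2) are already valid heaps."""
    n = len(arr)

    def sink(i):
        while 2 * i + 1 < n:
            child = 2 * i + 1
            if child + 1 < n and arr[child + 1] < arr[child]:
                child += 1  # pick the smaller child
            if arr[i] <= arr[child]:
                break
            arr[i], arr[child] = arr[child], arr[i]
            i = child

    for i in range(n // 2 - 1, -1, -1):  # last internal node down to the root
        sink(i)
    return arr

print(build_min_heap([9, 4, 7, 1, 2, 6, 3]))  # [1, 2, 3, 4, 9, 6, 7]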

Summary: Choosing the Right Advanced Technique

Mastering these advanced techniques means knowing which tool to reach for:

🎯 Use custom comparators when your priority depends on multiple fields, computed values, or complex business logic—not just simple numeric comparison.

🎯 Use lazy deletion when you need to remove arbitrary elements frequently and can afford the space overhead of tracking deleted items.

🎯 Choose heaps over sorting when k << n, you're processing streams, or you need repeated access to top-k elements.

🎯 Combine heaps with hash maps when you need both fast lookup and ordered access, especially for frequency-based problems.

🎯 Use dual heaps when you need to maintain dynamic medians or percentiles efficiently.

🎯 Use heapify instead of incremental insertion when all elements are available upfront and you're optimizing for time.

These techniques transform heap problems from straightforward min/max operations into sophisticated solutions that handle real-world complexity with elegance and efficiency.

Common Pitfalls and Best Practices

Heaps are powerful data structures, but they come with subtle traps that can derail even experienced developers. Understanding these pitfalls isn't just about avoiding bugs—it's about developing an intuition for when and how to use heaps effectively. In this section, we'll explore the most common mistakes developers make when working with heaps and learn battle-tested strategies to avoid them.

Pitfall 1: Min-Heap vs Max-Heap Confusion

One of the most frequent sources of confusion in heap problems is choosing the wrong heap type, particularly in top-k problems. This confusion stems from a counterintuitive principle: to find the k largest elements, you often need a min-heap, not a max-heap.

Wrong thinking: "I need the k largest elements, so I'll use a max-heap to keep track of the largest values."

Correct thinking: "I need the k largest elements, so I'll use a min-heap of size k. The smallest element in my heap is the k-th largest overall, and I can efficiently discard anything smaller."

💡 Mental Model: Think of a min-heap as a "bouncer" for an exclusive club that only admits the k largest elements. The bouncer (root of the min-heap) represents the minimum requirement to get in. Anyone smaller than the bouncer gets rejected; anyone larger replaces the bouncer and becomes the new minimum requirement.

## ⚠️ Common Mistake 1: Using max-heap for top-k largest ⚠️
## This approach works but is inefficient
def find_k_largest_inefficient(nums, k):
    import heapq
    # Creating a max-heap with all elements (using negation)
    max_heap = [-x for x in nums]
    heapq.heapify(max_heap)
    
    result = []
    for _ in range(k):
        result.append(-heapq.heappop(max_heap))
    return result
    # Time: O(n + k log n), Space: O(n)

## ✅ Correct Approach: Using min-heap of size k
def find_k_largest_efficient(nums, k):
    import heapq
    # Maintain a min-heap of size k
    min_heap = nums[:k]
    heapq.heapify(min_heap)  # O(k)
    
    # For remaining elements, only add if larger than smallest in heap
    for num in nums[k:]:
        if num > min_heap[0]:  # Compare with minimum (root)
            heapq.heapreplace(min_heap, num)  # Pop min, push new
    
    return min_heap
    # Time: O(n log k), Space: O(k) - Much better!

## Example usage
nums = [3, 2, 1, 5, 6, 4]
print(find_k_largest_efficient(nums, 2))  # [5, 6]

🎯 Key Principle: For top-k problems, use the opposite heap type:

  • Top-k largest → Use min-heap of size k
  • Top-k smallest → Use max-heap of size k

This principle extends to problems like "Kth Largest Element" where maintaining a min-heap of size k gives you O(n log k) time complexity instead of O(n log n).

Pitfall 2: Python's heapq Min-Heap Limitation

Python's heapq module only implements min-heaps natively. This is a deliberate design choice, but it catches many developers off guard when they need max-heap functionality.

⚠️ Common Mistake 2: Assuming heapq has a max-heap option ⚠️

There's no heapq.heappush_max() or flag to switch modes. The workaround is to negate values when inserting and negate again when extracting. While simple, this approach has subtle gotchas:

import heapq

## Simulating a max-heap with negation
class MaxHeapExample:
    def __init__(self):
        self.heap = []
    
    def push(self, val):
        heapq.heappush(self.heap, -val)  # Negate on insert
    
    def pop(self):
        return -heapq.heappop(self.heap)  # Negate on extract
    
    def peek(self):
        return -self.heap[0]  # Negate on peek

## ⚠️ GOTCHA: Negating the whole tuple fails! ⚠️
## Problem: Finding k closest points to origin
def k_closest_broken(points, k):
    heap = []
    for x, y in points:
        dist = x*x + y*y
        # Trying to negate the tuple itself - THIS DOESN'T WORK!
        # heapq.heappush(heap, -(dist, [x, y]))  # TypeError: bad operand type for unary -: 'tuple'
        pass

## ✅ Correct: Negate only the comparable part
def k_closest_correct(points, k):
    heap = []
    for x, y in points:
        dist = x*x + y*y
        heapq.heappush(heap, (-dist, [x, y]))  # Negate distance only
        if len(heap) > k:
            heapq.heappop(heap)
    
    return [point for _, point in heap]

## Even better: Use custom comparison with a class
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        self.dist = x*x + y*y
    
    def __lt__(self, other):
        # Reverse comparison for max-heap behavior
        return self.dist > other.dist  
    
    def to_list(self):
        return [self.x, self.y]

def k_closest_best(points, k):
    heap = []
    for x, y in points:
        heapq.heappush(heap, Point(x, y))
        if len(heap) > k:
            heapq.heappop(heap)
    
    return [p.to_list() for p in heap]

💡 Pro Tip: When working with complex objects, define a wrapper class with __lt__() method instead of negating values. This provides cleaner, more maintainable code and avoids the tuple negation trap.

🤔 Did you know? Some languages like Java provide both PriorityQueue (min-heap) and allow custom comparators easily, while C++ STL's priority_queue defaults to max-heap. Python's choice reflects its philosophy of "one obvious way to do it," though it requires the negation workaround.

Pitfall 3: Off-By-One Errors in Array-Based Heaps

When implementing heaps from scratch (common in interviews), index calculations are a minefield of off-by-one errors. The relationship between parent and child indices depends on whether you use 0-based or 1-based indexing.

0-based indexing (Python standard):
       0
      / \
     1   2
    / \ / \
   3  4 5  6

Parent of i: (i - 1) // 2
Left child:  2 * i + 1
Right child: 2 * i + 2

1-based indexing (mathematical, cleaner):
       1
      / \
     2   3
    / \ / \
   4  5 6  7

Parent of i: i // 2
Left child:  2 * i
Right child: 2 * i + 1

⚠️ Common Mistake 3: Mixing indexing systems or using wrong formulas ⚠️

class MinHeap:
    def __init__(self):
        self.heap = []
    
    def _parent(self, i):
        return (i - 1) // 2  # ✅ Correct for 0-based
    
    def _left_child(self, i):
        return 2 * i + 1  # ✅ Correct for 0-based
    
    def _right_child(self, i):
        return 2 * i + 2  # ✅ Correct for 0-based
    
    def _has_left(self, i):
        return self._left_child(i) < len(self.heap)  # Don't forget bounds check!
    
    def _has_right(self, i):
        return self._right_child(i) < len(self.heap)
    
    def _heapify_down(self, i):
        while self._has_left(i):
            smaller_child_idx = self._left_child(i)
            
            # ⚠️ Critical: Check if right child exists AND is smaller
            if (self._has_right(i) and 
                self.heap[self._right_child(i)] < self.heap[smaller_child_idx]):
                smaller_child_idx = self._right_child(i)
            
            # If parent is already smaller, we're done
            if self.heap[i] <= self.heap[smaller_child_idx]:
                break
            
            # Swap and continue
            self.heap[i], self.heap[smaller_child_idx] = \
                self.heap[smaller_child_idx], self.heap[i]
            i = smaller_child_idx
    
    def _heapify_up(self, i):
        # ⚠️ Critical: In Python, (0 - 1) // 2 == -1, which would silently
        # index the LAST element; the i > 0 guard prevents that
        while i > 0 and self.heap[i] < self.heap[self._parent(i)]:
            parent = self._parent(i)
            self.heap[i], self.heap[parent] = self.heap[parent], self.heap[i]
            i = parent
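
To round out the listing, here's a minimal sketch of the public operations built on those helpers; these two methods belong inside the MinHeap class above (they weren't part of the original listing):

    def push(self, val):
        """Append at the end, then bubble up to restore the heap property."""
        self.heap.append(val)
        self._heapify_up(len(self.heap) - 1)
    
    def pop(self):
        """Swap root with the last element, remove it, then sink the new root."""
        if not self.heap:
            raise IndexError("pop from empty heap")
        self.heap[0], self.heap[-1] = self.heap[-1], self.heap[0]
        min_val = self.heap.pop()
        if self.heap:
            self._heapify_down(0)
        return min_val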

🧠 Mnemonic: For 0-based indexing, remember "Plus One for Kids" - children indices need +1 and +2, parent needs -1 then divide by 2.

💡 Pro Tip: When implementing from scratch, add assertions to verify your index calculations:

def _verify_heap_property(self, i):
    if self._has_left(i):
        assert self.heap[i] <= self.heap[self._left_child(i)], "Left child violation"
    if self._has_right(i):
        assert self.heap[i] <= self.heap[self._right_child(i)], "Right child violation"

Pitfall 4: Neglecting Edge Cases

Heap problems have several edge cases that frequently cause runtime errors or incorrect results. Defensive programming requires checking for these scenarios:

📋 Quick Reference Card: Heap Edge Cases

🎯 Scenario              | ⚠️ Risk                                   | ✅ Solution
🔒 Empty heap operations | IndexError on peek/pop                    | Check if heap: before access
🎯 Single element        | Wrong comparison logic                    | Ensure parent/child checks handle this
🔄 Duplicate values      | Incorrect counting in frequency problems  | Use tuples with tie-breakers
📊 k > array size        | Not enough elements                       | Return min(k, len(array)) elements
💾 Null/None values      | Comparison errors                         | Filter or handle explicitly
🔢 Integer overflow      | Negation of MIN_INT fails                 | Use float('-inf') / float('inf')

import heapq
from collections import Counter

def top_k_frequent_with_edge_cases(nums, k):
    """
    Find k most frequent elements, handling all edge cases.
    """
    # Edge case: Empty array
    if not nums:
        return []
    
    # Edge case: k >= number of unique elements
    freq_map = Counter(nums)
    if k >= len(freq_map):
        return list(freq_map.keys())
    
    # Use min-heap of size k
    # Tuple format: (frequency, unique_id, value)
    # unique_id prevents comparison issues with duplicate frequencies
    heap = []
    
    for idx, (num, freq) in enumerate(freq_map.items()):
        # Push tuple: freq first for comparison, idx for uniqueness
        heapq.heappush(heap, (freq, idx, num))
        
        # Maintain size k
        if len(heap) > k:
            heapq.heappop(heap)
    
    # Extract values (ignore frequency and index)
    return [num for _, _, num in heap]

## Edge case testing
test_cases = [
    ([1, 1, 1, 2, 2, 3], 2),     # Normal case
    ([1], 1),                     # Single element
    ([1, 2], 3),                  # k > unique elements
    ([], 1),                      # Empty array
    ([1, 1, 1, 1], 1),           # All same element
]

for nums, k in test_cases:
    print(f"nums={nums}, k={k} → {top_k_frequent_with_edge_cases(nums, k)}")

⚠️ Common Mistake 4: Forgetting to handle duplicate priorities ⚠️

When elements have the same priority in a heap, Python's heapq falls back to comparing the next element of the tuple. Plain lists happen to be comparable, but custom objects without __lt__ are not, and that's when you get a TypeError:

## ❌ This breaks with duplicate priorities and non-comparable payloads!
import heapq

class Task:  # no __lt__ defined
    def __init__(self, name):
        self.name = name

heap = []
heapq.heappush(heap, (5, Task("a")))  # Works: nothing to compare yet
heapq.heappush(heap, (5, Task("b")))  # TypeError: '<' not supported between instances of 'Task' and 'Task'

## ✅ Solution: Add unique tie-breaker
heap = []
counter = 0
for point in [[1, 2], [3, 4]]:
    dist = point[0]**2 + point[1]**2
    heapq.heappush(heap, (dist, counter, point))  # counter ensures uniqueness
    counter += 1

💡 Remember: Always include a unique tie-breaker (like an index or counter) when pushing tuples with potentially duplicate priorities.

Pitfall 5: Performance Anti-Patterns

Even when your heap logic is correct, certain patterns can severely degrade performance:

Anti-Pattern 1: Repeated Individual Insertions Instead of Heapify
import heapq
import time

nums = list(range(10000, 0, -1))  # 10,000 elements in reverse order

## ❌ Slow: Individual insertions O(n log n)
start = time.time()
heap1 = []
for num in nums:
    heapq.heappush(heap1, num)
print(f"Individual insertions: {time.time() - start:.4f}s")

## ✅ Fast: Bulk heapify O(n)
start = time.time()
heap2 = nums.copy()
heapq.heapify(heap2)
print(f"Heapify: {time.time() - start:.4f}s")

## Heapify is typically 3-5x faster!

🎯 Key Principle: When building a heap from existing data, use heapq.heapify() instead of repeated heappush() calls. The time complexity improvement from O(n log n) to O(n) is substantial for large datasets.

Anti-Pattern 2: Unnecessary Pop-Then-Push Operations
## ❌ Inefficient: Two operations
if heap:
    heapq.heappop(heap)
heapq.heappush(heap, new_value)

## ✅ Efficient: Single operation (one sift instead of two)
if heap:
    heapq.heapreplace(heap, new_value)  # Pop and push atomically

## Even better for conditional replacement:
if heap and new_value > heap[0]:
    heapq.heapreplace(heap, new_value)

💡 Pro Tip: heapreplace() is more efficient than separate pop and push because it only performs one heapify operation. Similarly, heappushpop() pushes then pops in one operation.
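
The two differ in the order of operations, which matters when the new value would itself be the minimum. A quick sketch:

import heapq

a = [3, 5, 8]
heapq.heapify(a)
b = [3, 5, 8]
heapq.heapify(b)

print(heapq.heappushpop(a, 1))  # 1: pushes 1, then pops it straight back out; a stays [3, 5, 8]
print(heapq.heapreplace(b, 1))  # 3: pops the old minimum first; b becomes [1, 5, 8]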

Anti-Pattern 3: Heap for Simple Comparisons
## ❌ Overkill: Using heap to find maximum
import heapq
def find_max(nums):
    heap = [-x for x in nums]
    heapq.heapify(heap)
    return -heapq.heappop(heap)

## ✅ Simple: Just use max()
def find_max(nums):
    return max(nums)

## ⚠️ Use heaps when you need:
## - Repeated access to min/max (multiple pops)
## - Maintaining top-k elements dynamically
## - Merging sorted sequences
## - NOT for one-time min/max finding!

🧠 Mnemonic: "HEAP for KEEP" - Use heaps when you need to KEEP a dynamic set of elements sorted, not for one-off operations.

Best Practices Summary

To write robust, efficient heap-based solutions, internalize these principles:

🔧 Implementation Best Practices:

  1. Choose the right heap type based on problem requirements (remember the counterintuitive rule for top-k problems)
  2. Use heapify for bulk operations rather than repeated insertions
  3. Add tie-breakers when pushing tuples to prevent comparison errors
  4. Validate edge cases explicitly: empty heaps, single elements, k > n
  5. Prefer heapreplace/heappushpop over separate operations when applicable

🧠 Debugging Best Practices:

  1. Print heap state after operations to verify heap property: print("Heap:", heap)
  2. Use assertions to validate parent-child relationships in custom implementations
  3. Test with small examples first - heap bugs are easier to spot with 3-5 elements
  4. Visualize the tree structure on paper when debugging complex heap operations

📚 Code Review Best Practices:

  1. Document heap type clearly: Add comments like # Min-heap: smallest distance at root
  2. Explain negation tricks: # Negate for max-heap behavior with heapq
  3. Justify heap choice: Why is a heap better than sorting here? (Usually dynamic updates)
  4. Note time/space complexity: Make trade-offs explicit

💡 Real-World Example: In production systems at companies like Google and Facebook, heap-based priority queues manage billions of tasks. A bug that causes unnecessary heap operations can multiply into millions of wasted CPU cycles. Engineers use profiling tools to detect patterns like "too many individual insertions" and refactor to bulk heapify operations, often seeing 3-5x performance improvements.

Quick Debugging Checklist

When your heap solution isn't working, check these items in order:

☐ Am I using the correct heap type for this problem?
  └─ Top-k largest → min-heap of size k
  └─ Top-k smallest → max-heap of size k

☐ If using Python heapq with max-heap:
  └─ Am I negating values consistently?
  └─ Am I handling tuples correctly (negate only comparable part)?

☐ Are my index calculations correct? (0-based vs 1-based)
  └─ Parent: (i - 1) // 2
  └─ Left child: 2 * i + 1
  └─ Right child: 2 * i + 2

☐ Have I handled all edge cases?
  └─ Empty heap operations
  └─ Single element
  └─ k > array length
  └─ Duplicate values/priorities

☐ Am I using efficient heap operations?
  └─ heapify() for bulk creation
  └─ heapreplace() instead of pop + push
  └─ Heap is appropriate for this problem (not overkill)

☐ Are my tuple comparisons safe?
  └─ Added unique tie-breaker for duplicate priorities?
  └─ Using comparable types in comparison positions?

By systematically addressing these common pitfalls and following best practices, you'll write heap solutions that are not only correct but also efficient, maintainable, and resilient to edge cases. The difference between a novice and expert heap implementation often lies not in algorithmic understanding, but in attention to these practical details.

🎯 Key Principle: Mastery of heaps comes from recognizing that the data structure itself is simple—it's the context, edge cases, and performance considerations that require careful thought. Treat each pitfall as a learning opportunity to deepen your intuition about when and how to apply heaps effectively.

Summary and Problem-Solving Checklist

Congratulations! You've journeyed through the comprehensive world of heaps and priority queues. What started as an abstract data structure concept has now transformed into a powerful problem-solving toolkit. You began this lesson perhaps knowing that heaps existed, but now you understand when to use them, how to implement them efficiently, and why they're the optimal choice for certain algorithmic challenges. This final section consolidates everything you've learned into actionable references that you can use while tackling LeetCode problems.

What You've Mastered

Before this lesson, heaps might have seemed like just another data structure to memorize. Now you understand that heaps are your go-to solution for:

🎯 Dynamic extremum tracking - Finding and removing the minimum or maximum element efficiently as data changes

🎯 K-element problems - Maintaining the top-K or bottom-K elements from a stream or large dataset

🎯 Merge operations - Combining multiple sorted sequences efficiently

🎯 Scheduling and prioritization - Processing tasks in order of importance or urgency

🎯 Optimization problems - Finding optimal solutions that require repeatedly selecting the best candidate

You've learned not just the mechanics of heappush() and heappop(), but the strategic thinking behind recognizing heap opportunities in problem statements.


📋 Quick Reference: Heap Operation Complexities

Operation                | Time Complexity | Space Complexity | Use Case
🔍 Peek (top)            | O(1)            | O(1)             | View min/max without removal
Insert (push)            | O(log n)        | O(1)             | Add new element
Extract (pop)            | O(log n)        | O(1)             | Remove and return min/max
🏗️ Build heap (heapify)  | O(n)            | O(1)             | Create heap from array
🔄 Increase/decrease key | O(log n)        | O(1)             | Modify element priority
🗑️ Delete arbitrary      | O(log n)        | O(1)             | Remove specific element
🔍 Search                | O(n)            | O(1)             | Find arbitrary element

💡 Pro Tip: Notice that heaps excel at extremum operations (O(log n)) but are poor at searching (O(n)). If you need frequent searches, combine a heap with a hash map for index tracking.
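
As a sketch of that combination, here's a heap paired with a value-to-index map so arbitrary deletion really is O(log n) (IndexedMinHeap is a hypothetical name, and it assumes all values are distinct):

class IndexedMinHeap:
    """Min heap plus a value -> index map, giving O(1) search and
    O(log n) deletion of an arbitrary value. Assumes distinct values."""
    def __init__(self):
        self.heap = []
        self.pos = {}  # value -> current index in self.heap

    def push(self, val):
        self.heap.append(val)
        self.pos[val] = len(self.heap) - 1
        self._sift_up(len(self.heap) - 1)

    def delete(self, val):
        i = self.pos.pop(val)      # O(1) lookup instead of an O(n) scan
        last = self.heap.pop()
        if i < len(self.heap):     # val wasn't in the last slot
            self.heap[i] = last
            self.pos[last] = i
            self._sift_down(i)     # the replacement may need to move either way
            self._sift_up(i)

    def _swap(self, i, j):
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]
        self.pos[self.heap[i]], self.pos[self.heap[j]] = i, j

    def _sift_up(self, i):
        while i > 0 and self.heap[i] < self.heap[(i - 1) // 2]:
            self._swap(i, (i - 1) // 2)
            i = (i - 1) // 2

    def _sift_down(self, i):
        n = len(self.heap)
        while 2 * i + 1 < n:
            c = 2 * i + 1
            if c + 1 < n and self.heap[c + 1] < self.heap[c]:
                c += 1
            if self.heap[i] <= self.heap[c]:
                break
            self._swap(i, c)
            i = c

## Example usage
h = IndexedMinHeap()
for v in [5, 3, 8]:
    h.push(v)
h.delete(5)
print(h.heap[0])  # 3 (the minimum survives; 5 was removed in O(log n))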

⚠️ Critical Point: Building a heap from an array is O(n), not O(n log n)! This is often the optimal approach when you have all elements upfront rather than inserting one-by-one.


🌳 Decision Tree: When to Use Heaps

Use this decision tree when analyzing a new LeetCode problem:

Does the problem mention...?
├─ "K largest/smallest elements"? ──────────────────► Use HEAP (size K)
│
├─ "Median" or "middle element"? ──────────────────► Use TWO HEAPS
│
├─ "Merge K sorted..."? ───────────────────────────► Use MIN HEAP
│
├─ "Closest points/numbers"? ──────────────────────► Use HEAP (size K)
│
├─ "Next greater/smaller"? ────────────────────────► Consider MONOTONIC STACK first
│
├─ "Continuously find min/max" while adding data? ─► Use HEAP
│
├─ "Schedule" or "priority" mentioned? ────────────► Use PRIORITY QUEUE (heap)
│
└─ Need to process elements in priority order? ────► Use HEAP

Additional checks:
✓ Are you repeatedly finding extremums? → Heap likely optimal
✓ Do you need the middle element? → Two heaps pattern
✓ Is input size large but K is small? → Heap of size K
✓ Are you merging sorted structures? → Min heap with pointers

🎯 Key Principle: If you're doing more than 3-4 linear scans to find min/max elements, you probably need a heap.


🔧 Step-by-Step Problem-Solving Framework

When you encounter a potential heap problem on LeetCode, follow this systematic approach:

Step 1: Pattern Recognition (30 seconds)

Scan the problem statement for these keywords:

  • "K largest/smallest/closest" → Almost always a heap
  • "Priority" or "importance" → Priority queue
  • "Median" or "middle" → Two heaps
  • "Merge" + "sorted" → Min heap
  • "Continuously" or "stream" → Dynamic structure needed

Step 2: Identify Heap Type (15 seconds)

Decide which heap configuration:

Goal                   | Heap Type                 | Why
🔝 K largest elements  | Min heap of size K        | Keep smallest of the large ones at top for easy removal
🔽 K smallest elements | Max heap of size K        | Keep largest of the small ones at top for easy removal
📊 Running median      | Two heaps (max + min)     | Split data at median point
🔀 Merge K lists       | Min heap                  | Always extract global minimum
⏰ Task scheduling     | Min heap by time/priority | Process earliest/highest priority first

💡 Mental Model: For "K largest", use a min heap because you want to quickly remove the smallest element when you find something larger. It's counterintuitive but powerful!

Step 3: Design Data Structure (1-2 minutes)

Ask yourself:

  1. What goes into the heap? (values, tuples, objects?)
  2. What's the comparison key? (value itself, distance, time?)
  3. Do I need auxiliary data structures? (hash map for lookups?)
  4. What's the heap size constraint? (fixed K, or unbounded?)

Step 4: Implement Core Operations (5-10 minutes)

Write the heap manipulation code with these considerations:

## Template for K-sized heap problems
import heapq

def solve_k_problem(data, k):
    heap = []  # For max heap, negate values
    
    for item in data:
        # Compute the problem-specific priority (distance, frequency, timestamp, ...)
        priority = item  # placeholder: replace with the real priority key
        heapq.heappush(heap, (priority, item))
        
        # Maintain size constraint
        if len(heap) > k:
            heapq.heappop(heap)
    
    # Extract results
    return [item for priority, item in heap]

## Template for two-heaps (median) problems
class MedianFinder:
    def __init__(self):
        self.small = []  # Max heap (negate values)
        self.large = []  # Min heap
    
    def add_num(self, num):
        # Always add to small first
        heapq.heappush(self.small, -num)
        
        # Balance: largest of small ≤ smallest of large
        if self.small and self.large and (-self.small[0] > self.large[0]):
            val = -heapq.heappop(self.small)
            heapq.heappush(self.large, val)
        
        # Maintain size: small has at most 1 more element
        if len(self.small) > len(self.large) + 1:
            val = -heapq.heappop(self.small)
            heapq.heappush(self.large, val)
        elif len(self.large) > len(self.small):
            val = heapq.heappop(self.large)
            heapq.heappush(self.small, -val)
    
    def find_median(self):
        if len(self.small) > len(self.large):
            return -self.small[0]
        return (-self.small[0] + self.large[0]) / 2

Step 5: Optimize and Edge Cases (2-3 minutes)

⚠️ Check for these common issues:

  • Empty heap operations: Always verify heap is non-empty before popping
  • K larger than array size: Handle edge case when k >= len(data)
  • Duplicate values: Ensure your comparison handles ties correctly
  • Custom objects: Implement __lt__ method or use tuples
  • Max heap in Python: Remember to negate values

💻 Language-Specific Cheat Sheet

Python (heapq)

import heapq

## Min heap (default)
min_heap = []
heapq.heappush(min_heap, 5)
heapq.heappush(min_heap, (priority, value))  # Tuple comparison
min_val = heapq.heappop(min_heap)
min_val = min_heap[0]  # Peek without removal

## Max heap (negate values)
max_heap = []
heapq.heappush(max_heap, -5)
max_val = -heapq.heappop(max_heap)

## Build heap from existing list
data = [3, 1, 4, 1, 5]
heapq.heapify(data)  # O(n) - in-place conversion

## N largest/smallest (returns list)
result = heapq.nlargest(k, data)
result = heapq.nsmallest(k, data)
result = heapq.nlargest(k, data, key=lambda x: x.value)  # Custom key

## Common patterns
## Pattern 1: Maintain K-sized heap
if len(heap) > k:
    heapq.heappop(heap)

## Pattern 2: Push and pop in one operation
heapq.heappushpop(heap, item)  # More efficient than push then pop
heapq.heapreplace(heap, item)  # Pop first, then push (heap must be non-empty)

Java (PriorityQueue)

import java.util.PriorityQueue;
import java.util.Collections;

// Min heap (default)
PriorityQueue<Integer> minHeap = new PriorityQueue<>();
minHeap.offer(5);
minHeap.add(5);  // Alternative
int min = minHeap.poll();  // Remove and return
int top = minHeap.peek();  // View without removal

// Max heap (reverse comparator)
PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Collections.reverseOrder());

// Custom comparator
PriorityQueue<int[]> byFirst = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
PriorityQueue<Point> byDistance = new PriorityQueue<>(
    (a, b) -> Integer.compare(a.distance, b.distance)
);

// Initialize with capacity and comparator (Integer.compare avoids overflow)
PriorityQueue<Integer> heap = new PriorityQueue<>(k, (a, b) -> Integer.compare(b, a));

// Common operations
heap.size();
heap.isEmpty();
heap.clear();

// Maintain K-sized heap
if (heap.size() > k) {
    heap.poll();
}

C++ (priority_queue)

#include <queue>
#include <vector>
using namespace std;

// Max heap (default in C++)
priority_queue<int> maxHeap;
maxHeap.push(5);
int max = maxHeap.top();
maxHeap.pop();

// Min heap (use greater comparator)
priority_queue<int, vector<int>, greater<int>> minHeap;

// Custom comparator
auto cmp = [](const pair<int,int>& a, const pair<int,int>& b) {
    return a.first > b.first;  // Min heap by first element
};
priority_queue<pair<int,int>, vector<pair<int,int>>, decltype(cmp)> heap(cmp);

// Using struct for complex objects
struct Point {
    int x, y, dist;
    bool operator<(const Point& other) const {
        return dist > other.dist;  // Note: reversed for min heap
    }
};
priority_queue<Point> heap;

// Common operations
heap.size();
heap.empty();

📝 Curated Practice Problems

🟢 Easy - Foundation Building

  1. [703] Kth Largest Element in a Stream

    • Pattern: Maintain K-sized min heap
    • Concepts: Heap initialization, size maintenance
    • Time: 15-20 minutes
  2. [1046] Last Stone Weight

    • Pattern: Max heap with simulation
    • Concepts: Max heap operations, basic manipulation
    • Time: 10-15 minutes
  3. [1337] The K Weakest Rows in a Matrix

    • Pattern: K smallest with custom comparator
    • Concepts: Tuple heaps, custom sorting
    • Time: 15-20 minutes

🟡 Medium - Core Patterns

  1. [215] Kth Largest Element in an Array

    • Pattern: K largest with min heap
    • Concepts: Classic K-element problem
    • Time: 15-20 minutes
    • 💡 Must-solve: This is the quintessential heap problem
  2. [973] K Closest Points to Origin

    • Pattern: K smallest distances
    • Concepts: Distance calculation, tuple heaps
    • Time: 20-25 minutes
  3. [347] Top K Frequent Elements

    • Pattern: Frequency counting + heap
    • Concepts: Hash map + heap combination
    • Time: 20-25 minutes
  4. [295] Find Median from Data Stream

    • Pattern: Two heaps (max + min)
    • Concepts: Balanced heaps, dynamic median
    • Time: 30-40 minutes
    • 💡 Must-solve: Master the two-heap technique
  5. [373] Find K Pairs with Smallest Sums

    • Pattern: Min heap with pairs
    • Concepts: Multiple pointers, avoiding duplicates
    • Time: 30-35 minutes
  6. [378] Kth Smallest Element in a Sorted Matrix

    • Pattern: Merge K sorted lists variant
    • Concepts: Matrix traversal with heap
    • Time: 25-30 minutes
  7. [621] Task Scheduler

    • Pattern: Frequency-based scheduling
    • Concepts: Greedy + heap, cooldown management
    • Time: 35-45 minutes

🔴 Hard - Advanced Applications

  1. [23] Merge k Sorted Lists

    • Pattern: Classic K-way merge
    • Concepts: Linked list + heap, pointer management
    • Time: 30-40 minutes
    • 💡 Must-solve: Fundamental merge pattern
  2. [502] IPO

    • Pattern: Two heaps with conditional movement
    • Concepts: Dynamic filtering, optimization
    • Time: 40-50 minutes
  3. [632] Smallest Range Covering Elements from K Lists

    • Pattern: K pointers with min/max tracking
    • Concepts: Range optimization, multi-heap coordination
    • Time: 45-60 minutes
  4. [358] Rearrange String k Distance Apart

    • Pattern: Frequency-based reconstruction
    • Concepts: Greedy selection, constraint satisfaction
    • Time: 40-50 minutes
  5. [786] K-th Smallest Prime Fraction

    • Pattern: Sorted matrix merge variant
    • Concepts: Mathematical optimization, fraction comparison
    • Time: 45-55 minutes

🧠 Mnemonic for Practice: "K-M-T-S" - Kth elements, Merge operations, Two heaps, Scheduling. These four categories cover 90% of heap problems.


🎯 Common Code Patterns Reference

Pattern 1: K Largest/Smallest Elements

import heapq

def k_largest(nums, k):
    """Find K largest elements using min heap of size K"""
    heap = []
    
    for num in nums:
        heapq.heappush(heap, num)
        if len(heap) > k:
            heapq.heappop(heap)  # Remove smallest
    
    return heap  # All remaining are K largest

## Variation: K smallest using max heap
def k_smallest(nums, k):
    """Find K smallest elements using max heap of size K"""
    heap = []
    
    for num in nums:
        heapq.heappush(heap, -num)  # Negate for max heap
        if len(heap) > k:
            heapq.heappop(heap)  # Remove largest
    
    return [-x for x in heap]  # Un-negate results

When to use: Any problem mentioning "K largest/smallest/closest" elements.

Pattern 2: Merge K Sorted Structures

def merge_k_sorted_lists(lists):
    """Merge K sorted lists using min heap"""
    heap = []
    result = []
    
    # Initialize: Add first element from each list
    for i, lst in enumerate(lists):
        if lst:  # Non-empty list
            heapq.heappush(heap, (lst[0], i, 0))  # (value, list_idx, element_idx)
    
    # Extract min and add next from same list
    while heap:
        val, list_idx, elem_idx = heapq.heappop(heap)
        result.append(val)
        
        # Add next element from same list
        if elem_idx + 1 < len(lists[list_idx]):
            next_val = lists[list_idx][elem_idx + 1]
            heapq.heappush(heap, (next_val, list_idx, elem_idx + 1))
    
    return result

When to use: Problems involving merging multiple sorted arrays, lists, or streams.
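
For instance, merging three short sorted lists:

print(merge_k_sorted_lists([[1, 4, 7], [2, 5], [3, 6, 8]]))
## [1, 2, 3, 4, 5, 6, 7, 8]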

Pattern 3: Two Heaps for Median

class MedianFinder:
    """Maintain running median using two heaps"""
    
    def __init__(self):
        self.small = []  # Max heap - left half (negate values)
        self.large = []  # Min heap - right half
    
    def add_num(self, num):
        # Add to appropriate heap
        if not self.small or num <= -self.small[0]:
            heapq.heappush(self.small, -num)
        else:
            heapq.heappush(self.large, num)
        
        # Rebalance to maintain size constraint
        if len(self.small) > len(self.large) + 1:
            val = -heapq.heappop(self.small)
            heapq.heappush(self.large, val)
        elif len(self.large) > len(self.small):
            val = heapq.heappop(self.large)
            heapq.heappush(self.small, -val)
    
    def find_median(self):
        if len(self.small) > len(self.large):
            return -self.small[0]
        return (-self.small[0] + self.large[0]) / 2.0

When to use: Problems requiring dynamic median calculation or splitting data at a threshold.
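
For instance, feeding the class a short (hypothetical) stream:

mf = MedianFinder()
for num in [5, 15, 1, 3]:
    mf.add_num(num)
print(mf.find_median())  # 4.0, the median of [1, 3, 5, 15]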


⚠️ Final Critical Reminders

Memory

  1. ⚠️ Python heapq only provides min heap - Negate values for max heap behavior
  2. ⚠️ C++ priority_queue defaults to max heap - Use greater<int> for min heap
  3. ⚠️ For K largest, use MIN heap - Counterintuitive but correct!
  4. ⚠️ Two-heap median requires size balancing - Small heap has at most 1 extra element
  5. ⚠️ Heap search is O(n) - Combine with hash map if you need fast lookups

Performance

🎯 Key Principle:

  • Building heap from array: O(n) using heapify()
  • Inserting n elements one-by-one: O(n log n)
  • Always prefer heapify() when you have all elements upfront!

Problem-Solving

Correct thinking: "I need the K largest, so I'll use a min heap to efficiently remove smaller elements."

Wrong thinking: "I need the K largest, so I'll use a max heap to get large elements."

💡 Remember: The heap type is opposite to what you're collecting:

  • K largest → min heap (remove small ones)
  • K smallest → max heap (remove large ones)

🚀 Next Steps and Practical Applications

Now that you've mastered heaps and priority queues, here are your next steps:

1. Reinforce Through Practice

Commit to solving at least:

  • ✅ 5 easy problems (build confidence)
  • ✅ 10 medium problems (master core patterns)
  • ✅ 3 hard problems (challenge yourself)

Focus on pattern recognition rather than memorization. After each problem, identify which of the patterns you used.

2. Combine with Other Techniques

Heaps rarely appear in isolation in real-world scenarios. Practice combining them with:

🔧 Hash maps - For O(1) lookups alongside heap operations (lazy deletion pattern)

🔧 Graphs - Dijkstra's algorithm uses priority queues for shortest path (see the sketch after this list)

🔧 Dynamic programming - Some optimization problems combine DP state with heap-based greedy choices

🔧 Binary search - Problems like "minimize maximum" sometimes use binary search with heap validation
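
As a taste of that, here's a minimal Dijkstra sketch; the adjacency format {node: [(neighbor, weight), ...]} is an assumption for illustration, and note how it reuses the lazy-deletion idea from earlier by skipping stale heap entries:

import heapq

def dijkstra(graph, source):
    """Sketch: single-source shortest paths with a heap-based priority queue."""
    dist = {source: 0}
    heap = [(0, source)]  # (distance, node)
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float('inf')):
            continue  # stale entry: skip instead of deleting (lazy deletion)
        for neighbor, weight in graph.get(node, []):
            new_dist = d + weight
            if new_dist < dist.get(neighbor, float('inf')):
                dist[neighbor] = new_dist
                heapq.heappush(heap, (new_dist, neighbor))
    return dist

graph = {'A': [('B', 1), ('C', 4)], 'B': [('C', 2)], 'C': []}
print(dijkstra(graph, 'A'))  # {'A': 0, 'B': 1, 'C': 3}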

3. Real-World Applications

Understanding heaps opens doors to solving real-world problems:

💡 Real-World Example: System Design - Load balancers use heaps to track server loads and route requests to the least busy server.

💡 Real-World Example: Database Query Optimization - Many databases use heap-based structures to maintain sorted indexes and efficiently execute "ORDER BY" and "TOP K" queries.

💡 Real-World Example: Operating Systems - Process schedulers use priority queues to determine which task runs next based on priority and waiting time.

🤔 Did you know? Google's search algorithm originally used a heap-based approach to efficiently compute and maintain PageRank scores for billions of web pages, only processing the most important pages first!


📊 Final Summary Table

Problem Type           | Heap Configuration     | Time Complexity  | Space Complexity | Key Insight
🎯 K largest elements  | Min heap (size K)      | O(n log k)       | O(k)             | Keep smallest of large at top
🎯 K smallest elements | Max heap (size K)      | O(n log k)       | O(k)             | Keep largest of small at top
🎯 Running median      | Two heaps (max+min)    | O(log n) per add | O(n)             | Balance heaps at median point
🎯 Merge K sorted      | Min heap with pointers | O(n log k)       | O(k)             | Always extract global minimum
🎯 Task scheduling     | Priority queue         | O(n log n)       | O(n)             | Process by priority order
🎯 Top K frequent      | Hash map + heap        | O(n log k)       | O(n)             | Count first, then heap
🎯 Closest points      | Min/max heap (size K)  | O(n log k)       | O(k)             | Distance as comparison key

🎓 Conclusion

You now possess a comprehensive understanding of heaps and priority queues that extends far beyond basic syntax. You can:

  • Recognize heap-appropriate problems from keywords and patterns
  • Choose the correct heap type and configuration for any scenario
  • Implement efficient solutions using multiple programming languages
  • Optimize your solutions by avoiding common pitfalls
  • Combine heaps with other data structures for complex problems

The journey from "I know what a heap is" to "I can solve heap problems efficiently" is now complete. Your systematic approach—from pattern recognition through implementation to optimization—will serve you well not just in coding interviews, but in building performant real-world systems.

💡 Pro Tip: Bookmark this checklist and refer to it when solving your next 10-15 heap problems. After that, the patterns will become second nature, and you'll recognize heap opportunities instantly.

Remember: Heaps are not just about maintaining sorted order; they're about efficiently making optimal choices repeatedly. This principle underlies everything from Dijkstra's algorithm to operating system schedulers to real-time data processing systems.

Now go forth and conquer those LeetCode problems! 🚀

🧠 Final Mnemonic: "HEAP" - Highest/lowest priority, Extremum tracking, Always O(log n), Prune efficiently. This captures the essence of when and why to use heaps.