LRU and Eviction Policies

Implementing Least Recently Used and other algorithms to manage limited memory resources

Introduction to Cache Eviction: Why LRU Matters

Have you ever wondered why your smartphone doesn't slow to a crawl even though you've installed hundreds of apps? Or why Netflix can stream video smoothly without loading every frame from distant servers? The secret lies in caching—the art of keeping frequently needed data close at hand. But here's the problem: memory is expensive, and caches are deliberately small. This creates a fundamental challenge that every system must solve: when your cache fills up, what do you throw away? Understanding cache eviction policies is crucial for building performant applications, and that is exactly what this lesson covers.

The Finite Memory Constraint: Why We Can't Keep Everything

Imagine your desk while working on a complex project. You'd love to have every reference book, document, and tool within arm's reach, but physical space constrains you. You keep the most important items on your desk and store the rest on shelves across the room. Caches face the same reality—they're fast, expensive memory with strict size limits.

🎯 Key Principle: Caches exist in a performance-cost trade-off space. Faster memory (like CPU L1 cache or RAM) costs exponentially more per byte than slower storage (like SSDs or network-attached databases).

Consider a typical web application. Your database might hold terabytes of data, but your application server's memory cache might only hold a few gigabytes. A browser might cache hundreds of megabytes of resources, while the actual internet contains exabytes of content. This dramatic size mismatch means that caches must constantly make decisions about what to keep and what to discard.

Storage Hierarchy (Speed vs. Capacity):

CPU L1 Cache:  [====]              (32-64 KB, <1ns access)
CPU L2 Cache:  [========]          (256KB-1MB, ~3ns access)
CPU L3 Cache:  [==============]    (8-32MB, ~10ns access)
RAM:           [========================] (8-32GB, ~100ns)
SSD:           [====================================...] (512GB-2TB, ~100μs)
Network/DB:    [==========================================...] (∞, ~10ms)

               Fast & Expensive ←→ Slow & Cheap

The Performance Impact: Cache Hits vs. Cache Misses

The performance difference between a cache hit (finding data in cache) and a cache miss (fetching from slower storage) can be staggering. This isn't just about milliseconds—it's about the difference between a responsive application and one that frustrates users.

💡 Real-World Example: A typical web application might serve a database query from cache in 1 millisecond, while fetching the same data from a database takes 50 milliseconds. If your cache has a 90% hit rate, your average response time is (0.9 × 1ms) + (0.1 × 50ms) = 5.9ms. Drop that hit rate to 70%, and your average jumps to (0.7 × 1ms) + (0.3 × 50ms) = 15.7ms—nearly 3× slower!

🤔 Did you know? Studies show that 100ms of additional latency can reduce conversion rates by 7% in e-commerce applications. For a company generating $1M in daily revenue, poor cache hit rates could cost $70,000 per day.

The mathematics of cache hit rates reveal why eviction policies matter so profoundly:

  • Hit rate: Percentage of requests served from cache
  • Miss rate: Percentage of requests requiring slower storage access (1 - hit rate)
  • Effective access time: (hit rate × cache time) + (miss rate × backend time)

Even small improvements in hit rate can dramatically improve application performance and reduce infrastructure costs by decreasing load on backend systems.
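
As a quick illustration, here is that formula sketched in Python, using the 1 ms cache / 50 ms backend figures from the example above:

def effective_access_time(hit_rate, cache_ms=1.0, backend_ms=50.0):
    """Expected per-request latency for a given cache hit rate."""
    return hit_rate * cache_ms + (1 - hit_rate) * backend_ms

print(effective_access_time(0.9))   # ~5.9 ms
print(effective_access_time(0.7))   # ~15.7 ms -- nearly 3x slower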

The Cache Eviction Decision Problem

When your cache fills to capacity and new data arrives, you face a critical decision: which existing item should you evict (remove) to make room? This seemingly simple question has profound implications for system performance.

❌ Wrong thinking: "It doesn't matter much which item I remove—the cache will still work."

✅ Correct thinking: "Every eviction decision affects future hit rates. Removing the wrong item means I'll need to fetch it again soon, creating unnecessary cache misses."

The cache eviction problem is essentially a prediction challenge. You're trying to predict which cached items are least likely to be needed in the near future. Get it right, and your hit rate stays high. Get it wrong, and you create a cascade of expensive cache misses.

💡 Mental Model: Think of cache eviction like deciding which apps to keep on your phone's home screen. You want quick access to the apps you use most, but you only have limited prime screen real estate. The apps you evict to deeper screens take longer to access—similar to the penalty of a cache miss.

The Landscape of Eviction Strategies

Computer scientists have developed numerous eviction policies to solve this problem, each with different assumptions about access patterns:

📋 Quick Reference Card: Common Eviction Policies

🎯 Policy | 📝 Strategy | 💪 Best For | ⚡ Complexity
🔄 FIFO | Evict oldest entry | Simple, predictable access | O(1)
🎲 Random | Evict random entry | Unpredictable workloads | O(1)
🕐 LRU | Evict least recently used | Temporal locality | O(1)*
🔥 LFU | Evict least frequently used | Skewed frequency patterns | O(log n)
⏱️ TTL | Evict after time expires | Time-sensitive data | O(1)

*with optimized implementation

Least Recently Used (LRU) has emerged as one of the most popular and effective policies because it leverages a powerful principle called temporal locality—the observation that data accessed recently is likely to be accessed again soon. This makes LRU particularly effective for real-world workloads where access patterns aren't random.

⚠️ Common Mistake: Assuming the "best" eviction policy is universal. Different workloads demand different strategies. LRU excels with temporal locality but can struggle with sequential scans or cyclic patterns. ⚠️

🎯 Key Principle: The goal of any eviction policy is to maximize hit rate by keeping the most valuable items in cache. "Value" depends on access patterns—frequency, recency, or sometimes external factors like computation cost or data size.

In the sections ahead, we'll dive deep into how LRU works, why it's so effective for common access patterns, and how to implement it efficiently. We'll also explore when LRU might not be the best choice and what alternatives to consider.

Understanding LRU and Core Eviction Policies

When your cache fills up, something must go. But what? This decision—seemingly simple—can make or break your application's performance. Let's explore how Least Recently Used (LRU) and other eviction policies solve this critical problem.

The Core Idea: Why Recency Matters

LRU operates on a beautifully simple premise: if you haven't used something recently, you probably won't need it soon. This reflects the principle of temporal locality—the observation that programs tend to access the same data repeatedly within short time windows. Think about your web browser: you're far more likely to revisit the current article than one you opened three days ago.

🎯 Key Principle: LRU exploits temporal locality by treating recency as a predictor of future access. The least recently used item is the first candidate for eviction.

When you access an item in an LRU cache, it moves to the "most recently used" position. Every access updates this ordering. When the cache reaches capacity and you need to add something new, the item that hasn't been touched for the longest time gets evicted.

How LRU Tracks Recency: The Elegant Data Structure

The genius of LRU lies in achieving O(1) time complexity for both lookups and updates. This requires a clever combination of two data structures:

Hash Map + Doubly Linked List

        Hash Map (key -> node)
        +---+---+---+---+
        |   |   |   |   |
        +---+---+---+---+
          |   |   |   |
          v   v   v   v
        
Doubly Linked List (MRU to LRU)
HEAD <-> [D] <-> [A] <-> [C] <-> [B] <-> TAIL
         ↑                         ↑
      Most Recently              Least Recently
         Used                      Used

The hash map provides O(1) lookups: given a key, instantly find the corresponding node in the linked list. The doubly linked list maintains access order: the head contains the most recently used item, the tail holds the least recently used.

When you access key "A":

  1. Hash map locates node A in O(1)
  2. Remove A from its current position (O(1) with doubly linked list)
  3. Insert A at the head (O(1))
  4. Total: O(1)

When eviction is needed:

  1. Remove node at tail (O(1))
  2. Delete corresponding hash map entry (O(1))
  3. Total: O(1)

💡 Mental Model: Think of the linked list as a queue where people keep cutting to the front every time they're served. The person at the back, who has gone the longest without being served, gets kicked out when space runs out.

⚠️ Common Mistake: Using only a hash map with timestamps seems simpler, but finding the minimum timestamp for eviction requires O(n) time—unacceptable for cache performance! ⚠️
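
To make the pattern concrete, here is a minimal sketch in Python. It leans on collections.OrderedDict, which keeps keys in order and supports O(1) reordering via move_to_end, so it stands in for the hash map + linked list combination described above (a hand-rolled version appears later in the implementation section):

from collections import OrderedDict

class SimpleLRU:
    """Minimal LRU cache sketch; keys are ordered from least to most recently used."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None                      # cache miss
        self._data.move_to_end(key)          # mark as most recently used (O(1))
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)      # refresh recency on update
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)   # evict the least recently used entry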

The Eviction Policy Landscape

LRU isn't the only strategy. Each policy makes different assumptions about access patterns:

FIFO (First In, First Out) treats the cache like a simple queue. The oldest item gets evicted first, regardless of how recently it was accessed. FIFO is trivial to implement—just a circular buffer—but ignores actual usage patterns entirely.

❌ Wrong thinking: "Age equals relevance—old items should go." ✅ Correct thinking: "Recency of use predicts future use better than insertion time."
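
For contrast with the LRU sketch above, a minimal FIFO cache looks like this. Note that get() never reorders anything, so recency plays no role in eviction (names are illustrative):

from collections import OrderedDict

class FIFOCache:
    """FIFO eviction sketch: evicts in insertion order, ignoring recency of use."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        return self._data.get(key)           # reads do not affect eviction order

    def put(self, key, value):
        if key not in self._data and len(self._data) >= self.capacity:
            self._data.popitem(last=False)   # evict the oldest inserted entry
        self._data[key] = value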

LFU (Least Frequently Used) counts access frequency and evicts items with the lowest count. LFU excels when certain items are consistently popular over long periods. However, it suffers from cache pollution: an item accessed heavily in the past but no longer needed will stick around indefinitely because of its high historical count.

💡 Real-World Example: A news site's homepage is accessed thousands of times during a breaking story. Hours later, when interest fades, LFU keeps it cached based on historical frequency while fresh content struggles to enter.

Random eviction simply picks a victim at random. Surprisingly, random replacement isn't terrible—it's simple, fast, and avoids worst-case scenarios that can plague other policies. For certain workloads with no clear access patterns, random can even compete with sophisticated algorithms.

📋 Quick Reference Card: Eviction Policy Comparison

Policy | ⏱️ Time | 💾 Extra Space | 🎯 Best For | ⚠️ Weakness
LRU | O(1) | O(n) | Temporal locality workloads | Sequential scans thrash the cache
FIFO | O(1) | O(1) | Simple, predictable access | Ignores actual usage
LFU | O(log n)* | O(n) | Stable popularity patterns | Cache pollution from historical data
Random | O(1) | O(1) | Uniform random access | No exploitation of patterns

*With priority queue implementation

Beyond Pure LRU: Practical Variations

Pure LRU has limitations. A sequential scan through data larger than the cache will evict every useful item, replacing them with data you'll never access again. Real systems often use enhanced variants:

LRU-K tracks the time of the K-th most recent access instead of just the last access. This provides better frequency awareness. If K=2, an item must be accessed twice before it's truly "valuable." This filters out one-time scans while still respecting recency.

2Q (Two Queues) uses separate queues: items enter a FIFO "probationary" queue first. Only if accessed again do they promote to the main LRU queue. This creates a multi-tiered cache that adapts to both frequency and recency.
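
A hedged sketch of that two-queue idea follows (the full 2Q algorithm also keeps a "ghost" queue of recently evicted keys, which is omitted here; queue names and sizes are illustrative):

from collections import OrderedDict

class TwoQueueCache:
    """Simplified 2Q sketch: a FIFO probationary queue plus a main LRU queue."""

    def __init__(self, probation_size, main_size):
        self._probation = OrderedDict()   # A1: items seen once, FIFO order
        self._main = OrderedDict()        # Am: items seen again, LRU order
        self._probation_size = probation_size
        self._main_size = main_size

    def get(self, key):
        if key in self._main:                       # hot item: refresh recency
            self._main.move_to_end(key)
            return self._main[key]
        if key in self._probation:                  # second access: promote to main
            value = self._probation.pop(key)
            self._main[key] = value
            if len(self._main) > self._main_size:
                self._main.popitem(last=False)      # evict LRU of the main queue
            return value
        return None                                 # miss

    def put(self, key, value):
        if key in self._main:
            self._main[key] = value
            self._main.move_to_end(key)
            return
        self._probation[key] = value                # one-time items stay probationary
        if len(self._probation) > self._probation_size:
            self._probation.popitem(last=False)     # FIFO eviction filters out scans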

🤔 Did you know? PostgreSQL uses a variant called Clock Sweep (approximating LRU) because true LRU's linked list manipulation creates too much contention in multi-threaded environments. Sometimes "good enough" beats "optimal" when implementation costs matter.

💡 Pro Tip: Modern Redis uses LRU approximation by sampling random keys rather than maintaining perfect ordering. This trades small accuracy for massive scalability—Redis can handle millions of keys without the overhead of updating linked list pointers.
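
A toy illustration of that sampling idea (not Redis's actual code), assuming we track a last-access timestamp per key:

import random

def sampled_lru_evict(last_access, sample_size=5):
    """Evict the least recently accessed key among a small random sample.

    `last_access` maps key -> last access time; sampling approximates LRU
    without maintaining a fully ordered list of every key.
    """
    candidates = random.sample(list(last_access), min(sample_size, len(last_access)))
    victim = min(candidates, key=last_access.get)
    del last_access[victim]
    return victim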

Space Complexity Considerations

LRU's O(n) space overhead isn't just the cached data itself—each entry also carries two doubly linked list pointers and a hash map slot. For small cache values, this metadata can exceed the actual data size. A cache of 1000 small integers might use 10x more memory for LRU bookkeeping than for the integers themselves.

⚠️ Common Mistake: Ignoring metadata overhead when calculating cache capacity. A "1GB cache" might only store 700MB of actual data after accounting for LRU structures. ⚠️

🧠 Mnemonic: "REAL" helps remember LRU benefits: Recency-based, Efficient O(1), Adaptive to patterns, Locality-exploiting.

The choice of eviction policy fundamentally shapes your cache's behavior. LRU's dominance stems from its sweet spot: theoretically sound (temporal locality), practically efficient (O(1) operations), and broadly applicable (most workloads exhibit some recency patterns). Yet understanding alternatives ensures you recognize when LRU isn't the answer—and what to reach for instead.

Implementing and Optimizing Eviction Policies

With a solid understanding of LRU theory, it's time to translate concepts into working code and production systems. This section provides practical implementation patterns, library recommendations, and optimization strategies that will help you build high-performance caching layers.

Step-by-Step LRU Implementation

Implementing LRU from scratch requires two core data structures: a doubly-linked list for maintaining access order and a hash map for O(1) lookups. Here's the fundamental pattern:

LRU Cache Architecture:

HashMap: key → node pointer
    ┌─────┬─────┬─────┬─────┐
    │ k1  │ k2  │ k3  │ k4  │
    └──│──┴──│──┴──│──┴──│──┘
       ↓     ↓     ↓     ↓
Doubly-Linked List (MRU → LRU):
   HEAD ⇄ [k4:v4] ⇄ [k2:v2] ⇄ [k3:v3] ⇄ [k1:v1] ⇄ TAIL
   (most recent)                        (least recent)

🎯 Key Principle: Every cache access moves the accessed node to the head of the list. When capacity is exceeded, remove the tail node.

Core operations include:

🔧 get(key): Look up in hash map → Move node to head → Return value
🔧 put(key, value): If exists, update and move to head. If new and at capacity, remove tail node and its hash entry, then insert new node at head
🔧 evict(): Remove tail node from list and hash map

💡 Pro Tip: Use sentinel head/tail nodes to eliminate null checks and simplify edge cases. This makes your code more robust and easier to maintain.
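
Putting those pieces together, here is a compact sketch of the hash map + doubly linked list design with sentinel head/tail nodes (class and method names are illustrative, not a specific library's API):

class _Node:
    __slots__ = ("key", "value", "prev", "next")

    def __init__(self, key=None, value=None):
        self.key, self.value = key, value
        self.prev = self.next = None

class LRUCache:
    """Hash map + doubly linked list LRU with sentinel head/tail nodes."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._map = {}                       # key -> node
        self._head = _Node()                 # sentinel on the MRU side
        self._tail = _Node()                 # sentinel on the LRU side
        self._head.next = self._tail
        self._tail.prev = self._head

    def _remove(self, node):                 # unlink a node in O(1)
        node.prev.next = node.next
        node.next.prev = node.prev

    def _push_front(self, node):             # insert right after head in O(1)
        node.next = self._head.next
        node.prev = self._head
        self._head.next.prev = node
        self._head.next = node

    def get(self, key):
        node = self._map.get(key)
        if node is None:
            return None                      # cache miss
        self._remove(node)                   # move to the MRU position
        self._push_front(node)
        return node.value

    def put(self, key, value):
        node = self._map.get(key)
        if node is not None:                 # update existing entry
            node.value = value
            self._remove(node)
            self._push_front(node)
            return
        if len(self._map) >= self.capacity:  # evict the least recently used entry
            lru = self._tail.prev
            self._remove(lru)
            del self._map[lru.key]
        node = _Node(key, value)
        self._map[key] = node
        self._push_front(node)

# Usage sketch
cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")          # "a" becomes the most recently used entry
cache.put("c", 3)       # evicts "b", the least recently used
print(cache.get("b"))   # None -> "b" was evicted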

Using Production-Ready Libraries

While custom implementations teach fundamentals, production systems should leverage battle-tested libraries:

Redis supports several maxmemory eviction policies (eight in modern versions); the most commonly used configurations are:

Policy 🎯 | Behavior 🔧 | Best For 💡
allkeys-lru | Evicts least recently used keys across entire keyspace | General-purpose caching with mixed workloads
volatile-lru | Evicts LRU only among keys with TTL set | Caching with explicit expiration requirements
allkeys-lfu | Evicts least frequently used keys | Access patterns with repeated hot items
volatile-ttl | Evicts keys with shortest remaining TTL | Time-sensitive data prioritization
allkeys-random | Random eviction | Uniform access patterns
noeviction | Returns errors when memory is full | Persistent storage use cases

Configure Redis eviction with: maxmemory-policy allkeys-lru
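
The same setting can also be applied at runtime; for example, with the redis-py client (assuming a local Redis instance):

import redis

r = redis.Redis(host="localhost", port=6379)
r.config_set("maxmemory", "2gb")                 # memory ceiling that triggers eviction
r.config_set("maxmemory-policy", "allkeys-lru")  # same policy as the config line above
print(r.config_get("maxmemory-policy"))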

Memcached uses a slab-based LRU implementation with per-slab eviction, making it excellent for workloads with consistent object sizes. Framework-specific solutions like Guava Cache (Java) and functools.lru_cache (Python) provide decorator patterns for method-level caching.
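
For example, Python's built-in functools.lru_cache wraps a function with an LRU-bounded memoization layer (the function below is just a stand-in for an expensive lookup):

from functools import lru_cache

@lru_cache(maxsize=1024)                 # keep at most 1024 results, evicting LRU entries
def fetch_user_profile(user_id):
    # Stand-in for an expensive database or API call
    return {"id": user_id}

fetch_user_profile(42)                   # miss: computes and stores the result
fetch_user_profile(42)                   # hit: served straight from the cache
print(fetch_user_profile.cache_info())   # CacheInfo(hits=1, misses=1, maxsize=1024, currsize=1)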

💡 Real-World Example: A news website uses volatile-lru for article caches with 1-hour TTLs, ensuring fresh content while automatically evicting old articles that readers stop accessing.

Decision Framework: Matching Policies to Workloads

Read-heavy workloads (90%+ reads) benefit most from LRU or LFU since access patterns strongly predict future use. The cache hit rate typically reaches 80-95% when properly sized.

Write-heavy workloads (frequent updates) require different strategies:

  • Write-through caching with TTL-based eviction prevents stale data
  • Write-behind caching with size-based LRU improves write performance
  • Consider LFU if writes are concentrated on popular items

Mixed workloads demand careful analysis:

Decision Tree:

          Access Pattern Analysis
                    │
        ┌───────────┴───────────┐
   Temporal locality?      Frequency matters?
        │                        │
       Yes                      Yes
        │                        │
     Use LRU                  Use LFU
        │                        │
    High hit rate          Resist one-time scans

🤔 Did you know? LinkedIn uses a segmented LRU approach with separate caches for hot (frequently accessed) and warm (occasionally accessed) data, achieving 30% better hit rates than standard LRU.

Monitoring Cache Performance

Effective cache optimization requires tracking key metrics:

📊 Hit Rate = hits / (hits + misses)
🎯 Target: >80% for read-heavy, >60% for mixed workloads

📊 Eviction Rate = evictions per second
⚠️ Warning sign: Eviction rate approaching request rate indicates cache thrashing

📊 Memory Utilization = used memory / max memory
🎯 Target: 70-85% (allows headroom for traffic spikes)

📊 Average Latency = mean response time
💡 Compare cache hits vs. misses to calculate acceleration

Instrumentation pattern for monitoring:

class MonitoredLRUCache:
    """Wraps an LRU cache and records hit/miss metrics."""

    _MISSING = object()  # sentinel so falsy cached values still count as hits

    def __init__(self, cache, metrics):
        self._cache = cache      # underlying cache exposing dict-style .get(key, default)
        self.metrics = metrics   # metrics client exposing .increment(counter_name)

    def get(self, key):
        result = self._cache.get(key, self._MISSING)
        if result is not self._MISSING:
            self.metrics.increment('cache.hits')
            return result
        self.metrics.increment('cache.misses')
        return None

Common Pitfalls and Solutions

⚠️ Mistake 1: Cache Stampede ⚠️
When a popular key expires, multiple requests simultaneously fetch and cache the same data, overwhelming the backend.

Solution: Implement request coalescing (deduplicate concurrent requests) or probabilistic early expiration (refresh before TTL expires):

# Probabilistic early refresh: the closer the key is to expiring,
# the more likely this request is to refresh it ahead of time
from random import random

if time_to_expire < random() * refresh_window:
    async_refresh_cache(key)   # refresh in the background before the TTL lapses
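
And a minimal sketch of request coalescing for a threaded application, assuming a blocking fetch function; the class and method names are illustrative only:

import threading
from concurrent.futures import Future

class RequestCoalescer:
    """Concurrent requests for the same key share a single backend fetch."""

    def __init__(self, fetch):
        self._fetch = fetch              # blocking function: key -> value
        self._lock = threading.Lock()
        self._in_flight = {}             # key -> Future shared by waiting callers

    def get(self, key):
        with self._lock:
            future = self._in_flight.get(key)
            is_leader = future is None
            if is_leader:
                future = Future()
                self._in_flight[key] = future
        if is_leader:
            try:
                future.set_result(self._fetch(key))   # only one backend call per key
            except Exception as exc:
                future.set_exception(exc)
            finally:
                with self._lock:
                    self._in_flight.pop(key, None)
        return future.result()           # followers block until the leader finishes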

⚠️ Mistake 2: Cache Thrashing ⚠️
Cache too small for working set → constant evictions → hit rate plummets.

Solution: Monitor eviction rate and working set size. Increase cache size if eviction rate exceeds 10% of request rate.

⚠️ Mistake 3: Ignoring Access Patterns ⚠️
Using LRU for scan-heavy workloads where sequential access evicts useful data.

Solution: Use LFU or segmented LRU that isolates scan operations from normal traffic.

Summary

You now understand how to move from theory to production with cache eviction policies. You've learned to implement LRU using hash maps and linked lists, leverage production libraries like Redis with appropriate eviction configurations, and match policies to workload patterns. You can monitor critical metrics (hit rate, eviction rate, latency) and avoid common pitfalls like cache stampede and thrashing.

📋 Quick Reference Card:

Scenario 🎯 | Recommended Policy 🔧 | Key Metric 📊
General caching | allkeys-lru | Hit rate >80%
Session storage | volatile-lru + TTL | Eviction rate <10%
Analytics queries | allkeys-lfu | Memory utilization 70-85%
Rate limiting | volatile-ttl | Latency p99 <5ms

⚠️ Critical Reminders:

  • Monitor eviction rate as your primary health signal - high rates indicate undersized cache
  • Always implement request coalescing for popular keys to prevent stampedes
  • Match your eviction policy to actual access patterns, not assumptions

Next Steps:

🔧 Implement monitoring for your existing caches - you can't optimize what you don't measure
🔧 A/B test eviction policies in staging with production traffic patterns
🔧 Explore advanced patterns like multi-tier caching (L1: LFU, L2: LRU) for complex workloads

With these implementation strategies and optimization techniques, you're equipped to build cache systems that dramatically improve application performance while avoiding common pitfalls. Start with simple LRU implementations, measure relentlessly, and iterate based on your unique access patterns.