Match each system design concept to its correct description in the context of a URL shortener:

!MATCH[["Cache-aside pattern","App checks cache first; on miss, queries DB then populates cache"],["LRU eviction","Removes keys not accessed recently when Redis memory is full"],["Primary-replica replication","One node accepts writes; copies propagate to read-serving nodes"],["Consistent hashing","Minimizes data movement when adding new database shards"],["Cache stampede","Expired cache entry causes simultaneous DB requests from many threads"],["KGS (Key Generation Service)","Pre-generates random keys in a pool before any URL is shortened"]]

URL Shortener (e.g. TinyURL)

Design encoding, redirection, analytics storage, and handle 100M+ URLs with low-latency reads.

Why URL Shorteners Matter and What We're Building

You've clicked a link like https://bit.ly/3xK9mPq without thinking twice. But behind that tiny string of characters lives one of the most elegantly designed distributed systems in modern software engineering. If you're preparing for system design interviews, free flashcards are scattered throughout this lesson to help you retain the core concepts — but first, let's make you genuinely care about why URL shorteners are worth your deep attention.

Think back to the last time you tried to share a link in a tweet, a text message, or on a printed flyer. Maybe it was a product page buried six levels deep in an e-commerce site, a Google Maps route, or a PDF buried in a corporate document management system. The URL was probably something like:

https://www.example-store.com/products/category/electronics/laptops/gaming/asus-rog-strix-g15-advantage-edition-ryzen-9-6900hx-2022-model?color=eclipse-gray&size=15inch&ref=homepage_banner&utm_source=google&utm_medium=cpc&utm_campaign=back_to_school_2024

That URL is 221 characters long. Twitter's character limit is 280. You'd burn 79% of your message budget on a single link. URL shorteners solve this problem — and in solving it elegantly at scale, they become one of the richest topics in all of system design.

The Real-World Scale That Should Impress You

Before we dive into architecture, let's ground ourselves in the actual scale of these systems, because the numbers are staggering and they directly inform every design decision we'll make.

Bitly, one of the oldest and most widely used URL shorteners, reports processing over 10 billion clicks per month — that's roughly 3,800 redirects every single second, around the clock, every day of the year. TinyURL has been quietly shortening URLs since 2002 and has handled hundreds of millions of short links. Twitter's own shortener, t.co, wraps every single URL posted to the platform — and with roughly 500 million tweets per day, that's an enormous volume of link creation and resolution happening continuously.

💡 Real-World Example: When a major news story breaks and millions of people simultaneously click the same shortened link, the URL shortener must handle a sudden traffic spike — potentially 100x normal load — without going down. This is called a hot key problem, and it's one of the most interesting edge cases we'll explore in later sections.

These systems aren't toys. They're mission-critical infrastructure that, if they go down, break links embedded in emails, printed marketing materials, QR codes on product packaging, and social media posts that can never be edited. Durability isn't just a nice-to-have — it's a business requirement.

Defining What We're Actually Building

A well-structured system design interview starts with crystal-clear requirements. Let's establish exactly what our URL shortener needs to do.

Functional Requirements

These are the behaviors the system must exhibit — the features a product manager would put on a roadmap:

🔧 Core feature 1 — URL shortening: Given a long URL as input, the system generates and returns a short alias. For example, https://www.example.com/very/long/path becomes https://short.ly/aB3xK9.

🔧 Core feature 2 — URL redirection: When a user visits the short URL, the system looks up the original URL and redirects the browser to it. This must happen with low latency — users notice delays over 200ms.

🔧 Core feature 3 — Custom aliases: Premium users can request a specific short code, like https://short.ly/my-product-launch instead of a randomly generated one.

🔧 Optional feature — Link expiration: Short URLs can be set to expire after a certain date or number of clicks, after which they return a 410 Gone response.

🔧 Optional feature — Analytics: Track how many times each link was clicked, from which countries, and via which referrers.

Non-Functional Requirements

These are the quality attributes of the system — the things that determine whether it's actually production-worthy:

📚 High Availability: The redirect service must have 99.99% uptime or better. A few hours of downtime per year is acceptable in most SLAs; a few minutes might not be, depending on the contract.

📚 Low Latency: Redirects should complete in under 100 milliseconds at the 99th percentile. This means we'll need aggressive caching.

📚 Durability: Once a short URL is created, it must never be lost. We cannot afford to silently drop records.

📚 Scalability: The system must scale horizontally to handle growing traffic without architectural changes.

📚 Eventual Consistency is acceptable for writes: If a newly created URL takes a few hundred milliseconds to propagate across all regions, that's fine. Users don't expect instant global consistency.

🎯 Key Principle: In system design, separating functional from non-functional requirements isn't just academic housekeeping. It directly determines which technologies you choose and which tradeoffs you make. A system optimized purely for features but not for latency will fail in production.

Back-of-Envelope Estimation: Doing the Math

One of the most important skills in a system design interview is back-of-envelope estimation — the ability to quickly calculate rough but reasonable numbers that guide your architecture. Interviewers aren't looking for precision; they're looking for structured thinking.

Let's work through the numbers together for a URL shortener at modest scale.

Traffic Estimation

Let's assume our service has:

100 million new URLs shortened per day (write traffic)
Read-to-write ratio of 100:1 (reads are much more common — people create a link once and click it many times)
This gives us 10 billion redirects per day (read traffic)

Converting to Queries Per Second (QPS):

Write QPS  = 100,000,000 / 86,400 seconds ≈ 1,160 writes/second
Read QPS   = 10,000,000,000 / 86,400 seconds ≈ 115,740 reads/second
Peak QPS   = ~2x average, so ~230,000 reads/second at peak

This is a read-heavy system by a factor of 100. That single observation drives a huge architectural decision: we need aggressive caching on the read path, and we can afford to be slightly less optimized on the write path.

💡 Mental Model: Think of a URL shortener like a library. People donate books (write) occasionally, but people check out books (read) constantly. You'd invest heavily in fast checkout kiosks (read caches), not in making donations marginally faster.

Storage Estimation

Each URL record needs to store:

Short key: ~7 characters = 7 bytes
Long URL: average ~200 characters = 200 bytes
Creation timestamp: 8 bytes
Expiration timestamp: 8 bytes
User ID (optional): 8 bytes
Metadata (click count, etc.): ~50 bytes

Total per record: roughly 300 bytes

Storage per year  = 100M records/day × 365 days × 300 bytes
                  = 36.5 billion records × 300 bytes
                  = ~10.95 TB per year

Storage over 5 years  = ~55 TB
Storage over 10 years = ~110 TB

This is entirely manageable. A single modern database cluster can handle hundreds of terabytes. But here's the subtlety: it's not just storage volume that matters — it's the index size. As we accumulate billions of records, our short-key lookup index grows large. If it doesn't fit in RAM, lookups hit disk and latency spikes. This is why caching becomes critical.

🤔 Did you know? The top 20% of short URLs typically account for 80% of redirect traffic — a classic Pareto distribution. This means caching just the hottest 20% of links in memory dramatically reduces database load. You don't need to cache everything; you need to cache the right things.

Bandwidth Estimation

## Quick bandwidth estimate
write_qps = 1160          # writes per second
read_qps = 115740         # reads per second

## Assume each write request is ~500 bytes (long URL + metadata)
## Assume each read response is ~300 bytes (redirect response)

write_bandwidth = write_qps * 500   # bytes per second
read_bandwidth  = read_qps  * 300   # bytes per second

print(f"Write bandwidth: {write_bandwidth / 1_000_000:.2f} MB/s")   # ~0.58 MB/s
print(f"Read bandwidth:  {read_bandwidth  / 1_000_000:.2f} MB/s")   # ~34.72 MB/s
print(f"Total bandwidth: {(write_bandwidth + read_bandwidth) / 1_000_000:.2f} MB/s")  # ~35.3 MB/s

Running this code shows us that read bandwidth dominates at roughly 34 MB/s, while writes are less than 1 MB/s. A single modern network interface handles gigabits per second, so bandwidth alone isn't a bottleneck — but it confirms our read-heavy architecture decision.

Why This Problem Is a Classic Interview Question

System design interviews can cover anything from designing Netflix to building a distributed lock manager. So why does the URL shortener appear so consistently across companies like Google, Meta, Amazon, and Stripe? Because within its deceptively simple surface lives a remarkably complete set of distributed systems challenges.

Let's break down the hidden complexity:

1. Encoding and Uniqueness

How do you generate a short key that is unique, URL-safe, and collision-resistant at billions of records? Should you use a hash function? A counter? A random number generator? Each approach has tradeoffs in collision probability, hotspot risk, and predictability. We'll explore this deeply in Section 2.

2. Database Selection

Should you use a relational database like PostgreSQL or a NoSQL store like Cassandra or DynamoDB? The access pattern is simple (key-value lookup), but the scale is enormous. The choice affects consistency guarantees, replication strategy, and operational complexity.

3. Caching Strategy

With 100,000+ reads per second, you cannot hit your database on every request. You need a cache layer — but which cache eviction policy? How do you handle cache invalidation when a URL expires? What happens during a cache miss storm after a restart?

4. Distributed Systems Concerns

At global scale, you need multiple data centers. How do you ensure a short URL created in Europe is immediately resolvable from Asia? How do you handle replication lag? How do you prevent two servers from generating the same short key simultaneously?

5. HTTP Semantics

Should a redirect return HTTP 301 (Moved Permanently) or 302 (Moved Temporarily)? This choice has real consequences for browser caching behavior and analytics accuracy — and getting it wrong is a common interview mistake.

## Illustrating the 301 vs 302 decision in a Flask-style pseudocode
from flask import redirect, abort

def handle_redirect(short_code: str, db, analytics_enabled: bool):
    """
    301 = browser caches the redirect permanently
          -> fewer server hits, but analytics stop working after first visit
    
    302 = browser always checks back with the server
          -> every click is tracked, but more server load
    """
    original_url = db.get(short_code)
    
    if original_url is None:
        abort(404)  # Short code doesn't exist
    
    if analytics_enabled:
        # Must use 302 - we need every request to hit our server
        # so we can record the click event
        return redirect(original_url, code=302)
    else:
        # 301 is fine - browser can cache this, reduces server load
        return redirect(original_url, code=301)

This code snippet illustrates a nuanced design decision that separates strong candidates from average ones. The "correct" answer depends entirely on your requirements — and knowing why is what matters.

💡 Pro Tip: In interviews, resist the urge to give "the right answer" immediately. When your interviewer asks "what HTTP status code should you use for redirects?", say: "It depends — if we need analytics, we want 302. If we want to optimize for performance and don't need per-click tracking, 301 is better. What's our priority here?" Structured reasoning beats a memorized answer every time.

The Beautiful Simplicity That Hides Real Depth

Here's what makes URL shorteners the perfect interview problem: you can scope them to any experience level. A junior candidate can describe a single-server solution with a basic hash function. A mid-level engineer can discuss caching and basic sharding. A senior engineer can explore consistent hashing, multi-region replication, and graceful degradation under failure.

🧠 Mnemonic: Remember SAUCE to recall the five pillars of URL shortener complexity:

Scale (billions of records, millions of reads/second)
Availability (99.99% uptime, no single point of failure)
Uniqueness (collision-free key generation at scale)
Caching (hot URLs must be served from memory)
Encoding (turning long URLs into short, safe keys)

What the Journey Ahead Looks Like

Now that you understand why this problem matters and have a mental model of the scale involved, here's what we'll build across the remaining sections:

Section	What You'll Learn
🔧 Section 2: Encoding Strategies	How to generate short, unique, collision-free keys
🏗️ Section 3: System Architecture	API design, database choice, full component diagram
⚡ Section 4: Scaling Techniques	Caching layers, database sharding, replication
⚠️ Section 5: Interview Pitfalls	The mistakes that cost candidates offers
📋 Section 6: Cheat Sheet	Everything distilled into a quick-reference summary

By the end of this lesson, you won't just know how to answer a URL shortener question — you'll understand it deeply enough to adapt your answer to whatever constraints and curveballs your interviewer throws at you.

⚠️ Common Mistake: Candidates often jump straight into drawing architecture diagrams before establishing requirements. This is Mistake 1. Before you touch a whiteboard, spend 3-5 minutes asking clarifying questions and writing down your assumptions. An interviewer who watches you think through requirements carefully is already impressed — before you've drawn a single box.

📋 Quick Reference Card: Back-of-Envelope Numbers to Memorize

Metric	Value
📊 Daily writes	100 million URLs/day
📊 Daily reads	10 billion redirects/day
⚡ Write QPS	~1,160/second
⚡ Read QPS	~115,740/second
💾 Storage per year	~11 TB
💾 Storage over 10 years	~110 TB
🌐 Read-to-write ratio	100:1

The URL shortener is a microcosm of distributed systems thinking. Master it, and you've built intuition that applies to dozens of other design problems — from feature flag services to API gateways to any key-value lookup at scale. Let's go deeper.

Encoding Strategies: Turning Long URLs Into Short Keys

Before we can build a URL shortener, we need to answer one deceptively simple question: how do we turn a long URL into a short, unique key? This is the beating heart of the entire system. Get it wrong and you end up with collisions (two URLs mapping to the same short code), security vulnerabilities, or a system that breaks down at scale. Get it right and everything else falls cleanly into place.

Let's build up your intuition from the ground up, starting with the alphabet we'll use.

The Encoding Alphabet: Why Base62 Wins

When we talk about a "short code" like tinyurl.com/ab3Xk9, the characters ab3Xk9 aren't random noise — they're a number represented in a particular base (number system). Just as Base10 uses digits 0–9 (10 symbols) and hexadecimal is Base16 (0–9 plus a–f), we can choose our own alphabet.

Base62 uses exactly 62 characters: lowercase letters a-z (26), uppercase letters A-Z (26), and digits 0-9 (10). This is the gold standard for URL shorteners.

Base64 is a well-known alternative that adds two more characters — traditionally + and /. That gives you a slightly denser encoding, but here's the catch:

Base64 characters: A-Z, a-z, 0-9, +, /
                                      ↑  ↑
                          These two are URL-unsafe!

The + sign is interpreted as a space in URLs, and / is a path separator. Using them in a short code would require percent-encoding (%2B, %2F), which defeats the entire purpose of having a short URL. Base62 is URL-safe by design — every character in its alphabet is valid in a URL path with no escaping needed.

🎯 Key Principle: The density vs. safety tradeoff favors Base62. With 62 symbols and a 6-character code, you get 62⁶ = 56.8 billion unique combinations. That's more than enough for any realistic workload, and every code is clean in a browser address bar.

📋 Quick Reference Card:

	🔢 Base	🔤 Alphabet	⚠️ URL-Safe?	📦 6-char capacity
🔒 Base10	10	0–9	✅ Yes	1 million
🔧 Base16	16	0–9, a–f	✅ Yes	~16.7 million
🎯 Base62	62	0–9, a–z, A–Z	✅ Yes	~56.8 billion
❌ Base64	64	0–9, a–z, A–Z, +, /	❌ No	~68.7 billion

🧠 Mnemonic: Think "62 = safe six" — 6-character Base62 codes are both safe for URLs and sufficient for billions of links.

Implementing Base62 Encode/Decode

Let's make this concrete with a Python implementation. Understanding the math here is crucial — every encoding approach in this section ultimately relies on this same function.

## Base62 encoder/decoder
## Alphabet: digits first, then lowercase, then uppercase
## This ordering is a convention — what matters is consistency
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE = len(ALPHABET)  # 62

def encode(num: int) -> str:
    """
    Convert a non-negative integer into a Base62 string.
    Example: encode(125) -> '2z'  (1*62 + 51 = 125, '2'=2, 'z'=35... wait, let's verify)
    Example: encode(0)   -> '0'
    Example: encode(62)  -> '10'  (like how 10 in base10 means one 10 + zero ones)
    """
    if num == 0:
        return ALPHABET[0]  # Edge case: 0 maps to '0'
    
    result = []
    while num > 0:
        remainder = num % BASE       # Extract the least-significant "digit"
        result.append(ALPHABET[remainder])
        num //= BASE                 # Shift right by one Base62 "digit"
    
    return ''.join(reversed(result))  # Reverse because we built it least-significant first

def decode(short_key: str) -> int:
    """
    Convert a Base62 string back to an integer.
    Example: decode('10') -> 62
    """
    result = 0
    for char in short_key:
        result = result * BASE + ALPHABET.index(char)
    return result

## --- Verification ---
print(encode(0))        # '0'
print(encode(61))       # 'Z'  (last character in alphabet)
print(encode(62))       # '10' (first two-character code)
print(encode(3844))     # '100' (62^2 = 3844)
print(decode('10'))     # 62
print(decode('Z'))      # 61

The key insight here is that encode and decode are perfect inverses of each other. Every integer maps to exactly one Base62 string, and every Base62 string maps back to exactly one integer. This mathematical bijection is what makes the auto-increment approach (discussed shortly) so elegant.

💡 Pro Tip: Some teams sort their alphabet as digits → lowercase → uppercase (as above), while others use a different order. The specific order doesn't matter for correctness, but it does affect the appearance of generated codes. Shuffling the alphabet is also a lightweight way to make sequential IDs look non-sequential to users.

Approach 1: Hashing the URL

The most intuitive approach is to take the long URL, run it through a cryptographic hash function like MD5 or SHA-256, and use the first few characters of the resulting hex string as the short code.

Long URL: "https://www.example.com/some/very/long/path?query=value"
    │
    ▼
MD5 Hash: "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6"
    │
    ▼
Take first 6 chars: "a1b2c3"
    │
    ▼
Short code: tinyurl.com/a1b2c3

This is appealing because it's deterministic — the same URL always hashes to the same code, which means you naturally avoid storing duplicate entries for the same long URL. If two users shorten the same link, they get the same short code without any extra lookup logic.

The Collision Problem

Here's the critical weakness: when you take only the first 6 characters of a hash, you're reducing the output from billions of bits of entropy down to a tiny window. Two different URLs might produce the same 6-character prefix — this is a hash collision.

The probability is low but non-zero, and at billions of URLs, it becomes a real operational concern. You need a collision handling strategy:

Strategy A — Append a salt and retry:

import hashlib

def generate_short_code_with_hash(long_url: str, attempt: int = 0) -> str:
    """
    Hash the URL (with a salt for retry attempts) and take first 6 chars.
    On collision, increment attempt and try again.
    """
    salted = long_url + str(attempt)  # Salt changes output on retry
    hash_hex = hashlib.md5(salted.encode()).hexdigest()  # 32 hex chars
    
    # Take first 6 hex characters (Base16, not Base62 — a simplification here)
    # In production you'd convert the hash integer to Base62 for URL safety
    return hash_hex[:6]

def store_url(long_url: str, db: dict) -> str:
    attempt = 0
    while True:
        code = generate_short_code_with_hash(long_url, attempt)
        if code not in db:           # No collision — store it
            db[code] = long_url
            return code
        elif db[code] == long_url:   # Same URL already stored — return existing code
            return code
        else:                        # True collision — different URL has this code
            attempt += 1            # Retry with a salt

⚠️ Common Mistake: Mistake 1 — Forgetting the duplicate URL case. When checking for a collision, you must distinguish between "this code is taken by a different URL" (true collision, retry) and "this code is taken by the same URL" (idempotent insert, just return the existing code). Many candidates conflate these two cases in interviews. ⚠️

Strategy B — Bloom filter pre-check: Before writing to the database, use an in-memory Bloom filter to probabilistically check if a code already exists. This avoids a database round-trip on the happy path.

🤔 Did you know? MD5 produces 128 bits of output. If you convert the full hash to Base62, you get about 22 characters — far more entropy than the 6 you need. The collision risk comes entirely from truncation, not from MD5 itself being weak for this purpose.

When to use hashing: Hashing shines when you want URL deduplication out of the box (same URL → same code) and you don't need a globally coordinated counter. It's a reasonable choice for small-to-medium scale systems where the collision retry overhead is acceptable.

Approach 2: Auto-Increment ID + Base62

This is arguably the cleanest approach and the one most senior engineers reach for. The idea is simple: maintain a global counter that increments by 1 for every new URL. Convert that integer ID to Base62 to get the short code.

URL #1:         ID=1    → encode(1)    = '1'      → tinyurl.com/1
URL #2:         ID=2    → encode(2)    = '2'      → tinyurl.com/2
...
URL #62:        ID=62   → encode(62)   = '10'     → tinyurl.com/10
URL #3844:      ID=3844 → encode(3844) = '100'    → tinyurl.com/100
URL #56 billion: max 6-char code → tinyurl.com/ZZZZZZ

The mapping is perfect, deterministic, and collision-free — because each ID is unique by definition, each Base62 code is unique.

          Database Auto-Increment
                    │
                    ▼
         INSERT url → gets ID=10000042
                    │
                    ▼
            encode(10000042)
                    │
                    ▼
            short_code = 'FXqk'
                    │
                    ▼
         Store mapping: 'FXqk' ↔ 'https://...'

🎯 Key Principle: Auto-increment + Base62 gives you zero collision probability with no retry logic. Every new row in your database gets a unique ID, and every unique ID produces a unique code.

The Trade-offs

🔧 Predictability concern: Because IDs are sequential, short codes are sequential too. An attacker can enumerate all your short URLs by incrementing the code ('1', '2', '3'...). If your use case involves private or sensitive URLs, this is a serious issue. You can mitigate it by shuffling the Base62 alphabet or XOR-ing the ID with a secret before encoding — but this adds complexity.

🔧 Single point of failure: A single database counter is a bottleneck and a SPOF. In a distributed system you need strategies like:

Using a distributed ID generator (e.g., Twitter's Snowflake)
Pre-allocating ranges to application servers (server A gets IDs 1–1000, server B gets 1001–2000, etc.)

⚠️ Common Mistake: Mistake 2 — Using a database AUTO_INCREMENT primary key directly and assuming it's safe to expose. Sequential IDs leak information about your system ("how many URLs have been shortened today?"). Always consider whether ID predictability is acceptable for your use case. ⚠️

Approach 3: Pre-Generated Keys via a Key Generation Service (KGS)

The third approach decouples key generation from key assignment entirely. A dedicated Key Generation Service (KGS) pre-generates millions of random Base62 strings and stores them in a database — before any URL is ever shortened.

When an application server needs to shorten a URL, it asks the KGS for a key. The KGS picks one from its pool of unused keys, marks it as used, and hands it over. The application server maps that key to the long URL.

  ┌─────────────────────────────────────────────────────┐
  │              Key Generation Service                  │
  │                                                      │
  │  ┌─────────────────┐     ┌──────────────────────┐   │
  │  │  Unused Keys DB │────▶│  In-Memory Key Cache │   │
  │  │  (billions of   │     │  (e.g., 1000 keys    │   │
  │  │   pre-gen keys) │     │   loaded at startup) │   │
  │  └─────────────────┘     └──────────┬───────────┘   │
  │                                     │               │
  └─────────────────────────────────────┼───────────────┘
                                        │  give me a key
                             ┌──────────▼───────────┐
                             │   Application Server  │
                             │  (URL Shortener API)  │
                             └──────────────────────┘

Concurrency: The Critical Challenge

The hardest part of KGS is ensuring no two servers get the same key. This is a classic concurrency problem. If two application servers request a key simultaneously, a naive implementation might give them both the same key from the pool.

Solutions include:

Locking at the KGS level: The KGS marks a key as "in-use" atomically before returning it (using database transactions or Redis atomic operations like GETDEL).
Server-side caching with pre-allocation: Each application server loads a batch of keys (say, 1000 at a time) into its own memory. This eliminates per-request round trips to the KGS, and since each batch is exclusively assigned to one server, there's no sharing risk within a batch.

💡 Real-World Example: This pattern is used in distributed ticket/reservation systems. Airlines pre-generate seat codes; each booking agent takes a batch and assigns from their local pool. The same principle applies here.

Pros and Cons at a Glance

	✅ Pros	❌ Cons
🎯 Speed	No computation at request time — just fetch a key	Requires a running KGS service
🔒 Uniqueness	Guaranteed if implemented correctly	Complex concurrency logic
🔧 Flexibility	Keys can be truly random (not sequential)	KGS is a new SPOF (needs replication)
📦 Preloading	Servers can cache batches for sub-millisecond assignment	Key loss risk if server crashes before using cached keys

⚠️ Common Mistake: Mistake 3 — Ignoring the key-loss problem. If an application server caches 1000 keys locally and then crashes, those 1000 keys are gone forever (they're marked used in the DB but were never assigned to a URL). This is usually acceptable — wasting a few thousand keys out of billions is fine — but you must acknowledge this trade-off in an interview. ⚠️

🤔 Did you know? The KGS approach is essentially a pre-computed lookup table strategy. It trades storage space (keeping a database of billions of keys) for operational simplicity (no collision detection, no hash computation, no counter coordination at request time).

Comparing All Three Approaches

Now that you understand each approach in depth, here's how to think about which one to reach for:

Decision Tree for Encoding Strategy:

  Do you need URL deduplication (same URL → same code)?
  │
  ├─ YES → Hashing Approach
  │         (accept collision handling complexity)
  │
  └─ NO
      │
      ├─ Is simplicity the top priority?
      │   └─ YES → Auto-Increment + Base62
      │             (watch out for predictability)
      │
      └─ Do you need random-looking keys at scale?
          └─ YES → Key Generation Service (KGS)
                    (invest in concurrency-safe implementation)

📋 Quick Reference Card:

	🔢 Approach	⚡ Speed	🔒 Collision Risk	🔄 Deduplication	⚠️ Main Challenge
🧠 Hashing	MD5/SHA256 + truncate	Fast	Low but non-zero	✅ Native	Collision retries
🔧 Auto-Increment	Counter + Base62	Very fast	Zero	❌ Extra lookup	Predictable, distributed counter
🎯 KGS	Pre-generated pool	Instant	Zero	❌ Extra lookup	Concurrency, KGS as SPOF

💡 Pro Tip: In a system design interview, the auto-increment + Base62 approach is usually the best starting point. It's easy to explain, mathematically clean, and free of collision risk. You can then layer on complexity (distributed ID generation, alphabet shuffling) as the interviewer pushes on scale and security concerns. Starting with KGS often bogs down discussions in infrastructure details before you've established the core design.

🧠 Mnemonic: "Hash dedupes, Counter is clean, KGS pre-screens" — one phrase to remember what each approach is best at.

❌ Wrong thinking: "I should pick the most sophisticated approach (KGS) to impress the interviewer."

✅ Correct thinking: "I should pick the approach that best fits the stated requirements, explain its trade-offs clearly, and show I can reason about alternatives."

Putting It All Together: A Mini Code Walkthrough

Let's see how the auto-increment approach would look end-to-end in a minimal Python sketch:

## Minimal end-to-end URL shortener logic using Auto-Increment + Base62

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE = 62

def encode(num: int) -> str:
    if num == 0:
        return ALPHABET[0]
    result = []
    while num > 0:
        result.append(ALPHABET[num % BASE])
        num //= BASE
    return ''.join(reversed(result))

def decode(short_key: str) -> int:
    result = 0
    for char in short_key:
        result = result * BASE + ALPHABET.index(char)
    return result

## Simulated database: maps short_code -> long_url and id -> short_code
url_store = {}      # short_code → long_url
counter = 0         # In production: this lives in a DB or distributed counter

def shorten(long_url: str) -> str:
    global counter
    counter += 1
    short_code = encode(counter)          # Convert integer ID to Base62 string
    url_store[short_code] = long_url      # Store the mapping
    return f"https://tiny.url/{short_code}"

def expand(short_code: str) -> str:
    return url_store.get(short_code, "URL not found")

## --- Demo ---
print(shorten("https://www.example.com/a/very/long/path"))  # https://tiny.url/1
print(shorten("https://openai.com/research/gpt4"))          # https://tiny.url/2
print(expand("1"))  # https://www.example.com/a/very/long/path
print(expand("2"))  # https://openai.com/research/gpt4
print(encode(1000000))  # 'ezKl' — what the millionth URL's code looks like

This code is intentionally simplified — it uses a global variable for the counter and a dict for the store — but it captures the complete logical flow that a real distributed implementation would replicate at scale. The shorten function is one database write; the expand function is one database read. The simplicity is the point.

With a solid grasp of encoding strategies, you're now equipped to answer one of the most probing questions in any URL shortener interview: "How do you generate the short codes?" You can walk through the trade-offs of hashing vs. counters vs. pre-generation, implement the Base62 math from scratch, and reason about collision risks and concurrency issues. In the next section, we'll take these building blocks and assemble them into a complete system architecture — adding the API layer, database selection, and the full data flow from a user's click to a redirect response.

System Architecture: Components and Data Flow

With a solid understanding of how short keys are generated, we can now zoom out and design the full system that brings a URL shortener to life. This is where the interview gets real: your interviewer wants to see not just that you can generate a short key, but that you can build a coherent, production-grade system around it. We'll walk through every layer, from the API surface to the database, and show how they connect into a cohesive whole.

REST API Design: The Contract Between Client and Server

Every URL shortener exposes two fundamental operations: creating a short URL and resolving it. In REST terms, these map cleanly to two endpoints, and getting their design right is the first thing an interviewer evaluates.

POST /shorten — Creating a Short URL

The write endpoint accepts a long URL and returns a shortened version. Here's what the request and response should look like:

// POST /shorten
// Request body
{
  "originalUrl": "https://www.example.com/very/long/path?with=query&params=true",
  "customAlias": "my-link",      // optional
  "expiresAt": "2025-12-31T23:59:59Z",  // optional ISO 8601 timestamp
  "userId": "usr_abc123"          // optional, if authenticated
}

// 201 Created — Response body
{
  "shortUrl": "https://tinyurl.com/abc123",
  "shortKey": "abc123",
  "originalUrl": "https://www.example.com/very/long/path?with=query&params=true",
  "createdAt": "2024-06-15T10:30:00Z",
  "expiresAt": "2025-12-31T23:59:59Z"
}

Notice the response uses HTTP 201 Created rather than 200 OK. This is a subtle but correct signal that a new resource was created. If the same originalUrl already exists in the system (and you decide to deduplicate), you'd return 200 OK with the existing short URL instead.

⚠️ Common Mistake: Many candidates return HTTP 200 for all successful POST responses. Using 201 for resource creation is a small detail that signals API design maturity to your interviewer.

GET /

The read endpoint is where the magic happens. A client hits GET /abc123, and the server needs to find the original URL and redirect the client there. This raises an important design decision: 301 vs. 302 redirect.

A 301 Moved Permanently tells the browser (and CDNs) to cache the redirect forever. Subsequent visits skip your server entirely — the browser goes straight to the destination. This is great for reducing server load but terrible if you ever need to update or track clicks, since cached redirects bypass your analytics.
A 302 Found (temporary redirect) means every click routes through your server. You pay in latency and infrastructure cost, but you gain the ability to track every visit, update destinations, and expire links gracefully.

🎯 Key Principle: Use 302 redirects in any system where analytics, link updates, or expiration are requirements. Use 301 only for truly permanent, never-changing links where you're optimizing for raw performance.

For TinyURL-style systems, 302 is almost always the right answer in an interview, because interviewers expect you to support click analytics.

## Example Flask route showing redirect logic
from flask import Flask, redirect, abort
from datetime import datetime, timezone

app = Flask(__name__)

@app.route('/<short_key>', methods=['GET'])
def redirect_to_original(short_key):
    # 1. Check in-memory cache first (Redis)
    original_url = cache.get(short_key)

    if not original_url:
        # 2. Fall back to database lookup
        record = db.query(
            "SELECT original_url, expires_at FROM urls WHERE short_key = %s",
            (short_key,)
        )
        if not record:
            abort(404)  # Return 404 if key doesn't exist

        # 3. Check expiration
        if record['expires_at'] and record['expires_at'] < datetime.now(timezone.utc):
            abort(410)  # 410 Gone — link existed but is expired

        original_url = record['original_url']
        cache.set(short_key, original_url, ex=3600)  # Cache for 1 hour

    # 4. Log the click asynchronously (don't block the redirect)
    analytics_queue.push({'key': short_key, 'timestamp': datetime.now(timezone.utc)})

    # 5. Issue 302 redirect
    return redirect(original_url, code=302)

This code demonstrates several important patterns: cache-first lookup, expiration enforcement, asynchronous analytics logging (so it doesn't add latency to the redirect), and the correct use of 410 Gone for expired links rather than 404.

🤔 Did you know? HTTP 410 Gone is semantically more accurate than 404 Not Found for expired short links — it signals the resource existed but is no longer available. Most production URL shorteners use 404 for simplicity, but 410 is the more precise choice.

Database Design: Schema and Storage Decisions

The database is the heart of the system. Let's design the schema first, then reason about which database technology fits best.

The Core URL Table

The primary table is simple by design — URL shorteners are fundamentally a key-value mapping with some metadata:

-- Core URLs table
CREATE TABLE urls (
    short_key     VARCHAR(10)   PRIMARY KEY,  -- e.g., "abc123"
    original_url  TEXT          NOT NULL,      -- the full destination URL
    created_at    TIMESTAMP     NOT NULL DEFAULT NOW(),
    expires_at    TIMESTAMP     NULL,          -- NULL means never expires
    user_id       VARCHAR(50)   NULL,          -- NULL for anonymous links
    click_count   BIGINT        NOT NULL DEFAULT 0,
    is_active     BOOLEAN       NOT NULL DEFAULT TRUE
);

-- Index on user_id for dashboard queries ("show my links")
CREATE INDEX idx_urls_user_id ON urls(user_id);

-- Index on expires_at for cleanup jobs
CREATE INDEX idx_urls_expires_at ON urls(expires_at) WHERE expires_at IS NOT NULL;

Each field has a deliberate purpose:

🔑 short_key is the primary key — this is your most common lookup pattern
📎 original_url uses TEXT not VARCHAR(255) because some URLs are genuinely very long
🕐 expires_at as nullable means you don't waste storage on a sentinel value like a far-future date
👤 user_id enables user dashboards showing all their links
📊 click_count enables basic analytics (though for high-traffic systems, you'd offload this to a separate analytics service)

💡 Pro Tip: Don't use click_count as a frequently-updated column on your main urls table at scale — every redirect would trigger a write lock on that row. Instead, log click events to a separate table or stream, and compute counts asynchronously.

SQL vs. NoSQL: Choosing the Right Storage Engine

This is a decision point that interviewers love to probe. Let's think through it honestly.

A URL shortener has a very specific access pattern: almost exclusively point lookups by primary key (WHERE short_key = 'abc123'). There are almost no complex joins, no multi-table transactions, and no need for flexible querying. This is where NoSQL key-value stores shine.

Dimension	PostgreSQL / MySQL	DynamoDB / Redis
🔍 Point lookup speed	Fast (indexed PK)	Very fast (hash-based)
📈 Horizontal scaling	Hard (sharding complexity)	Native, seamless
🔒 ACID transactions	Yes	Limited (DynamoDB) / None (Redis)
🗂️ Complex queries	Excellent	Poor
💰 Cost at scale	Higher infra cost	Pay-per-request model
🔧 Operational complexity	Moderate	Low (managed services)

DynamoDB is an excellent fit for the primary URL store: it scales horizontally with no operational overhead, handles millions of reads per second natively, and its partition key model maps perfectly to short_key lookups. Redis works brilliantly as a caching layer on top — its in-memory hash structure makes key lookups sub-millisecond.

❌ Wrong thinking: "I'll just use MySQL because I know it." ✅ Correct thinking: "My primary access pattern is point lookups at massive scale. A key-value store like DynamoDB aligns naturally with that pattern and eliminates horizontal scaling complexity."

⚠️ Common Mistake: Candidates often over-engineer by proposing a relational database with complex sharding schemes when a managed NoSQL service would solve the same problem with far less operational complexity. Choose boring, appropriate technology.

Handling Custom Aliases

Custom aliases allow users to request a specific short key (e.g., tinyurl.com/my-brand) instead of a randomly generated one. They add UX value but introduce several design challenges you must address.

Uniqueness Enforcement

Custom aliases must be unique across all URLs. The simplest enforcement mechanism is using the alias as the primary key (or a unique index) in your database — any collision will throw a constraint violation, which you catch and return as a 409 Conflict HTTP response.

Reserved Keywords Blacklist

Certain paths must never be allocated to users because they conflict with your own system routes. You need a reserved keywords blacklist maintained either in-code or in a small config table:

## Reserved keywords that can never be used as custom aliases
RESERVED_KEYWORDS = frozenset([
    # API routes
    'api', 'v1', 'v2', 'shorten', 'health', 'metrics', 'admin',
    # Common web paths
    'about', 'login', 'signup', 'logout', 'dashboard', 'settings',
    # Potentially harmful or confusing
    'null', 'undefined', 'true', 'false', 'favicon.ico', 'robots.txt',
    # Offensive terms would be loaded from a separate moderation list
])

def validate_custom_alias(alias: str) -> tuple[bool, str]:
    """Validate a custom alias. Returns (is_valid, error_message)."""

    # Length constraints: 3-50 characters
    if len(alias) < 3:
        return False, "Alias must be at least 3 characters"
    if len(alias) > 50:
        return False, "Alias must be 50 characters or fewer"

    # Character whitelist: alphanumeric plus hyphens and underscores only
    import re
    if not re.match(r'^[a-zA-Z0-9_-]+$', alias):
        return False, "Alias may only contain letters, numbers, hyphens, and underscores"

    # Reserved keyword check (case-insensitive)
    if alias.lower() in RESERVED_KEYWORDS:
        return False, f"'{alias}' is a reserved keyword and cannot be used"

    return True, ""

This validation function enforces three layers of protection: length constraints (3–50 characters is a reasonable range), a character whitelist to prevent path-traversal attacks and encoding ambiguity, and the reserved keywords blacklist.

💡 Real-World Example: Bit.ly discovered early on that users tried to register aliases like api, logout, and even cdn — all of which would have silently broken parts of their platform. A comprehensive reserved list is non-negotiable.

🧠 Mnemonic: Think of custom alias validation as LCR — Length, Characters, Reserved words. Check all three before accepting any custom alias.

System Diagram: The Full Architecture

Now let's connect all the pieces. Here is the end-to-end data flow for both write (shorten) and read (redirect) operations:

┌─────────────────────────────────────────────────────────────────┐
│                        CLIENT (Browser / App)                   │
└─────────────────────────┬───────────────────────────────────────┘
                          │  HTTP Request
                          ▼
┌─────────────────────────────────────────────────────────────────┐
│                     LOAD BALANCER (L7)                          │
│              (AWS ALB / Nginx / Cloudflare)                     │
│    • SSL termination   • Rate limiting   • DDoS protection      │
└───────────┬────────────────────────────────────────┬────────────┘
            │                                        │
            ▼                                        ▼
┌───────────────────────┐              ┌───────────────────────────┐
│   APP SERVER #1       │              │   APP SERVER #2           │
│   (Stateless)         │    ...       │   (Stateless)             │
│                       │              │                           │
│ POST /shorten         │              │ GET /{shortKey}           │
│ • Validate input      │              │ • Cache lookup first      │
│ • Generate short key  │              │ • DB fallback             │
│ • Write to DB         │              │ • Log click event         │
│ • Invalidate cache    │              │ • Issue 302 redirect      │
└───────────┬───────────┘              └───────────────┬───────────┘
            │                                          │
            │         ┌──────────────────┐            │
            │         │   CACHE LAYER    │◄───────────┘
            │         │     (Redis)      │
            │         │                  │
            │         │ short_key → URL  │
            │         │ TTL: 1–24 hours  │
            │         │ Hit rate: ~90%+  │
            │         └────────┬─────────┘
            │                  │ Cache miss
            │                  ▼
            │    ┌─────────────────────────────┐
            └───►│       PRIMARY DATABASE      │
                 │   (DynamoDB / PostgreSQL)   │
                 │                             │
                 │  urls table:                │
                 │  • short_key (PK)           │
                 │  • original_url             │
                 │  • created_at               │
                 │  • expires_at               │
                 │  • user_id                  │
                 └─────────────────────────────┘

── ASYNC ANALYTICS PIPELINE ──────────────────────────────────────

  App Servers ──► Message Queue ──► Analytics Workers ──► ClickDB
                  (Kafka/SQS)        (aggregate clicks)   (ClickHouse)

Let's walk through each layer in plain language.

Layer 1: Load Balancer

All traffic enters through a Layer 7 load balancer — AWS Application Load Balancer (ALB) or Nginx are typical choices. This layer handles SSL termination (so your app servers receive plain HTTP), distributes requests across app server instances using round-robin or least-connections routing, and can enforce rate limits to protect against abuse. In a production system, you'd front this with Cloudflare or a CDN for DDoS protection and geographic routing.

Layer 2: App Servers (Stateless)

The app servers are stateless — they hold no local state and can be added or removed freely. Statelessness is a critical design property: it means your load balancer can route any request to any server without session affinity. All state lives in the cache and database.

For write operations (POST /shorten), the app server validates input, generates a short key (using the algorithms discussed in Section 2), writes the mapping to the database, and optionally pre-warms the cache with the new entry.

For read operations (GET /{shortKey}), the app server follows a cache-first strategy: check Redis, and only fall back to the database on a cache miss. This is what makes redirects fast at scale.

Layer 3: Cache (Redis)

Redis sits between the app servers and the database, acting as a high-speed in-memory lookup table. Since redirect traffic dwarfs write traffic by orders of magnitude (a popular link might be clicked millions of times), the cache hit rate for popular links approaches 99%.

Cache entries are stored as simple key-value pairs: short_key → original_url. TTLs (time-to-live values) of 1–24 hours balance freshness with load reduction. When a link is deleted or updated, the cache entry must be invalidated immediately — otherwise users hit stale data.

💡 Pro Tip: For expired links, cache the "this key is expired" state too — use a sentinel value like an empty string. This prevents cache penetration, where every request for an expired key hammers the database because the cache always misses.

Layer 4: Database

The database is the source of truth. It's only hit on cache misses, which should be infrequent for popular links. For write operations, it's always hit to persist the new mapping. Replication (covered in Section 4) ensures the database can scale reads independently of writes.

Layer 5: Async Analytics Pipeline

Click tracking is deliberately decoupled from the redirect path. When a redirect happens, the app server pushes a lightweight event (short key + timestamp + user-agent) to a message queue like Kafka or AWS SQS. Separate analytics workers consume these events asynchronously, aggregating click counts, geographic data, and referrer information into an analytics database like ClickHouse.

This decoupling is critical: analytics processing should never add latency to the user's redirect experience. If the analytics queue backs up, users are unaffected.

🎯 Key Principle: The redirect path must be read-optimized and latency-minimal. Every synchronous operation you add to GET /{shortKey} directly affects the user experience for every single click. Defer everything that isn't strictly necessary to async pipelines.

Putting It All Together: A Request's Journey

Let's trace a single redirect request through the system to cement the architecture:

User clicks https://tinyurl.com/abc123 in their browser
DNS resolves to Cloudflare/CDN edge, which forwards to the load balancer
Load balancer terminates SSL, applies rate limiting, routes to App Server #2
App Server extracts abc123, checks Redis → cache hit → retrieves https://www.example.com/long-path
App Server pushes {key: "abc123", ts: now(), ua: "Chrome/120"} to Kafka queue (non-blocking)
App Server returns HTTP 302 Location: https://www.example.com/long-path
Browser immediately requests the destination URL
Total latency from step 1 to step 7: < 10ms on a cache hit

On a cache miss, steps 4–5 include a DynamoDB lookup, adding roughly 5–15ms. The cache miss path is still fast, but the goal is to keep hit rates above 90% so the median user experience is always sub-10ms.

📋 Quick Reference Card: API Design Summary

🔑 Endpoint	🔧 Method	📊 Success Code	⚠️ Error Codes
🔗 /shorten	POST	201 Created	400 Bad Request, 409 Conflict
🔀 /	GET	302 Found	404 Not Found, 410 Gone
📋 /links (user's links)	GET	200 OK	401 Unauthorized
🗑️ /links/	DELETE	204 No Content	403 Forbidden, 404 Not Found

This architecture balances simplicity with production readiness. The stateless app servers enable horizontal scaling, Redis keeps redirect latency minimal, and the async analytics pipeline ensures observability without compromising performance. In the next section, we'll explore what happens when traffic grows to millions of requests per second — and how caching, sharding, and replication keep the system healthy under that load.

Scaling for High Traffic: Caching, Sharding, and Replication

A URL shortener that works beautifully for a hundred users will collapse under the weight of millions. The gap between "it works" and "it scales" is where real system design lives — and where interviews separate strong candidates from great ones. In this section, we move from architecture on paper to architecture under pressure, examining the concrete techniques that allow a URL shortener to serve millions of redirects per second without breaking a sweat.

Understanding the Workload: Reads Dominate Everything

Before reaching for solutions, you need to deeply understand the problem's nature. A URL shortener has a profoundly asymmetric workload: writes (creating short URLs) happen relatively rarely, while reads (redirecting users) happen constantly and at massive scale. A viral tweet with a shortened link might generate 500,000 redirects in the first hour after posting — all resolving the same shortKey to the same long URL.

🎯 Key Principle: When reads outnumber writes by 100:1 or more, your architecture should be optimized ruthlessly for read performance. Every design decision — caching, replication, CDN — flows from this single insight.

Consider a realistic estimate: if a service handles 1 billion redirects per day, that's roughly 11,600 requests per second on average, with peaks potentially 5–10x higher. No single database, no matter how powerful, can handle that query load while also serving low-latency responses to users around the globe. The solution is a layered approach: cache what you can, distribute what you must, and replicate for resilience.

🤔 Did you know? Bitly reported handling over 600 million redirects per month at its peak. At that scale, even saving 1 millisecond per redirect saves over 600,000 seconds of cumulative user wait time every month.

The Cache Layer: Your First Line of Defense

Caching is the single most impactful optimization for a read-heavy system. The idea is deceptively simple: store the result of expensive lookups in fast memory so subsequent requests never touch the database. For URL shorteners, this means storing the mapping shortKey → longURL in an in-memory store like Redis or Memcached.

Choosing Between Redis and Memcached

Both Redis and Memcached are excellent choices, but they differ in meaningful ways:

Feature	Redis	Memcached
🔧 Data structures	Rich (strings, hashes, sorted sets)	Simple key-value only
💾 Persistence	Optional disk persistence	Memory-only
🔒 Replication	Built-in primary-replica	Not built-in
⚡ Performance	Slightly slower due to features	Marginally faster for simple ops

For a URL shortener, Redis is the stronger choice because it supports replication (useful when the cache itself needs to scale), optional persistence (warm cache restarts), and TTL (time-to-live) on individual keys — a feature we'll use heavily.

The Cache-Aside Pattern

The dominant caching strategy for URL shorteners is the cache-aside pattern (also called lazy loading). The application code is responsible for managing the cache:

Check the cache for the shortKey
If found (cache hit), return the longURL immediately
If not found (cache miss), query the database, store the result in cache, then return it

┌──────────┐      1. GET shortKey      ┌─────────────┐
│  Client  │ ─────────────────────────▶│  App Server │
└──────────┘                           └──────┬──────┘
                                              │
                                    2. Cache lookup
                                              │
                                       ┌──────▼──────┐
                                       │    Redis    │
                                       │   Cache     │
                                       └──────┬──────┘
                                              │
                              ┌───────────────┴──────────────┐
                              │                              │
                         HIT ▼                         MISS ▼
                    Return longURL              3. DB Query
                    immediately               ┌────────────┐
                                              │  Database  │
                                              └─────┬──────┘
                                                    │
                                         4. Store in cache
                                            Return longURL

Here's what that looks like in practice with Python and Redis:

import redis
import psycopg2

## Initialize connections
cache = redis.Redis(host='redis-cluster', port=6379, decode_responses=True)
db = psycopg2.connect("dbname=urlshortener user=app")

CACHE_TTL_SECONDS = 86400  # 24-hour TTL for cached mappings

def resolve_short_url(short_key: str) -> str | None:
    """
    Resolves a short key to a long URL using cache-aside pattern.
    Returns the long URL or None if not found.
    """
    # Step 1: Check the cache first (O(1) lookup, ~1ms latency)
    cached_url = cache.get(f"url:{short_key}")
    if cached_url:
        # Cache HIT — return immediately without touching the database
        return cached_url

    # Step 2: Cache MISS — query the database (~5-20ms latency)
    cursor = db.cursor()
    cursor.execute(
        "SELECT long_url FROM urls WHERE short_key = %s",
        (short_key,)
    )
    row = cursor.fetchone()

    if row is None:
        return None  # Short URL doesn't exist

    long_url = row[0]

    # Step 3: Populate cache for future requests
    # TTL ensures stale data eventually expires
    cache.setex(f"url:{short_key}", CACHE_TTL_SECONDS, long_url)

    return long_url

This code follows the cache-aside pattern precisely. Notice the CACHE_TTL_SECONDS constant — this is your TTL (Time-To-Live) strategy. Setting a 24-hour TTL means cached entries automatically expire, preventing the cache from serving permanently stale data if a URL is ever updated or deleted. For most short URLs, which rarely change, this is a safe and efficient tradeoff.

LRU Eviction: Managing Cache Capacity

Your Redis instance has finite memory. When it fills up, it needs a policy for deciding what to evict. The LRU (Least Recently Used) eviction policy is ideal for URL shorteners: it removes the keys that haven't been accessed recently, keeping the "hot" URLs (the ones getting clicked constantly) in cache.

## Redis configuration for URL shortener cache
## Set maximum memory to 8GB
maxmemory 8gb

## Use LRU eviction — removes least recently accessed keys first
maxmemory-policy allkeys-lru

💡 Pro Tip: In practice, 80% of your traffic will resolve to 20% of your URLs (the classic Pareto principle). A well-tuned LRU cache of even moderate size — say, caching the top 10 million most-accessed URLs — can absorb the vast majority of redirect traffic, with cache hit rates exceeding 90%.

⚠️ Common Mistake: Mistake 1: Setting no TTL on cached entries. Without TTL, if a URL is ever updated (e.g., the destination changes), users will keep receiving the old redirect indefinitely until the cache is manually invalidated. Always set a sensible TTL, even if it's long. ⚠️

Database Sharding: Distributing the Load Horizontally

Even with an aggressive cache, cache misses still happen — and when they do, the database needs to handle them. More importantly, the write path (creating new short URLs) always hits the database. At sufficient scale, no single database node can keep up. The answer is sharding: splitting your data across multiple database nodes, each responsible for a subset of the data.

Range-Based Sharding

Range-based sharding partitions data by ranges of the shard key. For example, you might assign shortKeys starting with a–h to Shard 1, i–p to Shard 2, and q–z to Shard 3.

ShortKey: "abc123"  → starts with 'a' → Shard 1
ShortKey: "mno789"  → starts with 'm' → Shard 2  
ShortKey: "xyz456"  → starts with 'x' → Shard 3

┌─────────────┐   ┌─────────────┐   ┌─────────────┐
│   Shard 1   │   │   Shard 2   │   │   Shard 3   │
│  Keys: a–h  │   │  Keys: i–p  │   │  Keys: q–z  │
└─────────────┘   └─────────────┘   └─────────────┘

Range-based sharding is simple to reason about and makes range queries easy, but it has a serious vulnerability: hotspots. If a viral campaign uses URLs starting with t (like tinyurl.com/t...), Shard 3 gets hammered while Shards 1 and 2 sit idle. This uneven distribution is called a hotspot problem, and it's a classic interview pitfall.

Hash-Based Sharding

Hash-based sharding solves the hotspot problem by applying a hash function to the shortKey and using the result to determine the shard. Because good hash functions distribute values uniformly, traffic is spread evenly across all shards regardless of access patterns.

def get_shard_id(short_key: str, num_shards: int) -> int:
    """
    Determines which database shard holds the given short key.
    Uses consistent hashing to minimize data movement when shards are added.
    """
    import hashlib
    
    # Create a stable hash of the short key
    key_hash = int(hashlib.md5(short_key.encode()).hexdigest(), 16)
    
    # Modulo distributes evenly across available shards
    shard_id = key_hash % num_shards
    return shard_id

## Example usage
NUM_SHARDS = 8
print(get_shard_id("abc123", NUM_SHARDS))  # → e.g., Shard 3
print(get_shard_id("mno789", NUM_SHARDS))  # → e.g., Shard 7
print(get_shard_id("xyz456", NUM_SHARDS))  # → e.g., Shard 1

This function takes any shortKey and deterministically maps it to one of NUM_SHARDS shards. The same key always maps to the same shard, so lookups are efficient — you compute the shard ID before querying, and you know exactly which database node to contact.

⚠️ Common Mistake: Mistake 2: Using simple modulo sharding (shortKey % num_shards) without consistent hashing. If you later need to add a 9th shard, simple modulo remaps almost all keys to different shards — requiring a massive, painful data migration. Consistent hashing minimizes this disruption by only remapping the keys that need to move. ⚠️

🧠 Mnemonic: Think of shards like post office zip codes. Range-based is like assigning zip codes sequentially by neighborhood — easy to understand, but popular neighborhoods get overwhelmed. Hash-based is like randomly shuffling mail to evenly fill all post offices — fairer, but you need the routing table to know where anything is.

📋 Quick Reference Card: Sharding Strategy Comparison

Strategy	🟢 Pros	🔴 Cons	🎯 Best For
🗂️ Range-based	Simple, range queries easy	Hotspot risk	Sequential IDs, time-series data
#️⃣ Hash-based	Even distribution	Range queries harder	Random access patterns like shortKeys
🌐 Geo-based	Low latency for users	Uneven global traffic	Global apps with regional data

Replication for Availability: Never Lose a Redirect

Sharding distributes load horizontally. Replication ensures that if any single node fails, the system keeps running. In a primary-replica setup (also called leader-follower), one node is designated the primary (or leader) and accepts all writes. Changes are then asynchronously propagated to one or more replica nodes (followers), which serve read requests.

                    ┌─────────────┐
   Write Path ──▶   │   Primary   │  (accepts all writes)
                    │   (Leader)  │
                    └──────┬──────┘
                           │  Async replication
              ┌────────────┼────────────┐
              ▼            ▼            ▼
        ┌──────────┐ ┌──────────┐ ┌──────────┐
        │ Replica 1│ │ Replica 2│ │ Replica 3│
        │ (Follower│ │ (Follower│ │ (Follower│
        └──────────┘ └──────────┘ └──────────┘
              ▲            ▲            ▲
              └────────────┴────────────┘
                    Read Path (redirect traffic)

This architecture is perfectly matched to our workload. The primary handles the relatively low volume of new URL creations, while the three replicas collectively absorb the enormous flood of redirect lookups. Load balancers distribute read requests across replicas, so adding a fourth replica linearly increases read capacity.

The Replication Lag Problem

Because replication is asynchronous, there's always a brief window — typically milliseconds to seconds — where a replica doesn't yet have the latest data. This is called replication lag. For a URL shortener, this means:

❌ Wrong thinking: A user creates a short URL and immediately clicks it — the redirect hits a replica that hasn't received the write yet, resulting in a "URL not found" error.

✅ Correct thinking: Route reads to the primary for a brief "consistency window" (e.g., 1–2 seconds) after a write, then fall back to replicas. Alternatively, direct the creator's own requests to the primary using read-your-own-writes consistency via sticky sessions or a primary-preference flag.

💡 Real-World Example: DynamoDB addresses this with its ConsistentRead parameter. When set to true, the request always goes to the primary leader, guaranteeing fresh data. For most redirect lookups, eventual consistency is perfectly acceptable — a 200ms lag before a new URL is universally resolvable is invisible to users.

CDN and Geo-Distribution: Speed at the Edge

Even with a perfect cache and optimally sharded database, there's a physical limit: the speed of light. A user in Sydney querying your origin servers in Virginia will experience at minimum ~150ms of round-trip latency — no amount of code optimization overcomes physics.

Content Delivery Networks (CDNs) solve this by placing edge locations (Points of Presence, or PoPs) around the globe. These edge nodes cache responses close to users, dramatically reducing latency.

     User in        User in         User in
     Tokyo          London          São Paulo
       │               │               │
       ▼               ▼               ▼
  ┌─────────┐    ┌─────────┐    ┌─────────┐
  │CDN Edge │    │CDN Edge │    │CDN Edge │
  │  Tokyo  │    │  London │    │  São    │
  └────┬────┘    └────┬────┘    │  Paulo  │
       │              │         └────┬────┘
       └──────────────┴──────────────┘
                       │
                  Cache miss only
                       │
                  ┌────▼────┐
                  │ Origin  │
                  │Servers  │
                  │(US-East)│
                  └─────────┘

For URL shorteners, CDN caching works as follows: the first time tinyurl.com/abc123 is requested from Tokyo, the edge node has no cached response and forwards the request to the origin. The origin returns a 301 Permanent Redirect or 302 Temporary Redirect with appropriate Cache-Control headers. The CDN edge caches this response, and every subsequent request from Tokyo is served locally in single-digit milliseconds.

Choosing 301 vs. 302 Redirects for CDN Compatibility

This choice has real consequences for caching:

301 (Moved Permanently): Browsers and CDNs cache this aggressively. Future requests skip your service entirely, reducing load — but also eliminating your ability to track clicks or update the destination.
302 (Found / Temporary Redirect): Browsers don't permanently cache this. Every redirect checks your service, preserving analytics and flexibility, but increasing load.

🎯 Key Principle: If analytics matter (and they usually do for a URL shortener), use 302 at the application layer, but configure your CDN to cache the response at the edge with a controlled TTL. This gives you edge performance without losing click-tracking visibility.

Geo-Distributed Databases

For the most demanding global scenarios, you can go further than CDN caching by deploying geo-distributed databases — placing database replicas in multiple geographic regions. Writes still go to a single primary (or use multi-master replication), but reads resolve to the nearest regional replica.

Services like CockroachDB, PlanetScale, and AWS Aurora Global Database make this pattern accessible. The tradeoff is complexity: cross-region replication introduces higher latency for writes and more complex consistency guarantees.

💡 Pro Tip: For most URL shortener interview discussions, recommending CDN + read replicas per region is sophisticated enough to impress. Full geo-distributed databases are worth mentioning as a scaling frontier, but drilling into the details risks going out of scope for the allotted interview time.

Putting It All Together: The Full Scaling Stack

None of these techniques exist in isolation. They layer on top of each other to create defense-in-depth against high traffic:

Request: resolve "tinyurl.com/abc123"

  Layer 1: CDN Edge (Tokyo)
  └─ HIT: Return cached 302 response in <5ms
  └─ MISS: Forward to origin

  Layer 2: Redis Cache (App Server)
  └─ HIT: Return longURL from memory in <2ms
  └─ MISS: Query database

  Layer 3: Read Replica (nearest regional DB)
  └─ HIT: Return row in <20ms, populate cache
  └─ MISS (replication lag): Query primary

  Layer 4: Primary Database
  └─ Source of truth, always available

With all four layers working, the vast majority of requests never reach the database. Here's a rough traffic distribution for a well-tuned system:

🌐 CDN edge hit rate: ~60% of requests (popular URLs cached globally)
🧠 Redis cache hit rate: ~35% of remaining requests
📚 Read replica serves: ~4.9% of remaining requests
🔒 Primary database touched: ~0.1% of total traffic

That final number is the goal. The primary database should handle writes and consistency-critical reads only — everything else should be absorbed by faster layers.

🧠 Mnemonic: Remember the scaling layers as "CRRD": CDN → Redis → Read Replica → Database. Traffic flows down this funnel, with each layer catching as much as possible before passing requests deeper.

⚠️ Common Mistake: Mistake 3: Designing the cache and database in isolation without considering cache stampede (also called the thundering herd problem). If a popular URL's cache entry expires and thousands of simultaneous requests miss the cache at once, they all hit the database simultaneously, potentially overwhelming it. Mitigate this with probabilistic early expiration (refreshing the cache slightly before TTL expires) or a mutex/lock that ensures only one request populates the cache while others wait. ⚠️

Summary: A Scaling Toolkit for Millions of Redirects

Scaling a URL shortener from hundreds to millions of requests per second requires layered, complementary techniques — not a single silver bullet. The workload's read-heavy nature makes caching the highest-leverage intervention, with Redis implementing the cache-aside pattern and LRU eviction. Database sharding (preferably hash-based) distributes write load and eliminates single-node bottlenecks. Primary-replica replication multiplies read capacity while providing fault tolerance. And CDN edge locations bring responses physically closer to users worldwide.

In your next system design interview, demonstrating that you understand why each technique is chosen — not just that it exists — is what will set your answer apart. The key insight is always to start with the workload characteristics and let those drive every architectural decision that follows.

Common Pitfalls and Mistakes to Avoid in Interviews

Reaching the system design interview with solid knowledge of algorithms and architecture is only half the battle. The other half is avoiding the subtle but costly mistakes that cause otherwise strong candidates to stumble. URL shortener problems are deceptively approachable — they look simple, which is precisely why interviewers use them to reveal depth of thinking. The candidate who says "I'll just hash the URL and store it" and moves on is giving a very different signal than the one who pauses and asks, "But what happens when two URLs hash to the same key?" This section catalogs the most common pitfalls, explains why they matter, and shows you exactly how to handle each one confidently.

Pitfall 1: Ignoring Collision Handling

⚠️ Common Mistake: Assuming that because you are using MD5 or SHA-256, collisions are "practically impossible" and therefore not worth discussing.

This is one of the most frequent errors in URL shortener interviews. Candidates choose a hashing approach — often MD5 or SHA-256 — take the first 6–8 characters of the hex output, and declare the problem solved. The interviewer then asks, "What happens if two different long URLs produce the same short key?" A candidate without a plan is suddenly in trouble.

Collision is the term for when two different inputs produce the same hash output (or in this case, the same truncated prefix). Even with a 7-character Base62 key giving you 62^7 ≈ 3.5 trillion possibilities, the Birthday Problem tells us that collisions become statistically likely much sooner than you'd intuitively expect — roughly after storing the square root of the key space, or around 1.9 million entries.

How to Address It

There are two clean strategies you should be ready to discuss:

Strategy A — Rehashing with a salt: Append a counter or random salt to the original URL before hashing again.

Strategy B — Database-level uniqueness constraint: Attempt an insert; if it fails due to a unique key violation, generate a new key.

Here is a concrete Python snippet demonstrating the rehash-with-counter approach:

import hashlib
import base64

def generate_short_key(long_url: str, attempt: int = 0) -> str:
    """
    Generate a 7-character short key from a long URL.
    On collision, increment `attempt` to produce a different hash.
    """
    # Append attempt counter to the URL to vary the input on retries
    salted_input = f"{long_url}:{attempt}"
    
    # Compute SHA-256 and take the first 8 bytes
    hash_bytes = hashlib.sha256(salted_input.encode()).digest()[:8]
    
    # Base64-URL encode and trim to 7 characters
    short_key = base64.urlsafe_b64encode(hash_bytes).decode()[:7]
    
    return short_key

def store_url(long_url: str, db) -> str:
    """Attempt to store a URL, retrying with incremented salt on collision."""
    for attempt in range(5):  # Limit retries to avoid infinite loops
        key = generate_short_key(long_url, attempt)
        
        if db.insert_if_not_exists(key, long_url):  # Returns True on success
            return key
    
    raise Exception("Failed to generate unique key after 5 attempts")

This code does two important things: it deterministically varies the input on each retry (by appending the attempt count), and it caps retries to prevent infinite loops in degenerate cases. In practice, a single retry resolves nearly every collision — but having the retry logic is what demonstrates maturity to your interviewer.

💡 Pro Tip: When you present this strategy, also mention that with a counter-based ID approach (using an auto-incrementing database ID converted to Base62), collisions are structurally impossible because each ID is unique by definition. This shows you understand the trade-off between the two major encoding families discussed in earlier sections of this lesson.

Pitfall 2: Choosing the Wrong HTTP Redirect Code

⚠️ Common Mistake: Defaulting to 301 Moved Permanently without understanding what "permanent" means for caching and analytics.

This is a surprisingly nuanced decision that many candidates treat as a throwaway detail. The choice between 301 and 302 has real, measurable consequences.

	301 Permanent Redirect	302 Temporary Redirect
🔒 Browser caching	Browser caches the destination; skips your server on repeat visits	Browser always re-requests from your server
📊 Analytics	Clicks may not hit your server after first visit	Every click passes through your server
⚡ Performance	Lower server load over time	Higher server load, better observability
🎯 Use case	Pure link-shortening with no analytics	Analytics platforms, A/B testing, expirable links

Here is the core mental model:

301 Flow (after first visit):
User → Browser cache → Destination
        (your server never sees the click again)

302 Flow (every visit):
User → Your server → Destination
        (you log every click, track metrics)

❌ Wrong thinking: "301 is better because it's faster."

✅ Correct thinking: "301 is faster but sacrifices analytics visibility. For a service that sells click analytics — like Bitly — 302 is the correct choice. For a pure convenience shortener with no tracking needs, 301 reduces infrastructure load."

🎯 Key Principle: The redirect code is a product decision masquerading as a technical one. Asking your interviewer "Does this system need click tracking?" before answering demonstrates exactly the kind of requirements-first thinking senior engineers are expected to exhibit.

💡 Real-World Example: Bitly uses 301 for uncustomized links on free plans to reduce server load, but switches to 302 for tracked links on paid plans where analytics are the core value proposition. This is a direct business-driven technical decision.

Pitfall 3: Neglecting URL Validation and Sanitization

⚠️ Common Mistake: Accepting any string as a valid URL without discussing what makes input safe, well-formed, or non-duplicate.

A URL shortener that blindly accepts and serves arbitrary input is a security liability. Interviewers evaluating senior candidates pay close attention to whether you think about the trust boundary at the ingestion point.

There are three distinct sub-problems here that candidates routinely conflate or skip entirely:

1. Malformed Input

Not every string is a valid URL. Before shortening, you must validate that the input conforms to URL syntax — correct scheme, valid hostname, properly encoded characters.

from urllib.parse import urlparse
import re

def is_valid_url(url: str) -> bool:
    """
    Validate that a string is a well-formed HTTP/HTTPS URL.
    Rejects javascript:, file:, and other dangerous schemes.
    """
    try:
        parsed = urlparse(url)
        
        # Must have a valid scheme (whitelist approach, not blacklist)
        if parsed.scheme not in ('http', 'https'):
            return False
        
        # Must have a non-empty, plausible hostname
        if not parsed.netloc or len(parsed.netloc) < 3:
            return False
        
        # Reject IP addresses pointing to private/loopback ranges
        # (simplified check — production would use ipaddress module)
        private_patterns = [
            r'^localhost$',
            r'^127\.\d+\.\d+\.\d+$',
            r'^10\.\d+\.\d+\.\d+$',
            r'^192\.168\.\d+\.\d+$',
        ]
        for pattern in private_patterns:
            if re.match(pattern, parsed.hostname or ''):
                return False  # Block SSRF vectors
        
        return True
    except Exception:
        return False

This code demonstrates whitelisting acceptable schemes (only http and https) rather than trying to blacklist dangerous ones — a critical security principle. It also illustrates awareness of SSRF (Server-Side Request Forgery), where an attacker submits an internal IP address to probe your internal network via the shortener.

2. Malicious URLs

Phishing and malware distribution are real threats. A URL shortener that happily shortens http://bank-of-america.secure-login.evil.com is actively harmful. Production systems integrate with services like the Google Safe Browsing API or VirusTotal to check URLs against known threat databases at creation time.

💡 Pro Tip: You don't need to implement threat detection in the interview — you need to name it and explain when it runs (at write time, not read time, to avoid adding latency to the hot redirect path).

3. Duplicate URL Detection

Idempotency at the write layer is another overlooked concern. If the same long URL is submitted 10,000 times, should the system create 10,000 different short codes or return the same one?

The answer depends on requirements — but you must discuss it. A common approach is to maintain a reverse lookup index: a hash of the long URL as a key pointing to its existing short code. Before creating a new entry, check this index first.

Write Path with Deduplication:

  [Client] ──────────────────────────────────────────────┐
                                                         ↓
  [API Server] → hash(longURL) → [Reverse Index Lookup]
                                        │
               ┌───────────────────────┤
               ↓ (found)               ↓ (not found)
       Return existing key     Generate new short key
                                        │
                               [Write to DB + Index]
                                        │
                               Return new short key

Pitfall 4: Overlooking Expiration and Cleanup

⚠️ Common Mistake: Designing the URL shortener as if all URLs live forever, with no TTL mechanism or cleanup strategy.

This is a systems thinking gap. A real URL shortener accumulates billions of rows over time. Without expiration, storage grows unboundedly, and dead links degrade user experience. Yet candidates consistently either forget this dimension entirely or mention it vaguely without explaining the mechanics.

TTL (Time-To-Live) is the standard mechanism. Each URL record stores an optional expires_at timestamp. The shortener checks this field at redirect time and returns a 410 Gone response for expired links rather than following the redirect.

🧠 Mnemonic: Think of TTL as a "best-before date" stamped on each short URL at creation time.

The Two-Phase Cleanup Strategy

Simply checking expiration at read time is necessary but not sufficient — you still have dead rows consuming storage. A well-designed system has two phases:

Phase 1 — Lazy expiration at read time: Check expires_at when a redirect request arrives. If expired, return 410 Gone and optionally mark the row as expired = true. This is fast and requires no background work but doesn't reclaim storage.

Phase 2 — Active purge via background job: A scheduled background worker (cron job, or a Celery/Sidekiq task) runs periodically — perhaps nightly — scanning for rows where expires_at < NOW() and deleting them in batches.

Expiration Architecture:

  [Redirect Request]
         │
         ↓
  [Read short_key from DB]
         │
         ↓
  [Check expires_at]
         │
    ┌────┴────────────────┐
    ↓ (not expired)       ↓ (expired)
  [302/301 Redirect]   [410 Gone response]
                             │
                       [Mark row as stale]

  ─────────────── Background Job (runs nightly) ─────────────────
  [Batch DELETE WHERE expires_at < NOW() AND is_expired = true]
  [Log deletion metrics for capacity planning]

💡 Pro Tip: When discussing the background job, mention that you'd delete in batches (e.g., 1,000 rows at a time) rather than a single massive DELETE statement. A bulk delete of millions of rows locks the table and degrades live traffic — a detail that signals you understand database operational realities.

🤔 Did you know? Redis natively supports TTL on keys, which is one reason it's attractive as a cache layer for URL shorteners. A cached short-to-long mapping automatically evicts when the TTL expires, without any explicit cleanup job needed at the cache layer.

Don't Forget the Cache Invalidation Problem

If you have a Redis cache in front of your database, an expired URL in the database might still have a live entry in the cache. Your redirect logic must either:

Set a TTL on the Redis key that matches the URL's expires_at, so they expire together, or
Always verify the expires_at field in the database on cache hits for links known to have expiration set.

Pitfall 5: Skipping Capacity Estimation

⚠️ Common Mistake: Jumping directly into architecture diagrams without first establishing the scale of the problem.

Capacity estimation is not busywork — it is the foundation that justifies every architectural decision that follows. When a candidate skips this step, they lose the ability to argue for their choices. Why do you need a cache? Why do you need read replicas? Why Base62 and not Base10? Without numbers, these answers become "because it's best practice" — which is unconvincing.

The Numbers That Drive Decisions

Here are the key quantities to estimate, with example values for a medium-scale service:

	Estimation	Implication
📝 Write rate	100 new URLs/sec	Write path must handle 100 RPS
📖 Read rate	10,000 redirects/sec	100:1 read/write ratio — read path is the bottleneck
💾 Storage per URL	~500 bytes (URL + metadata)	Sizing DB storage
📅 Growth per year	100 × 86,400 × 365 ≈ 3.1B rows/year	After 5 years: ~15B rows, ~7.5 TB
🔑 Key space needed	15B < 62^7 ≈ 3.5T	7-char Base62 keys are sufficient
⚡ Cache hit rate	80% of reads hit 20% of URLs	A cache covering top 20% of keys handles most traffic

🎯 Key Principle: The read/write ratio is the single most important number. A 100:1 ratio tells you immediately that the system is read-heavy and that caching, read replicas, and CDN-level optimization are warranted. A candidate who states this ratio early and uses it to justify their architectural choices is demonstrating exactly the reasoning interviewers reward.

Working Through the Estimation Live

In the interview, do the estimation out loud and explicitly. Here is the narrative a strong candidate might deliver:

"Let's assume we're building a service at Bitly's scale — roughly 100 URL creations per second and 10,000 redirects per second. That's a 100:1 read-to-write ratio, so our read path is the critical bottleneck. Over five years, storing roughly 500 bytes per entry including metadata, we're looking at about 7.5 terabytes of data. A 7-character Base62 key gives us 3.5 trillion possible values — that comfortably covers 15 billion entries with room to spare. Given the heavy read load, I'll design the read path around a Redis cache with a target 90% hit rate, and I'll add read replicas to the primary database for cache misses."

Every sentence in that paragraph is a numbered fact leading to a justified decision. That is the model.

📋 Quick Reference Card: Estimation → Decision Mapping

📊 Metric	🔢 Example Value	🏗️ Architectural Decision
📖 Read/write ratio	100:1	Add Redis cache, read replicas
💾 Total storage (5yr)	7.5 TB	Shard DB horizontally
🔑 Total keys (5yr)	15B	Use 7-char Base62 (3.5T capacity)
⚡ Hot URL distribution	80/20 rule	Cache covers top 20% of keys
🌍 Geographic spread	Global users	Add CDN / regional edge caches

Bringing It All Together: The Candidate Checklist

Before leaving the design phase in your interview, mentally run through this checklist:

🧠 Collision handling — Have I described what happens when two URLs map to the same key? Is there a retry strategy?

📚 Redirect code — Have I asked whether analytics are required, and chosen 301 vs. 302 accordingly?

🔧 Input validation — Have I mentioned scheme whitelisting, malicious URL detection, and duplicate handling?

🎯 Expiration — Have I added a expires_at field, a 410 response for expired links, and a background purge job?

🔒 Capacity estimation — Have I calculated write rate, read rate, storage growth, and key space requirements before justifying my architecture?

Each of these items is a signal. Interviewers are not just evaluating whether you can draw a correct diagram — they are evaluating whether you think like someone who has operated systems at scale, where these details are the difference between a 3am incident and a quiet weekend.

💡 Remember: The pitfalls in this section are not obscure edge cases — they are the expected conversation points in a URL shortener interview. An interviewer who asks you to design a URL shortener has almost certainly asked the same question dozens of times. They know exactly which candidates proactively address collision resolution and which ones don't. By treating each of these five areas as a mandatory checkpoint rather than an optional detail, you move from the category of "knows the basics" into "thinks like a senior engineer." That shift is what earns the offer.

Key Takeaways and Quick-Reference Design Cheat Sheet

You've made it to the finish line. By now you understand not just how to design a URL shortener, but why each decision matters — the trade-offs behind encoding strategies, the reasoning behind cache placement, and the subtle pitfalls that separate polished answers from forgettable ones. This final section distills everything into principles, mnemonics, and a quick-reference cheat sheet you can mentally rehearse the night before an interview.

Let's lock it all in.

The Core Decision: Why Base62 + Auto-Increment Wins

If you remember one technical fact from this entire lesson, let it be this: Base62 encoding applied to an auto-incremented integer ID is the most interview-friendly, production-realistic approach to generating short URLs.

Here's the compact mental model of why:

MD5/SHA hashing produces long strings and risks collisions without extra collision-handling logic — more complexity, more bugs.
Random string generation requires a uniqueness check on every write — a round-trip to the database that becomes a bottleneck at scale.
Auto-increment + Base62 is deterministic, collision-free by design, and requires no extra lookup. It also gives you a predictable key length as your ID space grows.

🧠 Mnemonic: "ACE the encoding" — Auto-increment provides the number, Convert it with Base62, Encoded output is your short key.

The Base62 alphabet uses digits 0–9, lowercase a–z, and uppercase A–Z — 62 characters total. A 6-character key gives you 62⁶ ≈ 56 billion unique URLs, which is more than enough headroom for any realistic interview scenario.

## Base62 encoding: converts a numeric ID to a short alphanumeric string
BASE62_ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base62(num: int) -> str:
    """Convert an integer ID to a Base62 short key."""
    if num == 0:
        return BASE62_ALPHABET[0]
    
    result = []
    while num > 0:
        result.append(BASE62_ALPHABET[num % 62])
        num //= 62
    
    return ''.join(reversed(result))  # Reverse because we built it backwards

def decode_base62(short_key: str) -> int:
    """Convert a Base62 short key back to its integer ID."""
    num = 0
    for char in short_key:
        num = num * 62 + BASE62_ALPHABET.index(char)
    return num

## Example usage
print(encode_base62(1))         # Output: '1'
print(encode_base62(62))        # Output: '10'
print(encode_base62(125000000)) # Output: 'dnh9c' (a compact 5-char key)
print(decode_base62('dnh9c'))   # Output: 125000000

This snippet is interview-ready. It's clean, commented, and demonstrates that you understand the bidirectional nature of encoding — you can go from ID to key and back again. Notice that decode_base62 is rarely needed in production (you store the mapping in the database), but showing you understand the reversibility signals deeper comprehension.

💡 Pro Tip: If your interviewer asks "what if two users try to shorten the same long URL?", this is a product decision, not a technical constraint. You can either return the existing short key (deduplication) or create a new one. State the trade-off: deduplication saves storage but requires an index on the long URL column; no deduplication is simpler but wastes space. Either answer is valid — what matters is that you name the trade-off explicitly.

The Three-Layer Mantra

Every scalable web system has layers, and URL shorteners are no exception. Internalize this as a three-layer mantra you can recite during the architecture portion of any interview:

┌─────────────────────────────────────────────┐
│              CLIENT REQUEST                 │
└────────────────────┬────────────────────────┘
                     │
┌────────────────────▼────────────────────────┐
│         🔵 API LAYER (Logic)                │
│  • Validates input                          │
│  • Generates short key                      │
│  • Routes read vs. write                    │
└────────────────────┬────────────────────────┘
                     │
┌────────────────────▼────────────────────────┐
│         🟡 CACHE LAYER (Speed)              │
│  • Redis / Memcached                        │
│  • Absorbs ~80-95% of redirect reads        │
│  • TTL-based expiration                     │
└────────────────────┬────────────────────────┘
                     │  (cache miss only)
┌────────────────────▼────────────────────────┐
│         🟢 DATABASE LAYER (Durability)      │
│  • Stores canonical short→long mapping      │
│  • Handles writes and cache-miss reads      │
│  • Source of truth for all data             │
└─────────────────────────────────────────────┘

🎯 Key Principle: Each layer has exactly one job. The API layer handles logic, the cache layer absorbs read load, and the database layer ensures durability. When a layer tries to do another layer's job, you've introduced coupling — the enemy of scalable systems.

The reason this matters in interviews is that many candidates jump straight to the database when asked about reads. Experienced engineers immediately ask: "What's our cache hit rate?" A well-tuned Redis cache with appropriate TTLs can serve 95% of redirect traffic without ever touching the database. That's the difference between a system that handles 10,000 requests/second and one that handles 500,000.

💡 Mental Model: Think of a popular restaurant. The API layer is the host who takes your name and manages the queue. The cache layer is the breadbasket already on the table — satisfying most immediate needs instantly. The database layer is the kitchen — accurate, authoritative, but slow. You don't send every guest to the kitchen directly.

Scalability Checklist

When your interviewer asks "how would you scale this?", run through this checklist mentally before answering. It demonstrates structured thinking and ensures you don't skip critical components.

Sharding Strategy

Database sharding distributes data across multiple nodes to handle write volume and storage limits. For a URL shortener, hash-based sharding on the short key is the most common approach. Take the first character (or a hash of the short key) and route to a shard deterministically.

⚠️ Common Mistake: Sharding on user_id seems logical, but it creates hot shards for power users — a single viral marketing campaign can send millions of requests to one user's shard.

Replication

Read replicas handle redirect lookups (which are far more frequent than writes). Write to the primary; read from replicas. This is often the single highest-impact change for a read-heavy system.

Caching TTL

Not all URLs are created equal. Set shorter TTLs for recently created links (they might be mistakes or tests) and longer TTLs for links with high click volume (they're clearly popular and stable). A tiered TTL strategy maximizes cache efficiency without stale data risk.

Expiration Cleanup Jobs

Expired URLs don't delete themselves. A background job (cron or event-driven) must periodically scan for expired entries and remove them from both the database and cache. Failing to mention this is a common oversight — interviewers notice it.

import time
from datetime import datetime, timezone

## Pseudocode for a background cleanup job
## In production, this would be a scheduled task (e.g., Celery beat, cron, AWS Lambda)

def cleanup_expired_urls(db, cache, batch_size=1000):
    """
    Scans for expired URLs in batches and removes them from DB and cache.
    Runs as a scheduled background job (e.g., every 15 minutes).
    """
    now = datetime.now(timezone.utc)
    
    # Query expired records in batches to avoid locking the table
    expired_keys = db.query(
        "SELECT short_key FROM urls WHERE expires_at < %s LIMIT %s",
        (now, batch_size)
    )
    
    if not expired_keys:
        return  # Nothing to clean up
    
    key_list = [row['short_key'] for row in expired_keys]
    
    # Step 1: Invalidate cache entries first (prevents serving stale redirects)
    cache.delete_many(key_list)  # e.g., Redis UNLINK for async deletion
    
    # Step 2: Delete from database
    db.execute(
        "DELETE FROM urls WHERE short_key = ANY(%s)",
        (key_list,)
    )
    
    print(f"[Cleanup] Removed {len(key_list)} expired URLs at {now.isoformat()}")
    
    # If batch was full, there may be more — schedule another run immediately
    if len(key_list) == batch_size:
        cleanup_expired_urls(db, cache, batch_size)  # Recursive batch processing

This cleanup function demonstrates several production-grade concepts at once: batch processing to avoid table locks, cache invalidation before database deletion (to prevent a race condition where the cache serves a deleted URL), and recursive batching for large backlogs. Mentioning even the concept of this job in an interview shows operational maturity.

Interview Communication Tips

Technical correctness is necessary but not sufficient. Interviewers are also evaluating how you think — your ability to navigate ambiguity, justify decisions, and communicate trade-offs. Here are the three communication habits that consistently separate strong candidates.

1. Clarify Requirements First (Always)

Before drawing a single box on the whiteboard, ask clarifying questions. For a URL shortener specifically:

🎯 "Should short URLs expire, or are they permanent?"
🎯 "Do we need custom aliases (e.g., /my-campaign) or only auto-generated keys?"
🎯 "What's our expected read-to-write ratio?"
🎯 "Are we building this for a global audience or a single region?"

Each answer changes the design meaningfully. Custom aliases require a uniqueness check path. Global audiences require CDN and geo-distributed caches. Spending 3–5 minutes on requirements gathering shows engineering maturity and often reveals scope that simplifies your design.

2. State Assumptions Explicitly

When you don't have enough information to make a definitive choice, say so — and then make an assumption out loud:

"I'll assume a read-to-write ratio of 100:1, which is typical for URL shorteners in production. This means I'll prioritize read path optimization over write throughput."

❌ Wrong thinking: Silently pick an approach and start designing as if it were obvious.

✅ Correct thinking: Surface your assumptions, invite correction, and show that your design choices are conditional on those assumptions.

3. Discuss Trade-offs for Every Major Choice

For each significant design decision, briefly acknowledge the road not taken:

"I'm choosing Redis over Memcached here because Redis supports TTL natively and gives us persistence options, though Memcached would be slightly faster for pure cache workloads."

This isn't wishy-washy — it's proof that you considered alternatives and chose deliberately. Interviewers aren't looking for the single "correct" answer; they're looking for engineers who reason rigorously.

💡 Pro Tip: Structure your trade-off statements as "I chose X over Y because Z, though Y would be better if W." This template forces you to name both options, give a reason, and identify the counter-condition — the complete picture.

Extension Ideas That Leave a Lasting Impression

If you finish the core design with time remaining, or if your interviewer asks "what would you add next?", these extensions demonstrate product thinking and system breadth.

Analytics Tracking

Every short URL click is a data point. A production URL shortener can capture click count, timestamp, referrer, device type, and geographic location (derived from IP). This data flows into an analytics pipeline (e.g., Kafka → Spark → data warehouse) separate from the redirect hot path, so analytics latency doesn't affect redirect speed.

🤔 Did you know? Bitly processes over 600 million link interactions per month and monetizes much of its business through the analytics layer, not the URL shortening itself.

Abuse Prevention

Short URLs are a prime vector for phishing and malware distribution. Mention these mitigation strategies:

🔒 Rate limiting per IP or API key to prevent bulk malicious shortening
🔒 Safe Browsing API integration (Google or similar) to scan destination URLs
🔒 Domain blocklists for known malicious domains
🔒 Manual review queue for flagged URLs

QR Code Generation

QR codes are trivially generated from short URLs server-side using libraries like qrcode in Python. The short URL is already compact enough to produce a clean, scannable QR code — mention this as a cacheable add-on endpoint (/qr/{short_key}).

The Complete Quick-Reference Cheat Sheet

Here is everything condensed into a single reference card. Review this before your interview.

📋 Quick Reference Card: URL Shortener System Design

🏷️ Category	📌 Decision	⚖️ Key Trade-off
🔑 Key Generation	Base62 + Auto-increment ID	Simple & collision-free vs. requires distributed ID coordination at scale
🗄️ Database	MySQL/PostgreSQL (primary) + Read replicas	ACID guarantees vs. NoSQL flexibility; SQL wins for simple key-value lookups
⚡ Cache	Redis with TTL	95% read absorption; adds cache invalidation complexity
🌐 Read Path	Cache → DB fallback	Low latency vs. eventual consistency risk if cache TTL is too long
✍️ Write Path	API → DB (sync) → Cache (async)	Durability first; cache population on first read (lazy) or on write (eager)
📊 Sharding	Hash on short_key	Even distribution vs. cross-shard queries (rare for URL shorteners)
🔁 Redirect Type	HTTP 301 (permanent) or 302 (temporary)	301 is cached by browsers (saves server load); 302 is tracked on every click
⏱️ Expiration	DB timestamp + background cleanup job	Must delete from cache before DB to prevent stale redirects
🌍 Global Scale	CDN + geo-distributed caches	Reduces latency globally; adds cache consistency complexity
📈 Analytics	Async event stream (Kafka)	Decouples analytics from redirect path; requires additional infrastructure

Numbers Worth Memorizing

Interviewers respond well to candidates who reason from concrete numbers. Keep these in your back pocket:

📚 62⁶ ≈ 56 billion — unique keys from a 6-character Base62 string
📚 62⁷ ≈ 3.5 trillion — upgrade path when 6 characters isn't enough
📚 ~100:1 — typical read-to-write ratio for a URL shortener
📚 ~1 KB — approximate storage per URL record (short key + long URL + metadata)
📚 1 TB ≈ 1 billion URLs — rough storage estimate at 1 KB per record
📚 < 10 ms — achievable redirect latency with a warm Redis cache
📚 95%+ — realistic cache hit rate for a well-tuned production deployment

🧠 Mnemonic: "6 characters, 56 billion keys, 95% cache hit." If you can say those three numbers confidently, you've already signaled production experience.

What You Now Understand That You Didn't Before

Let's take stock of the mental model you've built:

🧠 You understand that short URL generation is not arbitrary — it's a deliberate choice between hashing, randomness, and encoding, each with specific trade-off profiles.
🧠 You can articulate why caching is not optional for a URL shortener — it's the architectural layer that makes the read-to-write ratio workable at scale.
🧠 You know that database choice matters less than database usage patterns — read replicas, write-primary routing, and proper indexing often matter more than SQL vs. NoSQL.
🧠 You recognize that operational concerns — expiration cleanup, cache invalidation, abuse prevention — are part of the design, not afterthoughts.
🧠 You have a communication framework for interviews: clarify → assume explicitly → decide → justify with trade-offs.

⚠️ One final critical point: The redirect HTTP status code is not a trivial detail. Choosing 301 (permanent) versus 302 (temporary/found) has real consequences. With 301, browsers cache the redirect and never call your server again — great for performance, terrible for analytics. With 302, every click goes through your server — perfect for click tracking, but increases load. Always explicitly state which you're using and why. Most candidates skip this entirely; mentioning it will set you apart.

Practical Next Steps

Knowledge becomes skill through practice. Here are three concrete actions to solidify this material:

1. Build a mini version. Implement a working URL shortener in your language of choice — just the core encode/decode logic plus a simple key-value store (even SQLite). The act of writing the cleanup job and handling the cache-miss path yourself will make these concepts permanent rather than theoretical.

2. Do a mock system design interview. Use this cheat sheet as your evaluation rubric. Have a peer ask "design a URL shortener" and time yourself: 5 minutes for requirements, 10 minutes for core architecture, 5 minutes for scaling, 5 minutes for extensions. The time pressure reveals which concepts you've truly internalized.

3. Extend to a related problem. URL shorteners are a gateway to adjacent system design challenges: Pastebin (store text blobs instead of URLs), Feature Flag Systems (short keys mapped to configuration instead of URLs), and Link-in-Bio tools (one short URL routing to multiple destinations). Each uses the same three-layer architecture with domain-specific modifications — recognizing the pattern is the skill.

💡 Real-World Example: The engineering team at Bitly published that their early architecture was nearly identical to what you've designed here — a simple key-value store with aggressive caching. They only needed to introduce sharding and distributed ID generation when they crossed tens of millions of daily active URLs. Start simple; scale when the data demands it.

You now have the vocabulary, the architecture, the numbers, and the communication strategy to walk into a URL shortener system design interview with confidence. The system is deceptively simple on the surface — and that simplicity is exactly what makes it such a rich canvas for demonstrating depth. Every choice you justify, every trade-off you name, and every edge case you surface tells the interviewer: this engineer has thought carefully about how systems actually work.

That's the whole game.

📝

Ready to practice?

This lesson has 15 questions to help you learn