
Requirements Gathering

Learn to ask the right clarifying questions to scope functional and non-functional requirements.

Why Requirements Gathering Makes or Breaks Your Interview

Imagine you're a contractor hired to build a house. The client says, "Build me something to live in." Do you immediately start pouring concrete? Of course not — you ask questions. How many people will live there? What's the budget? Is this in the city or the countryside? Does the client want a single story or multiple floors? A contractor who grabs a shovel before asking these questions isn't impressive — they're reckless. And yet, in system design interviews every single day, talented engineers make exactly this mistake. They hear "Design Twitter" and immediately start drawing databases and load balancers, confidently solving the wrong problem at full speed. What you're about to learn here is the single most leveraged skill in your entire system design toolkit.

This section is about requirements gathering: the deliberate, structured process of understanding what you're building before you decide how to build it. It sounds obvious. It rarely happens. And that gap — between what sounds obvious and what candidates actually do — is exactly where interviews are won and lost.

The Costly Mistake Most Candidates Make

Let's be honest about what typically happens in a system design interview. The interviewer says something like: "Design a URL shortener." The candidate nods, picks up the marker, and immediately writes "API Gateway → Application Server → Database" on the whiteboard. Within sixty seconds, they're debating whether to use PostgreSQL or Cassandra. It feels productive. It looks confident. It is, almost always, a trap.

Here's the problem: "Design a URL shortener" is not a specification. It's a prompt. A starting pistol. The real problem hasn't been defined yet. Consider how radically different the design would need to be depending on the answers to just a few questions:

  • Are we building this for 1,000 users or 1 billion?
  • Do shortened URLs expire, or do they live forever?
  • Do we need analytics on how many times each link was clicked?
  • Should users be able to customize their short codes (bit.ly/my-brand) or are random codes fine?
  • Is read performance more critical than write performance?

A URL shortener for a startup's internal tools team looks nothing like bit.ly at global scale. A candidate who starts designing without asking these questions has made a silent assumption about which problem they're solving — and there's a high chance it's the wrong one.

⚠️ Common Mistake — Mistake 1: Designing before scoping ⚠️ Jumping straight into architecture feels like momentum, but it's actually a red flag to interviewers. It signals that you build first and think later — exactly the instinct that causes expensive rewrites in production engineering.

Wrong thinking: "I know URL shorteners. I'll show them I know the architecture immediately."

Correct thinking: "I've seen URL shorteners before, but I don't know which URL shortener they want me to build. Let me find out."

This distinction — between pattern-matching to a familiar problem and actually solving the problem in front of you — is what separates junior thinking from senior thinking. And interviewers know the difference.

Why Interviewers Leave Problems Vague on Purpose

Here's something most candidates don't realize: the vagueness is intentional. When an interviewer says "Design a notification system" without further specification, they are not being lazy or unprepared. They are running a test. A deliberate, structured test.

🎯 Key Principle: System design interview prompts are intentionally underspecified to evaluate whether you ask the right questions before building anything.

Think about what interviewers are actually trying to assess. They're not primarily evaluating whether you know the CAP theorem or can recite the difference between a message queue and a pub/sub system. Those things matter, but they're table stakes. What senior engineers care about — what they need on their team — is someone with the judgment to ask the right questions at the right time.

In a real engineering role, requirements are almost never fully specified. Product managers give you rough ideas. Stakeholders contradict each other. Business constraints shift mid-project. The ability to navigate ambiguity, ask clarifying questions, and arrive at a well-scoped problem statement is one of the most valuable skills an engineer can have. The system design interview is, among other things, a simulation of that real-world scenario.

🤔 Did you know? Research on engineering team performance consistently shows that the most costly bugs and rewrites don't come from poor implementation — they come from building the wrong thing. Requirements mismatches account for the majority of software project failures. The interview is testing whether you'd contribute to that statistic.

When you ask good clarifying questions early in the interview, you send a powerful signal: "I have worked on real systems. I know that vague requirements produce bad designs. I refuse to build until I understand what I'm building." That signal is worth more than any particular architectural pattern you might demonstrate later.

What "Senior Engineering Instincts" Actually Look Like

You've probably heard the advice "think like a senior engineer" without a clear explanation of what that actually means in practice. In the context of system design interviews, senior engineering instincts manifest in a very specific and observable way during requirements gathering.

A senior engineer approaches a new problem with what you might call productive skepticism — not cynicism, but a healthy suspicion that the problem as stated is not the problem as it exists. They ask questions not because they don't know the answer, but because they understand that their assumptions might be wrong, and wrong assumptions compound.

Consider this mental model:

Vague Prompt
     │
     ▼
┌─────────────────────────────────────────┐
│           Decision Point                │
├─────────────────────────────────────────┤
│                                         │
│  Path A: Jump to Design                 │
│  ─────────────────────                  │
│  • Fast start (illusory)                │
│  • High variance in outcome             │
│  • Design may be irrelevant             │
│  • Interviewers mentally check out      │
│                                         │
│  Path B: Gather Requirements First      │
│  ──────────────────────────────         │
│  • 3-5 minute investment                │
│  • Scoped, defensible problem           │
│  • Shared mental model with interviewer │
│  • Design is anchored in real tradeoffs │
│                                         │
└─────────────────────────────────────────┘
     │
     ▼
 Well-Scoped Problem Statement
     │
     ▼
 Architecture That Actually Answers
  the Question That Was Asked

Path B takes a few minutes at the start of your 45-60 minute interview. But those minutes fundamentally change the nature of everything that comes after. Instead of defending arbitrary architectural choices, you're defending choices that logically follow from agreed-upon constraints. That's a completely different — and far more productive — conversation.

💡 Mental Model: Think of requirements gathering as buying insurance for your entire design. A small investment upfront dramatically reduces the probability of a catastrophic design failure later in the interview.

The Trajectory Effect: How the First Five Minutes Shape Everything

A 45-60 minute system design interview is not a flat, uniform experience. It has structure, momentum, and a trajectory. The direction that trajectory points is almost entirely determined by what happens in the first five to ten minutes.

Here's a practical illustration. Two candidates are given the same prompt: "Design a ride-sharing application like Uber."

Candidate A immediately starts drawing: "So we'll need a user service, a driver service, a trip service, maybe a payments service..." They're off to the races. Fifteen minutes in, the interviewer asks: "How are you handling real-time location updates?" The candidate pauses, realizes they haven't considered scale, and asks: "Wait, how many concurrent rides are we expecting?" The interviewer replies: "About 500,000 at peak." The candidate's face falls slightly. Their simple request-response design won't handle real-time location updates at that scale. They backtrack. The interview has derailed.

Candidate B pauses after hearing the prompt. "Before I start designing, I'd like to make sure I understand what we're optimizing for. A few quick questions: Are we focusing on the matching experience — connecting riders and drivers — or the full end-to-end product including payments and ratings? And what's our scale target?" The interviewer answers. The candidate follows up: "Got it — so we're handling real-time geolocation at about 500,000 concurrent rides. That tells me we're going to need to think carefully about our location update architecture from the start." The design that follows is shaped around the constraints that matter.

Same prompt. Radically different interviews. The difference is entirely in those first five minutes.

🧠 Mnemonic: Think SCOPE before you draw:

  • S — Scale: what are the traffic and data volume expectations?
  • C — Core features: what's in scope vs. explicitly out of scope?
  • O — Operations: read-heavy, write-heavy, or balanced?
  • P — Performance: what are the latency and availability requirements?
  • E — Edge cases: any unusual constraints or must-handle scenarios?

Requirements Gathering as a Signal of Real-World Experience

Let's connect this to something concrete. Below is a simplified example of the kind of code a junior engineer might write versus the kind of thinking a senior engineer brings before writing any code. The code itself isn't what we're evaluating — it's the thought process that precedes it.

Imagine you're designing a notification service. A junior engineer might immediately scaffold something like this:

## Junior approach: jumping straight to implementation
## without understanding requirements

class NotificationService:
    def send_notification(self, user_id: str, message: str):
        # Just send an email, seems reasonable?
        email = self.get_user_email(user_id)
        self.email_client.send(email, message)
        print(f"Sent notification to {user_id}")

This code makes at least five hidden assumptions: that email is the right channel, that delivery confirmation isn't needed, that there's no retry logic required, that notifications don't need to be prioritized, and that the system is synchronous. These assumptions might all be wrong.

A senior engineer, after gathering requirements, would know — before writing a line of code — that they need something fundamentally different:

## Senior approach: design shaped by requirements gathered upfront
## Requirements established:
##   - Multi-channel: email, SMS, push notifications
##   - High volume: ~10M notifications/day
##   - Delivery guarantees: at-least-once with idempotency
##   - Priority levels: CRITICAL, HIGH, NORMAL, LOW
##   - Async processing required (can't block user requests)

from enum import Enum
from dataclasses import dataclass
from typing import Optional

class NotificationChannel(Enum):
    EMAIL = "email"
    SMS = "sms"
    PUSH = "push"

class Priority(Enum):
    CRITICAL = 1  # Process immediately, dedicated queue
    HIGH = 2      # Process within seconds
    NORMAL = 3    # Process within minutes
    LOW = 4       # Best-effort, batch processing

@dataclass
class NotificationRequest:
    user_id: str
    message: str
    channel: NotificationChannel
    priority: Priority
    idempotency_key: str       # Prevent duplicate delivery
    metadata: Optional[dict] = None

class NotificationService:
    def enqueue_notification(self, request: NotificationRequest):
        """
        Enqueues to priority-specific queue for async processing.
        Returns immediately - does not block the caller.
        """
        queue = self.get_queue_for_priority(request.priority)
        queue.publish(
            payload=request,
            dedup_key=request.idempotency_key  # Idempotency at queue level
        )

The second implementation isn't just better code — it's a different class of solution. And none of those improvements came from knowing more programming patterns. They came from asking better questions first.
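The idempotency requirement in the senior design is worth seeing concretely. Below is a minimal, runnable sketch in which a toy in-memory queue stands in for a real broker; the `DedupQueue` class and its `publish` signature are illustrative inventions, not any real queue API (though real brokers offer similar features, e.g. deduplication IDs on SQS FIFO queues):

```python
# Toy stand-in for the priority queue layer: drops any message whose
# dedup_key has been seen before, so a client retry cannot cause a
# duplicate notification to go out.

class DedupQueue:
    def __init__(self):
        self._seen_keys = set()
        self._messages = []

    def publish(self, payload, dedup_key):
        if dedup_key in self._seen_keys:
            return False              # duplicate retry: dropped, user not notified twice
        self._seen_keys.add(dedup_key)
        self._messages.append(payload)
        return True

queue = DedupQueue()
first = queue.publish({"user_id": "u42", "message": "Order shipped"},
                      dedup_key="order-981-shipped")
retry = queue.publish({"user_id": "u42", "message": "Order shipped"},
                      dedup_key="order-981-shipped")   # client retried after a timeout

print(first, retry, len(queue._messages))   # True False 1
```

The point of the sketch: because the requirements established at-least-once delivery, the *design* had to provide idempotency somewhere — a property the junior version never even considered.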

💡 Real-World Example: Engineers at large-scale companies like Amazon, Google, and Netflix have documented that their internal design review processes mandate requirements clarification before any architecture diagram is drawn. These companies didn't adopt this practice because it sounds good in theory — they adopted it because building the wrong thing at scale is extraordinarily expensive. The system design interview is, in part, a test of whether you've internalized this lesson.

The Defensible Design: Your Anchor for the Entire Interview

There's one more critical reason why requirements gathering matters, and it has everything to do with how design discussions evolve over the course of an interview. Interviewers will challenge your choices. They'll ask "why did you choose this database?" or "what happens when this service goes down?" or "what if traffic suddenly spikes 10x?"

If your design was built on undefined assumptions, every challenge becomes existential. You find yourself defending choices you can't actually justify, because you never established the ground truth that would make those choices logical.

But if you gathered requirements first, you have an anchor. "I chose a NoSQL document store here because we established that our read:write ratio is 100:1, and our access pattern is almost always a single-key lookup — we don't need relational joins for this use case." That's a defensible position. You're not guessing. You're reasoning from stated constraints.

Here's what this looks like as a comparison:

 WITHOUT REQUIREMENTS              WITH REQUIREMENTS
 ─────────────────────            ──────────────────────
                                  
 Interviewer: "Why                Interviewer: "Why
 Cassandra?"                      Cassandra?"
      │                                │
      ▼                                ▼
 "Well... it scales well,         "We agreed we need
 and I've used it before,         write throughput of
 and I thought it might           ~500K writes/second
 be a good fit..."                with eventual                           
                                  consistency acceptable.
 [WEAK - not anchored             Cassandra's LSM-tree
  in requirements]                architecture is
                                  optimized for exactly
                                  this pattern."
                                  
                                  [STRONG - anchored
                                   in requirements]

📋 Quick Reference Card: What Requirements Gathering Achieves

🎯 Outcome                  📚 Why It Matters                        🔧 Impact on Interview
─────────────────────       ────────────────────────────────────     ──────────────────────────────────────────
🔒 Scoped problem           Prevents solving the wrong problem       Every design decision is relevant
🧠 Shared context           Interviewer and candidate align          No surprises or backtracking
📚 Defensible choices       Architecture follows from constraints    Challenges become discussions, not attacks
🔧 Demonstrated judgment    Proves senior engineering instincts      Elevates the entire conversation
🎯 Time efficiency          45-60 min is used optimally              Deeper dive into interesting tradeoffs

Setting the Right Tone from the Start

There's one final dimension to requirements gathering that candidates almost never discuss: tone and collaboration. System design interviews are not monologues. The best interviews feel like two engineers working through a problem together. Requirements gathering is how you establish that collaborative tone from the very beginning.

When you ask good questions, you transform the dynamic. Instead of a candidate performing for a judge, you have two people on the same side of the table, figuring something out together. Interviewers — who are engineers themselves — genuinely enjoy this. It's much more interesting to discuss "given that we need 99.99% availability, how does that change our approach to the database layer?" than to watch someone recite a memorized architecture.

💡 Pro Tip: At the end of your requirements gathering phase, briefly summarize what you've learned before you start designing. Say something like: "Okay, so based on what we've discussed: we're targeting 1 million DAUs, the core feature is X, we're read-heavy, and availability is more important than consistency. Does that match your understanding?" This creates an explicit shared foundation and gives the interviewer a chance to correct any misalignment before you've invested time in the wrong direction.

This simple habit — summarize and confirm — is one of the highest-leverage moves you can make in the first ten minutes of a system design interview. It costs thirty seconds and it buys you the confidence to design boldly, knowing you're solving the right problem.

🎯 Key Principle: Requirements gathering isn't just about collecting information. It's about establishing a shared reality with your interviewer — a foundation of agreed-upon constraints from which your entire design can be logically derived and confidently defended.

In the sections that follow, you'll learn exactly how to read and deconstruct ambiguous prompts, a repeatable framework for asking the right questions in the right order, and how to apply all of this to real interview scenarios. But everything that comes next builds on what you've understood here: that the single highest-leverage investment you can make in a system design interview is the time you spend understanding the problem before you start solving it.

The Anatomy of a System Design Problem Statement

When an interviewer leans across the table — or types into a shared document — and says "Design Twitter," they are not being careless. They are being deliberate. That two-word prompt is a carefully constructed test, and the test begins before you draw a single box or name a single service. It begins the moment you decide how to read that sentence.

This section is about developing the skill of reading system design prompts the way a surgeon reads an X-ray: methodically, with trained eyes that see not just what is visible but what the shadows imply and what the absence of certain features reveals.

Why Prompts Are Deliberately Sparse

System design interviews use short, open-ended prompts by design. A prompt like "Design a URL shortener" or "Design a ride-sharing service" is intentionally stripped of the constraints, scale, and context that would make the problem well-defined. This ambiguity serves a specific evaluative purpose: it separates candidates who react from candidates who investigate.

In the real world, product requirements are never handed to engineers in a neat specification document. Engineers work alongside product managers, data scientists, and customers to discover what the system truly needs to do. An interviewer who gives you a sparse prompt is simulating exactly that environment. They want to see if you instinctively ask the right questions before committing to a design.

🎯 Key Principle: The prompt is not the problem. The prompt is the entry point to the problem. Your job in the first few minutes is to transform a vague sentence into a scoped, agreed-upon set of requirements.

The Three Layers of Every Prompt

Every system design prompt, regardless of how simple it appears, contains information across three distinct layers. Learning to see all three simultaneously is the foundational skill of requirements gathering.

PROMPT: "Design a URL shortener"

┌─────────────────────────────────────────────────────────┐
│  LAYER 1: EXPLICIT                                      │
│  What the prompt literally states                       │
│  → The system shortens URLs                             │
│  → There is input (long URL) and output (short URL)     │
├─────────────────────────────────────────────────────────┤
│  LAYER 2: IMPLIED                                       │
│  What the domain knowledge tells us to expect           │
│  → Users must be able to retrieve the original URL      │
│  → Short URLs must be unique                            │
│  → Redirection must happen quickly                      │
├─────────────────────────────────────────────────────────┤
│  LAYER 3: MISSING                                       │
│  What is absent and must be uncovered                   │
│  → How many URLs shortened per day?                     │
│  → How long should short URLs persist?                  │
│  → Do users need accounts? Analytics? Custom aliases?   │
│  → Read-to-write ratio?                                 │
└─────────────────────────────────────────────────────────┘

Most candidates only engage with Layer 1. Strong candidates engage with all three before they start designing. Let's explore each layer in depth.

Layer 1: The Explicit Content

Explicit content is what the prompt states in plain language. For "Design Twitter," the explicit content is minimal: a social platform exists, and users interact with it. For "Design a distributed message queue," the explicit content tells you the system is distributed, involves messages, and functions as a queue (implying ordering and producer-consumer semantics).

Reading the explicit content carefully still requires attention. Specific word choices carry weight. "Design a real-time chat system" immediately implies that latency is a first-class concern. "Design a scalable notification service" signals that the interviewer cares about horizontal scaling from the outset. Never skim past adjectives in a prompt — they are often the interviewer's way of nudging you toward the constraints they care most about.

Layer 2: The Implied Content

Implied content emerges from your domain knowledge about the type of system being described. This is where experience with software systems pays dividends. If you are asked to design a search engine, your knowledge of how search works implies that you will need a crawler, an indexer, and a query processor — even though none of those are mentioned.

💡 Mental Model: Think of implied content as the physics of the domain. Just as a bridge engineer knows that gravity exists without it being stated in the project brief, a software engineer knows that a social feed must handle fan-out, that a payment system must handle idempotency, and that a file storage service must handle concurrent writes — because these are the natural forces acting on those systems.

Implied content is also where hidden complexity lives. "Design a notification system" sounds simple until you apply domain knowledge and realize it implies:

  • 📱 Multiple delivery channels (push, email, SMS, in-app)
  • 🔄 Retry logic for failed deliveries
  • 🎯 User preference management
  • 📊 Delivery receipts and read tracking
  • 🔒 Rate limiting to avoid spamming users

None of these are stated. All of them are implied by what notification systems actually do in the real world.

Layer 3: The Missing Content

Missing content is where the real design decisions hide. This is the information that neither the prompt nor domain knowledge can supply — it requires a conversation with your interviewer. Scale, specific user behaviors, non-functional requirements, and edge cases all live in this layer.

⚠️ Common Mistake: Many candidates treat missing content as unimportant because it is not stated. In reality, missing content often determines the entire architecture. A URL shortener handling 100 requests per day is a weekend project. A URL shortener handling 100 million redirects per day requires distributed caching, database sharding, and CDN integration. Same prompt. Radically different systems.

Recognizing Hidden Complexity

One of the most important mental habits you can develop is treating simple-sounding prompts with respectful suspicion. The simpler a prompt sounds, the less confident you should be that you know what you are being asked to build.

Consider the prompt: "Design a leaderboard system." At first glance, this seems straightforward — store scores, sort them, display the top N. A junior developer might sketch a database table with a SELECT ... ORDER BY score DESC LIMIT 100 query and call it done. But apply the three-layer framework and the complexity becomes visible immediately.

Let's look at what happens when you annotate this prompt with questions about scale and behavior:

## Naive implementation — works for small scale
## This is what a "simple" leaderboard looks like

def get_top_players(db_connection, limit=100):
    """
    Works fine for thousands of users.
    Falls apart at millions — full table scan on every request.
    No caching, no pre-computation, no pagination.
    """
    query = """
        SELECT user_id, username, score
        FROM player_scores
        ORDER BY score DESC
        LIMIT %s
    """
    cursor = db_connection.cursor()
    cursor.execute(query, (limit,))
    return cursor.fetchall()

## The questions that change this design entirely:
## 1. How many players? (1K vs 100M changes everything)
## 2. How often do scores update? (Real-time vs daily batch)
## 3. Global leaderboard or per-game/per-region?
## 4. What's the acceptable latency? (10ms vs 2s)
## 5. Does a player need to see their own rank?
##    (Ranking arbitrary players is much harder than top-N)

The comment at the bottom of that code block contains five questions, each of which could fundamentally alter the architecture. If scores update in real-time and users need to see their own rank among 100 million players, you are now designing a system that requires Redis sorted sets, asynchronous score processing, and careful thinking about consistency tradeoffs. The initial ORDER BY query is not just inefficient — it is architecturally wrong.

🤔 Did you know? The "rank of an arbitrary player" problem (finding where a specific user sits in a global ranking) is significantly harder than the "top-N players" problem. Finding the top 100 is a bounded operation. Finding that a specific user is ranked 4,782,301st requires either a full sort or clever data structures like sorted sets or rank trees.
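The gap between the two problems is easy to demonstrate. The toy sketch below keeps all scores in a sorted Python list via `bisect`; Redis sorted sets solve the same problem with a skip list, keeping both operations at O(log n), whereas the list insert here is O(n) and would not survive millions of players. Class and method names are illustrative:

```python
import bisect

# Toy leaderboard sketch. Scores live in one sorted list so we can
# contrast the bounded top-N query with the unbounded rank query.

class Leaderboard:
    def __init__(self):
        self._scores = []        # all scores, ascending
        self._score_of = {}      # user_id -> current score

    def set_score(self, user_id, score):
        old = self._score_of.get(user_id)
        if old is not None:
            self._scores.pop(bisect.bisect_left(self._scores, old))
        bisect.insort(self._scores, score)   # O(n) here; skip lists do O(log n)
        self._score_of[user_id] = score

    def top_n(self, n):
        # Bounded: only the tail of the sorted structure matters.
        return self._scores[-n:][::-1]

    def rank_of(self, user_id):
        # Unbounded: needs this score's position among ALL scores.
        score = self._score_of[user_id]
        at_or_below = bisect.bisect_right(self._scores, score)
        return len(self._scores) - at_or_below + 1

lb = Leaderboard()
lb.set_score("alice", 3200)
lb.set_score("bob", 4100)
lb.set_score("carol", 3700)

print(lb.top_n(2))          # [4100, 3700]
print(lb.rank_of("alice"))  # 3
```

Notice that `rank_of` only works because the whole score set is kept ordered at all times — exactly the invariant that a plain `ORDER BY` table does not maintain cheaply at scale.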

Categorizing Unknowns: Behavior vs. Quality

Once you have identified that a prompt contains missing content, you need a mental filing system for organizing those unknowns before you start asking questions. The most powerful categorization is the distinction between functional requirements (behavior) and non-functional requirements (quality).

Functional requirements describe what the system does: the features it provides, the operations it supports, and the workflows it enables. These map to the verbs and nouns of the system.

Non-functional requirements describe how well the system does it: its performance, availability, durability, consistency, and security properties. These map to the adjectives and adverbs of the system.

PROMPT: "Design a file storage service (like Dropbox)"

┌──────────────────────────────┬──────────────────────────────────────┐
│  FUNCTIONAL (Behavior)       │  NON-FUNCTIONAL (Quality)            │
│  What does it DO?            │  How WELL does it do it?             │
├──────────────────────────────┼──────────────────────────────────────┤
│  ▸ Upload files              │  ▸ Max file size supported           │
│  ▸ Download files            │  ▸ Upload/download latency target    │
│  ▸ Sync across devices       │  ▸ Availability (99.9% vs 99.99%)    │
│  ▸ Share files with others   │  ▸ Durability (data loss tolerance)  │
│  ▸ Version history           │  ▸ Consistency (sync lag tolerance)  │
│  ▸ Folder organization       │  ▸ Scale (# users, total storage)    │
│  ▸ Conflict resolution       │  ▸ Security (encryption at rest?)    │
└──────────────────────────────┴──────────────────────────────────────┘

This two-column mental model is powerful because it prevents a common failure mode: candidates who gather functional requirements thoroughly but neglect non-functional ones. Non-functional requirements often drive the most significant architectural decisions. Whether a system needs 99.9% or 99.999% availability is the difference between a simple primary-replica setup and an active-active multi-region deployment.
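That difference is easy to quantify. A quick back-of-envelope calculation shows how much annual downtime each availability target actually permits:

```python
# How much downtime does each availability target allow per year?
# "Nines" sound incremental; the operational burden is not.

SECONDS_PER_YEAR = 365 * 24 * 3600   # ignoring leap years

for label, availability in [("99.9%   (three nines)", 0.999),
                            ("99.99%  (four nines)", 0.9999),
                            ("99.999% (five nines)", 0.99999)]:
    downtime_min = SECONDS_PER_YEAR * (1 - availability) / 60
    print(f"{label:<22} -> ~{downtime_min:,.0f} min of downtime per year")

# ~526 min (almost 9 hours): a primary-replica setup with manual failover can live with this
# ~53 min: failover must be automatic
# ~5 min: typically demands active-active multi-region
```

Asking "how many nines?" is therefore not pedantry — it is the question that decides whether you are designing one database with a replica or a multi-region deployment.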

🧠 Mnemonic: Use "BSACCS" to remember the key non-functional dimensions:

  • Bandwidth / Throughput
  • Scalability
  • Availability
  • Consistency
  • Capacity / Storage
  • Security

You do not need to ask about all six in every interview, but running through them mentally ensures you do not overlook a dimension that turns out to be critical.
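Running through these dimensions pays off fastest for Bandwidth and Capacity, where a 30-second estimate grounds the whole design. Here is a sketch for a hypothetical URL shortener; every number is an assumed input you would confirm with the interviewer, not a given:

```python
# Back-of-envelope capacity estimate for a hypothetical URL shortener.
# All inputs below are assumptions to validate during requirements gathering.

writes_per_day = 100_000_000      # assumed: 100M new short URLs per day
read_write_ratio = 10             # assumed: 10 redirects per shorten
bytes_per_record = 500            # assumed: long URL + short code + metadata
retention_years = 5               # assumed: links persist at least 5 years

write_qps = writes_per_day / 86_400
read_qps = write_qps * read_write_ratio
storage_tb = writes_per_day * 365 * retention_years * bytes_per_record / 1e12

print(f"write QPS: ~{write_qps:,.0f}")                             # ~1,157
print(f"read QPS:  ~{read_qps:,.0f}")                              # ~11,574
print(f"storage at year {retention_years}: ~{storage_tb:.0f} TB")  # ~91 TB
```

Even these rough numbers immediately resolve missing-content questions: tens of thousands of reads per second means caching is mandatory, and ~91 TB over five years is shardable but far from trivial.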

The Silent Annotation Technique

Knowing the three layers and the functional/non-functional distinction is theoretical knowledge. The silent annotation technique is how you convert that knowledge into a concrete, in-interview practice.

Here is the protocol:

  1. When the interviewer gives you the prompt, pause. Do not speak immediately. A 15-30 second silence while you think is professional, not awkward. Interviewers expect it.

  2. If you have a whiteboard or shared document, write the prompt down. Physically writing the prompt engages a different cognitive mode and slows your racing mind.

  3. Mentally — or literally — annotate the prompt with question marks. Underline every noun (it is a potential entity in your system). Circle every adjective (it may imply a quality requirement). Put a question mark next to everything that is ambiguous.

  4. Categorize your annotations into the functional/non-functional grid before you open your mouth.

  5. Only then, begin speaking — with a brief summary of what you understood, followed by your first question.

Let's see this technique applied to a real prompt in code-comment form, which is a useful way to practice this skill on paper:

// PROMPT: "Design a ride-sharing service"
// Let's annotate silently before speaking

/*
  EXPLICIT:
  - "ride-sharing" → matching riders with drivers
  - "service" → implies multiple users, likely a platform

  IMPLIED (domain knowledge):
  - Real-time location tracking for both riders and drivers
  - Pricing / fare calculation
  - Trip state machine: [requested → matched → in_progress → completed]
  - Payment processing
  - Ratings system
  - Maps / routing integration

  MISSING — FUNCTIONAL questions to ask:
  ❓ Scope: just the matching/dispatch, or the full platform?
  ❓ Rider features: scheduled rides? Multiple stops?
  ❓ Driver features: surge pricing visibility? Route optimization?
  ❓ Do we include payments, or treat that as an external service?

  MISSING — NON-FUNCTIONAL questions to ask:
  ❓ Scale: how many concurrent riders/drivers? (city vs. global)
  ❓ Latency: how fast must a match be found? (< 10s?)
  ❓ Availability: what happens if matching service is down?
  ❓ Consistency: can two drivers be matched to same rider?

  PRIORITY ORDER (most impactful unknowns first):
  1. Scale — determines whether we need distributed systems
  2. Match latency requirement — drives algorithm choice
  3. Scope (payments, ratings in or out?) — scopes the work
  4. Consistency guarantees — determines transaction complexity
*/

// NOW I'm ready to speak: 
// "Before I dive in, let me make sure I understand the scope..."

Notice the priority ordering at the end of that annotation. This is a crucial discipline. You will rarely have time to ask every question on your list, and asking questions in a scattered order signals disorganized thinking. Always ask the questions that most significantly affect the architecture first.

💡 Pro Tip: Start your verbal response with a one-sentence summary of your understanding before you ask questions. For example: "It sounds like we're designing the core matching and dispatch system for a ride-sharing platform, similar to Uber or Lyft. Before I start designing, I have a few questions to scope this correctly." This demonstrates that you understood the prompt and frames your questions as deliberate scoping, not confusion.

How Assumptions Compound: A Worked Example

To make concrete why the silent annotation technique matters, consider what happens when two candidates receive the same prompt — "Design a messaging system" — but make different implicit assumptions.

                       Candidate A                         Candidate B
Assumed users          Millions globally                   10,000 within an enterprise
Assumed message type   Text + media + reactions            Text-only
Assumed read pattern   Read once, mark delivered           Persistent, searchable archive
Assumed availability   99.99%, consumer SLA                99.9%, enterprise acceptable
Architecture result    Distributed, sharded, global CDN    Monolith + PostgreSQL + S3

Both candidates built a messaging system. Neither built the messaging system the interviewer had in mind. And crucially, neither knows how far off they are until 20 minutes in, when the misalignment surfaces and there is no time to recover.

Wrong thinking: "I'll just pick reasonable assumptions and go — the interviewer will correct me if I'm wrong."

Correct thinking: "I'll surface my key assumptions explicitly before designing, so the interviewer can align with me or redirect me early."

The difference is not just about being correct. It is about demonstrating the collaborative, investigation-first mindset that strong engineers bring to real product work.

Putting It All Together: The Anatomy in One View

Before moving on to the structured framework in the next section, let's consolidate the anatomy of a prompt into a single reference model you can carry mentally into any interview.

╔══════════════════════════════════════════════════════════════╗
║          ANATOMY OF A SYSTEM DESIGN PROMPT                  ║
╠══════════════════════════════════════════════════════════════╣
║  1. READ the three layers                                   ║
║     • Explicit   → What is literally stated?                ║
║     • Implied    → What does domain knowledge tell me?      ║
║     • Missing    → What MUST I ask before designing?        ║
╠══════════════════════════════════════════════════════════════╣
║  2. CATEGORIZE the missing content                          ║
║     • Functional  → WHAT does the system do?                ║
║     • Non-Functional → HOW WELL does it do it? (BSACCS)     ║
╠══════════════════════════════════════════════════════════════╣
║  3. PRIORITIZE your questions                               ║
║     • Which unknowns most affect the architecture?          ║
║     • Ask those first                                       ║
╠══════════════════════════════════════════════════════════════╣
║  4. SUMMARIZE before speaking                               ║
║     • "My understanding is... before I design, I'd like     ║
║       to clarify a few things..."                           ║
╚══════════════════════════════════════════════════════════════╝

📋 Quick Reference Card:

🔍 Layer            📝 What It Contains                🔧 How to Handle It
🟢 Explicit         Literally stated features          Accept and build on
🟡 Implied          Domain-knowledge expectations      Surface and confirm
🔴 Missing          Scale, quality, scope unknowns     Ask before designing
⚙️ Functional       System behaviors and features      List and prioritize
📊 Non-Functional   Performance, availability, scale   Apply BSACCS framework

The anatomy framework is not just a technique for interviews — it is a model for how careful engineers approach any new system they are asked to build. The interview context makes the skill visible and evaluable, but the underlying habit of reading problems deeply before solving them is what separates engineers who build the right thing from engineers who build things right.

In the next section, we will take this analytical foundation and build a structured, repeatable questioning framework on top of it — giving you a conversational playbook for transforming any ambiguous prompt into a well-scoped design brief.

A Structured Framework for Gathering Requirements

Every system design interview begins the same way: an interviewer hands you a deceptively simple prompt — "Design Twitter" or "Build a URL shortener" — and then waits. What happens in the next five to eight minutes will shape every architectural decision that follows. A structured requirements gathering framework gives you a repeatable, confident way to navigate this opening phase regardless of what problem lands in front of you.

Think of this framework as a lens that brings a blurry photograph into focus. Without it, you're guessing at what to build. With it, you can rapidly orient yourself, align with the interviewer, and lay a foundation that makes every subsequent design choice feel intentional rather than accidental.


The Two-Axis Model: Functional vs. Non-Functional Requirements

The first thing to internalize is that all requirements live on one of two axes. This two-axis model is the organizing principle behind every question you'll ask.

Functional requirements describe what the system must do — the behaviors, features, and user-facing capabilities. When a user opens Instagram and sees a feed, posts a photo, or searches for an account, those are functional capabilities. They answer the question: "What does this system need to accomplish?"

Non-functional requirements describe how the system must perform — the qualities and constraints that govern its operation. Latency, availability, consistency, throughput, and security all live here. They answer the question: "Under what conditions must this system operate, and how well must it do so?"

REQUIREMENTS
     │
     ├── FUNCTIONAL (What?)
     │        ├── Core user actions
     │        ├── Data operations (read/write/delete)
     │        └── Business rules & workflows
     │
     └── NON-FUNCTIONAL (How?)
              ├── Scale (users, requests/sec, data volume)
              ├── Latency targets
              ├── Availability & reliability (SLA)
              ├── Consistency model
              └── Security & compliance

🎯 Key Principle: Always separate functional from non-functional requirements before diving into design. Conflating them leads to architectural confusion — you might over-engineer a simple CRUD feature while under-engineering a latency-sensitive query path.

Why does this separation matter so much? Because functional and non-functional requirements drive fundamentally different design choices. A system that must support 10 users and a system that must support 10 million users might have identical functionality on paper but need completely different architectures underneath. The two-axis model forces you to surface both dimensions explicitly.

We'll explore each axis in depth in the lessons that follow. For now, the key habit to build is this: before you draw a single box on the whiteboard, make sure you've gathered at least three to five clear entries on both axes.


Opening with Clarifying Questions: Users, Use Cases, and Workflows

Before you touch technical constraints, you need to understand who you're building for and what they're trying to accomplish. This is a discipline that separates senior engineers from junior ones: senior engineers know that the right answer to a technical question depends entirely on the human context surrounding it.

Begin every requirements conversation with three grounding questions:

  1. Who are the users? (and are there multiple types?)
  2. What are the primary use cases? (what jobs does this system do for them?)
  3. What does the core workflow look like end-to-end?

Let's make this concrete. Suppose you're given the prompt: "Design a notification system."

Wrong thinking: Jump straight to "We'll need Kafka for the message queue, and Redis for deduplication..."

Correct thinking: "Before I go further, let me clarify a few things. Who sends these notifications — the platform itself, third-party services, or both? Who receives them — end users, internal teams, or other systems? And are we talking about real-time in-app alerts, email, SMS, or all of the above?"

Those three clarifying questions could completely change your design. A notification system for internal ops teams is architecturally different from a consumer push notification system handling 50 million devices. You can't know which you're building until you ask.

💡 Pro Tip: Frame your clarifying questions as curiosity, not interrogation. Say "I want to make sure I'm building the right thing — can I ask a few quick questions about the users and use cases?" Interviewers interpret this as maturity, not stalling.

Once you understand the users and use cases, move to primary workflows — the step-by-step sequences of actions the system must support. Workflows reveal the real data flow and expose hidden requirements that use cases alone miss.

For a ride-sharing system, the core workflow might look like this:

USER WORKFLOW: Rider requests a ride

1. Rider opens app → views map
2. Rider inputs destination → sees fare estimate  
3. Rider confirms request → system matches driver
4. Driver accepts → real-time location updates begin
5. Ride completes → payment processed → rating prompted

HIDDEN REQUIREMENTS SURFACED:
- Real-time location streaming (non-functional: low latency)
- Matching algorithm (functional: driver-rider pairing)
- Payment integration (functional: third-party dependency)
- Rating storage (functional: write after every trip)

Notice what happened: walking through one workflow surfaced five distinct system requirements that a use-case-level description ("riders can request rides") would have buried.
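The happy-path workflow also implies the trip state machine mentioned earlier (requested → matched → in_progress → completed). As a practice exercise, it can be sketched in a few lines of Python; the names `TripState` and `advance` are illustrative, not from any real ride-sharing API:

```python
from enum import Enum

class TripState(Enum):
    REQUESTED = "requested"
    MATCHED = "matched"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"

# Legal happy-path transitions; a real system would also model
# cancellations, timeouts, and driver no-shows.
TRANSITIONS = {
    TripState.REQUESTED: {TripState.MATCHED},
    TripState.MATCHED: {TripState.IN_PROGRESS},
    TripState.IN_PROGRESS: {TripState.COMPLETED},
    TripState.COMPLETED: set(),
}

def advance(current: TripState, target: TripState) -> TripState:
    """Move a trip to `target` only if the transition is legal."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"Illegal transition: {current.name} -> {target.name}")
    return target
```

Writing the states down like this makes a hidden functional requirement explicit: every transition is a write the system must record, and some (matching, completion) trigger side effects like payments and ratings.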


The 'Who, What, at What Scale' Pattern

You need a fast, repeatable pattern for scoping any system in the first two to three minutes of an interview. The Who, What, at What Scale pattern does exactly that. Think of it as a rapid triangulation — three coordinates that together locate the system in design space.

🧠 Mnemonic: W-W-S stands for Who uses it, What must it do, and at What Scale must it operate.

Here's how to apply each dimension:

Who

Identify user types and their distinct interaction patterns. Are there multiple roles (e.g., readers vs. writers, admins vs. end users, internal services vs. external clients)? Different user types often imply different access patterns, which can cascade into different data models and API designs.

What

Narrow the feature scope ruthlessly. In a 45-minute interview, you cannot design all of Facebook. Ask: "If we had to pick the three most critical features for this system, what would they be?" This question is not an admission of ignorance — it's a signal that you understand real engineering involves tradeoffs.

At What Scale

Quantify the scale. Push for numbers. Rough estimates are fine; the goal is to understand the order of magnitude:

  • How many daily active users?
  • What's the read-to-write ratio?
  • How much data do we need to store per year?
  • What are the peak traffic patterns?

Here's a practical Python snippet that illustrates the kind of back-of-the-envelope calculation you should be doing mentally (or verbally) during this phase:

# Back-of-envelope scale estimation
# Example: Estimating storage needs for a photo-sharing service

DAILY_ACTIVE_USERS = 50_000_000       # 50M DAU
PHOTOS_UPLOADED_PER_USER_PER_DAY = 2  # average
AVG_PHOTO_SIZE_MB = 3                 # compressed
RETENTION_YEARS = 5

# Daily storage requirement
daily_photos = DAILY_ACTIVE_USERS * PHOTOS_UPLOADED_PER_USER_PER_DAY
daily_storage_mb = daily_photos * AVG_PHOTO_SIZE_MB
daily_storage_tb = daily_storage_mb / (1024 ** 2)  # Convert MB to TB

# Total storage over retention period
total_storage_tb = daily_storage_tb * 365 * RETENTION_YEARS

print(f"Daily photos uploaded: {daily_photos:,}")
print(f"Daily storage needed: {daily_storage_tb:.1f} TB")
print(f"5-year storage needed: {total_storage_tb:,.0f} TB")

# Output:
# Daily photos uploaded: 100,000,000
# Daily storage needed: 286.1 TB
# 5-year storage needed: 522,137 TB (~500 PB)

This calculation tells you immediately that you're in petabyte territory — which means you need distributed object storage (like S3), a CDN, and a metadata database that can handle hundreds of millions of records. None of those architectural choices are obvious from the prompt alone; they emerge from asking "at what scale?"

💡 Real-World Example: When engineers at Netflix design features, they think in terms of streaming hours per day (billions), not just user counts. The unit of measurement matters as much as the number. Always ask: what's the right unit for this system's load — requests/sec, messages/sec, bytes/sec, or concurrent connections?


Timeboxing the Requirements Phase

Here is one of the most commonly violated rules of system design interviews: candidates spend too long on requirements and run out of time to design.

The requirements phase should last five to eight minutes. No more. This sounds short, but it's actually plenty of time to ask focused questions, capture the answers, and establish a shared scope — if you're using a structured framework rather than meandering through questions as they occur to you.

TYPICAL 45-MINUTE INTERVIEW TIME BUDGET

  0:00 ──── Requirements Gathering ────────── 0:07
  0:07 ──── High-Level Architecture ─────────── 0:15
  0:15 ──── Deep Dive (2-3 components) ───────────────── 0:35
  0:35 ──── Trade-offs & Scaling Discussion ──── 0:43
  0:43 ──── Q&A / Wrap-up ─── 0:45

  Requirements = ~15% of total time

⚠️ Common Mistake: Spending 15-20 minutes in requirements because you're asking redundant questions or diving into implementation details prematurely. If you catch yourself discussing database indexes or caching strategies during requirements gathering, you've drifted. Pull back.

The way to stay within the timebox is to use the W-W-S pattern as a checklist, not a free-form conversation. Ask your "who" questions (one to two minutes), your "what" questions (two to three minutes), and your "scale" questions (one to two minutes), then summarize and move on.

A useful internal signal: when you have enough information to draw the system's top-level components on a whiteboard, you have enough requirements. You don't need to know every edge case before you can sketch a rough architecture.

💡 Pro Tip: Use a soft close to exit the requirements phase gracefully. Say: "Based on what we've discussed, here's what I'm going to design..." and read back a brief summary. This invites the interviewer to correct any misunderstandings before you invest time in a design, and it signals that you're moving forward with intention.


Writing Requirements Down Visibly

This is the most underrated tactic in requirements gathering, and it costs you nothing: write your requirements down on the whiteboard (or shared document) in real time, as the interviewer provides information.

Why does this matter?

First, it signals organization. An interviewer watching you capture requirements in a structured list sees someone who works methodically — a quality they want in a colleague making architectural decisions.

Second, it creates a shared reference. Once requirements are written down, both you and the interviewer are looking at the same thing. This prevents the conversation from drifting and makes it easy to point back to a requirement when justifying a design choice later. ("We said we need 99.9% availability — that's why I'm recommending active-active replication here.")

Third, it reduces cognitive load. Holding five to ten requirements in working memory while also thinking about architecture is hard. Offloading them to the board frees mental bandwidth for the design work ahead.

Here's a template you can use to structure your visible requirements list:

=== REQUIREMENTS ===

FUNCTIONAL
  [F1] Users can upload videos (max 1GB each)
  [F2] Users can stream videos with adaptive bitrate
  [F3] System recommends related videos per session
  [F4] Creators can view basic analytics (views, watch time)

NON-FUNCTIONAL  
  [N1] Scale: 5M DAU, 500K uploads/day, 100M streams/day
  [N2] Latency: <200ms for video start time
  [N3] Availability: 99.99% uptime
  [N4] Storage: ~50 PB total (5-year horizon)
  [N5] Consistency: eventual consistency acceptable for view counts

OUT OF SCOPE
  - Live streaming
  - Monetization / ads
  - Comments and community features

Note the Out of Scope section. Explicitly documenting what you're not building is as valuable as documenting what you are building. It prevents scope creep during the design phase and shows the interviewer that you're making deliberate tradeoffs, not forgetting features.

The labeling system (F1, N1) pays dividends later. When you say "the reason I'm choosing Cassandra here ties back to N1 and N5" — the high scale and eventual consistency tolerance — you've instantly elevated your explanation from anecdote to architecture.

# Pseudocode representation of requirements as a structured object
# This mirrors the mental model you should maintain throughout the interview

requirements = {
    "functional": [
        {"id": "F1", "description": "Users can upload videos up to 1GB"},
        {"id": "F2", "description": "Users can stream with adaptive bitrate"},
        {"id": "F3", "description": "System recommends related videos"},
        {"id": "F4", "description": "Creators view basic analytics"},
    ],
    "non_functional": [
        {"id": "N1", "metric": "scale",        "value": "5M DAU, 100M streams/day"},
        {"id": "N2", "metric": "latency",      "value": "<200ms video start time"},
        {"id": "N3", "metric": "availability", "value": "99.99% uptime"},
        {"id": "N4", "metric": "storage",      "value": "~50 PB over 5 years"},
        {"id": "N5", "metric": "consistency",  "value": "eventual OK for counts"},
    ],
    "out_of_scope": [
        "Live streaming",
        "Monetization and ads",
        "Comments and community"
    ]
}

# During the design phase, always trace your decisions back to these IDs
def justify_decision(component: str, requirement_ids: list[str]) -> str:
    return f"{component} is chosen to satisfy {', '.join(requirement_ids)}"

print(justify_decision("Cassandra", ["N1", "N5"]))
# → "Cassandra is chosen to satisfy N1, N5"


Putting the Framework Together

Here's the full framework assembled into a step-by-step sequence you can memorize and apply:

REQUIREMENTS GATHERING FRAMEWORK (5-8 minutes)

STEP 1 — WHO (1-2 min)
  Ask: Who are the users? Are there multiple user types?
  Capture: User roles and their distinct behaviors

STEP 2 — WHAT (2-3 min)  
  Ask: What are the 3 core features? What can we scope out?
  Capture: Functional requirements list (F1, F2, F3...)

STEP 3 — AT WHAT SCALE (1-2 min)
  Ask: DAU? Read/write ratio? Data volume? Peak patterns?
  Capture: Non-functional requirements list (N1, N2, N3...)

STEP 4 — SUMMARIZE & CONFIRM (30 sec)
  Read back your requirements list.
  Ask: "Does this match what you had in mind?"

STEP 5 — TRANSITION
  "Great — let me start with a high-level architecture."

📋 Quick Reference Card:

🎯 Phase     🔧 Key Questions                         📚 Output
🧑 Who       Who uses it? Multiple roles?             User type list
⚙️ What      Core features? What's out of scope?      Functional req list
📈 Scale     DAU? RPS? Data size? Latency target?     Non-functional req list
✅ Confirm   Does this match your intent?             Shared alignment

🤔 Did you know? Studies of software project failures consistently identify unclear or missing requirements as a top cause of overruns and rework. The same dynamic plays out in miniature in every system design interview — candidates who skip requirements gathering often paint themselves into an architectural corner they have to awkwardly redesign mid-interview.

The framework above isn't magic — it's a scaffold. Your questions will vary with every problem, and experienced interviewers appreciate when you adapt intelligently rather than robotically reciting a checklist. But the underlying structure — who, what, scale, confirm — gives you a reliable starting point that keeps you oriented when the problem feels overwhelming.

⚠️ Common Mistake: Treating the requirements phase as a formality to rush through before the "real" work begins. Requirements gathering is real work. It's the work of understanding the problem before solving it. Interviewers evaluate your judgment throughout this phase — how precisely you ask questions, how quickly you synthesize answers, and how well you separate essential from incidental complexity.

By the time you leave the requirements phase, you and the interviewer should be looking at the same clearly labeled list of functional and non-functional requirements, with out-of-scope items explicitly noted. That list becomes the contract your design must satisfy — and the foundation you'll refer back to every time you make an architectural tradeoff in the minutes that follow.

Applying the Framework: Live Walkthroughs with Example Problems

Knowing a framework in theory is one thing. Watching it breathe in a real interview is another. In this section we put the requirements gathering framework to work on two contrasting prompts — a ride-sharing service and a distributed key-value store — so you can see how the same disciplined questioning process produces radically different but precisely right scoping decisions for each problem. Pay close attention not just to which questions get asked, but to how each answer from the interviewer unlocks the next question. Requirements gathering is a conversation, not a checklist.


Walkthrough One: Designing a Ride-Sharing Service Like Uber

The interviewer opens with the classic prompt:

"Design a ride-sharing service like Uber."

Nine words. They describe an entire company. Before you write a single box on a whiteboard, your job is to turn that sentence into a blueprint with real constraints. Here is how the conversation unfolds using the structured framework.

Step 1 — Clarify the Core Use Case

Your first move is to identify which slice of the product you are actually designing. Uber contains dozens of systems: driver onboarding, payment processing, fraud detection, maps rendering, surge pricing, customer support tooling, and more.

You: "When you say ride-sharing service, should I focus on the core booking and matching flow — rider requests a ride, driver accepts, both track the trip in real-time — or do you also want me to cover payment, driver onboarding, and incentive systems?"

Interviewer: "Focus on the core trip lifecycle: requesting, matching, and real-time tracking. Payments you can treat as a black box."

Now you have a bounded problem. You have not wasted twenty minutes designing a billing system they do not care about.

Step 2 — Probe for Scale

Scale is the single most design-altering dimension in system design. An Uber-sized system and a regional taxi app sharing the same name have almost nothing in common architecturally.

You: "What scale are we targeting? Should I think about Uber's actual global scale, or a startup at launch — say a single city?"

Interviewer: "Think Uber-scale. Roughly 20 million trips per day globally."

Now you can do back-of-envelope estimation in your head:

20,000,000 trips/day
÷ 86,400 seconds/day
≈ 231 trips/second (average)

Peak factor ~5x → ~1,150 trips/second at peak

Driver location updates (every 5 seconds per active driver):
  ~5 million active drivers at peak
  → 1,000,000 location writes/second

This single number — a million location writes per second — instantly tells you that a naive relational database will collapse and that you need a purpose-built geospatial write-heavy store. You would not have known that without asking.
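The same arithmetic is worth being able to run mentally, but as a practice aid it can be written as a few lines of Python. The peak factor and driver count are the assumptions stated above, not numbers the interviewer gave:

```python
TRIPS_PER_DAY = 20_000_000
SECONDS_PER_DAY = 86_400
PEAK_FACTOR = 5                      # assumed rush-hour multiplier

avg_tps = TRIPS_PER_DAY / SECONDS_PER_DAY   # ~231 trips/sec average
peak_tps = avg_tps * PEAK_FACTOR            # ~1,157 trips/sec at peak

ACTIVE_DRIVERS_AT_PEAK = 5_000_000          # assumed
LOCATION_UPDATE_INTERVAL_S = 5
location_writes_per_sec = ACTIVE_DRIVERS_AT_PEAK / LOCATION_UPDATE_INTERVAL_S

print(f"Average trips/sec: {avg_tps:.0f}")               # 231
print(f"Peak trips/sec: {peak_tps:.0f}")                 # 1157
print(f"Location writes/sec: {location_writes_per_sec:,.0f}")  # 1,000,000
```

The trip rate is modest; it is the location-update stream, two to three orders of magnitude larger, that dominates the write path.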

Step 3 — Surface Real-Time Constraints

You: "How fresh does the driver's location need to be on the rider's screen? And how quickly does matching need to happen after a rider requests?"

Interviewer: "Location should feel live — maybe a 3–5 second update interval is fine. Matching should happen within 30 seconds ideally, though a few seconds either way is acceptable."

This answer surfaces two critical latency requirements: a soft real-time constraint on location (periodic polling or WebSockets acceptable) and a near-real-time constraint on matching (seconds, not milliseconds — not a hard real-time system). You now know you do not need sub-millisecond infrastructure, but you absolutely cannot batch matching in five-minute jobs.

Step 4 — Establish Geographic Scope and Multi-Region Needs

You: "Is this a single-region deployment, or do we need to support multiple geographic regions with data residency requirements?"

Interviewer: "Global. Assume we operate in 70 countries. But data residency and compliance details can be simplified — don't go deep on GDPR."

Now you know you need a multi-region architecture with at least some geographic routing, but you can park regulatory complexity.

Step 5 — Confirm Non-Functional Requirements

You: "What are the availability expectations? Is it acceptable for the service to have brief downtime, or are we targeting five-nines?"

Interviewer: "High availability is important — 99.9% is fine for this discussion. We don't need to over-engineer for five-nines today."

💡 Pro Tip: 99.9% availability = ~8.7 hours of downtime per year. 99.999% = ~5 minutes. Asking this question saves you from designing an astronomically complex consensus system when a simpler one will do.
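Those downtime figures follow directly from the availability percentage; a quick helper (illustrative, the kind of thing you might jot in notes) makes the conversion explicit:

```python
def downtime_per_year(availability: float) -> float:
    """Allowed downtime in hours per year for a given availability fraction."""
    hours_per_year = 365 * 24
    return (1 - availability) * hours_per_year

print(f"99.9%   -> {downtime_per_year(0.999):.2f} hours/year")        # 8.76
print(f"99.99%  -> {downtime_per_year(0.9999) * 60:.1f} minutes/year")  # 52.6
print(f"99.999% -> {downtime_per_year(0.99999) * 60:.2f} minutes/year") # 5.26
```

Each extra nine cuts the downtime budget by a factor of ten, which is why the cost and complexity of the architecture grow so sharply as the SLA tightens.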

Step 6 — Translate Everything into a Written Summary

Before drawing a single line, state your understanding out loud and write it in a corner of the whiteboard. This is your requirements summary statement.

=== Ride-Sharing Requirements Summary ===
Scope        : Trip lifecycle (request, match, real-time tracking)
Scale        : 20M trips/day; ~1M driver location writes/second at peak
Latency      : Location update ≤ 5s; match completion ≤ 30s
Geo          : Global (70 countries), multi-region, GDPR out of scope
Availability : 99.9% uptime target
Out of scope : Payments, driver onboarding, incentives, support tools

Then you say: "Does this capture what you're looking for?" The interviewer confirms or corrects. Now you design.

Here is a simplified ASCII map of what the requirements gathering conversation just determined:

Prompt: "Design Uber"
         |
         v
 +-------------------+
 | Clarify Use Case  |  --> Trip lifecycle only
 +-------------------+
         |
         v
 +-------------------+
 |  Probe for Scale  |  --> 20M trips/day, 1M loc writes/sec
 +-------------------+
         |
         v
 +-------------------+
 | Real-Time Needs   |  --> 5s location, 30s match
 +-------------------+
         |
         v
 +-------------------+
 | Geographic Scope  |  --> Global, multi-region
 +-------------------+
         |
         v
 +-------------------+
 | Availability SLA  |  --> 99.9%
 +-------------------+
         |
         v
 [Requirements Summary Statement]
         |
         v
   BEGIN DESIGN ✓

Walkthrough Two: Designing a Distributed Key-Value Store

Now the interviewer shifts gears entirely:

"Design a distributed key-value store."

This prompt is deceptively different from the Uber prompt. It describes infrastructure, not a user-facing product. The questions you ask — and the design decisions that hinge on the answers — are almost entirely different. Watch how the framework adapts.

Step 1 — Identify the Primary Access Pattern

A key-value store can be read-heavy (like a cache), write-heavy (like a logging sink), or balanced. This matters more than almost any other dimension.

You: "Is this store read-heavy, write-heavy, or roughly balanced? And what's the expected read-to-write ratio?"

Interviewer: "Mostly read-heavy. Think something like 80% reads, 20% writes."

An 80/20 read-to-write split immediately suggests that read replicas, eventual consistency, and caching layers are reasonable design choices. If the answer had been 90% writes, you would be optimizing for write throughput and thinking about LSM trees and write-ahead logs instead.
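To see why a read-heavy split points toward caching, it helps to put rough numbers on it. The 100K reads/sec figure matches the sizing used later in this walkthrough; the 90% cache hit rate is an illustrative assumption, not something the interviewer stated:

```python
total_qps = 100_000
read_fraction = 0.8           # the stated 80/20 split
cache_hit_rate = 0.9          # assumed for a hot, read-heavy working set

reads = total_qps * read_fraction                     # 80,000 reads/sec
writes = total_qps - reads                            # 20,000 writes/sec
reads_reaching_storage = reads * (1 - cache_hit_rate) # ~8,000 reads/sec

print(f"Reads absorbed by cache: {reads * cache_hit_rate:,.0f}/sec")
print(f"Reads reaching storage:  {reads_reaching_storage:,.0f}/sec")
print(f"Writes reaching storage: {writes:,.0f}/sec")
```

With those assumptions, a cache layer shrinks the storage tier's read load by an order of magnitude, which is exactly the kind of leverage an 80/20 ratio buys you.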

Step 2 — Uncover Consistency Requirements

Consistency is the central battleground of distributed systems design, and this is where many candidates make their biggest mistake: they assume strong consistency without asking.

You: "What consistency model does the client expect? Strong consistency — where every read sees the latest write — or is eventual consistency acceptable, where replicas may lag briefly?"

Interviewer: "Eventual consistency is fine. The use case is configuration data for microservices. A few seconds of lag is acceptable."

This single answer unlocks a massive simplification. You can now use a leaderless replication model with quorum reads/writes or a leader-based model with async replication — both far simpler to operate than a linearizable consensus system. If the interviewer had said "strong consistency, it's for financial records," your entire architecture would need to shift toward Raft or Paxos-based consensus.
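The quorum option can be stated precisely: with N replicas, a read quorum of R nodes, and a write quorum of W nodes, every read is guaranteed to overlap the latest acknowledged write only when R + W > N. A one-line check captures the rule:

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True if every read quorum intersects every write quorum (R + W > N)."""
    return r + w > n

# Common Dynamo-style configuration: N=3, W=2, R=2
print(quorums_overlap(3, 2, 2))  # True  -> reads see the latest acked write
print(quorums_overlap(3, 1, 1))  # False -> eventual consistency only
```

Because the interviewer accepted eventual consistency, even the weaker R=1, W=1 configuration is on the table, trading staleness for lower latency.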

🎯 Key Principle: The consistency model is the single most architecture-altering requirement in a distributed storage system. Never assume it — always ask.

Step 3 — Establish Data Size and Retention

You: "What's the expected size of individual values? Small blobs like JSON config, or potentially large objects like images or videos?"

Interviewer: "Small values. Keys are strings up to 256 bytes. Values are JSON blobs, typically under 1 KB, occasionally up to 1 MB."

You: "And how much data in total? Is this terabytes, petabytes?"

Interviewer: "Assume 1 TB of data at steady state, growing slowly."

Now you can size your storage nodes. 1 TB of data is very manageable — you might fit it on three or four commodity nodes with replication. If the answer had been 1 petabyte, you would need to think about sharding strategies from the outset.

# Quick sizing estimation you might jot in notes during the interview
import math

data_size_tb = 1        # total dataset
replication_factor = 3  # standard for fault tolerance
node_storage_tb = 2     # usable storage per node (with OS overhead)

nodes_needed = math.ceil(data_size_tb * replication_factor / node_storage_tb)
print(f"Minimum storage nodes: {nodes_needed} nodes")
# Output: Minimum storage nodes: 2 nodes
# (In practice, you'd add capacity buffer → 5-6 nodes)

# Read throughput estimation
read_qps = 100_000      # 100K reads/second
read_per_node = 20_000  # conservative single-node read capacity
nodes_for_reads = math.ceil(read_qps / read_per_node)
print(f"Nodes needed for read throughput: {nodes_for_reads} nodes")
# Output: Nodes needed for read throughput: 5 nodes
# Read throughput is the binding constraint here, not storage

This snippet mirrors the kind of rough estimation you would do mentally or on a notepad during an interview. It reveals that read throughput, not storage capacity, is the binding constraint — a conclusion you reach only because you asked the right questions.

Step 4 — Ask About Failure Tolerance and Durability

You: "What are the durability requirements? If a node crashes mid-write, is data loss acceptable — like a cache — or must every write be persisted?"

Interviewer: "Durability is important. We can't lose committed writes."

You: "And availability expectations? Should the store remain available if a minority of nodes fail?"

Interviewer: "Yes, it should tolerate node failures. Assume we're fine with a few seconds of degraded performance but not full unavailability."

This tells you you need write-ahead logging (WAL) for durability and a replication strategy that ensures availability during partial failures — classic AP system territory on the CAP theorem spectrum, consistent with the eventual consistency answer from Step 2.
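The WAL discipline is simple to state: persist the write to an append-only log, force it to disk, and only then apply it and acknowledge the client. A minimal sketch (a toy single-node illustration, not production code; `TinyKVStore` is a hypothetical name):

```python
import os

class TinyKVStore:
    """Sketch of WAL discipline: persist the write before acknowledging it."""

    def __init__(self, wal_path: str):
        self.wal_path = wal_path
        self.data: dict[str, str] = {}

    def set(self, key: str, value: str) -> None:
        # 1. Append to the write-ahead log and force it to disk...
        with open(self.wal_path, "a") as wal:
            wal.write(f"SET {key} {value}\n")
            wal.flush()
            os.fsync(wal.fileno())
        # 2. ...only then apply in memory and acknowledge the client.
        self.data[key] = value

    def recover(self) -> None:
        """Replay the log after a crash to rebuild committed state."""
        self.data.clear()
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path) as wal:
            for line in wal:
                _, key, value = line.rstrip("\n").split(" ", 2)
                self.data[key] = value
```

Because the log is flushed before the acknowledgment, any write the client saw succeed can be rebuilt by replaying the log, which is exactly the "no lost committed writes" guarantee the interviewer asked for.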

Step 5 — Surface Operational and API Requirements

You: "What operations does the API need to support? Just get and set, or also atomic operations like compare-and-swap, transactions, or TTL-based expiration?"

Interviewer: "Basic get, set, delete. TTL support would be nice. No transactions needed."

TTL (time-to-live) expiration is a small but consequential requirement — it means your store needs a background garbage collection or expiration sweep mechanism. Without asking, you might design a store that holds data forever and surprises the team when disk fills up.
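A toy sketch of how TTL support changes the store's shape (the names and the lazy-expiration-plus-sweep split are illustrative assumptions, not a prescribed design):

```python
import time

class TTLStore:
    """Sketch: lazy expiration on read, plus a sweep you would
    run periodically in the background to reclaim disk."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at or None)

    def set(self, key, value, ttl_seconds=None):
        expires_at = time.monotonic() + ttl_seconds if ttl_seconds else None
        self._data[key] = (value, expires_at)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]  # lazy expiration: reclaim on access
            return None
        return value

    def sweep(self):
        """Background pass reclaiming expired keys nobody reads."""
        now = time.monotonic()
        for key in [k for k, (_, exp) in self._data.items()
                    if exp is not None and now >= exp]:
            del self._data[key]

store = TTLStore()
store.set("session", "abc", ttl_seconds=0.01)
store.set("config", "v1")  # no TTL: lives until deleted
time.sleep(0.05)
store.sweep()
print(store.get("session"))  # → None (expired and swept)
print(store.get("config"))   # → v1
```

Lazy expiration alone is not enough — keys that are never read again would never be reclaimed — which is why the sweep (or an equivalent background mechanism) is the real design consequence of the TTL requirement.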

Step 6 — Write the Summary Statement

=== Key-Value Store Requirements Summary ===
Access       : Read-heavy (80R / 20W); ~100K reads/sec
Values       : Small (≤ 1KB typical, ≤ 1MB max); keys ≤ 256 bytes
Data Size    : 1 TB steady state, slow growth
Consistency  : Eventual consistency acceptable
Durability   : No data loss on commit; WAL required
Availability : Tolerates minority node failure; brief degradation OK
API          : GET, SET, DELETE + TTL expiration
Out of Scope : Transactions, strong consistency, multi-datacenter sync

💡 Pro Tip: Notice that the key-value store summary mentions almost nothing about real-time constraints or geography — the dimensions that dominated the Uber summary. This is the proof that requirements gathering is not a rigid checklist. It is a dialogue that surfaces what this specific system actually needs.


Requirements Gathering as Dialogue, Not Monologue

Compare the two walkthroughs above and you will notice something important: the questions follow the same framework categories (use case, scale, consistency, availability, operations), but the pivot logic after each answer is completely different.

In the Uber walkthrough, when the interviewer said "20 million trips per day," you did not continue asking abstract questions. You computed a derived requirement — a million location writes per second — and used that to inform your next question about geographic distribution. The interviewer's answer became fuel for the follow-up.
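That derivation can be jotted as two lines of arithmetic. The peak-hour share and ping interval below are illustrative assumptions chosen to show how 20 million trips per day can imply on the order of a million location writes per second:

```python
# Back-of-envelope derivation (inputs are illustrative assumptions):
# the prompt gives trips/day; the location write rate must be derived.
trips_per_day = 20_000_000
peak_hour_share = 0.10  # assumption: 10% of daily trips hit the peak hour
ping_interval_s = 2     # assumption: each active driver pings every 2s

drivers_active_at_peak = int(trips_per_day * peak_hour_share)  # ~1 trip ≈ 1 active driver
location_writes_per_sec = drivers_active_at_peak // ping_interval_s
print(f"Peak location writes/sec: ~{location_writes_per_sec:,}")
## Output: Peak location writes/sec: ~1,000,000
```

The exact numbers matter less than the move itself: turning a stated business number into a derived technical requirement, out loud, before asking the next question.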

In the key-value walkthrough, when the interviewer said "eventual consistency is fine," that single answer closed an entire branch of questions (you no longer needed to ask about consensus group sizes, leader election timeouts, or linearizability guarantees) and opened another (you now asked about failure tolerance in the AP context).

This branching, responsive quality is what separates strong candidates from average ones. Here is a simplified model of how answer-driven pivoting works:

       Ask base question
              |
      +-------+-------+
      |               |
  Answer A         Answer B
  (e.g.,           (e.g.,
  strong            eventual
  consistency)      consistency)
      |               |
      v               v
 Follow-up:       Follow-up:
 "What is your    "How many seconds
  tolerance for    of lag are
  stale reads?"    acceptable?"
      |               |
      v               v
  Leads to          Leads to
  consensus         replication
  design            factor design

⚠️ Common Mistake: Treating requirements gathering as a fixed script you recite regardless of the interviewer's answers. If the interviewer tells you it is a single-city deployment, asking about multi-region data residency is wasted time that signals you are not listening.


Recognizing When You Have Enough to Proceed

One of the subtlest skills in this phase is knowing when to stop asking and start designing. There is no exact formula, but here is a practical test you can apply:

The Four-Quadrant Check — before moving to design, verify you can answer at least one concrete question in each quadrant:

────────────────────────────────────────────────────────────────
Quadrant                    Example question you should answer
────────────────────────────────────────────────────────────────
🎯 Functional               What are the 2–3 core operations the
                            system must support?
📊 Scale                    What is the peak request rate or data
                            volume?
⚡ Latency/Consistency      What is the acceptable response time
                            or consistency model?
🔒 Availability/Durability  What is the uptime SLA and data loss
                            tolerance?
────────────────────────────────────────────────────────────────

If any quadrant is completely blank, you need at least one more question. If all four have rough answers, you have enough to begin — even if those answers are estimates or assumptions.

💡 Mental Model: Think of yourself as a pilot doing a pre-flight checklist. You do not need to inspect every rivet, but you cannot take off with an unknown fuel level. The four quadrants are your minimum-safe checklist.

Here is a practical code-style representation of the decision logic you run in your head:

def ready_to_design(requirements: dict) -> bool:
    """
    Returns True when minimum requirements are established.
    This mirrors the mental checklist you run before leaving
    the requirements phase in an interview.
    """
    required_quadrants = [
        "functional_scope",    # What does the system do?
        "scale_estimate",      # How much load?
        "latency_or_consistency",  # How fast or how consistent?
        "availability_durability",  # How reliable?
    ]
    
    for quadrant in required_quadrants:
        if quadrant not in requirements or requirements[quadrant] is None:
            print(f"⚠️  Missing: {quadrant} — ask one more question")
            return False
    
    print("✅ Minimum requirements established — begin design")
    return True

## Ride-sharing example after walkthrough:
ride_sharing_reqs = {
    "functional_scope": "Trip lifecycle: request, match, track",
    "scale_estimate": "20M trips/day, 1M location writes/sec peak",
    "latency_or_consistency": "Location 5s; match <30s",
    "availability_durability": "99.9% uptime, durability not critical for location",
}

ready_to_design(ride_sharing_reqs)
## Output: ✅ Minimum requirements established — begin design

The function is deliberately simple — the point is not the code, it is the mental model. Run this check before you transition. If any quadrant returns a warning, ask the targeted follow-up. Then proceed.

🤔 Did you know? Studies of senior engineers in technical interviews show that the highest-rated candidates spend on average 5–8 minutes on requirements before touching the design — roughly 15–25% of a 35-minute system design session. Candidates who spend less than 2 minutes are rated significantly lower on "problem understanding," even when their designs are technically sound.


Putting Both Walkthroughs Side by Side

📋 Quick Reference Card: Uber vs. Key-Value Store Requirements Comparison

──────────────────────────────────────────────────────────────────
Dimension         🚗 Ride-Sharing            🗄️ Key-Value Store
──────────────────────────────────────────────────────────────────
🎯 Core use case  Trip lifecycle             GET / SET / DELETE + TTL
📊 Scale signal   1M location writes/sec     100K reads/sec, 1 TB data
⚡ Latency focus  5s location, 30s match     Read latency < 10ms
🔄 Consistency    Real-time match critical   Eventual consistency fine
🌍 Geography      Global, multi-region       Single region
🔒 Durability     Location loss OK;          No committed write loss
                  match critical
❌ Out of scope   Payments, onboarding       Transactions, strong
                                             consistency
──────────────────────────────────────────────────────────────────

This table crystallizes the key lesson: the same framework, applied to two different prompts, produces two completely different requirement profiles — and therefore two completely different designs. The framework is not a cookie-cutter template; it is a lens that focuses your questions on what actually matters for the system in front of you.


A Note on Interviewer Engagement During This Phase

Not every interviewer will give crisp, detailed answers. Some will say "use your best judgment" or "make reasonable assumptions." This is not a trap — it is a gift. When an interviewer defers to you, you state your assumption explicitly and move on:

"I'll assume we're targeting Uber's global scale — about 20 million trips per day — and design accordingly. If that's off, we can revisit. Does that sound reasonable?"

This demonstrates ownership, confidence, and the ability to drive a technical discussion — exactly the qualities a senior engineer must show. The worst response to "use your best judgment" is a long silence followed by a timid question. Name your assumption, validate it quickly, and proceed.

Correct thinking: State the assumption confidently, get a nod or correction, and move forward. Requirements gathering is complete when both you and the interviewer share a common understanding — written down — of what is being built.

Wrong thinking: Interpreting "use your best judgment" as permission to skip requirements entirely and jump straight to drawing boxes. Without an explicit summary, you and the interviewer may be imagining entirely different systems, and you will not discover that misalignment until it is too late.

When you have your written summary, have confirmed it with the interviewer, and have checked all four quadrants, you are ready to move into design. The work you did in these five to eight minutes will pay dividends for the rest of the session — every architectural decision you make will be anchored to a real constraint, not a guess.

Common Requirements Gathering Mistakes and How to Avoid Them

Even candidates who understand the importance of requirements gathering—and who have studied frameworks for doing it well—still fall into a predictable set of traps. These mistakes are not random. They cluster around a handful of recurring patterns, and once you can name them, you can catch yourself before they derail your interview. This section catalogs the five most damaging errors, explains the subtle psychology behind each one, and gives you concrete corrective techniques you can practice today.

Think of this section as a quality-control checklist. After you learn a requirements gathering framework, you need a second layer of awareness: the ability to audit your own questioning behavior in real time and steer away from the potholes that swallow otherwise strong candidates.


Mistake 1: Over-Questioning ⚠️

Over-questioning is the trap of asking so many questions—including low-value, obvious, or redundant ones—that the interviewer begins to perceive you as indecisive, anxious, or incapable of making judgment calls. Ironically, candidates who over-question are usually trying to demonstrate thoroughness. The effect is the opposite.

Imagine you are asked: "Design a URL shortener." An over-questioning candidate might ask:

  • "Should it support HTTPS?"
  • "Do users need to log in?"
  • "Should the short codes be exactly 6 characters?"
  • "What database should I use?"
  • "Should I use SQL or NoSQL?"
  • "Do you want me to design the front end too?"
  • "Should the links expire?"
  • "How many engineers will maintain this?"

Some of these questions have value. Many do not. Asking whether the system should support HTTPS in 2024 is like asking whether a car needs wheels. Asking which database to use is not a requirements question at all—it is a solution question that you should answer yourself based on the requirements. Asking about team size is irrelevant to system behavior.

The core problem with over-questioning is signal dilution. Every question you ask is a signal. High-value questions signal that you understand what matters. Low-value questions drown out your good questions and suggest you cannot distinguish important from trivial.

Over-questioning signal pattern:

[Good Q] → [Bad Q] → [Bad Q] → [Good Q] → [Bad Q]
                ↓
  Interviewer perception: "This person asks a lot."
  Good questions get averaged down.

Focused questioning signal pattern:

[Good Q] → [Good Q] → [Good Q]
                ↓
  Interviewer perception: "This person asks the right things."

The Corrective Technique: Filter Before You Ask

Before voicing a question, run it through a two-second mental filter:

  1. Would the answer change my design in a meaningful way? If yes, ask. If no, make a reasonable assumption and state it aloud.
  2. Is this a requirements question or a solution question? Requirements questions clarify what the system must do or handle. Solution questions are ones you should answer yourself.
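The filter can be made concrete by hand-labeling each draft question against the two tests. The labels themselves are the judgment call you are practicing; the example questions echo the URL-shortener list above:

```python
# Illustrative sketch of the two-second filter. Each candidate question
# is tagged with the two judgments you make before voicing it.
candidates = [
    {"q": "What is the read/write ratio?",
     "changes_design": True,  "about_solution": False},
    {"q": "Should it support HTTPS?",
     "changes_design": False, "about_solution": False},  # obvious: assume it
    {"q": "Should I use SQL or NoSQL?",
     "changes_design": True,  "about_solution": True},   # your job to answer
]

def worth_asking(c):
    # Ask only if the answer would change the design AND it is a
    # requirements question rather than a solution question.
    return c["changes_design"] and not c["about_solution"]

to_ask = [c["q"] for c in candidates if worth_asking(c)]
print(to_ask)
## Output: ['What is the read/write ratio?']
```

Two of the three candidates fail the filter for different reasons — one is obvious enough to state as an assumption, the other is a solution decision you should own yourself.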

💡 Pro Tip: Aim for 4–7 high-signal questions per requirements phase. If you find yourself drafting a tenth question, stop and assess whether you are circling around anxiety rather than genuine ambiguity.

🎯 Key Principle: Stating a reasonable assumption confidently is often more impressive than asking a question about it. "I'll assume short links expire after 30 days by default, since that's a common default—let me know if you'd like to adjust that" demonstrates judgment. Asking "Should links expire?" in isolation does not.


Mistake 2: Under-Questioning ⚠️

The mirror image of over-questioning, under-questioning occurs when a candidate makes large, unstated assumptions and proceeds directly into design without validating them with the interviewer. This is far more dangerous than over-questioning, because the damage is invisible until it is catastrophic.

A candidate who under-questions might hear "Design a notification system" and immediately begin drawing a message queue architecture for push notifications—never having confirmed whether the system needs to handle email, SMS, in-app alerts, or all three. Fifteen minutes later, when the interviewer asks "How would this handle SMS rate limiting from carriers?", the candidate realizes their entire design assumed push notifications only.

❌ Wrong thinking: "I don't want to waste time on questions. I'll just start designing and adjust if needed."

✅ Correct thinking: "Two minutes of well-chosen questions prevents twenty minutes of redesign and signals that I think before I build."

The insidious thing about under-questioning is that candidates who do it often feel confident. They mistake speed for competence. In a real engineering role, barreling into implementation without requirements alignment is how teams build the wrong thing for months.

The Corrective Technique: The MECE Question Audit

MECE (Mutually Exclusive, Collectively Exhaustive) is a thinking tool from consulting that applies beautifully here. Before you stop asking questions, mentally check that your questions have covered each major dimension of the system without overlapping each other:

MECE Requirements Coverage Check
─────────────────────────────────────────────────────
Dimension           Covered?   Example question asked
─────────────────────────────────────────────────────
Core functionality  ✅         What are the primary use cases?
Scale / traffic     ✅         What's the expected request volume?
User context        ✅         Who are the users and where are they?
Consistency / SLA   ❌         (forgot to ask!)
Data retention      ✅         How long must data be stored?
─────────────────────────────────────────────────────
→ Gap identified: add one SLA question before proceeding.

This audit takes about 15 seconds once internalized. Run it mentally after your initial questions and before you summarize requirements.
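One way to sketch the audit, assuming you tag each question you asked with the dimension it covers (the dimension names mirror the table above):

```python
# MECE coverage audit sketch: map asked questions to dimensions,
# then report any dimension with no question at all.
DIMENSIONS = ["core_functionality", "scale_traffic", "user_context",
              "consistency_sla", "data_retention"]

asked = {
    "What are the primary use cases?": "core_functionality",
    "What's the expected request volume?": "scale_traffic",
    "Who are the users and where are they?": "user_context",
    "How long must data be stored?": "data_retention",
}

covered = set(asked.values())
gaps = [d for d in DIMENSIONS if d not in covered]
print(f"Gaps: {gaps}")
## Output: Gaps: ['consistency_sla']
```

A non-empty gap list is your cue to ask exactly one more targeted question, not to restart the interrogation.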

💡 Mental Model: Think of requirements as the foundation of a building. You would not build on a foundation with a gap in it just because the rest looks solid. One missing dimension can make the whole structure unstable.


Mistake 3: Asking Yes/No Questions Instead of Open-Ended Discovery Questions ⚠️

Closed questions—those that can be answered with yes, no, or a single number—are efficient in daily conversation but deeply limiting in a system design interview. They constrain the interviewer's response to a binary or scalar value, stripping away context, nuance, and the richer information that helps you make better design decisions.

Consider the difference:

❌ Closed: "Do we need high availability?"
✅ Open: "What are the availability expectations, and what happens to the business if the system is down for an hour?"

❌ Closed: "Is this read-heavy?"
✅ Open: "How do read and write patterns compare, and are there spikes at particular times?"

❌ Closed: "Should we support mobile?"
✅ Open: "Who are the primary users, and how do they typically access the system?"

❌ Closed: "Do we need search?"
✅ Open: "How do users find content within the system?"

The closed question about high availability will get you a "yes" and nothing more. The open-ended version might surface that the system is internal-only, that downtime of less than 4 hours is acceptable, and that there's a nightly batch job that must complete before market open—all information that shapes your architecture.

🤔 Did you know? Research on interview techniques consistently shows that open-ended questions yield 3–5x more usable information than closed questions on the same topic. The same principle applies when you are the interviewer of your system's requirements.

The Corrective Technique: Start with "What" and "How"

The simplest way to avoid yes/no questions is to begin with "What" or "How" instead of "Do", "Is", "Should", or "Can".

Question Rewriting Formula

Closed pattern:    [Do/Is/Should/Can] + [subject] + [predicate]?
Open pattern:      [What/How] + [subject] + [context/constraint]?

Examples:

❌  "Should the system be globally distributed?"
✅  "What does the geographic distribution of users look like,
     and are there regulatory constraints on where data lives?"

❌  "Do we need real-time updates?"
✅  "How quickly does data need to reach the end user after
     an event occurs, and what's the impact of a 30-second delay?"

❌  "Is consistency more important than availability?"
✅  "What happens to a user or the business if they see
     stale data for a few seconds? Is there a scenario where
     that's unacceptable?"

Notice that the open-ended versions do not just swap one word. They invite the interviewer to share business context, which is what separates a requirements conversation from a trivia quiz.

💡 Pro Tip: When you ask an open-ended question and get a partial answer, you can follow up with a brief clarifying closed question. Use closed questions as follow-ups, not as openers.
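The opener rule can even be mechanized as a rough heuristic. This sketch only inspects the first word, so it is a first-pass filter for self-review, not a real classifier:

```python
# Heuristic: flag draft questions that open with a closed-question verb.
CLOSED_OPENERS = ("do ", "is ", "should ", "can ", "does ", "are ", "will ")

def is_closed(question: str) -> bool:
    return question.strip().lower().startswith(CLOSED_OPENERS)

drafts = [
    "Do we need real-time updates?",
    "How quickly does data need to reach the end user?",
    "Should the system be globally distributed?",
    "What does the geographic distribution of users look like?",
]

for q in drafts:
    label = "CLOSED, rewrite with What/How" if is_closed(q) else "open"
    print(f"[{label}] {q}")
```

Running your practice-session question list through a check like this is a quick way to see how often you default to yes/no phrasing.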


Mistake 4: Conflating Requirements with Solutions ⚠️

This mistake is subtler than the others and often goes unnoticed by the candidate even as it unfolds. Requirements describe what the system must accomplish—behaviors, constraints, and outcomes from the user's perspective. Solutions describe how the system will accomplish them—technologies, algorithms, and architectural patterns.

When candidates conflate the two, they start describing implementation details during the requirements phase, before they have enough information to justify those decisions. This is problematic for two reasons. First, it closes off better alternatives prematurely. Second, it signals to the interviewer that the candidate does not separate problem space from solution space—a critical engineering discipline.

❌ Wrong thinking (during requirements): "We'll need Kafka for the event streaming and a Redis cache in front of the database."

✅ Correct thinking (during requirements): "We need to understand how many events the system generates per second and what latency is acceptable before a consumer processes them."

Here is a practical illustration. Suppose you are designing a ride-sharing system and you say during requirements: "We'll use WebSockets to push driver location updates to riders." You have jumped to a solution (WebSockets) before establishing the requirement (how frequently location must update, and how many concurrent riders are watching a given driver). Maybe polling every 2 seconds is sufficient. Maybe the interviewer was planning to discuss this trade-off with you as a key design decision. By announcing the solution in the requirements phase, you short-circuit both possibilities.

Requirement vs. Solution: A Diagnostic Framework

Ask yourself: "Does this statement describe WHAT the system
does, or HOW it does it?"

                    +--------------------+
       REQUIREMENT  |  The system must   |  → Belongs in requirements phase
                    |  deliver messages  |
                    |  within 1 second   |
                    |  of being sent.    |
                    +--------------------+
                             |
                             ↓
                    +--------------------+
          SOLUTION  |  We'll use Kafka   |  → Belongs in design phase
                    |  with consumer     |
                    |  groups and a      |
                    |  Redis TTL cache.  |
                    +--------------------+

If your "requirement" sentence includes a product name,
a protocol, or an algorithm, it is probably a solution.

The Corrective Technique: The "So That" Test

If you catch yourself describing a technology or mechanism during requirements, apply the "So That" test: restate the requirement in the form "The system must [behavior] so that [user or business outcome]." If you cannot complete the sentence naturally, you are probably describing a solution, not a requirement.

  • ❌ "We need a CDN." → Cannot complete the test cleanly.
  • ✅ "The system must serve static assets with low latency so that users in Asia experience response times under 200ms." → Clean requirement. CDN is a potential solution to it.
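Both checks can be sketched as a crude linter for draft requirement sentences. The technology word list is an illustrative sample, not exhaustive:

```python
# Two quick checks on a draft "requirement": does it name a
# technology, and does it state an outcome via "so that"?
SOLUTION_WORDS = {"kafka", "redis", "cdn", "websocket", "postgres", "cassandra"}

def requirement_smells(statement: str) -> list:
    words = set(statement.lower().replace(".", "").split())
    smells = []
    if words & SOLUTION_WORDS:
        smells.append("names a technology, likely a solution")
    if "so that" not in statement.lower():
        smells.append("no 'so that' outcome, restate with the user benefit")
    return smells

print(requirement_smells("We need a CDN."))
print(requirement_smells(
    "The system must serve static assets with low latency "
    "so that users in Asia see responses under 200ms."))
## Output:
## ["names a technology, likely a solution", "no 'so that' outcome, restate with the user benefit"]
## []
```

An empty list means the sentence survives both tests and can stand as a requirement; any non-empty result means you are describing a solution and should back up one level.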

🎯 Key Principle: Lock the what before you touch the how. The requirements phase produces a problem statement. The design phase produces a solution. Mixing them produces confusion.


Mistake 5: Failing to Revisit and Adjust Requirements During the Design Phase ⚠️

Requirements gathering is not a one-time event that closes the moment you pick up a marker and start drawing boxes. It is an ongoing dialogue. As your design evolves, you will encounter moments where a constraint you accepted becomes problematic, a scale number you estimated feels wrong, or a use case you did not consider becomes relevant. Strong candidates treat these moments as opportunities. Weak candidates either ignore them (and produce a flawed design) or panic (and abandon structure entirely).

The failure mode here looks like this: you establish requirements, summarize them, and then treat that summary as immutable. Twenty minutes later, you realize your design cannot meet the latency requirement you agreed on, but instead of surfacing this, you quietly adjust the design in a way that violates it—hoping the interviewer does not notice. They always notice.

💡 Real-World Example: In a real engineering environment, requirements change. Product managers learn new information from users. Business priorities shift. Engineers discover technical constraints that make certain requirements infeasible at the stated cost. The ability to flag these conflicts, negotiate adjustments, and re-align the team is a core professional skill. System design interviews specifically test whether you have this skill.

The Corrective Technique: The Requirements Checkpoint

Build explicit requirements checkpoints into your design narration. These are brief moments—10 to 20 seconds—where you surface a tension between your emerging design and a previously stated requirement, propose a resolution, and confirm alignment before proceeding.

The language for a requirements checkpoint sounds like this:

"I want to flag something. Earlier we said the system needs to return search results in under 100ms. As I'm designing the indexing layer, I'm realizing that across the data volume we discussed, achieving that consistently would require significant infrastructure investment. I'd like to propose relaxing that to 250ms for the 99th percentile while keeping sub-100ms for the median. Does that work, or is 100ms a hard constraint?"

This single move does several impressive things simultaneously:

  • 🧠 It demonstrates that you are tracking the relationship between requirements and design in real time.
  • 📚 It shows you understand the cost implications of requirements.
  • 🔧 It shows you can negotiate trade-offs rather than just accept or ignore constraints.
  • 🎯 It keeps the interviewer engaged as a collaborator rather than an audience member.

Here is a simple code-comment pattern that mirrors this thinking in real engineering practice—writing requirements as assertions that the design must satisfy:

## Requirements as executable constraints (illustrative pattern)
## In system design interviews, this mindset keeps you honest.

class SystemRequirements:
    """
    Captures the agreed requirements as named constants.
    Reference these when making design decisions to surface
    conflicts early rather than late.
    """
    # Functional requirements
    MAX_URL_LENGTH = 2048          # characters
    SHORT_CODE_UNIQUENESS = True   # no collisions allowed
    LINK_EXPIRY_DAYS = 30          # default TTL

    # Non-functional requirements
    READ_LATENCY_P99_MS = 50       # milliseconds
    WRITE_LATENCY_P99_MS = 200     # milliseconds
    AVAILABILITY_SLA = 99.9        # percentage
    DAILY_ACTIVE_USERS = 50_000_000

    # Derived scale estimates (calculated from requirements)
    READS_PER_SECOND = DAILY_ACTIVE_USERS * 10 // 86400  # ~5,787 RPS
    WRITES_PER_SECOND = DAILY_ACTIVE_USERS * 1 // 86400  # ~579 RPS

    @classmethod
    def validate_design_decision(cls, decision_name: str, latency_ms: int):
        """
        Simulates a checkpoint: raises if a design decision
        would violate a latency requirement.
        """
        if latency_ms > cls.READ_LATENCY_P99_MS:
            raise ValueError(
                f"Design decision '{decision_name}' projects {latency_ms}ms "
                f"but requirement is {cls.READ_LATENCY_P99_MS}ms. "
                f"Revisit or renegotiate the requirement."
            )
        return True

## Example: catching a conflict during design
try:
    SystemRequirements.validate_design_decision(
        decision_name="synchronous DB lookup without cache",
        latency_ms=180  # realistic without caching
    )
except ValueError as e:
    print(f"⚠️ Requirements conflict detected: {e}")
    # This is your cue to surface the conflict to the interviewer
    # and either add a cache or renegotiate the latency SLA.

This code is not something you would write during an interview—it is a thinking tool. The mental habit it encodes is: name your requirements explicitly, reference them when making decisions, and surface conflicts immediately rather than designing around them silently.

Here is a second example showing how requirements translate into capacity estimates, a calculation you would perform early and then reference throughout:

## Capacity estimation from requirements
## Run this kind of calculation after requirements, before design.

def estimate_storage_requirements(
    daily_active_users: int,
    writes_per_user_per_day: int,
    avg_record_size_bytes: int,
    retention_days: int
) -> dict:
    """
    Translates scale requirements into concrete storage numbers.
    Surfaces whether requirements are feasible before design begins.
    """
    writes_per_day = daily_active_users * writes_per_user_per_day
    bytes_per_day = writes_per_day * avg_record_size_bytes
    total_bytes = bytes_per_day * retention_days

    return {
        "writes_per_day": writes_per_day,
        "gb_per_day": round(bytes_per_day / (1024 ** 3), 2),
        "total_tb": round(total_bytes / (1024 ** 4), 2),
        "feasibility_note": (
            "Standard single-node storage" if total_bytes < 1e12
            else "Requires distributed storage solution"
        )
    }

## URL shortener example with agreed requirements
result = estimate_storage_requirements(
    daily_active_users=50_000_000,
    writes_per_user_per_day=1,       # each user creates ~1 link/day
    avg_record_size_bytes=500,       # URL + metadata
    retention_days=365
)

for key, value in result.items():
    print(f"{key}: {value}")

## Output:
## writes_per_day: 50000000
## gb_per_day: 23.28
## total_tb: 8.3
## feasibility_note: Requires distributed storage solution
##
## → This surfaces immediately that a single Postgres instance
##   won't meet requirements. Raise this before designing storage.

Running (or narrating) this kind of estimate early in the design phase is a form of requirements validation. If the numbers surprise you or the interviewer, that is a requirements checkpoint moment—not a crisis.


Putting It All Together: A Mistake Recognition Map

Use this reference during practice sessions to self-diagnose which mistake you are making in the moment:

                      REQUIREMENTS PHASE MISTAKE MAP

  Are you asking questions?                    
         │
    ┌────┴────┐
   YES       NO ──────────────────────────────────→ Under-questioning ⚠️
    │                                               (Add MECE audit)
    │
  Are your questions open-ended?
    │
  ┌─┴──────┐
 YES       NO ────────────────────────────────────→ Closed questions ⚠️
  │                                                 (Rewrite with What/How)
  │
  Are you asking fewer than 8 questions?
  │
 ┌┴─────────┐
YES         NO ──────────────────────────────────→ Over-questioning ⚠️
 │                                                  (Filter: would answer
 │                                                   change my design?)
 │
 Are questions about behavior, not technology?
 │
┌┴──────────┐
YES         NO ──────────────────────────────────→ Conflating req/sol ⚠️
 │                                                  (Apply "So That" test)
 │
 Are you surfacing req conflicts during design?
 │
┌┴───────────┐
YES          NO ─────────────────────────────────→ No checkpoints ⚠️
 │                                                  (Add checkpoint language)
 │
 ✅ Strong requirements gathering

A Note on Meta-Awareness

The deepest corrective technique for all five mistakes is the same: meta-awareness, the ability to observe your own behavior while it is happening. Most candidates are so focused on the content of their answers that they lose awareness of their process. Interviewers evaluate both.

Practice requirements gathering out loud with a partner or in front of a recording device. Watch back and count your questions. Classify each one: open or closed? Requirements or solution? High-value or low-value? Did you revisit requirements when you encountered a design conflict?

🧠 Mnemonic: Remember the acronym ORACLE to check your requirements behavior in real time:

  • O — Open-ended questions, not yes/no
  • R — Requirements, not solutions
  • A — Assumptions stated explicitly when not asking
  • C — Checkpoints during design
  • L — Lean (4–7 questions, not 15)
  • E — Exhaustive across dimensions (MECE audit)

📋 Quick Reference Card:

──────────────────────────────────────────────────────────────────────
Mistake                       Signal to Interviewer  Corrective Technique
──────────────────────────────────────────────────────────────────────
🔴 Over-questioning           Indecisive, anxious    Filter: does answer
                                                     change design?
🔴 Under-questioning          Overconfident, risky   MECE dimension audit
🔴 Closed questions           Shallow inquiry        Start with What/How
🔴 Req/solution conflation    Poor problem framing   "So That" test
🔴 No mid-design checkpoints  Ignores trade-offs     Explicit checkpoint
                                                     language
──────────────────────────────────────────────────────────────────────

Recognizing a mistake in the moment—and correcting it aloud—is itself a high-signal behavior. Interviewers know candidates are not perfect. They are watching for self-awareness and adaptability. Saying "Actually, let me reframe that as a requirement rather than jumping to a solution" demonstrates exactly the engineering judgment that separates senior candidates from junior ones.

Key Takeaways and Your Requirements Gathering Cheat Sheet

You started this lesson with what most candidates start with: a vague sense that you should "ask some questions" before diving in. You finish it with something far more powerful — a structured philosophy and a repeatable toolkit that transforms the requirements phase from an awkward preamble into the highest-signal five minutes of your entire interview.

This final section is your consolidation point. We will crystallize the core principles, hand you a cheat sheet you can internalize before your next interview, and point you toward the road ahead.


The One Principle That Anchors Everything

🎯 Key Principle: No design decision is defensible without a requirement that justifies it.

This is not a soft guideline. It is the load-bearing wall of every good system design. When you choose to use a message queue instead of direct HTTP calls, you should be able to point at a specific requirement — perhaps "the ingestion pipeline must tolerate a 10× traffic spike without dropping events" — that made that choice rational. When you decide to shard your database, you should be able to say "we established that we expect 500 million user records with a write-heavy access pattern." Every component, every trade-off, every architectural boundary should trace back to something a user, a business stakeholder, or an operational constraint actually needs.

Without that traceability, your design is not engineering — it is decoration. Interviewers know the difference immediately.

Wrong thinking: "I'll add a cache here because caches are generally good for performance."

Correct thinking: "We established that the read-to-write ratio is 100:1 and latency must stay under 50ms. A cache in front of the database is the most direct way to satisfy both constraints simultaneously."

The second answer demonstrates that you are solving this problem, not a generic problem. That distinction is what separates senior-level thinking from junior-level pattern-matching.
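That traceability can even be sketched in code. The snippet below is a minimal, illustrative read-through cache whose existence is justified by the two stated requirements; the 80ms database latency, the key names, and the cache internals are assumptions for demonstration, not part of the example above.

```python
import time

# Hypothetical numbers from the requirements conversation above.
READ_WRITE_RATIO = 100      # reads per write, established in requirements
LATENCY_BUDGET_MS = 50      # latency ceiling for the read path

class ReadThroughCache:
    """A cache that exists because of two requirements: 100:1 reads, 50ms budget."""
    def __init__(self, load_fn):
        self._store = {}
        self._load = load_fn            # falls through to the database on a miss

    def get(self, key):
        if key not in self._store:      # miss: pay the database cost once
            self._store[key] = self._load(key)
        return self._store[key]         # hit: in-memory lookup, microseconds

def slow_db_read(key):
    time.sleep(0.08)                    # simulated 80ms round trip, over budget
    return f"row-for-{key}"

cache = ReadThroughCache(slow_db_read)
cache.get("user:42")                    # first read warms the cache (~80ms)
start = time.time()
cache.get("user:42")                    # repeat read stays under the 50ms budget
assert (time.time() - start) * 1000 < LATENCY_BUDGET_MS
```

With a 100:1 read-to-write ratio, roughly 99% of reads hit the in-memory path, which is how the 50ms budget is met even though the database itself is slower than the budget allows.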


What You Now Know That You Didn't Before

Before working through this lesson, requirements gathering probably felt like a formality — something you do to appear thorough before getting to the "real" work of drawing boxes and arrows. Here is what has actually shifted in your understanding:

Before This Lesson After This Lesson
🔸 Requirements are a warm-up 🟢 Requirements are the interview's highest-signal phase
🔸 Ask a few obvious questions 🟢 Use a structured framework covering users, features, scale, and constraints
🔸 Ambiguity is a problem to tolerate 🟢 Ambiguity is an opportunity to demonstrate judgment
🔸 Jump to design when you feel ready 🟢 Enforce the five-minute rule without exception
🔸 All requirements are the same 🟢 Functional and non-functional requirements are distinct branches that drive different decisions
🔸 More features = more impressive 🟢 Ruthless scoping is a senior engineering skill

That table represents a genuine mental model upgrade. Carry it with you.


Your Requirements Gathering Cheat Sheet

This is the section you will want to bookmark. The questions below are organized into four categories that map directly to the framework introduced earlier in this lesson. They are written to be prompt-agnostic — they work whether you are designing a URL shortener, a ride-sharing backend, a distributed logging system, or a real-time multiplayer game.

🧠 Mnemonic: Remember USSC (Users, Scale, Scope, Constraints). Ask questions in that order and you will never leave a critical dimension unexplored.

Users and Access Patterns
  • 🧠 Who are the primary users of this system? (Consumers, businesses, internal engineers?)
  • 🧠 Are all users equivalent, or are there multiple tiers with different access levels?
  • 🧠 Where are users located geographically? Do we need multi-region support?
  • 🧠 What devices or clients will interact with this system? (Mobile, browser, API consumers?)
  • 🧠 Is access read-heavy, write-heavy, or balanced?
Scale and Volume
  • 📚 How many daily active users do we expect at launch? At maturity?
  • 📚 What is the expected read request volume per second?
  • 📚 What is the expected write request volume per second?
  • 📚 What is the average and maximum size of a single data record?
  • 📚 Do we anticipate bursty traffic patterns, or is load relatively uniform?
  • 📚 How much total data do we expect to store over one year? Five years?
Scope and Core Features
  • 🔧 Which features are in scope for this design session?
  • 🔧 Which features are explicitly out of scope?
  • 🔧 What is the single most important user action this system must support?
  • 🔧 Are there related systems we need to integrate with, or do we own the full stack?
  • 🔧 Is real-time delivery required, or is eventual consistency acceptable?
Constraints and Non-Negotiables
  • 🎯 What is the acceptable latency for the critical path? (p50? p99?)
  • 🎯 What availability SLA is expected? (99.9%? 99.99%?)
  • 🎯 Are there regulatory or compliance requirements? (GDPR, HIPAA, SOC 2?)
  • 🎯 What is the consistency model? Can we tolerate stale reads?
  • 🎯 Are there cost constraints or infrastructure preferences?
  • 🎯 Do we need to support offline operation or intermittent connectivity?

💡 Pro Tip: You do not need to ask all of these questions in every interview. The skill is knowing which questions are load-bearing for the specific prompt in front of you. For a messaging system, delivery guarantees are critical. For a search service, query latency and index freshness dominate. Develop the judgment to prioritize.
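If it helps, the framework is simple enough to encode as plain data and replay against practice prompts. The sketch below abbreviates each category to two of the questions above; the `drill` helper is a hypothetical convenience for rehearsal, not a prescribed method.

```python
# USSC question bank, abbreviated to two questions per category.
USSC = {
    "Users": [
        "Who are the primary users of this system?",
        "Is access read-heavy, write-heavy, or balanced?",
    ],
    "Scale": [
        "How many daily active users at launch? At maturity?",
        "What are the expected read and write volumes per second?",
    ],
    "Scope": [
        "Which features are in scope, and which are explicitly out?",
        "What is the single most important user action?",
    ],
    "Constraints": [
        "What latency is acceptable on the critical path (p50? p99?)",
        "What availability SLA is expected?",
    ],
}

def drill(prompt: str) -> list[str]:
    """Emit the USSC questions in priority order for a given prompt."""
    lines = [f"PROMPT: {prompt}"]
    for category, questions in USSC.items():  # dicts preserve insertion order
        lines.append(f"-- {category} --")
        lines.extend(questions)
    return lines

for line in drill("Design a URL shortener"):
    print(line)
```

Reading the questions aloud in this fixed order, prompt after prompt, is what turns the checklist into reflex.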



The Five-Minute Rule

Always spend at least five minutes on requirements before drawing a single component. This is non-negotiable, and it deserves its own section because candidates violate it constantly — not out of laziness, but out of anxiety. The whiteboard feels like a vacuum that demands to be filled. Silence feels like incompetence. The urge to start drawing is nearly physiological.

Resist it.

Five minutes of structured requirements gathering does three things simultaneously:

  1. It prevents catastrophic pivots. Nothing wastes more interview time than designing a system for thirty minutes and then discovering you misunderstood the core constraint. A single question — "Is this a consumer product or an internal tool?" — can completely change your architecture.

  2. It signals seniority. Junior engineers want to demonstrate knowledge by talking about technology. Senior engineers demonstrate judgment by understanding the problem first. Interviewers are explicitly watching for this distinction.

  3. It gives you a narrative skeleton. Once you have your requirements documented, every subsequent design decision becomes a sentence: "Because we need X, I am choosing Y." That sentence structure is the backbone of a coherent, followable design presentation.

⚠️ Common Mistake: Treating the five-minute rule as a ceiling rather than a floor. For a complex prompt — a distributed database, a global content delivery system — you may need eight or ten minutes. The rule is a minimum guarantee, not a timer that ends your thinking.

Here is what the five-minute rule looks like in practice, translated into a simple time-boxing script:

REQUIREMENTS GATHERING TIME BOX
────────────────────────────────────────────────────────
Minutes 0–1   Read the prompt aloud, paraphrase it back.
              "Just to make sure I understand, we're building..."

Minutes 1–3   Ask USSC questions in priority order.
              Capture answers visibly (write on the whiteboard).

Minutes 3–4   Derive rough scale estimates from the answers.
              "So if we have 10M DAUs and each user posts once a day..."

Minutes 4–5   State your scoping decisions explicitly.
              "I'll focus on X and Y. I'm going to set aside Z."
              Confirm the interviewer agrees before proceeding.

Minute 5+     Now you may draw your first component.
────────────────────────────────────────────────────────

Notice that this is not a passive Q&A session. You are actively writing requirements on the board as you gather them. That visible artifact becomes the contract you design against for the rest of the session.


Turning Requirements into Code: A Practical Preview

Requirements do not live only on whiteboards. In real engineering work, and increasingly in interview settings that include a coding component, requirements translate directly into data structures, API contracts, and configuration schemas. Understanding this connection reinforces why precision in requirements matters so much.

Consider a requirements conversation that produced these scoped decisions for a notification service:

  • Users: 50 million subscribers
  • Core feature: Send push, email, and SMS notifications
  • Latency: Push must deliver within 2 seconds, email within 30 seconds
  • Reliability: At-least-once delivery for all channels
  • Out of scope: User preference management (handled by a separate service)

Those requirements immediately shape an API contract:

# notification_service.py
# API contract derived directly from requirements gathering
# Requirements source:
#   - At-least-once delivery → idempotency_key required
#   - Multiple channels → channel enum required
#   - SLA tracking → priority field maps to delivery queue selection

from enum import Enum
from dataclasses import dataclass
from typing import Optional

class NotificationChannel(Enum):
    PUSH = "push"       # SLA: 2 seconds
    EMAIL = "email"     # SLA: 30 seconds
    SMS = "sms"         # SLA: 30 seconds

class NotificationPriority(Enum):
    HIGH = "high"       # Routes to low-latency queue
    STANDARD = "standard"  # Routes to standard throughput queue

@dataclass
class SendNotificationRequest:
    user_id: str
    channel: NotificationChannel
    template_id: str
    payload: dict
    idempotency_key: str          # Dedupes retries under at-least-once delivery
    priority: NotificationPriority = NotificationPriority.STANDARD
    scheduled_at: Optional[str] = None  # ISO8601; None means send immediately

# Note: recipient_address is intentionally absent.
# Requirements established that user preference management
# is OUT OF SCOPE — a separate service owns address resolution.

Every field in this data class is traceable to a specific requirement. The idempotency_key exists because of the at-least-once delivery requirement. The absence of a recipient_address field is a direct consequence of scoping out preference management. This is requirements-driven design in its most concrete form.
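The idempotency_key deserves one more beat. One common scheme, shown here as an illustrative sketch rather than as part of the contract above, derives the key deterministically from the logical event, so retries under at-least-once delivery collide and can be deduplicated on the receiving side:

```python
import hashlib

def make_idempotency_key(user_id: str, template_id: str, event_id: str) -> str:
    """Deterministic key: retries of the same logical send produce the
    same key, so the service can detect and drop duplicates."""
    raw = f"{user_id}:{template_id}:{event_id}"
    return hashlib.sha256(raw.encode()).hexdigest()

_seen: set[str] = set()   # in a real service this would be a shared store

def deliver_once(key: str) -> bool:
    """Return True only for the first delivery attempt with this key."""
    if key in _seen:
        return False      # duplicate retry: drop it; at-least-once stays safe
    _seen.add(key)
    return True

key = make_idempotency_key("user-123", "order-shipped", "evt-789")
assert deliver_once(key) is True    # first attempt delivers
assert deliver_once(key) is False   # retried attempt is deduplicated
```

The at-least-once requirement forces retries; the idempotency key is what keeps those retries from becoming duplicate notifications.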

Now consider how scale requirements translate into infrastructure configuration decisions:

# queue_config.py
# Scale requirements established:
#   - 50M subscribers, assume 5% receive a notification on any given day
#   - 2.5M notifications/day ≈ 29 notifications/second average
#   - Peak factor of 10x assumed for event-driven spikes → ~290/second peak
#   - Push SLA: 2 seconds → high-priority queue, low consumer group latency
#   - Email/SMS SLA: 30 seconds → standard queue, higher batching acceptable

QUEUE_CONFIG = {
    "push_notifications": {
        "topic": "notifications.push",
        "partitions": 12,             # Supports ~300 msgs/sec at target throughput
        "replication_factor": 3,      # At-least-once durability requirement
        "retention_ms": 3_600_000,    # 1 hour: failed deliveries retry within SLA
        "consumer_group_lag_alert_threshold": 500,  # Alert before SLA breach
    },
    "email_notifications": {
        "topic": "notifications.email",
        "partitions": 6,              # Lower throughput, higher batching
        "replication_factor": 3,
        "retention_ms": 86_400_000,   # 24 hours: email retry window is longer
        "consumer_group_lag_alert_threshold": 5000,
    },
}

The partition counts, retention windows, and alert thresholds in this configuration are not arbitrary — they are derived from the numbers established during requirements gathering. This is what interviewers mean when they say they want to see "justified decisions."
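The derivation chain is short enough to write out as arithmetic, exactly as you would at the whiteboard. The per-partition throughput figure below is an assumed benchmark for this sketch, not a universal constant:

```python
import math

# Scale inputs established during requirements gathering.
SUBSCRIBERS = 50_000_000
DAILY_ACTIVE_FRACTION = 0.05        # 5% receive a notification on a given day
PEAK_FACTOR = 10                    # event-driven spike multiplier
SECONDS_PER_DAY = 86_400

notifications_per_day = SUBSCRIBERS * DAILY_ACTIVE_FRACTION   # 2.5M/day
average_rate = notifications_per_day / SECONDS_PER_DAY        # ~29/sec
peak_rate = average_rate * PEAK_FACTOR                        # ~290/sec

# Assumed consumer throughput per partition at the required latency.
PER_PARTITION_THROUGHPUT = 25       # msgs/sec (a benchmark, not a constant)
partitions = math.ceil(peak_rate / PER_PARTITION_THROUGHPUT)

print(f"avg {average_rate:.0f}/sec, peak {peak_rate:.0f}/sec, "
      f"partitions {partitions}")
```

Under these assumptions the arithmetic reproduces the 12-partition figure in the configuration above; change the per-partition benchmark and the partition count follows mechanically.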

💡 Real-World Example: At companies like Stripe and Twilio, every major service has a requirements document — sometimes called a design doc or RFC — where the first section is always "Constraints and Non-Goals." The non-goals section is the engineering equivalent of your scoping decisions in an interview. It exists precisely because without it, every stakeholder assumes their use case is in scope.



Connecting Forward: Functional and Non-Functional Requirements

Everything you have practiced in this lesson is the foundation for the next major topic in this course: the formal separation of functional requirements and non-functional requirements as the two main branches of structured system scoping.

Here is a preview of how these branches diverge:

REQUIREMENTS
     │
     ├─── FUNCTIONAL ──────────────────────────────────────────┐
     │    "What the system does"                               │
     │    • User can upload a photo                            │
     │    • System sends a confirmation email on registration  │
     │    • API returns results ranked by relevance            │
     │    Drives: API design, data models, service boundaries  │
     │                                                         │
     └─── NON-FUNCTIONAL ──────────────────────────────────────┘
          "How well the system does it"
          • 99.99% availability
          • p99 latency under 100ms
          • Horizontally scalable to 1M concurrent users
          Drives: Infrastructure choices, replication strategy,
                  caching layers, consistency models

When you asked "What should the system do?" during requirements gathering, you were collecting functional requirements. When you asked "How fast? How available? How consistent?" you were collecting non-functional requirements. The framework you learned in this lesson feeds both branches cleanly.

🤔 Did you know? The ISO/IEC 25010 quality model — the international standard for software product quality — defines eight quality characteristics, including performance efficiency, reliability, and security. Most system design interviews implicitly test your ability to reason about a subset of exactly these dimensions.

Understanding that these two branches drive different downstream decisions is what makes you precise rather than vague when discussing trade-offs. You will explore each branch in depth in the next lessons.
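One practical habit as you work with the non-functional branch: a non-functional requirement without a number is usually a wish, not a requirement. The checker below is a rough heuristic for self-review, not a formal definition:

```python
import re

def looks_quantified(requirement: str) -> bool:
    """Heuristic: a usable non-functional requirement carries at least
    one concrete number (a percentage, a latency, a count)."""
    return bool(re.search(r"\d", requirement))

nfrs = [
    "99.99% availability",            # quantified: ready to design against
    "p99 latency under 100ms",        # quantified: ready to design against
    "the system should be fast",      # vague: sharpen before designing
]

for r in nfrs:
    status = "OK   " if looks_quantified(r) else "VAGUE"
    print(f"{status}: {r}")
```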


Practice Recommendation: The Blank Page Drill

Knowledge without practice is decoration. Here is a specific exercise designed to build the requirements gathering muscle before your next interview.

The Blank Page Drill:

Take five well-known system design interview prompts. For each one, set a timer for ten minutes. During those ten minutes, your only job is requirements gathering. You may not draw architecture diagrams. You may not list technologies. You may not think about databases. You may only ask and answer requirements questions as if you were in a live interview — write them down on paper or in a text file.

Suggested prompts to use:

  1. Design Twitter's home timeline feed
  2. Design a distributed rate limiter
  3. Design Google Drive
  4. Design a real-time leaderboard for a gaming platform
  5. Design a hotel reservation system

For each prompt, your ten-minute output should include:

  • 🎯 Three to five functional requirements (what the system must do)
  • 🎯 Three to five non-functional requirements with specific numbers
  • 🎯 Two to three explicit out-of-scope decisions
  • 🎯 A rough back-of-envelope scale estimate

Do this drill five times across different sessions and you will notice two things: first, your questions become faster and more targeted; second, you will start to see patterns in what requirements actually matter for different categories of systems (storage systems, communication systems, computation pipelines). That pattern recognition is senior-level intuition.

# blank_page_drill_template.py
# Use this structure to record your practice sessions programmatically.
# Run it as a lightweight CLI to time yourself and capture outputs.

import time
import json
from datetime import datetime

def run_drill(prompt: str, time_limit_minutes: int = 10):
    """Interactive requirements gathering drill with timer."""
    
    session = {
        "prompt": prompt,
        "started_at": datetime.now().isoformat(),
        "functional_requirements": [],
        "non_functional_requirements": [],
        "out_of_scope": [],
        "scale_estimates": [],
    }
    
    start_time = time.time()
    deadline = start_time + (time_limit_minutes * 60)
    
    print(f"\n🎯 PROMPT: {prompt}")
    print(f"⏱  You have {time_limit_minutes} minutes. No designing allowed.\n")
    
    categories = [
        ("functional_requirements", "Functional requirement (what it must do)"),
        ("non_functional_requirements", "Non-functional requirement (include a number)"),
        ("out_of_scope", "Explicit out-of-scope decision"),
        ("scale_estimates", "Scale estimate with calculation"),
    ]
    
    for key, label in categories:
        print(f"\n--- {label.upper()} ---")
        print("Enter items one per line. Empty line to move to next category.")
        while time.time() < deadline:
            remaining = int(deadline - time.time())
            entry = input(f"  [{remaining}s left] > ").strip()
            if not entry:
                break
            session[key].append(entry)
    
    session["completed_at"] = datetime.now().isoformat()
    session["elapsed_seconds"] = int(time.time() - start_time)
    
    print("\n📋 SESSION SUMMARY")
    print(json.dumps(session, indent=2))
    return session

# Example usage:
# run_drill("Design a distributed rate limiter")

This script is a practical drill companion. It forces you to categorize your requirements in real time and logs your sessions so you can review how your thinking evolves over multiple practice rounds.


Final Critical Reminders

⚠️ Critical Point 1: Requirements gathering is not a phase you complete and leave behind. Interviewers will introduce new information mid-session — a constraint you did not ask about, a scale assumption they reveal when you propose a particular design. When that happens, loop back explicitly: "That changes my earlier assumption about consistency. Let me update my requirements and show you how that shifts the design." That response is a senior engineer's response.

⚠️ Critical Point 2: Writing requirements on the whiteboard is not optional. Keeping them only in your head means the interviewer cannot follow your reasoning trail, and you cannot reference them when justifying later decisions. Make your requirements visible. They are your contract.

⚠️ Critical Point 3: Requirements gathering done well is collaborative, not interrogative. You are not conducting a deposition. You are thinking out loud with a partner who happens to know the answers. Phrase questions with context: "I want to make sure I get the consistency model right — is this a system where a user reading their own post immediately after writing it is critical, or can we tolerate a short propagation delay?" That kind of question shows you already understand the trade-off space, which is exactly what the interviewer wants to see.


📋 Quick Reference Card: Requirements Gathering at a Glance

🔧 Phase        📚 What You Do                      🎯 What You Produce
🧠 Read         Paraphrase the prompt aloud         Confirmation of shared understanding
🔍 Ask Users    Identify who uses it and how        User types, access patterns
📏 Ask Scale    Derive load and storage numbers     Concrete figures for design
✂️ Scope        Declare what's in and out           Explicit feature boundary
🚧 Constrain    Surface SLAs and hard limits        Non-functional requirement list
✅ Confirm      Get interviewer sign-off            A contract you can design against


What Comes Next

With requirements gathering as your foundation, the next lessons in this course will build the full architectural reasoning stack on top of it:

  • 🔧 Functional Requirements Deep Dive: How to translate user stories into precise API contracts, data models, and service boundaries — with worked examples across different system categories.

  • 🎯 Non-Functional Requirements Deep Dive: How to reason about availability, latency, consistency, and durability with the mathematical precision that distinguishes strong candidates — including how to derive infrastructure decisions from SLA numbers.

  • 📚 Capacity Estimation: How to turn the scale numbers you gathered in requirements into concrete storage, bandwidth, and compute estimates that justify architectural choices.

Every one of those lessons assumes you arrive with clearly articulated requirements. That is no longer a gap in your preparation. You now know exactly how to produce them.

💡 Remember: The best system designers are not the ones who know the most architectural patterns. They are the ones who know which pattern is right for this problem, and they know that because they understood the problem before they started solving it. That discipline begins here, with requirements, every single time.