
String Algorithms

String Sorts, Tries, Substring Search, Regular Expressions, Data Compression

Why String Algorithms Are the Backbone of Coding Interviews

Think about the last time you used a search engine, sent a text message, or let autocomplete finish your sentence. Every one of those moments was powered, at its core, by string algorithms. Now imagine sitting across from a Google or Amazon interviewer as they slide a problem across the table: "Given a string, find the longest substring without repeating characters." Do you reach for a solution confidently — or does your mind go blank? By the end of this section, you'll understand not just what string algorithms are, but why they are the single most important category you can master before your next technical interview.

Strings are everywhere in software. They carry user input, encode DNA sequences, represent file paths, and form the foundation of natural language processing. Yet despite their ubiquity, string problems have a reputation for catching even experienced engineers off guard. The reason isn't that the problems are impossibly hard — it's that strings have subtle properties that change how you must think about solutions. This section pulls back the curtain on those properties, sets the stage for everything that follows in this lesson, and gives you a mental framework for approaching any string problem you encounter.

The Numbers Don't Lie: Strings Dominate Technical Interviews

Let's start with a fact that should immediately sharpen your focus. An analysis of LeetCode interview reports from candidates at top-tier companies — Google, Amazon, Meta, Microsoft, and Apple — reveals that string-related problems appear in over 30% of all reported interview questions. That makes strings the single most frequently tested algorithmic category, edging out arrays, trees, and dynamic programming in raw frequency.

🤔 Did you know? On LeetCode alone, there are over 600 problems tagged with "String" as of 2024 — more than any other single tag. Companies like Amazon have entire interview rounds dedicated exclusively to string manipulation because it maps directly to real engineering work: parsing logs, validating inputs, searching documents, and building compilers.

Why do interviewers love string problems so much? Because a single string problem can simultaneously test:

  • 🧠 Algorithmic thinking — Can you identify the right pattern?
  • 📚 Data structure knowledge — Do you know when to use a hash map vs. a sliding window?
  • 🔧 Language fluency — Are you aware of built-in string methods and their costs?
  • 🎯 Edge case awareness — Can you handle empty strings, single characters, and Unicode?
  • 🔒 Complexity reasoning — Do you know the difference between an O(n) and O(n²) approach?

A single problem, five dimensions of evaluation. That's extraordinary leverage for an interviewer — and it's exactly why you need to own this topic completely.

Why Strings Are Uniquely Tricky: The Immutability Trap

Here is where many self-taught developers hit an invisible wall. In languages like Python, Java, and JavaScript, strings are immutable — once created, they cannot be changed in place. This is not a minor implementation detail. It is a fundamental constraint that reshapes how you design efficient solutions.

Consider the seemingly innocent act of building a result string by concatenating characters one at a time:

## โš ๏ธ Naive approach โ€” looks innocent, hides a performance disaster
def reverse_string_naive(s: str) -> str:
    result = ""
    for char in s:        # For each character...
        result = char + result  # ...we create a BRAND NEW string object
    return result
    # Time complexity: O(nยฒ) โ€” each concatenation copies all previous characters
    # Space complexity: O(nยฒ) โ€” temporary strings pile up in memory

This code works — but at a hidden cost. Because strings are immutable in Python, every + operation allocates a new string in memory and copies all existing characters into it. For a string of length n, this means roughly 1 + 2 + 3 + ... + n = n(n+1)/2 copy operations. That's O(n²) time complexity for what looks like a simple loop.

The correct approach leverages a list as a mutable buffer, which is the idiomatic Python pattern:

## ✅ Efficient approach — use a mutable buffer, then join once
def reverse_string_efficient(s: str) -> str:
    chars = list(s)          # Convert to mutable list: O(n) time and space
    left, right = 0, len(s) - 1
    
    while left < right:      # Two-pointer technique
        chars[left], chars[right] = chars[right], chars[left]  # Swap in-place
        left += 1
        right -= 1
    
    return ''.join(chars)    # One single join at the end: O(n)
    # Time complexity: O(n) — single pass through the array
    # Space complexity: O(n) — the character list

The difference? The efficient version performs a single join at the end, rather than creating a new string on every iteration. The join operation is O(n) and happens exactly once. This is a pattern you will use repeatedly across dozens of LeetCode problems.

💡 Mental Model: Think of a string like a sentence written in wet cement — once it dries (once the string is created), you can't edit individual letters. If you want to make changes, you have to pour a whole new slab. Smart string algorithms minimize how often you "pour new cement" by working in a scratchpad (a list or array) and only solidifying the final result once.

โŒ Wrong thinking: "I can build up a result string by concatenating inside a loop โ€” it's just one operation per step."

✅ Correct thinking: "Each concatenation on an immutable string is a full copy. I'll use a list as a buffer and join once at the end to stay O(n)."

The Core Problem Categories You Must Know

String problems on LeetCode aren't a random jungle — they cluster into recognizable families. Learning to identify which family a problem belongs to is half the battle. Here are the five major categories you'll encounter, along with a quick sense of what makes each one distinctive:

Pattern Matching

Pattern matching problems ask you to find occurrences of one string (the pattern) inside another (the text). The naive approach — sliding the pattern across the text character by character — runs in O(n × m) time, where n is the text length and m is the pattern length. Advanced algorithms like Knuth-Morris-Pratt (KMP) bring this down to O(n + m) by exploiting structural information in the pattern itself. You'll encounter these in problems like Implement strStr() and Repeated Substring Pattern.
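The naive slide-and-compare approach described above can be sketched in a few lines (a minimal illustration of the O(n × m) baseline, not the KMP optimization; the function name is ours):

```python
def naive_search(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1.
    Worst case O(n * m): at each of ~n starting positions, up to m comparisons."""
    n, m = len(text), len(pattern)
    for i in range(n - m + 1):
        if text[i:i + m] == pattern:  # up to m character comparisons
            return i
    return -1
```

A pathological input like text = "a" * 1000 + "b" with pattern = "a" * 500 + "b" forces almost every comparison to run nearly to the end — exactly the redundancy KMP eliminates.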

Substring Search

Substring search problems ask questions like: "Find the longest substring without repeating characters" or "Find the minimum window substring that contains all characters of T." These problems almost always yield to the sliding window pattern — a technique where you maintain a dynamic window over the string and expand or contract it based on a condition.

Anagrams

Anagram problems hinge on the insight that two strings are anagrams if and only if they have identical character frequency distributions. The canonical tool here is a frequency hash map (or a fixed-size array of 26 integers for lowercase English letters). Problems like Group Anagrams and Find All Anagrams in a String are staples in Google and Amazon interviews.

Palindromes

Palindrome problems ask about strings that read the same forwards and backwards. They range from simple checks (Valid Palindrome) to fiendishly complex optimization problems (Longest Palindromic Subsequence). Palindromes introduce you to two-pointer techniques, dynamic programming on strings, and the elegant Manacher's algorithm.

Sliding Window

While sliding window appears within other categories, it deserves its own mention because it is the most reusable pattern across all of string algorithms. The sliding window technique maintains a contiguous subarray or substring satisfying some constraint, moving the window boundaries to avoid redundant computation. It converts many O(n²) brute-force approaches into clean O(n) solutions.

## Sliding window template — memorize this skeleton
def sliding_window_template(s: str) -> int:
    left = 0
    window_state = {}    # Hash map tracking what's in our current window
    result = 0
    
    for right in range(len(s)):          # Expand window to the right
        char = s[right]
        window_state[char] = window_state.get(char, 0) + 1  # Add char to window
        
        while window_violates_constraint(window_state):  # Shrink from left
            left_char = s[left]
            window_state[left_char] -= 1
            if window_state[left_char] == 0:
                del window_state[left_char]
            left += 1
        
        result = max(result, right - left + 1)  # Update answer
    
    return result
    # Time: O(n) — each character enters and exits the window at most once
    # Space: O(k) — where k is the size of the character set

This template is intentionally abstract — the magic is in defining what "violates the constraint" means for your specific problem. Swap in different conditions and you solve dozens of different problems with the same underlying structure.
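For instance, plugging in the constraint "no character may appear twice" turns the skeleton into a working solution for Longest Substring Without Repeating Characters (a sketch: the abstract helper call becomes an inline check, and the function name is ours):

```python
def longest_unique_substring(s: str) -> int:
    left = 0
    window_state = {}    # char -> count inside the current window
    result = 0

    for right in range(len(s)):          # Expand window to the right
        char = s[right]
        window_state[char] = window_state.get(char, 0) + 1

        while window_state[char] > 1:    # Constraint violated: char repeated
            left_char = s[left]
            window_state[left_char] -= 1
            if window_state[left_char] == 0:
                del window_state[left_char]
            left += 1

        result = max(result, right - left + 1)

    return result

## longest_unique_substring("abcabcbb") → 3  (the window "abc")
```

Nothing changed structurally — only the shrink condition was specialized.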

🎯 Key Principle: Most string problems map to one of five families: pattern matching, substring search, anagrams, palindromes, or sliding window. When you see a new problem, your first question should be: "Which family does this belong to?" That diagnosis alone will point you toward the right tool.

String Mastery Unlocks Adjacent Topics

Here's something that doesn't get said enough: mastering string algorithms is a force multiplier across your entire algorithmic skill set. The techniques you learn for strings don't stay confined to string problems — they bleed into dynamic programming, graph algorithms, and even system design.

Consider dynamic programming on strings. Problems like Edit Distance, Longest Common Subsequence, and Regular Expression Matching are technically DP problems — but the strings are the input, and understanding how to index into them, compare them character by character, and reason about prefixes is prerequisite knowledge. You can't solve hard DP problems involving strings if you haven't first built fluency with basic string operations.

Consider tries (prefix trees). A trie is a tree-shaped data structure specifically designed for storing and querying strings. It powers autocomplete engines, spell checkers, and IP routing tables. Understanding tries requires you to think of strings as sequences of characters with shared prefixes — a mental model you build naturally through string algorithm practice.

Consider graph problems. Word ladder problems (Word Ladder I and II on LeetCode) represent words as nodes in a graph and edges as single-character substitutions. Solving them requires BFS over a graph — but the ability to enumerate single-character neighbors of a string is a pure string skill.

String Algorithms ──► Sliding Window ──► Two Pointers ──► Array Algorithms
        │
        ├──────────────► Frequency Maps ──► Hash Table Mastery
        │
        ├──────────────► Prefix/Suffix Logic ──► DP on Strings
        │
        └──────────────► Trie Structures ──► Advanced Tree Problems

💡 Real-World Example: The engineers who built Gmail's search — finding emails containing a specific phrase across billions of messages — relied on the same sliding window and pattern matching algorithms you'll learn in this lesson. The scale is different; the core ideas are identical.

🧠 Mnemonic: Think of string algorithms as a master key ring. Each technique you learn (sliding window, two pointers, hashing) is a key. And the beautiful thing about this key ring is that the same keys open doors in completely different rooms — DP, graphs, trees. The more keys on your ring, the more doors you can open.

Setting Expectations: Complexity Targets for String Problems

Before you dive into practicing, you need calibrated expectations about what good solutions look like in terms of time and space complexity. Interviewers at top companies have implicit benchmarks. Knowing these targets helps you recognize when you've found an acceptable solution versus when you should keep optimizing.

📋 Quick Reference Card: String Problem Complexity Targets

🎯 Problem Type         ⚡ Time Target            💾 Space Target     🔧 Key Technique
🔍 Single-pass scan     O(n)                     O(1) or O(k)        Two pointers
🪟 Substring window     O(n)                     O(k)                Sliding window
🔁 Anagram detection    O(n)                     O(1) fixed          Frequency array
📏 Palindrome check     O(n)                     O(1)                Two pointers
🧩 Pattern matching     O(n + m)                 O(m)                KMP / Rabin-Karp
📊 Edit distance        O(n × m)                 O(n × m) or O(n)    DP
🌳 Prefix queries       O(n) build, O(m) query   O(n × Σ)            Trie

Here, n is the length of the primary string, m is the length of a pattern or secondary string, k is the size of the character set (often 26 for lowercase English, 128 for ASCII, or 65,536 for the Basic Multilingual Plane of Unicode), and Σ is the alphabet size.

โš ๏ธ Common Mistake โ€” Mistake 1: Accepting O(nยฒ) solutions for sliding window problems. โš ๏ธ If you find yourself writing a nested loop to check every possible substring, pause. Almost every "find the optimal substring" problem has an O(n) sliding window solution. An O(nยฒ) answer will pass small test cases but time out on large inputs โ€” exactly the inputs interviewers use to differentiate strong candidates.

โš ๏ธ Common Mistake โ€” Mistake 2: Forgetting space complexity when declaring victory. โš ๏ธ A solution that runs in O(n) time but uses O(nยฒ) space hasn't really won. Always report both dimensions. For string problems specifically, watch out for solutions that store every possible substring โ€” there are O(nยฒ) substrings of a length-n string, and storing them all blows your space budget.

💡 Pro Tip: When an interviewer says "can you optimize that?" after your first solution, the most likely target improvement is one of these three: (1) moving from O(n²) to O(n) time using sliding window, (2) moving from O(n) to O(1) extra space using in-place two pointers, or (3) moving from O(n × m) to O(n + m) time using KMP for pattern matching. These are the three most common optimization jumps in string problems.

Your Starting Point: A Concrete Benchmark

Let's ground everything above in a single, concrete problem that you've probably already seen — Longest Substring Without Repeating Characters (LeetCode #3). This problem is beloved by interviewers precisely because it rewards all the thinking we've discussed: recognizing the sliding window pattern, understanding why naive O(n²) is unacceptable, and using a hash map correctly.

A candidate who solves this in O(n) time, explains their reasoning about immutability and buffer choice, and proactively discusses edge cases (empty string, all-unique characters, all-same characters) demonstrates exactly the kind of holistic string thinking that earns strong hire signals.

That's your benchmark. Not just getting the right answer — but demonstrating that you understand why the right answer is right, where the naive approach fails, and what the solution cost is in both time and space.

By the time you finish all six sections of this lesson, that benchmark will feel comfortable. You'll have a mental library of patterns, a toolkit of techniques, and โ€” critically โ€” the judgment to know which tool fits which problem.

The sections ahead build in a deliberate order: we'll establish foundational theory in Section 2, learn the three most powerful patterns in Section 3, climb to advanced techniques in Section 4, drill edge cases and pitfalls in Section 5, and consolidate everything into a cheat sheet in Section 6. Each section builds on the last. String algorithms aren't memorized — they're understood. And understanding starts here.

🎯 Key Principle: The goal of this lesson isn't to teach you 600 string problems. It's to teach you the 10-15 patterns that underlie all 600 of them. Pattern recognition is the superpower that separates engineers who struggle with string problems from those who solve them confidently on a whiteboard under pressure.

Let's build that superpower โ€” one pattern at a time.

Core String Concepts: Characters, Encodings, and Fundamental Operations

Before you can solve string problems efficiently, you need to understand what a string actually is at the machine level. Most developers treat strings as black boxes — sequences of text that just work — until they hit a performance cliff or an edge case that exposes how little they understood about what was happening underneath. This section tears off that black box and gives you the mental model that separates developers who grind through string problems from those who see through them.

ASCII vs Unicode: Why Encoding Is an Algorithm Design Decision

At its core, a string is a sequence of numbers. The character encoding is the mapping that says which number corresponds to which human-readable symbol. This sounds like a low-level implementation detail, but it directly shapes how you approach LeetCode problems.

ASCII (American Standard Code for Information Interchange) is the original encoding. It uses 7 bits, giving it 128 possible values (0–127). The printable characters — letters, digits, punctuation — live in the range 32–126. Lowercase letters a–z map to 97–122, and uppercase A–Z map to 65–90. This relationship is load-bearing for many interview problems.

ASCII Table (partial):

  Char   Dec   Binary
  ----   ---   ------
  'A'     65   0100 0001
  'Z'     90   0101 1010
  'a'     97   0110 0001
  'z'    122   0111 1010
  '0'     48   0011 0000
  '9'     57   0011 1001
  ' '     32   0010 0000

The distance between 'a' and 'A' is exactly 32 — a single bit flip. This is why ord('a') - ord('A') == 32 and why you can toggle case with a bitwise XOR: char ^ 32. More practically, ord(c) - ord('a') gives you a zero-indexed integer from 0–25 for any lowercase letter. This arithmetic is the foundation of the frequency array technique you'll see shortly.
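These identities are easy to verify directly in Python (a small sanity-check snippet; toggle_case is our illustrative name, and it assumes ASCII letters as input):

```python
def toggle_case(ch: str) -> str:
    """Flip the case of an ASCII letter by XOR-ing the 0x20 bit."""
    return chr(ord(ch) ^ 32)

print(ord('a') - ord('A'))                 # 32: the two cases are one bit apart
print(ord('m') - ord('a'))                 # 12: zero-indexed alphabet position
print(toggle_case('q'), toggle_case('Q'))  # Q q
```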

Unicode is ASCII's ambitious successor. It aims to encode every character in every human writing system — over 140,000 characters spanning 150+ scripts. The most common implementation is UTF-8, a variable-width encoding where ASCII characters still take 1 byte, but other characters can take 2, 3, or 4 bytes. Python 3 strings are Unicode by default; Java's char is a 16-bit UTF-16 code unit.

🎯 Key Principle: When a LeetCode problem states "lowercase English letters only," that's your signal to use a size-26 frequency array. When it says "printable ASCII," use size 128. When it says "any Unicode characters," reach for a hash map.

โš ๏ธ Common Mistake โ€” Mistake 1: Assuming one character equals one byte. In Python 3, len("cafรฉ") returns 4, but if you were working at the byte level, รฉ is 2 bytes in UTF-8. In most LeetCode problems this doesn't matter, but in problems involving byte-length constraints or stream processing, this distinction is critical.

🤔 Did you know? The emoji 😀 is a single Unicode code point (U+1F600) but takes 4 bytes in UTF-8 and two char values in Java (called a surrogate pair). A naive s.length() in Java would return 2 for a string containing only that emoji.

The Hidden Cost of String Operations

Now that you understand what characters are, let's talk about the operations you perform on them — because this is where most developers unknowingly write O(n²) code while thinking they wrote O(n) code.

String Concatenation

In Python and Java, strings are immutable. This word will come up again, but first understand its consequence: every time you concatenate two strings, the runtime cannot modify the original — it must allocate a new block of memory and copy both strings into it.

Concatenation: "hello" + " world"

Step 1: Allocate new memory of size 11
Step 2: Copy "hello" (5 chars) into new block
Step 3: Copy " world" (6 chars) into new block
Step 4: Return new string "hello world"
Original strings are untouched (and eventually garbage collected)

One concatenation is O(n). But what happens inside a loop?

## โŒ Wrong: O(nยฒ) string building
def reverse_naive(s: str) -> str:
    result = ""
    for char in s:          # n iterations
        result = char + result  # each concat is O(len(result))
    return result
    # Total work: 0 + 1 + 2 + ... + n = O(nยฒ)

This is the silent O(n²) trap. The loop runs n times, and each concatenation copies an increasingly large string. For n=10,000, this performs roughly 50 million character copies. The fix is to collect characters in a list (O(1) amortized append) and join at the end:

## ✅ Correct: O(n) string building
def reverse_efficient(s: str) -> str:
    result = []             # list of chars
    for char in s:
        result.append(char) # O(1) amortized
    result.reverse()        # O(n) one-time cost
    return "".join(result)  # O(n) one-time join
    # Total work: O(n)

In Java, the equivalent is StringBuilder. Never use String += inside a loop in Java — use StringBuilder.append() instead, then call .toString() once at the end.

💡 Pro Tip: Python's "".join(list) is not just a convenience — it's a fundamentally different algorithm. join computes the total length first, allocates once, and fills in each piece. It's guaranteed O(n) regardless of how many pieces you join.

Slicing Complexity

Slicing — taking a substring like s[i:j] in Python — always creates a new string. It is O(k) where k is the length of the slice, not O(1). This becomes a problem when you slice inside a loop:

## โš ๏ธ This loop is O(nยฒ), not O(n)
for i in range(len(s)):
    substring = s[i:]  # O(n-i) copy each iteration
    process(substring)

The alternative is to pass index pointers instead of actual substrings. Rather than process(s[i:]), pass process(s, i, len(s)) and let the function work with indices. This pattern is used heavily in algorithms like KMP and sliding window.
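As a sketch of that index-passing style (matches_at is a hypothetical helper name), here is a pattern-occurrence check that never materializes a slice:

```python
def matches_at(s: str, i: int, pattern: str) -> bool:
    """True if pattern occurs in s starting at index i.
    Compares characters by index, avoiding the O(k) copy of s[i:i + len(pattern)]."""
    if i + len(pattern) > len(s):
        return False
    return all(s[i + k] == pattern[k] for k in range(len(pattern)))
```

The caller passes an index instead of a fresh substring, so repeated calls inside a loop stay allocation-free.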

String Comparison

Comparing two strings with == is O(n) in the worst case — the runtime must check each character until it finds a mismatch or confirms equality. This matters in algorithms that compare many candidate strings, like in a naive anagram search that checks equality at each window position.

📋 Quick Reference Card: String Operation Complexities

Operation              Python Example   Time Complexity    Notes
🔒 Access by index     s[i]             O(1)               Direct memory address
📏 Length              len(s)           O(1)               Cached in object header
➕ Concatenation       s + t            O(n + m)           New allocation required
✂️ Slice               s[i:j]           O(j - i)           Copies characters
🔁 Comparison          s == t           O(n) worst case    Short-circuits on mismatch
🔎 Search (in)         p in s           O(n·m) naive       Can be O(n) with KMP
🔗 Join                "".join(lst)     O(total chars)     Single allocation

The Two-Pointer Technique on Strings

The two-pointer technique is one of the most elegant tools in the string algorithm toolkit. The core idea is to maintain two index variables that traverse the string simultaneously, allowing you to examine relationships between characters at different positions without nested loops.

There are two primary configurations:

Configuration 1: Opposite Ends (converging)

  s = "r a c e c a r"
       โ†‘           โ†‘
     left         right
     
Both pointers move inward until they meet.
Used for: palindrome checking, reversals, two-sum on sorted arrays.


Configuration 2: Same Direction (fast/slow)

  s = "a a b b c"
       โ†‘ โ†‘
     slow fast

Fast pointer scouts ahead; slow pointer marks a boundary.
Used for: removing duplicates, finding unique characters, partitioning.

Let's see the opposite-ends approach in action with palindrome verification:

def is_palindrome(s: str) -> bool:
    """
    Check if s is a palindrome using two converging pointers.
    Time: O(n), Space: O(1)
    """
    left, right = 0, len(s) - 1
    
    while left < right:
        # If characters at both ends match, move both pointers inward
        if s[left] != s[right]:
            return False  # Mismatch found — not a palindrome
        left += 1
        right -= 1
    
    return True  # All corresponding pairs matched

## Trace for s = "racecar":
## left=0 'r', right=6 'r' → match, move inward
## left=1 'a', right=5 'a' → match, move inward
## left=2 'c', right=4 'c' → match, move inward
## left=3, right=3 → left not < right, exit loop → True

The power here is O(n) time and O(1) space. Compare this to the naive approach of reversing the string (s == s[::-1]), which uses O(n) extra space and does twice the work.

💡 Mental Model: Think of the two pointers as two people walking toward each other across a bridge, checking that the tiles they step on are mirrored. They don't need to remember every tile they've passed — they only care about the current pair.

The fast/slow variant solves a different class of problems. Imagine you want to remove duplicate adjacent characters in-place style. The slow pointer marks where the next valid character should go; the fast pointer scans forward:

def remove_duplicates(s: str) -> str:
    """
    Collapse runs of adjacent duplicate characters, keeping one of each.
    e.g., "aabbcc" → "abc", "aabba" → "aba"
    Time: O(n), Space: O(n) for the character list
    """
    chars = list(s)  # Convert to list for in-place style manipulation
    slow = 0         # slow marks the boundary of our deduplicated prefix
    
    for fast in range(len(chars)):
        # Only write a character if it's different from the last written
        if slow == 0 or chars[fast] != chars[slow - 1]:
            chars[slow] = chars[fast]
            slow += 1
    
    return "".join(chars[:slow])

## Trace for "aabbcc":
## fast=0 'a': slow=0, always write โ†’ chars[0]='a', slow=1
## fast=1 'a': chars[1-1]='a' == 'a', skip
## fast=2 'b': chars[0]='a' != 'b', write โ†’ chars[1]='b', slow=2
## fast=3 'b': chars[1]='b' == 'b', skip
## fast=4 'c': chars[1]='b' != 'c', write โ†’ chars[2]='c', slow=3
## fast=5 'c': chars[2]='c' == 'c', skip
## Result: chars[:3] = ['a','b','c'] โ†’ "abc"

🎯 Key Principle: Two pointers eliminate the need for nested loops by maintaining invariants — guaranteed-true conditions about what the pointers represent at each step. Identify your invariant before coding, and your loop logic writes itself.

Frequency Arrays and Hash Maps: O(1) Character Tracking

A massive category of string problems asks you to count, compare, or track the frequency of characters. The naive approach — scanning the entire string for each query — is O(n) per query. The smart approach preprocesses the string into a frequency structure that answers queries in O(1).

Frequency arrays exploit the bounded character set. If you know the string contains only lowercase English letters, there are exactly 26 possible characters. An array of size 26 can store the count of each:

String: "leetcode"

Index:  0  1  2  3  4  5  6  ... 11 ... 14 ... 19 ...
Char:   a  b  c  d  e  f  g  ...  l  ...  o  ...  t  ...
Count:  0  0  1  1  2  0  0  ...  1  ...  1  ...  1  ...

freq['c' - 'a'] = freq[2] = 1
freq['e' - 'a'] = freq[4] = 2

Building this array is O(n); each subsequent lookup is O(1). Here's how this enables the classic anagram check:

def is_anagram(s: str, t: str) -> bool:
    """
    Determine if t is an anagram of s.
    An anagram uses exactly the same characters with the same frequencies.
    Time: O(n), Space: O(1) — the array is fixed size 26
    """
    if len(s) != len(t):
        return False  # Different lengths can't be anagrams
    
    freq = [0] * 26  # Index 0 = 'a', index 25 = 'z'
    
    # Increment for s, decrement for t
    # If they're anagrams, every count cancels back to 0
    for cs, ct in zip(s, t):
        freq[ord(cs) - ord('a')] += 1
        freq[ord(ct) - ord('a')] -= 1
    
    # All frequencies must be zero for a true anagram
    return all(count == 0 for count in freq)

## s="anagram", t="nagaram" โ†’ True
## s="rat", t="car" โ†’ False (r:1 vs r:1, a:1 vs a:1, t:1 vs 0, c:0 vs 1)

When the character set is larger or unknown, swap the array for a hash map (Python dict or collections.Counter). The lookup is still O(1) average, but now you handle Unicode, digits, symbols, and spaces without changing your logic.
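With the standard library's collections.Counter, the charset-agnostic version of the anagram check collapses to a frequency-map comparison (a sketch; the function name is ours):

```python
from collections import Counter

def is_anagram_general(s: str, t: str) -> bool:
    """Frequency comparison that works for any characters:
    Unicode letters, digits, spaces, symbols — no fixed alphabet assumed."""
    return Counter(s) == Counter(t)

print(is_anagram_general("anagram", "nagaram"))  # True
print(is_anagram_general("rat", "car"))          # False
```

The logic is identical to the array version; only the backing structure changed.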

🧠 Mnemonic: "26 for lowercase, 52 for both cases, 128 for full ASCII, dict for the rest." Use the smallest structure that covers your input — smaller means better cache performance and simpler code.

โš ๏ธ Common Mistake โ€” Mistake 2: Using a set when you need a frequency count. A set tells you whether a character appears, not how many times. If the problem involves duplicate counts (anagram checking, finding a character that appears exactly twice), you need a frequency map, not a set.

โŒ Wrong thinking: "I just need to check which characters are present, so a set is fine for the anagram problem."

✅ Correct thinking: "An anagram requires the same character counts, not just the same character set. 'aab' and 'abb' share the same character set but are not anagrams."

String Immutability: The Python and Java Trap

The concept of immutability deserves its own focused treatment because it's the root cause of many performance bugs and conceptual errors in string programming.

In Python, when you write s[0] = 'x', you get a TypeError. The string object s is locked in memory — its bytes cannot be altered. Every transformation — uppercasing, replacing, stripping — returns a brand-new string object while leaving the original intact. This is a deliberate design choice that enables safe string sharing (two variables can point to the same string object without fear), but it makes "in-place" manipulation impossible.

In Java, String objects are equally immutable, and the Java compiler even interns common strings (stores them in a pool so identical literals share one object). StringBuffer (thread-safe) and StringBuilder (single-thread, faster) exist precisely to provide a mutable sequence of characters.

Python String Immutability:

  s = "hello"           t = s.replace('h', 'j')
  
  Memory:               Memory:
  s โ”€โ†’ [h|e|l|l|o]     s โ”€โ†’ [h|e|l|l|o]  (unchanged!)
                        t โ”€โ†’ [j|e|l|l|o]  (new allocation)

The practical solution in Python is to convert the string to a list, perform your in-place style operations on the list (since lists are mutable), and join back at the end:

def reverse_vowels(s: str) -> str:
    """
    Reverse only the vowels in a string.
    e.g., "hello" → "holle", "leetcode" → "leotcede"
    
    Strategy: Convert to list, use two-pointer swap, join.
    Time: O(n), Space: O(n) for the list
    """
    chars = list(s)         # Mutable copy — O(n)
    vowels = set('aeiouAEIOU')
    left, right = 0, len(chars) - 1
    
    while left < right:
        # Advance left until we find a vowel
        while left < right and chars[left] not in vowels:
            left += 1
        # Advance right until we find a vowel
        while left < right and chars[right] not in vowels:
            right -= 1
        # Swap the two vowels
        if left < right:
            chars[left], chars[right] = chars[right], chars[left]
            left += 1
            right -= 1
    
    return "".join(chars)   # Reconstruct string โ€” O(n)

## "hello" โ†’ list: ['h','e','l','l','o']
## left finds 'e' at index 1, right finds 'o' at index 4 โ†’ swap
## โ†’ ['h','o','l','l','e'] โ†’ "holle"

💡 Pro Tip: In Python, list(s) and "".join(chars) are your entry and exit points for mutable string manipulation. The pattern is so common that you should internalize it as a single mental unit: convert → manipulate → join.

โš ๏ธ Common Mistake โ€” Mistake 3: Calling .replace() or .upper() and expecting it to modify the original. These methods return new strings. If you write s.replace('a', 'b') without capturing the result, nothing changes. You must write s = s.replace('a', 'b') or assign to a new variable.

๐Ÿค” Did you know? Python actually optimizes some string operations. For very short strings (often 20 characters or fewer), CPython may avoid the O(nยฒ) behavior of repeated concatenation due to reference count tricks. But this is an implementation detail you should never rely on โ€” the asymptotic behavior is still O(nยฒ) and this optimization disappears at scale.

Bringing It All Together

These five concepts โ€” encoding, operation complexity, two pointers, frequency structures, and immutability โ€” are not independent topics. They form an interconnected foundation. When you pick up a new string problem, you're implicitly making decisions in all five domains simultaneously:

New String Problem Decision Flow:

  Is input ASCII only?  โ”€โ”€YESโ”€โ”€โ†’ Use array (size 26 or 128)
         โ”‚
         NO
         โ†“
  Use hash map (dict / HashMap)
  
  Need to modify string?  โ”€โ”€YESโ”€โ”€โ†’ Convert to list (Python)
         โ”‚                         Use StringBuilder (Java)
         NO
         โ†“
  Work with index arithmetic directly
  
  Multiple traversals needed? โ”€โ”€YESโ”€โ”€โ†’ Two-pointer or sliding window
         โ”‚
         NO
         โ†“
  Single pass with frequency tracking

Master these building blocks and the patterns in the next section โ€” sliding window, hashing, and two pointers applied together โ€” will click into place immediately. You're not learning isolated tricks; you're learning a language, and these are the vocabulary and grammar that make every advanced sentence possible.

Essential Pattern Playbook: Sliding Window, Two Pointers, and Hashing

If there is a single lesson that will transform your performance on string problems, it is this one. The three patterns covered here โ€” sliding window, two pointers, and hashing โ€” appear, in some form, in the majority of medium and hard string problems on LeetCode. More importantly, they compose well: many problems require you to layer two of them together. By the end of this section you will have reusable templates for each pattern, a clear mental model for when to reach for each one, and annotated code you can adapt immediately.


The Sliding Window Pattern

The sliding window is a technique for processing a contiguous subarray or substring by maintaining a window โ€” a left and right boundary โ€” that you slide across the input without resetting to the beginning on each step. This collapses what would naively be an O(nยฒ) or O(nยณ) scan into an O(n) pass.

๐ŸŽฏ Key Principle: The window represents the "current candidate" answer. At every step, you either expand the window to include more characters, or shrink it to satisfy a constraint. The key insight is that you never need to revisit characters you have already processed.

Fixed-Size vs. Dynamic Windows

There are two flavors of sliding window problems, and recognizing which one you are facing is the first decision to make.

Fixed-size windows are simpler: the window length k is given. You slide a window of exactly k characters across the string and compute something about each window. Finding all anagrams of a pattern in a string (LeetCode 438) is a classic example โ€” the window size equals the pattern length.

Dynamic windows are more nuanced: the window expands and shrinks based on whether a constraint is satisfied. The window size is not predetermined; instead, you expand the right boundary to include new characters and shrink the left boundary when the window becomes invalid. Longest Substring Without Repeating Characters (LeetCode 3) is the canonical example.

Fixed window (k=3) sliding over "abcde":

[a b c] d e   โ†’ process window
 a [b c d] e   โ†’ slide right by 1
 a  b [c d e]  โ†’ slide right by 1

Dynamic window expanding and shrinking:

"abcba"
 [a] b c b a       left=0, right=0  valid
 [a b] c b a       left=0, right=1  valid
 [a b c] b a       left=0, right=2  valid
  a [b c b] a  โ† 'b' repeated! shrink left
  a  b [c b] a     left=2, right=3  valid again

The Universal Sliding Window Template

The template below captures the dynamic window pattern. Memorize the shape of it โ€” the logic inside the while block changes, but the outer structure is almost always the same.

def sliding_window_template(s: str) -> int:
    left = 0
    window = {}         # tracks character counts inside the window
    result = 0

    for right in range(len(s)):
        # 1. EXPAND: add s[right] into the window
        char = s[right]
        window[char] = window.get(char, 0) + 1

        # 2. SHRINK: while the window is invalid, move left forward
        while window_is_invalid(window):   # replace with your condition
            left_char = s[left]
            window[left_char] -= 1
            if window[left_char] == 0:
                del window[left_char]
            left += 1

        # 3. UPDATE RESULT: at this point the window is always valid
        result = max(result, right - left + 1)

    return result

The three-step loop body โ€” expand, shrink, update โ€” is the heartbeat of every dynamic window solution. The only thing that changes between problems is the definition of "invalid" and what you update in the result.

Deep Dive: LeetCode 3 โ€” Longest Substring Without Repeating Characters

This problem asks for the length of the longest substring with all unique characters. The window constraint is: no character may appear more than once. The window becomes invalid the moment a duplicate is added.

def length_of_longest_substring(s: str) -> int:
    char_count = {}   # frequency map of characters in the current window
    left = 0
    max_len = 0

    for right in range(len(s)):
        # Expand: include s[right] in the window
        c = s[right]
        char_count[c] = char_count.get(c, 0) + 1

        # Shrink: if s[right] created a duplicate, slide left
        # until the duplicate is gone
        while char_count[c] > 1:
            char_count[s[left]] -= 1
            if char_count[s[left]] == 0:
                del char_count[s[left]]
            left += 1

        # At this point, window [left..right] has no duplicates
        max_len = max(max_len, right - left + 1)

    return max_len

## Trace on "abcba":
## right=0 (a): window={a:1}, max=1
## right=1 (b): window={a:1,b:1}, max=2
## right=2 (c): window={a:1,b:1,c:1}, max=3
## right=3 (b): window={a:1,b:2,c:1} → shrink twice → left=2
##              window={b:1,c:1}, max still 3
## right=4 (a): window={b:1,c:1,a:1}, max=3

Time complexity: O(n) โ€” each character enters and exits the window at most once. Space complexity: O(min(m, n)) where m is the alphabet size.

โš ๏ธ Common Mistake: A common error is resetting left = right + 1 whenever you find a duplicate instead of shrinking step by step. This works on some inputs but fails when the same character appears three or more times in a row, because you skip over the correct left boundary.

LeetCode 76 โ€” Minimum Window Substring

This is a harder problem: given strings s and t, find the smallest substring of s that contains all characters of t. This is still a dynamic window, but the "validity" condition is richer โ€” the window must contain all required characters with sufficient frequency.

The trick is to track how many distinct character requirements are currently satisfied. When formed == required, the window is valid and you try to shrink.

from collections import Counter

def min_window(s: str, t: str) -> str:
    if not t or not s:
        return ""

    need = Counter(t)          # characters and counts required
    required = len(need)       # number of DISTINCT chars needed
    formed = 0                 # how many distinct chars are satisfied
    window = {}
    left = 0
    ans = float("inf"), 0, 0   # (length, left, right)

    for right in range(len(s)):
        c = s[right]
        window[c] = window.get(c, 0) + 1

        # Check if adding c just satisfied its requirement
        if c in need and window[c] == need[c]:
            formed += 1

        # Try to shrink from the left while the window is valid
        while formed == required and left <= right:
            # Record if this is the smallest valid window so far
            if right - left + 1 < ans[0]:
                ans = (right - left + 1, left, right)

            # Remove s[left] from window and shrink
            left_c = s[left]
            window[left_c] -= 1
            if left_c in need and window[left_c] < need[left_c]:
                formed -= 1   # window is now invalid again
            left += 1

    return "" if ans[0] == float("inf") else s[ans[1]: ans[2] + 1]

๐Ÿ’ก Mental Model: Think of formed as a progress bar. You expand right until the bar hits 100%, then squeeze left as hard as you can while keeping it at 100%, recording the window size. The moment squeezing drops the bar below 100%, you stop shrinking and expand right again.


The Two-Pointer Pattern

Where the sliding window uses two pointers to define a window moving in one direction, the two-pointer pattern is broader: two indices move through the data, sometimes from opposite ends toward the middle, sometimes at different speeds. This pattern is especially powerful for problems involving symmetry, sorted structures, or in-place transformations of character arrays.

๐Ÿง  Mnemonic: Think of two pointers as two people walking toward each other in a hallway. They stop and interact whenever they meet a condition, then continue walking.

Palindrome Checking

The classic two-pointer application: place one pointer at the start and one at the end. If the characters match, move both inward. If they ever differ, it is not a palindrome.

"racecar"
 r a c e c a r
 โ†‘           โ†‘   r == r โœ“, move inward
   โ†‘       โ†‘     a == a โœ“, move inward
     โ†‘   โ†‘       c == c โœ“, move inward
       โ†‘         pointers crossed โ†’ palindrome!

This extends naturally to Valid Palindrome II (LeetCode 680), where you are allowed to delete one character. When a mismatch is found at positions left and right, you check both sub-cases: skip left and check s[left+1..right], or skip right and check s[left..right-1]. If either is a palindrome, the answer is true.
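A sketch of that deletion check (the function and helper names are mine, not from the problem statement):

```python
def valid_palindrome_ii(s: str) -> bool:
    def is_pal(i: int, j: int) -> bool:
        # Plain two-pointer palindrome check on s[i..j]
        while i < j:
            if s[i] != s[j]:
                return False
            i += 1
            j -= 1
        return True

    left, right = 0, len(s) - 1
    while left < right:
        if s[left] != s[right]:
            # One deletion allowed: skip left OR skip right
            return is_pal(left + 1, right) or is_pal(left, right - 1)
        left += 1
        right -= 1
    return True
```

For example, valid_palindrome_ii("abca") returns True — deleting the 'c' leaves "aba".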

Reversing Words In-Place

Two pointers shine for in-place manipulation of character arrays. To reverse words in a sentence (LeetCode 151 style), the two-phase approach is elegant:

  1. Reverse the entire array using two pointers from ends inward.
  2. Reverse each word individually by finding word boundaries with two pointers scanning forward.

"the sky is blue"
Step 1 โ†’ reverse whole: "eulb si yks eht"
Step 2 โ†’ reverse each word: "blue is sky the"

This achieves O(n) time and O(1) extra space โ€” a common interview requirement when the input is a mutable character array.
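The two-phase approach can be sketched on a character list as follows (helper names are mine; this sketch assumes words are separated by single spaces):

```python
def reverse_words_in_place(chars: list) -> list:
    def reverse_range(lo: int, hi: int) -> None:
        # Classic two-pointer swap from both ends inward
        while lo < hi:
            chars[lo], chars[hi] = chars[hi], chars[lo]
            lo += 1
            hi -= 1

    # Phase 1: reverse the entire array
    reverse_range(0, len(chars) - 1)

    # Phase 2: reverse each word between space boundaries
    start = 0
    for i in range(len(chars) + 1):
        if i == len(chars) or chars[i] == ' ':
            reverse_range(start, i - 1)
            start = i + 1
    return chars
```

Running it on list("the sky is blue") yields the characters of "blue is sky the", using O(1) extra space.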

Two Pointers on Sorted / Partitioned Data

When characters are sorted (or you can sort without penalty), two pointers moving toward each other solve many problems optimally. For example, given a sorted array of characters and a target pair sum (treating characters as their ASCII values), a left pointer starting at index 0 and a right pointer starting at the end will find a pair in O(n) versus O(nยฒ) for the naive approach.
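An illustrative sketch of that pair-sum search on a sorted character list (the function name and return convention are my own):

```python
def find_pair_with_sum(chars: list, target: int):
    """Find two characters in a sorted list whose code points sum to target."""
    left, right = 0, len(chars) - 1
    while left < right:
        total = ord(chars[left]) + ord(chars[right])
        if total == target:
            return chars[left], chars[right]
        elif total < target:
            left += 1    # need a larger sum: advance the small end
        else:
            right -= 1   # need a smaller sum: retreat the large end
    return None
```

Because the list is sorted, each step safely discards one endpoint, giving O(n) total work.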

๐Ÿ’ก Pro Tip: Two-pointer problems often have the phrase "in-place", "without extra space", or involve a sorted structure. Palindrome problems almost always use two pointers. If you see both of those signals, reach for two pointers before anything else.

โš ๏ธ Common Mistake: Forgetting to handle the case where both pointers land on characters to skip (spaces, punctuation) simultaneously. Always advance the pointer past all invalid characters before comparing, not just one at a time.


Hashing for String Problems

Hashing โ€” storing data in a dictionary or array indexed by character โ€” transforms frequency counting from O(nยฒ) to O(n) and makes equality comparisons between character distributions O(1) after a one-time O(n) build. This unlocks an entire class of anagram, grouping, and frequency-based problems.

๐ŸŽฏ Key Principle: Two strings are anagrams of each other if and only if their character frequency maps are identical. You never need to sort to detect anagrams.

Building a Frequency Map

For ASCII lowercase strings, a length-26 integer array is often faster in practice than a dictionary, because array access has no hashing overhead. The mapping is ord(c) - ord('a') to convert a character to an index.

def build_freq(s: str) -> list:
    freq = [0] * 26
    for c in s:
        freq[ord(c) - ord('a')] += 1
    return freq

Comparing two such arrays is O(26) = O(1). This is the foundation for both LeetCode 242 and 438.

LeetCode 242 โ€” Valid Anagram

Check whether two strings s and t are anagrams. Build frequency maps for both and compare.

def is_anagram(s: str, t: str) -> bool:
    if len(s) != len(t):
        return False

    freq = [0] * 26

    # Increment for s, decrement for t in a single pass
    for cs, ct in zip(s, t):
        freq[ord(cs) - ord('a')] += 1
        freq[ord(ct) - ord('a')] -= 1

    # All counts must be zero if they are anagrams
    return all(count == 0 for count in freq)

This elegant one-pass approach avoids building two separate arrays. If the net effect of every character is zero, the strings are anagrams. Time: O(n). Space: O(1) (fixed 26-element array).

LeetCode 438 โ€” Find All Anagrams in a String

This problem combines hashing with a fixed-size sliding window: slide a window of length len(p) across s and check at each position whether the window is an anagram of p.

The naรฏve approach rebuilds the frequency map from scratch at each window position โ€” O(n ร— k). The optimized approach updates the frequency map incrementally as the window slides: remove the outgoing character and add the incoming character. The comparison remains O(26) = O(1).

from collections import Counter

def find_anagrams(s: str, p: str) -> list:
    result = []
    if len(p) > len(s):
        return result

    p_count = Counter(p)
    window_count = Counter(s[:len(p)])  # initialize first window

    if window_count == p_count:
        result.append(0)

    for i in range(len(p), len(s)):
        # Add the incoming character on the right
        incoming = s[i]
        window_count[incoming] += 1

        # Remove the outgoing character on the left
        outgoing = s[i - len(p)]
        window_count[outgoing] -= 1
        if window_count[outgoing] == 0:
            del window_count[outgoing]  # keep Counter clean for == comparison

        # Check if current window is an anagram
        if window_count == p_count:
            result.append(i - len(p) + 1)

    return result

## Example: s="cbaebabacd", p="abc" โ†’ [0, 6]
## Window [cba] == {a:1,b:1,c:1} โœ“  โ†’ index 0
## Window [bae] โœ—
## Window [aeb] โœ—
## Window [eba] โœ—
## Window [bab] โœ—
## Window [aba] โœ—
## Window [bac] == {a:1,b:1,c:1} ✓  → index 6
## Window [acd] ✗

๐Ÿค” Did you know? Comparing two Python Counter objects with == is O(k) where k is the number of distinct keys โ€” for a 26-character alphabet, this is effectively O(1). But for Unicode strings with potentially thousands of distinct characters, you may want a different validity check, such as tracking the number of characters that are "in balance."

Grouping Anagrams (LeetCode 49)

Hashing takes a different form here: you need a canonical key that all anagrams share. Sorting each word gives "eat" โ†’ "aet", "tea" โ†’ "aet" โ€” the same key. Alternatively, use the tuple of a 26-element frequency array as the dictionary key, which avoids the O(k log k) sort cost.

Input:  ["eat","tea","tan","ate","nat","bat"]
Keys:    "aet" "aet" "ant" "aet" "ant" "abt"
Groups:  {"aet": ["eat","tea","ate"],
          "ant": ["tan","nat"],
          "abt": ["bat"]}
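The frequency-array key variant might look like this (a sketch; LeetCode 49 accepts the groups in any order):

```python
from collections import defaultdict

def group_anagrams(strs: list) -> list:
    groups = defaultdict(list)
    for word in strs:
        # 26-element count tuple as key: all anagrams share it,
        # and it avoids the O(k log k) sort per word
        key = [0] * 26
        for c in word:
            key[ord(c) - ord('a')] += 1
        groups[tuple(key)].append(word)
    return list(groups.values())
```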

Choosing the Right Pattern

Knowing the patterns is only half the battle. The other half is pattern recognition โ€” reading a problem statement and immediately knowing which tool to reach for.

๐Ÿ“‹ Quick Reference Card: Pattern Selection Guide

๐Ÿ” Problem Signal ๐ŸŽฏ Pattern to Reach For โšก Why It Works
๐Ÿ”ง "Longest/shortest substring with constraint" Sliding Window (dynamic) Window expands/shrinks based on validity
๐Ÿ”ง "All substrings of length k" Sliding Window (fixed) Constant-size window slides across
๐Ÿ”’ "Is a palindrome", "reverse in place" Two Pointers Symmetry exploited from both ends
๐Ÿ“š "Is anagram", "contains all chars of t" Hashing Frequency map comparison
๐Ÿง  "Find all anagram positions" Sliding Window + Hashing Fixed window with frequency tracking
๐ŸŽฏ "No extra space allowed" + symmetry Two Pointers In-place with O(1) space

๐Ÿ’ก Pro Tip: Scan the problem for these keyword clusters before writing any code:

  • ๐Ÿง  "substring", "subarray", "contiguous" โ†’ Almost always sliding window
  • ๐Ÿ“š "anagram", "permutation", "contains all characters" โ†’ Hashing, possibly combined with window
  • ๐Ÿ”ง "palindrome", "reverse", "in-place" โ†’ Two pointers
  • ๐ŸŽฏ "minimum window", "longest without" โ†’ Dynamic sliding window with a shrink condition

Layering Patterns Together

Many medium and hard problems require combining two patterns. LeetCode 76 (Minimum Window Substring) uses a dynamic sliding window for the outer loop and hashing to track character frequencies inside the window. LeetCode 438 uses a fixed sliding window with hash map comparison at each step. Recognizing the layers separately โ€” "I need a window here, and I need frequency tracking here" โ€” is what separates fluent problem-solvers from those who stare at a blank editor.

โŒ Wrong thinking: "This problem uses hashing, so I just need a Counter and I am done."

โœ… Correct thinking: "This problem needs to scan substrings (sliding window) AND check character frequencies inside each window (hashing). I will use a window to select candidates and a frequency map to validate them."


Putting It All Together

These three patterns โ€” sliding window, two pointers, and hashing โ€” form a toolkit that covers an enormous swath of LeetCode string problems. The sliding window gives you an efficient way to scan contiguous substrings without redundant work. Two pointers let you exploit symmetry and sort order for in-place transformations. Hashing compresses the expensive operation of comparing character sets into a constant-time lookup.

The workflow for every new string problem should now look like this:

1. Read the problem โ†’ identify keywords
2. Is "contiguous" or "substring" involved?       โ†’ Consider sliding window
3. Is symmetry, reversal, or in-place involved?  โ†’ Consider two pointers
4. Is frequency, anagram, or grouping involved?  โ†’ Consider hashing
5. Do I need BOTH a window AND frequency?        โ†’ Combine them
6. Code the pattern โ†’ test on edge cases

In the next section, we will move beyond these foundational patterns into advanced techniques โ€” KMP for pattern matching, Tries for prefix problems, and dynamic programming for questions that require remembering decisions across subproblems. But those advanced tools build on top of what you have learned here. A solid sliding window and a well-built frequency map will carry you further in string interviews than any advanced algorithm applied without understanding.

Advanced Techniques: KMP, Tries, and Dynamic Programming on Strings

You've already mastered sliding windows, two pointers, and hashing. Now it's time to climb to the summit. The algorithms in this section are the ones that separate candidates who clear hard LeetCode problems from those who stall. They require deeper upfront investmentโ€”you'll need to understand why they work, not just how to code themโ€”but once internalized, they unlock entire families of problems that would otherwise be intractable.

Let's build each technique from first principles, walk through canonical LeetCode problems, and establish the mental models that make these algorithms feel inevitable rather than magical.


The Knuth-Morris-Pratt (KMP) Algorithm

The naive approach to pattern matching in a text checks every possible starting position and compares character by characterโ€”giving O(n ร— m) time in the worst case. KMP achieves O(n + m) by never re-examining a character it has already processed. The key insight is deceptively simple: when a mismatch occurs, the pattern itself contains information about where to resume matching.

Building the Failure Function

The heart of KMP is the failure function (also called the partial match table or lps arrayโ€”Longest Proper Prefix which is also a Suffix). For each position i in the pattern, lps[i] stores the length of the longest proper prefix of pattern[0..i] that is also a suffix of that same substring.

Consider the pattern "ABABC":

Index:    0  1  2  3  4
Pattern:  A  B  A  B  C
lps:      0  0  1  2  0

At index 2, "ABA" has "A" as both a proper prefix and a suffix โ†’ lps[2] = 1. At index 3, "ABAB" has "AB" as both prefix and suffix โ†’ lps[3] = 2. At index 4, "ABABC" has no such match โ†’ lps[4] = 0.

When a mismatch happens at position j in the pattern, instead of restarting from the beginning of the pattern, you jump to lps[j-1]. This uses the knowledge that the text characters you already matched form a valid prefix of the pattern, so you resume from the longest prefix that is also a suffix of the matched portion.

## LeetCode 28: Find the Index of the First Occurrence in a String
## KMP solution: O(n + m) time, O(m) space

def strStr(haystack: str, needle: str) -> int:
    if not needle:
        return 0
    
    n, m = len(haystack), len(needle)
    
    # Step 1: Build the lps (failure function) array
    lps = [0] * m
    length = 0  # length of previous longest prefix-suffix
    i = 1
    
    while i < m:
        if needle[i] == needle[length]:
            length += 1
            lps[i] = length
            i += 1
        else:
            if length != 0:
                # Don't increment i; fall back using lps
                length = lps[length - 1]
            else:
                lps[i] = 0
                i += 1
    
    # Step 2: Use lps to search in haystack
    i = 0  # index for haystack
    j = 0  # index for needle
    
    while i < n:
        if haystack[i] == needle[j]:
            i += 1
            j += 1
        
        if j == m:  # Full match found
            return i - j
        elif i < n and haystack[i] != needle[j]:
            if j != 0:
                j = lps[j - 1]  # Key KMP move: don't reset to 0
            else:
                i += 1
    
    return -1

The building phase iterates through the pattern once (O(m)), and the search phase iterates through the text once (O(n)). The j = lps[j-1] line is where the magic happensโ€”it's the only departure from brute force, and it's what eliminates redundant comparisons.

โš ๏ธ Common Mistake: Students often confuse the two-pointer logic in the lps-building phase. When needle[i] != needle[length] and length != 0, you do not increment i. You fall back to lps[length - 1]. Incrementing i here is the most frequent KMP bug.

๐Ÿ’ก Mental Model: Think of the lps array as a "rewind guide." After a mismatch, instead of rewinding the entire pattern to position 0, you rewind only as far as the lps table says you mustโ€”because everything before that point was already proven to match.


Tries: The Prefix Tree

A Trie (pronounced "try," from retrieval) is a tree data structure where each node represents a single character, and paths from root to nodes represent prefixes of stored strings. Unlike a hash map which gives you O(1) average lookup for a whole key, a Trie gives you O(L) lookup where L is the length of the wordโ€”and crucially, it lets you efficiently answer prefix queries that hash maps cannot.

Trie storing ["apple", "app", "apt", "bat"]:

         root
        /    \
       a      b
       |      |
       p      a
      / \     |
     p   t    t
     |   |    |
     l  [*]  [*]
     |
     e
     |
    [*]
    
[*] = end-of-word marker
Note: "app" is marked at the second 'p' node

When to Use a Trie

Reach for a Trie when your problem involves:

  • ๐Ÿ”ง Prefix matching ("does any word start with 'pre'?")
  • ๐Ÿ”ง Autocomplete or word suggestion
  • ๐Ÿ”ง Searching a dictionary for exact words or prefix words
  • ๐Ÿ”ง Replacing multiple hash lookups with a single traversal path

๐ŸŽฏ Key Principle: If you find yourself storing many strings and repeatedly asking prefix-related questions, a Trie will almost certainly be faster and more elegant than a set or hash map.

## LeetCode 208: Implement Trie (Prefix Tree)
## Also foundational for LeetCode 212: Word Search II

class TrieNode:
    def __init__(self):
        self.children = {}       # char -> TrieNode
        self.is_end_of_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()
    
    def insert(self, word: str) -> None:
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_end_of_word = True
    
    def search(self, word: str) -> bool:
        """Returns True only if the exact word exists."""
        node = self._traverse(word)
        return node is not None and node.is_end_of_word
    
    def startsWith(self, prefix: str) -> bool:
        """Returns True if any stored word starts with this prefix."""
        return self._traverse(prefix) is not None
    
    def _traverse(self, prefix: str):
        """Helper: walk the trie along prefix, return final node or None."""
        node = self.root
        for char in prefix:
            if char not in node.children:
                return None
            node = node.children[char]
        return node

For LeetCode 212 (Word Search II)โ€”find all words from a dictionary that exist in a 2D boardโ€”a Trie is the optimal solution. You insert all dictionary words into the Trie, then run DFS on the board. The Trie lets you prune branches early: if the current path doesn't match any prefix in the Trie, you stop immediately rather than continuing the DFS. This transforms what would be an O(words ร— 4^L) brute force into something far more tractable.
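A compact sketch of that Trie-pruned DFS, using plain dicts as trie nodes and '$' as an end-of-word marker (both are implementation choices of mine, not part of the problem):

```python
def find_words(board: list, words: list) -> list:
    # Build a trie of all dictionary words
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node['$'] = word               # store the full word at its end node

    rows, cols = len(board), len(board[0])
    found = set()

    def dfs(r, c, node):
        ch = board[r][c]
        if ch not in node:
            return                     # prune: no dictionary word continues this path
        nxt = node[ch]
        if '$' in nxt:
            found.add(nxt['$'])        # a complete word ends here
        board[r][c] = '#'              # mark cell as visited
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and board[nr][nc] != '#':
                dfs(nr, nc, nxt)
        board[r][c] = ch               # restore the cell on backtrack

    for r in range(rows):
        for c in range(cols):
            dfs(r, c, root)
    return list(found)
```

The early return on `ch not in node` is the pruning step: entire DFS subtrees die the instant the path stops being a prefix of any dictionary word.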

๐Ÿ’ก Pro Tip: Use an array of size 26 (children = [None] * 26) instead of a dictionary for the children when you know the character set is limited to lowercase English letters. Array indexing (ord(c) - ord('a')) is faster than dictionary hashing in practice.

๐Ÿค” Did you know? The Trie was invented in 1959 by Edward Fredkin, who wanted a data structure for dictionary retrieval. The name comes from the middle of "retrieval"โ€”which is why some people insist it should rhyme with "tree," while others say "try." Both camps exist to this day.


Dynamic Programming on Strings

String DP problems typically involve comparing two strings or analyzing a single string's internal structure. The defining characteristic is that the solution to a larger subproblem depends on solutions to smaller subproblems involving shorter substrings. We build answers in a 2D table and fill it systematically.

Edit Distance (LeetCode 72)

Edit Distance (also called Levenshtein distance) asks: what is the minimum number of insertions, deletions, or substitutions needed to transform string word1 into word2?

Define dp[i][j] as the edit distance between the first i characters of word1 and first j characters of word2. The recurrence is:

If word1[i-1] == word2[j-1]:
    dp[i][j] = dp[i-1][j-1]          # characters match, no operation needed
Else:
    dp[i][j] = 1 + min(
        dp[i-1][j],    # delete from word1
        dp[i][j-1],    # insert into word1
        dp[i-1][j-1]   # substitute
    )
## LeetCode 72: Edit Distance
## O(m * n) time and space

def minDistance(word1: str, word2: str) -> int:
    m, n = len(word1), len(word2)
    
    # dp[i][j] = edit distance between word1[:i] and word2[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    
    # Base cases: transforming to/from empty string
    for i in range(m + 1):
        dp[i][0] = i  # delete all characters of word1
    for j in range(n + 1):
        dp[0][j] = j  # insert all characters of word2
    
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if word1[i-1] == word2[j-1]:
                dp[i][j] = dp[i-1][j-1]  # free move
            else:
                dp[i][j] = 1 + min(
                    dp[i-1][j],    # delete
                    dp[i][j-1],    # insert
                    dp[i-1][j-1]   # replace
                )
    
    return dp[m][n]

Visualize the table for word1 = "horse", word2 = "ros":

    ""  r  o  s
""   0  1  2  3
h    1  1  2  3
o    2  2  1  2
r    3  2  2  2
s    4  3  3  2
e    5  4  4  3

Answer: dp[5][3] = 3

Longest Common Subsequence (LeetCode 1143)

LCS finds the length of the longest subsequence present in both strings. Unlike substrings, subsequences don't need to be contiguous. dp[i][j] is the LCS of text1[:i] and text2[:j].

The recurrence follows the same structural pattern as Edit Distance:

  • If characters match: dp[i][j] = dp[i-1][j-1] + 1
  • Otherwise: dp[i][j] = max(dp[i-1][j], dp[i][j-1])

๐Ÿ’ก Mental Model: LCS and Edit Distance share the same DP table skeleton. When characters match, you extend the diagonal. When they don't, you take the best of your neighbors. The difference is only in what you're measuringโ€”length of commonality versus cost of transformation.

Palindromic Substrings (LeetCode 647)

For palindromic substrings, we define dp[i][j] as a boolean: is s[i..j] a palindrome? The recurrence:

  • dp[i][i] = True (every single character)
  • dp[i][i+1] = (s[i] == s[i+1])
  • dp[i][j] = (s[i] == s[j]) and dp[i+1][j-1]

โš ๏ธ Common Mistake: You must fill the DP table in order of increasing substring length, not row by row. If you fill row by row, you'll read dp[i+1][j-1] before it's computed.

๐Ÿง  Mnemonic: For string DP, think "match and shrink"โ€”if the outer characters match, shrink to the inner problem. If they don't, split into independent subproblems.


Rabin-Karp Rolling Hash

The Rabin-Karp algorithm uses hashing to find pattern occurrences in text. Its real power is the rolling hash: when you slide a window one position forward, you don't recompute the entire hash. You subtract the contribution of the outgoing character and add the incoming oneโ€”an O(1) update.

The hash of a string s of length m is typically computed as:

hash = (s[0] * p^(m-1) + s[1] * p^(m-2) + ... + s[m-1] * p^0) mod MOD

When sliding forward, the new hash is:

new_hash = ((old_hash - s[i] * p^(m-1)) * p + s[i+m]) mod MOD

For LeetCode 187 (Repeated DNA Sequences), you need to find all 10-character DNA substrings that appear more than once. A rolling hash over a 10-character window is ideal:

## LeetCode 187: Repeated DNA Sequences
## Rolling hash approach: O(n) time

def findRepeatedDnaSequences(s: str) -> list:
    if len(s) <= 10:
        return []
    
    # Map DNA characters to digits for hashing
    char_map = {'A': 0, 'C': 1, 'G': 2, 'T': 3}
    BASE = 4      # 4 possible characters
    WINDOW = 10
    MOD = 2**31 - 1
    
    seen = set()
    result = set()
    
    # Compute hash of first window
    current_hash = 0
    power = BASE ** (WINDOW - 1)  # precompute highest power
    
    for i in range(WINDOW):
        current_hash = (current_hash * BASE + char_map[s[i]]) % MOD
    seen.add(current_hash)
    
    # Slide the window
    for i in range(1, len(s) - WINDOW + 1):
        # Remove leftmost character, add new rightmost
        outgoing = char_map[s[i - 1]]
        incoming = char_map[s[i + WINDOW - 1]]
        current_hash = (current_hash - outgoing * power) % MOD
        current_hash = (current_hash * BASE + incoming) % MOD
        
        if current_hash in seen:
            result.add(s[i:i + WINDOW])  # safe without verification here: 4**10 < MOD, so the hash uniquely encodes the window
        seen.add(current_hash)
    
    return list(result)

โš ๏ธ Common Mistake: Rolling hashes can have collisionsโ€”two different strings producing the same hash. Always verify actual string equality when a hash match is found, especially in correctness-critical problems. The code above confirms with the actual substring before adding to results.

๐Ÿ’ก Pro Tip: Use a large prime for MOD (like 2^31 - 1) and a prime base (like 31 or 131) to minimize collision probability. For extra safety on hard problems, use double hashingโ€”two independent hash functions where both must match.


Manacher's Algorithm: O(n) Palindrome Detection

Manacher's algorithm finds the longest palindromic substringโ€”and counts all palindromic substringsโ€”in true O(n) time. The naive expand-around-center approach is O(nยฒ); Manacher reuses previously computed palindrome radii to skip redundant comparisons.

The core idea: maintain a "rightmost palindrome" boundary. For each new center, if it falls within the current rightmost palindrome, its mirror center (on the other side) gives a free lower bound on its own palindrome radius.

Transformed string with separators: "#a#b#a#a#"
Palindrome radii P:                  0 1 0 3 0 1 2 1 0

Center at index 3 (original 'b'):
- Rightmost palindrome spans far right
- When computing index 5's radius,
  mirror at index 1 gives P[5] >= P[1] = 1
- Then try to expand further

The separator trick (inserting # between every character and at boundaries) unifies odd and even length palindromes into a single caseโ€”every palindrome in the transformed string has odd length.
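Putting the mirror trick and the separator trick together, here is a compact sketch of the full algorithm (variable names are illustrative):

```python
def longest_palindrome_manacher(s: str) -> str:
    # Separator trick: every palindrome in t has odd length
    t = "#" + "#".join(s) + "#"
    n = len(t)
    p = [0] * n            # p[i] = palindrome radius around center i in t
    center = right = 0     # center / right edge of the rightmost palindrome
    for i in range(n):
        if i < right:
            mirror = 2 * center - i
            p[i] = min(right - i, p[mirror])   # free lower bound from mirror
        # Expand past the guaranteed radius
        while (i - p[i] - 1 >= 0 and i + p[i] + 1 < n
               and t[i - p[i] - 1] == t[i + p[i] + 1]):
            p[i] += 1
        if i + p[i] > right:                   # extend rightmost boundary
            center, right = i, i + p[i]
    # A radius in t equals the palindrome's length in the original s
    best = max(range(n), key=lambda i: p[i])
    start = (best - p[best]) // 2              # map back to an index in s
    return s[start:start + p[best]]
```

Each character can only be expanded past once in total (every successful comparison pushes `right` forward), which is where the O(n) bound comes from.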

๐ŸŽฏ Key Principle: Manacher is the right choice only when you explicitly need O(n) palindrome detectionโ€”usually in problems where O(nยฒ) would time out. For most LeetCode mediums, the expand-around-center O(nยฒ) approach with careful implementation is simpler, faster to write in an interview, and sufficient. Reserve Manacher for hard problems with very large inputs or tight time limits.

๐Ÿ“‹ Quick Reference Card: Algorithm Selection Guide

| 🎯 Problem Type | 🔧 Best Algorithm | ⚡ Time Complexity |
| --- | --- | --- |
| Single pattern in text | KMP | O(n + m) |
| Prefix queries / autocomplete | Trie | O(L) per query |
| Transform one string to another | Edit Distance DP | O(n × m) |
| Longest shared subsequence | LCS DP | O(n × m) |
| Count/find palindromic substrings | Expand-center or Manacher | O(n²) or O(n) |
| Duplicate substrings of fixed length | Rolling Hash | O(n) |
| Multiple patterns in text | Trie + DFS | O(n × L) |

Connecting the Techniques

These algorithms aren't isolated toolsโ€”they combine powerfully. Word Search II (LeetCode 212) marries a Trie with DFS. Some competitive programming problems layer rolling hashing with binary search to find the longest repeated substring in O(n log n). Recognizing which tool fits which problem shape is the real skill:

  • ๐Ÿง  Linear scan, fixed pattern โ†’ KMP
  • ๐Ÿ“š Many words, prefix structure โ†’ Trie
  • ๐Ÿ”ง Two strings, transformation/overlap โ†’ DP
  • ๐ŸŽฏ Fixed-length windows, duplicates โ†’ Rolling Hash
  • ๐Ÿ”’ Palindrome structure, O(n) required โ†’ Manacher
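The hashing-plus-binary-search combination mentioned above can be sketched as follows. This is a minimal illustration (function name and hash constants are arbitrary choices) with explicit substring verification to rule out hash collisions:

```python
def longest_repeated_substring(s: str) -> str:
    MOD, BASE = (1 << 61) - 1, 131

    def find_repeat(length: int) -> int:
        # Return the start of some repeated substring of this length, else -1
        if length == 0:
            return 0
        pk = pow(BASE, length, MOD)
        h = 0
        seen = {}                               # hash -> start indices
        for i, ch in enumerate(s):
            h = (h * BASE + ord(ch)) % MOD      # push the new character
            if i >= length:
                h = (h - ord(s[i - length]) * pk) % MOD   # pop the old one
            if i >= length - 1:
                start = i - length + 1
                for j in seen.get(h, []):       # verify: rule out collisions
                    if s[j:j + length] == s[start:start + length]:
                        return start
                seen.setdefault(h, []).append(start)
        return -1

    # If a repeat of length L exists, its prefix of length L - 1 also repeats,
    # so the predicate is monotone and binary search on the length applies.
    lo, hi, best = 1, len(s) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        if find_repeat(mid) != -1:
            best, lo = mid, mid + 1
        else:
            hi = mid - 1
    start = find_repeat(best)
    return s[start:start + best] if best else ""
```

Each check is O(n) expected, and binary search adds a log n factor, giving the O(n log n) total mentioned above.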

โŒ Wrong thinking: "I'll just use a hash set and check all O(nยฒ) substrings." This works on small inputs but collapses on strings of length 10โต.

โœ… Correct thinking: Identify the structural property the problem is exploitingโ€”prefix overlap, repeated structure, optimal substructureโ€”and match it to the algorithm designed for exactly that property.

Mastering this section means you've crossed the threshold from someone who solves string problems to someone who engineers string solutions. The algorithms here handle the cases where raw cleverness runs out and principled technique must take over. Practice KMP until the failure function feels automatic, build a Trie from memory, and internalize the DP recurrences until they're reflexes.
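Since the failure function is worth having at your fingertips, here is a minimal sketch of the standard prefix-function construction used by KMP:

```python
def failure_function(pattern: str) -> list:
    # fail[i] = length of the longest proper prefix of pattern[:i+1]
    #           that is also a suffix of it
    fail = [0] * len(pattern)
    k = 0                      # length of the currently matched prefix
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]    # fall back to the next-longest border
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail
```

The same fall-back loop drives the search phase of KMP, which is why internalizing this construction makes the rest of the algorithm feel automatic.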

Common Pitfalls, Edge Cases, and Debugging String Solutions

Even experienced engineers lose points on string problems โ€” not because they don't know the algorithm, but because a misplaced index or an unconsidered edge case silently corrupts their output. String algorithms sit at an unusual intersection: the logic feels simple on paper, yet the implementation hides an astonishing number of traps. This section is your field guide to the bugs that interviewers see most often, why they happen, and exactly how to prevent them.

The Off-by-One Error: String Algorithm's Most Loyal Enemy

Off-by-one errors (OBOEs) are the single most common source of wrong answers in string problems. They appear in three recurring situations: window boundary calculations in sliding window problems, substring extraction indices, and pointer positions in two-pointer traversals. The insidious part is that OBOEs often pass most test cases but fail on boundary inputs โ€” exactly the inputs interviewers use to probe correctness.

Consider a sliding window of fixed size k over a string s. The window should start at index 0 and the last valid starting position is len(s) - k, not len(s) - k + 1 and not len(s) - k - 1.

String:  a  b  c  d  e
Index:   0  1  2  3  4     len = 5, k = 3

Valid windows:
  [a b c]  start=0, end=2
  [b c d]  start=1, end=3
  [c d e]  start=2, end=4  <-- last valid start = len - k = 5 - 3 = 2

If loop runs while start <= len - k:  โœ… catches start=2
If loop runs while start <  len - k:  โŒ misses  start=2
If loop runs while start <= len - k - 1: โŒ same wrong result

For substring extraction, Python's s[i:j] is exclusive on the right, meaning s[i:j] gives characters from index i up to but not including index j. Forgetting this causes you to either drop the last character of a window or include one character too many.
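Both boundary rules can be pinned down with a three-line sketch (function name is illustrative):

```python
def fixed_windows(s: str, k: int) -> list:
    # range's exclusive end is len(s) - k + 1, so the last start is len(s) - k;
    # each slice s[i:i + k] excludes index i + k, giving exactly k characters
    return [s[i:i + k] for i in range(len(s) - k + 1)]
```

If k exceeds len(s), the range is empty and no windows are produced, which is usually the behavior you want.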

A Systematic Off-by-One Checklist

Before submitting any solution, run through this mental checklist:

  • 🔧 Loop termination: Does your loop bound use < len or <= len - 1? The two are equivalent; an unadjusted <= len walks one index past the end.
  • ๐Ÿ”ง Window end index: When you record a result using s[left:right+1] vs s[left:right], confirm which one you intended and verify it with a 3-character example.
  • ๐Ÿ”ง Two-pointer collision: When left and right pointers meet, does your logic handle left == right as a valid state or an exit condition?
  • ๐Ÿ”ง Result slicing: After any loop, does your final answer slice match the indices you've been maintaining?

๐Ÿ’ก Pro Tip: Always trace through a minimal example โ€” a string of exactly k characters for a size-k window โ€” before testing on large inputs. If your loop runs exactly once on a string of length k, your boundaries are likely correct.

Forgotten Edge Cases: The Inputs Nobody Tests Until Interview Day

Human beings test the happy path. Interviewers test the edges. There is a predictable set of edge cases that trip up candidates on string problems, and memorizing them is a straightforward way to avoid losing points.

Empty string inputs are the most frequently forgotten case. A function that counts character frequencies, finds the longest palindrome, or performs a sliding window search must not crash or return nonsensical results when given "". In Python, len("") is 0 and iterating over an empty string is safe, but indexing into it (s[0]) throws an IndexError. Always add an explicit guard:

def longest_unique_substring(s: str) -> int:
    # Edge case: empty or single character โ€” answer is trivially len(s)
    if len(s) <= 1:
        return len(s)
    
    char_index = {}
    left = 0
    best = 0
    
    for right, ch in enumerate(s):
        if ch in char_index and char_index[ch] >= left:
            left = char_index[ch] + 1
        char_index[ch] = right
        best = max(best, right - left + 1)
    
    return best

Notice the guard at the top: it handles "" (returns 0) and "a" (returns 1) without ever touching the main logic. This pattern โ€” early return for degenerate inputs โ€” is one of the clearest signals to an interviewer that you think defensively.

Single-character strings deserve special attention in problems involving comparison, like palindrome checks or anagram detection. A single character is always a palindrome and is always an anagram of itself. If your algorithm compares s[i] with s[j] and assumes i != j, it may produce incorrect results.

All-identical-character strings like "aaaaaaa" expose bugs in algorithms that rely on character changes to update state. A sliding window that moves left forward only when a duplicate is newly added but never checks the "all same" scenario can get stuck in an infinite loop or return a window larger than the string itself.

Strings with spaces and special characters are the overlooked cousin of all-identical strings. Consider a word-counting function: splitting "hello world" on spaces gives ['hello', 'world'], but " hello world " (leading/trailing spaces) gives ['', 'hello', '', 'world', ''] in a naive split, polluting your word list with empty strings. Python's str.split() with no arguments handles this correctly by splitting on any whitespace and discarding empties; str.split(' ') does not.

โš ๏ธ Common Mistake: Using s.split(' ') when the input may have multiple consecutive spaces or leading/trailing whitespace. Always use s.split() (no argument) unless you have a specific reason to split on exactly one space.

๐ŸŽฏ Key Principle: Before writing a single line of logic, write out at least five test cases: the empty string, a single character, all-identical characters, a string with spaces, and the normal happy-path input. This habit takes 60 seconds and regularly saves 20 minutes of debugging.

The Hidden O(nยฒ) Trap: String Concatenation Inside Loops

One of the most performance-destroying mistakes in string algorithm implementations is string concatenation inside a loop. In Python and Java, strings are immutable โ€” every time you write result += character, the runtime allocates a brand-new string, copies all existing characters into it, and appends the new one. Do this inside a loop of length n, and you have secretly written an O(nยฒ) algorithm masquerading as O(n) code.

Naive concatenation for n = 5:

Iteration 1: result = ""    + "a"  โ†’ allocates 1  char  (1 copy)
Iteration 2: result = "a"   + "b"  โ†’ allocates 2  chars (2 copies)
Iteration 3: result = "ab"  + "c"  โ†’ allocates 3  chars (3 copies)
Iteration 4: result = "abc" + "d"  โ†’ allocates 4  chars (4 copies)
Iteration 5: result = "abcd" + "e" → allocates 5  chars (5 copies)

Total copies: 1+2+3+4+5 = 15 = n(n+1)/2  โ†’  O(nยฒ)

For small strings this is invisible. For a 10,000-character input, it's a 50-million-operation difference. The fix is always the same: collect characters in a list, then join once at the end.

## โŒ Wrong: O(nยฒ) due to repeated string concatenation
def reverse_words_slow(sentence: str) -> str:
    words = sentence.split()
    result = ""
    for word in reversed(words):
        result += word + " "   # New string allocated every iteration
    return result.strip()


## โœ… Correct: O(n) using list accumulation + single join
def reverse_words_fast(sentence: str) -> str:
    words = sentence.split()
    parts = []                         # O(1) append each time
    for word in reversed(words):
        parts.append(word)             # No string allocation
    return " ".join(parts)             # Single allocation at the end


## โœ… Even better: Pythonic one-liner
def reverse_words_pythonic(sentence: str) -> str:
    return " ".join(reversed(sentence.split()))

In Java, the equivalent fix is to use StringBuilder instead of the + operator:

// โŒ Wrong: O(nยฒ) โ€” each + creates a new String object
String buildResult(char[] chars) {
    String result = "";
    for (char c : chars) {
        result += c;  // Hidden allocation on every iteration
    }
    return result;
}

// โœ… Correct: O(n) โ€” StringBuilder amortizes allocation
String buildResult(char[] chars) {
    StringBuilder sb = new StringBuilder();
    for (char c : chars) {
        sb.append(c);  // Appends to internal buffer, no new String
    }
    return sb.toString();  // Single String created at the very end
}

๐Ÿค” Did you know? Python's CPython interpreter includes an optimization that sometimes makes += on strings behave like O(1) when the string has no other references. But this optimization is implementation-specific, not guaranteed by the language spec, and disappears the moment another variable holds a reference to the string. Never rely on it in interview code.

๐Ÿ’ก Mental Model: Think of string concatenation like rewriting a document from scratch every time you add a sentence. A list is your scratch pad โ€” cheap to add to. The join call is the single moment you print the finished document.

Case Sensitivity and Whitespace: The Silent Wrong-Answer Generators

Case sensitivity bugs are responsible for a disproportionate number of wrong answers on problems involving anagram detection, word frequency counting, and palindrome checking. The problem is almost never that the candidate doesn't know about .lower() โ€” it's that they forget to apply it consistently, or apply it in the wrong place.

Consider anagram detection: "Listen" and "Silent" are anagrams, but if you build a character frequency map from the raw input, 'L' and 'l' will be counted as different characters. The fix is to normalize the case of the entire input before any processing begins, not partway through.

โš ๏ธ Common Mistake: Normalizing case when building one frequency map but forgetting to normalize the comparison string. This creates an asymmetric bug that passes when both strings happen to be lowercase and fails otherwise.

Whitespace bugs are equally subtle. In problems that count words or detect anagram phrases, a space character ' ' is a valid character with its own ASCII value (32). If your anagram checker treats spaces as regular characters, "a bc" and "ab c" will appear to be anagrams of each other โ€” which may or may not be the intended behavior depending on the problem statement.

def are_anagrams(s1: str, s2: str) -> bool:
    """
    Robust anagram check that handles case and whitespace correctly.
    Assumes spaces should be ignored and comparison is case-insensitive.
    Always clarify these assumptions with your interviewer.
    """
    # Normalize: lowercase and strip spaces
    # Do this ONCE, at the top, before any logic
    clean1 = s1.lower().replace(" ", "")
    clean2 = s2.lower().replace(" ", "")
    
    # Early exit: if lengths differ after normalization, not anagrams
    if len(clean1) != len(clean2):
        return False
    
    # Count character frequencies
    from collections import Counter
    return Counter(clean1) == Counter(clean2)


## Test cases that expose common bugs:
print(are_anagrams("Listen", "Silent"))    # True  โ€” case normalization
print(are_anagrams("Astronomer", "Moon starer"))  # True  โ€” space removal
print(are_anagrams("", ""))                 # True  โ€” empty string edge case
print(are_anagrams("a", ""))               # False โ€” length mismatch

๐Ÿง  Mnemonic: CLAN โ€” Case, Length, Alphanumeric filter, Normalize whitespace. Check all four before writing your core logic.

Misreading Constraints: Unicode vs. ASCII vs. Lowercase-Only

One of the most consequential โ€” and most avoidable โ€” mistakes in string algorithm design is failing to read the constraints section of the problem carefully. The input character set defines your solution's design, its data structures, and even its space complexity. Ignoring it means you may build an unnecessarily complex solution, or worse, a solution that is silently incorrect.

LeetCode problems typically fall into three tiers of input constraint:

| Constraint | Example Problems | Implication |
| --- | --- | --- |
| Lowercase English only (a–z) | Most sliding window, anagram | Fixed 26-slot frequency array |
| ASCII printable (0–127) | File path parsing, URL encoding | Fixed 128-slot array |
| Full Unicode (any character) | Internationalized text processing | Must use a hash map |

When the constraint says lowercase English letters only, you can replace a hash map with a fixed-size integer array of length 26. This is not just a micro-optimization โ€” it changes the space complexity from O(k) where k is the alphabet size to O(1) (since 26 is a constant), and it eliminates hash collision overhead entirely.

def char_frequency_array(s: str) -> list:
    """
    O(1) space frequency counter โ€” valid ONLY when s contains
    lowercase English letters. Using this with Unicode input
    will produce an IndexError or silently wrong results.
    """
    # freq[0] = count of 'a', freq[1] = count of 'b', ...
    freq = [0] * 26
    for ch in s:
        freq[ord(ch) - ord('a')] += 1  # Maps 'a'->0, 'b'->1, ..., 'z'->25
    return freq


def char_frequency_map(s: str) -> dict:
    """
    O(k) space frequency counter โ€” safe for any character set.
    Use this when constraints allow Unicode, uppercase, digits,
    or special characters.
    """
    from collections import defaultdict
    freq = defaultdict(int)
    for ch in s:
        freq[ch] += 1
    return dict(freq)

โŒ Wrong thinking: "I'll just use a dictionary everywhere โ€” it's simpler and always correct."

โœ… Correct thinking: "I'll use a dictionary by default, but when the problem guarantees lowercase-only input, I'll switch to a 26-element array for cleaner, faster code that signals constraint awareness to my interviewer."

๐ŸŽฏ Key Principle: The constraints section is not fine print โ€” it is the specification. Reading it carefully takes 30 seconds and can save you from designing a solution that is either over-engineered or wrong.

When Unicode Changes Everything

If the problem allows full Unicode, two subtle issues emerge that don't exist in ASCII-only problems:

  • ๐Ÿง  Composite characters: Some characters that look like a single glyph are actually two Unicode code points โ€” a base character plus a combining diacritic. In Python, len("รฉ") can be either 1 or 2 depending on whether the character uses a precomposed form (U+00E9) or a combining sequence (e + U+0301). This can make length comparisons silently incorrect.
  • ๐Ÿ“š Case folding vs. lowercasing: .lower() works correctly for English but behaves unexpectedly for some languages. For robust internationalized comparisons, Python's str.casefold() is the correct method โ€” it handles German รŸ, Turkish ฤฐ, and other language-specific case rules.

โš ๏ธ Common Mistake: Using .lower() instead of .casefold() when the problem mentions that input may come from multiple languages or character encodings. In an interview, simply mentioning this distinction demonstrates a level of depth that sets you apart.

Building a Pre-Submission Debug Ritual

The fastest way to internalize all of the above is to develop a pre-submission ritual โ€” a mental checklist you run in the final 60โ€“90 seconds before clicking submit. Here is a battle-tested version:

PRE-SUBMISSION STRING ALGORITHM CHECKLIST
==========================================

1. EDGE CASES
   [ ] Empty string input handled?
   [ ] Single character input handled?
   [ ] All-identical characters handled?
   [ ] Spaces / special chars in input?

2. INDEX SAFETY
   [ ] Loop bounds use < len or <= len-1 (not <= len)?
   [ ] Substring slices: exclusive right end accounted for?
   [ ] Two-pointer: does left == right need special handling?

3. COMPLEXITY
   [ ] Any string += inside a loop? โ†’ Replace with list + join
   [ ] Any substring creation O(k) inside an O(n) loop?

4. NORMALIZATION
   [ ] Case sensitivity: should input be lowercased first?
   [ ] Whitespace: should spaces be stripped or ignored?
   [ ] Applied normalization consistently to ALL inputs?

5. CONSTRAINTS
   [ ] Is input lowercase-only? โ†’ Can use 26-array
   [ ] Does input allow Unicode? โ†’ Must use hash map
   [ ] Does problem guarantee non-empty? โ†’ Remove guard or keep?

๐Ÿ’ก Real-World Example: In a live Google interview, a candidate implemented a perfect sliding window anagram solution but forgot to normalize case. The solution passed 38 of 40 test cases. The two failures were inputs with mixed-case characters โ€” exactly the kind of test case that appears in the final few hidden test cases. Running the pre-submission checklist at step 4 would have caught this in under 10 seconds.

๐Ÿ“‹ Quick Reference Card: Most Common String Bugs

โš™๏ธ Bug Type ๐Ÿ” Symptom โœ… Fix
๐Ÿ”’ Off-by-one in window Misses last window or crashes Use len(s) - k as final start
๐Ÿ”’ Empty string crash IndexError on first access Add if not s: return ... guard
๐Ÿ”’ O(nยฒ) concatenation Correct output but TLE Use list + ''.join()
๐Ÿ”’ Case mismatch Fails mixed-case inputs Normalize with .lower() upfront
๐Ÿ”’ Space in anagram Wrong frequency counts Strip or explicitly handle spaces
๐Ÿ”’ Wrong charset assumed Wrong answer on Unicode Use dict not 26-array

The engineers who consistently score well on string problems are not necessarily faster coders โ€” they are more systematic coders. They have internalized the failure modes so thoroughly that checking for them is automatic. With the checklist above and the patterns from this section, you now have the same systematic foundation.

Key Takeaways and Your String Algorithm Cheat Sheet

You have traveled a long road through this lesson. You started by understanding why string problems dominate coding interviews, built a solid foundation in character encodings and fundamental operations, internalized three powerhouse patterns (sliding window, two pointers, hashing), leveled up with KMP, tries, and DP, and sharpened your defensive coding instincts against the most treacherous edge cases. This final section is your permanent reference โ€” the distillation of everything into actionable frameworks you can pull up the night before an interview or the morning of.

The goal here is not to introduce new ideas but to crystallize the mental models you have already built so they fire instantly under pressure.


Pattern Decision Guide: The String Algorithm Flowchart

The single most valuable skill in a string interview is choosing the right pattern quickly. Interviewers notice when a candidate jumps to code without a strategy; they also notice when a candidate confidently names a pattern and explains why within the first 60 seconds. Use the flowchart below as your internal decision engine.

START: Read the problem carefully
         |
         v
  Does the problem involve a
  CONTIGUOUS substring or window?
    /           \
  YES            NO
   |              |
   v              v
 Is the window    Does it involve
 size FIXED?      PAIRS or
  /     \        REVERSAL logic?
 YES    NO         /        \
  |      |       YES         NO
  v      v        |           |
Fixed  Sliding   Two        Does it involve
Window  Window  Pointers    PREFIX queries
(hash   (expand/            or AUTOCOMPLETE?
 count)  shrink)             /         \
                           YES          NO
                            |            |
                            v            v
                          Trie      Does it involve
                                    OVERLAPPING
                                    SUBPROBLEMS?
                                     /       \
                                   YES        NO
                                    |          |
                                    v          v
                                   DP       Hash Map
                               (LCS, Edit  (anagram,
                                Distance)  frequency)

๐ŸŽฏ Key Principle: The keywords in a problem statement are signals, not noise. Train yourself to map words like "contiguous", "at most K distinct", "prefix", and "minimum edit" directly to a pattern before you read the constraints.

Here is a compressed keyword-to-pattern mapping you can memorize:

| Problem Keywords | Likely Pattern |
| --- | --- |
| "longest substring without repeating" | Sliding Window + Hash Set |
| "find all anagrams" | Sliding Window + Frequency Map |
| "two strings, find if one is rotation" | Hashing / KMP |
| "palindrome", "reverse" | Two Pointers |
| "starts with", "autocomplete", "prefix" | Trie |
| "minimum edit distance", "longest common subsequence" | DP |
| "pattern in text (large scale)" | KMP / Rabin-Karp |
| "count distinct substrings" | Suffix Array / Trie |

๐Ÿ’ก Pro Tip: When the problem mentions both a length constraint (like "at most K") and "contiguous substring", you are almost always looking at a sliding window that shrinks when the constraint is violated.


Complexity Cheat Sheet

Before you write a single line of code, you should be able to state the expected time and space complexity of your approach. This habit signals seniority to interviewers and catches O(nยฒ) traps before they become bugs.

๐Ÿ“‹ Quick Reference Card: String Pattern Complexity Benchmarks

| 🔧 Pattern | ⏱️ Time Complexity | 🗄️ Space Complexity | 📍 Key Constraint |
| --- | --- | --- | --- |
| 🪟 Sliding Window (variable) | O(n) | O(k) — k = alphabet size | Single pass, two pointers |
| 🔁 Two Pointers | O(n) | O(1) | In-place, sorted or symmetric |
| 🗺️ Hash Map / Frequency Count | O(n) | O(k) — k = distinct chars | k ≤ 26 for lowercase ASCII |
| 🔍 KMP Pattern Match | O(n + m) | O(m) — m = pattern length | Preprocessing required |
| 🌳 Trie Insert/Search | O(m) per op | O(ALPHABET × n × m) | Memory-heavy |
| 🧮 DP (LCS / Edit Distance) | O(n × m) | O(n × m) or O(min(n, m)) | 2D table, optimizable |
| 🔑 Rabin-Karp Rolling Hash | O(n + m) avg | O(1) | Hash collision possible |
| 📊 Suffix Array | O(n log n) | O(n) | Advanced; rarely in interviews |
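As one concrete instance from the table, the edit-distance DP only ever reads the previous row, which is what the O(min(n, m)) entry refers to. A sketch of the row-by-row version:

```python
def edit_distance(a: str, b: str) -> int:
    # Keep only the previous DP row: O(min(n, m)) space
    if len(a) < len(b):
        a, b = b, a                    # make b the shorter string
    prev = list(range(len(b) + 1))     # row for the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, 1):
            if ca == cb:
                curr[j] = prev[j - 1]              # characters match: free
            else:
                curr[j] = 1 + min(prev[j - 1],     # replace
                                  prev[j],         # delete from a
                                  curr[j - 1])     # insert into a
        prev = curr
    return prev[len(b)]
```

The recurrence is identical to the full 2D table; only the storage shrinks.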

โš ๏ธ Common Mistake โ€” Mistake 1: Assuming O(26) space is O(1). Technically it is O(1) since the alphabet is fixed, but some interviewers expect you to name the constant explicitly. Say "O(1) space since the character set is bounded to 26 lowercase letters" to sound precise.

โš ๏ธ Common Mistake โ€” Mistake 2: Forgetting that string concatenation inside a loop is O(nยฒ) in languages like Python and Java where strings are immutable. Always use a StringBuilder (Java) or .join() (Python) pattern.

## โŒ Wrong โ€” O(nยฒ) due to repeated string allocation
result = ""
for char in chars:
    result += char  # Creates a new string every iteration!

## โœ… Correct โ€” O(n) with list buffer
buffer = []
for char in chars:
    buffer.append(char)
result = "".join(buffer)  # Single allocation at the end


Top 15 Must-Solve LeetCode String Problems

Not all practice is equal. The following 15 problems were selected because they each teach a transferable pattern โ€” solving one problem prepares you for a whole family of related questions. Work through them in order; each builds on the intuition from the last.

Easy โ€” Build Reflexes (Problems 1โ€“4)

๐ŸŸข 1. Valid Anagram (LC #242)

  • Pattern: Frequency Map / Sorting
  • What it teaches: The canonical frequency-counting template that appears inside dozens of harder problems.
  • Key insight: Two strings are anagrams iff their character frequency maps are identical.

๐ŸŸข 2. Valid Palindrome (LC #125)

  • Pattern: Two Pointers
  • What it teaches: Filtering non-alphanumeric characters, pointer convergence, case normalization.
  • Key insight: Ignore non-alphanumeric chars; compare left and right pointers moving inward.

๐ŸŸข 3. Longest Common Prefix (LC #14)

  • Pattern: Vertical scan / Sorting trick
  • What it teaches: How to use string comparison efficiently across an array of strings.
  • Key insight: Sort the array and compare only the first and last strings.

๐ŸŸข 4. Reverse Words in a String (LC #151)

  • Pattern: Two Pointers / String manipulation
  • What it teaches: In-place string manipulation, handling extra spaces โ€” a classic cleaning problem.

Medium — Develop Pattern Fluency (Problems 5–10)

๐ŸŸก 5. Longest Substring Without Repeating Characters (LC #3)

  • Pattern: Sliding Window + Hash Set
  • What it teaches: The gold-standard sliding window template. Solve this until it's automatic.

๐ŸŸก 6. Longest Palindromic Substring (LC #5)

  • Pattern: Expand Around Center (Two Pointers) or DP
  • What it teaches: Two fundamentally different approaches โ€” a great problem to implement both and compare.

๐ŸŸก 7. Find All Anagrams in a String (LC #438)

  • Pattern: Fixed-size Sliding Window + Frequency Map
  • What it teaches: The fixed-window variant, maintaining a running frequency delta.

๐ŸŸก 8. Group Anagrams (LC #49)

  • Pattern: Hash Map with sorted key
  • What it teaches: Using a canonical form as a hash key โ€” a technique that appears in many grouping problems.

๐ŸŸก 9. Implement Trie (Prefix Tree) (LC #208)

  • Pattern: Trie from scratch
  • What it teaches: Node design, insert, search, and startsWith โ€” the complete Trie interface.

๐ŸŸก 10. Decode Ways (LC #91)

  • Pattern: 1D DP on strings
  • What it teaches: How to frame string decoding as a DP problem; handles edge cases like '0' characters.

Hard — Sharpen Elite Skills (Problems 11–15)

๐Ÿ”ด 11. Minimum Window Substring (LC #76)

  • Pattern: Variable Sliding Window + Two Frequency Maps
  • What it teaches: The most complete sliding window template; handles "at least" constraints elegantly.

๐Ÿ”ด 12. Edit Distance (LC #72)

  • Pattern: 2D DP (Wagner-Fischer algorithm)
  • What it teaches: The canonical string DP problem. Every other string DP problem is a variation of this.

๐Ÿ”ด 13. Regular Expression Matching (LC #10)

  • Pattern: 2D DP with careful state transitions
  • What it teaches: Handling '*' and '.' wildcards through DP โ€” extremely difficult without the right framing.

๐Ÿ”ด 14. Word Search II (LC #212)

  • Pattern: Trie + DFS/Backtracking
  • What it teaches: Combining a Trie with grid traversal to prune the search space early.

๐Ÿ”ด 15. Shortest Palindrome (LC #214)

  • Pattern: KMP failure function
  • What it teaches: A real-world application of KMP beyond naive pattern matching โ€” builds true KMP intuition.

๐Ÿ’ก Real-World Example: At companies like Google and Meta, medium-difficulty string problems appear most frequently in phone screens, while hard problems show up in virtual onsites. Mastering problems 5โ€“10 will cover roughly 70% of what you encounter.



The Complete Sliding Window Template (Annotated)

Because the sliding window pattern solves so many of the must-solve problems above, here is the definitive reusable template with full annotation:

def sliding_window_template(s: str, t: str) -> str:
    """
    Template: Minimum window in s containing all chars of t.
    Adapts to: longest substring with at most K distinct chars,
    find all anagrams, longest without repeating, etc.
    """
    from collections import Counter

    # Step 1: Build the requirement map from the target
    need = Counter(t)       # What characters we still need
    have = {}               # What our current window has
    formed = 0              # Count of chars meeting their required frequency
    required = len(need)    # Total distinct chars we must satisfy

    # Step 2: Initialize window pointers and result tracker
    left = 0
    result = ""
    min_len = float('inf')

    # Step 3: Expand the right pointer across the entire string
    for right in range(len(s)):
        char = s[right]
        have[char] = have.get(char, 0) + 1

        # Check if this char now satisfies its requirement
        if char in need and have[char] == need[char]:
            formed += 1

        # Step 4: Shrink from the left when all requirements are met
        while formed == required and left <= right:
            window_len = right - left + 1
            if window_len < min_len:
                min_len = window_len
                result = s[left:right + 1]  # Record best valid window

            # Remove leftmost character from window
            left_char = s[left]
            have[left_char] -= 1
            if left_char in need and have[left_char] < need[left_char]:
                formed -= 1  # Window no longer satisfies this char
            left += 1

    return result

This template is your Swiss Army knife. To adapt it for "find all anagrams", change the shrink condition to trigger when right - left + 1 == len(t) instead of waiting for a valid window. To adapt it for "at most K distinct characters", replace need with a simple counter and trigger shrinking when len(have) > k.
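To make the first adaptation concrete, here is a minimal sketch of the fixed-size "find all anagrams" variant (the function name find_anagrams is illustrative, not a library API): instead of waiting for a valid window, the shrink triggers as soon as the window exceeds len(p).

```python
from collections import Counter

def find_anagrams(s: str, p: str) -> list[int]:
    """Return every index in s where an anagram of p begins."""
    if len(p) > len(s):
        return []
    need = Counter(p)       # target frequencies, as in the template
    have = Counter()        # current window frequencies
    result = []
    left = 0
    for right, char in enumerate(s):
        have[char] += 1
        # Fixed-size shrink: trigger as soon as the window exceeds len(p)
        if right - left + 1 > len(p):
            have[s[left]] -= 1
            if have[s[left]] == 0:
                del have[s[left]]   # keep the Counter comparable to need
            left += 1
        if have == need:
            result.append(left)
    return result
```

Note that Counter equality compares exact frequency dictionaries, which is why zero-count entries must be deleted before the have == need check.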

🧠 Mnemonic: ERNS — Expand Right, Note satisfaction, Shrink left — the rhythm behind every sliding window solution.


Interview Communication Tips

Technical correctness is necessary but not sufficient. How you talk through your solution separates strong candidates from exceptional ones. Here is an exact communication script for string problems.

Step 1 โ€” State Your Assumptions Out Loud

Before touching code, say these three things:

  1. Character set: "I'm assuming lowercase English letters only, so at most 26 distinct characters. Is that right?" This matters for space complexity claims.
  2. Mutability: "I'll treat the input as read-only and return a new string unless in-place manipulation is preferred."
  3. Null / empty inputs: "I'll handle empty strings as a valid edge case returning an empty result or 0, whichever makes semantic sense."

🤔 Did you know? Interviewers at top tech companies often deliberately under-specify problems to see if you ask clarifying questions. Assuming ASCII vs Unicode can change your space complexity from O(1) to O(n).

Step 2 โ€” Name the Pattern Before Coding

โŒ Wrong thinking: "Let me try iterating over the string and see what happens..."

โœ… Correct thinking: "The problem asks for a minimum contiguous window โ€” that's a classic variable sliding window with two frequency maps. My approach will be O(n) time and O(k) space where k is the character set size."

Naming the pattern and stating the complexity before coding demonstrates architectural thinking. It also gives you a contract to code against.

Step 3 โ€” Narrate the Invariant

For every loop you write, state its invariant aloud: "My invariant is that everything to the left of left has already been considered and rejected. Everything between left and right is my current candidate window." Interviewers love invariant-driven explanations because they reveal that you understand why your code works, not just that it works.

Step 4 โ€” Handle Follow-Up Optimizations Gracefully

Expect these follow-up questions and have ready answers:

  • "Can you do it in O(1) space?" → "For this specific problem, not without losing the frequency map, which is fundamental to correctness. However, if the alphabet is fixed at 26, we could use a fixed-size array instead of a hash map, which is O(1) space by convention."
  • "What if the string is a stream (you can't index backwards)?" → "A sliding window still works, since both pointers only move forward. I'd buffer just the current window in a deque and pop from its left as the window shrinks, instead of indexing into the original string."
  • "What if the input is a file too large for RAM?" → "That shifts toward system design. I'd stream the file in chunks, carrying the window state across chunk boundaries, or use streaming structures like Bloom filters for approximate membership."
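To make the fixed-alphabet answer concrete, here is a minimal sketch assuming lowercase a-z input only; the helper names char_counts and is_anagram are illustrative, not a standard API:

```python
def char_counts(s: str) -> list[int]:
    """Frequency table as a fixed 26-slot array (assumes lowercase a-z).

    Space stays O(26) = O(1) no matter how long s is, which is the
    basis of the "fixed alphabet" space-complexity claim.
    """
    counts = [0] * 26
    for ch in s:
        counts[ord(ch) - ord('a')] += 1
    return counts

def is_anagram(a: str, b: str) -> bool:
    """Two strings are anagrams iff their count arrays match."""
    return char_counts(a) == char_counts(b)
```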

💡 Pro Tip: The best follow-up answer always starts with "Great question — that changes the constraints in these specific ways..." It shows collaborative thinking rather than defensive posture.



Final Summary: What You Now Know

Let's anchor what changed between the start of this lesson and now. You entered knowing that strings are "just arrays of characters." You leave understanding that string problems are a rich taxonomy of patterns, each with precise time/space tradeoffs and characteristic failure modes.

📋 Quick Reference Card: The Complete String Algorithm Landscape

| 🧩 Concept | 🔑 What You Now Understand | ⚠️ Critical Caveat |
| --- | --- | --- |
| 🪟 Sliding Window | Two-pointer expand/shrink on contiguous windows | Shrink only when the invariant is violated |
| 🔁 Two Pointers | In-place symmetric traversal for palindromes/reversal | Requires sorted or symmetric input structure |
| 🗺️ Hashing | Frequency maps as canonical string representations | Hash collisions exist; Rabin-Karp needs verification |
| 🔍 KMP | Failure function avoids redundant character comparisons | Build the LPS array correctly or the whole algorithm fails |
| 🌳 Trie | Prefix-sharing data structure for O(m) lookups | Memory cost is high; justify it with search frequency |
| 🧮 String DP | Overlapping subproblems on 2D character grids | Base cases (empty string rows/cols) are the hardest part |
| 🐛 Edge Cases | Empty strings, single chars, Unicode, leading/trailing spaces | These are not corner cases; they are expected inputs |

โš ๏ธ Final Critical Point 1: The most common reason candidates fail string problems in interviews is not lack of knowledge โ€” it is choosing a brute-force O(nยฒ) approach when an O(n) pattern exists. Every time you write a nested loop over a string, ask yourself: "Is there a sliding window or hash map that eliminates this inner loop?"

โš ๏ธ Final Critical Point 2: Unicode awareness is a proxy signal for production engineering experience. In Python 3, every string is Unicode by default; in Java, char is UTF-16; in C++, std::string is bytes. Knowing which encoding you're operating in changes your correctness assumptions for character comparison and index arithmetic.

โš ๏ธ Final Critical Point 3: Trie and DP solutions are rarely obvious on first reading. If you feel stuck, ask: "Do I see repeated prefix structure?" (โ†’ Trie) or "Do I see a recursive formula where the answer depends on smaller sub-answers?" (โ†’ DP). These two questions unlock the hardest 20% of string problems.


Next Steps on the Roadmap

String algorithms do not exist in isolation โ€” they are deeply connected to the next major topics you will encounter.

🔧 Connection to Array Problems

Every sliding window technique you learned transfers directly to array problems. "Maximum sum subarray of size K" is identical in structure to "longest substring with at most K distinct characters." The only difference is the element type. When you reach the array algorithms lesson, your string sliding window instincts will give you a head start.
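To see how directly the structure transfers, here is a minimal sketch of that fixed-size window on an array (the name max_sum_subarray is illustrative, and it assumes 1 <= k <= len(nums)):

```python
def max_sum_subarray(nums: list[int], k: int) -> int:
    """Maximum sum over all contiguous subarrays of size k.

    Same rhythm as a fixed-size string window: the element entering
    on the right is added, the one leaving on the left is subtracted.
    """
    window = sum(nums[:k])   # sum of the first window
    best = window
    for right in range(k, len(nums)):
        window += nums[right] - nums[right - k]   # slide by one
        best = max(best, window)
    return best
```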

📚 Connection to Dynamic Programming Deep Dive

String DP (LCS, Edit Distance, Palindromic Subsequences) is among the easiest DP to visualize because the two-dimensional table maps cleanly to two string indices. Mastering string DP first builds the mental model you will need for the harder grid DP and tree DP problems that come later in the roadmap. Think of Edit Distance as your DP training wheels — two-dimensional, well-defined recurrence, clean base cases.
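As a sketch of those training wheels, here is the classic bottom-up Edit Distance table; the point to notice is that row 0 and column 0 are exactly the empty-string base cases:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard 2D DP table.

    dp[i][j] = minimum edits to turn a[:i] into b[:j]; row 0 and
    column 0 encode the empty-string base cases.
    """
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i              # delete all of a[:i]
    for j in range(n + 1):
        dp[0][j] = j              # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]        # characters match
            else:
                dp[i][j] = 1 + min(
                    dp[i - 1][j],      # delete from a
                    dp[i][j - 1],      # insert into a
                    dp[i - 1][j - 1],  # substitute
                )
    return dp[m][n]
```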

๐Ÿ—๏ธ Connection to System Design

Text processing is everywhere in system design: search engines rely on inverted indexes (tries and suffix arrays under the hood), autocomplete services use trie compression (Patricia tries / DAWG structures), spell checkers use edit distance with dynamic programming, and log parsers use sliding window aggregation over time-series strings. When a system design interviewer asks "how would you build a search-as-you-type feature for 500 million users?", your answer begins with: "At the data structure level, I'd use a compressed trie (Ternary Search Tree or DAWG) to store the corpus..." — and now you have the vocabulary to finish that sentence confidently.
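To ground that vocabulary, here is a minimal uncompressed trie sketch. A production autocomplete service would use the compressed variants named above and rank or cap completions; the class and method names here are illustrative only.

```python
class TrieNode:
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.is_word = False

class Trie:
    """Minimal prefix tree for autocomplete-style lookups.

    Insert and prefix-walk are both O(m) where m is the key length.
    """

    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def starts_with(self, prefix: str) -> list[str]:
        node = self.root
        for ch in prefix:              # O(len(prefix)) walk
            if ch not in node.children:
                return []
            node = node.children[ch]
        # DFS below the prefix node to collect all completions
        out, stack = [], [(node, prefix)]
        while stack:
            cur, word = stack.pop()
            if cur.is_word:
                out.append(word)
            for ch, child in cur.children.items():
                stack.append((child, word + ch))
        return sorted(out)
```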


🎯 Key Principle to Carry Forward: String algorithm mastery is not about memorizing solutions. It is about internalizing five mental patterns (window, pointer, hash, trie, DP) so deeply that when you read a new problem, one of these patterns lights up automatically. That automatic recognition is the difference between spending 45 minutes fumbling toward a solution and spending 5 minutes confirming your approach before executing it cleanly.

💡 Remember: Every expert string coder was once confused by why their sliding window wasn't shrinking correctly, why their KMP failure function was off by one, or why their DP table's base case was wrong. The problems in this lesson's curated list were chosen specifically because solving them — and debugging them — will expose you to those confusions in a controlled environment before the pressure of a real interview.

You now have the map. The only remaining step is the miles.