
ANN Algorithms

Deep dive into Approximate Nearest Neighbor search methods and their trade-offs between speed and accuracy.

Master Approximate Nearest Neighbor (ANN) algorithms with free flashcards and spaced repetition practice. This lesson covers graph-based methods like HNSW, tree-based structures like Annoy, locality-sensitive hashing, and quantization techniques: essential concepts for building scalable AI search systems and modern RAG applications.

Welcome 🎯

When you're searching through millions or billions of high-dimensional vectors in a production AI system, exact nearest neighbor search becomes impossibly slow. This is where Approximate Nearest Neighbor (ANN) algorithms shine. These clever techniques trade a small amount of accuracy for massive speed improvements, often delivering 100x to 1000x faster searches while maintaining 95%+ recall.

In this lesson, you'll discover the core algorithms powering modern vector databases: the hierarchical graph navigation of HNSW, the binary tree partitioning of Annoy, the hash-bucketing magic of LSH, and the compression brilliance of Product Quantization. By understanding these algorithms, you'll know how to choose the right approach for your specific use case, tune parameters for optimal performance, and debug issues when your vector search isn't performing as expected.

💡 Why This Matters: Every major vector database (from Pinecone to Weaviate to Qdrant) implements one or more of these algorithms under the hood. Understanding them transforms you from a database user into someone who can architect intelligent, high-performance search systems.

Core Concepts: The ANN Algorithm Landscape 🗺️

The Fundamental Trade-off

All ANN algorithms balance three competing factors:

Factor   Description                  Trade-off
Speed    Query latency (ms)           Faster → Less accurate
Recall   % of true neighbors found    Higher → Slower/more memory
Memory   Index size overhead          Lower → Less accurate

Recall is the key metric: if you want the top 10 nearest neighbors, and your ANN algorithm returns 9 that would be in the true top-10, that's 90% recall. Production systems typically target 95-99% recall.
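
For example, recall@10 for a single query is just the overlap between the approximate and exact top-10 result sets; the IDs below are made up purely for illustration:

# Recall@10 for one query: overlap between ANN results and exact results
true_top10 = {12, 7, 45, 3, 88, 19, 64, 31, 92, 5}   # from an exact (brute-force) search
ann_top10  = {12, 7, 45, 3, 88, 19, 64, 31, 92, 77}  # ANN returned one wrong neighbor
recall = len(true_top10 & ann_top10) / 10
print(recall)  # 0.9 -> 90% recall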

Algorithm Categories

ANN algorithms fall into four main families:

ANN ALGORITHM TAXONOMY

📊 GRAPH-BASED
   ├─ HNSW (Hierarchical Navigable Small World)
   ├─ NSG (Navigating Spreading-out Graph)
   └─ DiskANN

🌳 TREE-BASED
   ├─ Annoy (Approximate Nearest Neighbors Oh Yeah)
   ├─ KD-Tree variants
   └─ Ball Tree

#️⃣ HASH-BASED
   ├─ LSH (Locality-Sensitive Hashing)
   ├─ Multi-Probe LSH
   └─ Cross-Polytope LSH

🗜️ QUANTIZATION
   ├─ Product Quantization (PQ)
   ├─ Scalar Quantization (SQ)
   └─ Binary Quantization

1. HNSW: The Graph Navigation Champion 🏆

Hierarchical Navigable Small World (HNSW) is currently the most popular ANN algorithm, thanks to its exceptional balance of speed, accuracy, and ease of use.

How HNSW Works

HNSW builds a multi-layer graph where:

  • Each vector is a node
  • Edges connect similar vectors
  • Upper layers are sparse (skip lists for fast navigation)
  • Bottom layer is dense (contains all vectors)

HNSW STRUCTURE (3 layers)

Layer 2 (sparse):
     (A)────────────────────────(F)
      │                          │

Layer 1 (medium):
     (A)───(B)───────(D)───(E)───(F)
      │     │         │     │     │

Layer 0 (all vectors):
     (A)─(B)─(C)─(D)─(E)─(F)─(G)─(H)
      │   │   │   │   │   │   │   │
     (I)─(J)─(K)─(L)─(M)─(N)─(O)─(P)

Search Process:

  1. Start at the graph's entry point in the top layer
  2. Navigate to the nearest neighbor using greedy search
  3. Drop down one layer
  4. Repeat until reaching the bottom layer
  5. Perform a local search to find k nearest neighbors
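
To make the descent concrete, here is a heavily simplified sketch. It assumes each layer is a dict mapping node IDs to neighbor lists and that dist(query, node_id) looks up the stored vector and returns a distance; real HNSW keeps a candidate list of size ef_search rather than a single best node:

def hnsw_descend(query, layers, entry_point, dist):
    """Greedy walk from the top (sparsest) layer down to layer 0."""
    current = entry_point
    for graph in reversed(layers):  # layers[-1] is the top, sparsest layer
        improved = True
        while improved:
            improved = False
            for neighbor in graph.get(current, []):
                if dist(query, neighbor) < dist(query, current):
                    current = neighbor  # greedily move closer to the query
                    improved = True
    return current  # starting point for the final local search at layer 0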

💡 Why It's Fast: By starting at sparse upper layers, HNSW quickly zooms in on the right region, then refines at lower layers. It's like using a map's zoom levels: start zoomed out, then drill down.

HNSW Parameters
Parameter         Description               Impact
M                 Max edges per node        ↑ = Better recall, more memory
ef_construction   Candidates during build   ↑ = Better quality, slower build
ef_search         Candidates during query   ↑ = Better recall, slower search

Typical Production Values:

M = 16  # Good default balance
ef_construction = 200  # Build once, search many times
ef_search = 50  # Tune based on latency requirements

⚠️ Common Mistake: Setting ef_search < k (fewer candidates than results requested). Always ensure ef_search >= k.

2. Annoy: Binary Tree Partitioning 🌳

Annoy (Approximate Nearest Neighbors Oh Yeah) was developed by Spotify for music recommendations. It builds multiple random projection trees to partition the vector space.

How Annoy Works

Tree Construction:

  1. Pick two random points from your dataset
  2. Draw a hyperplane halfway between them
  3. Split all points based on which side they fall on
  4. Recursively partition each side
  5. Build multiple trees (typically 10-100) with different random splits

ANNOY RANDOM PROJECTION TREE

                    Root
                     │
          ┌──────────┴──────────┐
          │   Hyperplane 1      │
     ┌────┴────┐           ┌────┴────┐
     │ H-plane2│           │ H-plane3│
   ┌─┴─┐     ┌─┴─┐       ┌─┴─┐     ┌─┴─┐
   │A,B│     │C,D│       │E,F│     │G,H│
   └───┘     └───┘       └───┘     └───┘
   Leaves contain actual vectors
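
Steps 1-3 above boil down to picking a hyperplane equidistant from two random points and testing which side each vector falls on; a minimal numpy sketch (the function name random_split is made up for illustration):

import numpy as np

def random_split(points):
    """One Annoy-style split: hyperplane halfway between two random points."""
    rng = np.random.default_rng()
    a, b = points[rng.choice(len(points), size=2, replace=False)]
    normal = a - b                        # hyperplane normal
    midpoint = (a + b) / 2                # the plane passes through the midpoint
    side = (points - midpoint) @ normal   # signed distance (up to scale)
    return points[side >= 0], points[side < 0]

left, right = random_split(np.random.randn(1000, 64).astype("float32"))
print(len(left), len(right))  # roughly balanced halves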

Search Process:

  1. Traverse each tree from root to leaf
  2. Collect all candidates from the reached leaves
  3. Compute exact distances to all candidates
  4. Return the k closest

💡 Why Multiple Trees: A single tree might unluckily separate similar vectors. Multiple trees with different random splits ensure you'll find neighbors in at least some trees.

Annoy Parameters
from annoy import AnnoyIndex

f = 128  # Vector dimensionality
t = AnnoyIndex(f, 'angular')  # or 'euclidean'

# Add vectors
for i, vector in enumerate(vectors):
    t.add_item(i, vector)

# Build with n_trees
t.build(n_trees=10)  # More trees = better recall, more memory
t.save('index.ann')

# Search with search_k candidates
neighbors = t.get_nns_by_vector(query_vector, k=10, search_k=100)

Parameter   Description                      Typical Range
n_trees     Number of trees to build         10-100 (more for higher recall)
search_k    Nodes to inspect during search   n_trees * k to n_trees * k * 10

Advantages:

  • πŸš€ Memory-mapped: Can search without loading entire index into RAM
  • πŸ“¦ Compact: Index size is close to raw vector data
  • 🎯 Predictable: Performance scales smoothly with parameters

Disadvantages:

  • πŸ”’ Static: Must rebuild entire index to add vectors
  • πŸ“Š Lower recall: Typically doesn't match HNSW at same speed

3. Locality-Sensitive Hashing (LSH) #️⃣

LSH uses special hash functions that map similar vectors to the same hash buckets with high probability.

The LSH Magic Trick

Normal hash functions (like MD5) aim to spread similar inputs far apart. LSH does the opposite:

REGULAR HASH vs LSH

Regular Hash (MD5):
Input: "hello"  β†’ Hash: a3f8c9...
Input: "hallo"  β†’ Hash: 8d2e1f...  (totally different!)

LSH for vectors:
Vector A: [0.9, 0.1, 0.2]  β†’ Bucket: 10110
Vector B: [0.85, 0.15, 0.25] β†’ Bucket: 10110  (same!)
Vector C: [0.1, 0.9, 0.8]  β†’ Bucket: 01001  (different)
Random Hyperplane LSH

The most common LSH for cosine similarity uses random hyperplanes:

import numpy as np

class LSHIndex:
    def __init__(self, dim, n_bits=16, n_tables=10):
        self.dim = dim
        self.n_bits = n_bits
        self.n_tables = n_tables
        
        # Create random hyperplanes for each table
        self.hyperplanes = [
            np.random.randn(n_bits, dim) 
            for _ in range(n_tables)
        ]
        self.tables = [{} for _ in range(n_tables)]
    
    def hash_vector(self, vector, table_idx):
        # Project onto hyperplanes, check which side
        projections = np.dot(self.hyperplanes[table_idx], vector)
        # Convert to binary hash
        bits = (projections > 0).astype(int)
        return tuple(bits)
    
    def insert(self, vector, item_id):
        for i in range(self.n_tables):
            hash_val = self.hash_vector(vector, i)
            if hash_val not in self.tables[i]:
                self.tables[i][hash_val] = []
            self.tables[i][hash_val].append(item_id)
    
    def query(self, vector, k=10):
        candidates = set()
        for i in range(self.n_tables):
            hash_val = self.hash_vector(vector, i)
            if hash_val in self.tables[i]:
                candidates.update(self.tables[i][hash_val])
        # A full implementation would rank candidates by true distance here
        return list(candidates)[:k]  # Simplified: returns unranked candidates

LSH Parameters
Parameter   Effect
n_bits      Hash length (more bits = finer buckets)
n_tables    Independent hash tables (more = better recall)

💡 The LSH Insight: Similar vectors will land in the same bucket in at least one table. More tables = more chances to find neighbors.
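
For random-hyperplane LSH this intuition can be made quantitative: two vectors separated by angle θ agree on any single hash bit with probability 1 - θ/π. The sketch below (illustrative only, not tied to a specific library) shows how n_bits and n_tables shape the chance that a true neighbor becomes a candidate:

import numpy as np

def lsh_match_probability(theta, n_bits=16, n_tables=10):
    """Probability that two vectors at angle `theta` (radians) collide
    in at least one table of random-hyperplane LSH."""
    p_bit = 1 - theta / np.pi             # one random hyperplane puts both on the same side
    p_table = p_bit ** n_bits              # all n_bits agree -> same bucket in one table
    return 1 - (1 - p_table) ** n_tables   # same bucket in at least one of n_tables tables

for theta_deg in [10, 30, 60, 90]:
    p = lsh_match_probability(np.radians(theta_deg))
    print(f"angle={theta_deg:>2} deg: P(candidate) = {p:.3f}")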

Advantages:

  • ⚑ Sub-linear search: O(n^ρ) where ρ < 1
  • πŸ’Ύ Memory efficient: Just store hash buckets
  • βž• Supports updates: Easy to add/remove vectors

Disadvantages:

  • 🎯 Lower recall: Hard to tune for high recall
  • πŸ“ Dimension dependent: Performance degrades in very high dimensions

4. Product Quantization (PQ) 🗜️

Product Quantization compresses vectors by breaking them into chunks and replacing each chunk with the nearest centroid from a learned codebook. Think of it as "lossy compression for vectors."

How PQ Works

Step 1: Split Vectors into Subvectors

# Original 128-dim vector
vector = [0.1, 0.5, 0.3, ..., 0.8]  # 128 floats

# Split into 8 subvectors of 16 dimensions each
subvectors = [
    vector[0:16],    # Subvector 1
    vector[16:32],   # Subvector 2
    vector[32:48],   # Subvector 3
    # ... 8 total
]

Step 2: Learn Codebooks

For each subvector position, cluster all training subvectors into k centroids (typically k = 256):

CODEBOOK for Subvector 1 (256 centroids)
┌──────────────────────────────────┐
│ ID 0:   [0.12, 0.45, 0.32, ...]  │
│ ID 1:   [0.89, 0.01, 0.56, ...]  │
│ ID 2:   [0.34, 0.67, 0.11, ...]  │
│ ...                              │
│ ID 255: [0.55, 0.23, 0.78, ...]  │
└──────────────────────────────────┘
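
As a rough illustration of how such codebooks could be learned, here is a sketch that runs k-means per subvector position with scikit-learn; the helper names train_codebooks and encode are made up for this example, and production systems would use a library such as FAISS instead:

import numpy as np
from sklearn.cluster import KMeans

def train_codebooks(train_vectors, m=8, k=256):
    """Learn one k-means codebook per subvector position (illustrative sketch)."""
    n, dim = train_vectors.shape
    sub_dim = dim // m
    codebooks = []
    for i in range(m):
        # All training subvectors for position i (needs well over k samples)
        subs = train_vectors[:, i * sub_dim:(i + 1) * sub_dim]
        km = KMeans(n_clusters=k, n_init=10).fit(subs)
        codebooks.append(km.cluster_centers_)  # shape (k, sub_dim)
    return codebooks

def encode(vector, codebooks):
    """Replace each subvector with the ID of its nearest centroid."""
    m = len(codebooks)
    sub_dim = codebooks[0].shape[1]
    codes = []
    for i in range(m):
        sub = vector[i * sub_dim:(i + 1) * sub_dim]
        dists = np.linalg.norm(codebooks[i] - sub, axis=1)
        codes.append(int(np.argmin(dists)))
    return np.array(codes, dtype=np.uint8)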

Step 3: Encode Vectors

Replace each subvector with its nearest centroid ID:

# Original: 128 floats × 4 bytes = 512 bytes
vector = [0.1, 0.5, 0.3, ..., 0.8]

# Encoded: 8 bytes (one ID per subvector)
encoded = [42, 17, 201, 88, 156, 3, 99, 211]
#           ↑   Each ID is 0-255 (1 byte)

Compression Ratio: 512 bytes → 8 bytes = 64x compression!

To search without decompressing:

import numpy as np

def pq_search(query_vector, encoded_db, codebooks, k=10):
    """Asymmetric distance search: `codebooks` is a list of m arrays of shape
    (256, sub_dim); `encoded_db` holds one length-m array of centroid IDs per vector."""
    m = len(codebooks)
    sub_dim = codebooks[0].shape[1]

    # 1. Split query into m subvectors
    query_subs = query_vector.reshape(m, sub_dim)

    # 2. Precompute squared distances from each query subvector to all 256 centroids
    distance_tables = []
    for i in range(m):
        distances = np.sum((codebooks[i] - query_subs[i]) ** 2, axis=1)
        distance_tables.append(distances)

    # 3. Score each database vector using only table lookups
    scores = []
    for encoded_vec in encoded_db:
        score = 0.0
        for i, centroid_id in enumerate(encoded_vec):
            # Just look up the precomputed distance!
            score += distance_tables[i][centroid_id]
        scores.append(score)

    # 4. Return the indices of the k closest encoded vectors
    return np.argsort(scores)[:k]

💡 The PQ Speedup: We compute distances to only 8×256 = 2,048 centroids once, then score every database vector with table lookups and additions alone; no full distance computation is needed per vector.

PQ Parameters
Parameter        Typical Value           Trade-off
m (subvectors)   8, 16, 32               More = better accuracy, larger tables
k (centroids)    256 (fits in 1 byte)    More = better accuracy, more memory

IVF + PQ: PQ is often combined with an inverted file index (IVF) for two-stage search:

  1. Coarse search: Find relevant clusters using IVF
  2. Fine search: Use PQ within selected clusters

import faiss

# Create an IVF index with PQ compression (constructor arguments are positional)
dim = 768                # vector dimensionality (example value)
quantizer = faiss.IndexFlatL2(dim)
index = faiss.IndexIVFPQ(
    quantizer,
    dim,     # vector dimension
    1024,    # number of IVF clusters (nlist)
    8,       # PQ subvectors (m)
    8,       # bits per code -> 256 centroids per subvector
)

5. Scalar Quantization (SQ) 📉

A simpler compression approach: convert float32 to int8.

import numpy as np

# Original float32 (4 bytes per dimension)
vector = np.array([0.123, 0.876, -0.432, 0.591], dtype=np.float32)
# Memory: 4 dimensions × 4 bytes = 16 bytes

# Find min/max
vmin, vmax = vector.min(), vector.max()

# Quantize to uint8 (1 byte per dimension)
quantized = np.round(
    (vector - vmin) / (vmax - vmin) * 255
).astype(np.uint8)
# Memory: 4 dimensions × 1 byte = 4 bytes

# Result: [108, 255, 0, 199]
# Compression: 4x, but some precision lost
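
# Dequantize: map the uint8 codes back to approximate floats.
# (FAISS-style scalar quantizers typically learn min/max per dimension over a
#  training set; reusing this vector's own vmin/vmax is just for illustration.)
reconstructed = vmin + quantized.astype(np.float32) / 255 * (vmax - vmin)
# Approximately [0.123, 0.876, -0.432, 0.591], up to rounding error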

Advantages of SQ:

  • 🎯 Better recall than PQ (less information loss)
  • πŸ’» SIMD-friendly: Fast with modern CPUs
  • πŸ”§ Simple: No training needed

Disadvantages:

  • πŸ“¦ Lower compression: Only 4x vs PQ's 64x
  • πŸ“Š Dimension dependent: Savings scale with original dimension

Examples: Choosing and Implementing ANN Algorithms 🔧

Example 1: Building an HNSW Index with FAISS

Scenario: You have 1 million product embeddings (384 dimensions) for semantic search.

import faiss
import numpy as np

# Generate sample data
n_vectors = 1_000_000
dim = 384
vectors = np.random.randn(n_vectors, dim).astype('float32')

# Normalize for cosine similarity
faiss.normalize_L2(vectors)

# Create HNSW index
M = 32  # Number of connections per layer
ef_construction = 200  # Quality during construction

index = faiss.IndexHNSWFlat(dim, M)
index.hnsw.efConstruction = ef_construction

# Build the index
print("Building HNSW index...")
index.add(vectors)
print(f"Index built with {index.ntotal} vectors")

# Search with different ef_search values
query = np.random.randn(1, dim).astype('float32')
faiss.normalize_L2(query)

for ef in [16, 32, 64, 128]:
    index.hnsw.efSearch = ef
    distances, indices = index.search(query, k=10)
    print(f"ef_search={ef}: Found {len(indices[0])} neighbors")

Expected Output:

Building HNSW index...
Index built with 1000000 vectors
ef_search=16: Found 10 neighbors
ef_search=32: Found 10 neighbors  # Better recall
ef_search=64: Found 10 neighbors  # Even better
ef_search=128: Found 10 neighbors  # Highest quality

💡 Tuning Insight: Monitor latency vs recall. Start with ef_search = 50, increase if recall is too low, decrease if latency is too high.

Example 2: Memory-Efficient Search with IVF + PQ

Scenario: You have 10 million vectors but limited RAM. Compress with Product Quantization.

import faiss
import numpy as np

n_vectors = 10_000_000
dim = 768  # Large language model embeddings
vectors = np.random.randn(n_vectors, dim).astype('float32')

# Create IVF + PQ index
n_clusters = 4096  # Number of IVF clusters
m = 64  # Split into 64 subvectors (768/64 = 12 dims each)
bits = 8  # 256 centroids per subvector

quantizer = faiss.IndexFlatL2(dim)
index = faiss.IndexIVFPQ(quantizer, dim, n_clusters, m, bits)

# Train on sample data (PQ needs training)
print("Training index...")
training_sample = vectors[:100_000]  # Use 100k for training
index.train(training_sample)

# Add all vectors
print("Adding vectors...")
index.add(vectors)

print(f"\nMemory comparison:")
print(f"Original: {n_vectors * dim * 4 / 1e9:.2f} GB")
print(f"With PQ: {n_vectors * m / 1e9:.2f} GB")
print(f"Compression: {(dim * 4) / m:.1f}x")

# Search
index.nprobe = 32  # Search top 32 clusters
query = np.random.randn(1, dim).astype('float32')
distances, indices = index.search(query, k=10)

Output:

Training index...
Adding vectors...

Memory comparison:
Original: 30.72 GB
With PQ: 0.64 GB
Compression: 48.0x

💡 Production Tip: Train PQ on representative data. If your distribution changes, retrain the codebooks.

Example 3: Dynamic Updates with LSH

Scenario: Real-time vector search where vectors are constantly added/removed.

import numpy as np
from collections import defaultdict

class SimpleLSH:
    def __init__(self, dim, n_bits=16, n_tables=20):
        self.dim = dim
        self.n_bits = n_bits
        self.n_tables = n_tables
        
        # Random hyperplanes for each table
        self.planes = [
            np.random.randn(n_bits, dim)
            for _ in range(n_tables)
        ]
        
        # Hash tables: {hash -> [vector_ids]}
        self.tables = [
            defaultdict(list) for _ in range(n_tables)
        ]
        
        # Store actual vectors
        self.vectors = {}
    
    def _hash(self, vector, table_idx):
        """Compute binary hash for a vector"""
        projections = self.planes[table_idx] @ vector
        bits = tuple((projections > 0).astype(int))
        return bits
    
    def insert(self, vector_id, vector):
        """Add a vector to the index"""
        self.vectors[vector_id] = vector
        for i in range(self.n_tables):
            hash_val = self._hash(vector, i)
            self.tables[i][hash_val].append(vector_id)
    
    def remove(self, vector_id):
        """Remove a vector from the index"""
        if vector_id not in self.vectors:
            return
        
        vector = self.vectors[vector_id]
        for i in range(self.n_tables):
            hash_val = self._hash(vector, i)
            self.tables[i][hash_val].remove(vector_id)
        
        del self.vectors[vector_id]
    
    def query(self, query_vector, k=10):
        """Find k nearest neighbors"""
        # Collect candidates from all tables
        candidates = set()
        for i in range(self.n_tables):
            hash_val = self._hash(query_vector, i)
            candidates.update(self.tables[i][hash_val])
        
        # Rank by actual distance
        distances = [
            (vid, np.linalg.norm(self.vectors[vid] - query_vector))
            for vid in candidates
        ]
        distances.sort(key=lambda x: x[1])
        
        return [vid for vid, _ in distances[:k]]

# Usage
lsh = SimpleLSH(dim=128, n_bits=16, n_tables=20)

# Add vectors dynamically
for i in range(10000):
    vec = np.random.randn(128).astype('float32')
    lsh.insert(f"vec_{i}", vec)

# Search
query = np.random.randn(128).astype('float32')
results = lsh.query(query, k=10)
print(f"Found neighbors: {results}")

# Remove vector (instant!)
lsh.remove("vec_5000")

Key Advantage: Updates are O(n_tables) = O(1) with respect to database size.

Example 4: Hybrid Approach - HNSW + SQ

Scenario: Balance speed, accuracy, and memory.

import faiss
import numpy as np

n_vectors = 5_000_000
dim = 512
vectors = np.random.randn(n_vectors, dim).astype('float32')

# Create HNSW with Scalar Quantization
M = 32
index = faiss.IndexHNSWSQ(dim, faiss.ScalarQuantizer.QT_8bit, M)
index.hnsw.efConstruction = 200

# Train scalar quantizer (learns min/max per dimension)
print("Training scalar quantizer...")
index.train(vectors[:100_000])

# Add vectors
print("Adding vectors...")
index.add(vectors)

print(f"\nMemory savings:")
print(f"Float32: {n_vectors * dim * 4 / 1e9:.2f} GB")
print(f"Int8 (SQ): {n_vectors * dim * 1 / 1e9:.2f} GB")
print(f"Savings: 4x compression")

# Search with high quality
index.hnsw.efSearch = 100
query = np.random.randn(1, dim).astype('float32')
distances, indices = index.search(query, k=10)

print(f"\nTop-10 neighbors: {indices[0]}")

Why This Works:

  • HNSW provides excellent graph navigation
  • SQ reduces memory by 4x with minimal recall loss
  • Still much faster than exhaustive search

Common Mistakes ⚠️

1. Not Normalizing Vectors for Cosine Similarity

❌ Wrong:

index = faiss.IndexHNSWFlat(dim, 32)
index.add(vectors)  # Vectors not normalized

✅ Right:

import faiss
faiss.normalize_L2(vectors)  # Normalize first!
index = faiss.IndexHNSWFlat(dim, 32)
index.add(vectors)

Why: HNSW uses L2 distance by default. For cosine similarity, you must normalize vectors first; on unit-length vectors, L2 distance produces the same ranking as cosine similarity.

2. Setting ef_search < k in HNSW

❌ Wrong:

index.hnsw.efSearch = 10
results = index.search(query, k=50)  # Asking for more than candidates!

✅ Right:

index.hnsw.efSearch = 100  # Always >= k
results = index.search(query, k=50)

3. Not Training PQ/IVF Indices

❌ Wrong:

index = faiss.IndexIVFPQ(quantizer, dim, 1024, 8, 8)
index.add(vectors)  # Error: index not trained!

✅ Right:

index = faiss.IndexIVFPQ(quantizer, dim, 1024, 8, 8)
index.train(training_vectors)  # Train first
index.add(vectors)

4. Setting nprobe Too Low on IVF Indices

❌ Wrong:

index.nprobe = 1  # Only search 1 cluster - terrible recall!

✅ Right:

index.nprobe = 32  # Search 32 clusters - much better

💡 Rule of Thumb: Start with nprobe = sqrt(n_clusters), tune based on recall.
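
For example, with the 1,024-cluster index from the snippet above, that starting point would be:

import math

index.nprobe = int(math.sqrt(1024))  # = 32 clusters probed per query as a starting point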

5. Training the Coarse Quantizer Separately

❌ Unnecessary:

quantizer = faiss.IndexFlatL2(dim)
index = faiss.IndexIVFPQ(quantizer, dim, 1024, 8, 8)
quantizer.train(vectors)  # No-op: IndexFlatL2 requires no training
index.train(vectors)

✅ Right:

quantizer = faiss.IndexFlatL2(dim)
index = faiss.IndexIVFPQ(quantizer, dim, 1024, 8, 8)
index.train(vectors)  # Trains the IVF centroids (stored in the quantizer) and the PQ codebooks

6. Not Benchmarking Recall

Always measure recall to ensure your approximation is acceptable:

def measure_recall(index, queries, ground_truth, k=10):
    """Measure recall@k against ground truth"""
    _, predicted = index.search(queries, k)
    
    recalls = []
    for i in range(len(queries)):
        true_neighbors = set(ground_truth[i][:k])
        pred_neighbors = set(predicted[i])
        recall = len(true_neighbors & pred_neighbors) / k
        recalls.append(recall)
    
    return np.mean(recalls)

# Get ground truth with exact search
exact_index = faiss.IndexFlatL2(dim)
exact_index.add(vectors)
_, ground_truth = exact_index.search(queries, k=10)

# Measure your ANN index
recall = measure_recall(ann_index, queries, ground_truth, k=10)
print(f"Recall@10: {recall:.2%}")

Key Takeaways 🎯

📋 ANN Algorithm Quick Reference

Algorithm   Best For                        Updates      Recall     Memory
HNSW        High recall, static datasets    ❌ Slow      ⭐⭐⭐⭐⭐   Medium
Annoy       Memory-mapped, read-heavy       ❌ Rebuild   ⭐⭐⭐⭐    Low
LSH         Dynamic datasets, streaming     ✅ Fast      ⭐⭐⭐     Low
IVF+PQ      Huge datasets, limited RAM      ⚠️ Retrain   ⭐⭐⭐⭐    Very Low
HNSW+SQ     Balance speed/memory/recall     ❌ Slow      ⭐⭐⭐⭐⭐   Low

🎓 Decision Framework

Choose HNSW when:

  • Recall is critical (95%+ required)
  • Dataset fits in memory
  • Updates are infrequent
  • You need predictable performance

Choose Annoy when:

  • Dataset exceeds RAM (use memory mapping)
  • You can rebuild index offline
  • You need compact disk storage
  • Spotify-level recall (90-95%) is acceptable

Choose LSH when:

  • Vectors arrive in real-time
  • You need immediate updates/deletes
  • Lower recall (85-90%) is acceptable
  • Memory is very limited

Choose IVF+PQ when:

  • Dataset is massive (10M+ vectors)
  • RAM is constrained
  • You can afford retraining periodically
  • You need 50-100x compression

Parameter Tuning Priorities:

  1. Start simple: Default parameters often work well
  2. Measure baseline: Get recall/latency metrics
  3. Tune for recall: Increase ef_search, nprobe, search_k
  4. Optimize latency: Decrease parameters once recall is met
  5. Monitor memory: Add quantization if needed
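
One way to apply steps 2-4 is to sweep a single parameter and log recall and latency together. A sketch, assuming the ann_index, queries, ground_truth, and measure_recall names from the recall-benchmark snippet above, with ann_index being an HNSW index like the one in Example 1:

import time

# Sweep ef_search and log the recall/latency trade-off
for ef in [16, 32, 64, 128, 256]:
    ann_index.hnsw.efSearch = ef
    start = time.perf_counter()
    recall = measure_recall(ann_index, queries, ground_truth, k=10)
    ms_per_query = (time.perf_counter() - start) * 1000 / len(queries)
    print(f"ef_search={ef:>3}: recall@10={recall:.2%}, {ms_per_query:.2f} ms/query")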

📚 Further Study

  1. FAISS Documentation: https://github.com/facebookresearch/faiss/wiki - Comprehensive guide to Meta's vector search library
  2. HNSW Paper: https://arxiv.org/abs/1603.09320 - Original research on Hierarchical Navigable Small World graphs
  3. ANN Benchmarks: https://ann-benchmarks.com/ - Compare algorithm performance across datasets

🧠 Memory Device for Algorithm Selection:

  • High recall → HNSW
  • Annoy for Archived/static data
  • LSH for Live updates
  • PQ for Petabyte-scale compression

Ready to test your knowledge? Complete the practice questions below to reinforce your understanding of ANN algorithms! 🚀