
Cosine Similarity & Distance Metrics

Master similarity calculations and distance functions (cosine, Euclidean, Manhattan, dot product) for comparing vectors.

Understanding how to measure similarity between vectors is fundamental to modern AI search systems. This lesson covers cosine similarity, Euclidean distance, and Manhattan distance: how to calculate each one, how to choose between them, and how they apply in semantic search and retrieval-augmented generation (RAG) systems.

Welcome to Vector Similarity 📏

Imagine you're in a library with millions of books, and someone asks you to find books "similar" to their favorite novel. How do you measure "similarity"? In the world of AI search, we face the same challenge with text, images, and other data, but instead of books on shelves, we work with vectors (arrays of numbers) in high-dimensional space.

Distance metrics are mathematical functions that quantify how "close" or "far apart" two vectors are. The choice of metric fundamentally shapes how your search system behaves. Pick the wrong one, and your "similar" results might be wildly off. Pick the right one, and your system feels almost magical.

In this lesson, we'll explore the three most important distance metrics for AI search:

  • 🎯 Cosine Similarity - measures angle between vectors
  • 📏 Euclidean Distance - measures straight-line distance
  • 🏙️ Manhattan Distance - measures grid-based distance

💡 Why This Matters: Every modern semantic search system, from Google's BERT to OpenAI's embeddings, relies on these metrics to find relevant information. Understanding them isn't optional; it's foundational.

Understanding Vector Representations 🔢

Before diving into distance metrics, let's clarify what we're measuring. In AI search, everything gets converted into embeddings: dense numerical vectors that capture semantic meaning.

Example: Text to Vector

The sentence "I love machine learning" might become:

[0.23, -0.41, 0.87, 0.15, -0.62, ...] (hundreds or thousands of dimensions)

Similar concepts cluster together in this vector space:

  • "I love machine learning" β†’ [0.23, -0.41, 0.87, ...]
  • "Machine learning is amazing" β†’ [0.19, -0.38, 0.91, ...]
  • "I hate pizza" β†’ [-0.87, 0.62, -0.15, ...]

The first two sentences are semantically related (both positive about ML), so their vectors point in similar directions. The third is unrelated, pointing elsewhere.

       Vector Space Visualization
              ↑ y
              │
      ●───────┼───────● (similar vectors)
     "I love  │  "ML is amazing"
      ML"     │    ╱
              │   ╱ (small angle)
              │  ╱
    ──────────┼──────────→ x
              │
              │
              ●
         "I hate pizza"
        (different direction)

🔑 Key Insight: Distance metrics measure relationships between these vectors. Different metrics emphasize different aspects of "similarity."

Cosine Similarity: The Direction Matcher 🎯

Cosine similarity measures the angle between two vectors, ignoring their magnitude (length). It's the most popular metric for semantic search.

The Formula

For vectors A and B:

cosine_similarity(A, B) = (A · B) / (||A|| × ||B||)

Where:

  • A · B = dot product (sum of element-wise products)
  • ||A|| = magnitude of A (square root of sum of squared elements)
  • ||B|| = magnitude of B

The Range

  • +1 = vectors point in exactly the same direction (0° angle)
  • 0 = vectors are perpendicular (90° angle, no similarity)
  • -1 = vectors point in opposite directions (180° angle)

💡 Why "Cosine"? The formula literally computes cos(θ), where θ is the angle between vectors. Remember trigonometry? cos(0°) = 1, cos(90°) = 0, cos(180°) = -1.

Why Magnitude Doesn't Matter

Consider two document vectors:

  • Document A (short): [3, 4]
  • Document B (long, but same topic): [6, 8]

Document B is just Document A scaled by 2. They have identical direction (same topic), just different lengths (document size). Cosine similarity correctly returns 1.0 (perfect match), while Euclidean distance would say they're far apart.

    Cosine Similarity Visualization
         ↑
       8 │          ● B [6,8]
         │         ╱
       6 │        ╱
         │       ╱
       4 │     ● A [3,4]
         │    ╱
       2 │  ╱  (same angle θ)
         │╱
    ─────┼─────────────→
         0  2  4  6

    cos(θ) = same for A and B
    → high similarity!
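
Here's a quick numerical check of that claim (a minimal sketch using NumPy with the same two vectors; nothing here depends on any particular embedding model):

import numpy as np

# Document A (short) and Document B (same topic, twice as long)
a = np.array([3.0, 4.0])
b = np.array([6.0, 8.0])

# Cosine similarity: dot product divided by the product of the magnitudes
cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Euclidean distance: length of the difference vector
euclidean = np.linalg.norm(a - b)

print(f"Cosine similarity: {cos_sim:.3f}")    # 1.000 -> identical direction
print(f"Euclidean distance: {euclidean:.3f}") # 5.000 -> "far apart"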

When to Use Cosine Similarity ✅

  • Text embeddings (most common use case)
  • When document length shouldn't affect similarity
  • High-dimensional sparse vectors (like TF-IDF)
  • Recommendation systems
  • Semantic search and RAG systems

When NOT to Use Cosine Similarity ❌

  • When magnitude matters (e.g., comparing temperature readings)
  • Low-dimensional data where scale is important
  • When vectors can be zero (cosine is undefined)

Euclidean Distance: The Straight Line 📏

Euclidean distance measures the straight-line distance between two points in space. It's the "as the crow flies" metric: what most people intuitively think of as "distance."

The Formula

For vectors A and B with n dimensions:

euclidean_distance(A, B) = √[(A₁-B₁)² + (A₂-B₂)² + ... + (Aₙ-Bₙ)²]

This is just the Pythagorean theorem extended to n dimensions!

The Range

  • 0 = vectors are identical (no distance)
  • ∞ = vectors can be arbitrarily far apart
  • Lower is better (opposite of cosine similarity)

2D Example

Points: A = [1, 2] and B = [4, 6]

Step   Calculation         Result
1      (4-1)² + (6-2)²     3² + 4² = 9 + 16 = 25
2      √25                 5

    Euclidean Distance Visualization
         ↑
       6 │         B ●
         │        ╱│
       5 │       ╱ │
         │      ╱  │
       4 │     ╱   │ 4 units
         │    ╱    │
       3 │   ╱     │
         │  ╱      │
       2 │ ● ──────┘
         │ A  3 units
       1 │
         │
    ─────┼─────────────→
         0 1 2 3 4 5

    Distance = √(3² + 4²) = 5
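
In practice you rarely compute this by hand; a one-line NumPy sketch (reusing the same points A and B) gives the identical result:

import numpy as np

A = np.array([1.0, 2.0])
B = np.array([4.0, 6.0])

# Euclidean (L2) distance is the norm of the difference vector
print(np.linalg.norm(A - B))  # 5.0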

Key Characteristics

🔹 Scale-sensitive: Doubling all values doubles the distance

🔹 Dimensions weighted equally: each dimension contributes equally to the total distance

🔹 Curse of dimensionality: In very high dimensions (1000+), distances become less meaningful (all points seem equally far apart)

When to Use Euclidean Distance ✅

  • Computer vision (image similarity)
  • Physical measurements (locations, sensor data)
  • Low-dimensional continuous data
  • k-means clustering
  • When magnitude and direction both matter

When NOT to Use Euclidean Distance ❌

  • High-dimensional sparse vectors (embeddings with 768+ dimensions)
  • When scale differences between dimensions are problematic
  • Text similarity (cosine is usually better)

Manhattan Distance: The Grid Walker 🏙️

Manhattan distance (also called L1 distance or taxicab distance) measures distance as if you're navigating a city grid: you can only move along axes, not diagonally.

The Formula

For vectors A and B:

manhattan_distance(A, B) = |A₁-B₁| + |A₂-B₂| + ... + |Aₙ-Bₙ|

Just sum the absolute differences; no squares or square roots!

Visual Intuition

Imagine Manhattan's street grid. To get from point A to point B, you walk along blocks (you can't cut through buildings).

    Manhattan Distance (L1)
         ↑
       6 │         B ●
         │         ↑│
       5 │         ││
         │         ││ 4 blocks north
       4 │         ││
         │         ││
       3 │         ││
         │         ││
       2 │ A ●─────→┘
         │   3 blocks east
       1 │
         │
    ─────┼─────────────→
         0 1 2 3 4 5

    Distance = 3 + 4 = 7 blocks
    (vs. Euclidean: 5 units)
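
The equivalent NumPy sketch (same points; just the sum of absolute coordinate differences):

import numpy as np

A = np.array([1.0, 2.0])
B = np.array([4.0, 6.0])

# Manhattan (L1) distance: sum of absolute differences per dimension
print(np.sum(np.abs(A - B)))  # 7.0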

Comparison: A = [1, 2], B = [4, 6]

Metric      Calculation            Result
Manhattan   |4-1| + |6-2|          3 + 4 = 7
Euclidean   √[(4-1)² + (6-2)²]     √25 = 5

Manhattan distance is always ≥ Euclidean distance (equality only when movement is along a single axis).

When to Use Manhattan Distance ✅

  • Sparse high-dimensional data
  • When you want to emphasize differences in individual dimensions
  • Computational efficiency (no squares/roots)
  • Regression problems (L1 regularization)
  • When outliers should have less influence than Euclidean

When NOT to Use Manhattan Distance ❌

  • When diagonal relationships are important
  • Rotation-sensitive applications
  • Most semantic search use cases (cosine wins)

Detailed Comparison Example 🔬

Let's compute all three metrics for concrete vectors to see how they differ.

Vectors:

  • A = [2, 3, 1]
  • B = [4, 1, 3]

Step-by-Step Calculations

1. Cosine Similarity

Step          Calculation              Result
Dot product   (2×4) + (3×1) + (1×3)    8 + 3 + 3 = 14
||A||         √(2² + 3² + 1²)          √14 ≈ 3.742
||B||         √(4² + 1² + 3²)          √26 ≈ 5.099
Cosine        14 / (3.742 × 5.099)     14 / 19.08 ≈ 0.734

2. Euclidean Distance

Step          Calculation               Result
Differences   (4-2)², (1-3)², (3-1)²    4, 4, 4
Sum           4 + 4 + 4                 12
Distance      √12                       ≈ 3.464

3. Manhattan Distance

Step             Calculation              Result
Absolute diffs   |4-2| + |1-3| + |3-1|    2 + 2 + 2
Distance         Sum                      6

Summary Table

Metric               Value   Interpretation
Cosine Similarity    0.734   Moderately similar direction
Euclidean Distance   3.464   Moderate spatial separation
Manhattan Distance   6.0     6 total "steps" apart

💡 Notice: The metrics tell different stories! Cosine says "pretty similar" (0.734), while Manhattan says "fairly far" (6). Neither is "wrong"; they measure different things.

Converting Between Similarity and Distance 🔄

Notice that cosine similarity is higher when vectors are more similar (max = 1), but Euclidean and Manhattan distances are lower when vectors are more similar (min = 0). This can be confusing!

Cosine Distance

To convert cosine similarity to a distance metric:

cosine_distance = 1 - cosine_similarity

Now:

  • 0 = identical vectors
  • 1 = perpendicular vectors
  • 2 = opposite vectors
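
SciPy's cosine() already returns this distance form, so converting back to a similarity is one subtraction. A small sketch, reusing the vectors from the worked example above:

from scipy.spatial.distance import cosine  # returns cosine DISTANCE

a = [2, 3, 1]
b = [4, 1, 3]

cos_dist = cosine(a, b)   # 1 - similarity
cos_sim = 1 - cos_dist    # convert back to "higher is better"
print(f"distance = {cos_dist:.3f}, similarity = {cos_sim:.3f}")
# distance = 0.266, similarity = 0.734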

Normalized Euclidean

Euclidean distance can be normalized to [0, 1] range:

normalized = euclidean_distance / max_possible_distance

💡 Practical Tip: Most vector databases (Pinecone, Weaviate, Milvus) let you choose your metric, and their search APIs return results already ranked by relevance, so you don't have to remember which direction is "better" for each metric.

Choosing the Right Metric: Decision Framework 🎯

DECISION TREE: Which Metric to Use?

                ┌─────────────────┐
                │ What type of    │
                │ data do you     │
                │ have?           │
                └────────┬────────┘
                         │
        ┌────────────────┼────────────────┐
        │                │                │
  ┌─────┴─────┐    ┌─────┴─────┐    ┌─────┴─────┐
  │   Text/   │    │  Images   │    │ Physical  │
  │embeddings │    │  feature  │    │ measure-  │
  │           │    │  vectors  │    │   ments   │
  └─────┬─────┘    └─────┬─────┘    └─────┬─────┘
        │                │                │
        ▼                ▼                ▼
  ┌───────────┐    ┌───────────┐    ┌───────────┐
  │  COSINE   │    │ EUCLIDEAN │    │ EUCLIDEAN │
  │           │    │ or COSINE │    │           │
  └───────────┘    └───────────┘    └───────────┘

Quick Reference Guide

📋 Metric Selection Cheat Sheet

Use Case                 Best Metric   Why
Semantic search (text)   Cosine        Document length doesn't matter
RAG retrieval            Cosine        Standard for transformer embeddings
Image similarity         Euclidean     Pixel intensities are scale-meaningful
Recommendation systems   Cosine        User preference direction > magnitude
Clustering               Euclidean     K-means standard
Sparse high-D data       Manhattan     Efficient, outlier-resistant
Geographic coordinates   Haversine*    Accounts for Earth's curvature

*Haversine is specialized for lat/long; beyond scope here

Real-World Performance Considerations ⚡

Computational Cost Ranking (fastest to slowest):

  1. Manhattan - just additions and absolute values
  2. Euclidean - requires squares and one square root
  3. Cosine - requires dot product AND magnitude calculations (2 square roots)

For 1000-dimensional vectors:

  • Manhattan: ~1000 operations
  • Euclidean: ~2000 operations + 1 sqrt
  • Cosine: ~3000 operations + 2 sqrts

💡 Optimization Tip: For cosine similarity, pre-normalize vectors to unit length (||v|| = 1). Then cosine similarity becomes just a dot product, as fast as it gets!

If ||A|| = 1 and ||B|| = 1:
cosine_similarity(A, B) = A · B
(no need to divide by magnitudes!)
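
A minimal NumPy sketch of that trick (the corpus here is random data standing in for real embeddings): normalize once when you index, and every query becomes a single matrix-vector product.

import numpy as np

docs = np.random.rand(10_000, 768).astype(np.float32)   # stand-in corpus embeddings
query = np.random.rand(768).astype(np.float32)

# One-time normalization to unit length at indexing time
docs_unit = docs / np.linalg.norm(docs, axis=1, keepdims=True)
query_unit = query / np.linalg.norm(query)

# Cosine similarity against every document is now just a dot product
scores = docs_unit @ query_unit           # shape (10_000,)
top5 = np.argsort(-scores)[:5]            # indices of the 5 best matches
print(top5, scores[top5])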

Practical Code Examples 💻

Let's implement all three metrics in Python:

Implementation from Scratch

import math

def cosine_similarity(a, b):
    """Calculate cosine similarity between vectors a and b"""
    dot_product = sum(x * y for x, y in zip(a, b))
    magnitude_a = math.sqrt(sum(x**2 for x in a))
    magnitude_b = math.sqrt(sum(y**2 for y in b))
    return dot_product / (magnitude_a * magnitude_b)

def euclidean_distance(a, b):
    """Calculate Euclidean distance between vectors a and b"""
    return math.sqrt(sum((x - y)**2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    """Calculate Manhattan distance between vectors a and b"""
    return sum(abs(x - y) for x, y in zip(a, b))

## Example usage
vec1 = [2, 3, 1]
vec2 = [4, 1, 3]

print(f"Cosine Similarity: {cosine_similarity(vec1, vec2):.3f}")
print(f"Euclidean Distance: {euclidean_distance(vec1, vec2):.3f}")
print(f"Manhattan Distance: {manhattan_distance(vec1, vec2):.1f}")

Output:

Cosine Similarity: 0.734
Euclidean Distance: 3.464
Manhattan Distance: 6.0

Using NumPy (Production Code)

import numpy as np
from scipy.spatial.distance import cosine, euclidean, cityblock

vec1 = np.array([2, 3, 1])
vec2 = np.array([4, 1, 3])

## Note: scipy.spatial.distance.cosine returns DISTANCE (1 - similarity)
cos_sim = 1 - cosine(vec1, vec2)
print(f"Cosine Similarity: {cos_sim:.3f}")
print(f"Euclidean Distance: {euclidean(vec1, vec2):.3f}")
print(f"Manhattan Distance: {cityblock(vec1, vec2):.1f}")

Real Semantic Search Example

import numpy as np
from scipy.spatial.distance import cosine  # cosine() returns distance (1 - similarity)

## Simulated sentence embeddings (in reality, from BERT/OpenAI)
query = np.array([0.2, 0.8, 0.3, 0.1])  # "machine learning"
docs = [
    np.array([0.25, 0.75, 0.35, 0.15]),  # "AI and ML"
    np.array([0.9, 0.1, 0.05, 0.05]),    # "cooking recipes"
    np.array([0.22, 0.79, 0.28, 0.12])   # "deep learning"
]

for i, doc in enumerate(docs, 1):
    sim = 1 - cosine(query, doc)
    print(f"Document {i} similarity: {sim:.4f}")

Output:

Document 1 similarity: 0.9936  ← Very similar!
Document 2 similarity: 0.3490  ← Unrelated
Document 3 similarity: 0.9992  ← Most similar!

🎯 See how it works? Documents 1 and 3 (about ML) score high similarity to the query, while Document 2 (cooking) scores lower.

Common Mistakes to Avoid ⚠️

Mistake 1: Using Cosine for Magnitude-Sensitive Data

❌ Wrong:

## Comparing temperatures where magnitude matters!
temp_yesterday = [22, 24, 26, 25]  # °C
temp_today = [44, 48, 52, 50]      # Every reading doubled!

sim = cosine_similarity(temp_yesterday, temp_today)
print(sim)  # 1.0 - "identical"?! NO!

✅ Right:

## Use Euclidean when absolute values matter
dist = euclidean_distance(temp_yesterday, temp_today)
print(dist)  # 48.59 - correctly shows big difference

Mistake 2: Forgetting to Normalize

❌ Wrong:

## Comparing vectors with very different scales
user_a = [1000, 5]    # Loves action movies, hates romance
user_b = [10, 0.05]   # Same preferences, different scale

euc = euclidean_distance(user_a, user_b)
print(euc)  # 990 - seems very different!

✅ Right:

## Normalize first, OR use cosine similarity
cos = cosine_similarity(user_a, user_b)
print(cos)  # 0.9999 - correctly identifies same preference!

Mistake 3: Comparing Across Different Metrics

❌ Wrong:

score_a = cosine_similarity(q, doc_a)  # 0.85
score_b = euclidean_distance(q, doc_b)  # 3.2

if score_a > score_b:  # NONSENSE! Different metrics!
    return doc_a

✅ Right:

## Use SAME metric for all comparisons
scores = [(doc, cosine_similarity(q, doc)) for doc in documents]
best_doc = max(scores, key=lambda x: x[1])

Mistake 4: Zero Vectors

❌ Wrong:

vec_a = [0, 0, 0]
vec_b = [1, 2, 3]
cos = cosine_similarity(vec_a, vec_b)  # Division by zero!

✅ Right:

def safe_cosine_similarity(a, b, epsilon=1e-8):
    dot_product = sum(x * y for x, y in zip(a, b))
    mag_a = math.sqrt(sum(x**2 for x in a)) + epsilon
    mag_b = math.sqrt(sum(y**2 for y in b)) + epsilon
    return dot_product / (mag_a * mag_b)

Mistake 5: High Dimensionality Issues

⚠️ Problem: In very high dimensions (1000+), Euclidean distances become less discriminative: most points seem roughly equidistant (see the quick experiment after the solution list below).

✅ Solution:

  • Prefer cosine similarity for high-dimensional embeddings
  • Apply dimensionality reduction (PCA, t-SNE)
  • Use approximate nearest neighbor algorithms (HNSW, IVF)
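
The quick experiment mentioned above is a sketch with uniform random vectors (not real embeddings), but it makes the problem visible: the ratio between the farthest and nearest neighbor shrinks toward 1 as dimensionality grows.

import numpy as np

rng = np.random.default_rng(0)
for dim in (2, 100, 1000):
    points = rng.random((1000, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    # Contrast between the farthest and nearest point to the query
    print(f"dim={dim:5d}  max/min distance ratio = {dists.max() / dists.min():.2f}")
# The ratio drops sharply as dim grows, so "nearest" becomes less meaningful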

Advanced Considerations 🎓

Distance Metric Properties

A proper metric must satisfy these axioms:

  1. Non-negativity: d(x,y) ≥ 0
  2. Identity: d(x,y) = 0 if and only if x = y
  3. Symmetry: d(x,y) = d(y,x)
  4. Triangle inequality: d(x,z) ≤ d(x,y) + d(y,z)

πŸ” Fun Fact: Cosine similarity isn't technically a metric (violates triangle inequality), but cosine distance (1 - similarity) is!

Weighted Distance Metrics

Sometimes dimensions aren't equally important:

import math

def weighted_euclidean(a, b, weights):
    """Euclidean distance with a per-dimension importance weight."""
    return math.sqrt(sum(w * (x - y)**2
                         for x, y, w in zip(a, b, weights)))

## Example: Emphasize first dimension 3x more
vec1 = [2, 3, 1]
vec2 = [4, 1, 3]
weights = [3.0, 1.0, 1.0]  # First dim 3x more important

dist = weighted_euclidean(vec1, vec2, weights)

Minkowski Distance (Generalized)

Both Euclidean and Manhattan are special cases of Minkowski distance:

minkowski_distance(x, y, p) = (Σ |xᵢ - yᵢ|ᵖ)^(1/p)

  • p = 1: Manhattan distance
  • p = 2: Euclidean distance
  • p = ∞: Chebyshev distance (maximum difference in any dimension)
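
A minimal sketch of the general form (SciPy ships the same thing as scipy.spatial.distance.minkowski):

def minkowski_distance(a, b, p):
    """Generalized distance: p=1 gives Manhattan, p=2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

vec1 = [2, 3, 1]
vec2 = [4, 1, 3]
print(minkowski_distance(vec1, vec2, p=1))  # 6.0    (Manhattan)
print(minkowski_distance(vec1, vec2, p=2))  # ≈3.464 (Euclidean)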

Vector Databases in Production

Modern vector databases optimize these operations:

Database   Supported Metrics                   Index Type
Pinecone   Cosine, Euclidean, Dot Product      Proprietary
Weaviate   Cosine, Euclidean, Manhattan, Dot   HNSW
Milvus     All + Hamming, Jaccard              IVF, HNSW, Annoy
FAISS      Euclidean, Inner Product            IVF, HNSW, PQ
Qdrant     Cosine, Euclidean, Dot Product      HNSW

💡 Production Tip: These databases use Approximate Nearest Neighbor (ANN) algorithms; they trade tiny accuracy losses (<1%) for massive speed gains (100-1000x faster).
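
As one concrete example, here's a minimal FAISS sketch (assuming faiss-cpu is installed and 768-dimensional embeddings; with unit-normalized vectors, an inner-product index is equivalent to cosine similarity):

import faiss
import numpy as np

dim = 768
embeddings = np.random.rand(10_000, dim).astype("float32")  # stand-in corpus
faiss.normalize_L2(embeddings)          # unit length -> inner product == cosine

index = faiss.IndexFlatIP(dim)          # exact (non-approximate) inner-product index
index.add(embeddings)

query = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 5)    # top-5 most similar vectors
print(ids[0], scores[0])

For the ANN speedups described above, you would swap the flat index for an HNSW or IVF index; the search call itself stays the same.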

Key Takeaways 🎯

  1. Cosine similarity measures angle/direction: perfect for text embeddings and semantic search. Ignores magnitude.

  2. Euclidean distance measures straight-line distance: good when scale matters (images, physical measurements).

  3. Manhattan distance measures grid-based distance: efficient for high dimensions, outlier-resistant.

  4. Different metrics, different insights: Choose based on your data type and what "similarity" means in your domain.

  5. Normalize vectors when using Euclidean if scales differ. Cosine handles this automatically.

  6. For RAG and semantic search: Cosine similarity is the standard (used by OpenAI, Cohere, Anthropic embeddings).

  7. Performance matters: Manhattan < Euclidean < Cosine in computational cost. Pre-normalize for cosine to speed it up.

  8. Watch for edge cases: Zero vectors break cosine. High dimensions weaken Euclidean. Always validate!

Quick Reference Card 📋

📋 Distance Metrics at a Glance

Metric               Formula Intuition        Range      Best For              Speed
Cosine Similarity    Angle between vectors    -1 to +1   Text, embeddings      Slowest
Euclidean Distance   Straight-line distance   0 to ∞     Images, coordinates   Medium
Manhattan Distance   Grid-path distance       0 to ∞     Sparse data           Fastest

Memory Hook 🧠:

  • Cosine for Context (text/meaning)
  • Euclidean for Exact position
  • Manhattan for Massive dimensions

Quick Decision 🎯:
❓ "Is it text/language?" β†’ Cosine
❓ "Is it an image/continuous?" β†’ Euclidean
❓ "Is it sparse/high-D?" β†’ Manhattan

📚 Further Study

Deepen your understanding with these resources:

  1. "Similarity Measures for Text Document Clustering" - Academic overview of distance metrics in NLP contexts https://www.sciencedirect.com/topics/computer-science/cosine-similarity

  2. Pinecone Learning Center: "Distance Metrics in Vector Search" - Practical guide from a leading vector database https://www.pinecone.io/learn/distance-metrics/

  3. FAISS Documentation (Meta AI) - Deep dive into optimized similarity search implementations https://github.com/facebookresearch/faiss/wiki

Now that you understand how to measure similarity, you're ready to build powerful semantic search systems. Next up: learning about vector embeddings and how to generate them from raw text! 🚀