Cosine Similarity & Distance Metrics
Master similarity calculations and distance functions (cosine, Euclidean, Manhattan, dot product) for comparing vectors.
Understanding how to measure similarity between vectors is fundamental to modern AI search systems. Master cosine similarity, Euclidean distance, and Manhattan distance with free flashcards and spaced repetition practice. This lesson covers vector similarity calculations, distance metric selection criteria, and practical applications in semantic search: essential concepts for building retrieval-augmented generation (RAG) systems.
Welcome to Vector Similarity
Imagine you're in a library with millions of books, and someone asks you to find books "similar" to their favorite novel. How do you measure "similarity"? In the world of AI search, we face the same challenge with text, images, and other data, but instead of books on shelves, we work with vectors (arrays of numbers) in high-dimensional space.
Distance metrics are mathematical functions that quantify how "close" or "far apart" two vectors are. The choice of metric fundamentally shapes how your search system behaves. Pick the wrong one, and your "similar" results might be wildly off. Pick the right one, and your system feels almost magical.
In this lesson, we'll explore the three most important distance metrics for AI search:
- Cosine Similarity - measures the angle between vectors
- Euclidean Distance - measures straight-line distance
- Manhattan Distance - measures grid-based distance
Why This Matters: Every modern semantic search system, from Google's BERT to OpenAI's embeddings, relies on these metrics to find relevant information. Understanding them isn't optional; it's foundational.
Understanding Vector Representations
Before diving into distance metrics, let's clarify what we're measuring. In AI search, everything gets converted into embeddings: dense numerical vectors that capture semantic meaning.
Example: Text to Vector
The sentence "I love machine learning" might become:
[0.23, -0.41, 0.87, 0.15, -0.62, ...] (hundreds or thousands of dimensions)
Similar concepts cluster together in this vector space:
- "I love machine learning" β [0.23, -0.41, 0.87, ...]
- "Machine learning is amazing" β [0.19, -0.38, 0.91, ...]
- "I hate pizza" β [-0.87, 0.62, -0.15, ...]
The first two sentences are semantically related (both positive about ML), so their vectors point in similar directions. The third is unrelated, pointing elsewhere.
Vector Space Visualization
[Diagram: in a 2-D slice of the vector space, "I love ML" and "ML is amazing" point in nearly the same direction (small angle between them), while "I hate pizza" points in a very different direction.]
Key Insight: Distance metrics measure relationships between these vectors. Different metrics emphasize different aspects of "similarity."
Cosine Similarity: The Direction Matcher
Cosine similarity measures the angle between two vectors, ignoring their magnitude (length). It's the most popular metric for semantic search.
The Formula
For vectors A and B:
cosine_similarity(A, B) = (A · B) / (||A|| × ||B||)
Where:
- A · B = dot product (sum of element-wise products)
- ||A|| = magnitude of A (square root of sum of squared elements)
- ||B|| = magnitude of B
The Range
- +1 = vectors point in exactly the same direction (0° angle)
- 0 = vectors are perpendicular (90° angle, no similarity)
- -1 = vectors point in opposite directions (180° angle)
π‘ Why "Cosine"? The formula literally computes cos(ΞΈ), where ΞΈ is the angle between vectors. Remember trigonometry? cos(0Β°) = 1, cos(90Β°) = 0, cos(180Β°) = -1.
Why Magnitude Doesn't Matter
Consider two document vectors:
- Document A (short): [3, 4]
- Document B (long, but same topic): [6, 8]
Document B is just Document A scaled by 2. They have identical direction (same topic), just different lengths (document size). Cosine similarity correctly returns 1.0 (perfect match), while Euclidean distance would say they're far apart.
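A quick numeric check of this claim (a minimal sketch with NumPy, using the toy vectors above):

```python
import numpy as np

a = np.array([3.0, 4.0])   # short document
b = np.array([6.0, 8.0])   # same topic, twice as long

# Cosine similarity: dot product divided by the product of magnitudes
cos_sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(cos_sim)                 # 1.0 -> identical direction

# Euclidean distance: straight-line separation
print(np.linalg.norm(a - b))   # 5.0 -> "far apart" despite the same topic
```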
Cosine Similarity Visualization
[Diagram: A = [3, 4] and B = [6, 8] lie along the same ray from the origin, so the angle θ between them is 0° and cos(θ) = 1: high similarity despite the different lengths.]
When to Use Cosine Similarity
- Text embeddings (most common use case)
- When document length shouldn't affect similarity
- High-dimensional sparse vectors (like TF-IDF)
- Recommendation systems
- Semantic search and RAG systems
When NOT to Use Cosine Similarity
- When magnitude matters (e.g., comparing temperature readings)
- Low-dimensional data where scale is important
- When vectors can be zero (cosine is undefined)
Euclidean Distance: The Straight Line
Euclidean distance measures the straight-line distance between two points in space. It's the "as the crow flies" metric: what most people intuitively think of as "distance."
The Formula
For vectors A and B with n dimensions:
euclidean_distance(A, B) = √[(A₁-B₁)² + (A₂-B₂)² + ... + (Aₙ-Bₙ)²]
This is just the Pythagorean theorem extended to n dimensions!
The Range
- 0 = vectors are identical (no distance)
- ∞ = vectors can be arbitrarily far apart
- Lower is better (opposite of cosine similarity)
2D Example
Points: A = [1, 2] and B = [4, 6]
| Step | Calculation | Result |
|---|---|---|
| 1 | (4-1)² + (6-2)² | 3² + 4² = 9 + 16 = 25 |
| 2 | √25 | 5 |
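The same arithmetic in code (a quick sketch; math.dist, available in Python 3.8+, computes exactly this):

```python
import math

A = [1, 2]
B = [4, 6]

# Manual Pythagorean form
manual = math.sqrt((4 - 1) ** 2 + (6 - 2) ** 2)

# Built-in equivalent (Python 3.8+)
builtin = math.dist(A, B)

print(manual, builtin)  # 5.0 5.0
```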
Euclidean Distance Visualization
[Diagram: A = [1, 2] and B = [4, 6] form a right triangle with legs of 3 units (x) and 4 units (y); the hypotenuse is the Euclidean distance. Distance = √(3² + 4²) = 5.]
Key Characteristics
- Scale-sensitive: Doubling all values doubles the distance
- Dimensions weighted equally: Each dimension contributes to the total distance
- Curse of dimensionality: In very high dimensions (1000+), distances become less meaningful; all points seem almost equally far apart (see the quick demo below)
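To see the curse of dimensionality concretely, here is a small illustrative experiment (random data, so exact numbers will vary): as dimensionality grows, the nearest and farthest neighbors of a query end up at nearly the same Euclidean distance.

```python
import numpy as np

rng = np.random.default_rng(0)

for dim in (2, 10, 100, 1000):
    points = rng.random((1000, dim))        # 1000 random points in [0, 1]^dim
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    # Contrast between nearest and farthest neighbor shrinks as dim grows
    contrast = (dists.max() - dists.min()) / dists.min()
    print(f"dim={dim:5d}  relative contrast={contrast:.2f}")
```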
When to Use Euclidean Distance
- Computer vision (image similarity)
- Physical measurements (locations, sensor data)
- Low-dimensional continuous data
- k-means clustering
- When magnitude and direction both matter
When NOT to Use Euclidean Distance
- High-dimensional sparse vectors (embeddings with 768+ dimensions)
- When scale differences between dimensions are problematic
- Text similarity (cosine is usually better)
Manhattan Distance: The Grid Walker
Manhattan distance (also called L1 distance or taxicab distance) measures distance as if you're navigating a city grid: you can only move along axes, not diagonally.
The Formula
For vectors A and B:
manhattan_distance(A, B) = |A₁-B₁| + |A₂-B₂| + ... + |Aₙ-Bₙ|
Just sum the absolute differences; no squares or square roots!
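For instance, a minimal implementation applied to the points A = [1, 2] and B = [4, 6] used in the comparison below:

```python
def manhattan(a, b):
    # Sum of absolute per-dimension differences
    return sum(abs(x - y) for x, y in zip(a, b))

print(manhattan([1, 2], [4, 6]))  # 7
```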
Visual Intuition
Imagine Manhattan's street grid. To get from point A to point B, you walk along blocks (you can't cut through buildings).
[Diagram: Manhattan distance (L1) from A = [1, 2] to B = [4, 6] follows the street grid: 3 blocks east + 4 blocks north = 7 blocks, versus the Euclidean straight line of 5 units.]
Comparison: A = [1, 2], B = [4, 6]
| Metric | Calculation | Result |
|---|---|---|
| Manhattan | \|4-1\| + \|6-2\| | 3 + 4 = 7 |
| Euclidean | √[(4-1)² + (6-2)²] | √25 = 5 |
Manhattan distance is always ≥ Euclidean distance (equality holds only when movement is along a single axis).
When to Use Manhattan Distance
- Sparse high-dimensional data
- When you want to emphasize differences in individual dimensions
- Computational efficiency (no squares/roots)
- Regression problems (L1 regularization)
- When outliers should have less influence than they would with Euclidean distance
When NOT to Use Manhattan Distance
- When diagonal relationships are important
- Rotation-sensitive applications
- Most semantic search use cases (cosine wins)
Detailed Comparison Example
Let's compute all three metrics for concrete vectors to see how they differ.
Vectors:
- A = [2, 3, 1]
- B = [4, 1, 3]
Step-by-Step Calculations
1. Cosine Similarity
| Step | Calculation | Result |
|---|---|---|
| Dot product | (2×4) + (3×1) + (1×3) | 8 + 3 + 3 = 14 |
| \|\|A\|\| | √(2² + 3² + 1²) | √14 ≈ 3.742 |
| \|\|B\|\| | √(4² + 1² + 3²) | √26 ≈ 5.099 |
| Cosine | 14 / (3.742 × 5.099) | 14 / 19.08 ≈ 0.734 |
2. Euclidean Distance
| Step | Calculation | Result |
|---|---|---|
| Differences | (4-2)², (1-3)², (3-1)² | 4, 4, 4 |
| Sum | 4 + 4 + 4 | 12 |
| Distance | √12 | ≈ 3.464 |
3. Manhattan Distance
| Step | Calculation | Result |
|---|---|---|
| Absolute diffs | \|4-2\| + \|1-3\| + \|3-1\| | 2 + 2 + 2 |
| Distance | Sum | 6 |
Summary Table
| Metric | Value | Interpretation |
|---|---|---|
| Cosine Similarity | 0.734 | Moderately similar direction |
| Euclidean Distance | 3.464 | Moderate spatial separation |
| Manhattan Distance | 6.0 | 6 total "steps" apart |
Notice: The metrics tell different stories! Cosine says "pretty similar" (0.734), while Manhattan says "fairly far" (6). Neither is "wrong"; they measure different things.
Converting Between Similarity and Distance
Notice that cosine similarity is higher when vectors are more similar (max = 1), but Euclidean and Manhattan distances are lower when vectors are more similar (min = 0). This can be confusing!
Cosine Distance
To convert cosine similarity to a distance metric:
cosine_distance = 1 - cosine_similarity
Now:
- 0 = identical vectors
- 1 = perpendicular vectors
- 2 = opposite vectors
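A quick sketch of the conversion using SciPy (assuming scipy is installed; its scipy.spatial.distance.cosine function returns the distance form directly):

```python
from scipy.spatial.distance import cosine

a = [2, 3, 1]
b = [4, 1, 3]

cos_dist = cosine(a, b)   # distance: 0 = identical direction
cos_sim = 1 - cos_dist    # similarity: 1 = identical direction

print(f"distance   = {cos_dist:.3f}")   # ~0.266
print(f"similarity = {cos_sim:.3f}")    # ~0.734
```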
Normalized Euclidean
Euclidean distance can be normalized to the [0, 1] range:
normalized = euclidean_distance / max_possible_distance
Practical Tip: Most vector databases (Pinecone, Weaviate, Milvus) let you choose your metric. They handle score normalization so "higher is better" for all metrics in search results.
Choosing the Right Metric: Decision Framework
DECISION TREE: Which metric to use, by data type?
- Text / embeddings → Cosine
- Image feature vectors → Euclidean or Cosine
- Physical measurements → Euclidean
Quick Reference Guide
Metric Selection Cheat Sheet
| Use Case | Best Metric | Why |
|---|---|---|
| Semantic search (text) | Cosine | Document length doesn't matter |
| RAG retrieval | Cosine | Standard for transformer embeddings |
| Image similarity | Euclidean | Pixel intensities are scale-meaningful |
| Recommendation systems | Cosine | User preference direction > magnitude |
| Clustering | Euclidean | K-means standard |
| Sparse high-D data | Manhattan | Efficient, outlier-resistant |
| Geographic coordinates | Haversine | Accounts for Earth's curvature (see the sketch below) |
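For reference, here is a minimal sketch of the haversine distance mentioned in the table (inputs in decimal degrees; Earth's mean radius taken as roughly 6371 km):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points on Earth, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# London to Paris, roughly 343 km
print(haversine_km(51.5074, -0.1278, 48.8566, 2.3522))
```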
Real-World Performance Considerations
Computational Cost Ranking (fastest to slowest):
1. Manhattan - just additions and absolute values
2. Euclidean - requires squares and one square root
3. Cosine - requires a dot product AND magnitude calculations (2 square roots)
For 1000-dimensional vectors:
- Manhattan: ~1000 operations
- Euclidean: ~2000 operations + 1 sqrt
- Cosine: ~3000 operations + 2 sqrts
Optimization Tip: For cosine similarity, pre-normalize vectors to unit length (||v|| = 1). Then cosine similarity becomes just a dot product, as fast as it gets!
If ||A|| = 1 and ||B|| = 1:
cosine_similarity(A, B) = A · B
(no need to divide by magnitudes!)
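A sketch of this optimization with NumPy: normalize every vector once at indexing time, and similarity at query time collapses to a dot product.

```python
import numpy as np

def normalize(v):
    # Scale a vector to unit length (assumes v is non-zero)
    return v / np.linalg.norm(v)

a = normalize(np.array([2.0, 3.0, 1.0]))
b = normalize(np.array([4.0, 1.0, 3.0]))

# With unit-length vectors, the dot product IS the cosine similarity
print(a @ b)  # ~0.734, no divisions or square roots at query time
```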
Practical Code Examples
Let's implement all three metrics in Python:
Implementation from Scratch
import math
def cosine_similarity(a, b):
"""Calculate cosine similarity between vectors a and b"""
dot_product = sum(x * y for x, y in zip(a, b))
magnitude_a = math.sqrt(sum(x**2 for x in a))
magnitude_b = math.sqrt(sum(y**2 for y in b))
return dot_product / (magnitude_a * magnitude_b)
def euclidean_distance(a, b):
"""Calculate Euclidean distance between vectors a and b"""
return math.sqrt(sum((x - y)**2 for x, y in zip(a, b)))
def manhattan_distance(a, b):
"""Calculate Manhattan distance between vectors a and b"""
return sum(abs(x - y) for x, y in zip(a, b))
# Example usage
vec1 = [2, 3, 1]
vec2 = [4, 1, 3]
print(f"Cosine Similarity: {cosine_similarity(vec1, vec2):.3f}")
print(f"Euclidean Distance: {euclidean_distance(vec1, vec2):.3f}")
print(f"Manhattan Distance: {manhattan_distance(vec1, vec2):.1f}")
Output:
Cosine Similarity: 0.734
Euclidean Distance: 3.464
Manhattan Distance: 6.0
Using NumPy (Production Code)
import numpy as np
from scipy.spatial.distance import cosine, euclidean, cityblock
vec1 = np.array([2, 3, 1])
vec2 = np.array([4, 1, 3])
# Note: scipy.spatial.distance.cosine returns DISTANCE (1 - similarity)
cos_sim = 1 - cosine(vec1, vec2)
print(f"Cosine Similarity: {cos_sim:.3f}")
print(f"Euclidean Distance: {euclidean(vec1, vec2):.3f}")
print(f"Manhattan Distance: {cityblock(vec1, vec2):.1f}")
Real Semantic Search Example
import numpy as np
from scipy.spatial.distance import cosine
# Simulated sentence embeddings (in reality, from BERT/OpenAI)
query = np.array([0.2, 0.8, 0.3, 0.1]) # "machine learning"
docs = [
np.array([0.25, 0.75, 0.35, 0.15]), # "AI and ML"
np.array([0.9, 0.1, 0.05, 0.05]), # "cooking recipes"
np.array([0.22, 0.79, 0.28, 0.12]) # "deep learning"
]
for i, doc in enumerate(docs, 1):
sim = 1 - cosine(query, doc)
print(f"Document {i} similarity: {sim:.4f}")
Output:
Document 1 similarity: 0.9936 ← very similar
Document 2 similarity: 0.3490 ← unrelated
Document 3 similarity: 0.9992 ← most similar!
See how it works? Documents 1 and 3 (about ML) score high similarity to the query, while Document 2 (cooking) scores much lower.
Common Mistakes to Avoid
Mistake 1: Using Cosine for Magnitude-Sensitive Data
Wrong:
# Comparing temperatures where magnitude matters!
temp_yesterday = [22, 24, 26, 25]  # °C
temp_today = [44, 48, 52, 50]      # Twice as hot!
sim = cosine_similarity(temp_yesterday, temp_today)
print(sim) # 1.0 - "identical"?! NO!
Right:
# Use Euclidean when absolute values matter
dist = euclidean_distance(temp_yesterday, temp_today)
print(dist)  # 48.59 - correctly shows a big difference
Mistake 2: Forgetting to Normalize
Wrong:
# Comparing vectors with very different scales
user_a = [1000, 5] # Loves action movies, hates romance
user_b = [10, 0.05] # Same preferences, different scale
euc = euclidean_distance(user_a, user_b)
print(euc) # 990 - seems very different!
Right:
# Normalize first, OR use cosine similarity
cos = cosine_similarity(user_a, user_b)
print(cos)  # ~1.0 - correctly identifies the same preference!
Mistake 3: Comparing Across Different Metrics
Wrong:
score_a = cosine_similarity(q, doc_a) # 0.85
score_b = euclidean_distance(q, doc_b) # 3.2
if score_a > score_b: # NONSENSE! Different metrics!
return doc_a
Right:
# Use the SAME metric for all comparisons
scores = [(doc, cosine_similarity(q, doc)) for doc in documents]
best_doc = max(scores, key=lambda x: x[1])
Mistake 4: Zero Vectors
Wrong:
vec_a = [0, 0, 0]
vec_b = [1, 2, 3]
cos = cosine_similarity(vec_a, vec_b) # Division by zero!
Right:
def safe_cosine_similarity(a, b, epsilon=1e-8):
dot_product = sum(x * y for x, y in zip(a, b))
mag_a = math.sqrt(sum(x**2 for x in a)) + epsilon
mag_b = math.sqrt(sum(y**2 for y in b)) + epsilon
return dot_product / (mag_a * mag_b)
Mistake 5: High Dimensionality Issues
Problem: In very high dimensions (1000+), Euclidean distances become less discriminative; most points seem equidistant.
Solutions:
- Prefer cosine similarity for high-dimensional embeddings
- Apply dimensionality reduction (PCA, t-SNE)
- Use approximate nearest neighbor algorithms (HNSW, IVF); see the sketch below
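As a sketch of that last point, assuming the faiss package (pip install faiss-cpu) is available: for unit-normalized vectors, ranking by Euclidean distance is equivalent to ranking by cosine similarity (||a - b||² = 2 - 2·cos θ), so an HNSW index with the default L2 metric can serve cosine-style search.

```python
import numpy as np
import faiss  # pip install faiss-cpu

dim, k = 128, 5
rng = np.random.default_rng(0)
docs = rng.random((10_000, dim), dtype=np.float32)
query = rng.random((1, dim), dtype=np.float32)

# Unit-normalize so that L2 ranking matches cosine ranking
faiss.normalize_L2(docs)
faiss.normalize_L2(query)

index = faiss.IndexHNSWFlat(dim, 32)  # HNSW graph index, default L2 metric
index.add(docs)                       # build the index

distances, ids = index.search(query, k)  # approximate top-k neighbors
print(ids[0], distances[0])
```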
Advanced Considerations
Distance Metric Properties
A proper metric must satisfy these axioms:
- Non-negativity: d(x,y) ≥ 0
- Identity: d(x,y) = 0 ⟺ x = y
- Symmetry: d(x,y) = d(y,x)
- Triangle inequality: d(x,z) ≤ d(x,y) + d(y,z)
Fun Fact: Cosine similarity isn't a metric at all (it's a similarity, not a distance), and even cosine distance (1 - similarity) fails the triangle inequality. The related angular distance, arccos(similarity)/π, does satisfy all four axioms.
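A tiny counterexample (a minimal sketch): three 2-D vectors at 0°, 45°, and 90°, where the direct cosine distance exceeds the sum of the two "legs".

```python
import numpy as np

def cos_dist(a, b):
    # Cosine distance = 1 - cosine similarity
    return 1 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 0.0])
b = np.array([1.0, 1.0])   # 45 degrees from both a and c
c = np.array([0.0, 1.0])

print(cos_dist(a, c))                   # 1.0
print(cos_dist(a, b) + cos_dist(b, c))  # ~0.586 -> less than d(a, c)!
```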
Weighted Distance Metrics
Sometimes dimensions aren't equally important:
import math

def weighted_euclidean(a, b, weights):
return math.sqrt(sum(w * (x - y)**2
for x, y, w in zip(a, b, weights)))
# Example: Emphasize first dimension 3x more
vec1 = [2, 3, 1]
vec2 = [4, 1, 3]
weights = [3.0, 1.0, 1.0] # First dim 3x more important
dist = weighted_euclidean(vec1, vec2, weights)
Minkowski Distance (Generalized)
Both Euclidean and Manhattan are special cases of Minkowski distance:
minkowski_distance(x, y, p) = (Σ|xᵢ - yᵢ|ᵖ)^(1/p)
- p = 1: Manhattan distance
- p = 2: Euclidean distance
- p = ∞: Chebyshev distance (maximum difference in any dimension)
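A minimal sketch of the generalized form, confirming that p = 1 and p = 2 reproduce the Manhattan and Euclidean results computed earlier:

```python
def minkowski(a, b, p):
    # (sum of |x_i - y_i|^p) ^ (1/p)
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

A = [2, 3, 1]
B = [4, 1, 3]

print(minkowski(A, B, 1))   # 6.0    -> Manhattan
print(minkowski(A, B, 2))   # ~3.464 -> Euclidean
print(max(abs(x - y) for x, y in zip(A, B)))  # 2 -> Chebyshev (p -> infinity)
```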
Vector Databases in Production
Modern vector databases optimize these operations:
| Database | Supported Metrics | Index Type |
|---|---|---|
| Pinecone | Cosine, Euclidean, Dot Product | Proprietary |
| Weaviate | Cosine, Euclidean, Manhattan, Dot | HNSW |
| Milvus | All + Hamming, Jaccard | IVF, HNSW, Annoy |
| FAISS | Euclidean, Inner Product | IVF, HNSW, PQ |
| Qdrant | Cosine, Euclidean, Dot Product | HNSW |
Production Tip: These databases use Approximate Nearest Neighbor (ANN) algorithms; they trade tiny accuracy losses (<1%) for massive speed gains (100-1000x faster).
Key Takeaways
Cosine similarity measures angle/direction: perfect for text embeddings and semantic search. Ignores magnitude.
Euclidean distance measures straight-line distance: good when scale matters (images, physical measurements).
Manhattan distance measures grid-based distance: efficient for high dimensions, outlier-resistant.
Different metrics, different insights: Choose based on your data type and what "similarity" means in your domain.
Normalize vectors when using Euclidean if scales differ. Cosine handles this automatically.
For RAG and semantic search: Cosine similarity is the standard (used by OpenAI, Cohere, Anthropic embeddings).
Performance matters: Manhattan < Euclidean < Cosine in computational cost. Pre-normalize for cosine to speed it up.
Watch for edge cases: Zero vectors break cosine. High dimensions weaken Euclidean. Always validate!
Quick Reference Card
Distance Metrics at a Glance
| Metric | Formula Intuition | Range | Best For | Speed |
|---|---|---|---|---|
| Cosine Similarity | Angle between vectors | -1 to +1 | Text, embeddings | Slowest |
| Euclidean Distance | Straight-line distance | 0 to ∞ | Images, coordinates | Medium |
| Manhattan Distance | Grid-path distance | 0 to ∞ | Sparse data | Fastest |
Memory Hook:
- Cosine for Context (text/meaning)
- Euclidean for Exact position
- Manhattan for Massive dimensions
Quick Decision:
β "Is it text/language?" β Cosine
β "Is it an image/continuous?" β Euclidean
β "Is it sparse/high-D?" β Manhattan
Further Study
Deepen your understanding with these resources:
"Similarity Measures for Text Document Clustering" - Academic overview of distance metrics in NLP contexts https://www.sciencedirect.com/topics/computer-science/cosine-similarity
Pinecone Learning Center: "Distance Metrics in Vector Search" - Practical guide from a leading vector database https://www.pinecone.io/learn/distance-metrics/
FAISS Documentation (Meta AI) - Deep dive into optimized similarity search implementations https://github.com/facebookresearch/faiss/wiki
Now that you understand how to measure similarity, you're ready to build powerful semantic search systems. Next up: learning about vector embeddings and how to generate them from raw text!