
Vector DB Selection

Compare popular vector databases, their features, scaling characteristics, and when to use each solution.

Choosing the right vector database is a critical decision that impacts search performance, scalability, and cost in modern AI applications. Master vector database selection with free flashcards and spaced repetition practice. This lesson covers evaluation criteria for vector databases, performance benchmarking, deployment considerations, and migration strategies: essential concepts for building production-ready RAG (Retrieval-Augmented Generation) systems.

Welcome to Vector Database Selection 💾

The vector database landscape has exploded in recent years. With dozens of options ranging from specialized vector stores like Pinecone and Weaviate to vector extensions for traditional databases like PostgreSQL with pgvector, making the right choice can feel overwhelming. This lesson provides a structured framework for evaluating and selecting the vector database that best fits your specific use case, technical requirements, and organizational constraints.

You'll learn how to assess databases across critical dimensions including query performance, indexing strategies, scalability patterns, cost structures, and operational complexity. By the end of this lesson, you'll have a practical decision-making framework and understand the trade-offs involved in each major vector database option.

Core Concepts: Understanding Vector Database Selection 🔍

The Vector Database Landscape

Vector databases fall into several distinct categories, each with unique architectural approaches:

Specialized Vector Databases are purpose-built for vector operations. Examples include:

  • Pinecone: Fully managed, cloud-native service with automatic scaling
  • Weaviate: Open-source with GraphQL API and hybrid search capabilities
  • Qdrant: Rust-based with filtering support and efficient memory usage
  • Milvus: Distributed architecture designed for massive scale

Vector Extensions for Traditional Databases add vector capabilities to existing systems:

  • PostgreSQL + pgvector: Vector extension for the world's most popular open-source database
  • MongoDB Atlas Vector Search: Native vector search in MongoDB
  • Redis with RediSearch: In-memory vector search with ultra-low latency
  • Elasticsearch with dense_vector: Vector search alongside full-text capabilities

Cloud Provider Solutions integrate with broader cloud ecosystems:

  • AWS OpenSearch with k-NN: Vector search in the AWS ecosystem
  • Azure Cognitive Search: Microsoft's integrated search solution
  • Google Cloud Vertex AI Matching Engine: Scalable vector similarity matching

💡 Tip: Starting with a vector extension for your existing database can minimize operational overhead and accelerate time-to-market, especially for prototypes and MVPs.

Critical Evaluation Dimensions

1. Query Performance and Indexing 🚀

Query performance is determined by the indexing algorithm used for approximate nearest neighbor (ANN) search:

| Algorithm | Speed | Recall | Memory | Best For |
|---|---|---|---|---|
| HNSW (Hierarchical Navigable Small World) | ⚡ Very Fast | 🎯 High (>95%) | 💾 High | Low-latency applications |
| IVF (Inverted File Index) | ⚡ Fast | 🎯 Medium-High | 💾 Medium | Balanced use cases |
| LSH (Locality Sensitive Hashing) | ⚡ Fast | 🎯 Medium | 💾 Low | Memory-constrained systems |
| FAISS Flat | 🐌 Slow | 🎯 Perfect (100%) | 💾 Low | Small datasets, benchmarking |
| PQ (Product Quantization) | ⚡ Very Fast | 🎯 Medium | 💾 Very Low | Large-scale, compressed storage |

Key Performance Metrics to Benchmark:

  • QPS (Queries Per Second): Throughput under typical load
  • p95/p99 Latency: Tail latency for user-facing applications
  • Recall@k: Percentage of true nearest neighbors retrieved
  • Index Build Time: How long to index your corpus
  • Memory Footprint: RAM required per million vectors

🔧 Try this: Run the ann-benchmarks suite against your candidate databases using a sample of your actual production data. Generic benchmarks often don't reflect your specific use case.
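Recall@k from the metrics list above is easy to measure yourself. Here is a NumPy sketch (all data is synthetic) that takes brute-force search as the ground truth and simulates an approximate index by searching only half the corpus:

```python
import numpy as np

def exact_top_k(vecs, query, k):
    # Brute-force cosine similarity: the 100%-recall "Flat" baseline.
    sims = vecs @ query / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(query))
    return np.argsort(-sims)[:k]

def recall_at_k(true_ids, approx_ids):
    # Fraction of the true nearest neighbors the approximate search returned.
    return len(set(true_ids) & set(approx_ids)) / len(true_ids)

rng = np.random.default_rng(0)
corpus = rng.normal(size=(1000, 64))
query = rng.normal(size=64)

truth = exact_top_k(corpus, query, k=10)

# Simulate an ANN index by searching only a random 50% sample of the corpus.
sample = rng.choice(1000, size=500, replace=False)
approx = sample[exact_top_k(corpus[sample], query, k=10)]

print(f"Recall@10: {recall_at_k(truth, approx):.2f}")
```

Run the same measurement against each candidate database with your own embeddings; the relative numbers matter more than the absolute ones.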

2. Scalability Architecture 📈

SCALABILITY PATTERNS

┌────────────────────────────────────────────┐
│ VERTICAL SCALING (Scale Up)                │
│                                            │
│  💻 → 💻💻 → 💻💻💻                        │
│  Single machine, more resources            │
│  ✓ Simple                                  │
│  ✗ Hardware limits                         │
│  Examples: pgvector, Qdrant (single-node)  │
└────────────────────────────────────────────┘

┌────────────────────────────────────────────┐
│ HORIZONTAL SCALING (Scale Out)             │
│                                            │
│  💻 → 💻💻💻💻💻                           │
│  Multiple machines, distributed            │
│  ✓ Unlimited scale                         │
│  ✗ Complexity, consistency challenges      │
│  Examples: Milvus, Pinecone, Weaviate      │
└────────────────────────────────────────────┘

┌────────────────────────────────────────────┐
│ HYBRID SCALING                             │
│                                            │
│  Scale vertically per shard, then          │
│  horizontally across shards                │
│  Examples: Elasticsearch, MongoDB          │
└────────────────────────────────────────────┘

Sharding Strategies:

  1. Collection-based sharding: Different vector collections on different nodes
  2. Hash-based sharding: Vectors distributed by ID hash
  3. Range-based sharding: Vectors partitioned by metadata ranges
  4. Custom sharding: Application-defined partitioning logic

โš ๏ธ Important: Distributed systems introduce complexity. Only scale horizontally when your dataset truly requires it (typically >10M vectors or >100GB).

3. Filtering and Hybrid Search 🔍

Most real-world applications need more than pure vector similarity. You'll often need to combine:

Vector Search + Metadata Filtering:

Find similar documents:
  WHERE category = "technical"
  AND published_date > "2023-01-01"
  AND language = "en"
  ORDER BY vector_similarity
  LIMIT 10

Hybrid Search Architectures:

| Approach | Description | Best For |
|---|---|---|
| Pre-filtering | Filter first, then vector search on subset | Highly selective filters (<10% of data) |
| Post-filtering | Vector search first, filter results | Loose filters, need exact top-k |
| Combined scoring | Weighted combination of vector + BM25 | Balancing semantic and keyword relevance |
| Two-stage retrieval | Broad vector search → rerank with filters | Complex filtering requirements |

Database Support for Filtering:

  • ✅ Strong: Weaviate, Qdrant, Elasticsearch, MongoDB
  • ⚠️ Limited: Early versions of Pinecone, basic pgvector
  • ✅ Improving: Most vendors rapidly adding advanced filtering

💡 Tip: Test filtering performance with your actual metadata schema. Some databases show significant slowdown with complex filter conditions.
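To make the pre- vs. post-filtering trade-off concrete, here is a minimal NumPy sketch (toy data; the field names are illustrative, not any particular database's API):

```python
import numpy as np

rng = np.random.default_rng(42)
vectors = rng.normal(size=(1000, 8))              # toy document embeddings
categories = rng.choice(["tech", "legal"], size=1000)
all_ids = np.arange(1000)
query = rng.normal(size=8)

def top_k(vecs, ids, q, k):
    # Dot-product similarity, highest first.
    order = np.argsort(-(vecs @ q))[:k]
    return ids[order]

# Pre-filtering: restrict the corpus first, then search only the subset.
mask = categories == "tech"
pre = top_k(vectors[mask], all_ids[mask], query, k=10)

# Post-filtering: search everything, then drop non-matching results.
# May return fewer than k hits if the filter removes candidates.
candidates = top_k(vectors, all_ids, query, k=10)
post = candidates[categories[candidates] == "tech"]

print(len(pre), len(post))  # pre-filtering always fills k; post may not
```

This is why post-filtering struggles with selective filters: most of the top-k candidates get discarded, and the database has to over-fetch to compensate.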

4. Cost Structure and TCO 💰

Total Cost of Ownership includes multiple factors:

Managed Service Pricing (Typical):

| Component | Typical Cost | Billing Model |
|---|---|---|
| Vector Storage | $0.10-0.40/GB/month | Per GB stored |
| Compute (Queries) | $0.05-0.20/1000 queries | Per query or by pod size |
| Index Operations | $0.01-0.05/1000 writes | Per vector inserted/updated |
| Data Transfer | $0.05-0.12/GB | Egress charges |

Self-Hosted Cost Factors:

  • Infrastructure: EC2/GCE instances, persistent storage, networking
  • Engineering Time: Setup, monitoring, upgrades, troubleshooting
  • Backup & DR: Replication, snapshots, disaster recovery infrastructure
  • Scaling Overhead: Load balancers, orchestration, monitoring tools

🧮 Cost Estimation Formula:

Monthly TCO = Infrastructure Costs
            + (Engineer Hours × Hourly Rate)
            + (Downtime Cost × Probability)
            + Licensing Fees (if applicable)
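The formula is trivial to encode; a throwaway helper (all numbers below are made-up placeholders) keeps comparisons honest by forcing you to price the labor:

```python
def monthly_tco(infra, engineer_hours, hourly_rate,
                downtime_cost=0.0, downtime_prob=0.0, licensing=0.0):
    # Direct translation of the TCO formula: infrastructure plus labor,
    # plus expected downtime cost, plus any licensing fees.
    return (infra
            + engineer_hours * hourly_rate
            + downtime_cost * downtime_prob
            + licensing)

# Illustrative comparison: "cheap" self-hosted vs. a managed service.
self_hosted = monthly_tco(infra=100, engineer_hours=15, hourly_rate=80)
managed = monthly_tco(infra=300, engineer_hours=1, hourly_rate=80)
print(self_hosted, managed)  # 1300.0 380.0: labor dominates
```

Even generous placeholder numbers usually show the pattern from the comparison below: the self-hosted infrastructure bill is smaller, but the engineering hours swamp it.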

Real-world Cost Comparison Example (1M vectors, 768 dimensions, 100 QPS):

  • Pinecone: ~$70-200/month (managed, predictable)
  • AWS OpenSearch: ~$150-300/month (includes other features)
  • Self-hosted Qdrant: ~$50-100/month (infrastructure only) + engineering time
  • pgvector on RDS: ~$80-150/month (leverages existing database)

5. Operational Considerations 🛠️

Deployment Models:

DEPLOYMENT COMPLEXITY SPECTRUM

┌─────────────────────────────────────────┐
│ 🟢 MANAGED (Lowest Ops Burden)          │
│                                         │
│ • Pinecone, Weaviate Cloud              │
│ • Zero infrastructure management        │
│ • Automatic scaling & updates           │
│ • Higher cost, less control             │
└─────────────────────────────────────────┘
         ↓
┌─────────────────────────────────────────┐
│ 🟡 SEMI-MANAGED                         │
│                                         │
│ • AWS OpenSearch, MongoDB Atlas         │
│ • Some configuration required           │
│ • Integrated monitoring                 │
│ • Moderate cost & control               │
└─────────────────────────────────────────┘
         ↓
┌─────────────────────────────────────────┐
│ 🟠 CONTAINERIZED                        │
│                                         │
│ • Qdrant, Milvus on Kubernetes          │
│ • You manage orchestration              │
│ • Flexible, portable                    │
│ • Requires k8s expertise                │
└─────────────────────────────────────────┘
         ↓
┌─────────────────────────────────────────┐
│ 🔴 SELF-HOSTED (Highest Ops Burden)     │
│                                         │
│ • DIY on VMs, pgvector on PostgreSQL    │
│ • Complete control & customization      │
│ • Lowest cost (infrastructure only)     │
│ • Highest engineering overhead          │
└─────────────────────────────────────────┘

Monitoring and Observability:

Essential metrics to track:

  • Query latency distribution (p50, p95, p99)
  • Index memory usage and growth rate
  • CPU/GPU utilization during queries and indexing
  • Cache hit rates (if applicable)
  • Replication lag (distributed systems)
  • Failed query rate and error types
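A tiny helper for the latency percentiles in that list, using NumPy (the workload numbers are synthetic); the point is that p95/p99 expose the slow tail that a mean hides:

```python
import numpy as np

def latency_report(samples_ms):
    # p50 summarizes the typical request; p95/p99 summarize the tail.
    p50, p95, p99 = np.percentile(samples_ms, [50, 95, 99])
    return {"p50": p50, "p95": p95, "p99": p99}

# Synthetic workload: mostly fast queries with a small slow tail.
rng = np.random.default_rng(1)
samples = np.concatenate([rng.normal(20, 3, 990), rng.normal(180, 20, 10)])
print(latency_report(samples))
```

Feed the same function your production query logs; if p99 is many multiples of p50, investigate filtering, cold caches, or GC pauses before scaling hardware.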

Backup and Disaster Recovery:

| Database | Backup Method | Recovery Time |
|---|---|---|
| Pinecone | Automated, managed | Minutes (automatic) |
| pgvector | PostgreSQL backups (pg_dump, WAL) | Minutes to hours |
| Weaviate | Snapshots, volume backups | Hours (depends on size) |
| Milvus | Snapshot + object storage | Hours to days |

6. Ecosystem Integration 🔌

Language SDK Support:

Most vector databases provide official SDKs for:

  • Python (universal support, priority for ML workflows)
  • JavaScript/TypeScript (web applications)
  • Go (high-performance services)
  • Java (enterprise applications)

Check for:

  • LangChain integration: Simplifies RAG pipeline development
  • LlamaIndex support: Streamlines indexing and retrieval
  • OpenAI/Anthropic compatibility: Easy embedding integration
  • Hugging Face integration: Access to open-source models

Data Ingestion Patterns:

COMMON INGESTION ARCHITECTURES

┌──────────────────────────────────────────┐
│ BATCH INGESTION                          │
│                                          │
│ Source Data → ETL Pipeline → Vector DB   │
│ (Daily/hourly bulk updates)              │
└──────────────────────────────────────────┘

┌──────────────────────────────────────────┐
│ STREAMING INGESTION                      │
│                                          │
│ Events → Kafka → Processor → Vector DB   │
│ (Real-time updates)                      │
└──────────────────────────────────────────┘

┌──────────────────────────────────────────┐
│ HYBRID                                   │
│                                          │
│ Historical: Batch (nightly)              │
│ Recent: Streaming (real-time)            │
└──────────────────────────────────────────┘
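Whichever architecture you pick, batch ingestion usually reduces to chunking documents into fixed-size upsert calls. A generic sketch, where `upsert` is a stand-in for whatever client method your database provides:

```python
from typing import Iterator, List, Tuple

Vector = Tuple[str, List[float]]  # (id, embedding) pairs

def batched(items: List[Vector], size: int) -> Iterator[List[Vector]]:
    # Yield fixed-size chunks; most vector DB clients cap upsert batch size.
    for i in range(0, len(items), size):
        yield items[i:i + size]

def bulk_ingest(items: List[Vector], upsert, batch_size: int = 100) -> int:
    sent = 0
    for batch in batched(items, batch_size):
        upsert(batch)          # one network round-trip per batch
        sent += len(batch)
    return sent

# Usage with a fake client that just records each call:
calls = []
n = bulk_ingest([(f"id-{i}", [0.0]) for i in range(250)], calls.append)
print(n, len(calls))  # 250 items sent in 3 batches
```

Batch size is a real tuning knob: too small wastes round-trips, too large risks request-size limits and timeouts during backfills.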

Practical Examples: Vector Database Selection in Action 🎯

Example 1: Startup MVP - Document Search for a SaaS Product

Scenario: A startup building a document search feature for a SaaS product needs to choose their first vector database.

Requirements:

  • 100K documents initially, growing to 1M in year 1
  • 50-100 QPS average, 200 QPS peak
  • Budget: <$500/month
  • Team: 2 backend engineers, no ML specialists
  • Need to launch in 6 weeks

Decision Process:

| Option | Pros | Cons | Verdict |
|---|---|---|---|
| Pinecone | Zero ops, fast setup, great docs | Higher cost at scale, vendor lock-in | 🥇 Top choice |
| pgvector | Use existing PostgreSQL, low cost | Performance at 1M vectors, limited features | 🥈 Strong alternative |
| Self-hosted Qdrant | Open source, good performance | Ops overhead, setup time | ❌ Too complex for MVP |

Recommendation: Start with Pinecone for rapid development. The managed service allows the team to focus on product features rather than infrastructure. Cost is within budget (~$150-300/month initially). Plan to re-evaluate at 5M+ vectors if cost becomes prohibitive.

Implementation snippet:

```python
import pinecone
from langchain.vectorstores import Pinecone
from langchain.embeddings import OpenAIEmbeddings

# Initialize Pinecone (pinecone-client v2-style API)
pinecone.init(api_key="your-key", environment="us-west1-gcp")

# Create index
if "docs-search" not in pinecone.list_indexes():
    pinecone.create_index(
        "docs-search",
        dimension=1536,  # OpenAI ada-002
        metric="cosine"
    )

# Simple integration with LangChain
vectorstore = Pinecone.from_documents(
    documents,
    OpenAIEmbeddings(),
    index_name="docs-search"
)
```

Example 2: Enterprise System - Customer Support Knowledge Base

Scenario: A Fortune 500 company wants to build an internal AI assistant for 10,000 support agents.

Requirements:

  • 50M historical support tickets and knowledge articles
  • 1000+ concurrent users during peak hours
  • Multi-region deployment (US, EU, APAC)
  • Strict data residency requirements
  • Enterprise SLA: 99.9% uptime
  • Existing tech stack: AWS, PostgreSQL, Kubernetes

Decision Process:

Key considerations:

  1. Scale: 50M vectors requires horizontal scalability
  2. Compliance: Data residency rules out some managed services
  3. Integration: Must work with existing AWS infrastructure
  4. Support: Need enterprise support contracts

Top candidates:

| Database | Deployment | Fit Score |
|---|---|---|
| AWS OpenSearch | Managed in VPC, multi-region | 9/10 - Best AWS integration |
| Weaviate (self-hosted) | Kubernetes, each region | 8/10 - Great features, more ops |
| Milvus on EKS | Containerized, distributed | 7/10 - Excellent scale, higher complexity |

Recommendation: AWS OpenSearch with k-NN plugin. Reasons:

  • Native AWS integration with existing infrastructure
  • Data residency control via AWS regions
  • Elasticsearch compatibility (team already knows it)
  • Enterprise support through AWS
  • Can combine vector search with full-text and aggregations

Architecture:

┌─────────────────────────────────────────────┐
│           US-EAST-1 (Primary)               │
│                                             │
│  ┌──────────────┐      ┌──────────────┐     │
│  │ OpenSearch   │      │ OpenSearch   │     │
│  │ Master Nodes │◄────►│ Data Nodes   │     │
│  │   (3 nodes)  │      │  (10 nodes)  │     │
│  └──────────────┘      └──────────────┘     │
│                              │              │
└──────────────────────────────┼──────────────┘
                               │
                     Cross-Region Replication
                               │
┌──────────────────────────────┼──────────────┐
│           EU-WEST-1          │              │
│                              ↓              │
│  ┌──────────────┐      ┌──────────────┐     │
│  │ OpenSearch   │      │ OpenSearch   │     │
│  │ Master Nodes │◄────►│ Data Nodes   │     │
│  │   (3 nodes)  │      │   (8 nodes)  │     │
│  └──────────────┘      └──────────────┘     │
└─────────────────────────────────────────────┘

Example 3: E-commerce - Visual Product Search

Scenario: An online retailer wants to add "search by image" functionality for their 5M product catalog.

Requirements:

  • 5M products, each with multiple images (15M total vectors)
  • Image embeddings (512 dimensions from CLIP model)
  • Real-time inventory filtering (only show in-stock products)
  • Mobile app integration (low latency critical)
  • Seasonal traffic spikes (3x during holidays)
  • Need sub-100ms p95 latency

Special Considerations:

  • Frequent metadata updates: Stock status changes constantly
  • High read-to-write ratio: 1000:1
  • Filtering is critical: Must combine similarity with inventory/price/category

Decision Process:

| Database | Key Strength | Weakness |
|---|---|---|
| Qdrant | Excellent filtering, Rust performance | Need to self-host or use Qdrant Cloud |
| Weaviate | Hybrid search, good filtering | Higher resource usage |
| Redis + RediSearch | Ultra-low latency, in-memory | Memory cost, limited scale |

Recommendation: Qdrant (managed Qdrant Cloud). Reasons:

  • Industry-leading filtering performance for metadata combinations
  • Efficient memory usage (important for 15M vectors)
  • Native support for payload indexing (category, price, stock)
  • Can handle metadata updates without full re-indexing
  • Rust implementation delivers consistent low latency

Filtering query example:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, Range

client = QdrantClient(url="https://xyz.qdrant.io", api_key="key")

results = client.search(
    collection_name="products",
    query_vector=image_embedding,  # 512-dim CLIP vector
    query_filter=Filter(
        must=[
            FieldCondition(key="in_stock", match={"value": True}),
            FieldCondition(key="category", match={"value": "shoes"}),
            FieldCondition(key="price", range=Range(gte=50, lte=200)),
        ]
    ),
    limit=20
)
```

Example 4: Migration Strategy - Moving from Pinecone to Self-Hosted

Scenario: A growing company has 20M vectors in Pinecone. Monthly cost has grown to $2000+, and they want to reduce costs by moving to self-hosted Qdrant.

Migration Plan:

📋 Zero-Downtime Migration Steps

| Phase | Action | Duration |
|---|---|---|
| 1. Setup | Deploy Qdrant cluster, configure monitoring | 1 week |
| 2. Backfill | Export from Pinecone, import to Qdrant (parallel with production) | 3-5 days |
| 3. Dual-Write | Write new vectors to both systems, verify consistency | 1 week |
| 4. Shadow Read | Query both systems, compare results and latency | 1 week |
| 5. Canary | Route 5% → 25% → 50% traffic to Qdrant | 1 week |
| 6. Cutover | 100% traffic to Qdrant, maintain Pinecone as backup | 1 day |
| 7. Decommission | After 2 weeks of stable operation, delete Pinecone index | 1 day |
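The dual-write and shadow-read phases can be sketched as a thin wrapper around two stores; `primary` and `secondary` here are stand-ins for real client objects, and the in-memory store exists only to exercise the wrapper:

```python
class DualWriteStore:
    """Wraps two vector stores during a migration: writes go to both,
    reads are served by the primary and optionally compared in shadow mode."""

    def __init__(self, primary, secondary, shadow_read=False):
        self.primary = primary
        self.secondary = secondary
        self.shadow_read = shadow_read
        self.mismatches = 0   # shadow-read disagreements to investigate

    def upsert(self, item):
        self.primary.upsert(item)      # dual-write phase: both systems
        self.secondary.upsert(item)

    def search(self, query, k=10):
        results = self.primary.search(query, k)
        if self.shadow_read:           # shadow-read: compare, never affect users
            if self.secondary.search(query, k) != results:
                self.mismatches += 1
        return results

# Trivial in-memory store used only to demonstrate the wrapper:
class ListStore:
    def __init__(self): self.items = []
    def upsert(self, item): self.items.append(item)
    def search(self, query, k=10): return sorted(self.items)[:k]

store = DualWriteStore(ListStore(), ListStore(), shadow_read=True)
for x in ["b", "a", "c"]:
    store.upsert(x)
print(store.search(None, k=2), store.mismatches)  # ['a', 'b'] 0
```

The key property is that a secondary-store failure or mismatch never changes what users see; it only increments a counter you alert on before the canary phase.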

Cost comparison after migration:

  • Before (Pinecone): $2000/month
  • After (Qdrant on AWS): ~$600/month infrastructure + engineering overhead
  • Net savings: ~$1400/month (~70% reduction)

โš ๏ธ Caution: Factor in the engineering cost of migration (~200 hours) and ongoing maintenance (10-20 hours/month). Migration makes sense at scale, but not always for smaller datasets.

Common Mistakes to Avoid ⚠️

1. Choosing Based Only on Benchmarks

The Mistake: Selecting a database purely because it topped an ANN benchmark without considering your specific use case.

Why It's Wrong: Benchmarks typically test:

  • Pure vector similarity (no filtering)
  • Uniform data distribution
  • Idealized query patterns
  • Synthetic datasets

Your production system likely has:

  • Complex metadata filtering requirements
  • Skewed access patterns (some vectors queried 100x more)
  • Real-world data with quality issues
  • Varying embedding dimensions

Better Approach: Run benchmarks with your actual data and query patterns. Include filters, mixed query types, and realistic concurrency.

2. Underestimating Operational Complexity

The Mistake: Choosing a self-hosted solution to save money without accounting for engineering overhead.

Hidden Costs:

  • Initial setup and configuration (1-2 weeks)
  • Monitoring and alerting setup
  • Backup and disaster recovery procedures
  • Security hardening and updates
  • Scaling operations as data grows
  • Debugging production issues (often at 2 AM)

💡 Rule of Thumb: If you don't have dedicated DevOps/SRE resources, strongly prefer managed services. The cost difference is usually less than one engineer's salary.

3. Ignoring Filtering Performance

The Mistake: Testing only pure vector similarity, then discovering in production that filtered queries are 10x slower.

Real-world Impact:

```python
# This is fast: pure vector similarity
results = db.search(vector=embedding, limit=10)
# Response time: 20ms

# This might be slow depending on the database
results = db.search(
    vector=embedding,
    filter={"user_id": 12345, "category": "tech", "date": ">2023-01-01"},
    limit=10
)
# Response time: 200ms or timeout!
```

Better Approach: Benchmark with representative filter selectivity. Test cases where filters match 0.1%, 1%, 10%, and 50% of your data.
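A sketch of that benchmark idea: sweep filter selectivity on synthetic data and time a filtered brute-force search at each level. Pure NumPy, so the absolute timings are meaningless; the shape of the comparison across selectivity levels is the point:

```python
import time
import numpy as np

rng = np.random.default_rng(7)
vectors = rng.normal(size=(50_000, 32))
score = rng.uniform(size=50_000)   # synthetic metadata field to filter on
query = rng.normal(size=32)

for target in [0.001, 0.01, 0.10, 0.50]:
    mask = score < target          # filter matching ~target fraction of rows
    t0 = time.perf_counter()
    subset = vectors[mask]
    top = np.argsort(-(subset @ query))[:10]
    elapsed = (time.perf_counter() - t0) * 1000
    print(f"selectivity {mask.mean():6.1%}: {len(subset)} rows, {elapsed:.2f} ms")
```

Against a real database, replace the brute-force line with the client's filtered search call and keep the same selectivity sweep; databases that look identical at 50% selectivity often diverge sharply at 0.1%.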

4. Over-Engineering for Day 1

The Mistake: Setting up a distributed, multi-region vector database when you have 100K vectors and 10 QPS.

Unnecessary Complexity:

  • Kubernetes clusters
  • Multi-region replication
  • Complex sharding strategies
  • Custom load balancing

Better Approach: Start simple! A single PostgreSQL instance with pgvector can handle millions of vectors and hundreds of QPS. Scale only when you have clear evidence you need it.

🧠 Memory Device - SCALE Principle:

  • Start simple
  • Cost-optimize later
  • Add complexity when needed
  • Learn from real usage
  • Evolve architecture gradually

5. Neglecting Data Migration Strategy

The Mistake: Picking a database without considering how you'll migrate if needed later.

Migration Challenges:

  • Different embedding formats or normalization
  • Metadata schema incompatibilities
  • Query syntax differences
  • Different ID formats
  • Downtime during transition

Better Approach:

  • Use abstraction layers (LangChain, LlamaIndex) when possible
  • Export your vectors regularly to standard formats
  • Document your embedding process and parameters
  • Design your application with swappable backends
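One way to keep backends swappable, as the last point suggests, is to code against a minimal interface of your own rather than any vendor SDK. A hypothetical shape (not a real library's API), with an in-memory reference backend for tests:

```python
from typing import List, Protocol, Sequence, Tuple

class VectorStore(Protocol):
    # The only operations the application is allowed to depend on.
    def upsert(self, ids: Sequence[str],
               vectors: Sequence[Sequence[float]]) -> None: ...
    def search(self, vector: Sequence[float],
               k: int) -> List[Tuple[str, float]]: ...

class InMemoryStore:
    """Reference backend for tests; swap in a Pinecone/Qdrant adapter later."""
    def __init__(self):
        self.data = {}

    def upsert(self, ids, vectors):
        self.data.update(zip(ids, [list(v) for v in vectors]))

    def search(self, vector, k):
        def dot(a, b):
            return sum(x * y for x, y in zip(a, b))
        scored = [(i, dot(v, vector)) for i, v in self.data.items()]
        return sorted(scored, key=lambda p: -p[1])[:k]

store: VectorStore = InMemoryStore()
store.upsert(["a", "b"], [[1.0, 0.0], [0.0, 1.0]])
print(store.search([1.0, 0.2], k=1))  # [('a', 1.0)]
```

A later migration then means writing one adapter class per backend instead of touching every call site.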

6. Ignoring License and Pricing Changes

The Mistake: Building critical infrastructure on a database without understanding the licensing model or pricing trajectory.

Recent Examples:

  • Managed services changing pricing structures
  • Open-source projects adding restrictions
  • "Free tier" limitations discovered in production

Better Approach:

  • Read the license carefully (Apache 2.0 vs. SSPL vs. proprietary)
  • Understand pricing tiers and overage costs
  • Plan for 10x growth - what will it cost?
  • Have a backup option identified

Key Takeaways 🎯

📋 Vector Database Selection Quick Reference

| Use Case | Recommended Option | Rationale |
|---|---|---|
| MVP / Prototype | Pinecone or pgvector | Fast setup, minimal ops |
| Enterprise / Regulated | AWS OpenSearch or self-hosted | Control, compliance, integration |
| E-commerce / Heavy Filtering | Qdrant or Weaviate | Excellent filter performance |
| Massive Scale (100M+ vectors) | Milvus or Pinecone | Proven at scale |
| Ultra-Low Latency | Redis + RediSearch | In-memory performance |
| Cost-Sensitive | pgvector or self-hosted Qdrant | Lowest infrastructure cost |
| Hybrid Search (Vector + Text) | Elasticsearch or Weaviate | Native hybrid capabilities |

Essential Decision Framework

Ask these questions in order:

  1. Scale: How many vectors? Growth rate?

    • <1M: Single-node solutions (pgvector, small Qdrant)
    • 1M-10M: Vertical scaling or small cluster
    • >10M: Distributed architecture

  2. Performance: What are your latency requirements?

    • <50ms: In-memory (Redis) or HNSW indexes
    • <200ms: Most modern vector databases
    • >200ms: Even basic solutions work

  3. Filtering: How complex are your metadata queries?

    • None: Any vector database works
    • Simple: Most databases adequate
    • Complex: Qdrant, Weaviate, or Elasticsearch

  4. Operations: What's your team's capacity?

    • No ops team: Managed services only
    • DevOps available: Self-hosted is viable
    • SRE team: Any option works

  5. Budget: What can you spend?

    • <$100/month: pgvector or self-hosted
    • $100-1000/month: Managed services
    • >$1000/month: Any solution, optimize TCO
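The checklist above reads naturally as a rule cascade. A toy encoding, where the thresholds and labels are this lesson's rough heuristics rather than authoritative advice:

```python
def suggest_db(n_vectors: int, has_ops_team: bool, monthly_budget: float) -> str:
    # Rough rule cascade mirroring the questions above:
    # scale first, then budget, then operational capacity.
    if n_vectors > 10_000_000:
        return "distributed architecture (Milvus, Pinecone)"
    if monthly_budget < 100:
        return "pgvector or self-hosted Qdrant"
    if not has_ops_team:
        return "managed service (Pinecone, Weaviate Cloud)"
    return "any single-node or managed option; optimize TCO"

print(suggest_db(100_000, has_ops_team=False, monthly_budget=500))
```

Real selections weigh filtering needs and latency targets too, so treat a function like this as a starting shortlist, not a verdict.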

Future-Proofing Considerations

The vector database landscape is evolving rapidly. Design with flexibility:

  • Use abstraction layers (LangChain, LlamaIndex) to minimize coupling
  • Export data regularly in portable formats
  • Monitor costs and performance continuously
  • Re-evaluate annually as new options emerge
  • Stay vendor-neutral in architecture when possible

🤔 Did you know? Many companies run multiple vector databases in production: one for high-QPS simple queries (like autocomplete) and another for complex analytical queries (like similarity analysis). Don't assume you need to pick just one!

Further Study 📚

Deepen your understanding with these resources:

  1. ANN Benchmarks - Compare vector search algorithms with real benchmarks
  2. Vector Database Comparison Guide - Comprehensive feature comparison across databases
  3. LangChain Vector Store Documentation - Integration guides for all major vector databases

Congratulations! 🎉 You now have a structured framework for evaluating and selecting vector databases. Remember: the "best" database depends entirely on your specific requirements. Start simple, measure actual performance with your data, and evolve your architecture as you scale. The most important decision is making a choice and moving forward; you can always migrate later if needed!

Practice your understanding with the free flashcards embedded throughout this lesson, and test your knowledge with the quiz questions below. Master these concepts, and you'll be well-equipped to make confident vector database decisions for your AI search and RAG applications.