
Vector DB Selection

Compare popular vector databases, their features, scaling characteristics, and when to use each solution.

Choosing the right vector database is a critical decision that impacts search performance, scalability, and cost in modern AI applications. Master vector database selection with free flashcards and spaced repetition practice. This lesson covers evaluation criteria for vector databases, performance benchmarking, deployment considerations, and migration strategies: essential concepts for building production-ready RAG (Retrieval-Augmented Generation) systems.

Welcome to Vector Database Selection 💾

The vector database landscape has exploded in recent years. With dozens of options ranging from specialized vector stores like Pinecone and Weaviate to vector extensions for traditional databases like PostgreSQL with pgvector, making the right choice can feel overwhelming. This lesson provides a structured framework for evaluating and selecting the vector database that best fits your specific use case, technical requirements, and organizational constraints.

You'll learn how to assess databases across critical dimensions including query performance, indexing strategies, scalability patterns, cost structures, and operational complexity. By the end of this lesson, you'll have a practical decision-making framework and understand the trade-offs involved in each major vector database option.

Core Concepts: Understanding Vector Database Selection 🔍

The Vector Database Landscape

Vector databases fall into several distinct categories, each with unique architectural approaches:

Specialized Vector Databases are purpose-built for vector operations. Examples include:

  • Pinecone: Fully managed, cloud-native service with automatic scaling
  • Weaviate: Open-source with GraphQL API and hybrid search capabilities
  • Qdrant: Rust-based with filtering support and efficient memory usage
  • Milvus: Distributed architecture designed for massive scale

Vector Extensions for Traditional Databases add vector capabilities to existing systems:

  • PostgreSQL + pgvector: Vector extension for the world's most popular open-source database
  • MongoDB Atlas Vector Search: Native vector search in MongoDB
  • Redis with RediSearch: In-memory vector search with ultra-low latency
  • Elasticsearch with dense_vector: Vector search alongside full-text capabilities

Cloud Provider Solutions integrate with broader cloud ecosystems:

  • AWS OpenSearch with k-NN: Vector search in the AWS ecosystem
  • Azure Cognitive Search: Microsoft's integrated search solution
  • Google Cloud Vertex AI Matching Engine: Scalable vector similarity matching

💡 Tip: Starting with a vector extension for your existing database can minimize operational overhead and accelerate time-to-market, especially for prototypes and MVPs.

Critical Evaluation Dimensions

1. Query Performance and Indexing 🚀

Query performance is determined by the indexing algorithm used for approximate nearest neighbor (ANN) search:

| Algorithm | Speed | Recall | Memory | Best For |
|---|---|---|---|---|
| HNSW (Hierarchical Navigable Small World) | ⚡ Very Fast | 🎯 High (>95%) | 💾 High | Low-latency applications |
| IVF (Inverted File Index) | ⚡ Fast | 🎯 Medium-High | 💾 Medium | Balanced use cases |
| LSH (Locality Sensitive Hashing) | ⚡ Fast | 🎯 Medium | 💾 Low | Memory-constrained systems |
| FAISS Flat | 🐌 Slow | 🎯 Perfect (100%) | 💾 Low | Small datasets, benchmarking |
| PQ (Product Quantization) | ⚡ Very Fast | 🎯 Medium | 💾 Very Low | Large-scale, compressed storage |

Key Performance Metrics to Benchmark:

  • QPS (Queries Per Second): Throughput under typical load
  • p95/p99 Latency: Tail latency for user-facing applications
  • Recall@k: Percentage of true nearest neighbors retrieved
  • Index Build Time: How long to index your corpus
  • Memory Footprint: RAM required per million vectors

🔧 Try this: Run the ann-benchmarks suite against your candidate databases using a sample of your actual production data. Generic benchmarks often don't reflect your specific use case.
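Recall@k from the metrics list above is easy to measure yourself. Here is a NumPy sketch (all data is synthetic) that takes brute-force search as the ground truth and simulates an approximate index by searching only half the corpus:

```python
import numpy as np

def exact_top_k(vecs, query, k):
    # Brute-force cosine similarity: the 100%-recall "Flat" baseline.
    sims = vecs @ query / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(query))
    return np.argsort(-sims)[:k]

def recall_at_k(true_ids, approx_ids):
    # Fraction of the true nearest neighbors the approximate search returned.
    return len(set(true_ids) & set(approx_ids)) / len(true_ids)

rng = np.random.default_rng(0)
corpus = rng.normal(size=(1000, 64))
query = rng.normal(size=64)

truth = exact_top_k(corpus, query, k=10)

# Simulate an ANN index by searching only a random 50% sample of the corpus.
sample = rng.choice(1000, size=500, replace=False)
approx = sample[exact_top_k(corpus[sample], query, k=10)]

print(f"Recall@10: {recall_at_k(truth, approx):.2f}")
```

Run the same measurement against each candidate database with your own embeddings; the relative numbers matter more than the absolute ones.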

2. Scalability Architecture 📈

SCALABILITY PATTERNS

┌────────────────────────────────────────────┐
│ VERTICAL SCALING (Scale Up)                │
│                                            │
│  💻 → 💻💻 → 💻💻💻                        │
│  Single machine, more resources            │
│  ✓ Simple                                  │
│  ✗ Hardware limits                         │
│  Examples: pgvector, Qdrant (single-node)  │
└────────────────────────────────────────────┘

┌────────────────────────────────────────────┐
│ HORIZONTAL SCALING (Scale Out)             │
│                                            │
│  💻 → 💻💻💻💻💻                           │
│  Multiple machines, distributed            │
│  ✓ Unlimited scale                         │
│  ✗ Complexity, consistency challenges      │
│  Examples: Milvus, Pinecone, Weaviate      │
└────────────────────────────────────────────┘

┌────────────────────────────────────────────┐
│ HYBRID SCALING                             │
│                                            │
│  Scale vertically per shard, then          │
│  horizontally across shards                │
│  Examples: Elasticsearch, MongoDB          │
└────────────────────────────────────────────┘

Sharding Strategies:

  1. Collection-based sharding: Different vector collections on different nodes
  2. Hash-based sharding: Vectors distributed by ID hash
  3. Range-based sharding: Vectors partitioned by metadata ranges
  4. Custom sharding: Application-defined partitioning logic

โš ๏ธ Important: Distributed systems introduce complexity. Only scale horizontally when your dataset truly requires it (typically >10M vectors or >100GB).

3. Filtering and Hybrid Search 🔍

Most real-world applications need more than pure vector similarity. You'll often need to combine:

Vector Search + Metadata Filtering:

Find similar documents:
  WHERE category = "technical"
  AND published_date > "2023-01-01"
  AND language = "en"
  ORDER BY vector_similarity
  LIMIT 10

Hybrid Search Architectures:

| Approach | Description | Best For |
|---|---|---|
| Pre-filtering | Filter first, then vector search on subset | Highly selective filters (<10% of data) |
| Post-filtering | Vector search first, filter results | Loose filters, need exact top-k |
| Combined scoring | Weighted combination of vector + BM25 | Balancing semantic and keyword relevance |
| Two-stage retrieval | Broad vector search → rerank with filters | Complex filtering requirements |

Database Support for Filtering:

  • ✅ Strong: Weaviate, Qdrant, Elasticsearch, MongoDB
  • ⚠️ Limited: Early versions of Pinecone, basic pgvector
  • ✅ Improving: Most vendors rapidly adding advanced filtering

💡 Tip: Test filtering performance with your actual metadata schema. Some databases show significant slowdown with complex filter conditions.
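To make the pre- vs. post-filtering trade-off concrete, here is a minimal NumPy sketch (toy data; the field names are illustrative, not any particular database's API):

```python
import numpy as np

rng = np.random.default_rng(42)
vectors = rng.normal(size=(1000, 8))              # toy document embeddings
categories = rng.choice(["tech", "legal"], size=1000)
all_ids = np.arange(1000)
query = rng.normal(size=8)

def top_k(vecs, ids, q, k):
    # Dot-product similarity, highest first.
    order = np.argsort(-(vecs @ q))[:k]
    return ids[order]

# Pre-filtering: restrict the corpus first, then search only the subset.
mask = categories == "tech"
pre = top_k(vectors[mask], all_ids[mask], query, k=10)

# Post-filtering: search everything, then drop non-matching results.
# May return fewer than k hits if the filter removes candidates.
candidates = top_k(vectors, all_ids, query, k=10)
post = candidates[categories[candidates] == "tech"]

print(len(pre), len(post))  # pre-filtering always fills k; post may not
```

This is why post-filtering struggles with selective filters: most of the top-k candidates get discarded, and the database has to over-fetch to compensate.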

4. Cost Structure and TCO 💰

Total Cost of Ownership includes multiple factors:

Managed Service Pricing (Typical):

| Component | Typical Cost | Billing Model |
|---|---|---|
| Vector Storage | $0.10-0.40/GB/month | Per GB stored |
| Compute (Queries) | $0.05-0.20/1000 queries | Per query or by pod size |
| Index Operations | $0.01-0.05/1000 writes | Per vector inserted/updated |
| Data Transfer | $0.05-0.12/GB | Egress charges |

Self-Hosted Cost Factors:

  • Infrastructure: EC2/GCE instances, persistent storage, networking
  • Engineering Time: Setup, monitoring, upgrades, troubleshooting
  • Backup & DR: Replication, snapshots, disaster recovery infrastructure
  • Scaling Overhead: Load balancers, orchestration, monitoring tools

🧮 Cost Estimation Formula:

Monthly TCO = Infrastructure Costs
            + (Engineer Hours × Hourly Rate)
            + (Downtime Cost × Probability)
            + Licensing Fees (if applicable)
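The formula is trivial to encode; a throwaway helper (all numbers below are made-up placeholders) keeps comparisons honest by forcing you to price the labor:

```python
def monthly_tco(infra, engineer_hours, hourly_rate,
                downtime_cost=0.0, downtime_prob=0.0, licensing=0.0):
    # Direct translation of the TCO formula: infrastructure plus labor,
    # plus expected downtime cost, plus any licensing fees.
    return (infra
            + engineer_hours * hourly_rate
            + downtime_cost * downtime_prob
            + licensing)

# Illustrative comparison: "cheap" self-hosted vs. a managed service.
self_hosted = monthly_tco(infra=100, engineer_hours=15, hourly_rate=80)
managed = monthly_tco(infra=300, engineer_hours=1, hourly_rate=80)
print(self_hosted, managed)  # 1300.0 380.0: labor dominates
```

Even generous placeholder numbers usually show the pattern from the comparison below: the self-hosted infrastructure bill is smaller, but the engineering hours swamp it.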

Real-world Cost Comparison Example (1M vectors, 768 dimensions, 100 QPS):

  • Pinecone: ~$70-200/month (managed, predictable)
  • AWS OpenSearch: ~$150-300/month (includes other features)
  • Self-hosted Qdrant: ~$50-100/month (infrastructure only) + engineering time
  • pgvector on RDS: ~$80-150/month (leverages existing database)

5. Operational Considerations 🛠️

Deployment Models:

DEPLOYMENT COMPLEXITY SPECTRUM

┌─────────────────────────────────────────┐
│ 🟢 MANAGED (Lowest Ops Burden)          │
│                                         │
│ • Pinecone, Weaviate Cloud              │
│ • Zero infrastructure management        │
│ • Automatic scaling & updates           │
│ • Higher cost, less control             │
└─────────────────────────────────────────┘
         ↓
┌─────────────────────────────────────────┐
│ 🟡 SEMI-MANAGED                         │
│                                         │
│ • AWS OpenSearch, MongoDB Atlas         │
│ • Some configuration required           │
│ • Integrated monitoring                 │
│ • Moderate cost & control               │
└─────────────────────────────────────────┘
         ↓
┌─────────────────────────────────────────┐
│ 🟠 CONTAINERIZED                        │
│                                         │
│ • Qdrant, Milvus on Kubernetes          │
│ • You manage orchestration              │
│ • Flexible, portable                    │
│ • Requires k8s expertise                │
└─────────────────────────────────────────┘
         ↓
┌─────────────────────────────────────────┐
│ 🔴 SELF-HOSTED (Highest Ops Burden)     │
│                                         │
│ • DIY on VMs, pgvector on PostgreSQL    │
│ • Complete control & customization      │
│ • Lowest cost (infrastructure only)     │
│ • Highest engineering overhead          │
└─────────────────────────────────────────┘

Monitoring and Observability:

Essential metrics to track:

  • Query latency distribution (p50, p95, p99)
  • Index memory usage and growth rate
  • CPU/GPU utilization during queries and indexing
  • Cache hit rates (if applicable)
  • Replication lag (distributed systems)
  • Failed query rate and error types
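A tiny helper for the latency percentiles in that list, using NumPy (the workload numbers are synthetic); the point is that p95/p99 expose the slow tail that a mean hides:

```python
import numpy as np

def latency_report(samples_ms):
    # p50 summarizes the typical request; p95/p99 summarize the tail.
    p50, p95, p99 = np.percentile(samples_ms, [50, 95, 99])
    return {"p50": p50, "p95": p95, "p99": p99}

# Synthetic workload: mostly fast queries with a small slow tail.
rng = np.random.default_rng(1)
samples = np.concatenate([rng.normal(20, 3, 990), rng.normal(180, 20, 10)])
print(latency_report(samples))
```

Feed the same function your production query logs; if p99 is many multiples of p50, investigate filtering, cold caches, or GC pauses before scaling hardware.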

Backup and Disaster Recovery:

| Database | Backup Method | Recovery Time |
|---|---|---|
| Pinecone | Automated, managed | Minutes (automatic) |
| pgvector | PostgreSQL backups (pg_dump, WAL) | Minutes to hours |
| Weaviate | Snapshots, volume backups | Hours (depends on size) |
| Milvus | Snapshot + object storage | Hours to days |

6. Ecosystem Integration 🔌

Language SDK Support:

Most vector databases provide official SDKs for:

  • Python (universal support, priority for ML workflows)
  • JavaScript/TypeScript (web applications)
  • Go (high-performance services)
  • Java (enterprise applications)

Check for:

  • LangChain integration: Simplifies RAG pipeline development
  • LlamaIndex support: Streamlines indexing and retrieval
  • OpenAI/Anthropic compatibility: Easy embedding integration
  • Hugging Face integration: Access to open-source models

Data Ingestion Patterns:

COMMON INGESTION ARCHITECTURES

┌──────────────────────────────────────────┐
│ BATCH INGESTION                          │
│                                          │
│ Source Data → ETL Pipeline → Vector DB   │
│ (Daily/hourly bulk updates)              │
└──────────────────────────────────────────┘

┌──────────────────────────────────────────┐
│ STREAMING INGESTION                      │
│                                          │
│ Events → Kafka → Processor → Vector DB   │
│ (Real-time updates)                      │
└──────────────────────────────────────────┘

┌──────────────────────────────────────────┐
│ HYBRID                                   │
│                                          │
│ Historical: Batch (nightly)              │
│ Recent: Streaming (real-time)            │
└──────────────────────────────────────────┘
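Whichever architecture you pick, batch ingestion usually reduces to chunking documents into fixed-size upsert calls. A generic sketch, where `upsert` is a stand-in for whatever client method your database provides:

```python
from typing import Iterator, List, Tuple

Vector = Tuple[str, List[float]]  # (id, embedding) pairs

def batched(items: List[Vector], size: int) -> Iterator[List[Vector]]:
    # Yield fixed-size chunks; most vector DB clients cap upsert batch size.
    for i in range(0, len(items), size):
        yield items[i:i + size]

def bulk_ingest(items: List[Vector], upsert, batch_size: int = 100) -> int:
    sent = 0
    for batch in batched(items, batch_size):
        upsert(batch)          # one network round-trip per batch
        sent += len(batch)
    return sent

# Usage with a fake client that just records each call:
calls = []
n = bulk_ingest([(f"id-{i}", [0.0]) for i in range(250)], calls.append)
print(n, len(calls))  # 250 items sent in 3 batches
```

Batch size is a real tuning knob: too small wastes round-trips, too large risks request-size limits and timeouts during backfills.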

Practical Examples: Vector Database Selection in Action 🎯

Example 1: Startup MVP - Document Search for a SaaS Product

Scenario: A startup building a document search feature for a SaaS product needs to choose their first vector database.

Requirements:

  • 100K documents initially, growing to 1M in year 1
  • 50-100 QPS average, 200 QPS peak
  • Budget: <$500/month
  • Team: 2 backend engineers, no ML specialists
  • Need to launch in 6 weeks

Decision Process:

| Option | Pros | Cons | Verdict |
|---|---|---|---|
| Pinecone | Zero ops, fast setup, great docs | Higher cost at scale, vendor lock-in | 🥇 Top choice |
| pgvector | Use existing PostgreSQL, low cost | Performance at 1M vectors, limited features | 🥈 Strong alternative |
| Self-hosted Qdrant | Open source, good performance | Ops overhead, setup time | ❌ Too complex for MVP |

Recommendation: Start with Pinecone for rapid development. The managed service allows the team to focus on product features rather than infrastructure. Cost is within budget (~$150-300/month initially). Plan to re-evaluate at 5M+ vectors if cost becomes prohibitive.

Implementation snippet:

```python
import pinecone
from langchain.vectorstores import Pinecone
from langchain.embeddings import OpenAIEmbeddings

# Initialize Pinecone (pinecone-client v2-style API)
pinecone.init(api_key="your-key", environment="us-west1-gcp")

# Create index
if "docs-search" not in pinecone.list_indexes():
    pinecone.create_index(
        "docs-search",
        dimension=1536,  # OpenAI ada-002
        metric="cosine"
    )

# Simple integration with LangChain
vectorstore = Pinecone.from_documents(
    documents,
    OpenAIEmbeddings(),
    index_name="docs-search"
)
```

Example 2: Enterprise System - Customer Support Knowledge Base

Scenario: A Fortune 500 company wants to build an internal AI assistant for 10,000 support agents.

Requirements:

  • 50M historical support tickets and knowledge articles
  • 1000+ concurrent users during peak hours
  • Multi-region deployment (US, EU, APAC)
  • Strict data residency requirements
  • Enterprise SLA: 99.9% uptime
  • Existing tech stack: AWS, PostgreSQL, Kubernetes

Decision Process:

Key considerations:

  1. Scale: 50M vectors requires horizontal scalability
  2. Compliance: Data residency rules out some managed services
  3. Integration: Must work with existing AWS infrastructure
  4. Support: Need enterprise support contracts

Top candidates:

| Database | Deployment | Fit Score |
|---|---|---|
| AWS OpenSearch | Managed in VPC, multi-region | 9/10 - Best AWS integration |
| Weaviate (self-hosted) | Kubernetes, each region | 8/10 - Great features, more ops |
| Milvus on EKS | Containerized, distributed | 7/10 - Excellent scale, higher complexity |

Recommendation: AWS OpenSearch with k-NN plugin. Reasons:

  • Native AWS integration with existing infrastructure
  • Data residency control via AWS regions
  • Elasticsearch compatibility (team already knows it)
  • Enterprise support through AWS
  • Can combine vector search with full-text and aggregations

Architecture:

┌─────────────────────────────────────────────┐
│           US-EAST-1 (Primary)               │
│                                             │
│  ┌──────────────┐      ┌──────────────┐     │
│  │ OpenSearch   │      │ OpenSearch   │     │
│  │ Master Nodes │◄────►│ Data Nodes   │     │
│  │   (3 nodes)  │      │  (10 nodes)  │     │
│  └──────────────┘      └──────────────┘     │
│                              │              │
└──────────────────────────────┼──────────────┘
                               │
                     Cross-Region Replication
                               │
┌──────────────────────────────┼──────────────┐
│           EU-WEST-1          │              │
│                              ↓              │
│  ┌──────────────┐      ┌──────────────┐     │
│  │ OpenSearch   │      │ OpenSearch   │     │
│  │ Master Nodes │◄────►│ Data Nodes   │     │
│  │   (3 nodes)  │      │   (8 nodes)  │     │
│  └──────────────┘      └──────────────┘     │
└─────────────────────────────────────────────┘

Example 3: E-commerce - Visual Product Search

Scenario: An online retailer wants to add "search by image" functionality for their 5M product catalog.

Requirements:

  • 5M products, each with multiple images (15M total vectors)
  • Image embeddings (512 dimensions from CLIP model)
  • Real-time inventory filtering (only show in-stock products)
  • Mobile app integration (low latency critical)
  • Seasonal traffic spikes (3x during holidays)
  • Need sub-100ms p95 latency

Special Considerations:

  • Frequent metadata updates: Stock status changes constantly
  • High read-to-write ratio: 1000:1
  • Filtering is critical: Must combine similarity with inventory/price/category

Decision Process:

| Database | Key Strength | Weakness |
|---|---|---|
| Qdrant | Excellent filtering, Rust performance | Need to self-host or use Qdrant Cloud |
| Weaviate | Hybrid search, good filtering | Higher resource usage |
| Redis + RediSearch | Ultra-low latency, in-memory | Memory cost, limited scale |

Recommendation: Qdrant (managed Qdrant Cloud). Reasons:

  • Industry-leading filtering performance for metadata combinations
  • Efficient memory usage (important for 15M vectors)
  • Native support for payload indexing (category, price, stock)
  • Can handle metadata updates without full re-indexing
  • Rust implementation delivers consistent low latency

Filtering query example:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, Range

client = QdrantClient(url="https://xyz.qdrant.io", api_key="key")

results = client.search(
    collection_name="products",
    query_vector=image_embedding,  # 512-dim CLIP vector
    query_filter=Filter(
        must=[
            FieldCondition(key="in_stock", match={"value": True}),
            FieldCondition(key="category", match={"value": "shoes"}),
            FieldCondition(key="price", range=Range(gte=50, lte=200)),
        ]
    ),
    limit=20
)
```

Example 4: Migration Strategy - Moving from Pinecone to Self-Hosted

Scenario: A growing company has 20M vectors in Pinecone. Monthly cost has grown to $2000+, and they want to reduce costs by moving to self-hosted Qdrant.

Migration Plan:

📋 Zero-Downtime Migration Steps

| Phase | Action | Duration |
|---|---|---|
| 1. Setup | Deploy Qdrant cluster, configure monitoring | 1 week |
| 2. Backfill | Export from Pinecone, import to Qdrant (parallel with production) | 3-5 days |
| 3. Dual-Write | Write new vectors to both systems, verify consistency | 1 week |
| 4. Shadow Read | Query both systems, compare results and latency | 1 week |
| 5. Canary | Route 5% → 25% → 50% traffic to Qdrant | 1 week |
| 6. Cutover | 100% traffic to Qdrant, maintain Pinecone as backup | 1 day |
| 7. Decommission | After 2 weeks of stable operation, delete Pinecone index | 1 day |
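The dual-write and shadow-read phases can be sketched as a thin wrapper around two stores; `primary` and `secondary` here are stand-ins for real client objects, and the in-memory store exists only to exercise the wrapper:

```python
class DualWriteStore:
    """Wraps two vector stores during a migration: writes go to both,
    reads are served by the primary and optionally compared in shadow mode."""

    def __init__(self, primary, secondary, shadow_read=False):
        self.primary = primary
        self.secondary = secondary
        self.shadow_read = shadow_read
        self.mismatches = 0   # shadow-read disagreements to investigate

    def upsert(self, item):
        self.primary.upsert(item)      # dual-write phase: both systems
        self.secondary.upsert(item)

    def search(self, query, k=10):
        results = self.primary.search(query, k)
        if self.shadow_read:           # shadow-read: compare, never affect users
            if self.secondary.search(query, k) != results:
                self.mismatches += 1
        return results

# Trivial in-memory store used only to demonstrate the wrapper:
class ListStore:
    def __init__(self): self.items = []
    def upsert(self, item): self.items.append(item)
    def search(self, query, k=10): return sorted(self.items)[:k]

store = DualWriteStore(ListStore(), ListStore(), shadow_read=True)
for x in ["b", "a", "c"]:
    store.upsert(x)
print(store.search(None, k=2), store.mismatches)  # ['a', 'b'] 0
```

The key property is that a secondary-store failure or mismatch never changes what users see; it only increments a counter you alert on before the canary phase.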

Cost comparison after migration:

  • Before (Pinecone): $2000/month
  • After (Qdrant on AWS): ~$600/month infrastructure + engineering overhead
  • Net savings: ~$1400/month (~70% reduction)

โš ๏ธ Caution: Factor in the engineering cost of migration (~200 hours) and ongoing maintenance (10-20 hours/month). Migration makes sense at scale, but not always for smaller datasets.

Common Mistakes to Avoid ⚠️

1. Choosing Based Only on Benchmarks

The Mistake: Selecting a database purely because it topped an ANN benchmark without considering your specific use case.

Why It's Wrong: Benchmarks typically test:

  • Pure vector similarity (no filtering)
  • Uniform data distribution
  • Idealized query patterns
  • Synthetic datasets

Your production system likely has:

  • Complex metadata filtering requirements
  • Skewed access patterns (some vectors queried 100x more)
  • Real-world data with quality issues
  • Varying embedding dimensions

Better Approach: Run benchmarks with your actual data and query patterns. Include filters, mixed query types, and realistic concurrency.

2. Underestimating Operational Complexity

The Mistake: Choosing a self-hosted solution to save money without accounting for engineering overhead.

Hidden Costs:

  • Initial setup and configuration (1-2 weeks)
  • Monitoring and alerting setup
  • Backup and disaster recovery procedures
  • Security hardening and updates
  • Scaling operations as data grows
  • Debugging production issues (often at 2 AM)

💡 Rule of Thumb: If you don't have dedicated DevOps/SRE resources, strongly prefer managed services. The cost difference is usually less than one engineer's salary.

3. Ignoring Filtering Performance

The Mistake: Testing only pure vector similarity, then discovering in production that filtered queries are 10x slower.

Real-world Impact:

```python
# This is fast: pure vector similarity
results = db.search(vector=embedding, limit=10)
# Response time: 20ms

# This might be slow depending on the database
results = db.search(
    vector=embedding,
    filter={"user_id": 12345, "category": "tech", "date": ">2023-01-01"},
    limit=10
)
# Response time: 200ms or timeout!
```

Better Approach: Benchmark with representative filter selectivity. Test cases where filters match 0.1%, 1%, 10%, and 50% of your data.
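A sketch of that benchmark idea: sweep filter selectivity on synthetic data and time a filtered brute-force search at each level. Pure NumPy, so the absolute timings are meaningless; the shape of the comparison across selectivity levels is the point:

```python
import time
import numpy as np

rng = np.random.default_rng(7)
vectors = rng.normal(size=(50_000, 32))
score = rng.uniform(size=50_000)   # synthetic metadata field to filter on
query = rng.normal(size=32)

for target in [0.001, 0.01, 0.10, 0.50]:
    mask = score < target          # filter matching ~target fraction of rows
    t0 = time.perf_counter()
    subset = vectors[mask]
    top = np.argsort(-(subset @ query))[:10]
    elapsed = (time.perf_counter() - t0) * 1000
    print(f"selectivity {mask.mean():6.1%}: {len(subset)} rows, {elapsed:.2f} ms")
```

Against a real database, replace the brute-force line with the client's filtered search call and keep the same selectivity sweep; databases that look identical at 50% selectivity often diverge sharply at 0.1%.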

4. Over-Engineering for Day 1

The Mistake: Setting up a distributed, multi-region vector database when you have 100K vectors and 10 QPS.

Unnecessary Complexity:

  • Kubernetes clusters
  • Multi-region replication
  • Complex sharding strategies
  • Custom load balancing

Better Approach: Start simple! A single PostgreSQL instance with pgvector can handle millions of vectors and hundreds of QPS. Scale only when you have clear evidence you need it.

🧠 Memory Device - SCALE Principle:

  • Start simple
  • Cost-optimize later
  • Add complexity when needed
  • Learn from real usage
  • Evolve architecture gradually

5. Neglecting Data Migration Strategy

The Mistake: Picking a database without considering how you'll migrate if needed later.

Migration Challenges:

  • Different embedding formats or normalization
  • Metadata schema incompatibilities
  • Query syntax differences
  • Different ID formats
  • Downtime during transition

Better Approach:

  • Use abstraction layers (LangChain, LlamaIndex) when possible
  • Export your vectors regularly to standard formats
  • Document your embedding process and parameters
  • Design your application with swappable backends
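One way to keep backends swappable, as the last point suggests, is to code against a minimal interface of your own rather than any vendor SDK. A hypothetical shape (not a real library's API), with an in-memory reference backend for tests:

```python
from typing import List, Protocol, Sequence, Tuple

class VectorStore(Protocol):
    # The only operations the application is allowed to depend on.
    def upsert(self, ids: Sequence[str],
               vectors: Sequence[Sequence[float]]) -> None: ...
    def search(self, vector: Sequence[float],
               k: int) -> List[Tuple[str, float]]: ...

class InMemoryStore:
    """Reference backend for tests; swap in a Pinecone/Qdrant adapter later."""
    def __init__(self):
        self.data = {}

    def upsert(self, ids, vectors):
        self.data.update(zip(ids, [list(v) for v in vectors]))

    def search(self, vector, k):
        def dot(a, b):
            return sum(x * y for x, y in zip(a, b))
        scored = [(i, dot(v, vector)) for i, v in self.data.items()]
        return sorted(scored, key=lambda p: -p[1])[:k]

store: VectorStore = InMemoryStore()
store.upsert(["a", "b"], [[1.0, 0.0], [0.0, 1.0]])
print(store.search([1.0, 0.2], k=1))  # [('a', 1.0)]
```

A later migration then means writing one adapter class per backend instead of touching every call site.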

6. Ignoring License and Pricing Changes

The Mistake: Building critical infrastructure on a database without understanding the licensing model or pricing trajectory.

Recent Examples:

  • Managed services changing pricing structures
  • Open-source projects adding restrictions
  • "Free tier" limitations discovered in production

Better Approach:

  • Read the license carefully (Apache 2.0 vs. SSPL vs. proprietary)
  • Understand pricing tiers and overage costs
  • Plan for 10x growth - what will it cost?
  • Have a backup option identified

Key Takeaways 🎯

📋 Vector Database Selection Quick Reference

| Use Case | Recommended Option | Rationale |
|---|---|---|
| MVP / Prototype | Pinecone or pgvector | Fast setup, minimal ops |
| Enterprise / Regulated | AWS OpenSearch or self-hosted | Control, compliance, integration |
| E-commerce / Heavy Filtering | Qdrant or Weaviate | Excellent filter performance |
| Massive Scale (100M+ vectors) | Milvus or Pinecone | Proven at scale |
| Ultra-Low Latency | Redis + RediSearch | In-memory performance |
| Cost-Sensitive | pgvector or self-hosted Qdrant | Lowest infrastructure cost |
| Hybrid Search (Vector + Text) | Elasticsearch or Weaviate | Native hybrid capabilities |

Essential Decision Framework

Ask these questions in order:

  1. Scale: How many vectors? Growth rate?

    • <1M: Single-node solutions (pgvector, small Qdrant)
    • 1M-10M: Vertical scaling or small cluster
    • >10M: Distributed architecture

  2. Performance: What are your latency requirements?

    • <50ms: In-memory (Redis) or HNSW indexes
    • <200ms: Most modern vector databases
    • >200ms: Even basic solutions work

  3. Filtering: How complex are your metadata queries?

    • None: Any vector database works
    • Simple: Most databases adequate
    • Complex: Qdrant, Weaviate, or Elasticsearch

  4. Operations: What's your team's capacity?

    • No ops team: Managed services only
    • DevOps available: Self-hosted is viable
    • SRE team: Any option works

  5. Budget: What can you spend?

    • <$100/month: pgvector or self-hosted
    • $100-1000/month: Managed services
    • >$1000/month: Any solution, optimize TCO
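The checklist above reads naturally as a rule cascade. A toy encoding, where the thresholds and labels are this lesson's rough heuristics rather than authoritative advice:

```python
def suggest_db(n_vectors: int, has_ops_team: bool, monthly_budget: float) -> str:
    # Rough rule cascade mirroring the questions above:
    # scale first, then budget, then operational capacity.
    if n_vectors > 10_000_000:
        return "distributed architecture (Milvus, Pinecone)"
    if monthly_budget < 100:
        return "pgvector or self-hosted Qdrant"
    if not has_ops_team:
        return "managed service (Pinecone, Weaviate Cloud)"
    return "any single-node or managed option; optimize TCO"

print(suggest_db(100_000, has_ops_team=False, monthly_budget=500))
```

Real selections weigh filtering needs and latency targets too, so treat a function like this as a starting shortlist, not a verdict.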

Future-Proofing Considerations

The vector database landscape is evolving rapidly. Design with flexibility:

  • Use abstraction layers (LangChain, LlamaIndex) to minimize coupling
  • Export data regularly in portable formats
  • Monitor costs and performance continuously
  • Re-evaluate annually as new options emerge
  • Stay vendor-neutral in architecture when possible

🤔 Did you know? Many companies run multiple vector databases in production: one for high-QPS simple queries (like autocomplete) and another for complex analytical queries (like similarity analysis). Don't assume you need to pick just one!

Further Study 📚

Deepen your understanding with these resources:

  1. ANN Benchmarks - Compare vector search algorithms with real benchmarks
  2. Vector Database Comparison Guide - Comprehensive feature comparison across databases
  3. LangChain Vector Store Documentation - Integration guides for all major vector databases

Congratulations! 🎉 You now have a structured framework for evaluating and selecting vector databases. Remember: the "best" database depends entirely on your specific requirements. Start simple, measure actual performance with your data, and evolve your architecture as you scale. The most important decision is making a choice and moving forward; you can always migrate later if needed!

Practice your understanding with the free flashcards embedded throughout this lesson, and test your knowledge with the quiz questions below. Master these concepts, and you'll be well-equipped to make confident vector database decisions for your AI search and RAG applications.