Graph RAG with Neo4j

Graph RAG with Neo4j combines the semantic search capabilities of traditional RAG (Retrieval-Augmented Generation) with the structured relationship mapping of a knowledge graph. This approach allows Large Language Models (LLMs) to answer complex questions that require understanding connections between data points, such as hierarchies, sequences, or multi-hop dependencies.

Core Components

  • Neo4j Database: Acts as the storage layer for both unstructured text (as vector embeddings) and structured knowledge (as nodes and relationships).
  • Neo4j GraphRAG Python Package: The official first-party library for building these applications, replacing the deprecated neo4j-genai. It also supports knowledge graph creation through a pipeline.
  • Vector Index: Indexes the embeddings of text chunks so that the initial similarity searches are fast.
  • Orchestration: Often managed via tools like LangChain or LlamaIndex to handle the flow between user queries, graph retrieval, and LLM response generation.

Retrieval Strategies

Neo4j supports several retrieval patterns:

  • Vector Search: Standard semantic search that finds relevant document chunks based on embedding similarity.
  • Vector-Cypher Retrieval: Uses a vector search to find initial "anchor" nodes, then executes a Cypher query to traverse the graph and retrieve related context.
  • Hybrid Search: Combines vector search with full-text (lexical, optionally fuzzy) search to improve retrieval accuracy for specific names or terms.
  • Graph-Specific Algorithms: Uses the Neo4j Graph Data Science (GDS) library for patterns such as community detection or PageRank to identify global trends in the data.


Why Graph RAG? Limitations of Vector-Only Retrieval

Imagine you're building a RAG-powered assistant for a software company. A support engineer asks: "Who manages the team responsible for the billing service, and what other services does that team own?" You've embedded thousands of documentation pages, team wikis, and architecture diagrams into a vector store. The query goes in, the top-k chunks come back — but the answer doesn't. The chunks mention the billing service in isolation. The team assignment lives in a different document. The management chain is scattered across an org chart nobody embedded correctly. The model hallucinates a plausible-sounding answer, and the support engineer files the wrong escalation ticket.

This is the failure mode that Graph RAG is designed to fix. To understand why it works, you first have to understand precisely where vector-only retrieval breaks down — not at the level of vague intuition, but at the level of the data structures involved.

How Vector Search Works — and What It Cannot Do

Vector search converts text into high-dimensional numerical embeddings, where semantic similarity is approximated by geometric proximity. When a user submits a query, the system embeds that query and searches a vector index for the nearest chunk embeddings under a similarity measure such as cosine similarity. The top-k results are assembled as context and handed to the LLM.
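As a minimal sketch of that mechanism, the plain-Python snippet below performs an exact, brute-force top-k search by cosine similarity. It is an illustration only: production vector indexes use approximate nearest-neighbour structures instead of scoring every vector, and real embeddings have hundreds or thousands of dimensions. The 3-dimensional vectors are toy values borrowed from the flat-index illustration in this section.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], index: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Score every chunk against the query and keep the k most similar."""
    scored = [(chunk_id, cosine_similarity(query, emb)) for chunk_id, emb in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "flat index": chunk id -> embedding. No edges, no structure.
index = {
    "chunk_001": [0.21, -0.44, 0.87],
    "chunk_002": [0.19, -0.41, 0.90],
    "chunk_003": [-0.03, 0.72, 0.11],
    "chunk_004": [0.20, -0.43, 0.88],
}
query_embedding = [0.20, -0.42, 0.89]  # pretend output of the same embedding model

results = top_k(query_embedding, index, k=3)
for chunk_id, score in results:
    print(f"{chunk_id}: {score:.3f}")
```

Nothing in the result says how one chunk relates to another; the index can only rank by similarity.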

This works well for a specific class of question: "Find me content that talks about the same thing as this query." Questions like "What is the refund policy?" or "How do I reset a password?" are essentially proximity questions — the answer lives in a chunk that semantically resembles the question.

But notice what this mechanism has no concept of: relationships between entities. The vector index is a flat list of embeddings. Each chunk is an island. There is no edge, no pointer, no connection from one chunk to another that encodes meaning like "this service DEPENDS_ON that service" or "this person MANAGES that team."

Vector Index (flat structure):

  [chunk_001]  →  embedding: [0.21, -0.44, 0.87, ...]
  [chunk_002]  →  embedding: [0.19, -0.41, 0.90, ...]
  [chunk_003]  →  embedding: [-0.03, 0.72, 0.11, ...]
  [chunk_004]  →  embedding: [0.20, -0.43, 0.88, ...]

  ↑ chunk_001 and chunk_002 are "close" in embedding space.
    But chunk_001 mentions "Alice" and chunk_003 mentions
    "Alice's team" — no structure connects them.

The Multi-Hop Problem

Multi-hop questions require following a chain of relationships to reach an answer. The opening example — "Who manages the team that owns the billing service?" — requires at least two hops:

Billing Service  →  OWNED_BY  →  Payments Team  →  MANAGED_BY  →  Alice Chen

Vector search has no traversal mechanism. It can retrieve a chunk mentioning the billing service, and separately retrieve a chunk mentioning the Payments Team, and separately retrieve a chunk mentioning Alice Chen. But assembling those facts into a coherent answer requires knowing that they are connected — and vector proximity does not encode connection. The LLM is left to infer relationships from co-occurrence patterns in the retrieved text. Accuracy degrades predictably as the number of required hops increases.

🎯 Key Principle: Vector search answers "what is similar to this?" Graph traversal answers "what is connected to this, and how?" These are fundamentally different questions, and conflating them is the root cause of most vector-only retrieval failures on relational data.

Structural Context Is Destroyed by Flat Chunking

The problem runs deeper than multi-hop queries. When documents are ingested into a vector store, they are split into chunks and each chunk is embedded independently. This process is destructive: it flattens structural context that was present in the original document.

Three kinds of structure that chunking erases:

1. Parent-child hierarchies. A technical specification might have a top-level section on "Authentication," subsections on "OAuth 2.0" and "API Keys," and sub-subsections on token expiry. When chunked, the subsections become independent embeddings. A query about token expiry retrieves the leaf chunk — but the context that this applies specifically to OAuth 2.0 within the Authentication system is gone.

2. Sequential dependencies. Instructional or procedural content often has meaning that is only correct in sequence. Step 3 of a deployment procedure might say "run the migration script" — but that instruction is only safe after Step 2's preconditions are met. Chunked independently, Step 3 looks like standalone advice.

3. Entity co-references. A document might refer to "the platform," "the system," "it," and "our service" — all meaning the same product. Two chunks about the same entity may embed to different regions of vector space because they use different surface forms.

Original document structure (structural context intact):

  Authentication System
  ├── OAuth 2.0
  │   ├── Token expiry: 3600 seconds
  │   └── Refresh token policy: ...
  └── API Keys
      └── Rotation policy: ...

After flat chunking (structural context destroyed):

  chunk_A: "Token expiry: 3600 seconds"
  chunk_B: "Refresh token policy..."
  chunk_C: "API Keys rotation policy..."
  chunk_D: "OAuth 2.0 is used for..."

  ← No chunk knows it belongs to "OAuth 2.0"
  ← No chunk knows "OAuth 2.0" belongs to "Authentication System"

⚠️ Common Mistake: Assuming that better chunking strategies alone solve this problem. Semantic chunking, sliding-window chunking, and recursive chunking all improve retrieval quality at the margins, but they still produce a flat list of independent embeddings. Hierarchical structure, cross-document entity linking, and typed relationships cannot be recovered from a flat vector index regardless of how chunks are sized.
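To make the contrast concrete, here is a small plain-Python sketch (all names hypothetical) that walks the same outline and, for every section, records both a chunk and an explicit PART_OF edge to its parent section. Because the parent link is stored as data rather than implied by position, the token-expiry leaf can still be traced back to OAuth 2.0 and the Authentication System. Section titles double as IDs here, which assumes they are unique.

```python
from typing import Optional


def outline_to_graph(outline: dict, parent: Optional[str] = None):
    """Turn a nested outline into chunk records plus (child, PART_OF, parent) edges.

    Keys are section titles; a value is either a dict of subsections or the
    leaf text of that section.
    """
    chunks, edges = [], []
    for title, body in outline.items():
        chunks.append({"id": title, "text": body if isinstance(body, str) else title})
        if parent is not None:
            edges.append((title, "PART_OF", parent))
        if isinstance(body, dict):
            sub_chunks, sub_edges = outline_to_graph(body, parent=title)
            chunks.extend(sub_chunks)
            edges.extend(sub_edges)
    return chunks, edges


def ancestry(chunk_id: str, edges: list) -> list[str]:
    """Follow PART_OF edges upward to recover the section path of a chunk."""
    parent_of = {child: parent for child, rel, parent in edges if rel == "PART_OF"}
    path = [chunk_id]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path


outline = {
    "Authentication System": {
        "OAuth 2.0": {
            "Token expiry": "Token expiry: 3600 seconds",
            "Refresh token policy": "Refresh token policy: ...",
        },
        "API Keys": {"Rotation policy": "Rotation policy: ..."},
    }
}
chunks, edges = outline_to_graph(outline)
print(ancestry("Token expiry", edges))
```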

Knowledge Graphs: A Different Data Model

A knowledge graph represents information as a network of nodes (entities) and edges (relationships between entities). In a typed knowledge graph:

  • Nodes represent entities: a person, a product, a team, a document chunk, a concept.
  • Edges represent relationships with explicit types and direction: (Alice)-[:MANAGES]->(Payments Team), (Payments Team)-[:OWNS]->(Billing Service).
  • Properties on both nodes and edges store attributes: names, dates, text excerpts, confidence scores.

Knowledge graph (structural context preserved):

  (Alice Chen)
      │
      │ MANAGES
      ▼
  (Payments Team)
      │
      │ OWNS              │ OWNS
      ▼                   ▼
  (Billing Service)  (Settlement Service)
      │
      │ DEPENDS_ON
      ▼
  (Auth Service)
      │
      │ DOCUMENTED_IN
      ▼
  (chunk_042: "The auth service uses JWT tokens...")

With this structure, "Who manages the team that owns the billing service?" becomes a deterministic traversal: start at Billing Service, follow the incoming OWNS relationship back to Payments Team, then follow the incoming MANAGES relationship back to Alice Chen. No guessing. No co-occurrence pattern matching. The relationships are explicit, typed, and traversable rather than implicit, untyped, and inferred.
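The traversal is simple enough to sketch without a database. Below, the graph from the illustration is held as an in-memory list of (source, TYPE, target) triples, a stand-in for Neo4j's storage, and each hop is an explicit lookup. The function also answers the second half of the opening question (what else the team owns). A Cypher version is included as a string for comparison; its node labels (Person, Team, Service) are assumptions, since the illustration does not name labels.

```python
# Typed, directed edges from the knowledge-graph illustration above.
EDGES = [
    ("Alice Chen", "MANAGES", "Payments Team"),
    ("Payments Team", "OWNS", "Billing Service"),
    ("Payments Team", "OWNS", "Settlement Service"),
    ("Billing Service", "DEPENDS_ON", "Auth Service"),
]


def incoming(node: str, rel_type: str) -> list[str]:
    """Sources of `rel_type` edges that point at `node`."""
    return [src for src, rel, dst in EDGES if rel == rel_type and dst == node]


def outgoing(node: str, rel_type: str) -> list[str]:
    """Targets of `rel_type` edges that start at `node`."""
    return [dst for src, rel, dst in EDGES if rel == rel_type and src == node]


def who_manages_owner_of(service: str) -> list[dict]:
    """Answer both halves of the opening question by following typed edges."""
    answers = []
    for team in incoming(service, "OWNS"):  # hop 1: service <-OWNS- team
        for manager in incoming(team, "MANAGES"):  # hop 2: team <-MANAGES- person
            other = [s for s in outgoing(team, "OWNS") if s != service]
            answers.append({"manager": manager, "team": team, "other_services": other})
    return answers


print(who_manages_owner_of("Billing Service"))

# Roughly the same traversal in Cypher (labels are assumed, not given above).
CYPHER = """
MATCH (m:Person)-[:MANAGES]->(t:Team)-[:OWNS]->(s:Service {name: $service})
OPTIONAL MATCH (t)-[:OWNS]->(other:Service)
WHERE other <> s
RETURN m.name AS manager, t.name AS team, collect(other.name) AS other_services
"""
```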

💡 Mental Model: Think of a vector index as a library where every book has been torn into pages and sorted by topic similarity. You can find pages that discuss a topic. But you can't follow a citation from one page to the page it references, because the citation structure has been destroyed. A knowledge graph is the library with the citation network intact.

Graph RAG: Combining Both Retrieval Modes

Graph RAG is not a replacement for vector search — it is an augmentation of it. The architecture combines a vector index for initial semantic retrieval with graph traversal for contextual expansion, operating in two stages:

Stage 1 — Anchor retrieval: The user query is embedded and used to find the top-k most semantically similar nodes. These anchor nodes are the entry points for graph traversal.

Stage 2 — Graph expansion: From each anchor node, a traversal query follows typed edges to collect related nodes — parent documents, co-mentioned entities, dependent services, authoring teams.

Graph RAG retrieval flow:

  User query: "Who manages the team that owns the billing service?"
        │
        ▼
  [Vector Index]
  Embed query → find similar chunks
  → anchor node: chunk_019 (mentions "billing service")
        │
        ▼
  [Graph Traversal]
  chunk_019 → DESCRIBES → (Billing Service)
                               │
                          OWNED_BY
                               ▼
                          (Payments Team)
                               │
                          MANAGED_BY
                               ▼
                          (Alice Chen)
        │
        ▼
  Assembled context: chunk_019 + team node + person node
        │
        ▼
  [LLM]
  Generate grounded answer: "Alice Chen manages the Payments Team,
  which owns the Billing Service."

This two-stage design is deliberate: vector search is fast and fuzzy-tolerant, making it a practical entry point even for large corpora. Graph traversal is precise but requires a starting point — it expands from anchors rather than searching the whole graph. Together, they cover the retrieval surface that neither covers alone.
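The two stages can be sketched end to end in plain Python, with toy 3-dimensional vectors and an in-memory edge list standing in for Neo4j's vector index and graph store (all data here is hypothetical). Stage 1 picks anchors by cosine similarity; stage 2 expands from each anchor along an explicit allow-list of relationship types, bounded by a hop limit, and turns each edge it crosses into a context line for the LLM.

```python
import math

# Toy corpus (hypothetical): chunk id -> (text, embedding).
CHUNKS = {
    "chunk_019": ("The billing service issues monthly invoices.", [0.9, 0.1, 0.0]),
    "chunk_042": ("The auth service uses JWT tokens.", [0.1, 0.9, 0.1]),
}
# Typed edges linking chunks to entities and entities to each other.
EDGES = [
    ("chunk_019", "DESCRIBES", "Billing Service"),
    ("chunk_042", "DESCRIBES", "Auth Service"),
    ("Payments Team", "OWNS", "Billing Service"),
    ("Alice Chen", "MANAGES", "Payments Team"),
]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def find_anchors(query_vec, k=1):
    """Stage 1: rank chunks by similarity and keep the top k as anchors."""
    ranked = sorted(CHUNKS, key=lambda cid: cosine(query_vec, CHUNKS[cid][1]), reverse=True)
    return ranked[:k]


def expand(anchor, allowed_types, max_hops=3):
    """Stage 2: breadth-first walk over allowed edge types (either direction)."""
    seen, frontier, facts = {anchor}, [anchor], []
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for src, rel, dst in EDGES:
                if rel not in allowed_types or node not in (src, dst):
                    continue
                neighbour = dst if src == node else src
                if neighbour not in seen:
                    seen.add(neighbour)
                    next_frontier.append(neighbour)
                    facts.append(f"({src})-[:{rel}]->({dst})")
        frontier = next_frontier
    return facts


# Pretend embedding of "Who manages the team that owns the billing service?"
query_vec = [0.8, 0.2, 0.0]
context = []
for anchor in find_anchors(query_vec):
    context.append(CHUNKS[anchor][0])
    context.extend(expand(anchor, {"DESCRIBES", "OWNS", "MANAGES"}))
print("\n".join(context))
```

The hop limit and the relationship allow-list are what keep expansion precise: the walk reaches Alice Chen through typed edges, while the unrelated auth chunk never enters the context.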

📋 Quick Reference: Vector-Only vs. Graph RAG

| Aspect | 🔍 Vector-Only RAG | 🕸️ Graph RAG |
|---|---|---|
| Best for | Self-contained factual lookups | Relational, multi-hop questions |
| Relationship handling | Implicit (co-occurrence) | Explicit (typed edges) |
| Structural context | Lost in chunking | Preserved as graph structure |
| Multi-hop queries | Unreliable | Deterministic traversal |
| Complexity | Lower | Higher (graph maintenance required) |
| Entity co-reference | Not resolved | Resolved via shared nodes |

With the problem clearly framed, we can move into how Neo4j is architected to serve both roles — vector store and graph database — in a single system.


Core Architecture: Neo4j as a Dual-Purpose Store

Traditional RAG architectures often pair two separate systems: a vector store for embedding-based retrieval and a relational or document database for structured facts. Graph RAG with Neo4j collapses these into a single database layer, allowing the same node that participates in a graph traversal to also serve as an anchor in a vector similarity search.

One Database, Two Access Modes

Neo4j is a native graph database that stores data as a network of nodes and relationships. Nodes carry arbitrary properties — simple key-value pairs — and this flexibility is what enables the dual-purpose design.

Consider a text chunk extracted from a product documentation page. In a conventional RAG system, that chunk lives as a row in a vector store with an ID, text, and an embedding vector. In Neo4j, that same chunk becomes a Chunk node whose properties include text, source_url, created_at, and — critically — embedding. The embedding is just another property, stored alongside every other attribute.

Chunk Node (id: chunk_042)
├── text: "The GraphRAG Python package provides retriever abstractions..."
├── source_url: "docs.example.com/graphrag/overview"
├── created_at: "2025-11-01"
└── embedding: [0.021, -0.113, 0.047, ... ] ← 1536-dimensional float array

This means there is no architectural seam between "the vector store" and "the graph." A query planner can ask Neo4j to find nodes by embedding proximity, then immediately follow their relationships to neighboring nodes — all within a single query execution context.
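As a sketch of what a single query execution context can look like, the Cypher below calls Neo4j's db.index.vector.queryNodes procedure to fetch anchor chunks and then, in the same statement, follows each anchor's PART_OF relationship to its document. It assumes a vector index already exists over Chunk.embedding and that chunks link to Document nodes carrying a title property; the wrapper function is hypothetical and assumes the official neo4j Python driver (5.x), whose execute_query accepts query parameters as keyword arguments.

```python
# One Cypher statement: vector lookup, then graph traversal from the results.
# Chunk, Document, PART_OF and `title` are assumed schema names.
ANCHOR_AND_EXPAND = """
CALL db.index.vector.queryNodes($index_name, $k, $query_vector)
YIELD node AS chunk, score
MATCH (chunk)-[:PART_OF]->(doc:Document)
RETURN chunk.text AS text, doc.title AS document, score
ORDER BY score DESC
"""


def anchor_and_expand(driver, query_vector: list[float], k: int = 5,
                      index_name: str = "chunk_embeddings") -> list[dict]:
    """Run the combined query with a neo4j driver and return plain dicts."""
    records, _summary, _keys = driver.execute_query(
        ANCHOR_AND_EXPAND,
        index_name=index_name,
        k=k,
        query_vector=query_vector,
    )
    return [record.data() for record in records]
```

The query vector must come from the same embedding model that produced the stored chunk embeddings.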

The Vector Index: Finding Anchor Nodes

Storing an embedding on a node is necessary but not sufficient for fast retrieval. Without an index, finding nearest neighbors would require a full scan of every node. Neo4j addresses this with a vector index — a secondary structure built on approximate nearest-neighbor (ANN) algorithms — that operates over a named node property.

When you create a vector index, you specify four things:

  • The node label to index (e.g., Chunk)
  • The property name holding the embedding (e.g., embedding)
  • The vector dimension matching the output size of your embedding model
  • The similarity metric — most commonly cosine similarity
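Those four settings map directly onto Neo4j's CREATE VECTOR INDEX statement. The helper below is a sketch that assembles it using the syntax of recent Neo4j 5.x releases. Because index names, labels, and property names cannot be supplied as Cypher parameters, it validates them before interpolating. The function name is hypothetical, and 1536 in the example is only illustrative (it is the output size of, for example, OpenAI's text-embedding-3-small).

```python
import re

# Conservative identifier check: letters, digits and underscores only.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def vector_index_cypher(name: str, label: str, prop: str,
                        dimensions: int, similarity: str = "cosine") -> str:
    """Build a CREATE VECTOR INDEX statement from the four settings."""
    for ident in (name, label, prop):
        if not _IDENTIFIER.fullmatch(ident):
            raise ValueError(f"unsafe identifier: {ident!r}")
    if dimensions <= 0:
        raise ValueError("dimensions must match the embedding model's output size")
    if similarity not in ("cosine", "euclidean"):
        raise ValueError("Neo4j vector indexes support 'cosine' or 'euclidean'")
    return (
        f"CREATE VECTOR INDEX {name} IF NOT EXISTS "
        f"FOR (n:{label}) ON (n.{prop}) "
        "OPTIONS {indexConfig: {"
        f"`vector.dimensions`: {dimensions}, "
        f"`vector.similarity_function`: '{similarity}'"
        "}}"
    )


print(vector_index_cypher("chunk_embeddings", "Chunk", "embedding", 1536))
```

A dimension mismatch between the index and the embedding model is a common silent failure, which is why the dimension is passed explicitly rather than defaulted.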

Given a query vector, the index returns the top-k nodes whose embeddings are most similar to that query. These become the anchor nodes for the retrieval step. The ANN qualifier matters: approximate algorithms trade a small probability of missing the true nearest neighbor for dramatically faster query times. For RAG applications this trade-off is almost always acceptable, because downstream graph traversal expands context beyond the anchor nodes anyway.

Relationships: The Knowledge Graph Layer

If vector indexes make Neo4j a capable vector store, relationships are what make it a knowledge graph. Relationships in Neo4j are first-class objects with a type, a direction, and their own properties. During ingestion, the pipeline does more than embed text chunks — it also extracts entities and connections from content and represents them as typed nodes and edges.

Some common relationship types in document-oriented Graph RAG pipelines:

| Relationship Type | Connects | Encodes |
|---|---|---|
| MENTIONS | Chunk → Entity | "This chunk references this entity" |
| PART_OF | Chunk → Document | Parent-child document structure |
| AUTHORED_BY | Document → Person | Provenance and authorship |
| RELATED_TO | Entity → Entity | Domain-specific semantic link |
| NEXT_CHUNK | Chunk → Chunk | Sequential order within a document |

These relationships encode domain knowledge that is invisible to a pure embedding — they capture why two pieces of information belong together, not just that their text is similar. A NEXT_CHUNK edge lets a retriever reconstitute document order after an out-of-order vector search. An AUTHORED_BY edge lets a query filter results by a specific author without relying on the author's name appearing verbatim in every chunk.

                    PART_OF
  [Chunk_042] ─────────────────► [Document: "GraphRAG Overview"]
       │                                      │
  MENTIONS                               AUTHORED_BY
       │                                      │
       ▼                                      ▼
  [Entity: "GraphRAG"]              [Person: "Jane Smith"]
       │
  RELATED_TO
       │
       ▼
  [Entity: "LLM"]

Chunk_042 is reachable by vector search (via its embedding property) and by graph traversal (via its outgoing MENTIONS and PART_OF edges). A retriever can arrive at this chunk either way and then continue traversing in any direction the relationships allow.
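The structural edges in the table (PART_OF and NEXT_CHUNK) are cheap to derive at ingestion time, because they come from document order rather than from extraction. A minimal sketch, assuming each chunk and document already exists as a node with an id property: build parameter rows in Python, then write each relationship type with one UNWIND statement. The labels, the id key, and the function name are assumptions for illustration.

```python
def structural_rows(doc_id: str, chunk_ids: list[str]) -> dict[str, list[dict]]:
    """Rows for PART_OF (chunk -> document) and NEXT_CHUNK (chunk -> following chunk).

    `chunk_ids` must already be in reading order.
    """
    part_of = [{"chunk_id": cid, "doc_id": doc_id} for cid in chunk_ids]
    next_chunk = [{"from_id": a, "to_id": b} for a, b in zip(chunk_ids, chunk_ids[1:])]
    return {"part_of": part_of, "next_chunk": next_chunk}


# Each statement consumes a whole batch of rows in one round-trip;
# MERGE keeps re-runs idempotent.
LINK_PART_OF = """
UNWIND $rows AS row
MATCH (c:Chunk {id: row.chunk_id})
MATCH (d:Document {id: row.doc_id})
MERGE (c)-[:PART_OF]->(d)
"""

LINK_NEXT_CHUNK = """
UNWIND $rows AS row
MATCH (a:Chunk {id: row.from_id})
MATCH (b:Chunk {id: row.to_id})
MERGE (a)-[:NEXT_CHUNK]->(b)
"""

rows = structural_rows("doc_graphrag_overview", ["chunk_041", "chunk_042", "chunk_043"])
print(rows["next_chunk"])
# With a neo4j 5.x driver: driver.execute_query(LINK_NEXT_CHUNK, rows=rows["next_chunk"])
```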

💡 Mental Model: Think of the vector index as a spotlight — it finds the most relevant entry points in the graph quickly. Relationships are the roads — once you're at an anchor node, you can drive to neighboring context that the spotlight never directly illuminated.

The Neo4j GraphRAG Python Package

Having a dual-purpose database is only useful if the application layer can drive both modes coherently. The Neo4j GraphRAG Python package (neo4j_graphrag) is the official first-party library for doing exactly this; it replaces its deprecated predecessor, the neo4j-genai package.

⚠️ Common Mistake: If you encounter tutorials that import from neo4j_genai, those examples use a deprecated library. The current package is neo4j_graphrag.

The package organizes around two primary abstractions:

Retrievers encapsulate the logic for querying Neo4j, whether that is a pure vector search, a vector search followed by a Cypher traversal, or a hybrid approach. Each retriever accepts a Neo4j driver instance, an index name, and configuration parameters, then exposes a uniform search interface that takes either the query text (embedded with the configured embedder) or a precomputed query vector.

Pipelines orchestrate the end-to-end flow: receiving a user query, generating a query embedding, invoking a retriever, assembling the retrieved context into a prompt, and calling an LLM to produce a final response. Pipelines treat the graph as the source of truth and keep the LLM stateless — it receives only the context retrieved for that specific query.

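from neo4j import GraphDatabase
from neo4j_graphrag.embeddings import OpenAIEmbeddings
from neo4j_graphrag.generation import GraphRAG
from neo4j_graphrag.llm import OpenAILLM
from neo4j_graphrag.retrievers import VectorRetriever

# Connection details are placeholders; adjust for your deployment.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Use the same embedding model as at ingestion time, so query vectors
# and stored chunk vectors share one embedding space.
embedder = OpenAIEmbeddings(model="text-embedding-3-small")

retriever = VectorRetriever(
    driver=driver,
    index_name="chunk_embeddings",  # name of an existing vector index
    embedder=embedder,  # embeds the query text at search time
)

llm = OpenAILLM(model_name="gpt-4o")
rag_pipeline = GraphRAG(retriever=retriever, llm=llm)

response = rag_pipeline.search(
    query_text="What does the GraphRAG package provide?",
    retriever_config={"top_k": 5},  # number of chunks to retrieve
)
print(response.answer)

driver.close()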

This example uses VectorRetriever, the simplest retriever type. The more powerful VectorCypherRetriever, which performs vector search and then executes a Cypher traversal, follows the same interface pattern but accepts an additional Cypher fragment through its retrieval_query parameter. The embedder parameter expects an implementation of the package's Embedder interface, whose key method is .embed_query(text: str) -> list[float], so you can swap embedding providers without changing your retriever configuration.

Orchestration Layers: LangChain and LlamaIndex

The Neo4j GraphRAG package handles the retrieval and generation pipeline, but production applications often need more: routing different query types to different retrievers, managing conversation history, composing multiple tools, or inserting guardrails around LLM outputs. This is where orchestration frameworks enter.

Both LangChain and LlamaIndex provide integrations that wrap Neo4j retrievers as components within their broader pipeline abstractions. The orchestration layer handles query routing, prompt construction with conversation history, retry logic, and multi-step chains where the output of one retrieval seeds a follow-up query.

❌ Wrong thinking: "I need LangChain to use Neo4j for RAG."

✅ Correct thinking: The Neo4j GraphRAG package is fully self-contained for straightforward pipelines. LangChain or LlamaIndex add value when your application needs multi-step reasoning, tool orchestration, or integration with other data sources alongside Neo4j.

The retriever abstraction in Neo4j GraphRAG is deliberately framework-agnostic — its output is a standard Python object containing text strings — so the same retriever instance can be wrapped by any orchestration layer without modification.

📋 Quick Reference: Core Architecture Components

| Component | Role | Key Configuration |
|---|---|---|
| Neo4j Node | Stores text chunk + embedding as properties | Node label, property names |
| Vector Index | Enables ANN search over embedding property | Label, property, dimension, similarity metric |
| Relationships | Encode domain knowledge as typed, directional edges | Relationship type (e.g., MENTIONS, PART_OF) |
| GraphRAG Package | Official retriever + pipeline abstractions | Retriever type, embedder, LLM instance |
| Orchestration Layer | Query routing, prompt assembly, multi-step chains | Framework choice (LangChain, LlamaIndex, custom) |

With the dual-purpose storage model and tooling ecosystem understood, the natural next question is: which retrieval strategy should you use, and when?


Retrieval Strategies: From Vector Search to Graph Traversal

Not every question a user asks has the same shape, and retrieval strategies should reflect that. A question like "What does the reconcile function do?" is self-contained — the answer lives in a single chunk of documentation. A question like "Which microservices depend on the authentication module that was updated last sprint?" requires following a chain of relationships across multiple entities. Neo4j's retrieval patterns are designed to match the shape of the question to the structure of the data.

Vector Search: The Baseline Strategy

Vector search is the foundation of every retrieval strategy in a Graph RAG system. The query text is converted to an embedding, and Neo4j's vector index performs an approximate nearest-neighbor (ANN) search, returning the top-k chunk nodes whose embeddings are closest to the query under a chosen distance metric — typically cosine similarity.

This pattern works well for self-contained factual lookups: retrieving a definition, finding the paragraph that describes a configuration option, or surfacing a policy statement. The answer is plausibly contained in one or a small number of chunks, and semantic similarity is a good proxy for relevance.

Where vector search breaks down is on relational questions. If you ask "Who is the on-call engineer for the service that processes payment events?", the answer may require knowing which service handles payment events (one chunk), which team owns that service (an edge in the graph), and who on that team is currently on-call (another edge). No single chunk likely contains all three facts.

❌ Wrong thinking: "If I increase top-k, vector search will eventually gather enough chunks to answer multi-hop questions."

✅ Correct thinking: "Increasing top-k adds noise and token cost without guaranteeing the right relational context. Multi-hop questions need traversal, not more chunks."

Vector-Cypher Retrieval: Anchors and Traversal

Vector-Cypher retrieval is the core Graph RAG pattern that separates Neo4j's approach from a plain vector database. The strategy has two distinct phases:

Phase 1 — Anchor identification: A standard vector search returns a small set of chunk nodes — the anchor nodes — identified by semantic proximity to the query.

Phase 2 — Graph traversal: A parameterized Cypher query accepts the anchor node IDs and traverses the graph outward along typed relationships, collecting additional nodes and their text as context. This traversal is deterministic and structure-driven — it follows edges regardless of whether the destination nodes are semantically similar to the query.

Query Text
    │
    ▼
[Embedding Model]
    │
    ▼
Query Vector ──► [Vector Index] ──► Anchor Node IDs
                                          │
                                          ▼
                              [Cypher Traversal Query]
                              MATCH (anchor)-[:PART_OF]->(doc)
                              MATCH (anchor)-[:MENTIONS]->(entity)
                              MATCH (entity)-[:OWNED_BY]->(team)
                              RETURN anchor.text, doc.title, team.name
                                          │
                                          ▼
                              Expanded Context Set

The Cypher query is a template written at pipeline configuration time, not at query time. This is an important design constraint: the traversal pattern is fixed for a given retriever instance. If your application serves multiple question types requiring different traversal patterns, you need multiple configured retrievers and a routing layer to select between them.

💡 Real-World Example: In a support knowledge base where ticket chunks are linked to product components via [:AFFECTS] edges, and components are linked to engineering teams via [:OWNED_BY] edges, a vector search anchors on ticket chunks most similar to "payment timeout errors." The Cypher traversal then follows [:AFFECTS]->(:Component)-[:OWNED_BY]->(:Team) to retrieve the responsible team's escalation procedure — information that never appears in the ticket text itself.

⚠️ Common Mistake: Writing Cypher traversals that follow an unbounded number of hops. A query like MATCH (anchor)-[*1..5]->(n) with no relationship type filter can return hundreds of loosely connected nodes, bloating the context far beyond what the LLM can use effectively. Always specify relationship types and limit hop depth to what the domain actually requires.
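In the neo4j_graphrag package this pattern is configured with VectorCypherRetriever, whose retrieval_query parameter holds the Cypher that runs after the vector search; inside it, node refers to each anchor and score to its similarity. The sketch below follows the advice above: every hop names its relationship type and the depth is fixed. The labels and relationship types (Document, Team, MENTIONS, OWNED_BY) come from this lesson's examples and are assumptions about your schema; the factory function is hypothetical and imports the package lazily, so the query string can be inspected without the package installed.

```python
# Appended after the vector search: one row per anchor, with `node` and
# `score` provided by VectorCypherRetriever. Every hop is typed; depth is fixed.
RETRIEVAL_QUERY = """
MATCH (node)-[:PART_OF]->(doc:Document)
OPTIONAL MATCH (node)-[:MENTIONS]->(entity)-[:OWNED_BY]->(team:Team)
RETURN node.text AS chunk_text,
       doc.title AS document,
       collect(DISTINCT team.name) AS owning_teams,
       score
"""


def build_ownership_retriever(driver, embedder, index_name: str = "chunk_embeddings"):
    """Hypothetical factory: one retriever instance per traversal pattern."""
    from neo4j_graphrag.retrievers import VectorCypherRetriever  # lazy import

    return VectorCypherRetriever(
        driver=driver,
        index_name=index_name,
        retrieval_query=RETRIEVAL_QUERY,
        embedder=embedder,
    )
```

A question would then be answered with retriever.search(query_text=..., top_k=5); a second retriever with a different retrieval query would serve a different question shape.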

Hybrid Search: Bridging Semantic and Lexical Retrieval

Hybrid search addresses a specific failure mode of pure vector search: the tendency for embedding models to blur together terms that are semantically adjacent but critically distinct. Product codes like SKU-9041-B and SKU-9041-C, or proper nouns like Fargate and Lambda, may embed at nearly the same point in vector space despite referring to entirely different entities.

Hybrid search combines two retrieval signals:

  • Vector similarity score — how close the query embedding is to a chunk's embedding
  • Full-text index score — how well the chunk matches the query terms using lexical scoring (BM25 or similar), which rewards exact or near-exact term overlap

The two scores are merged using Reciprocal Rank Fusion (RRF) or a weighted linear combination, surfacing chunks that score well on at least one dimension.
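Reciprocal Rank Fusion is short enough to write out in full. The sketch below implements the standard formula (each item scores the sum of 1 / (k + rank) over the lists it appears in, with 1-based ranks and k = 60, the constant used in the original RRF paper); it is not necessarily how any particular Neo4j retriever combines scores. The chunk IDs are made up.

```python
def reciprocal_rank_fusion(*rankings: list[str], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of IDs; only ranks matter, not the raw scores."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # sorted() is stable, so ties keep first-seen order.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


vector_hits = ["chunk_7", "chunk_3", "chunk_9"]  # ranked by cosine similarity
fulltext_hits = ["chunk_3", "chunk_5"]  # ranked by full-text (BM25-style) score
fused = reciprocal_rank_fusion(vector_hits, fulltext_hits)
for chunk_id, score in fused:
    print(chunk_id, round(score, 5))
```

Here chunk_3 wins because it ranks well in both lists. Working from ranks sidesteps the fact that cosine similarities and BM25 scores live on different scales, which is the usual reason to prefer RRF over a weighted sum of raw scores.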

Query: "Configure Fargate task for SKU-9041-B"

  Vector Search Results          Full-Text Index Results
  (semantic similarity)          (lexical match)
  ───────────────────            ───────────────────
  1. Container task setup        1. SKU-9041-B product spec
  2. ECS configuration guide     2. Fargate billing notes
  3. Lambda deployment steps     3. Task definition reference

              │                          │
              └──────────┬───────────────┘
                         ▼
              [Score Fusion / RRF]
                         │
                         ▼
             Merged Ranked Results
             1. Fargate billing notes    ← exact term match
             2. SKU-9041-B product spec  ← exact term match
             3. Container task setup     ← semantic match

This makes hybrid search particularly valuable in domains with heavy domain-specific jargon, internal identifiers, or named entities unlikely to be well-represented in the embedding model's training distribution. Hybrid search is still a flat retrieval strategy — it returns a ranked list of chunks — but it can be combined with Vector-Cypher by using the hybrid-ranked results as anchors for a subsequent Cypher traversal.

🤔 Did you know? Full-text indexes in Neo4j use the underlying Lucene engine, the same technology powering Elasticsearch. This means battle-tested tokenization, stemming, and fuzzy matching without standing up a separate service.
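For reference, here is a sketch of the Neo4j side of the lexical half: a full-text index over Chunk.text and a query through the db.index.fulltext.queryNodes procedure. The index name and label are assumptions. Because the search string is parsed with Lucene query syntax, identifiers containing characters such as - or : should be escaped first; the escaping helper is a hypothetical convenience, and a trailing ~ on a term is Lucene's fuzzy-match operator.

```python
FULLTEXT_INDEX = (
    "CREATE FULLTEXT INDEX chunk_text IF NOT EXISTS "
    "FOR (c:Chunk) ON EACH [c.text]"
)

FULLTEXT_QUERY = """
CALL db.index.fulltext.queryNodes($index_name, $search)
YIELD node, score
RETURN node.id AS chunk_id, score
LIMIT $k
"""

# Characters with special meaning in Lucene's classic query syntax.
_LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')


def escape_lucene(text: str) -> str:
    """Backslash-escape Lucene operators so identifiers match literally."""
    return "".join("\\" + ch if ch in _LUCENE_SPECIAL else ch for ch in text)


search = f"Fargate~ {escape_lucene('SKU-9041-B')}"  # fuzzy word + literal product code
print(search)
# With a neo4j 5.x driver:
# driver.execute_query(FULLTEXT_QUERY, index_name="chunk_text", search=search, k=10)
```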

⚠️ Common Mistake: Defaulting to hybrid search for every use case. Full-text indexes require maintenance and add query latency. For domains where all content is natural language prose without domain-specific identifiers, pure vector search often performs comparably with less operational overhead.

GDS-Augmented Retrieval: Pre-Computed Graph Signals

The retrieval strategies so far are all query-time strategies. The Neo4j Graph Data Science (GDS) library enables a complementary approach: pre-computing structural signals about the graph and storing them as node properties, which retrieval queries can then use to filter or rerank results.

The two most useful GDS algorithms in this context:

Community Detection (e.g., Louvain or Label Propagation): Groups nodes into clusters based on relationship density. In a knowledge graph of technical documentation, community detection might identify clusters corresponding to distinct subsystems. Once each node has a communityId property, retrieval queries can be scoped to chunks within the same community as the anchor, reducing cross-domain noise.

PageRank: Assigns each node an importance score based on how many other nodes reference it and how important those referencing nodes are. Surfacing high-PageRank nodes as part of retrieved context prioritizes foundational or authoritative content over peripheral mentions.
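The idea behind PageRank fits in a few lines of textbook power iteration. This is a plain-Python illustration on a hypothetical citation graph, not the GDS implementation you would run inside Neo4j; dangling nodes (no outgoing edges) spread their score evenly over all nodes, which is one common convention.

```python
def pagerank(edges: list[tuple[str, str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Power-iteration PageRank over directed (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    count = len(nodes)
    rank = {node: 1.0 / count for node in nodes}
    for _ in range(iterations):
        # Score held by nodes with no outgoing edges is redistributed evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1 - damping) / count + damping * dangling / count for node in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical "references" edges between documentation chunks.
edges = [
    ("billing_guide", "auth_overview"),
    ("settlement_guide", "auth_overview"),
    ("token_faq", "auth_overview"),
    ("token_faq", "billing_guide"),
]
ranks = pagerank(edges)
for node, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{node}: {score:.3f}")
```

The chunk that many others point to (auth_overview) ends up with the highest score, which is exactly the "foundational content" signal described above.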

The workflow separates into offline and online phases:

OFFLINE (Ingestion / Scheduled Refresh)
────────────────────────────────────────
Full Graph
    │
    ▼
[GDS Algorithm: e.g., PageRank]
    │
    ▼
Write scores back as node properties:
  (:Chunk {pageRankScore: 0.87, communityId: 14})

ONLINE (Query Time)
────────────────────────────────────────
Vector Search ──► Anchor Nodes
    │
    ▼
Cypher Traversal with property filters:
  WHERE n.communityId = anchor.communityId
  ORDER BY n.pageRankScore DESC
  LIMIT 10
    │
    ▼
  Filtered, Ranked Context

This keeps query-time cost low because the expensive graph algorithms run offline. The trade-off is staleness: if the graph is updated frequently, GDS scores must be refreshed on a schedule. For graphs that change slowly — a product knowledge base, a regulatory document corpus — this is rarely a problem.
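The offline half of that workflow is a handful of GDS procedure calls. The sketch below assumes GDS 2.x, Chunk nodes linked by a hypothetical REFERENCES relationship, and the property names from the diagram. It projects the stored graph into GDS's in-memory catalog, writes PageRank scores and Louvain community IDs back to the nodes, and drops the projections. PageRank keeps the natural edge direction, while Louvain is commonly run on an undirected projection.

```python
# Offline refresh, run on a schedule rather than per query.
GDS_REFRESH = [
    "CALL gds.graph.project('chunks_directed', 'Chunk', 'REFERENCES')",
    "CALL gds.pageRank.write('chunks_directed', {writeProperty: 'pageRankScore'})",
    # Adjacent string literals below form a single statement.
    "CALL gds.graph.project('chunks_undirected', 'Chunk', "
    "{REFERENCES: {orientation: 'UNDIRECTED'}})",
    "CALL gds.louvain.write('chunks_undirected', {writeProperty: 'communityId'})",
    "CALL gds.graph.drop('chunks_directed')",
    "CALL gds.graph.drop('chunks_undirected')",
]


def refresh_graph_signals(driver) -> None:
    """Execute the refresh statements in order with a neo4j 5.x driver."""
    for statement in GDS_REFRESH:
        driver.execute_query(statement)
```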

💡 Mental Model: Think of GDS pre-computation as adding a "reputation layer" to your graph. Each node carries a signal about its structural importance, and retrieval queries use that signal as a secondary ranking criterion on top of semantic similarity. The vector index tells you what is relevant; the GDS scores tell you what is authoritative or central among the relevant results.

🎯 Key Principle: GDS algorithms are most valuable when your graph has meaningful structural variation — some nodes are genuinely more central or more tightly clustered than others. On a sparse or flat graph, PageRank scores converge to near-uniform values and the signal disappears.

Choosing the Right Strategy

The four strategies are not mutually exclusive. A well-designed Graph RAG system routes different query types to different retrievers rather than applying one approach uniformly.

📋 Quick Reference: Retrieval Strategy Selection

| Query Characteristic | Recommended Strategy |
|---|---|
| Self-contained, semantic question | Vector search |
| Multi-hop relational question | Vector-Cypher retrieval |
| Proper nouns, codes, identifiers | Hybrid search |
| "Most important" or "most referenced" | GDS-augmented retrieval |
| Relational + importance filtering | Vector-Cypher + GDS properties |
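A routing layer does not have to be elaborate to be useful. The keyword heuristic below is a deliberately naive, hypothetical sketch that maps a question to one of the strategies in the table; a production router might use an LLM-based classifier instead. The cue phrases, the identifier pattern, and the strategy names are assumptions, not part of any Neo4j API.

```python
import re

# Crude cues for each question shape (all hypothetical).
RELATIONAL_CUES = ("who manages", "which team", "owns", "owned by", "depend", "responsible for")
IMPORTANCE_CUES = ("most important", "most referenced", "most central")
# Identifier-like tokens such as "SKU-9041-B": uppercase prefix, dash, digits.
IDENTIFIER = re.compile(r"\b[A-Z]{2,}-\d+(?:-[A-Z0-9]+)*\b")


def choose_strategy(question: str) -> str:
    """Map a question to one of the retrieval strategies in the table."""
    q = question.lower()
    relational = any(cue in q for cue in RELATIONAL_CUES)
    important = any(cue in q for cue in IMPORTANCE_CUES)
    if relational and important:
        return "vector_cypher_gds"
    if relational:
        return "vector_cypher"
    if important:
        return "gds_augmented"
    if IDENTIFIER.search(question):
        return "hybrid"
    return "vector"


for question in (
    "What is the refund policy?",
    "Who manages the team that owns the billing service?",
    "Configure Fargate task for SKU-9041-B",
    "Which team owns the most referenced services?",
):
    print(f"{choose_strategy(question):18} <- {question}")
```

Identifier cues are checked last here, on the assumption that relational structure matters more than exact-term matching when both are present.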

The decision should be driven by the question types your application is designed to answer — a point that the final section explores in terms of what goes wrong when strategy selection is not intentional.


Building a Graph RAG Pipeline: Ingestion to Query

Understanding the architecture of Graph RAG is one thing; wiring it into a working system is another. This section walks through the full pipeline from raw documents to a grounded LLM answer. The pipeline has five stages that execute in sequence: chunk and embed, write to graph, extract entities and relationships, create a vector index, and configure the retriever for query execution. A mistake in an early stage propagates silently into retrieval failures later.

Raw Documents
     │
     ▼
┌─────────────┐
│  Chunking   │  Split text into overlapping windows
└──────┬──────┘
       │
       ▼
┌─────────────┐
│  Embedding  │  Embedding model → float[] per chunk
└──────┬──────┘
       │
       ▼
┌─────────────────────────┐
│  Neo4j Write (batch)    │  Chunk nodes + embedding property
└──────────┬──────────────┘
           │
           ▼
┌─────────────────────────┐
│  Entity / Rel Extraction│  LLM or NER → typed nodes + edges
└──────────┬──────────────┘
           │
           ▼
┌─────────────────────────┐
│  Vector Index Creation  │  Label, property, dims, metric
└──────────┬──────────────┘
           │
           ▼
┌─────────────────────────┐
│  Vector-Cypher Retriever│  Anchor search → graph traversal
└──────────┬──────────────┘
           │
           ▼
┌─────────────────────────┐
│  LLM Response Generation│  Context → prompt → grounded answer
└─────────────────────────┘

Stage 1: Chunking, Embedding, and Writing to Neo4j

Chunking splits source documents into segments small enough to embed meaningfully but large enough to carry coherent context. A fixed-size window with overlap is the most common approach: each chunk shares some tokens with its neighbors so that a sentence split at a boundary is not orphaned.
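Below is a minimal sketch of a fixed-size window with overlap. It counts whitespace-separated words as a stand-in for model tokens. That is an assumption: production pipelines usually count tokens with the embedding model's own tokenizer. The defaults size=200 and overlap=40 are illustrative only.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words; neighbors share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    step = size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # this window reached the end of the text
            break
    return chunks
```

For example, splitting the words "0" through "9" with size=4 and overlap=1 yields "0 1 2 3", "3 4 5 6", and "6 7 8 9". Each boundary word appears in both neighboring chunks.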

Once chunks are produced, pass each through an embedding model to produce a dense vector. Record the model name, output dimension, and similarity metric precisely. The dimension and metric are required parameters when you create the vector index, and queries must later be embedded with the same model as the chunks.

Batching writes is the key performance decision here. Opening one transaction per chunk pays a network round-trip and a commit for every chunk, which can slow ingestion dramatically even on modest corpora. Instead, collect chunks into batches of a few hundred and use UNWIND in Cypher to write each batch in a single call.

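from neo4j import GraphDatabase

# Connection details are placeholders; substitute your own URI and credentials.
driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))

def ingest_chunks(driver, chunks: list[dict]) -> None:
    """
    Write one batch of chunks in a single transaction.
    chunks: list of {id, text, embedding, source_doc, page}
    """
    query = """
    UNWIND $chunks AS chunk
    MERGE (c:Chunk {id: chunk.id})
    SET c.text      = chunk.text,
        c.embedding = chunk.embedding,
        c.source    = chunk.source_doc,
        c.page      = chunk.page
    """
    with driver.session() as session:
        # Managed write transaction: one commit (with automatic retries) per batch
        session.execute_write(lambda tx: tx.run(query, chunks=chunks).consume())

BATCH_SIZE = 200
# all_chunks: every chunk dict produced by the chunk + embed step
for start in range(0, len(all_chunks), BATCH_SIZE):
    ingest_chunks(driver, all_chunks[start:start + BATCH_SIZE])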

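MERGE rather than CREATE is intentional: re-running ingestion after updating source documents updates existing chunks rather than duplicating them. This only holds if chunk IDs are deterministic, for example derived from the source document and chunk position rather than generated randomly at ingestion time.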
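MERGE only deduplicates if a chunk gets the same id on every run. One way to achieve that is to hash the source document name together with the chunk's position. The chunk_id helper below is hypothetical, and the 16-character truncation is an arbitrary choice.

```python
import hashlib

def chunk_id(source_doc: str, position: int) -> str:
    """Deterministic chunk ID: same document + same position -> same ID every run."""
    digest = hashlib.sha256(f"{source_doc}::{position}".encode("utf-8"))
    return digest.hexdigest()[:16]

# Example: attach IDs while building the chunk dicts passed to ingest_chunks()
# chunks = [{"id": chunk_id("handbook.pdf", i), "text": t, ...} for i, t in enumerate(texts)]
```

When a document changes, position-based IDs overwrite chunk i with its new text. However, if the document gets shorter, the trailing IDs from the old version remain in the graph and need explicit cleanup.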

💡 Pro Tip: Store the embedding model name and version as a property on each Chunk node (e.g., c.embedding_model = "text-embedding-3-small"). When you upgrade models and regenerate embeddings, this property lets you query for stale nodes without cross-referencing an external registry.

Stage 2: Knowledge Graph Construction — Entity and Relationship Extraction

Chunks stored with embeddings give you vector search capability, but not yet a knowledge graph. That requires extracting entities mentioned in each chunk and the relationships between them, then writing those as typed nodes and edges in Neo4j.

Two practical extraction approaches:

| Approach | Strength | Weakness |
| --- | --- | --- |
| LLM-based extraction | Flexible schema, handles complex relations | Slower, higher cost, non-deterministic |
| NER model (e.g., spaCy) | Fast, deterministic, low cost | Limited to entity types in training data |

A common pattern in production pipelines is to use LLM-based extraction for initial schema discovery, then hand high-volume extraction to a NER model once the schema is stable.

import json

# Literal braces are doubled ({{ }}) so str.format only fills in {text}.
EXTRACTION_PROMPT = """
Extract all named entities and relationships from the text below.
Return a JSON object with two keys:
  "entities": [{{"id": str, "label": str, "name": str}}]
  "relationships": [{{"source": str, "type": str, "target": str}}]
Only return valid JSON. Do not include explanation.

Text: {text}
"""

def extract_graph_elements(llm_client, chunk_text: str) -> dict:
    """llm_client: an openai.OpenAI client instance."""
    response = llm_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": EXTRACTION_PROMPT.format(text=chunk_text)}],
        response_format={"type": "json_object"},  # JSON mode: reply parses as JSON
        temperature=0,
    )
    return json.loads(response.choices[0].message.content)

⚠️ Common Mistake: Skipping entity deduplication. If "OpenAI" appears in 200 chunks and each extraction run creates a new node, you end up with 200 disconnected Organization nodes. Use a stable, normalized identifier as the MERGE key — not a UUID generated at extraction time.
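Below is a sketch of a normalized MERGE key, assuming entities arrive from the extraction step as (label, name) pairs. The key format label:name and the Entity label are illustrative. Normalization only catches formatting variants such as case and whitespace. Aliases like "Project Helios" vs "Helios" still need an explicit alias table.

```python
import re
import unicodedata

def entity_key(label: str, name: str) -> str:
    """Stable MERGE key: Unicode-normalized, case-folded, whitespace-collapsed."""
    normalized = unicodedata.normalize("NFKC", name).casefold()
    normalized = re.sub(r"\s+", " ", normalized).strip()
    return f"{label.casefold()}:{normalized}"

# Hypothetical write query: MERGE on the normalized key so every mention of
# the same entity resolves to one node. Labels cannot be Cypher parameters,
# so this sketch stores the entity type as a property instead.
ENTITY_MERGE_QUERY = """
UNWIND $entities AS e
MERGE (n:Entity {key: e.key})
ON CREATE SET n.name = e.name, n.type = e.label
"""
```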

🎯 Key Principle: The quality of your knowledge graph is bounded by the quality of your extraction. If a fast pass with a small NER model misses 30% of entities, every relational query that depends on one of the missing entities returns an empty traversal, silently and with no error. Validate extraction quality on a sample before running the full corpus.
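One way to validate extraction on a sample is to measure recall:

1. Hand-label the entities in a few dozen chunks.
2. Run the extractor on the same chunks.
3. Compute the fraction of labeled entities the extractor found.

The helper below is hypothetical. It assumes both sides are expressed as sets of normalized entity keys per chunk ID.

```python
def extraction_recall(gold: dict[str, set[str]],
                      predicted: dict[str, set[str]]) -> float:
    """Fraction of hand-labeled entity keys that the extractor also produced.

    gold / predicted: chunk id -> set of normalized entity keys.
    """
    total = sum(len(keys) for keys in gold.values())
    if total == 0:
        return 1.0  # nothing labeled, nothing missed
    found = sum(len(keys & predicted.get(cid, set())) for cid, keys in gold.items())
    return found / total
```

A recall of 0.7 on the sample is an early warning that roughly 30% of entities, and the traversals that depend on them, will be missing from the full graph.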

Stage 3: Creating the Vector Index

Neo4j does not automatically index the embedding property written in Stage 1. You must explicitly create a vector index before any similarity search can run. The index requires four parameters, and getting any one of them wrong silently breaks retrieval:

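import re

def create_vector_index(driver, index_name: str, label: str,
                        property_name: str, dimension: int, metric: str):
    # Index names, labels, and property names cannot be Cypher parameters,
    # so validate them before interpolating them into the query string.
    for identifier in (index_name, label, property_name):
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", identifier):
            raise ValueError(f"Unsafe identifier: {identifier!r}")
    if metric not in ("cosine", "euclidean"):  # supported similarity functions
        raise ValueError("metric must be 'cosine' or 'euclidean'")
    query = f"""
    CREATE VECTOR INDEX {index_name} IF NOT EXISTS
    FOR (n:{label})
    ON (n.{property_name})
    OPTIONS {{
      indexConfig: {{
        `vector.dimensions`: {int(dimension)},
        `vector.similarity_function`: '{metric}'
      }}
    }}
    """
    with driver.session() as session:
        session.run(query)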

⚠️ Common Mistake: Re-running ingestion with a different embedding model that produces a different dimension, without recreating the index. Neo4j does not index vectors whose length differs from the index's configured dimension, so re-embedded chunks quietly drop out of the index. Depending on the query vector, searches either return few or no useful results or fail with a dimension-mismatch error. IF NOT EXISTS prevents errors on re-runs, but it does not update an existing index whose configuration changed. When you change embedding models, drop the old index and create a new one.
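A cheap guard against this mismatch is to check every embedding's length against the index dimension before writing. The helper below is hypothetical and assumes the chunk dicts passed to ingest_chunks above. It turns a silent retrieval gap into a loud error at ingestion time.

```python
def check_embedding_dimensions(chunks: list[dict], expected_dim: int) -> None:
    """Raise if any chunk's embedding length differs from the index dimension."""
    bad = [c["id"] for c in chunks if len(c["embedding"]) != expected_dim]
    if bad:
        raise ValueError(
            f"{len(bad)} chunk(s) have the wrong embedding dimension, e.g. {bad[:3]}"
        )

# Usage: call before each batch write, with the dimension used in
# create_vector_index(...), e.g. check_embedding_dimensions(batch, 1536)
```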

Stage 4: Configuring the Vector-Cypher Retriever

With chunks embedded and indexed, and entities linked via relationships, you can configure a Vector-Cypher retriever. The retriever runs the vector search first and then continues with your Cypher template. Inside the template, each anchor node returned by the search is bound to the variable node, and its similarity to score. Each row the template returns becomes one context item for the LLM.

from neo4j_graphrag.retrievers import VectorCypherRetriever
from neo4j_graphrag.embeddings import OpenAIEmbeddings

# Traversal: from each anchor chunk (bound to `node` by the vector search),
# follow MENTIONS edges to shared entities, then collect other chunks that
# mention those same entities. Raw string so Cypher receives the \n escape.
TRAVERSAL_QUERY = r"""
MATCH (node)-[:MENTIONS]->(entity)<-[:MENTIONS]-(related:Chunk)
WHERE related <> node
WITH DISTINCT node, score, related
RETURN node.text + '\n' + related.text AS text, score
LIMIT 10
"""

# Must be the same embedding model used for the chunks in Stage 1
embedder = OpenAIEmbeddings(model="text-embedding-3-small")

retriever = VectorCypherRetriever(
    driver=driver,
    index_name="chunk_embeddings",
    embedder=embedder,
    retrieval_query=TRAVERSAL_QUERY,
    neo4j_database="neo4j"
)

You do not pass anchor IDs yourself. At query time, the retriever embeds the question, runs the similarity search against index_name, and executes your template against the matched anchors with node and score already bound. The traversal above implements a two-hop pattern: Chunk → Entity ← Chunk.

The LIMIT 10 clause deserves attention. Without a limit, a highly connected entity (such as a company name appearing in thousands of chunks) can cause the traversal to return hundreds of context snippets. Those snippets flood the LLM's context window with loosely relevant text. Note that a trailing LIMIT caps the total rows across all anchors, not the rows per anchor. Mistake 2 in the next section explores this tradeoff further.

Stage 5: Response Generation — The LLM as a Stateless Consumer

The final stage assembles retrieved context into a prompt and hands it to an LLM. The architectural principle is deliberate: the LLM is stateless and the graph is the source of truth.

from neo4j_graphrag.generation import GraphRAG
from neo4j_graphrag.llm import OpenAILLM

llm = OpenAILLM(
    model_name="gpt-4o-mini",
    model_params={"temperature": 0}  # reduces sampling variability for grounded Q&A
)

pipeline = GraphRAG(retriever=retriever, llm=llm)

response = pipeline.search(
    query_text="Which engineering teams contributed to the authentication module?",
    retriever_config={"top_k": 5}
)

print(response.answer)

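The GraphRAG object wraps four operations into a single .search() call: it embeds the query, retrieves context via the configured retriever, assembles a prompt, and calls the LLM. Setting temperature=0 reduces sampling variability, which makes the system easier to debug and test. It is not a strict determinism guarantee, since hosted LLM APIs can still return slightly different outputs at temperature 0.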

❌ Wrong thinking: "The LLM will fill in the gaps if retrieval misses something." ✅ Correct thinking: "If the graph doesn't contain the answer, the LLM should say so. Retrieval quality determines answer quality."

Designing prompts that enforce this boundary — "Answer only using the context provided. If the context does not contain the answer, say 'I don't know.'" — is a small but important implementation detail that separates Graph RAG from an ungrounded LLM query.
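Below is a minimal sketch of a prompt that enforces that boundary, built by hand from retrieved context strings. The instruction wording comes from the sentence above. Numbering the context items is an added assumption that makes answers easier to trace. The GraphRAG class builds its own prompt internally, so treat this as an illustration of the instruction's shape, not a drop-in replacement for that class.

```python
GROUNDED_TEMPLATE = (
    "Answer only using the context provided. If the context does not contain "
    "the answer, say 'I don't know.'\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)

def build_grounded_prompt(question: str, context_items: list[str]) -> str:
    """Number each retrieved item so the answer can be traced to its source."""
    context = "\n".join(f"[{i}] {item}" for i, item in enumerate(context_items, start=1))
    # An explicit placeholder makes empty retrieval visible instead of blank
    return GROUNDED_TEMPLATE.format(
        context=context or "(no context retrieved)", question=question
    )
```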

📋 Quick Reference: Pipeline Stages at a Glance

| Stage | Input | Output | Key risk |
| --- | --- | --- | --- |
| Chunk + embed | Raw documents | Chunk objects with vectors | Wrong chunk size loses context |
| Write to Neo4j | Chunk batch | Chunk nodes with embedding property | One-tx-per-chunk kills performance |
| Extract entities | Chunk text | Typed entity nodes + relationship edges | Duplicate entities break traversal |
| Create index | Label, property, dims, metric | Searchable vector index | Dimension mismatch silently leaves vectors unindexed |
| Configure retriever | Index name, Cypher template | VectorCypherRetriever object | Over-traversal floods context window |
| Generate response | Query string | Grounded LLM answer | LLM filling gaps beyond retrieved context |

Common Mistakes and Key Takeaways

Every Graph RAG implementation eventually runs into a class of failure that isn't obvious during design but becomes painfully visible in production: the graph traversal returns nothing useful, the LLM produces vague answers despite a well-structured knowledge graph, or the retrieval system silently drifts out of sync with the underlying data. These failures share a common thread — they stem not from misunderstanding the individual components, but from misunderstanding how those components interact under realistic conditions.

Mistake 1: Sparse or Noisy Relationship Extraction

Relationship extraction during ingestion directly determines whether Vector-Cypher retrieval is useful at all. If the entity linking step produces inaccurate, inconsistent, or incomplete relationships, the vector search can still find the correct anchor node, but the subsequent Cypher traversal finds nothing meaningful to return.

User query: "Who approved the budget for Project Helios?"

Vector search → finds chunk: "Project Helios Q3 planning document"

Cypher traversal:
  (chunk)-[:MENTIONS]->(entity:Project {name: "Helios"})
  (entity)<-[:APPROVED_BY]-(person:Person)

  ❌ If APPROVED_BY was never extracted → traversal returns 0 results
  ❌ If entity stored as "Project Helios" vs "Helios" (inconsistent) → no match
  ✅ If extraction was clean and consistent → returns the approver node

A practical failure mode: an LLM extraction prompt sometimes generates AUTHORED_BY, sometimes WRITTEN_BY, and sometimes CREATED_BY for the same semantic relationship. All three edge types exist in the graph, but Cypher queries written against AUTHORED_BY miss every authorship edge stored under the other two names. The fix is a relationship normalization step during ingestion that maps synonym predicates to a canonical set before writing to Neo4j.

⚠️ Common Mistake: Assuming that a capable LLM extraction prompt is sufficient on its own. LLMs produce variable output formats, hallucinate relationship types not present in the text, and normalize entity names inconsistently. Post-extraction validation and a defined relationship ontology (a fixed set of allowed relationship types) are load-bearing parts of the pipeline.
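Below is a sketch of normalization plus an ontology check, applied to the relationships returned by extraction before anything is written to Neo4j. The canonical types and their synonyms are hypothetical examples. Relationships outside the ontology are set aside for review rather than silently written.

```python
# Hypothetical ontology: each canonical type lists synonyms an extractor might emit.
CANONICAL_RELS = {
    "AUTHORED_BY": {"AUTHORED_BY", "WRITTEN_BY", "CREATED_BY"},
    "APPROVED_BY": {"APPROVED_BY", "SIGNED_OFF_BY"},
    "OWNS": {"OWNS", "OWNER_OF"},
}
# Reverse lookup: synonym -> canonical type
SYNONYM_TO_CANONICAL = {
    synonym: canonical
    for canonical, synonyms in CANONICAL_RELS.items()
    for synonym in synonyms
}

def normalize_relationships(rels: list[dict]) -> tuple[list[dict], list[dict]]:
    """Map synonym relationship types to canonical ones.

    Returns (kept, rejected); rejected holds relationships whose type is not
    in the ontology so they can be reviewed instead of written to Neo4j.
    """
    kept, rejected = [], []
    for rel in rels:
        # Normalize formatting first: "written by" -> "WRITTEN_BY"
        rel_type = "_".join(rel["type"].strip().upper().split())
        canonical = SYNONYM_TO_CANONICAL.get(rel_type)
        if canonical is None:
            rejected.append(rel)
        else:
            kept.append({**rel, "type": canonical})
    return kept, rejected
```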

Mistake 2: Over-Traversing the Graph

Over-traversal occurs when a Cypher query follows too many relationship hops without meaningful filters, returning a subgraph that is technically connected to the anchor node but too broad to be useful as LLM context.

// ❌ Problematic: deep variable-length traversal over any relationship type
MATCH (node:Chunk)-[*1..5]-(n)
RETURN n.text

// ✅ Better: fixed hops, typed relationships, property filter, hard cap
// (node is the anchor chunk bound by the vector search)
MATCH (node:Chunk)-[:MENTIONS]->(entity:Product)
      <-[:OWNS]-(team:Team)-[:MEMBER_OF]->(dept:Department)
WHERE dept.active = true
RETURN entity.name, team.name, dept.name
LIMIT 25

When the retrieved text is concatenated into the LLM prompt, the relevant signal is diluted by peripherally related content, and answer quality degrades — a well-documented property of long, noisy contexts where models tend to weight information near the beginning and end of the window more heavily than the middle. Token costs also scale directly with context size.

💡 Pro Tip: Define a maximum useful context size for your LLM and work backwards when writing Cypher templates. If your model handles roughly 8,000 tokens of context well and each node's text averages 200 tokens, your traversal should return at most 30–40 nodes. Use LIMIT clauses and relationship type filters as hard constraints, not suggestions.
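The arithmetic in this tip is easy to encode once and reuse when choosing LIMIT values. Below is a sketch. The reserved_tokens parameter is an added assumption: it covers the instructions, question, and answer, which share the same context window as the retrieved text.

```python
def max_context_nodes(context_budget_tokens: int, avg_tokens_per_node: int,
                      reserved_tokens: int = 0) -> int:
    """Largest number of nodes whose text fits in the context budget.

    reserved_tokens covers the system prompt, question, and answer space,
    which compete with retrieved context for the same window.
    """
    available = context_budget_tokens - reserved_tokens
    if available <= 0 or avg_tokens_per_node <= 0:
        return 0
    return available // avg_tokens_per_node
```

With the numbers from the tip, max_context_nodes(8000, 200) returns 40. Reserving 1,000 tokens for instructions and the answer brings it to 35, inside the 30 to 40 range above.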

Mistake 3: Stale Vector Index After Re-Ingestion

This mistake is operationally subtle and damaging because it produces no errors — just silently wrong retrieval.

When documents are re-ingested, new chunk nodes and embeddings are written to Neo4j, and Neo4j adds them to the vector index automatically. The problem is what happens to the nodes they were meant to replace. If the updated document is re-chunked and the new chunks get new IDs, MERGE creates new nodes instead of updating the old ones. The superseded chunks then remain in the graph: still indexed, still connected to their entities, and still eligible to win a similarity search.

Ingestion Run 1:
  - Chunks A, B, C written; vector index created → covers A, B, C ✅

Ingestion Run 2 (document updated, re-chunked with new IDs):
  - Chunks A', B', C', D written; index picks them up
  - Old chunks A, B, C never deleted → index still covers them ❌

Query time:
  - Vector search finds chunk B (stale, pre-update content)
  - Graph traversal runs from B → returns context anchored to old content

⚠️ Common Mistake: Treating ingestion as append-only and index creation as a one-time setup step. In any system where documents are updated, the ingestion pipeline must be designed from the start to delete or version superseded chunks, and to recreate the index whenever the embedding model changes.
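Below is a sketch of one cleanup approach. It assumes chunk nodes carry the source property written in Stage 1 and that chunk IDs are deterministic. After writing a document's new chunks, delete any chunk for that source whose ID this run did not produce. The helper names are hypothetical, and driver is a neo4j.Driver.

```python
def stale_chunk_ids(existing_ids: set[str], current_ids: set[str]) -> set[str]:
    """Chunk IDs stored for a document that the latest ingestion run did not produce."""
    return existing_ids - current_ids

# Deletes superseded chunks for one source document; DETACH DELETE also
# removes their MENTIONS edges so traversals cannot start from them.
DELETE_STALE_QUERY = """
MATCH (c:Chunk {source: $source})
WHERE NOT c.id IN $current_ids
DETACH DELETE c
"""

def remove_stale_chunks(driver, source: str, current_ids: list[str]) -> None:
    """Run after writing the new chunks for `source` in the same ingestion run."""
    with driver.session() as session:
        session.execute_write(
            lambda tx: tx.run(DELETE_STALE_QUERY, source=source,
                              current_ids=current_ids).consume()
        )
```

stale_chunk_ids is useful for logging how many chunks each run removed. The Cypher query performs the actual deletion on the database side.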

Mistake 4: Applying Graph RAG to Relationship-Sparse Corpora

Graph RAG is not a universal upgrade to vector-only RAG. It is a targeted solution for datasets where entities have explicit, meaningful relationships that determine the answer to the questions users ask.

Applying the full Graph RAG stack to a corpus of standalone FAQ articles, independent product descriptions, or disconnected research abstracts adds substantial complexity without improving retrieval quality over a well-tuned vector search. If the data has no meaningful relationships, the graph traversal step adds nothing.

A practical decision framework — ask three questions about your corpus:

  1. Are entities in the corpus related to each other in ways that affect answers? (e.g., organizational hierarchies, product dependencies)
  2. Do users ask relational questions? (e.g., "What products does this team own?" vs. "What is the refund policy?")
  3. Can those relationships be reliably extracted from the source text? If relationships are implicit or require deep inference, extraction quality will be too low to be useful.

If the answer to all three is yes, Graph RAG adds clear value. If any answer is no, the complexity cost is unlikely to be justified.

🤔 Did you know? The marginal value of graph structure is highest when questions require two or more relationship hops to answer. "Which engineers work on products owned by teams in the Infrastructure division?" requires traversing at least three relationship types — structurally impossible for vector search alone, but a natural Cypher traversal. That gap is where Graph RAG earns its complexity cost.

Mistake 5: Applying One Retrieval Strategy Uniformly

The four retrieval strategies covered in this lesson are tools optimized for different query shapes. A common mistake is to pick one strategy during development and apply it uniformly to every query the application receives.

Example application: Enterprise document search

Query type                                    → Best strategy
──────────────────────────────────────────────────────────────
"What is the return policy?"                  → Vector search
"Which engineers maintain this API?"           → Vector-Cypher
"Find documents mentioning SKU-4821"           → Hybrid search
"What are the most influential dependencies?"  → GDS-augmented

The right architecture routes queries to the appropriate retrieval strategy based on query classification — either rule-based (keywords, query patterns) or model-based (a lightweight classifier that predicts query type).
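Below is a minimal rule-based router for the four query types above. The regular expressions and strategy names are hypothetical starting points, checked in order from most to least specific. A model-based classifier would replace route_query while keeping the same interface.

```python
import re

# Hypothetical rules, checked in order; tune them against your own query logs.
IDENTIFIER_RE = re.compile(r"\b[A-Z]{2,}-\d+\b")  # codes such as SKU-4821
IMPORTANCE_RE = re.compile(
    r"\b(most|top)\s+(important|influential|referenced|central)\b", re.IGNORECASE
)
RELATIONAL_RE = re.compile(
    r"\b(who|which)\b.*\b(owns?|maintains?|manages?|works?|reports?|depends?)\b",
    re.IGNORECASE,
)

def route_query(query: str) -> str:
    """Return the retrieval strategy name for a query."""
    if IDENTIFIER_RE.search(query):
        return "hybrid"          # exact identifiers benefit from full-text matching
    if IMPORTANCE_RE.search(query):
        return "gds_augmented"   # centrality questions use pre-computed GDS scores
    if RELATIONAL_RE.search(query):
        return "vector_cypher"   # relational questions need graph traversal
    return "vector"              # default: self-contained semantic question
```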

💡 Pro Tip: Start with a logging layer that records query text, retrieval strategy used, retrieved context, and user feedback signals. After a period of production traffic, analyze which query types are associated with low-quality answers. That analysis almost always reveals systematic retrieval strategy mismatches that point directly to routing improvements.

Key Takeaways

This lesson moved from the structural limitations of vector-only retrieval to the dual-purpose architecture of Neo4j, then through the mechanics of each retrieval strategy and the implementation stages of a full pipeline. The through-line is that Graph RAG is not a component but a set of design decisions that interact. The quality of entity extraction constrains the value of graph traversal. The specificity of Cypher queries constrains the signal-to-noise ratio of LLM context. The freshness of the vector index constrains the accuracy of anchor node selection. Each layer depends on the one before it, which means failures compound rather than isolate.

📋 The Five Mistakes and Their Fixes

| Mistake | How it manifests | Fix |
| --- | --- | --- |
| Noisy relationship extraction | Vector-Cypher returns empty or irrelevant subgraphs | Define a relationship ontology; validate extraction samples post-ingestion |
| Over-traversal | Inflated context dilutes signal; LLM answers become vague | Bound hops explicitly; filter by relationship type; use LIMIT |
| Stale vector index | Retrieval anchors to outdated nodes after re-ingestion | Delete or version superseded chunks on every ingestion run; recreate the index when the embedding model changes; add health checks |
| Wrong corpus fit | Graph RAG complexity without retrieval gains | Evaluate whether your corpus and query patterns have meaningful, extractable relationships before committing |
| Uniform retrieval strategy | Strategy optimized for test queries fails on production distribution | Route queries to the appropriate strategy based on query classification |

⚠️ Critical final point: The most expensive mistakes in this list are the ones that fail silently. A stale vector index doesn't throw an exception. An over-traversal query returns too much, not an error. Sparse relationships produce answers that are merely less good, not obviously wrong. Build observability into the pipeline from the start: log retrieval strategy, number of nodes returned, context token count, and anchor node IDs for every query. These signals are how you catch silent failures before users do.
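Below is a minimal sketch of such a log record, emitted as one JSON line per query. It captures the signals listed above. The field names, the sink callable, and the whitespace-based token estimate are assumptions; swap in your own logging backend and your model's tokenizer.

```python
import json
import time
from dataclasses import dataclass, asdict, field

@dataclass
class RetrievalLog:
    """One record per query; field names are illustrative, not a fixed schema."""
    query: str
    strategy: str
    anchor_node_ids: list[str]
    nodes_returned: int
    context_tokens: int  # rough estimate from approx_tokens below
    timestamp: float = field(default_factory=time.time)

def approx_tokens(text: str) -> int:
    """Crude token estimate (whitespace words); replace with a real tokenizer."""
    return len(text.split())

def log_retrieval(query, strategy, anchor_ids, context_items, sink=print) -> RetrievalLog:
    record = RetrievalLog(
        query=query,
        strategy=strategy,
        anchor_node_ids=list(anchor_ids),
        nodes_returned=len(context_items),
        context_tokens=sum(approx_tokens(item) for item in context_items),
    )
    sink(json.dumps(asdict(record)))  # one JSON line per query
    return record
```

Alerting on records where nodes_returned is 0, or where context_tokens spikes, surfaces empty traversals and over-traversal without waiting for user complaints.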