Agentic RAG Systems
Build intelligent multi-step RAG with query planning, parallel retrieval, and context-aware routing.
Master agentic RAG systems with free flashcards and spaced repetition practice. This lesson covers autonomous agent architectures, multi-step reasoning chains, tool integration patterns, and dynamic retrieval strategies: essential concepts for building advanced AI search applications that can plan, reason, and adapt their retrieval strategies based on user needs.
Welcome to Agentic RAG
Welcome to the cutting edge of Retrieval-Augmented Generation! While traditional RAG systems passively retrieve documents and generate responses, agentic RAG systems actively reason about what information they need, decide which tools to use, and orchestrate multi-step workflows to answer complex queries. Think of the difference between a librarian who simply fetches the books you request versus one who understands your research goal, suggests related materials, cross-references sources, and guides you through a structured research process.
Agentic RAG represents a paradigm shift from static pipelines to dynamic, goal-oriented systems that can:
- Plan multi-step retrieval strategies
- Reason about which information sources to query
- Adapt their approach based on intermediate results
- Use tools beyond simple vector search (calculators, APIs, databases)
- Self-correct when initial attempts don't yield satisfactory results
This architecture powers the most sophisticated AI assistants and search systems deployed today.
Core Concepts
What Makes a RAG System "Agentic"?
An agentic RAG system incorporates autonomous decision-making into the retrieval-generation pipeline. Instead of following a fixed retrieve→generate sequence, it operates as an intelligent agent that:
- Receives a goal (user query)
- Plans a strategy (which retrievals to perform, in what order)
- Executes actions (retrieval, computation, API calls)
- Observes results (evaluates retrieved content quality)
- Adapts behavior (refines queries, tries alternative sources)
- Terminates when the goal is satisfied
The key difference from traditional RAG:
| Traditional RAG | Agentic RAG |
|---|---|
| Single retrieval step | Multi-step retrieval chains |
| Fixed pipeline | Dynamic strategy selection |
| Query → Retrieve → Generate | Query → Plan → Act → Observe → Reflect → Generate |
| No self-correction | Can retry with refined queries |
| Single knowledge source | Multiple tools and sources |
💡 Think of it this way: Traditional RAG is like a vending machine (you press a button, get a product), while agentic RAG is like a personal shopper (understands your needs, searches multiple stores, asks clarifying questions, makes recommendations).
The Agent Loop: Plan-Act-Observe-Reflect
At the heart of agentic RAG lies the agent loop, inspired by reinforcement learning and cognitive architectures:
THE AGENTIC RAG CYCLE

  PLAN     → What info do I need? What strategy?
     ↓
  ACT      → Execute retrieval / tool calls
     ↓
  OBSERVE  → Examine the results
     ↓
  REFLECT  → Did I answer the query? Need more info?
     ↓
     ├── Need more info → loop back to PLAN
     └── Done → GENERATE the final response
Each phase involves LLM reasoning:
PLAN: The agent analyzes the query and generates a strategy. For "What were Apple's revenues last quarter and how does that compare to Microsoft?", it might plan:
- Search for Apple Q4 2023 earnings
- Search for Microsoft Q4 2023 earnings
- Compare the two values
ACT: Execute the planned action using available tools (vector search, SQL queries, API calls, calculators).
OBSERVE: Process the results. Did the retrieval return relevant documents? Are there gaps?
REFLECT: Meta-reasoning about progress. "I found Apple's revenue but the Microsoft results are from Q3, not Q4. I need to refine my query."
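To make the loop concrete, here is a minimal sketch of the four phases in Python. The `llm.complete` helper, the tool registry, and `parse_tool_call` are illustrative placeholders rather than a specific framework's API.

def agentic_rag(query, llm, tools, max_steps=5):
    # Minimal plan-act-observe-reflect sketch; llm.complete, the tools dict,
    # and parse_tool_call are hypothetical placeholders.
    notes = []  # working memory for this query
    plan = llm.complete(f"Plan the retrieval steps needed to answer: {query}")
    for _ in range(max_steps):
        # ACT: let the LLM pick the next tool call based on the plan and notes
        decision = llm.complete(
            f"Query: {query}\nPlan: {plan}\nNotes so far: {notes}\n"
            "Name the next tool and its input, or reply FINISH."
        )
        if decision.strip().upper().startswith("FINISH"):
            break
        tool_name, tool_input = parse_tool_call(decision)  # hypothetical parser
        observation = tools[tool_name](tool_input)          # OBSERVE
        notes.append((decision, observation))
        # REFLECT: is the query answered, or does the plan need revision?
        plan = llm.complete(
            f"Given these notes {notes}, is '{query}' fully answered? "
            "If not, revise the plan."
        )
    # GENERATE: final answer grounded only in the gathered notes
    return llm.complete(f"Answer '{query}' using only these notes: {notes}")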
Agent Architectures for RAG
Several architectural patterns have emerged for building agentic RAG systems:
1. ReAct (Reasoning + Acting)
The ReAct pattern interleaves reasoning traces with action execution. The agent generates explicit thought chains before each action:
User: "What's the capital of the country where CERN is located?"
Thought 1: I need to find out where CERN is located.
Action 1: search_knowledge_base("CERN location")
Observation 1: CERN is located in Geneva, Switzerland...
Thought 2: Now I know CERN is in Switzerland. I need Switzerland's capital.
Action 2: search_knowledge_base("Switzerland capital")
Observation 2: The capital of Switzerland is Bern...
Thought 3: I have the answer now.
Final Answer: Bern is the capital of Switzerland, where CERN is located.
The reasoning traces make the agent's decision-making transparent and debuggable.
2. Tool-Augmented Generation
Agents have access to a toolbox of capabilities beyond vector search:
| Tool Type | Purpose | Example |
|---|---|---|
| Vector Search | Semantic retrieval | Find documents similar to query embedding |
| SQL Database | Structured queries | Query sales data by region and date |
| Calculator | Numerical computation | Compute percentage changes |
| Web API | Live data | Fetch current stock prices |
| Code Executor | Data analysis | Run Python for statistics/plotting |
| Graph Query | Relationship traversal | Find connections in knowledge graph |
The agent selects the appropriate tool(s) based on query requirements:
# Illustrative tool registry; Tool and Agent stand in for whatever agent
# framework you use (e.g., LangChain-style tool wrappers)
tools = [
Tool(
name="vector_search",
description="Search document embeddings for semantic similarity",
func=vector_search_function
),
Tool(
name="sql_query",
description="Query structured database for exact values",
func=sql_query_function
),
Tool(
name="calculator",
description="Perform mathematical calculations",
func=calculator_function
)
]
agent = Agent(tools=tools, llm=llm)
3. Multi-Agent Collaboration
Specialized agents work together, each expert in a domain:
MULTI-AGENT RAG SYSTEM

                 ORCHESTRATOR AGENT
           ┌────────────┼────────────┐
           ↓            ↓            ↓
     RESEARCH       FINANCIAL       LEGAL
     AGENT          AGENT           AGENT
     - Academic     - Earnings      - Contracts
       papers         data          - Compliance
     - Citations    - Market        - Case law
The orchestrator routes sub-queries to specialized agents, then synthesizes their responses.
Dynamic Retrieval Strategies
Agentic RAG systems employ sophisticated retrieval strategies that adapt based on query complexity and context:
Query Decomposition
Complex queries are broken into sub-queries that can be answered independently:
Original Query:
"Compare the environmental impact of electric vs gas vehicles
over 10 years including manufacturing and operation"
            ↓ DECOMPOSE
  - EV manufacturing impact
  - Gas vehicle manufacturing impact
  - Operation emissions
            ↓ SYNTHESIZE
  Comparative analysis
This enables parallel retrieval and more targeted searches.
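A sketch of how decomposition plus parallel retrieval might look in code, assuming an `llm.complete` helper that returns a JSON array of sub-questions and a synchronous `vector_search` function (both placeholders):

import asyncio
import json

async def _retrieve(sub_query, top_k=5):
    # Run the (synchronous) search in a thread so sub-queries overlap
    return await asyncio.to_thread(vector_search, sub_query, top_k)

async def decompose_and_retrieve(query, llm):
    # PLAN: ask the LLM for independent sub-questions
    raw = llm.complete(
        "Break this question into independent sub-questions and "
        f"return them as a JSON array of strings:\n{query}"
    )
    sub_queries = json.loads(raw)
    # ACT: retrieve for every sub-question in parallel
    doc_lists = await asyncio.gather(*(_retrieve(sq) for sq in sub_queries))
    return dict(zip(sub_queries, doc_lists))

# results = asyncio.run(decompose_and_retrieve(query, llm))
# The per-sub-query documents then feed a final synthesis prompt.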
Iterative Refinement
The agent refines its retrieval strategy based on result quality:
| Iteration | Query | Observation | Refinement |
|---|---|---|---|
| 1 | "transformer architecture" | Too broad, 10000 results | Add specificity |
| 2 | "transformer attention mechanism" | Better, but missing implementation | Add constraint |
| 3 | "transformer self-attention code implementation" | Relevant results found | ✓ Success |
Hypothetical Document Embeddings (HyDE)
Instead of embedding the query directly, the agent:
- Generates a hypothetical answer document
- Embeds that document
- Searches for similar real documents
This works because the hypothetical document is stylistically similar to actual documents in the corpus:
# Traditional approach
query = "What causes ocean acidification?"
query_embedding = embed(query)
results = vector_search(query_embedding)
# HyDE approach
hypothetical_doc = llm.generate(
f"Write a paragraph answering: {query}"
)
# Output: "Ocean acidification occurs when CO2 from the atmosphere
# dissolves in seawater, forming carbonic acid..."
hypothetical_embedding = embed(hypothetical_doc)
results = vector_search(hypothetical_embedding) # Better matches!
Retrieval with Feedback
The agent uses relevance feedback to improve results:
1. Initial retrieval → Get top-K documents
        ↓
2. LLM evaluates → "Docs 1,3,5 relevant; 2,4 not"
        ↓
3. Adjust strategy → Search for docs similar to 1,3,5
   but dissimilar to 2,4
        ↓
4. Final retrieval → Refined result set
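A sketch of this feedback loop at the embedding level (a Rocchio-style adjustment), assuming `embed` returns a vector, `vector_search` accepts a query vector, and retrieved documents expose `.text` and `.embedding` (all placeholders):

import numpy as np

def feedback_retrieval(query, llm, top_k=10):
    docs = vector_search(embed(query), top_k=top_k)          # 1. initial retrieval
    relevant, irrelevant = [], []
    for doc in docs:                                          # 2. LLM labels each doc
        verdict = llm.complete(
            f"Is this passage relevant to '{query}'? Answer yes or no.\n{doc.text}"
        )
        (relevant if verdict.lower().startswith("yes") else irrelevant).append(doc)
    if not irrelevant:
        return docs                                           # nothing to correct
    # 3. move the query vector toward relevant docs, away from irrelevant ones
    q = np.asarray(embed(query), dtype=float)
    if relevant:
        q = q + 0.5 * np.mean([d.embedding for d in relevant], axis=0)
    q = q - 0.25 * np.mean([d.embedding for d in irrelevant], axis=0)
    return vector_search(q, top_k=top_k)                      # 4. refined retrieval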
Memory and State Management
Agentic RAG systems maintain conversation memory and working memory:
Short-term (Working) Memory: Active context for current task
- Current query and sub-queries
- Retrieved documents in this session
- Intermediate reasoning traces
- Tool execution results
Long-term (Episodic) Memory: Historical interactions
- Past conversations with this user
- Previously successful retrieval strategies
- User preferences and context
MEMORY ARCHITECTURE

  WORKING MEMORY
  - Current query
  - Sub-queries
  - Reasoning trace
  - Retrieved docs
        ↕
  LONG-TERM MEMORY
  - Conversation history
  - User profile
  - Preferences
Memory enables:
- Contextual understanding: "Tell me more about that" refers to previous topic
- Progressive refinement: "No, I meant the Python version" corrects earlier query
- Personalization: Remembers user's expertise level and interests
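One way to wire the two tiers together, sketched as a bounded working memory plus a per-user long-term store. The `llm.summarize` helper is an assumption, not a specific library call:

from collections import deque

class AgentMemory:
    def __init__(self, llm, max_working_items=20):
        self.llm = llm
        self.working = deque(maxlen=max_working_items)  # current query, docs, traces
        self.long_term = {}                             # user_id -> list of summaries

    def remember(self, item):
        self.working.append(item)

    def end_session(self, user_id):
        # Compress the session into a summary before persisting it long term
        summary = self.llm.summarize(list(self.working))
        self.long_term.setdefault(user_id, []).append(summary)
        self.working.clear()

    def context_for(self, user_id):
        history = " ".join(self.long_term.get(user_id, []))
        return f"Prior context: {history}\nCurrent notes: {list(self.working)}"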
Evaluation and Self-Correction
Sophisticated agentic RAG systems self-evaluate their outputs:
Retrieval Quality Assessment
Before generating a final answer, the agent checks:
def evaluate_retrieval(query, documents):
scores = {
"relevance": llm.score_relevance(query, documents),
"coverage": llm.check_coverage(query, documents),
"recency": check_document_freshness(documents),
"diversity": measure_document_diversity(documents)
}
if scores["relevance"] < 0.7:
return "REFINE_QUERY" # Try different search terms
elif scores["coverage"] < 0.6:
return "EXPAND_SEARCH" # Need more sources
else:
return "PROCEED" # Good to generate answer
Answer Validation
After generating a response, validate it:
| Validation Check | Method | Action if Failed |
|---|---|---|
| Factual consistency | NLI model checks if answer contradicts sources | Regenerate without contradictory claims |
| Completeness | LLM judges if query fully answered | Retrieve additional info |
| Source attribution | Verify all claims cite retrieved docs | Add citations or remove unsupported claims |
| Coherence | Check logical flow and consistency | Restructure answer |
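A sketch of how these checks could be chained after generation, using LLM-as-judge prompts (an NLI model could replace the consistency check). The `llm.complete` helper is a placeholder:

def validate_answer(query, answer, source_docs, llm):
    context = "\n---\n".join(source_docs)
    contradicts = llm.complete(
        "Does any claim in the answer contradict the sources? Answer yes or no.\n"
        f"Sources:\n{context}\nAnswer:\n{answer}"
    ).lower().startswith("yes")
    complete = llm.complete(
        f"Does the answer fully address the question '{query}'? Answer yes or no.\n"
        f"{answer}"
    ).lower().startswith("yes")
    if contradicts:
        return "REGENERATE"      # rewrite without the unsupported claims
    if not complete:
        return "RETRIEVE_MORE"   # go back and retrieve additional information
    return "ACCEPT"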
Correction Strategies
When validation fails:
- Query refinement: Rephrase search terms
- Source expansion: Query additional knowledge bases
- Reasoning path adjustment: Try alternative decomposition
- Tool switching: Use different retrieval method
- Clarification request: Ask user for more context
⚠️ Low confidence answer detected
        ↓
  CORRECTION STRATEGY
    ├── Refine query
    ├── Expand sources
    └── Clarify with user
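A small dispatch sketch tying a detected failure mode to one of the strategies above; the handler functions are hypothetical names for your own implementations:

def choose_correction(failure_mode, query):
    # Map the detected failure to one of the correction strategies listed above
    handlers = {
        "low_relevance": refine_query,           # rephrase search terms
        "low_coverage": expand_sources,          # query additional knowledge bases
        "ambiguous_query": ask_user_to_clarify,  # request more context
    }
    handler = handlers.get(failure_mode, refine_query)  # default: reword the query
    return handler(query)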
Examples
Example 1: Multi-Step Research Query
User Query: "What are the latest advancements in quantum computing and what companies are leading the field?"
Traditional RAG would perform a single vector search and generate an answer. An agentic system:
── STEP 1: QUERY ANALYSIS & PLANNING ──
Agent reasoning: This query has two components:
A) Recent technical advancements
B) Leading companies
Strategy: Decompose into sub-queries with temporal constraint
── STEP 2: EXECUTE SUB-QUERY A ──
Action: vector_search(
query="quantum computing advancements breakthroughs",
date_filter="after:2023-01-01",
top_k=10
)
Observation: Retrieved 10 papers/articles about:
- Error correction improvements
- New qubit architectures
- Quantum advantage demonstrations
── STEP 3: EXECUTE SUB-QUERY B ──
Action: vector_search(
query="companies quantum computing development investment",
top_k=10
)
Observation: Found mentions of IBM, Google, IonQ, Rigetti
── STEP 4: ENRICHMENT ──
Agent reasoning: I have company names but no recent specifics.
Let me get their latest developments.
Action: For each company, search(
query=f"{company} quantum computing 2024"
)
Observation: Found specific projects:
- IBM: Condor processor (1121 qubits)
- Google: Willow chip error correction
- IonQ: Forte Enterprise system
── STEP 5: SYNTHESIS ──
Generate comprehensive answer combining:
✓ Technical advancements from Step 2
✓ Company leaders from Step 3
✓ Specific projects from Step 4
✓ Temporal context (latest = 2024)
Final Answer: "Recent quantum computing advancements include significant improvements in error correction rates and the development of larger qubit systems. Google's Willow chip demonstrated breakthrough error correction, while IBM's Condor processor achieved 1121 qubits. Leading companies include IBM, Google, IonQ (with their Forte Enterprise system), and Rigetti. These organizations have made the most substantial investments and achieved measurable quantum advantage in specific applications."
💡 Key Insight: The agent performed multiple targeted retrievals, synthesized information across temporal and topical dimensions, and validated company-specific claims with targeted searches.
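A sketch of the enrichment pattern from steps 3 and 4: extract entities from a broad first pass, then run one targeted follow-up search per entity. Here `vector_search` and `llm.complete` are placeholders, and retrieved documents are assumed to expose `.text`:

import json

def enrich_companies(llm, broad_query):
    broad_docs = vector_search(broad_query, top_k=10)
    # Extract the entities worth drilling into from the first-pass results
    companies = json.loads(llm.complete(
        "List the company names mentioned in these passages as a JSON array:\n"
        + "\n".join(d.text for d in broad_docs)
    ))
    # One focused retrieval per company keeps each follow-up specific and recent
    return {c: vector_search(f"{c} quantum computing 2024", top_k=3)
            for c in companies}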
Example 2: Tool Selection and Numerical Reasoning
User Query: "If a company's revenue grew from $50M in 2020 to $85M in 2024, what's the compound annual growth rate?"
This requires both retrieval (for the formula) and computation:
── STEP 1: TASK ANALYSIS ──
Agent reasoning: This needs:
1. CAGR formula (knowledge retrieval)
2. Numerical calculation (tool use)
── STEP 2: RETRIEVE FORMULA ──
Action: vector_search("CAGR formula compound annual growth rate")
Observation: Retrieved documents explaining:
CAGR = (Ending Value / Beginning Value)^(1/n) - 1
where n = number of years
── STEP 3: EXECUTE CALCULATION ──
Agent reasoning: I have the formula, now compute.
Action: calculator(
expression="(85/50)**(1/4) - 1"
)
Observation: Result = 0.1418 (14.18%)
── STEP 4: VALIDATION ──
Agent reasoning: Let me verify this makes sense.
- 50 * 1.1418^4 = 50 * 1.70 ≈ 85 ✓
- Growth rate seems reasonable for a growing company ✓
── STEP 5: GENERATE EXPLANATION ──
Combine formula retrieval + calculation + interpretation
Implementation code:
from langchain.agents import AgentExecutor, create_react_agent
from langchain.tools import Tool
tools = [
Tool(
name="vector_search",
func=lambda q: vector_db.search(q, top_k=5),
description="Search knowledge base for information"
),
Tool(
name="calculator",
func=lambda expr: eval(expr), # In production, use safe eval
description="Perform mathematical calculations"
)
]
agent = create_react_agent(llm=llm, tools=tools, prompt=agent_prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
result = agent_executor.invoke({
"input": "If revenue grew from $50M in 2020 to $85M in 2024, what's the CAGR?"
})
Final Answer: "The compound annual growth rate (CAGR) is 14.18%. This is calculated using the formula CAGR = (Ending Value / Beginning Value)^(1/n) - 1, where the ending value is $85M, beginning value is $50M, and n is 4 years. This represents strong sustained growth over the period."
Example 3: Self-Correction with Feedback Loop
User Query: "What was the impact of the 2023 banking crisis on tech startups?"
Watch how the agent self-corrects:
── ITERATION 1: Initial Attempt ──
Action: vector_search("2023 banking crisis tech startups")
Observation: Retrieved documents about:
- General banking crisis overview
- SVB collapse
- Regulatory responses
Self-Evaluation: ⚠️ Missing specific impact on startups!
Relevance score: 0.6/1.0 (too low)
Coverage: General context OK, but lacks startup-specific data
── ITERATION 2: Refined Query ──
Agent reasoning: I need more specific information about
how startups were affected. Let me refine my query.
Action: vector_search(
"Silicon Valley Bank SVB collapse startup funding impact"
)
Observation: Better! Found:
- 50% of US startups banked with SVB
- Funding delays and concerns
- Bridge financing solutions
Self-Evaluation: ⚠️
Much better relevance (0.85/1.0)
But still missing: quantitative impact data
── ITERATION 3: Expand with Structured Data ──
Agent reasoning: I should look for actual numbers.
Let me try a SQL query for structured data.
Action: sql_query("""
SELECT metric, value, quarter
FROM startup_funding_data
WHERE year = 2023 AND quarter IN ('Q1', 'Q2')
""")
Observation: Retrieved:
- Q1 2023: $76B funding
- Q2 2023: $52B funding (32% decline)
- Avg deal size decreased 28%
Self-Evaluation: ✅
Now I have complete information:
✓ Context (SVB collapse)
✓ Qualitative impact (funding concerns)
✓ Quantitative impact (specific numbers)
✓ Timeline (Q1-Q2 2023)
Confidence: 0.92/1.0 → Proceed to generation
Final Answer: "The 2023 banking crisis, particularly the collapse of Silicon Valley Bank in March 2023, significantly impacted tech startups. Approximately 50% of US venture-backed startups held deposits with SVB, causing immediate liquidity concerns. Funding declined 32% from Q1 to Q2 2023 (from $76B to $52B), and average deal sizes decreased 28%. Many startups faced delayed funding rounds and pursued bridge financing while seeking alternative banking relationships. The crisis accelerated a broader venture funding slowdown that had begun in late 2022."
💡 Key Insight: The agent detected insufficient information quality, refined its approach twice using different strategies (query refinement, then tool switching to structured data), and only generated the final answer when confidence was high.
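The same escalation can be written as a small quality-gated loop: score the accumulated evidence after each attempt, refine the query once, and switch to structured data if quality is still low. Here `vector_search`, `sql_query`, and the scoring prompt are placeholders:

def answer_with_correction(query, llm, threshold=0.8):
    attempts = [
        lambda: vector_search(query, top_k=10),                       # iteration 1
        lambda: vector_search(llm.complete(                            # iteration 2
            f"Rewrite this search query to be more specific: {query}"), top_k=10),
        lambda: sql_query(                                             # iteration 3
            "SELECT metric, value, quarter FROM startup_funding_data "
            "WHERE year = 2023"),
    ]
    evidence = []
    for attempt in attempts:
        evidence.extend(attempt())
        score = float(llm.complete(
            f"On a scale of 0 to 1, how well does this evidence answer "
            f"'{query}'? Reply with a number only.\n{evidence}"
        ))
        if score >= threshold:
            break
    return llm.complete(f"Answer '{query}' using only this evidence: {evidence}")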
Example 4: Multi-Agent Collaboration System
User Query: "Analyze the legal and financial implications of the proposed Microsoft-Activision merger."
This requires expertise from multiple domains:
               ORCHESTRATOR AGENT
        Analyzes query, routes to specialists
                      |
                      |  Decompose into:
                      |  - Legal analysis
                      |  - Financial analysis
                      |
              ┌───────┴────────┐
              ↓                ↓
         LEGAL AGENT      FINANCE AGENT
LEGAL AGENT PROCESS:
1. Search antitrust case law
   → Clayton Act precedents
   → Market concentration thresholds
2. Retrieve regulatory filings
   → FTC concerns about gaming market
   → EU Commission statements
3. Analysis: Identifies key issues:
- Market share in gaming (30%+ combined)
- Exclusive content concerns (Call of Duty)
- Regulatory approval risks
FINANCE AGENT PROCESS:
1. Query financial databases
   → Microsoft balance sheet
   → Activision valuation metrics
2. Calculate deal metrics
   → $68.7B purchase price
   → 45% premium to market price
   → Microsoft's cash reserves: $100B+
3. Retrieve analyst reports
   → Revenue synergies estimates
   → Integration cost projections
ORCHESTRATOR SYNTHESIS
Combines both analyses:
✓ Legal risks and mitigation strategies
✓ Financial structure and affordability
✓ Cross-cutting insights (e.g., legal delays
impact financing costs)
Implementation architecture:
class SpecializedAgent:
def __init__(self, domain, tools, llm):
self.domain = domain
self.tools = tools
self.llm = llm
def analyze(self, query):
# Domain-specific retrieval and reasoning
results = []
for tool in self.tools:
result = tool.execute(query)
results.append(result)
return self.synthesize(results)
legal_agent = SpecializedAgent(
domain="legal",
tools=[case_law_search, regulatory_filing_search],
llm=llm
)
finance_agent = SpecializedAgent(
domain="finance",
tools=[financial_db_query, analyst_report_search, calculator],
llm=llm
)
class OrchestratorAgent:
def __init__(self, specialist_agents, llm):
self.specialists = specialist_agents
self.llm = llm
def process_query(self, query):
# Route to specialists
tasks = self.decompose_query(query)
results = {}
for task in tasks:
agent = self.route_to_specialist(task)
results[task] = agent.analyze(task)
# Synthesize specialist outputs
return self.synthesize_multi_agent(results)
orchestrator = OrchestratorAgent(
specialist_agents=[legal_agent, finance_agent],
llm=llm
)
Final Answer: "The Microsoft-Activision merger presents significant legal and financial considerations. Legally, the $68.7B acquisition faces antitrust scrutiny due to combined market share exceeding 30% in gaming. Key regulatory concerns include potential exclusive content arrangements (particularly Call of Duty) and market concentration. FTC and EU regulators required behavioral remedies including 10-year commitments to maintain cross-platform availability. Financially, Microsoft is well-positioned with over $100B in cash reserves, making the 45% premium sustainable. Projected revenue synergies of $2B+ annually justify the valuation, though integration costs are estimated at $1-2B. The 18-month regulatory approval process increased financing costs by approximately $500M in interest expenses."
💡 Key Insight: Multiple specialized agents with domain-specific knowledge bases and tools can tackle complex queries requiring diverse expertise more effectively than a single generalist agent.
Common Mistakes
⚠️ Over-Complicated Agent Loops
Mistake: Creating agents that loop endlessly or make too many retrieval calls.
# ❌ BAD: No termination conditions
while True:
results = agent.retrieve(query)
query = agent.refine(query, results)
# This could run forever!
Solution: Implement explicit termination conditions:
# ✅ GOOD: Clear stopping criteria
max_iterations = 5
confidence_threshold = 0.85
for iteration in range(max_iterations):
results = agent.retrieve(query)
confidence = agent.evaluate_quality(results)
if confidence >= confidence_threshold:
break # Good enough!
if iteration == max_iterations - 1:
# Use best attempt so far
break
query = agent.refine(query, results)
⚠️ Ignoring Tool Cost and Latency
Mistake: Treating all tools as equally expensive to call.
# ❌ BAD: Calling expensive API for simple query
if "what time" in query:
agent.call_expensive_llm_tool() # Overkill!
Solution: Implement tool cost awareness:
# ✅ GOOD: Cost-aware tool selection
tools = [
Tool(name="regex", cost=0.0001, latency_ms=1),
Tool(name="vector_search", cost=0.001, latency_ms=50),
Tool(name="llm_call", cost=0.02, latency_ms=1000),
]
def select_tool(query, tools):
# Try cheapest tools first
for tool in sorted(tools, key=lambda t: t.cost):
if tool.can_handle(query):
return tool
return tools[-1] # Fallback to most capable
⚠️ Insufficient Error Handling
Mistake: Assuming retrieval always succeeds.
# ❌ BAD: No handling of empty results
results = vector_search(query)
return generate_answer(results[0]) # Crashes if empty!
Solution: Robust error handling with fallbacks:
# ✅ GOOD: Handle failure cases
try:
results = vector_search(query)
if not results:
# Try alternative strategy
results = keyword_search(query)
if not results:
return {
"answer": "I couldn't find relevant information.",
"confidence": 0.0,
"suggestion": "Try rephrasing your question or asking something more specific."
}
return generate_answer(results)
except Exception as e:
log_error(e)
return fallback_response(query)
⚠️ Poor Prompt Engineering for Agent Reasoning
Mistake: Vague instructions for the agent's reasoning process.
# ❌ BAD: Unclear guidance
prompt = "Answer the question using available tools."
Solution: Explicit reasoning templates:
# ✅ GOOD: Structured reasoning format
prompt = """
You are an agent with access to these tools: {tool_descriptions}
For each step, follow this format:
Thought: [Analyze what information you need next]
Action: [Choose a tool to use: {tool_names}]
Action Input: [Specify the input for the tool]
Observation: [The tool's result will appear here]
... (repeat Thought/Action/Observation as needed)
Thought: I now have enough information to answer
Final Answer: [Your comprehensive answer]
Query: {query}
Let's begin:
Thought:"""
⚠️ Not Validating Agent Outputs
Mistake: Trusting generated tool calls without validation.
# ❌ BAD: Executing arbitrary code
action = agent.next_action()
exec(action) # Dangerous!
Solution: Validate and sanitize:
# ✅ GOOD: Whitelist and validate
ALLOWED_TOOLS = {"vector_search", "calculator", "sql_query"}
action = agent.next_action()
if action.tool_name not in ALLOWED_TOOLS:
raise ValueError(f"Tool {action.tool_name} not allowed")
if action.tool_name == "sql_query":
# Validate SQL to prevent injection
if not is_safe_sql(action.input):
raise ValueError("Unsafe SQL query")
result = execute_tool(action)
⚠️ Memory Management Issues
Mistake: Not limiting context window size as conversation grows.
# ❌ BAD: Unbounded memory growth
conversation_history.append(user_message)
conversation_history.append(agent_response)
# Eventually exceeds token limits!
Solution: Implement memory summarization:
# ✅ GOOD: Bounded memory with summarization
MAX_MESSAGES = 20
if len(conversation_history) > MAX_MESSAGES:
# Summarize older messages
old_messages = conversation_history[:10]
summary = llm.summarize(old_messages)
conversation_history = [
{"role": "system", "content": f"Previous context: {summary}"}
] + conversation_history[10:]
conversation_history.append(user_message)
Key Takeaways
Core Principles:
Agentic RAG = RAG + Autonomous Reasoning: The system plans, acts, observes, and reflects rather than following a fixed pipeline.
The Agent Loop is Central: Plan → Act → Observe → Reflect is the fundamental pattern for dynamic retrieval.
Tool Augmentation Expands Capabilities: Vector search is just one tool among many (SQL, calculators, APIs, code execution).
Multi-Step Reasoning Handles Complexity: Query decomposition, iterative refinement, and synthesis enable complex query answering.
Self-Correction Improves Quality: Agents should evaluate their own outputs and retry with refined strategies.
Memory Management is Critical: Both working memory (current task) and long-term memory (conversation history) must be managed.
Validation Prevents Failures: Always validate retrieval quality, tool outputs, and generated answers.
Specialized Agents for Expertise: Multi-agent systems with domain specialists outperform single generalist agents on complex queries.
Implementation Checklist:
- Define clear termination conditions for agent loops
- Implement cost-aware tool selection
- Build robust error handling and fallbacks
- Use structured prompts for reasoning transparency
- Validate and sanitize all agent actions
- Manage memory with summarization strategies
- Add self-evaluation before final answer generation
- Monitor agent performance and iteration counts
- Test with queries requiring multi-step reasoning
- Implement logging for debugging agent decisions
Performance Considerations:
- Agentic systems trade latency for quality (multiple retrieval rounds)
- Optimize by caching frequent tool results (see the sketch after this list)
- Parallelize independent sub-queries when possible
- Set appropriate timeouts for each agent action
- Balance between exploration (trying new strategies) and exploitation (using proven approaches)
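For example, a simple cache around the retrieval tool keeps retries and overlapping sub-queries from paying for the same search twice; `vector_search` is again a placeholder:

from functools import lru_cache

@lru_cache(maxsize=1024)
def cached_search(query, top_k=5):
    # lru_cache requires hashable return values, so wrap the results in a tuple
    return tuple(vector_search(query, top_k=top_k))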
Further Study
ReAct Paper: "ReAct: Synergizing Reasoning and Acting in Language Models" - Original research on interleaving reasoning and action
LangChain Agent Documentation: https://python.langchain.com/docs/modules/agents/ - Practical implementation guide for building agents
HyDE Method: "Precise Zero-Shot Dense Retrieval without Relevance Labels" - Hypothetical document embeddings technique
Quick Reference: Agentic RAG Architecture
| Pattern | Use Case | Key Component |
|---|---|---|
| ReAct | Transparent reasoning traces | Thought→Action→Observation loops |
| Tool-Augmented | Multi-modal information needs | Toolbox with diverse capabilities |
| Multi-Agent | Domain expertise required | Orchestrator + specialists |
| Query Decomposition | Complex multi-part queries | Sub-query generation and synthesis |
| Iterative Refinement | Low initial retrieval quality | Feedback loop with quality scoring |
| HyDE | Sparse retrieval results | Hypothetical document generation |
Agent Loop Phases:
| Phase | Purpose |
|---|---|
| PLAN | Decompose query, select strategy |
| ACT | Execute tool calls |
| OBSERVE | Process results |
| REFLECT | Evaluate quality, decide next step |
Termination Conditions: Max iterations (3-5) | Confidence threshold (>0.85) | Empty results | User satisfaction