What is Agentic Retrieval? The Next Evolution of RAG
How agentic retrieval goes beyond traditional RAG by letting AI agents dynamically plan and execute multi-step search strategies with tool calling.

Traditional RAG (Retrieval-Augmented Generation) follows a fixed pattern: encode the query, retrieve the top-K documents, stuff them into a prompt, and generate a response. This works well for simple questions but falls apart when the information need is complex, ambiguous, or requires reasoning across multiple sources.
Agentic retrieval is the next evolution: instead of a fixed retrieve-then-generate pipeline, an AI agent dynamically decides how to search, what tools to use, and when to refine its queries based on intermediate results.
How Traditional RAG Falls Short
Consider the question: "Compare Mixpeek's video processing capabilities with its document processing features, and explain which is better for a media company with 10TB of mixed content."
A traditional RAG system would:
- Encode the entire question as one vector
- Retrieve the top-5 most similar chunks
- Hope that those 5 chunks cover video processing AND document processing AND the capacity considerations for 10TB of mixed content
This rarely works. The single query embedding is a compromise between multiple information needs, and the retrieved chunks are unlikely to cover all aspects of the question.
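To make the failure mode concrete, here is a minimal sketch of that fixed pipeline. The embed, vector_store, and llm helpers are hypothetical stand-ins for whatever embedding model, vector database, and LLM client you use:

```python
# Minimal sketch of a fixed retrieve-then-generate pipeline.
# `embed`, `vector_store`, and `llm` are hypothetical stand-ins for your
# embedding model, vector database, and LLM client.

def traditional_rag(question: str, top_k: int = 5) -> str:
    query_vector = embed(question)                            # one embedding for the whole question
    chunks = vector_store.search(query_vector, top_k=top_k)   # one-shot retrieval
    context = "\n\n".join(chunk.text for chunk in chunks)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm(prompt)                                         # single generation, no refinement
```

Every part of the question competes for space in that single query vector, and nothing in the pipeline notices when the retrieved chunks only cover one aspect.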
How Agentic Retrieval Works
An agentic retrieval system gives an LLM access to retrieval tools and lets it plan a multi-step search strategy:
- Decomposition — The agent breaks the complex question into sub-queries: "video processing capabilities", "document processing features", "capacity for 10TB mixed content"
- Tool Selection — For each sub-query, the agent chooses the most appropriate search tool: vector search for conceptual questions, keyword search for specific features, metadata filtering for capacity specs
- Evaluation — After each retrieval step, the agent evaluates whether the results are sufficient or if it needs to refine the query
- Synthesis — Once all sub-queries are answered, the agent synthesizes a comprehensive response with proper attribution
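In Mixpeek, this strategy is expressed as an agent_search stage on a retriever: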
```python
from mixpeek import Mixpeek

client = Mixpeek(api_key="your-api-key")

# Create a retriever with an agent_search stage
retriever = client.retrievers.create(
    name="agentic_retriever",
    collection_id="knowledge-base",
    stages=[
        {
            "type": "agent_search",
            "model": "gpt-4",
            "tools": [
                {
                    "type": "vector_search",
                    "description": "Search by semantic similarity",
                    "parameters": {"top_k": 20}
                },
                {
                    "type": "keyword_search",
                    "description": "Search by exact keyword match",
                    "parameters": {"fields": ["title", "content"]}
                },
                {
                    "type": "filter",
                    "description": "Filter by metadata fields",
                    "parameters": {"fields": ["category", "date", "modality"]}
                }
            ],
            "max_iterations": 5
        }
    ]
)

# The agent autonomously plans its search strategy
results = client.retrievers.execute(
    retriever_id=retriever.id,
    query="Compare video vs document processing for a media company with 10TB of content"
)
```
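Conceptually, the agent runs a loop along these lines. The sketch below is illustrative rather than Mixpeek's internal implementation: the llm_decompose, llm_pick_tool, llm_is_sufficient, and llm_synthesize helpers are hypothetical wrappers around LLM calls, and TOOLS is a dict mapping tool names to search functions.

```python
# Illustrative agent loop: decompose, select tools, evaluate, synthesize.
# All llm_* helpers and TOOLS are hypothetical, not part of the Mixpeek SDK.

def agentic_retrieval(question: str, max_iterations: int = 5) -> str:
    sub_queries = llm_decompose(question)         # 1. Decomposition into sub-queries
    scratchpad = []                               # working memory shared across iterations
    for sub_query in sub_queries:
        for _ in range(max_iterations):           # guardrail against endless searching
            tool_name, params = llm_pick_tool(sub_query, TOOLS, scratchpad)  # 2. Tool selection
            results = TOOLS[tool_name](sub_query, **params)
            scratchpad.append({"query": sub_query, "tool": tool_name, "results": results})
            if llm_is_sufficient(sub_query, scratchpad):   # 3. Evaluation / early stop
                break
    return llm_synthesize(question, scratchpad)   # 4. Synthesis with attribution
```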
Key Components
Tool Schemas
Each retrieval tool is described with a schema that tells the agent what it does and what parameters it accepts. The agent uses these descriptions to decide which tool to call for each sub-query. Well-written tool descriptions are critical — they are the agent's instruction manual.
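For example, a vector search tool might be described with a schema like the one below. This is a hypothetical example in the common JSON-schema / function-calling style; Mixpeek's own format is the one shown in the retriever configuration above.

```python
# Hypothetical tool schema in the JSON-schema / function-calling style.
# The description and parameter docs are what the agent reads when deciding
# which tool to call, so they should be specific and unambiguous.
vector_search_tool = {
    "name": "vector_search",
    "description": (
        "Search the knowledge base by semantic similarity. "
        "Best for conceptual or paraphrased questions, not exact keyword matches."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Natural-language search query"},
            "top_k": {"type": "integer", "description": "Number of chunks to return", "default": 20},
        },
        "required": ["query"],
    },
}
```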
Working Memory
The agent maintains a scratchpad of retrieved information across iterations. This prevents redundant searches and allows the agent to build up context incrementally. Each new tool call is informed by what the agent has already found.
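A simple way to implement this is a running list of search records that gets injected into every planning prompt. The shape below is one possibility, not a prescribed format; summarize is a hypothetical helper that condenses raw results.

```python
# Working memory as a running log of prior searches. Injecting it into each
# planning prompt lets the agent see what it already knows and avoid
# repeating searches. `summarize` is a hypothetical helper.
scratchpad = []

def record_search(sub_query: str, tool: str, results: list) -> None:
    scratchpad.append({
        "sub_query": sub_query,
        "tool": tool,
        "summary": summarize(results),
    })

def planning_context() -> str:
    # Rendered into the agent's next planning prompt.
    return "\n".join(
        f"- [{entry['tool']}] {entry['sub_query']}: {entry['summary']}"
        for entry in scratchpad
    )
```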
Iteration Limits
Without guardrails, an agent can loop indefinitely — searching, finding nothing useful, and searching again with slight variations. Set a maximum iteration count (typically 3-5) and implement early stopping when the agent determines it has sufficient information.
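In code, the guardrail is just a bounded loop with an explicit sufficiency check, roughly as follows (plan_next_search, run_search, and has_enough_information are hypothetical helpers):

```python
# Bounded agent loop: hard cap on iterations plus early stopping once the
# agent judges it has enough information. All helpers are hypothetical.

def bounded_agent_loop(question: str, max_iterations: int = 5) -> list:
    scratchpad = []
    for _ in range(max_iterations):                    # hard cap prevents infinite loops
        next_search = plan_next_search(question, scratchpad)
        if next_search is None:                        # agent decided nothing more is needed
            break
        scratchpad.append(run_search(next_search))
        if has_enough_information(question, scratchpad):   # early stop before the cap
            break
    return scratchpad
```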
Agentic vs. Traditional RAG
| Aspect | Traditional RAG | Agentic Retrieval |
|---|---|---|
| Query handling | Single vector lookup | Multi-step decomposition |
| Tool use | One search method | Multiple tools selected dynamically |
| Refinement | None — one-shot retrieval | Iterative based on results |
| Complex questions | Often incomplete answers | Comprehensive, multi-faceted answers |
| Latency | Low (single retrieval) | Higher (multiple retrieval rounds) |
| Cost | Lower (one embedding + one LLM call) | Higher (multiple LLM calls for planning) |
When to Use Agentic Retrieval
- Complex analytical questions that require information from multiple sources or perspectives
- Comparison queries where the user needs information about multiple entities
- Exploratory search where the user's information need is not fully specified upfront
- Multi-modal queries that require searching across different data types and combining results
For simple factual lookups ("What is the API rate limit?"), traditional RAG is faster and cheaper. Use agentic retrieval when the question complexity justifies the additional compute cost.
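One common pattern is to route by estimated query complexity: simple lookups go through a single-shot retriever, and only questions that need decomposition take the agentic path. A rough sketch, reusing the hypothetical traditional_rag and agentic_retrieval functions above, with classify_complexity as a hypothetical heuristic or LLM-based classifier:

```python
# Hypothetical router: cheap single-shot RAG for simple lookups,
# agentic retrieval only when the question warrants the extra cost.
def answer(question: str) -> str:
    if classify_complexity(question) == "simple":   # e.g. "What is the API rate limit?"
        return traditional_rag(question)
    return agentic_retrieval(question)
```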
Read more about this approach in our glossary entry on agentic retrieval, or explore our FAQ on RAG for foundational context.
