The Summarize stage uses language models to generate summaries from document sets. It can combine multiple documents into a single summary, produce per-document or per-group summaries, or answer questions based on the retrieved content.
Stage Category: REDUCE (aggregates documents)
Transformation: N documents → 1 summary document (or N documents with summaries)
When to Use
| Use Case | Description |
|---|---|
| RAG summarization | Generate answers from search results |
| Document synthesis | Combine multiple sources into one summary |
| Key points extraction | Distill long documents to their essentials |
| Question answering | Answer user questions from retrieved documents |
When NOT to Use
| Scenario | Recommended Alternative |
|---|---|
| Only formatting content for an LLM | rag_prepare (no LLM call) |
| Extracting structured data | llm_enrich |
| Real-time, low-latency responses | Pre-compute summaries |
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | string | Required | Summarization instructions (must include {{DOCUMENTS}}) |
| provider | string | google | LLM provider: openai, google, anthropic |
| model_name | string | provider default | Specific LLM model to use |
| content_field | string | content | Field containing the text to summarize |
| group_by | string | none | Field to group by (one summary per group); omit for a single summary |
| max_input_tokens | integer | 8000 | Maximum tokens to send to the LLM |
| include_sources | boolean | true | Add source document IDs to the output |
| output_field | string | summary | Field for the summary output |
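To make the defaults concrete, here is a minimal sketch of a client-side helper that fills in the documented defaults and enforces the {{DOCUMENTS}} requirement before a stage is submitted. The function name build_summarize_stage and its validation rules are hypothetical; they are not part of any Mixpeek SDK. provider and model_name are deliberately left unset so the stage can apply its own inference.

```python
from typing import Any

# Defaults copied from the parameter table. provider and model_name are left
# unset so the stage can apply its own inference rules.
DEFAULTS: dict[str, Any] = {
    "content_field": "content",
    "max_input_tokens": 8000,
    "include_sources": True,
    "output_field": "summary",
}
# Optional parameters with no fixed default in this sketch.
OPTIONAL = {"provider", "model_name", "group_by", "include_metadata"}


def build_summarize_stage(prompt: str, **overrides: Any) -> dict[str, Any]:
    """Hypothetical helper: return a summarize stage with documented defaults filled in."""
    if "{{DOCUMENTS}}" not in prompt:
        raise ValueError("prompt must include the {{DOCUMENTS}} placeholder")
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    # Explicit overrides win over the documented defaults.
    parameters = {"prompt": prompt, **DEFAULTS, **overrides}
    return {
        "stage_name": "summarize",
        "stage_type": "reduce",
        "config": {"stage_id": "summarize", "parameters": parameters},
    }
```

For example, build_summarize_stage("... {{DOCUMENTS}}", provider="openai", model_name="gpt-4o-mini") yields the same shape as the Q&A configuration, with the remaining defaults spelled out explicitly.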
Available Models
Set provider and model_name together. If provider is omitted, it is inferred from model_name; if both are omitted, the stage defaults to google with gemini-2.5-flash-lite.
| Provider | model_name example | Speed | Quality |
|---|---|---|---|
| google | gemini-2.5-flash-lite | Fast | Good |
| openai | gpt-4o-mini | Fast | Good |
| openai | gpt-4o | Medium | Excellent |
| anthropic | claude-haiku-4-5-20251001 | Fast | Good |
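The exact rules the stage uses to infer provider from model_name are not documented here. As an illustration only, the sketch below assumes a simple prefix match on the model names in the table (gemini → google, gpt → openai, claude → anthropic) and applies the documented google / gemini-2.5-flash-lite fallback. resolve_model and the prefix table are hypothetical.

```python
from typing import Optional

# Hypothetical prefix rules derived from the example model names above;
# the stage's real inference logic may differ.
PREFIX_TO_PROVIDER = {
    "gemini": "google",
    "gpt": "openai",
    "claude": "anthropic",
}
DEFAULT_PROVIDER = "google"
DEFAULT_MODEL = "gemini-2.5-flash-lite"


def resolve_model(provider: Optional[str], model_name: Optional[str]) -> tuple[str, str]:
    """Return (provider, model_name), mimicking the documented defaults."""
    if model_name is None:
        if provider in (None, DEFAULT_PROVIDER):
            return DEFAULT_PROVIDER, DEFAULT_MODEL
        # Other providers' default models are not listed on this page,
        # so this sketch asks for an explicit model_name instead of guessing.
        raise ValueError("set model_name when choosing a non-default provider")
    if provider is None:
        for prefix, inferred in PREFIX_TO_PROVIDER.items():
            if model_name.startswith(prefix):
                return inferred, model_name
        raise ValueError(f"cannot infer provider for {model_name!r}")
    return provider, model_name
```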
Configuration Examples
Q&A with Sources
{
"stage_name" : "summarize" ,
"stage_type" : "reduce" ,
"config" : {
"stage_id" : "summarize" ,
"parameters" : {
"provider" : "openai" ,
"model_name" : "gpt-4o-mini" ,
"prompt" : "Based on the provided documents, answer the user's question: {{INPUT.query}} \n\n {{DOCUMENTS}}" ,
"include_sources" : true
}
}
}
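The prompt above mixes two kinds of placeholders: {{INPUT.query}} comes from the request inputs, and {{DOCUMENTS}} is replaced by the retrieved content. This page does not document how the stage lays out {{DOCUMENTS}}. The sketch below assumes each document carries a hypothetical document_id field plus the configured content_field, and renders them as "[id] text" blocks separated by blank lines. render_prompt is illustrative, not the stage's implementation.

```python
import re
from typing import Any

# Matches {{NAME}} or {{INPUT.field}}, tolerating inner whitespace.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")


def render_prompt(template: str, inputs: dict[str, Any], documents: list[dict[str, Any]],
                  content_field: str = "content") -> str:
    """Fill {{INPUT.*}} and {{DOCUMENTS}} placeholders (illustrative only)."""
    # Assumed layout: one block per document, prefixed with its hypothetical ID field.
    docs_text = "\n\n".join(
        f"[{doc['document_id']}] {doc.get(content_field, '')}" for doc in documents
    )

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name == "DOCUMENTS":
            return docs_text
        if name.startswith("INPUT."):
            return str(inputs[name[len("INPUT."):]])
        # Leave other placeholders (e.g. GROUP_VALUE) for a later pass.
        return match.group(0)

    return PLACEHOLDER.sub(substitute, template)
```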
Grouping
Single Summary (default)
With no group_by, all documents are combined into one summary (N→1):
[Doc1, Doc2, Doc3] → "Combined summary of all documents..."
Per-Group Summaries
Set group_by to a field path to produce one summary per unique group value (N→M).
Use {{GROUP_VALUE}} in the prompt to reference the current group's value. For example:
group_by: "metadata.category"
[Doc1(A), Doc2(A), Doc3(B)] → ["A" summary, "B" summary]
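The N→1 and N→M behaviour can be sketched as bucketing documents by a dotted field path such as metadata.category. This is a minimal illustration under stated assumptions. get_path and group_documents are hypothetical names. How the real stage treats documents that lack the group_by field is not documented; in this sketch they land in a None bucket.

```python
from collections import defaultdict
from typing import Any, Optional


def get_path(doc: dict[str, Any], path: str) -> Any:
    """Resolve a dotted field path such as 'metadata.category'; None if missing."""
    value: Any = doc
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def group_documents(docs: list[dict[str, Any]],
                    group_by: Optional[str]) -> dict[Any, list[dict[str, Any]]]:
    """N→1 without group_by; N→M with it (one bucket per unique value, first-seen order)."""
    if group_by is None:
        return {None: list(docs)}
    groups: dict[Any, list[dict[str, Any]]] = defaultdict(list)
    for doc in docs:
        groups[get_path(doc, group_by)].append(doc)
    return dict(groups)
```

With the example above, group_documents(docs, "metadata.category") returns two buckets, "A" holding Doc1 and Doc2 and "B" holding Doc3. Each bucket would then get its own LLM call, with {{GROUP_VALUE}} set to the bucket key.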
Output Schema
The summary is written to output_field (default summary). When include_sources is
true, source_document_ids is added; when include_metadata is true, document_count
and tokens_used are added.
Single Summary (no group_by)
{
"summary" : "Based on the documents, the answer is..." ,
"source_document_ids" : [ "doc_123" , "doc_456" ],
"document_count" : 2 ,
"tokens_used" : 1250
}
Per-Group (with group_by)
One summary document per unique group value:
[
{
"summary" : "Summary for the electronics category..." ,
"source_document_ids" : [ "doc_123" , "doc_456" ],
"document_count" : 2
},
{
"summary" : "Summary for the clothing category..." ,
"source_document_ids" : [ "doc_789" ],
"document_count" : 1
}
]
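The rules in this section can be expressed as a small function that assembles one output document. This is a sketch, not the stage's code. build_output is hypothetical. The default of include_metadata is not stated on this page, so the sketch assumes false. It also assumes document_count equals the number of source documents, which matches the examples.

```python
from typing import Any


def build_output(summary: str, source_ids: list[str], tokens_used: int,
                 output_field: str = "summary", include_sources: bool = True,
                 include_metadata: bool = False) -> dict[str, Any]:
    """Assemble one output document following the schema described above."""
    doc: dict[str, Any] = {output_field: summary}
    if include_sources:
        doc["source_document_ids"] = list(source_ids)
    if include_metadata:
        # Assumption: the count is the number of documents that fed this summary.
        doc["document_count"] = len(source_ids)
        doc["tokens_used"] = tokens_used
    return doc
```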
Performance
| Metric | Value |
|---|---|
| Latency | 500-2000 ms |
| Token usage | Depends on input size |
| Max input | Model context window (input is capped by max_input_tokens) |
| Streaming | Supported |
Summarization calls the LLM and incurs API costs. Use rag_prepare if you only need to format content for external LLM calls.
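Because cost and latency grow with input size, and oversized input is truncated rather than rejected, it helps to picture how a token budget like max_input_tokens might be applied. The sketch below keeps documents in ranked order until the budget is spent and cuts the last one short. It uses a rough 4-characters-per-token heuristic instead of the model's real tokenizer, and it ignores the tokens used by the prompt template itself. Both are simplifying assumptions, and fit_to_budget is a hypothetical name.

```python
CHARS_PER_TOKEN = 4  # rough heuristic, not the model's real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token count, rounding up."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def fit_to_budget(texts: list[str], max_input_tokens: int = 8000) -> list[str]:
    """Keep texts in order until the token budget is spent, cutting the last one short."""
    kept: list[str] = []
    remaining = max_input_tokens
    for text in texts:
        cost = estimate_tokens(text)
        if cost <= remaining:
            kept.append(text)
            remaining -= cost
        else:
            if remaining > 0:
                # Truncate so the partial text fits exactly in what is left.
                kept.append(text[: remaining * CHARS_PER_TOKEN])
            break
    return kept
```

Under this heuristic, the Multi-Document Synthesis pattern (20 documents, max_input_tokens of 32000) leaves room for about 1,600 tokens per document before truncation starts.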
Common Pipeline Patterns
Full RAG Pipeline
[
{
"stage_name" : "hybrid_search" ,
"stage_type" : "filter" ,
"config" : {
"stage_id" : "feature_search" ,
"parameters" : {
"searches" : [
{ "feature_uri" : "mixpeek://text_extractor@v1/multilingual_e5_large_instruct_v1" , "query" : { "input_mode" : "text" , "value" : "{{INPUT.query}}" }, "top_k" : 50 }
],
"final_top_k" : 50
}
}
},
{
"stage_name" : "rerank" ,
"stage_type" : "sort" ,
"config" : {
"stage_id" : "rerank" ,
"parameters" : {
"inference_name" : "BAAI__bge_reranker_v2_m3" ,
"top_k" : 10
}
}
},
{
"stage_name" : "summarize" ,
"stage_type" : "reduce" ,
"config" : {
"stage_id" : "summarize" ,
"parameters" : {
"provider" : "openai" ,
"model_name" : "gpt-4o" ,
"prompt" : "Answer the user's question based on the provided documents: {{INPUT.query}} \n\n {{DOCUMENTS}}" ,
"include_sources" : true
}
}
}
]
Multi-Document Synthesis
[
{
"stage_name" : "semantic_search" ,
"stage_type" : "filter" ,
"config" : {
"stage_id" : "feature_search" ,
"parameters" : {
"searches" : [
{ "feature_uri" : "mixpeek://text_extractor@v1/multilingual_e5_large_instruct_v1" , "query" : { "input_mode" : "text" , "value" : "{{INPUT.topic}}" }, "top_k" : 20 }
],
"final_top_k" : 20
}
}
},
{
"stage_name" : "structured_filter" ,
"stage_type" : "filter" ,
"config" : {
"stage_id" : "attribute_filter" ,
"parameters" : {
"conditions" : {
"field" : "metadata.type" ,
"operator" : "eq" ,
"value" : "research_paper"
}
}
}
},
{
"stage_name" : "summarize" ,
"stage_type" : "reduce" ,
"config" : {
"stage_id" : "summarize" ,
"parameters" : {
"provider" : "anthropic" ,
"model_name" : "claude-haiku-4-5-20251001" ,
"prompt" : "Synthesize the research findings from these papers on {{INPUT.topic}}. Identify common themes, contradictions, and gaps in the research. \n\n {{DOCUMENTS}}" ,
"max_input_tokens" : 32000
}
}
}
]
Preview Summaries
[
{
"stage_name" : "semantic_search" ,
"stage_type" : "filter" ,
"config" : {
"stage_id" : "feature_search" ,
"parameters" : {
"searches" : [
{ "feature_uri" : "mixpeek://text_extractor@v1/multilingual_e5_large_instruct_v1" , "query" : { "input_mode" : "text" , "value" : "{{INPUT.query}}" }, "top_k" : 10 }
],
"final_top_k" : 10
}
}
},
{
"stage_name" : "summarize" ,
"stage_type" : "reduce" ,
"config" : {
"stage_id" : "summarize" ,
"parameters" : {
"provider" : "openai" ,
"model_name" : "gpt-4o-mini" ,
"prompt" : "Create a one-sentence summary for the '{{GROUP_VALUE}}' group: \n\n {{DOCUMENTS}}" ,
"group_by" : "metadata.source" ,
"output_field" : "preview"
}
}
}
]
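Two prompt mistakes are easy to make in pipelines like these: forgetting {{DOCUMENTS}}, and using {{GROUP_VALUE}} in a stage that has no group_by. The following is a minimal pre-flight check over a pipeline definition written in the JSON shape used above. check_summarize_stages is hypothetical. Treating {{GROUP_VALUE}} without group_by as an error is this sketch's own rule, based on the Grouping section.

```python
from typing import Any


def check_summarize_stages(pipeline: list[dict[str, Any]]) -> list[str]:
    """Return human-readable problems found in summarize stages."""
    problems: list[str] = []
    for index, stage in enumerate(pipeline):
        config = stage.get("config", {})
        if config.get("stage_id") != "summarize":
            continue  # only summarize stages are checked
        params = config.get("parameters", {})
        prompt = params.get("prompt", "")
        if "{{DOCUMENTS}}" not in prompt:
            problems.append(f"stage {index}: prompt is missing {{{{DOCUMENTS}}}}")
        if "{{GROUP_VALUE}}" in prompt and not params.get("group_by"):
            problems.append(f"stage {index}: {{{{GROUP_VALUE}}}} used without group_by")
    return problems
```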
Comparison: summarize vs rag_prepare
| Feature | summarize | rag_prepare |
|---|---|---|
| Calls LLM | Yes | No |
| Output | Generated summary | Formatted context |
| Latency | 500-2000 ms | < 10 ms |
| Cost | LLM API costs | Free |
| Use case | End-to-end RAG | Preparing context for an external LLM |
Error Handling
| Error | Behavior |
|---|---|
| Token limit exceeded | Truncates input and continues |
| LLM timeout | Retries once, then fails |
| Rate limit | Automatic backoff |
| Empty input | Returns an empty summary |
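The timeout and rate-limit rows describe two different retry policies. The sketch below shows one way to combine them around a single LLM call: retry a timeout exactly once, and back off exponentially on rate limits. The exception classes, call_with_policy, the 0.5 s base delay, and the cap of five rate-limit retries are all hypothetical. The page only says "automatic backoff" and does not specify a schedule or limit.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class LLMTimeout(Exception):
    """Hypothetical: raised when the provider call times out."""


class RateLimited(Exception):
    """Hypothetical: raised when the provider returns a rate-limit error."""


def call_with_policy(call: Callable[[], T], max_rate_limit_retries: int = 5,
                     base_delay: float = 0.5,
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry a timeout once; back off exponentially on rate limits."""
    timeouts = 0
    rate_limit_hits = 0
    while True:
        try:
            return call()
        except LLMTimeout:
            timeouts += 1
            if timeouts > 1:  # second timeout: give up
                raise
        except RateLimited:
            if rate_limit_hits >= max_rate_limit_retries:
                raise
            sleep(base_delay * 2 ** rate_limit_hits)  # 0.5 s, 1 s, 2 s, ...
            rate_limit_hits += 1
```

Injecting sleep keeps the policy testable without real delays.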