The Rerank stage uses cross-encoder models to re-score and reorder search results. Unlike bi-encoder models (used in semantic search), cross-encoders process the query and document together, enabling more accurate relevance scoring at the cost of higher latency.
Stage Category: SORT (Reorders documents)

Transformation: N documents → top_k documents (re-ranked by relevance)
When to Use
| Use Case | Description |
| --- | --- |
| Two-stage retrieval | Fast recall (search) + precise ranking (rerank) |
| High-precision requirements | When ranking quality is critical |
| Top-N optimization | Improve quality of final displayed results |
| RAG applications | Better context selection for LLM generation |
When NOT to Use
| Scenario | Recommended Alternative |
| --- | --- |
| Large result sets (1000+) | Too slow; use sort_by_field |
| Real-time requirements (< 20ms) | Use search scores directly |
| Simple attribute sorting | sort_by_field |
Parameters
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| inference_name | string | BAAI__bge_reranker_v2_m3 | Reranking inference service. List options with GET /engine/inference. Ignored when feature_uri is set. |
| feature_uri | string | null | Custom reranker plugin (mixpeek://...). Overrides inference_name. |
| top_k | integer | null | Number of results to keep after reranking (omit to keep all, reordered). |
| query | string | {{INPUT.query}} | Query for relevance scoring |
| document_field | string | content | Document field to rerank against |
Available Models
The built-in reranker is BAAI__bge_reranker_v2_m3 (multilingual cross-encoder). For other models, deploy your own via a custom extractor and reference it with feature_uri.
Configuration Examples
Basic Reranking
{
  "stage_name": "rerank",
  "stage_type": "sort",
  "config": {
    "stage_id": "rerank",
    "parameters": {
      "inference_name": "BAAI__bge_reranker_v2_m3",
      "top_k": 10
    }
  }
}
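For pipelines assembled in code, the stage above can be produced by a small helper. The following is a minimal sketch in Python. `build_rerank_stage` is a hypothetical name, not part of any SDK; it only emits the JSON shape shown on this page and leaves out unset parameters so that the documented defaults apply.

```python
import json
from typing import Any, Dict, Optional


def build_rerank_stage(
    top_k: Optional[int] = None,
    inference_name: str = "BAAI__bge_reranker_v2_m3",
    feature_uri: Optional[str] = None,
    query: Optional[str] = None,
    document_field: Optional[str] = None,
    stage_name: str = "rerank",
) -> Dict[str, Any]:
    """Hypothetical helper: build a rerank stage dict in the shape shown above."""
    params: Dict[str, Any] = {}
    if feature_uri is not None:
        # Per the parameter table, feature_uri overrides inference_name,
        # so only the URI is emitted when one is given.
        params["feature_uri"] = feature_uri
    else:
        params["inference_name"] = inference_name
    # Optional parameters are omitted when unset so the stage defaults apply.
    if top_k is not None:
        params["top_k"] = top_k
    if query is not None:
        params["query"] = query
    if document_field is not None:
        params["document_field"] = document_field
    return {
        "stage_name": stage_name,
        "stage_type": "sort",
        "config": {"stage_id": "rerank", "parameters": params},
    }


basic_stage = build_rerank_stage(top_k=10)
print(json.dumps(basic_stage, indent=2))
```

Passing `query` or `document_field` overrides the defaults from the parameter table ({{INPUT.query}} and content); leaving `top_k` unset keeps all documents, reordered.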
How Cross-Encoders Work
| Bi-Encoder (Search) | Cross-Encoder (Rerank) |
| --- | --- |
| Query and doc encoded separately | Query + doc encoded together |
| Pre-compute doc embeddings | Must process each pair |
| Fast (< 10ms for millions) | Slower (50-100ms for 100 docs) |
| Good approximate ranking | Precise relevance scoring |
Cross-encoders see the full context of both query and document together, enabling better understanding of semantic relationships.
Two-Stage Retrieval Pattern
The recommended pattern is fast recall followed by precise reranking:
[
  {
    "stage_name": "search",
    "stage_type": "filter",
    "config": {
      "stage_id": "feature_search",
      "parameters": {
        "searches": [
          {
            "feature_uri": "mixpeek://text_extractor@v1/multilingual_e5_large_instruct_v1",
            "query": { "input_mode": "text", "value": "{{INPUT.query}}" },
            "top_k": 100
          }
        ],
        "final_top_k": 100
      }
    }
  },
  {
    "stage_name": "rerank",
    "stage_type": "sort",
    "config": {
      "stage_id": "rerank",
      "parameters": {
        "inference_name": "BAAI__bge_reranker_v2_m3",
        "top_k": 10
      }
    }
  }
]
Why this works:
Search stage: fast, retrieves 100 candidates (< 20ms)
Rerank stage: slower but precise, picks the best 10 (50-100ms)
Total: high-quality results in 70-120ms
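The same flow can be sketched in plain Python, independent of any retrieval engine. Here `cheap_score` and `precise_score` are hypothetical stand-ins for the search stage and the cross-encoder respectively; only the control flow (wide recall, then rerank a bounded candidate pool) reflects the pattern above.

```python
from typing import Callable, List, Tuple

Scorer = Callable[[str, str], float]


def two_stage_retrieve(
    query: str,
    corpus: List[str],
    cheap_score: Scorer,
    precise_score: Scorer,
    recall_k: int = 100,
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Recall recall_k candidates cheaply, then rerank them precisely."""
    # Stage 1: score the whole corpus with the cheap scorer, keep recall_k.
    candidates = sorted(corpus, key=lambda d: cheap_score(query, d), reverse=True)[:recall_k]
    # Stage 2: the expensive scorer only ever sees the candidate pool.
    reranked = sorted(
        ((d, precise_score(query, d)) for d in candidates),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return reranked[:top_k]


def cheap_score(query: str, doc: str) -> float:
    # Toy recall signal: number of distinct query words present in the document.
    return float(len(set(query.split()) & set(doc.split())))


def precise_score(query: str, doc: str) -> float:
    # Toy precision signal: rewards an exact phrase match, then shared words.
    return (10.0 if query in doc else 0.0) + cheap_score(query, doc)


corpus = [
    "password reset for router",
    "reset router password in five steps",
    "router buying guide",
    "garden hose repair",
]
results = two_stage_retrieve(
    "reset router password", corpus, cheap_score, precise_score, recall_k=3, top_k=2
)
print(results)
```

In this toy run the first two documents tie under the cheap scorer, and the precise scorer is what separates them, which is the role the rerank stage plays after search.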
| Metric | Value |
| --- | --- |
| Latency | 50-100ms (depends on candidate count) |
| Optimal input size | 50-200 documents |
| Maximum practical | ~500 documents |
| Batching | Automatic |
Reranking 1000+ documents is not recommended. Use top_k limits in the search stage to control candidate pool size.
Output
Each returned document includes:
| Field | Type | Description |
| --- | --- | --- |
| document_id | string | Unique document identifier |
| score | float | Reranker relevance score |
| original_score | float | Score from previous stage |
| rerank_position | integer | Position after reranking |
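How these fields relate to one another can be sketched as follows. This illustrates the field semantics in the table and is not the stage's actual implementation. In particular, this page does not state whether rerank_position is 0-based or 1-based; the sketch assumes 1-based. `apply_rerank` is a hypothetical name.

```python
from typing import Any, Dict, List, Optional


def apply_rerank(
    docs: List[Dict[str, Any]],
    rerank_scores: List[float],
    top_k: Optional[int] = None,
) -> List[Dict[str, Any]]:
    """Attach reranker scores to documents and reorder them."""
    if len(docs) != len(rerank_scores):
        raise ValueError("expected one reranker score per document")
    rescored = [
        {
            "document_id": doc["document_id"],
            "score": new_score,  # reranker relevance score
            "original_score": doc["score"],  # score from the previous stage
        }
        for doc, new_score in zip(docs, rerank_scores)
    ]
    rescored.sort(key=lambda d: d["score"], reverse=True)
    if top_k is not None:  # top_k omitted: keep every document, reordered
        rescored = rescored[:top_k]
    for position, doc in enumerate(rescored, start=1):  # ASSUMPTION: 1-based
        doc["rerank_position"] = position
    return rescored


search_results = [
    {"document_id": "doc_a", "score": 0.91},
    {"document_id": "doc_b", "score": 0.88},
    {"document_id": "doc_c", "score": 0.80},
]
reranked = apply_rerank(search_results, [0.12, 0.97, 0.55], top_k=2)
for row in reranked:
    print(row)
```

Keeping original_score alongside score makes it possible to compare the search ranking with the reranked one when debugging relevance.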
Common Pipeline Patterns
Search + Rerank + Limit
[
  {
    "stage_name": "search",
    "stage_type": "filter",
    "config": {
      "stage_id": "feature_search",
      "parameters": {
        "searches": [
          {
            "feature_uri": "mixpeek://text_extractor@v1/multilingual_e5_large_instruct_v1",
            "query": { "input_mode": "text", "value": "{{INPUT.query}}" },
            "top_k": 100
          }
        ],
        "final_top_k": 100
      }
    }
  },
  {
    "stage_name": "rerank",
    "stage_type": "sort",
    "config": {
      "stage_id": "rerank",
      "parameters": {
        "inference_name": "BAAI__bge_reranker_v2_m3",
        "top_k": 20
      }
    }
  },
  {
    "stage_name": "limit",
    "stage_type": "reduce",
    "config": {
      "stage_id": "limit",
      "parameters": {
        "limit": 5
      }
    }
  }
]
Search + Filter + Rerank
[
  {
    "stage_name": "search",
    "stage_type": "filter",
    "config": {
      "stage_id": "feature_search",
      "parameters": {
        "searches": [
          {
            "feature_uri": "mixpeek://text_extractor@v1/multilingual_e5_large_instruct_v1",
            "query": { "input_mode": "text", "value": "{{INPUT.query}}" },
            "top_k": 200
          }
        ],
        "final_top_k": 200
      }
    }
  },
  {
    "stage_name": "category_filter",
    "stage_type": "filter",
    "config": {
      "stage_id": "attribute_filter",
      "parameters": {
        "field": "metadata.category",
        "operator": "eq",
        "value": "{{INPUT.category}}"
      }
    }
  },
  {
    "stage_name": "rerank",
    "stage_type": "sort",
    "config": {
      "stage_id": "rerank",
      "parameters": {
        "inference_name": "BAAI__bge_reranker_v2_m3",
        "top_k": 10
      }
    }
  }
]
Custom Reranker (BYO Model)
Use your own reranker model, deployed as a custom extractor, instead of the built-in model. Set feature_uri to route reranking through your extractor’s inference endpoint.
Parameters
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| feature_uri | string | null | Feature URI of a custom reranker plugin. Overrides inference_name when set. |
Your plugin must accept {pairs: [[query, doc], ...]} and return {scores: [float, ...]}.
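A minimal sketch of a handler that satisfies this request/response shape is shown below. The function name `handle_rerank` and the word-overlap scorer are placeholders. How the handler is registered and served depends on the custom extractor framework and is not covered here; a real plugin would call a cross-encoder model where `score_pair` is.

```python
from typing import Any, Dict, List


def score_pair(query: str, doc: str) -> float:
    """Placeholder scorer: fraction of query words found in the document."""
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words)


def handle_rerank(request: Dict[str, Any]) -> Dict[str, List[float]]:
    """Accept {"pairs": [[query, doc], ...]} and return {"scores": [float, ...]}."""
    pairs = request.get("pairs")
    if not isinstance(pairs, list):
        raise ValueError('request must contain a "pairs" list')
    scores: List[float] = []
    for pair in pairs:
        if len(pair) != 2:
            raise ValueError("each pair must be [query, doc]")
        query, doc = pair
        scores.append(float(score_pair(query, doc)))
    # One score per pair, in the same order as the input pairs.
    return {"scores": scores}


response = handle_rerank(
    {"pairs": [["red shoes", "red running shoes"], ["red shoes", "blue hat"]]}
)
print(response)
```

The sketch returns exactly one score per input pair, in input order, which is what allows the caller to match scores back to documents.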
Configuration Example
{
  "stage_name": "my_rerank",
  "config": {
    "stage_id": "rerank",
    "parameters": {
      "feature_uri": "mixpeek://my_reranker@1.0.0/rerank",
      "top_k": 10
    }
  }
}
Set inference_type: "rerank" in your plugin’s manifest to declare compatibility with the rerank stage.
Trade-offs
| Aspect | Impact |
| --- | --- |
| Higher precision | Better relevance scoring |
| Higher latency | 50-100ms per batch |
| Limited scale | Best for < 500 candidates |
| API costs | Per-document scoring |