A second-stage ranking process that reorders initial search results using a more computationally expensive but more accurate scoring model. Re-ranking is essential for maximizing precision in multimodal retrieval pipelines where first-stage recall is prioritized over exact ordering.
Re-ranking takes the top-k results from a fast first-stage retrieval system and rescores them using a more powerful model. Cross-encoder rerankers process the query and each candidate document jointly, enabling fine-grained interaction between query and document tokens. This produces more accurate relevance scores than bi-encoder models that encode query and document independently.
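A minimal sketch of this two-stage pattern, assuming the sentence-transformers CrossEncoder API and a publicly available MS MARCO reranker checkpoint; the query and the first-stage candidate list are illustrative placeholders, not part of any specific pipeline.

```python
# Sketch: rescore first-stage candidates with a cross-encoder.
# The model name and candidates are assumptions for illustration.
from sentence_transformers import CrossEncoder

query = "How do cross-encoder rerankers improve precision?"

# Hypothetical top-k output from a fast first-stage retriever (BM25 or bi-encoder).
first_stage_results = [
    "Doc A: an overview of multimodal retrieval pipelines.",
    "Doc B: tuning BM25 parameters for higher recall.",
    "Doc C: cross-encoder re-ranking for precision at the top of the list.",
]

# The cross-encoder scores each (query, document) pair jointly,
# allowing token-level interaction between query and candidate.
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = model.predict([(query, doc) for doc in first_stage_results])

# Reorder the candidates by the more accurate relevance scores.
reranked = sorted(zip(first_stage_results, scores), key=lambda pair: pair[1], reverse=True)
for doc, score in reranked:
    print(f"{score:.3f}  {doc}")
```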
Cross-encoder rerankers (e.g., BGE-reranker, Cohere Rerank) score concatenated query-document pairs and output a relevance score for each, while late-interaction models such as ColBERT compute token-level similarity between separately encoded query and document representations. Scoring is O(n) per query, where n is the number of candidates to rerank; typical rerank depths are 50-200 candidates, and reranking 100 candidates typically adds roughly 50-200 ms of latency. Learning-to-rank (LTR) models combine multiple features (BM25 score, semantic score, metadata) into a final ranking.
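A minimal illustration of the LTR feature-combination idea. The feature names, candidates, and hand-set weights are assumptions for the sketch; a real LTR model (e.g., a gradient-boosted tree or LambdaMART-style ranker) would learn these weights from labeled relevance judgments.

```python
# Sketch: combine lexical, semantic, and metadata features into one ranking score.
from dataclasses import dataclass

@dataclass
class Candidate:
    doc_id: str
    bm25_score: float      # normalized lexical first-stage score
    semantic_score: float  # normalized bi-encoder or cross-encoder score
    freshness: float       # metadata feature, e.g. recency mapped to [0, 1]

# Hard-coded weights stand in for parameters a trained LTR model would learn.
WEIGHTS = {"bm25_score": 0.3, "semantic_score": 0.6, "freshness": 0.1}

def ltr_score(c: Candidate) -> float:
    """Weighted combination of features into a final relevance score."""
    return (WEIGHTS["bm25_score"] * c.bm25_score
            + WEIGHTS["semantic_score"] * c.semantic_score
            + WEIGHTS["freshness"] * c.freshness)

candidates = [
    Candidate("doc-1", bm25_score=0.8, semantic_score=0.4, freshness=0.9),
    Candidate("doc-2", bm25_score=0.5, semantic_score=0.9, freshness=0.2),
]
for c in sorted(candidates, key=ltr_score, reverse=True):
    print(c.doc_id, round(ltr_score(c), 3))
```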