NEWVectors or files. Pick a path.Start →

    What is Re-ranking

    Re-ranking - Refining search result order after initial retrieval

    A second-stage ranking process that reorders initial search results using a more computationally expensive but accurate scoring model. Re-ranking is essential for maximizing precision in multimodal retrieval pipelines where first-stage recall is prioritized over exact ordering.

    How It Works

    Re-ranking takes the top-k results from a fast first-stage retrieval system and rescores them using a more powerful model. Cross-encoder rerankers process the query and each candidate document jointly, enabling fine-grained interaction between query and document tokens. This produces more accurate relevance scores than bi-encoder models that encode query and document independently.

    Technical Details

    Cross-encoder rerankers (BGE-reranker, Cohere Rerank, ColBERT) take concatenated query-document pairs and output relevance scores. Processing is O(n) per query where n is the number of candidates to rerank. Typical rerank depths are 50-200 candidates. Latency adds 50-200ms for reranking 100 candidates. Learning-to-rank (LTR) models combine multiple features (BM25 score, semantic score, metadata) into a final ranking.

    Best Practices

    • Use a two-stage pipeline: fast retrieval (bi-encoder + ANN) then accurate reranking (cross-encoder)
    • Rerank a sufficient number of candidates (50-200) to not miss relevant results
    • Choose reranker model size based on latency budget and accuracy requirements
    • Combine multiple signals (semantic score, keyword score, recency) in learning-to-rank

    Common Pitfalls

    • Reranking too few candidates, missing relevant results that the first stage retrieved
    • Reranking too many candidates, adding unnecessary latency without improving results
    • Using a reranker trained on general data for a specialized domain without fine-tuning
    • Not measuring the marginal improvement of reranking to justify the added latency

    Advanced Tips

    • Use multimodal rerankers that jointly score text and visual features for cross-modal retrieval
    • Implement cascaded reranking with progressively more expensive models at each stage
    • Apply distillation from cross-encoder rerankers to improve bi-encoder quality for the first stage
    • Use user feedback (clicks, dwell time) to train personalized reranking models
    Managed Mixpeek

    Put multimodal search to work

    Connect a bucket and Mixpeek runs the whole multimodal search pipeline for you: extraction, indexing, and search over your own objects. No models to wire up, nothing to host.

    Start with Managed
    MVS · bring your own

    Already have vectors?

    Keep your embeddings on your own cloud and run dense, sparse, and BM25 search directly on object storage. First 1M vectors free.

    Start with MVS