
    What is a Cross-Encoder?

    Cross-Encoder - Joint query-document encoding model for precise relevance scoring

    A cross-encoder is a neural model that jointly processes a query and document together through a single transformer encoder to produce a relevance score. By allowing full attention between query and document tokens, cross-encoders achieve the highest accuracy in relevance ranking but at a significant computational cost that limits their use to reranking small candidate sets.

    How It Works

    A cross-encoder takes the concatenation of a query and document as input, separated by a special token (e.g., [SEP]). The combined input is processed through a transformer model (typically BERT), and a classification head on top of the [CLS] token outputs a relevance score (0 to 1). Because all attention layers see both the query and document simultaneously, the model captures fine-grained interactions between specific query terms and document passages. This full cross-attention is what gives cross-encoders their superior accuracy.
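
    A minimal scoring sketch, assuming the sentence-transformers library and the publicly available cross-encoder/ms-marco-MiniLM-L-6-v2 checkpoint (any cross-encoder checkpoint works the same way):

```python
# Minimal sketch: jointly score (query, document) pairs with a cross-encoder.
# Assumes the sentence-transformers library and the public
# cross-encoder/ms-marco-MiniLM-L-6-v2 checkpoint; the texts are placeholders.
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2", max_length=512)

query = "how do cross-encoders score relevance?"
docs = [
    "A cross-encoder feeds the query and document through one transformer together.",
    "Bananas are rich in potassium and grow in tropical climates.",
]

# One forward pass per (query, document) pair, with full attention across both texts.
scores = model.predict([(query, doc) for doc in docs])
for doc, score in sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True):
    print(f"{score:.3f}  {doc}")
```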

    Technical Details

    Cross-encoders are typically fine-tuned from pretrained language models (BERT, RoBERTa, DeBERTa) using binary cross-entropy or margin-based ranking losses on labeled query-document pairs. The input format is '[CLS] query [SEP] document [SEP]' with a maximum total length of 512 tokens (or longer with extended-context models). Inference requires one forward pass per query-document pair, making it O(n) per query where n is the number of candidate documents. This cost restricts cross-encoders to reranking 100-1000 candidates retrieved by a faster first-stage method.
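
    At a lower level, the same pair scoring can be sketched with the Hugging Face transformers API, where the tokenizer builds the '[CLS] query [SEP] document [SEP]' input directly (again assuming the public ms-marco checkpoint, which has a single-logit classification head):

```python
# Sketch of the underlying forward pass with Hugging Face transformers.
# Assumes the public cross-encoder/ms-marco-MiniLM-L-6-v2 checkpoint (num_labels=1).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "cross-encoder/ms-marco-MiniLM-L-6-v2"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).eval()

query = "what causes rain?"
document = "Rain forms when water vapor condenses into droplets heavy enough to fall."

# Passing two texts builds the pair input [CLS] query [SEP] document [SEP],
# truncated to the model's 512-token limit.
inputs = tokenizer(query, document, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    logit = model(**inputs).logits.squeeze()  # single relevance logit
score = torch.sigmoid(logit).item()           # optional squashing to (0, 1)
print(f"relevance score: {score:.3f}")
```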

    Best Practices

    • Use cross-encoders exclusively as rerankers on top of a fast first-stage retriever such as BM25 or dense retrieval (a pipeline sketch follows this list)
    • Limit reranking to the top 100-500 candidates to keep latency under control
    • Fine-tune on domain-specific query-document pairs for the best relevance accuracy
    • Use distillation from cross-encoders to improve the quality of your first-stage bi-encoder
    • Consider DeBERTa-v3 as the backbone; it typically delivers higher reranking accuracy than BERT
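
    A retrieve-then-rerank pipeline along the lines of the first two practices might look like the sketch below; the rank_bm25 library stands in for the first-stage retriever, and the corpus and query are placeholders:

```python
# Retrieve-then-rerank sketch: BM25 first stage, cross-encoder reranking.
# Assumes the rank_bm25 and sentence-transformers libraries.
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder

corpus = [
    "Cross-encoders jointly encode the query and document for precise scoring.",
    "Bi-encoders embed queries and documents separately for fast retrieval.",
    "BM25 is a lexical ranking function based on term frequencies.",
    # ... thousands more documents in a real corpus
]
query = "how do rerankers score candidates?"

# Stage 1: cheap lexical retrieval over the full corpus (top-k candidates).
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])
candidates = bm25.get_top_n(query.lower().split(), corpus, n=100)

# Stage 2: expensive but accurate cross-encoder reranking of the small candidate set.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(query, doc) for doc in candidates])
reranked = [doc for _, doc in sorted(zip(scores, candidates), key=lambda p: p[0], reverse=True)]
print(reranked[:3])
```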

    Common Pitfalls

    • Trying to use a cross-encoder for initial retrieval over a full corpus, which is prohibitively slow
    • Failing to handle documents longer than the model's context window, so the tokenizer truncates them and information is silently lost (see the chunking sketch after this list)
    • Using a cross-encoder trained on one domain (e.g., web search) for a very different domain (e.g., medical) without fine-tuning
    • Ignoring the latency impact of reranking too many candidates in a production setting
    • Treating raw cross-encoder scores as calibrated probabilities without applying an explicit calibration step
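
    One common way to avoid silent truncation of long documents is to split them into overlapping passages and keep the best passage score (a MaxP-style strategy). The window sizes and word-based chunking below are illustrative assumptions:

```python
# Sketch: score long documents by splitting them into overlapping word windows
# and keeping the best window score, so no content is silently dropped.
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2", max_length=512)

def chunk_words(text: str, size: int = 300, stride: int = 150) -> list[str]:
    """Split text into overlapping word windows."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - stride, 1), stride)]

def score_long_document(query: str, document: str) -> float:
    chunks = chunk_words(document)
    scores = model.predict([(query, chunk) for chunk in chunks])
    return float(max(scores))  # document relevance = best passage relevance
```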

    Advanced Tips

    • Use cross-encoder scores as soft labels to distill knowledge into faster bi-encoder or ColBERT models (see the sketch after this list)
    • Implement batched inference with GPU parallelism to rerank 100+ candidates in under 50ms
    • Apply cross-encoders for data labeling and evaluation when ground-truth relevance labels are expensive
    • Explore lightweight cross-encoder architectures (TinyBERT, MiniLM) for lower-latency reranking
    • Use ensemble cross-encoders (averaging scores from multiple models) for the highest reranking accuracy
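
    As a sketch of the distillation tip, the teacher cross-encoder scores mined (query, positive, negative) triples, and the score margin becomes the regression target for a bi-encoder student; the triples below are placeholders:

```python
# Sketch: generate soft labels from a cross-encoder teacher for bi-encoder
# distillation. The student later learns to reproduce the teacher's score
# margin between the positive and negative passage.
from sentence_transformers import CrossEncoder

teacher = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

triples = [
    ("what is a cross-encoder?",
     "A cross-encoder scores a query and document jointly with one transformer.",
     "Bananas are rich in potassium."),
    # ... mined (query, positive, negative) triples
]

soft_labels = []
for query, pos, neg in triples:
    pos_score, neg_score = teacher.predict([(query, pos), (query, neg)])
    soft_labels.append((query, pos, neg, float(pos_score - neg_score)))

print(soft_labels[0][-1])
```

    These margins can then drive a MarginMSE-style distillation loss when training the first-stage bi-encoder.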