Reason-ModernColBERT
by lightonai
Late-interaction retriever trained for reasoning-intensive search queries
lightonai/Reason-ModernColBERT
mixpeek://text_extractor@v1/lighton_reason_moderncolbert_v1
Overview
Reason-ModernColBERT is a PyLate ColBERT model fine-tuned from LightOn's GTE-ModernColBERT-v1 on the ReasonIR dataset. It targets retrieval problems where the query is not a short keyword string but a reasoning-heavy prompt that requires matching evidence across paragraphs.
On Mixpeek, this makes it a useful text retrieval companion for agents. After visual, audio, or document extractors produce text evidence, Reason-ModernColBERT can retrieve passages that match an agent's intermediate reasoning state with token-level MaxSim scoring instead of collapsing each document into one dense vector.
Architecture
ModernBERT-based late-interaction retriever trained with PyLate. It maps queries and passages to sequences of 128-dimensional token vectors and scores them with MaxSim. The model supports 8,192-token documents and 128-token queries according to the model card.
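To make the MaxSim scoring described above concrete, here is a minimal sketch in plain TypeScript. It is not the PyLate or Mixpeek implementation. The helper names (`dot`, `maxSim`, `rankByMaxSim`) are hypothetical, and the toy vectors are 2-dimensional instead of 128-dimensional so the arithmetic is easy to follow. Each query token takes its best dot-product match among the passage tokens, and those per-token maxima are summed.

```typescript
// One embedding vector per token (128-dimensional in the real model;
// the toy data below uses 2 dimensions for readability).
type TokenEmbeddings = number[][];

// Dot product of two equal-length vectors.
function dot(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

// MaxSim: for every query token, take the highest similarity against any
// passage token, then sum those maxima over the query tokens.
function maxSim(query: TokenEmbeddings, passage: TokenEmbeddings): number {
  if (passage.length === 0) {
    throw new Error("Passage must contain at least one token vector");
  }
  let score = 0;
  for (const q of query) {
    let best = -Infinity;
    for (const p of passage) {
      best = Math.max(best, dot(q, p));
    }
    score += best;
  }
  return score;
}

// Hypothetical helper: rank passages by MaxSim score, highest first.
function rankByMaxSim(
  query: TokenEmbeddings,
  passages: TokenEmbeddings[]
): { index: number; score: number }[] {
  return passages
    .map((passage, index) => ({ index, score: maxSim(query, passage) }))
    .sort((a, b) => b.score - a.score);
}

// Toy example with unit vectors.
const toyQuery: TokenEmbeddings = [[1, 0], [0, 1]];
const toyPassageA: TokenEmbeddings = [[1, 0], [0, 1]]; // matches both query tokens
const toyPassageB: TokenEmbeddings = [[1, 0]];         // matches only the first

console.log(maxSim(toyQuery, toyPassageA)); // 2
console.log(maxSim(toyQuery, toyPassageB)); // 1
console.log(rankByMaxSim(toyQuery, [toyPassageB, toyPassageA])[0].index); // 1
```

Because the score is built from per-token matches, a passage that covers more parts of a long, reasoning-heavy query accumulates a higher total than one that matches only a single fragment.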
Mixpeek SDK Integration
    import { Mixpeek } from "mixpeek";

    const mx = new Mixpeek({ apiKey: "API_KEY" });

    await mx.collections.ingest({
      collection_id: "agent-evidence",
      source: { url: "s3://knowledge-base/extracted-text/" },
      feature_extractors: [
        {
          feature: "text_embeddings",
          model: "lightonai/Reason-ModernColBERT",
          params: {
            interaction: "late",
            max_document_tokens: 8192
          }
        }
      ]
    });
Capabilities
- Reasoning-intensive retrieval over long passages
- Late-interaction token matching with MaxSim
- 8K-token document support
- Useful for agent queries that include context, constraints, and partial findings
- Fine-tuned on ReasonIR data from GTE-ModernColBERT-v1
Use Cases on Mixpeek
Benchmarks
| Dataset | Metric | Score | Source |
|---|---|---|---|
| BRIGHT | NDCG@10 | Outperforms models up to 7B | LightOn model card |
| Stack Exchange splits | NDCG@10 | +2.5 average over ReasonIR-8B | LightOn model card |
Performance
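Late interaction increases index size relative to single-vector dense retrieval, because each passage stores one 128-dimensional vector per token rather than a single vector for the whole passage.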
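A rough back-of-the-envelope sketch of that index-size trade-off follows. Every number except the 128-dimensional token vectors is an illustrative assumption: the corpus size, the average tokens per passage, the 768-dimensional single-vector baseline, and 2-byte (float16) storage are hypothetical. The estimate also ignores any compression or quantization an index might apply. The function names are made up for this example.

```typescript
// Uncompressed storage for a multi-vector (late-interaction) index:
// one vector per token, for every passage.
function multiVectorIndexBytes(
  numPassages: number,
  avgTokensPerPassage: number,
  dim: number,
  bytesPerValue: number
): number {
  return numPassages * avgTokensPerPassage * dim * bytesPerValue;
}

// Uncompressed storage for a single-vector dense index:
// one vector per passage.
function singleVectorIndexBytes(
  numPassages: number,
  dim: number,
  bytesPerValue: number
): number {
  return numPassages * dim * bytesPerValue;
}

// Hypothetical corpus: 1,000,000 passages averaging 300 tokens each,
// stored as float16 (2 bytes per value).
const lateInteractionBytes = multiVectorIndexBytes(1_000_000, 300, 128, 2);
// Hypothetical dense baseline with 768-dimensional passage vectors.
const denseBytes = singleVectorIndexBytes(1_000_000, 768, 2);

console.log(lateInteractionBytes / 1e9); // 76.8 (GB)
console.log(denseBytes / 1e9);           // 1.536 (GB)
console.log(lateInteractionBytes / denseBytes); // 50
```

Under these assumptions the late-interaction index is about 50 times larger before compression. The actual ratio depends on passage length, the baseline's dimensionality, and how the index stores vectors.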
Specification
Research Paper
Reason-ModernColBERT
arxiv.org
Build a pipeline with Reason-ModernColBERT
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.