    HF · Text Embeddings · Apache-2.0

    GTE-ModernColBERT-v1

    by lightonai

    Late interaction retrieval model with record-breaking long-context performance

    119K downloads/month
    149M parameters
    Identifiers
    Model ID: lightonai/GTE-ModernColBERT-v1
    Feature URI: mixpeek://text_extractor@v1/lighton_gte_moderncolbert_v1

    Overview

    GTE-ModernColBERT-v1 is a ColBERT-style late interaction retrieval model built on the ModernBERT architecture. Instead of compressing an entire document into a single vector, it produces 128-dimensional embeddings for every token, then scores query-document pairs using MaxSim — for each query token, find the best-matching document token and sum the scores. This token-level matching preserves fine-grained detail that single-vector models lose.
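The MaxSim scoring described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the model's own implementation: the function name `maxsim_score` and the toy 2-dimensional vectors are assumptions made for readability, whereas the real model emits 128 dimensions per token.

```python
import numpy as np

def maxsim_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """Sum, over query tokens, of the best similarity to any document token.

    query_emb: (num_query_tokens, dim); doc_emb: (num_doc_tokens, dim).
    """
    # Pairwise dot products, shape (num_query_tokens, num_doc_tokens).
    sim = query_emb @ doc_emb.T
    # Keep the best-matching document token per query token, then sum.
    return float(sim.max(axis=1).sum())

# Toy embeddings (hypothetical values, 2 dims instead of 128).
query = np.array([[1.0, 0.0], [0.0, 1.0]])
doc_a = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])  # matches both query tokens
doc_b = np.array([[1.0, 0.0], [0.5, 0.5]])              # matches only the first exactly

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
print(score_a, score_b)
```

Because each query token picks its own best document token, `doc_a` scores higher than `doc_b` here even though both contain an exact match for the first query token.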

    The model's standout capability is long-context retrieval. On the LongEmbed benchmark (documents up to 32K tokens), it reaches a mean score of 88.39, roughly 10 points above the previous state of the art. It also outperforms ColBERT-small on BEIR while natively supporting 32K-token documents. Trained in just 15K steps on MS MARCO using LightOn's PyLate library, it shows that the ModernBERT + ColBERT recipe produces competitive results with minimal training compute.

    Architecture

    A ModernBERT encoder (from Alibaba-NLP/gte-modernbert-base) followed by a linear projection layer (768 → 128 dimensions, no bias, no activation) that produces a 128-dim embedding per token. The default query length is 32 tokens; the default document length is 300 tokens, extensible up to 32K. Scoring uses the MaxSim operator. Trained with knowledge distillation on MS MARCO using PyLate.
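As a rough sketch of the projection step, the snippet below maps 768-dim encoder states to 128-dim token embeddings with a bias-free linear layer. The weights and hidden states are random stand-ins, and the final L2 normalization is an assumption (common in ColBERT-style models, but not stated here); `project_tokens` is a hypothetical helper name.

```python
import numpy as np

HIDDEN_DIM = 768  # encoder hidden size
TOKEN_DIM = 128   # per-token embedding size

rng = np.random.default_rng(0)
# Random stand-in for the trained projection weights.
projection = rng.standard_normal((HIDDEN_DIM, TOKEN_DIM)) / np.sqrt(HIDDEN_DIM)

def project_tokens(hidden_states: np.ndarray) -> np.ndarray:
    """Map (num_tokens, 768) encoder states to (num_tokens, 128) token embeddings."""
    # Linear map only: no bias term and no activation.
    projected = hidden_states @ projection
    # Assumption: L2-normalize each token so dot products behave like cosine similarity.
    norms = np.linalg.norm(projected, axis=1, keepdims=True)
    return projected / np.clip(norms, 1e-12, None)

hidden = rng.standard_normal((32, HIDDEN_DIM))  # e.g. a 32-token query
token_embeddings = project_tokens(hidden)
print(token_embeddings.shape)
```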

    Mixpeek SDK Integration

    from mixpeek import Mixpeek

    mx = Mixpeek(api_key="YOUR_KEY")

    # Index documents with late interaction embeddings for precision retrieval
    mx.ingest(
        collection_id="knowledge-base",
        source="s3://documents/",
        extractors=[
            {
                "type": "text_embedding",
                "model": "lightonai/GTE-ModernColBERT-v1",
                "output_feature": "colbert_tokens"
            },
            {
                "type": "text_embedding",
                "model": "BAAI/bge-m3",
                "output_feature": "dense_embedding"
            }
        ]
    )

    Capabilities

    • Late interaction retrieval with per-token 128-dim embeddings
    • Long-context support up to 32K tokens (tested to 32,768)
    • 88.39 mean on LongEmbed benchmark (~10 points above prior SOTA)
    • 54.75 NDCG@10 on BEIR — outperforms ColBERT-small
    • Apache 2.0 license, reproducible training with PyLate

    Use Cases on Mixpeek

    • Precision retrieval for entity-rich queries in Mixpeek multi-stage pipelines
    • Long-document search where single-vector compression loses detail
    • Second-stage rescoring after dense retrieval for factoid and exact-match queries
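The second-stage rescoring pattern in the last use case can be illustrated with a small, library-free sketch: a dense single-vector pass shortlists candidates, then MaxSim reorders only that shortlist. This is not the Mixpeek pipeline API; `rerank_with_maxsim`, its parameters, and the toy vectors are all hypothetical.

```python
import numpy as np

def rerank_with_maxsim(query_dense, query_tokens, doc_dense, doc_tokens, first_stage_k=2):
    """Return document indices: dense top-k candidates, reordered by MaxSim score."""
    # Stage 1: cheap single-vector retrieval by dot product.
    dense_scores = doc_dense @ query_dense
    candidates = np.argsort(-dense_scores)[:first_stage_k]
    # Stage 2: token-level MaxSim, computed only for the shortlisted documents.
    maxsim = [float((query_tokens @ doc_tokens[i].T).max(axis=1).sum()) for i in candidates]
    order = np.argsort(-np.asarray(maxsim))
    return [int(candidates[j]) for j in order]

# Toy data (hypothetical values, 2 dims for readability).
query_dense = np.array([1.0, 0.0])
query_tokens = np.array([[1.0, 0.0], [0.0, 1.0]])
doc_dense = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])
doc_tokens = [
    np.array([[1.0, 0.0]]),              # doc 0: covers only the first query token
    np.array([[1.0, 0.0], [0.0, 1.0]]),  # doc 1: covers both query tokens
    np.array([[0.0, 1.0]]),              # doc 2: dropped by the dense stage
]

ranking = rerank_with_maxsim(query_dense, query_tokens, doc_dense, doc_tokens)
print(ranking)
```

In this toy case the dense stage ranks doc 0 first, but the token-level pass promotes doc 1 because it matches both query tokens.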

    Benchmarks

    Dataset                   Metric       Score   Source
    BEIR (15 datasets)        NDCG@10      54.75   LightOn, 2025 (Model Card)
    LongEmbed (32K context)   Mean Score   88.39   LightOn, 2025 (Blog Post)
    NanoBEIR                  NDCG@10      67.58   LightOn, 2025 (Model Card)

    Performance

    Input Size       Up to 32,768 tokens (default 300, extensible)
    Embedding Dim    128 per token
    GPU Latency      ~12 ms / document (A100, 300 tokens)
    GPU Throughput   ~800 documents/sec (A100, batch 64)
    GPU Memory       ~0.6 GB
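Since the model stores one 128-dim vector per token, index size grows with document length. The back-of-the-envelope calculation below assumes every token embedding is kept and stored as 16-bit floats with no compression or index overhead; both are assumptions, and `token_index_bytes` is a hypothetical helper.

```python
def token_index_bytes(num_tokens: int, dim: int = 128, bytes_per_value: int = 2) -> int:
    """Raw size of one document's token embeddings (no compression, no index overhead)."""
    return num_tokens * dim * bytes_per_value

default_doc = token_index_bytes(300)   # 300 * 128 * 2 = 76,800 bytes
long_doc = token_index_bytes(32_768)   # 32,768 * 128 * 2 = 8,388,608 bytes (8 MiB)
print(default_doc, long_doc)
```

Under these assumptions a default-length document takes about 75 KiB of raw embeddings, while a full 32,768-token document takes 8 MiB.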

    Specification

    Framework      HF
    Organization   lightonai
    Feature        Text Embeddings
    Output         128-dim vector per token
    Modalities     document, audio
    Retriever      Text Similarity
    Parameters     149M
    License        Apache-2.0
    Downloads/mo   119K

    Research Paper

    LightOn Releases GTE-ModernColBERT, First SOTA Late-Interaction Model Trained on PyLate

    arxiv.org

    Build a pipeline with GTE-ModernColBERT-v1

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
