    HF | Text Embeddings | Apache 2.0

    Agent-ModernColBERT

    by lightonai

    150M late-interaction retriever optimized for agentic reasoning traces

    1.9K downloads/month
    150M parameters
    Identifiers
    Model ID: lightonai/Agent-ModernColBERT
    Feature URI: mixpeek://text_extractor@v1/lighton_agent_moderncolbert_v1

    Overview

    Agent-ModernColBERT is a 150M-parameter late-interaction retrieval model from LightOn, trained on agentic retrieval data in which queries contain reasoning traces alongside the search intent. Built on the ModernBERT architecture via PyLate, it reaches 72.53% accuracy on BrowseComp-Plus, exceeding configurations that use GPT-5 + Qwen3-8B, while being about 26x smaller than AgentIR-4B. It targets AI agent tool-use pipelines where the query is a chain-of-thought reasoning trace rather than a clean user query.

    Architecture

    ModernBERT-based late-interaction model trained with PyLate on AgentIR data. It produces a 128-dim embedding per token and scores with MaxSim, as in ColBERT: each query token is matched to its most similar document token, and those maxima are summed. The key innovation is training on reasoning trace + query pairs, so the model learns to extract search intent from noisy agentic context such as function calls, intermediate thoughts, and partial conclusions.
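The MaxSim scoring described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the model's or PyLate's implementation: the embeddings are random stand-ins rather than real model outputs, and the helper names `maxsim_score` and `l2_normalize` are hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def maxsim_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """Late-interaction score for one query and one document.

    query_emb has shape (num_query_tokens, dim) and doc_emb has shape
    (num_doc_tokens, dim). Rows are assumed to be L2-normalized.
    """
    sim = query_emb @ doc_emb.T          # token-to-token similarity matrix
    best_per_query_token = sim.max(axis=1)  # best document token for each query token
    return float(best_per_query_token.sum())


# Random stand-ins for per-token embeddings (assumption: dim = 128 as on this page).
rng = np.random.default_rng(0)
dim = 128
query = l2_normalize(rng.normal(size=(8, dim)))
doc_unrelated = l2_normalize(rng.normal(size=(40, dim)))
# A document that contains every query token vector plus unrelated tokens.
doc_matching = np.vstack([query, l2_normalize(rng.normal(size=(32, dim)))])

score_unrelated = maxsim_score(query, doc_unrelated)
score_matching = maxsim_score(query, doc_matching)
```

Because every query token finds an exact match in `doc_matching`, its score approaches the number of query tokens, while the unrelated document scores lower. Documents are ranked by this score.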

    Mixpeek SDK Integration

    from mixpeek import Mixpeek

    mx = Mixpeek(api_key="YOUR_KEY")

    # Ingest documents and embed them with Agent-ModernColBERT
    mx.ingest.documents(
        source="s3://knowledge-base/",
        collection="agent_kb",
        feature_extractors=[{
            "name": "text_embeddings",
            "model": "lightonai/Agent-ModernColBERT",
            # Late interaction with 128-dim per-token embeddings
            "params": {"interaction": "late", "dim": 128},
        }],
    )

    Capabilities

    • Retrieval from AI agent reasoning traces (not just clean queries)
    • Late-interaction scoring for fine-grained token matching
    • Tiny model footprint (150M) with outsized agentic performance
    • Compatible with standard ColBERT indexing and serving
    • Strong zero-shot transfer to general retrieval tasks

    Use Cases on Mixpeek

    • MCP tool-use pipelines where agents search during reasoning
    • RAG systems where the query is an agent's chain-of-thought
    • Agentic web browsing and research workflows
    • Multi-step retrieval where context accumulates across steps
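For the multi-step case, one way to hand accumulated context to the retriever is to concatenate the most recent reasoning steps with the current search intent. The sketch below is illustrative only: the helper `build_agentic_query` and the `Reasoning:` / `Query:` labels are hypothetical and are not a documented input format for this model.

```python
from typing import List


def build_agentic_query(trace_steps: List[str], search_intent: str,
                        max_steps: int = 3) -> str:
    """Join the most recent reasoning steps with the current search intent.

    The "Reasoning:" / "Query:" labels are an assumed layout, chosen only
    for illustration.
    """
    cleaned = [step.strip() for step in trace_steps if step.strip()]
    # Keep only the last max_steps steps so the query stays bounded in length.
    recent = cleaned[-max_steps:] if max_steps > 0 else []
    parts = [f"Reasoning: {step}" for step in recent]
    parts.append(f"Query: {search_intent.strip()}")
    return "\n".join(parts)


trace = [
    "The user wants the founding year of the company.",
    "The first search returned only the product page.",
    "",  # empty steps are dropped
    "I should look for a press release or filing instead.",
]
agentic_query = build_agentic_query(trace, " company founding year press release ",
                                    max_steps=2)
```

Capping the number of steps is a design choice for this sketch; how much of the trace helps retrieval would need to be checked against the model's own guidance.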

    Benchmarks

    Dataset | Metric | Score | Source
    BrowseComp-Plus | Accuracy | 72.53% | Exceeds GPT-5 + Qwen3-8B setup
    AgentIR | Retrieval Acc | Competitive with 4B models | At 150M params (26x smaller)

    Performance

    Input Size: Variable
    GPU Latency: Input dependent
    GPU Throughput: ~2000 documents/sec (A100, batch 128)
    GPU Memory: ~0.4 GB
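These figures can be turned into rough capacity estimates. The sketch below is back-of-the-envelope arithmetic: the throughput and parameter count come from this page, while the average tokens per document and fp16 storage (2 bytes per value) are assumptions made only for the example.

```python
def indexing_seconds(num_docs: int, docs_per_sec: float = 2000.0) -> float:
    """Time to embed a corpus at the listed A100 throughput (batch 128)."""
    return num_docs / docs_per_sec


def index_size_gb(num_docs: int, avg_tokens_per_doc: int,
                  dim: int = 128, bytes_per_value: int = 2) -> float:
    """Uncompressed size of per-token embeddings for a late-interaction index.

    avg_tokens_per_doc and bytes_per_value (fp16) are assumptions.
    """
    return num_docs * avg_tokens_per_doc * dim * bytes_per_value / 1e9


def weights_gb(num_params: float = 150e6, bytes_per_param: int = 2) -> float:
    """Size of the model weights alone, assuming fp16 storage."""
    return num_params * bytes_per_param / 1e9


corpus_docs = 1_000_000
embed_time_s = indexing_seconds(corpus_docs)              # 500.0 seconds
raw_index_gb = index_size_gb(corpus_docs, avg_tokens_per_doc=300)  # 76.8 GB
model_gb = weights_gb()                                   # 0.3 GB
```

Under these assumptions, the per-token index is far larger than the model itself, which is why late-interaction deployments commonly compress or quantize stored embeddings. The 0.3 GB weight estimate is roughly in line with the ~0.4 GB GPU memory listed above.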

    Specification

    Framework: HF
    Organization: lightonai
    Feature: Text Embeddings
    Output: 128-dim vector per token
    Modalities: document, audio
    Retriever: Text Similarity
    Parameters: 150M
    License: Apache 2.0
    Downloads/mo: 1.9K

    Build a pipeline with Agent-ModernColBERT

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
