    HF · Text Embeddings · MIT

    pplx-embed-context-v1-4b

    by perplexity-ai

    Contextual embedding that encodes document chunks with awareness of surrounding content

    5.2K downloads/month
    4B parameters
    Identifiers
    Model ID
    perplexity-ai/pplx-embed-context-v1-4b
    Feature URI
    mixpeek://text_extractor@v1/perplexity_pplx_embed_context_4b_v1

    Overview

    PPLX Embed Context is the first open-weight contextual embedding model. Unlike standard embedders that process each chunk in isolation, it sees the full document while encoding each chunk, so the resulting vector captures both the local content and the chunk's relationship to the surrounding text. This eliminates the retrieval failure mode in which a chunk reading 'it increased 15%' is meaningless without knowing what 'it' refers to.

    On Mixpeek, contextual embeddings improve retrieval precision for document-heavy pipelines where chunks reference earlier sections, use pronouns, or contain relative statements. One model call replaces the traditional approach of prepending context summaries to each chunk.
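The difference between the two approaches can be sketched in terms of what is handed to the embedder. This is a minimal illustration only: the `Chunk` shape and both helper functions are hypothetical and are not part of the Mixpeek SDK or the model's API.

```typescript
// Hypothetical shape for illustration only; not a Mixpeek SDK type.
interface Chunk {
  docId: string;
  index: number; // position of the chunk within its document
  text: string;
}

// Traditional workaround: every chunk is embedded in isolation, so a context
// summary is prepended to each chunk before it reaches a standard embedder.
function prependContext(chunks: Chunk[], summaries: Map<string, string>): string[] {
  return chunks.map((c) => {
    const summary = summaries.get(c.docId) ?? "";
    return summary ? `${summary}\n\n${c.text}` : c.text;
  });
}

// Contextual approach: chunks are grouped per document and kept in reading
// order, so a contextual embedder can receive the whole document in one call.
function groupByDocument(chunks: Chunk[]): Map<string, string[]> {
  const sorted = [...chunks].sort((a, b) => a.index - b.index);
  const byDoc = new Map<string, string[]>();
  for (const c of sorted) {
    const list = byDoc.get(c.docId) ?? [];
    list.push(c.text);
    byDoc.set(c.docId, list);
  }
  return byDoc;
}

const exampleChunks: Chunk[] = [
  { docId: "report", index: 1, text: "It increased 15%." },
  { docId: "report", index: 0, text: "Quarterly revenue is discussed first." },
];

const grouped = groupByDocument(exampleChunks);
const prefixed = prependContext(
  exampleChunks,
  new Map([["report", "Quarterly revenue report."]]),
);
```

With the grouped form, the ambiguous chunk travels together with the text that resolves its pronoun, and no per-chunk summary has to be generated or stored.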

    Architecture

    Diffusion-pretrained Qwen3 backbone with contextual attention mechanism. 4B parameters. Encodes each chunk while attending to the full document context. Produces dense embeddings where each chunk vector encodes both local semantics and positional context within the document.
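One way to picture "encoding each chunk while attending to the full document" is late-chunking-style pooling: the document is encoded once, and each chunk vector is the mean of the token vectors that fall inside the chunk's span. The sketch below illustrates that general technique under stated assumptions. This page does not confirm that the model pools this way, and `poolChunks` is a hypothetical helper.

```typescript
// tokenVectors: one vector per token, assumed to already reflect attention
// over the full document (an assumption; the encoder itself is not shown).
// spans: [start, end) token ranges, one per chunk.
function poolChunks(tokenVectors: number[][], spans: [number, number][]): number[][] {
  return spans.map(([start, end]) => {
    const slice = tokenVectors.slice(start, end);
    if (slice.length === 0) throw new Error("empty chunk span");
    const dims = slice[0]?.length ?? 0;
    const mean = new Array<number>(dims).fill(0);
    // Accumulate the mean of the token vectors inside this chunk's span.
    for (const vec of slice) {
      for (let d = 0; d < dims; d++) {
        mean[d] = (mean[d] ?? 0) + (vec[d] ?? 0) / slice.length;
      }
    }
    return mean;
  });
}

// Three toy token vectors, split into a two-token chunk and a one-token chunk.
const pooled = poolChunks(
  [[1, 2], [3, 4], [5, 6]],
  [[0, 2], [2, 3]],
);
```

Because every token vector was computed with the whole document in view, each pooled chunk vector carries document context even though it summarizes only its own span.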

    Mixpeek SDK Integration

    import { Mixpeek } from "mixpeek";

    // Replace "API_KEY" with your Mixpeek API key.
    const mx = new Mixpeek({ apiKey: "API_KEY" });

    // Ingest a document and embed its chunks with the contextual model.
    await mx.collections.ingest({
      collection_id: "document-collection",
      source: { url: "https://example.com/report.pdf" },
      feature_extractors: [{
        feature: "text_embeddings",
        model: "perplexity-ai/pplx-embed-context-v1-4b"
      }]
    });

    Capabilities

    • Document-aware chunk embedding (sees full document while encoding each chunk)
    • State-of-the-art on the ConTEB contextual retrieval benchmark (81.96 nDCG@10)
    • Competitive on MTEB Multilingual v2 retrieval (69.66 nDCG@10)
    • Eliminates pronoun/reference ambiguity in chunk embeddings
    • MIT license for unrestricted commercial use

    Use Cases on Mixpeek

    • RAG pipelines: improve retrieval of ambiguous chunks that reference prior context
    • Legal document search: clauses that reference definitions or earlier sections
    • Technical documentation: chunks with relative references ('as described above')
    • Financial reports: tables and commentary that depend on surrounding narrative

    Benchmarks

    Dataset | Metric | Score | Source
    ConTEB (contextual retrieval) | nDCG@10 | 81.96% | Perplexity Research, 2026
    MTEB Multilingual v2 (retrieval) | nDCG@10 | 69.66% | Perplexity Research, 2026
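For readers unfamiliar with the metric, nDCG@10 compares the discounted gain of the top 10 results against the best possible ordering. The sketch below uses one common formulation (exponential gain, log2 discount). It also assumes the `relevances` array covers every judged document, so the ideal ordering can be derived from it. The benchmark harnesses behind the scores above may differ in these details.

```typescript
// relevances: graded relevance of each returned result, in ranked order.
function dcgAtK(relevances: number[], k: number): number {
  return relevances
    .slice(0, k)
    .reduce((sum, rel, i) => sum + (Math.pow(2, rel) - 1) / Math.log2(i + 2), 0);
}

// Normalizes DCG by the DCG of the ideal (descending relevance) ordering.
function ndcgAtK(relevances: number[], k: number): number {
  const ideal = [...relevances].sort((a, b) => b - a);
  const idealDcg = dcgAtK(ideal, k);
  return idealDcg === 0 ? 0 : dcgAtK(relevances, k) / idealDcg;
}

const perfect = ndcgAtK([1, 0, 0], 10); // relevant result ranked first: 1
const demoted = ndcgAtK([0, 1], 10);    // relevant result ranked second: 1 / log2(3), about 0.631
```

A score is usually reported as the mean of this value over all queries in the dataset, multiplied by 100.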

    Performance

    Input Size: Text (full document context)
    Embedding Dim: 4096
    GPU Latency: ~25 ms / chunk (A100)
    GPU Throughput: ~40 chunks/sec (A100)
    GPU Memory: ~9 GB
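The latency and throughput figures are two views of the same number (1000 ms / 25 ms = 40 chunks/sec). A rough capacity estimate can be sketched as below, assuming sequential processing on a single GPU at the quoted latency and uncompressed float32 vector storage. The helper names are hypothetical, and the embedding dimension is a parameter so you can pass whatever your deployment produces.

```typescript
const MS_PER_CHUNK = 25; // approximate A100 latency quoted above

function chunksPerSecond(msPerChunk: number): number {
  return 1000 / msPerChunk;
}

// Wall-clock seconds to embed numChunks one after another (no batching assumed).
function embedSeconds(numChunks: number, msPerChunk: number): number {
  return (numChunks * msPerChunk) / 1000;
}

// Raw vector storage, assuming 4-byte float32 values and no index overhead.
function vectorStorageBytes(numChunks: number, dims: number, bytesPerValue = 4): number {
  return numChunks * dims * bytesPerValue;
}

const throughput = chunksPerSecond(MS_PER_CHUNK);        // 40
const seconds = embedSeconds(100000, MS_PER_CHUNK);      // 2500
const storageBytes = vectorStorageBytes(100000, 4096);   // 1638400000
```

Under these assumptions, 100,000 chunks take about 42 minutes to embed and roughly 1.6 GB to store at 4096 dimensions. Real deployments with batching or quantized vectors will differ.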

    Specification

    Framework: HF
    Organization: perplexity-ai
    Feature: Text Embeddings
    Output: 1024-dim vector
    Modalities: document, audio
    Retriever: Text Similarity
    Parameters: 4B
    License: MIT
    Downloads/mo: 5.2K
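Text-similarity retrieval over these vectors amounts to scoring each chunk vector against a query vector and sorting. The sketch below assumes cosine similarity, which this page does not explicitly name as the metric. `cosineSimilarity` and `rankChunks` are hypothetical helpers, not Mixpeek retriever APIs.

```typescript
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Returns chunk indices with their scores, best match first.
function rankChunks(query: number[], chunkVectors: number[][]): { index: number; score: number }[] {
  return chunkVectors
    .map((v, index) => ({ index, score: cosineSimilarity(query, v) }))
    .sort((x, y) => y.score - x.score);
}

// Toy 2-dimensional vectors; real embeddings have far more dimensions.
const ranked = rankChunks([1, 0], [[0, 1], [1, 0], [1, 1]]);
```

In the toy example the chunk at index 1 points in the same direction as the query and ranks first, followed by index 2 and then the orthogonal chunk at index 0.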

    Research Paper

    pplx-embed: State-of-the-Art Embedding Models for Web-Scale Retrieval

    arxiv.org

    Build a pipeline with pplx-embed-context-v1-4b

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
