NEWManaged multimodal retrieval.Explore platform →
    Models/Embeddings/nvidia/NV-Embed-v2
    HFText EmbeddingsCC-BY-NC-4.0

    NV-Embed-v2

    by nvidia

    Top-ranked 7B text embedding model on MTEB English benchmark

    1.5Mdl/month
    7Bparams
    Identifiers
    Model ID
    nvidia/NV-Embed-v2
    Feature URI
    mixpeek://text_extractor@v1/nvidia_nv_embed_v2

    Overview

    NV-Embed-v2 is NVIDIA's 7B-parameter text embedding model that held the #1 position on the MTEB English benchmark with a score of 72.31. It uses a latent attention layer to remove the mean token pooling bottleneck and applies a two-stage contrastive training recipe: first on retrieval datasets, then on a blend of retrieval plus non-retrieval tasks (classification, clustering, STS).

    On Mixpeek, NV-Embed-v2 is the highest-accuracy text embedder available for English-dominant workloads. Its 7B parameter count delivers superior quality for knowledge bases, legal corpora, and technical documentation where recall matters more than latency.

    Architecture

    Mistral-7B decoder backbone with a learned latent attention pooling layer replacing mean pooling. 7B parameters. Two-stage instruction-tuned contrastive training with causal attention masks removed during embedding.

    Mixpeek SDK Integration

    from mixpeek import Mixpeek
    mixpeek = Mixpeek(api_key="YOUR_API_KEY")
    mixpeek.ingest.documents(
    collection="knowledge_base",
    source={"type": "s3", "bucket": "enterprise-docs"},
    pipeline={
    "embedding": {
    "model": "mixpeek://text_extractor@v1/nvidia_nv_embed_v2"
    }
    }
    )

    Capabilities

    • 72.31 average on MTEB English benchmark (former #1)
    • 4096-dimensional embeddings
    • Strong on retrieval, classification, clustering, and STS tasks simultaneously
    • Instruction-tuned for task-specific query formatting

    Use Cases on Mixpeek

    High-accuracy enterprise knowledge base search
    Legal and compliance document retrieval where recall is critical
    Technical documentation search across large codebases
    Academic literature retrieval and citation matching

    Benchmarks

    DatasetMetricScoreSource
    MTEB English (avg)Score72.31Model card
    MTEB RetrievalNDCG@1062.84Model card

    Performance

    Input SizeVariable
    GPU Latency~18ms per passage (A100)
    GPU Throughput~250 passages/sec (A100, batch 32)
    GPU Memory~14 GB

    Specification

    FrameworkHF
    Organizationnvidia
    FeatureText Embeddings
    Output1024-dim vector
    Modalitiesdocument, audio
    RetrieverText Similarity
    Parameters7B
    LicenseCC-BY-NC-4.0
    Downloads/mo1.5M

    Research Paper

    Model paper or technical report

    arxiv.org

    Build a pipeline with NV-Embed-v2

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.

    Open Studio