    nomic-embed-multimodal-3b

    by nomic-ai

    3B visual-document retriever for text queries over screenshots, pages, and image-heavy documents

    12K downloads/month
    3B params
    Identifiers
    Model ID
    nomic-ai/nomic-embed-multimodal-3b
    Feature URI
    mixpeek://image_extractor@v1/nomic_embed_multimodal_3b_v1

    Overview

    Nomic Embed Multimodal 3B is a visual-document retrieval model built on Qwen2.5-VL-3B. It is trained for text-to-visual-document retrieval, where the indexed unit is the rendered page or screenshot rather than OCR text alone.

    On Mixpeek, it is useful when an agent needs to search document pages that contain charts, forms, tables, product screenshots, or other information that does not survive plain text extraction.

    Architecture

    The model is a PEFT adapter on Qwen2.5-VL-3B-Instruct, tuned for visual-document retrieval. It is aligned for text queries against page images across English, Italian, French, German, and Spanish content.

    Mixpeek SDK Integration

    import { Mixpeek } from "mixpeek";

    const mx = new Mixpeek({ apiKey: "API_KEY" });

    await mx.collections.ingest({
      collection_id: "visual-docs",
      source: { url: "https://example.com/annual-report.pdf" },
      feature_extractors: [{
        feature: "multimodal_embedding",
        model: "nomic-ai/nomic-embed-multimodal-3b"
      }]
    });

    Capabilities

    • Text-to-visual-document retrieval
    • Multilingual page retrieval across five documented languages
    • Works on screenshots and rendered pages where OCR can lose layout
    • Fits between lightweight CLIP-style retrieval and larger VLM reranking

    Use Cases on Mixpeek

    • Search page screenshots by natural language when charts or layout matter
    • Retrieve slide or PDF pages for an agent before detailed VLM reasoning
    • Find visually similar product, invoice, or report pages without a separate OCR-only path
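
The retrieve-then-reason use case can be sketched as a small two-stage helper. This is a minimal illustration, not part of the Mixpeek SDK: `PageHit`, `RetrievePages`, `ReasonOverPages`, and `retrieveThenReason` are hypothetical names, and both stages are passed in as functions so that any retrieval backend and any VLM client can be substituted.

```typescript
// Hypothetical types for illustration only; none of these come from the Mixpeek SDK.
interface PageHit {
  pageId: string;
  score: number; // higher means more relevant
}

// Stage 1: any function that returns candidate pages for a text query.
type RetrievePages = (query: string, limit: number) => Promise<PageHit[]>;

// Stage 2: any function that runs detailed VLM reasoning over the chosen pages.
type ReasonOverPages = (query: string, pageIds: string[]) => Promise<string>;

async function retrieveThenReason(
  query: string,
  retrievePages: RetrievePages,
  reasonOverPages: ReasonOverPages,
  candidateLimit = 20, // pages requested from the retriever
  keep = 3,            // pages forwarded to the more expensive VLM stage
): Promise<{ answer: string; pageIds: string[] }> {
  const hits = await retrievePages(query, candidateLimit);

  // Sort a copy by descending score and keep only the best few pages.
  const pageIds = [...hits]
    .sort((a, b) => b.score - a.score)
    .slice(0, keep)
    .map((hit) => hit.pageId);

  const answer = await reasonOverPages(query, pageIds);
  return { answer, pageIds };
}
```

Keeping `keep` small limits how many page images reach the slower reasoning stage, which is the main reason to put a retriever in front of it.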

    Specification

    Framework: HF
    Organization: nomic-ai
    Feature: Visual Embeddings
    Output: 768-dim vector
    Modalities: video, image
    Retriever: Vector Search
    Parameters: 3B
    License: see model card
    Downloads/mo: 12K
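
To make the "Vector Search" retriever concrete, here is a minimal sketch of ranking page embeddings against a query embedding. It assumes cosine similarity as the metric, which this page does not specify, and `PageEmbedding`, `cosineSimilarity`, and `topKPages` are hypothetical names rather than Mixpeek APIs. A production index would normally use an approximate nearest-neighbor structure instead of this brute-force scan.

```typescript
// Hypothetical shape for an indexed page; not a Mixpeek type.
interface PageEmbedding {
  pageId: string;
  vector: number[]; // e.g. the 768-dim output listed in the specification
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as having no similarity to anything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force top-k: score every page, sort by descending score, keep k.
function topKPages(
  query: number[],
  pages: PageEmbedding[],
  k: number,
): { pageId: string; score: number }[] {
  return pages
    .map((page) => ({
      pageId: page.pageId,
      score: cosineSimilarity(query, page.vector),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

In this sketch the query vector would come from embedding the text query with the same model that embedded the pages, so that both live in the same vector space.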

    Research Paper

    Nomic Embed Multimodal 3B

    arxiv.org

    Build a pipeline with nomic-embed-multimodal-3b

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
