
    bge-large-en-v1.5

    by BAAI

    BAAI General Embedding — state-of-the-art text retrieval

    5.8M downloads/month · 631 likes · 335M parameters
    Identifiers
    Model ID
    BAAI/bge-large-en-v1.5
    Feature URI
    mixpeek://text_extractor@v1/baai_bge_large_v1

    Overview

    BGE (BAAI General Embedding) is a family of text embedding models that achieve top performance on the MTEB benchmark. The large-en-v1.5 variant produces 1024-dimensional embeddings optimized for English text retrieval and semantic similarity.

    On Mixpeek, BGE powers text-based semantic search over extracted text content — transcriptions, captions, OCR results, and document text.

    Architecture

    BERT-Large architecture (24 layers, 1024-dim hidden, 16 attention heads) with task-specific training using contrastive learning on curated text pairs. Uses [CLS] token pooling with optional instruction prefix.
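
    The [CLS]-pooling step described above can be sketched as follows. The token matrix and dimensions here are toy stand-ins for the model's real [seqLen × 1024] hidden states, not actual model output:

```typescript
// Sketch of [CLS]-token pooling: the final-layer hidden states form a
// [seqLen x hiddenDim] matrix, and the sentence embedding is the
// L2-normalized first row (the [CLS] token).
function clsPool(hiddenStates: number[][]): number[] {
  const cls = hiddenStates[0]; // [CLS] is always the first token
  const norm = Math.sqrt(cls.reduce((s, x) => s + x * x, 0));
  return cls.map((x) => x / norm); // unit length, ready for cosine similarity
}

// 3 tokens x 4 dims, standing in for a real hidden-state matrix
const toyStates = [
  [3, 0, 4, 0], // [CLS]
  [1, 1, 1, 1],
  [0, 2, 0, 2],
];
console.log(clsPool(toyStates)); // [0.6, 0, 0.8, 0]
```

    Normalizing at pooling time means downstream similarity scoring reduces to a plain dot product.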

    Mixpeek SDK Integration

    import { Mixpeek } from "mixpeek";
    
    const mx = new Mixpeek({ apiKey: "API_KEY" });
    
    // Ingest a PDF; extracted text is embedded with bge-large-en-v1.5
    await mx.collections.ingest({
      collection_id: "my-collection",
      source: { url: "https://example.com/report.pdf" },
      feature_extractors: [{
        name: "text_embedding",
        version: "v1",
        params: {
          model_id: "BAAI/bge-large-en-v1.5"
        }
      }]
    });

    Capabilities

    • 1024-dimensional dense text embeddings
    • Top-ranked on MTEB retrieval benchmarks
    • Instruction-aware embedding with task prefixes
    • Optimized for asymmetric retrieval (query vs. passage)
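
    Asymmetric retrieval means queries and passages are encoded differently: bge-*-v1.5 models expect queries to carry a short instruction prefix, while passages are embedded as-is. A minimal scoring sketch, with toy 3-dim embeddings standing in for the model's 1024-dim output:

```typescript
// Documented query instruction for bge-*-v1.5 English models; it is
// prepended to queries only, never to passages.
const QUERY_PREFIX = "Represent this sentence for searching relevant passages: ";

// Cosine similarity between two embeddings. BGE embeddings are typically
// normalized, in which case this reduces to a dot product.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank passages by similarity to a query embedding (toy vectors).
const queryEmb = [1, 0, 0]; // embedding of QUERY_PREFIX + query text
const passages = [
  { id: "a", emb: [0.9, 0.1, 0] },
  { id: "b", emb: [0, 1, 0] },
];
const ranked = passages
  .map((p) => ({ id: p.id, score: cosine(queryEmb, p.emb) }))
  .sort((x, y) => y.score - x.score);
console.log(ranked[0].id); // "a"
```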

    Use Cases on Mixpeek

    Semantic search over transcribed audio/video content
    Document similarity and deduplication
    RAG pipeline embedding backend
    Cross-document concept matching
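
    Deduplication with embeddings typically reduces to a similarity cutoff: documents whose cosine similarity exceeds a threshold are treated as near-duplicates. A greedy sketch of the idea; the threshold and embeddings are illustrative assumptions, not Mixpeek defaults:

```typescript
// Greedy near-duplicate filtering: keep a document only if it is not too
// similar to any already-kept document. Threshold 0.95 is illustrative.
function dedupe(
  docs: { id: string; emb: number[] }[],
  threshold = 0.95,
): { id: string; emb: number[] }[] {
  const dot = (a: number[], b: number[]) => a.reduce((s, x, i) => s + x * b[i], 0);
  const cos = (a: number[], b: number[]) =>
    dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
  const kept: { id: string; emb: number[] }[] = [];
  for (const d of docs) {
    if (!kept.some((k) => cos(k.emb, d.emb) >= threshold)) kept.push(d);
  }
  return kept;
}

const unique = dedupe([
  { id: "doc1", emb: [1, 0] },
  { id: "doc1-copy", emb: [0.999, 0.01] }, // near-duplicate of doc1
  { id: "doc2", emb: [0, 1] },
]);
console.log(unique.map((d) => d.id)); // ["doc1", "doc2"]
```

    Greedy filtering is O(n²) in the worst case; at scale this is usually paired with an approximate-nearest-neighbor index rather than pairwise comparison.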

    Specification

    Framework: HF
    Organization: BAAI
    Feature: Text Embeddings
    Output: 1024-dim vector
    Modalities: document, audio
    Retriever: Text Similarity
    Parameters: 335M
    License: MIT
    Downloads/mo: 5.8M
    Likes: 631

    Research Paper

    C-Pack: Packaged Resources To Advance General Chinese Embedding (arxiv.org)

    Build a pipeline with bge-large-en-v1.5

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
