    HF | Text Embeddings | Gemma License

    embeddinggemma-300m

    by google

    Google's efficient multilingual text embedding model -- 100+ languages in roughly 300M parameters

    1.5M downloads/month
    308M params
    Identifiers

    Model ID: google/embeddinggemma-300m
    Feature URI: mixpeek://text_extractor@v1/google_embeddinggemma_300m_v1

    Overview

    EmbeddingGemma is Google's compact text embedding model based on the Gemma 3 architecture, designed for deployment on edge devices while maintaining strong multilingual performance. At 308M parameters, it runs in under 200MB of RAM when quantized and achieves sub-22ms inference on EdgeTPU.

    Despite its small size, it ranks as the highest-performing open multilingual text embedding model under 500M parameters on MTEB. It supports Matryoshka representations (768 and 128 dimensions) for flexible memory/quality tradeoffs. On Mixpeek, it provides a lightweight embedding option for high-throughput text indexing where GPU resources are limited.

    Architecture

    Gemma 3 decoder backbone, 308M parameters. 2048 token context. Matryoshka dimensions: 768 (full) and 128 (compressed). Distilled from a larger teacher model. Optimized for TPU/mobile inference.
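
    The Matryoshka dimensions above can be illustrated with a minimal sketch. It assumes the full 768-dimension embedding is already available as a plain number array; the function names truncateEmbedding and float32StorageBytes are hypothetical helpers, not part of the Mixpeek SDK or the model's API. The usual Matryoshka approach is to keep the leading components and re-normalize to unit length.

```typescript
// Keep the first `dims` components of an embedding, then L2-normalize the result.
// Hypothetical helper for illustration only.
function truncateEmbedding(vec: number[], dims: number): number[] {
  if (dims <= 0 || dims > vec.length) {
    throw new RangeError(`dims must be between 1 and ${vec.length}`);
  }
  const head = vec.slice(0, dims);
  const norm = Math.sqrt(head.reduce((sum, x) => sum + x * x, 0));
  // Guard against an all-zero prefix to avoid dividing by zero.
  return norm === 0 ? head : head.map((x) => x / norm);
}

// Bytes needed to store `count` vectors of `dims` float32 components.
function float32StorageBytes(count: number, dims: number): number {
  return count * dims * 4;
}

// Stand-in for a 768-dim embedding; a real vector would come from the model.
const full: number[] = Array.from({ length: 768 }, (_, i) => Math.sin(i + 1));
const compact: number[] = truncateEmbedding(full, 128);

// Storage for one million vectors at each size, in bytes.
const fullBytes = float32StorageBytes(1000000, 768);
const compactBytes = float32StorageBytes(1000000, 128);
```

    Under these assumptions, the 128-dimension form needs one sixth of the raw float32 storage of the 768-dimension form, at some cost in retrieval quality.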

    Mixpeek SDK Integration

    import { Mixpeek } from "mixpeek";

    // Replace "API_KEY" with your Mixpeek API key.
    const mx = new Mixpeek({ apiKey: "API_KEY" });

    // Ingest a JSON source and embed its text with EmbeddingGemma.
    await mx.collections.ingest({
      collection_id: "my-collection",
      source: { url: "https://example.com/articles.json" },
      feature_extractors: [{
        name: "text_embedding",
        version: "v1",
        params: {
          model_id: "google/embeddinggemma-300m"
        }
      }]
    });

    Capabilities

    • 100+ language support
    • Matryoshka dimension reduction (768/128)
    • Sub-200MB quantized footprint
    • EdgeTPU and mobile-optimized
    • 2048 token context

    Use Cases on Mixpeek

    • On-device semantic search in mobile applications
    • High-throughput text indexing at scale
    • Multilingual document retrieval
    • Lightweight embedding for resource-constrained deployments

    Benchmarks

    Dataset | Metric | Score | Source
    MTEB (multilingual avg) | Score | 64.1 | Google, 2025 -- Model Card

    Performance

    Input Size: Up to 2048 tokens
    GPU Latency: ~3ms / passage (A100)
    GPU Throughput: ~3000 passages/sec (A100, batch 128)
    GPU Memory: ~0.6 GB
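
    As a back-of-the-envelope sketch, the throughput figure above can be turned into a rough indexing-time estimate. The function name estimateIndexingSeconds is hypothetical, and the calculation assumes the quoted ~3000 passages/sec is sustained on a single A100; it ignores tokenization, I/O, and network overhead, so real jobs will likely take longer.

```typescript
// Rough time to embed `passages` items at a sustained rate.
// The default rate is the ~3000 passages/sec figure quoted for an A100 at batch 128.
function estimateIndexingSeconds(passages: number, passagesPerSec: number = 3000): number {
  if (passages < 0 || passagesPerSec <= 0) {
    throw new RangeError("passages must be >= 0 and passagesPerSec must be > 0");
  }
  return passages / passagesPerSec;
}

// Example: ten million passages on one GPU.
const indexingSeconds = estimateIndexingSeconds(10000000);
const indexingMinutes = indexingSeconds / 60;
```

    Under these assumptions, ten million passages come out to a little under an hour of pure embedding time.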

    Specification

    Framework: HF
    Organization: google
    Feature: Text Embeddings
    Output: 768-dim vector
    Modalities: document, audio
    Retriever: Text Similarity
    Parameters: 308M
    License: Gemma License
    Downloads/mo: 1.5M
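
    To make the Text Similarity retriever entry concrete, here is a minimal sketch of ranking documents by cosine similarity between embedding vectors. Whether Mixpeek's retriever uses cosine similarity internally is an assumption; the ScoredDoc type and both function names are hypothetical, and the two-dimensional vectors are toy stand-ins for real embeddings.

```typescript
// Cosine similarity between two equal-length vectors; returns 0 if either is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Hypothetical result shape for illustration.
type ScoredDoc = { id: string; score: number };

// Score every document against the query and sort from most to least similar.
function rankBySimilarity(
  query: number[],
  docs: { id: string; vector: number[] }[]
): ScoredDoc[] {
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(query, d.vector) }))
    .sort((x, y) => y.score - x.score);
}

// Toy vectors standing in for model embeddings.
const ranked = rankBySimilarity([1, 0], [
  { id: "a", vector: [1, 0] },
  { id: "b", vector: [0, 1] },
  { id: "c", vector: [1, 1] },
]);
```

    In this toy example, document "a" points in the same direction as the query and ranks first, while the orthogonal document "b" ranks last.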

    Research Paper

    EmbeddingGemma

    arxiv.org

    Build a pipeline with embeddinggemma-300m

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
