
    siglip-base-patch16-224

    by google

    Sigmoid Loss for Language Image Pre-Training — efficient contrastive learning

    1.2M downloads/month · 78 likes · 203M parameters
    Identifiers
    Model ID
    google/siglip-base-patch16-224
    Feature URI
    mixpeek://image_extractor@v1/google_siglip_base_v1

    Overview

    SigLIP replaces CLIP's softmax-based contrastive loss with a simple pairwise sigmoid loss, enabling more efficient training on larger batch sizes without requiring a global normalization step.
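Concretely, for a batch $\mathcal{B}$ of image embeddings $\mathbf{x}_i$ and text embeddings $\mathbf{y}_j$, the paper's pairwise sigmoid loss (with learnable temperature $t$ and bias $b$, and label $z_{ij} = 1$ for matched image-text pairs, $-1$ otherwise) is:

$$\mathcal{L} = -\frac{1}{|\mathcal{B}|} \sum_{i=1}^{|\mathcal{B}|} \sum_{j=1}^{|\mathcal{B}|} \log \frac{1}{1 + e^{\,z_{ij}\left(-t\,\mathbf{x}_i \cdot \mathbf{y}_j - b\right)}}$$

Each pair contributes an independent binary term, which is why no softmax normalization over the whole batch is required.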

    On Mixpeek, SigLIP offers a lighter-weight alternative to CLIP for visual embedding extraction, with comparable accuracy on many benchmarks while being faster to run at inference time.

    Architecture

    Vision Transformer (ViT-B/16) with 12 layers, 768-dim hidden size, 12 attention heads. Uses a pairwise sigmoid contrastive loss instead of softmax, eliminating the global normalization step over the batch.
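The "/16" in ViT-B/16 is the patch size: a 224x224 input is split into 16x16 patches before entering the transformer. A small illustrative helper (not part of any SDK) makes the arithmetic explicit:

```typescript
// Number of patch tokens a ViT sees for a square input image.
// For SigLIP ViT-B/16 at 224x224: (224 / 16)^2 = 14 * 14 = 196 patches.
function patchCount(imageSize: number, patchSize: number): number {
  const perSide = Math.floor(imageSize / patchSize); // patches per row/column
  return perSide * perSide; // total tokens fed to the transformer
}

const tokens = patchCount(224, 16); // 196
```

Each of those 196 patch tokens is projected into the model's 768-dim hidden space, matching the embedding width listed in the specification below.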

    Mixpeek SDK Integration

    // Ingest an image and extract SigLIP embeddings via the Mixpeek SDK
    import { Mixpeek } from "mixpeek";
    
    const mx = new Mixpeek({ apiKey: "API_KEY" });
    
    // Run the image_embedding extractor, selecting SigLIP as the backing model
    await mx.collections.ingest({
      collection_id: "my-collection",
      source: { url: "https://example.com/image.jpg" },
      feature_extractors: [{
        name: "image_embedding",
        version: "v1",
        params: {
          model_id: "google/siglip-base-patch16-224"
        }
      }]
    });

    Capabilities

    • Efficient contrastive image-text learning
    • 768-dimensional dense vector embeddings
    • Lower memory footprint than CLIP ViT-L
    • Strong zero-shot classification performance
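Downstream vector search typically ranks the 768-dim embeddings by cosine similarity. A minimal sketch of that scoring step (illustrative only; Mixpeek's retriever handles this server-side):

```typescript
// Cosine similarity between two dense embedding vectors,
// e.g. SigLIP's 768-dim image embeddings.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];     // accumulate dot product
    normA += a[i] * a[i];   // squared norm of a
    normB += b[i] * b[i];   // squared norm of b
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Identical vectors score 1, orthogonal vectors 0; a vector index approximates this ranking at scale rather than scanning every pair.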

    Use Cases on Mixpeek

    High-throughput visual indexing of large image catalogs
    Real-time visual similarity for recommendation engines
    Lightweight embedding extraction for edge deployments

    Specification

    Framework: HF
    Organization: google
    Feature: Visual Embeddings
    Output: 768-dim vector
    Modalities: video, image
    Retriever: Vector Search
    Parameters: 203M
    License: apache-2.0
    Downloads/mo: 1.2M
    Likes: 78

    Research Paper

    Sigmoid Loss for Language Image Pre-Training

    arxiv.org

    Build a pipeline with siglip-base-patch16-224

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.

    Open Pipeline Builder