    HF · Visual Embeddings · Apache 2.0

    LCO-Embedding-Omni-7B

    by LCO-Embedding

    SOTA omni-modal embedding for text, images, audio, and video in one vector space

    2.1K downloads/month
    7B params
    Identifiers
    Model ID
    LCO-Embedding/LCO-Embedding-Omni-7B
    Feature URI
    mixpeek://image_extractor@v1/lco_embedding_omni_7b_v1

    Overview

    LCO-Embedding-Omni-7B is a language-centric omni-modal embedding model that maps text, images, audio, and video into a shared vector space. According to its model card, it achieves state-of-the-art results on both the MIEB image embedding benchmark and the MAEB audio embedding benchmark, and it notably reaches the audio result without explicit audio training data.

    Built on Qwen2.5-Omni-Thinker-7B with a sentence-transformer last-token-pooling head, it demonstrates the 'Generation-Representation Scaling Law': strong generative backbones produce strong embeddings across all modalities.

    Architecture

    A 7B-parameter model that uses Qwen2.5-Omni-Thinker as its backbone. It applies last-token pooling via sentence-transformers to produce fixed-dimensional embeddings. Cross-modal alignment enables retrieval across modality boundaries without modality-specific heads.
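To make the pooling step concrete, here is a minimal sketch of last-token pooling in plain Python. It is not the model's actual code: the function names `last_token_pool` and `l2_normalize`, the toy hidden states, and the normalization step are all assumptions for illustration. The idea is that each sequence is represented by the hidden state of its last non-padding token.

```python
import math

def last_token_pool(hidden_states, attention_mask):
    """Pick the hidden state of the last unmasked token in each sequence.

    hidden_states: nested list shaped [batch][seq_len][dim]
    attention_mask: nested list shaped [batch][seq_len], 1 = real token, 0 = padding
    (Hypothetical helper for illustration; works for left or right padding.)
    """
    pooled = []
    for states, mask in zip(hidden_states, attention_mask):
        real_positions = [i for i, m in enumerate(mask) if m == 1]
        if not real_positions:
            raise ValueError("sequence has no unmasked tokens")
        pooled.append(list(states[real_positions[-1]]))
    return pooled

def l2_normalize(vec):
    """Scale a vector to unit length (assumed post-processing step)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

# Toy batch: two sequences of three tokens, two hidden dimensions.
hidden = [
    [[1.0, 0.0], [3.0, 4.0], [9.0, 9.0]],  # third position is padding
    [[0.0, 2.0], [0.0, 0.0], [6.0, 8.0]],  # no padding
]
mask = [[1, 1, 0], [1, 1, 1]]

pooled = last_token_pool(hidden, mask)
embeddings = [l2_normalize(v) for v in pooled]
```

Because every input, whatever its length or modality, collapses to a single token's hidden state, the output has the same dimensionality for all inputs.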

    Mixpeek SDK Integration

    mixpeek.ingest.from_url(
        url="s3://media-assets/clip.mp4",
        collection="media_library",
        feature_extractors=[{
            "type": "embed",
            "model": "mixpeek://embed@v1/lco_embedding_omni_7b_v1"
        }]
    )

    Capabilities

    • Text embedding
    • Image embedding
    • Audio embedding
    • Video embedding
    • Cross-modal retrieval
    • Zero-shot classification
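Zero-shot classification with a shared embedding space usually reduces to comparing an item's vector against the vectors of candidate label texts. The sketch below illustrates that pattern with toy 3-dimensional vectors standing in for real model outputs; the helper names `cosine` and `zero_shot_classify` and all the numbers are hypothetical, and no model or Mixpeek call is made.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def zero_shot_classify(item_vec, label_vecs):
    """Return the label whose embedding is closest to the item, plus all scores."""
    scores = {label: cosine(item_vec, vec) for label, vec in label_vecs.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Toy stand-ins: in practice these would be embeddings of label text
# and of an audio clip produced by the same embedding model.
label_vecs = {
    "dog barking": [0.9, 0.1, 0.0],
    "piano music": [0.0, 0.2, 0.9],
}
audio_vec = [0.8, 0.2, 0.1]

predicted, scores = zero_shot_classify(audio_vec, label_vecs)
print(predicted, scores)
```

No classifier is trained here; adding a class only requires embedding one more label string.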

    Use Cases on Mixpeek

    • Unified multimodal search across video, audio, and text
    • Cross-modal retrieval (find video by audio query)
    • Single-model replacement for multiple modality-specific encoders
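As a rough illustration of the "find video by audio query" case, the sketch below ranks stored video vectors against an audio query vector by normalized dot product. The index contents, IDs, and the `top_k` helper are hypothetical, and a production system would use a vector database rather than a brute-force scan.

```python
import heapq
import math

def normalize(vec):
    """Scale a vector to unit length so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def top_k(query_vec, index, k=2):
    """Brute-force nearest-neighbor search over a dict of id -> vector."""
    q = normalize(query_vec)
    scored = (
        (sum(a * b for a, b in zip(q, normalize(vec))), doc_id)
        for doc_id, vec in index.items()
    )
    return [(doc_id, score) for score, doc_id in heapq.nlargest(k, scored)]

# Toy index of video embeddings (stand-ins for real model outputs).
video_index = {
    "video_001": [0.9, 0.1, 0.0],
    "video_002": [0.1, 0.9, 0.1],
    "video_003": [0.7, 0.3, 0.1],
}
audio_query = [1.0, 0.2, 0.0]  # stand-in for an embedded audio clip

results = top_k(audio_query, video_index, k=2)
print(results)
```

The query and the indexed items never need to share a modality; they only need to come from the same embedding model.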

    Benchmarks

    Dataset        Metric      Score   Source
    MIEB (image)   Avg Score   SOTA    Model card
    MAEB (audio)   Avg Score   SOTA    Model card

    Performance

    Input Size: Variable
    GPU Latency: ~45 ms per item (A100)
    GPU Throughput: ~200 items/sec (batch)
    GPU Memory: Model dependent
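For capacity planning, the approximate figures above can be turned into a back-of-the-envelope estimate. The sketch below assumes the listed numbers hold steady at any scale, which real workloads rarely do; the function name and the 10,000-item workload are made up for illustration.

```python
def estimate_seconds(num_items, latency_ms=45, batch_items_per_sec=200):
    """Estimate wall-clock time for one-at-a-time vs batched embedding.

    Defaults mirror the approximate figures listed above; treat the
    output as a rough planning number, not a measured result.
    """
    sequential = num_items * latency_ms / 1000  # one item per request
    batched = num_items / batch_items_per_sec   # sustained batch throughput
    return sequential, batched

sequential_s, batched_s = estimate_seconds(10_000)
print(f"sequential: {sequential_s:.0f} s, batched: {batched_s:.0f} s")
```

Under these assumptions, batching 10,000 items takes roughly one ninth of the sequential time.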

    Specification

    Framework: HF
    Organization: LCO-Embedding
    Feature: Visual Embeddings
    Output: 768-dim vector
    Modalities: video, image
    Retriever: Vector Search
    Parameters: 7B
    License: Apache 2.0
    Downloads/mo: 2.1K

    Build a pipeline with LCO-Embedding-Omni-7B

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
