LCO-Embedding-Omni-7B
by LCO-Embedding
SOTA omni-modal embedding for text, images, audio, and video in one vector space
LCO-Embedding/LCO-Embedding-Omni-7B
mixpeek://image_extractor@v1/lco_embedding_omni_7b_v1
Overview
LCO-Embedding-Omni-7B is a language-centric omni-modal embedding model that maps text, images, audio, and video into a shared vector space. It achieves state-of-the-art results on both the MIEB image embedding benchmark and the MAEB audio embedding benchmark, notably reaching audio SOTA without explicit audio training data.
Built on Qwen2.5-Omni-Thinker-7B with a sentence-transformers last-token-pooling head, it demonstrates the 'Generation-Representation Scaling Law': strong generative backbones produce strong embeddings across all modalities.
Architecture
A 7B-parameter model built on the Qwen2.5-Omni-Thinker backbone. It uses last-token pooling via sentence-transformers to produce fixed-dimensional embeddings. Because all modalities are aligned in one space, retrieval works across modality boundaries without modality-specific heads.
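The last-token pooling step mentioned above can be illustrated with a small, dependency-free sketch. It is not the model's actual implementation: the function names `last_token_pool` and `l2_normalize`, the toy hidden states, and the final L2 normalization are all assumptions made for illustration.

```python
import math
from typing import List


def last_token_pool(hidden_states: List[List[float]],
                    attention_mask: List[int]) -> List[float]:
    """Return the hidden state of the last non-padding token.

    hidden_states: one vector per token position (seq_len x dim).
    attention_mask: 1 for real tokens, 0 for padding.
    """
    if len(hidden_states) != len(attention_mask):
        raise ValueError("hidden_states and attention_mask must have equal length")
    real_positions = [i for i, m in enumerate(attention_mask) if m == 1]
    if not real_positions:
        raise ValueError("sequence has no real tokens")
    # The highest unmasked index is the last real token for either padding side.
    return list(hidden_states[real_positions[-1]])


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length (assumed here so that dot product equals cosine)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


# Toy sequence: three real tokens followed by one padding position.
hidden = [
    [0.1, 0.2, 0.3, 0.4],
    [1.0, 0.0, 0.0, 0.0],
    [3.0, 4.0, 0.0, 0.0],  # last real token
    [9.0, 9.0, 9.0, 9.0],  # padding, ignored
]
mask = [1, 1, 1, 0]

embedding = l2_normalize(last_token_pool(hidden, mask))
print(embedding)
```

Whatever the input length or modality, the result is a single vector whose size equals the hidden dimension, which is why the pooled output is fixed-dimensional.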
Mixpeek SDK Integration
mixpeek.ingest.from_url(
    url="s3://media-assets/clip.mp4",
    collection="media_library",
    feature_extractors=[
        {
            "type": "embed",
            "model": "mixpeek://embed@v1/lco_embedding_omni_7b_v1",
        }
    ],
)
Capabilities
- Text embedding
- Image embedding
- Audio embedding
- Video embedding
- Cross-modal retrieval
- Zero-shot classification
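Cross-modal retrieval and zero-shot classification both reduce to nearest-neighbor search once every item lives in the same vector space. The sketch below shows the idea with made-up 3-dimensional vectors standing in for real embeddings; the file names, label names, and helper functions (`cosine`, `rank`, `zero_shot_classify`) are hypothetical and not part of the Mixpeek SDK or the model's API.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query: List[float],
         index: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Score every indexed item against the query, best match first."""
    scored = [(name, cosine(query, vec)) for name, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def zero_shot_classify(item: List[float],
                       label_embeddings: Dict[str, List[float]]) -> str:
    """Pick the label whose embedding is closest to the item."""
    return rank(item, label_embeddings)[0][0]


# Hypothetical embeddings of items from different modalities.
index = {
    "dog_bark.wav": [0.9, 0.1, 0.0],
    "city_traffic.mp4": [0.0, 0.2, 0.9],
    "beach.jpg": [0.1, 0.9, 0.2],
}

# Stand-in for the embedding of a text query such as "a dog barking".
text_query = [0.8, 0.2, 0.1]
results = rank(text_query, index)
print(results[0][0])

# Stand-ins for embeddings of candidate label texts.
labels = {
    "animal": [1.0, 0.0, 0.0],
    "landscape": [0.0, 1.0, 0.0],
    "vehicle": [0.0, 0.0, 1.0],
}
predicted = zero_shot_classify(index["beach.jpg"], labels)
print(predicted)
```

In practice a vector database would replace the linear scan in `rank`, but the scoring logic stays the same: the query and the candidates can come from different modalities because they share one space.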
Use Cases on Mixpeek
Benchmarks
| Dataset | Metric | Score | Source |
|---|---|---|---|
| MIEB (image) | Avg Score | SOTA | Model card |
| MAEB (audio) | Avg Score | SOTA | Model card |
Performance
Common Pipeline Companions
Specification
Build a pipeline with LCO-Embedding-Omni-7B
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.