LCO-Embedding-Omni-7B

by LCO-Embedding

SOTA omni-modal embedding for text, images, audio, and video in one vector space

2.1K downloads/month

7B params


Identifiers

Model ID

LCO-Embedding/LCO-Embedding-Omni-7B

Feature URI

mixpeek://image_extractor@v1/lco_embedding_omni_7b_v1
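The Feature URI appears to follow a `mixpeek://<extractor>@<version>/<model>` pattern. That layout is inferred only from the identifiers on this page, not from Mixpeek documentation. A minimal sketch of splitting such a URI into its parts; `parseFeatureUri` and `FeatureUri` are hypothetical names:

```typescript
// Hypothetical parser for URIs shaped like mixpeek://<extractor>@<version>/<model>.
// The layout is an assumption inferred from the examples on this page.
interface FeatureUri {
  extractor: string;
  version: string;
  model: string;
}

function parseFeatureUri(uri: string): FeatureUri {
  // extractor: everything up to "@"; version: up to the next "/"; model: the rest
  const match = /^mixpeek:\/\/([^@\/]+)@([^\/]+)\/(.+)$/.exec(uri);
  if (match === null) {
    throw new Error(`Not a mixpeek feature URI: ${uri}`);
  }
  const [, extractor, version, model] = match;
  return { extractor, version, model };
}

const parsed = parseFeatureUri("mixpeek://image_extractor@v1/lco_embedding_omni_7b_v1");
console.log(parsed);
```

Splitting the URI this way makes the extractor segment (`image_extractor` here) easy to compare with the one passed to the SDK.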

Overview

LCO-Embedding-Omni-7B is a language-centric omni-modal embedding model that maps text, images, audio, and video into a shared vector space. It achieves state-of-the-art results on both the MIEB image embedding benchmark and the MAEB audio embedding benchmark; notably, it reaches audio SOTA without explicit audio training data.

Built on Qwen2.5-Omni-Thinker-7B with a sentence-transformers last-token-pooling head, it illustrates the 'Generation-Representation Scaling Law': stronger generative backbones yield stronger embeddings across all modalities.
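Last-token pooling takes, for each input sequence, the backbone's hidden state at the final non-padding token and uses it as the embedding. A minimal sketch, assuming hidden states arrive as plain `number[][][]` arrays alongside an attention mask; the toy values stand in for real Qwen2.5-Omni-Thinker outputs, and the L2 normalization step is a common convention for cosine search rather than something this card specifies:

```typescript
// Minimal sketch of last-token pooling over a batch of hidden states.
// hidden[b][t] is the hidden vector of token t in sequence b;
// mask[b][t] is 1 for a real token and 0 for padding.
type Matrix = number[][];

function lastTokenPool(hidden: Matrix[], mask: number[][]): Matrix {
  return hidden.map((tokens, b) => {
    // Position of the last real token (works for right or left padding)
    const lastIdx = mask[b].lastIndexOf(1);
    if (lastIdx < 0) throw new Error(`sequence ${b} has no real tokens`);
    return [...tokens[lastIdx]];
  });
}

// Scale a vector to unit length so dot product equals cosine similarity
function l2Normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? v : v.map((x) => x / norm);
}

// Toy batch: 2 sequences, 3 positions, 2-dimensional hidden states
const hidden: Matrix[] = [
  [[1, 0], [0, 1], [3, 4]], // sequence 0: all three tokens are real
  [[2, 0], [0, 2], [9, 9]], // sequence 1: the last position is padding
];
const mask = [
  [1, 1, 1],
  [1, 1, 0],
];

const embeddings = lastTokenPool(hidden, mask).map(l2Normalize);
console.log(embeddings);
```

Using `lastIndexOf(1)` on the mask finds the last real token whether the batch was padded on the right or on the left, so the padding vector `[9, 9]` is never selected.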

Architecture

A 7B-parameter model with Qwen2.5-Omni-Thinker as its backbone. It uses last-token pooling via sentence-transformers to produce fixed-dimensional embeddings. Because all modalities are aligned in one space, retrieval works across modality boundaries without modality-specific heads.
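Because every modality lands in the same space, a single similarity function can rank text, image, audio, and video items against one query. A sketch under stated assumptions: the three-dimensional vectors below are invented toy values (real embeddings from this model are much higher-dimensional), and `IndexedItem` and `search` are hypothetical names, not Mixpeek or sentence-transformers APIs:

```typescript
// Sketch of cross-modal retrieval in a shared embedding space.
// Toy vectors only; in practice each embedding would come from the model.
type Modality = "text" | "image" | "audio" | "video";

interface IndexedItem {
  id: string;
  modality: Modality;
  embedding: number[];
}

// Cosine similarity; returns 0 if either vector has zero length
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  const denom = Math.sqrt(na) * Math.sqrt(nb);
  return denom === 0 ? 0 : dot / denom;
}

// Rank every item by similarity to the query, ignoring its modality
function search(query: number[], index: IndexedItem[], k: number) {
  return index
    .map((item) => ({ id: item.id, modality: item.modality, score: cosine(query, item.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

const index: IndexedItem[] = [
  { id: "dog_bark.wav", modality: "audio", embedding: [0.9, 0.1, 0.0] },
  { id: "cat_photo.jpg", modality: "image", embedding: [0.1, 0.9, 0.1] },
  { id: "dog_park.mp4", modality: "video", embedding: [0.8, 0.2, 0.1] },
  { id: "recipe.txt", modality: "text", embedding: [0.0, 0.1, 0.9] },
];

// Stand-in for the embedding of a text query such as "a dog barking"
const queryEmbedding = [1.0, 0.0, 0.0];
const results = search(queryEmbedding, index, 2);
console.log(results);
```

With these toy vectors the audio clip ranks first and the video clip second. Nothing in `search` branches on `modality`, which mirrors the absence of modality-specific heads.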

Mixpeek SDK Integration

```typescript
import { Mixpeek } from "mixpeek";

// Replace "API_KEY" with your Mixpeek API key
const mx = new Mixpeek({ apiKey: "API_KEY" });

// Managed: create a collection over a bucket; Mixpeek runs this model's extractor.
// Top-level await requires running this file as an ES module.
const collection = await mx.collections.create({
  namespace_id: "my-namespace",
  collection_name: "my-collection",
  source: { type: "bucket", bucket_ids: ["bkt_your_bucket"] },
  feature_extractor: {
    feature_extractor_name: "embed",
    version: "v1",
    parameters: { model_id: "mixpeek://embed@v1/lco_embedding_omni_7b_v1" },
  },
});
```