colqwen-omni-v0.1
by vidore
Omnimodal ColBERT retrieval for documents, audio, and video search
vidore/colqwen-omni-v0.1
mixpeek://image_extractor@v1/vidore_colqwen_omni_v1
Overview
ColQwen Omni extends the ColPali paradigm beyond document images to audio and video, using ColBERT-style multi-vector representations built on Qwen2.5-Omni-3B. A dense single-vector model compresses each input into one embedding. A multi-vector model instead keeps one embedding per token and matches query tokens against those embeddings individually. This fine-grained, token-level matching delivers higher precision on complex queries.
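The following minimal TypeScript sketch makes the late-interaction idea concrete with ColBERT-style MaxSim scoring. Each query token vector is matched to its most similar document vector, and those best-match similarities are summed. The sketch assumes the vectors are already L2-normalized, so the dot product equals cosine similarity, and it uses tiny 2-D toy vectors. The names `Vec`, `dot`, and `maxSim` are illustrative and are not part of the model or the Mixpeek SDK.

```typescript
// Minimal sketch of ColBERT-style late-interaction (MaxSim) scoring.
// Assumption: all vectors are L2-normalized, so dot product == cosine similarity.
type Vec = number[];

function dot(a: Vec, b: Vec): number {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

// For each query token vector, keep its best match among the document vectors,
// then sum those best-match scores over all query tokens.
function maxSim(query: Vec[], doc: Vec[]): number {
  let score = 0;
  for (const q of query) {
    let best = -Infinity;
    for (const d of doc) best = Math.max(best, dot(q, d));
    score += best;
  }
  return score;
}

// Toy example: two query token vectors scored against a three-vector "page".
const query: Vec[] = [[1, 0], [0, 1]];
const page: Vec[] = [[1, 0], [0.6, 0.8], [0, -1]];
console.log(maxSim(query, page)); // 1.8 (1 from the first token + 0.8 from the second)
```

Each query token is scored independently. As a result, a page ranks highly when it contains a close match for every part of the query. A single pooled vector would tend to blur those separate matches together.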
On Mixpeek, ColQwen Omni powers late-interaction retrieval across document pages, audio recordings, and video content. Its zero-shot audio retrieval (no audio training data needed) makes it especially useful for indexing podcasts, meetings, and lecture recordings alongside visual content.
Architecture
Qwen2.5-Omni-3B-Instruct fine-tuned for ColBERT-style multi-vector output. Dynamic image resolution (max 1024 patches). Audio/video towers frozen during training — audio retrieval is zero-shot. Trained with colpali-engine 0.3.11 on 127K query-page pairs.
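Multi-vector output has a storage cost that grows with the number of patches. The back-of-the-envelope sketch below estimates the per-page footprint at the 1024-patch cap. It relies on assumptions this card does not state: one vector per patch, a 128-dimensional projection as used by earlier ColPali-family models, and no pooling or compression. `bytesPerPage` is a hypothetical helper.

```typescript
// Back-of-the-envelope storage estimate for multi-vector page embeddings.
// Assumptions (not from the model card): one vector per visual patch,
// 128 dimensions per vector, and no pooling or compression.
const BYTES_PER_VALUE = { float32: 4, float16: 2, int8: 1 } as const;
type Precision = keyof typeof BYTES_PER_VALUE;

function bytesPerPage(numVectors: number, dim: number, precision: Precision): number {
  return numVectors * dim * BYTES_PER_VALUE[precision];
}

const MAX_PATCHES = 1024; // patch cap stated on the model card
const DIM = 128; // assumed embedding dimension

const precisions: Precision[] = ["float32", "float16", "int8"];
for (const p of precisions) {
  const kib = bytesPerPage(MAX_PATCHES, DIM, p) / 1024;
  console.log(`${p}: ${kib} KiB per page`);
}
// float32: 512 KiB per page
// float16: 256 KiB per page
// int8: 128 KiB per page
```

For comparison, a single 128-dimensional float32 dense vector takes 512 bytes. Under these assumptions, a page at the patch cap costs about 1024 times more to store, which is the price of token-level matching.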
Mixpeek SDK Integration
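```typescript
import { Mixpeek } from "mixpeek";

// Create a client; replace API_KEY with your Mixpeek API key.
const mx = new Mixpeek({ apiKey: "API_KEY" });

// Ingest a podcast into the "mixed-media" collection, embedding it with ColQwen Omni.
await mx.collections.ingest({
  collection_id: "mixed-media",
  source: { url: "https://example.com/podcast.mp3" },
  feature_extractors: [
    {
      feature: "multimodal_embedding",
      model: "vidore/colqwen-omni-v0.1"
    }
  ]
});
```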
Capabilities
- ColBERT-style multi-vector retrieval across all modalities
- Zero-shot audio retrieval without audio training data
- Dynamic image resolution up to 1024 patches
- 30-minute podcast embedded in under 10 seconds
- Fine-grained token-level matching for complex queries
Benchmarks
| Dataset | Metric | Score | Source |
|---|---|---|---|
| ViDoRe V1 (visual doc) | nDCG@5 | ~90% | Vidore Blog, 2025 |
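nDCG@5 rewards placing relevant pages near the top of the first five results. It applies a logarithmic discount by rank and normalizes by the best achievable ranking. The sketch below uses one common formulation, with gain 2^rel - 1 and a log2(rank + 1) discount. ViDoRe's evaluation harness may differ in details, so treat this as illustrative. With binary relevance, exponential and linear gain give the same result. `dcgAtK` and `ndcgAtK` are hypothetical names.

```typescript
// Illustrative nDCG@k with graded relevance: gain = 2^rel - 1,
// discount = log2(rank + 1) for 1-based rank. Not ViDoRe's own evaluation code.
function dcgAtK(relevances: number[], k: number): number {
  let dcg = 0;
  for (let i = 0; i < Math.min(k, relevances.length); i++) {
    // Index i is rank i + 1, so its discount is log2(i + 2).
    dcg += (2 ** relevances[i] - 1) / Math.log2(i + 2);
  }
  return dcg;
}

// ranked: relevance of each retrieved item, in the order the system returned them.
// judged: relevance of every judged item for the query (used for the ideal ranking).
function ndcgAtK(ranked: number[], judged: number[], k: number): number {
  const ideal = [...judged].sort((a, b) => b - a);
  const idcg = dcgAtK(ideal, k);
  return idcg === 0 ? 0 : dcgAtK(ranked, k) / idcg;
}

// One relevant page (relevance 1) retrieved at rank 2 of 5.
console.log(ndcgAtK([0, 1, 0, 0, 0], [1], 5)); // ≈ 0.631, i.e. 1 / log2(3)
```

Benchmark tables such as the one above report this per-query value averaged over the evaluation set and expressed as a percentage.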
Research Paper
ColPali: Efficient Document Retrieval with Vision Language Models (arxiv.org)
Build a pipeline with colqwen-omni-v0.1
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
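One way to combine retrieval stages is a two-stage search. A cheap first pass shortlists candidates, and exact MaxSim then reranks only the shortlist. The sketch below uses a mean-pooled vector per item for the first pass. This is one common approximation, not necessarily what Mixpeek does internally. `Item`, `search`, and `shortlistSize` are hypothetical names, not Mixpeek SDK APIs.

```typescript
// Sketch of a two-stage search over multi-vector embeddings.
// Stage 1: cheap shortlist using one mean-pooled vector per item.
// Stage 2: exact MaxSim rerank of the shortlist only.
// Item, search and shortlistSize are hypothetical names, not Mixpeek SDK APIs.
type Vec = number[];
interface Item {
  id: string;
  vectors: Vec[]; // assumed non-empty and L2-normalized
}

const dot = (a: Vec, b: Vec): number => a.reduce((s, x, i) => s + x * b[i], 0);

// Average the vectors component-wise into a single vector.
function meanPool(vectors: Vec[]): Vec {
  const out = new Array<number>(vectors[0].length).fill(0);
  for (const v of vectors) v.forEach((x, i) => (out[i] += x / vectors.length));
  return out;
}

// Sum over query tokens of the best dot product against the item's vectors.
function maxSim(query: Vec[], doc: Vec[]): number {
  return query.reduce((s, q) => s + Math.max(...doc.map((d) => dot(q, d))), 0);
}

function search(query: Vec[], items: Item[], shortlistSize: number, topK: number): string[] {
  const pooledQuery = meanPool(query);
  // Stage 1: approximate scores. A real index would precompute pooled vectors.
  const shortlist = items
    .map((it) => ({ it, s: dot(pooledQuery, meanPool(it.vectors)) }))
    .sort((a, b) => b.s - a.s)
    .slice(0, shortlistSize)
    .map((x) => x.it);
  // Stage 2: exact late-interaction scores on the shortlist.
  return shortlist
    .map((it) => ({ id: it.id, s: maxSim(query, it.vectors) }))
    .sort((a, b) => b.s - a.s)
    .slice(0, topK)
    .map((x) => x.id);
}

const items: Item[] = [
  { id: "A", vectors: [[1, 0], [0, 1]] },
  { id: "B", vectors: [[0.6, 0.8], [0.8, 0.6]] },
  { id: "C", vectors: [[-1, 0]] },
];
console.log(search([[1, 0], [0, 1]], items, 2, 2)); // [ 'A', 'B' ]
```

In this toy corpus, pooling alone ranks B above A, and the MaxSim rerank restores A to first place. The trade-off is recall. With `shortlistSize` set to 1, A never reaches the rerank stage, so the shortlist should be generous enough for the exact stage to recover from pooling errors.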