BidirLM-Omni-2.5B-Embedding
by BidirLM
Bidirectional omni-modal encoder for text, images, and audio in a shared vector space
BidirLM/BidirLM-Omni-2.5B-Embedding
mixpeek://image_extractor@v1/bidirlm_omni_25b_v1
Overview
BidirLM-Omni-2.5B-Embedding is a 2.5B-parameter bidirectional embedding model that encodes text, images, and audio into a shared 2048-dimensional vector space. It is built on Qwen3, with the standard causal attention mask replaced by custom bidirectional attention. It is reported to reach state-of-the-art results at the 2.5B-parameter scale on three benchmarks at once: MTEB Multilingual V2 (text), MIEB (image), and MAEB (audio). This makes it one of the first models to lead leaderboards across all three modalities. It supports 119+ languages and a 32K-token context window.
Architecture
The backbone is a modified Qwen3-2.5B in which bidirectional attention replaces causal attention for encoding tasks. Modality-specific input adapters project images (as CLIP-style patches) and audio (as mel-spectrogram frames) into the same token space as text. Mean pooling over the final hidden states produces the 2048-dimensional embedding. Bidirectional attention is critical here: under a causal mask each token can attend only to itself and earlier tokens, so earlier tokens never see later context, and the hidden states that mean pooling averages carry an incomplete view of the input.
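To make the two mechanisms in the architecture description concrete, here is a minimal pure-Python sketch of (a) a causal versus a bidirectional attention mask and (b) mean pooling over final hidden states while skipping padding. It uses toy sizes (4 tokens, 2-dimensional hidden states) in place of real sequence lengths and the 2048-dimensional output. The helper names `causal_mask`, `bidirectional_mask`, and `mean_pool` are hypothetical; this illustrates the general technique and is not BidirLM's actual implementation.

```python
from typing import List

Matrix = List[List[float]]


def causal_mask(n: int) -> List[List[bool]]:
    """Decoder-style mask: query token i may attend only to key positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n: int, valid: List[bool]) -> List[List[bool]]:
    """Encoder-style mask: every query may attend to every non-padding key."""
    if len(valid) != n:
        raise ValueError("valid must have one flag per position")
    return [[valid[j] for j in range(n)] for _ in range(n)]


def mean_pool(hidden: Matrix, valid: List[bool]) -> List[float]:
    """Average the final hidden states over non-padding positions only."""
    count = sum(valid)
    if count == 0:
        raise ValueError("sequence has no valid tokens")
    dim = len(hidden[0])
    pooled = [0.0] * dim
    for row, keep in zip(hidden, valid):
        if keep:  # padding rows are excluded from the average
            for d in range(dim):
                pooled[d] += row[d]
    return [x / count for x in pooled]


if __name__ == "__main__":
    valid = [True, True, True, False]  # last position is padding
    print(causal_mask(4)[0])               # first token sees only itself
    print(bidirectional_mask(4, valid)[0])  # first token sees every real token
    hidden = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [100.0, 100.0]]
    print(mean_pool(hidden, valid))  # → [3.0, 4.0]
```

The first row of each mask shows the difference the architecture section relies on: under the causal mask the first token attends only to itself, while under the bidirectional mask it attends to all real tokens.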
Mixpeek SDK Integration
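```python
from mixpeek import Mixpeek

# Create a client with your Mixpeek API key
mx = Mixpeek(api_key="YOUR_KEY")

# Ingest mixed-media content from S3 into the "omni_search" collection,
# embedding text, image, and audio into one 2048-dim space
mx.ingest.videos(
    source="s3://media/mixed-content/",
    collection="omni_search",
    feature_extractors=[
        {
            "name": "visual_embeddings",
            "model": "BidirLM/BidirLM-Omni-2.5B-Embedding",
            "params": {"modalities": ["text", "image", "audio"], "dim": 2048},
        }
    ],
)
```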
Capabilities
- Unified text, image, and audio embeddings in shared vector space
- Cross-modal retrieval (text query → image/audio results and vice versa)
- 119+ language support for multilingual text embedding
- 32K-token context window for long-document embedding
- State-of-the-art across MTEB, MIEB, and MAEB simultaneously
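Because every modality lands in the same vector space, cross-modal retrieval reduces to nearest-neighbor search over one set of vectors. The sketch below ranks stored items against a query by cosine similarity. It assumes cosine similarity on L2-normalized vectors (a common choice for embedding models; the model card does not state the similarity function) and brute-force search in place of a real vector index. The 4-dimensional vectors and item IDs are hypothetical stand-ins for 2048-dimensional BidirLM-Omni embeddings.

```python
import math
from typing import Dict, List, Tuple


def l2_normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two embeddings of the same dimension."""
    if len(a) != len(b):
        raise ValueError("embeddings must share a dimension")
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def top_k(query: List[float], index: Dict[str, List[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Brute-force ranking of every indexed item against the query, best first."""
    scored = [(item_id, cosine(query, emb)) for item_id, emb in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical stand-ins for image and audio embeddings in the shared space
    index = {
        "image:dog_park.jpg": [0.9, 0.1, 0.0, 0.1],
        "audio:dog_bark.wav": [0.8, 0.0, 0.2, 0.1],
        "image:city_night.jpg": [0.0, 0.9, 0.3, 0.0],
    }
    text_query = [1.0, 0.0, 0.1, 0.0]  # e.g. a text embedding of "a dog barking"
    for item_id, score in top_k(text_query, index, k=2):
        print(f"{item_id}: {score:.3f}")
```

The same `top_k` call serves text-to-image, text-to-audio, or audio-to-image search, since the query and the indexed items differ only in which modality produced the vector.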
Benchmarks
| Benchmark | Modality | Metric | Reported result |
|---|---|---|---|
| MTEB Multilingual V2 | Text | Mean score | SOTA at 2.5B scale |
| MIEB | Image | Mean score | SOTA at 2.5B scale |
| MAEB | Audio | Mean score | SOTA at 2.5B scale |
Build a pipeline with BidirLM-Omni-2.5B-Embedding
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.