Qwen3-VL-Embedding-8B
by Qwen
#1 multimodal embedding model — unified text, image, screenshot, and video retrieval
Qwen/Qwen3-VL-Embedding-8B
mixpeek://text_extractor@v1/qwen3_vl_embed_8b_v1
Overview
Qwen3-VL-Embedding-8B is a unified multimodal embedding model that projects text, images, screenshots, and video into a shared vector space. It reports state-of-the-art results on MMEB-V2, a broad multimodal retrieval benchmark, with an overall score of 77.9 and a score of 83.3 on visual document retrieval. These results place it first on the MMEB-V2 leaderboard and make it a strong general-purpose choice for multimodal embedding.
Built on the Qwen3-VL vision-language backbone, it supports Matryoshka flexible dimensionality (64 to 4096), 32K context windows, and 30+ languages. On Mixpeek, it powers cross-modal retrieval where a text query can match images, screenshots, video frames, or documents in a single vector search pass.
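To illustrate what a single vector search pass over a shared space means, here is a minimal sketch that ranks items of different modalities against one text query using cosine similarity. The `IndexedItem` type, the `search` helper, and the 4-dimensional vectors are all hypothetical stand-ins; real vectors would come from the model, and this is not the Mixpeek retrieval API.

```typescript
// Hypothetical in-memory index. The vectors are made-up 4-dimensional
// stand-ins for real embeddings produced by the model.
type Modality = "text" | "image" | "screenshot" | "video_frame";

interface IndexedItem {
  id: string;
  modality: Modality;
  vector: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same dimensionality");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vectors match nothing
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every item against the query in one pass, regardless of modality,
// then keep the topK highest-scoring items.
function search(
  query: number[],
  items: IndexedItem[],
  topK: number
): Array<{ item: IndexedItem; score: number }> {
  return items
    .map((item) => ({ item, score: cosineSimilarity(query, item.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

const index: IndexedItem[] = [
  { id: "slide-3.png", modality: "image", vector: [0.9, 0.1, 0.0, 0.1] },
  { id: "dashboard.png", modality: "screenshot", vector: [0.1, 0.9, 0.2, 0.0] },
  { id: "clip-00:42", modality: "video_frame", vector: [0.0, 0.2, 0.9, 0.1] },
];

// Stand-in for the embedding of a text query.
const textQueryVector = [0.8, 0.2, 0.1, 0.0];
const results = search(textQueryVector, index, 2);
for (const r of results) {
  console.log(`${r.item.id} (${r.item.modality}): ${r.score.toFixed(3)}`);
}
```

Because every modality lives in the same space, the ranking code never branches on `modality`; it is carried along only so results can be labelled. A production system would replace the linear scan with an approximate nearest-neighbour index.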
Architecture
The model pairs the Qwen3-VL vision-language backbone (8B parameters) with shared projection heads for the text, image, and video modalities. It uses Matryoshka Representation Learning to offer flexible embedding dimensions from 64 to 4096, and it supports interleaved text-image input sequences of up to 32K tokens.
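Matryoshka-style embeddings are commonly shortened by keeping a leading prefix of the full vector and re-normalizing it. The sketch below shows that general technique on a toy 8-dimensional vector; the `truncateEmbedding` and `l2Normalize` helpers are hypothetical names, and whether the service performs this step for you (for example through an `embedding_dim` parameter) is an assumption, not something this page confirms.

```typescript
// Scale a vector to unit L2 length. A zero vector is returned unchanged.
function l2Normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  if (norm === 0) return v.slice();
  return v.map((x) => x / norm);
}

// Keep the first `dim` components of a Matryoshka-style embedding and
// re-normalize so cosine and dot-product scores stay comparable.
function truncateEmbedding(full: number[], dim: number): number[] {
  if (!Number.isInteger(dim) || dim < 1 || dim > full.length) {
    throw new RangeError(`dim must be an integer between 1 and ${full.length}`);
  }
  return l2Normalize(full.slice(0, dim));
}

// Toy 8-dimensional vector standing in for a full-size embedding.
const fullEmbedding = [3, 4, 12, 0, 1, 2, 0, 5];
const shortEmbedding = truncateEmbedding(fullEmbedding, 2);
console.log(shortEmbedding.length, shortEmbedding);
```

All vectors in one index need the same dimension, so the truncation length should be chosen once per collection and applied to queries and documents alike.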
Mixpeek SDK Integration
```typescript
import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

await mx.collections.ingest({
  collection_id: "my-collection",
  source: { url: "https://example.com/slides.pdf" },
  feature_extractors: [
    {
      name: "image_embedding",
      version: "v1",
      params: {
        model_id: "Qwen/Qwen3-VL-Embedding-8B",
        embedding_dim: 1024
      }
    }
  ]
});
```
Capabilities
- Unified embeddings across text, images, video, and screenshots
- Matryoshka flexible dimensionality (64–4096)
- 32K context window for long documents and multi-frame video
- 30+ language support including CJK
- #1 on MMEB-V2 multimodal retrieval benchmark
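The choice of embedding dimension is mostly a storage and latency trade-off. As a rough worked example, the sketch below estimates the raw vector storage for one million embeddings at three dimensions in the supported range. It assumes 4-byte float32 components and ignores index overhead, metadata, and any quantization; `rawVectorBytes` is a hypothetical helper, not part of any SDK.

```typescript
// Assumption: each component is stored as a 4-byte float32.
const BYTES_PER_FLOAT32 = 4;

// Raw storage for `numVectors` embeddings of size `dim`, in bytes.
// Index structures and metadata are not included.
function rawVectorBytes(numVectors: number, dim: number): number {
  return numVectors * dim * BYTES_PER_FLOAT32;
}

const numVectors = 1_000_000;
for (const dim of [64, 1024, 4096]) {
  const gigabytes = rawVectorBytes(numVectors, dim) / 1e9;
  console.log(`${dim} dims: ${gigabytes.toFixed(3)} GB`);
}
```

Under these assumptions, one million vectors take about 0.256 GB at 64 dimensions, 4.096 GB at 1024, and 16.384 GB at 4096, so moving from the smallest to the largest size costs 64 times the storage.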
Use Cases on Mixpeek
Benchmarks
| Dataset | Metric | Score | Source |
|---|---|---|---|
| MMEB-V2 (overall) | Score | 77.9 | Qwen, 2026 — MMEB-V2 Leaderboard |
| MMEB-V2 (visual doc retrieval) | Score | 83.3 | Qwen, 2026 — MMEB-V2 Leaderboard |
| MTEB Multilingual | Score | 70.58 | Qwen, 2026 — Model Card |
Performance
Specification
Research Paper
Qwen3-Embedding: Advancing Text and Multimodal Retrieval
arxiv.org
Build a pipeline with Qwen3-VL-Embedding-8B
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.