Qwen3-Embedding-8B
by Qwen
#1 multilingual text embedding — 100+ languages, 32K context, instruction-tuned
Qwen/Qwen3-Embedding-8B
mixpeek://text_extractor@v1/qwen3_embedding_8b_v1
Overview
Qwen3-Embedding-8B is the flagship text embedding model from the Qwen3 family, achieving state-of-the-art results on the MTEB Multilingual leaderboard (70.58). It supports 100+ languages with instruction-tuned task conditioning, meaning you can prefix queries with task descriptions to optimize retrieval for specific use cases.
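As a rough illustration of task conditioning, the sketch below builds an instruction-prefixed query string. The `Instruct: ...\nQuery:...` template follows the convention shown on the Qwen3-Embedding model card. The helper name `buildInstructedQuery` and the sample task text are hypothetical, and whether Mixpeek applies such a prefix automatically is not stated here, so treat this as an assumption to verify.

```typescript
// Hypothetical helper (not part of any SDK): formats a query with a task
// instruction. Only queries are prefixed; documents are embedded as-is.
function buildInstructedQuery(task: string, query: string): string {
  return `Instruct: ${task.trim()}\nQuery:${query.trim()}`;
}

// Example task description (an assumption; write one that matches your use case).
const task =
  "Given a web search query, retrieve relevant passages that answer the query";

const instructed = buildInstructedQuery(task, "How do I rotate an API key?");
console.log(instructed);
```

A more specific task description (for example, one naming the document type being searched) is usually a better starting point than a generic one, but the effect is worth measuring on your own data.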
The 32K context window handles full documents, long articles, and code files without truncation. On Mixpeek, it serves as the backbone for text-only retrieval pipelines — embedding transcripts, document text, metadata, and code for semantic search.
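To make the semantic search step concrete, here is a minimal sketch of ranking stored document embeddings against a query embedding by cosine similarity. The `Doc` shape, the function names, and the two-dimensional toy vectors are illustrative assumptions; a real retrieval pipeline would typically delegate this to a vector index rather than scan in memory.

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("dimension mismatch");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0; // treat zero vectors as dissimilar to everything
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical record shape for a stored embedding.
interface Doc {
  id: string;
  vector: number[];
}

// Returns documents sorted from most to least similar to the query.
function rankBySimilarity(
  query: number[],
  docs: Doc[]
): { id: string; score: number }[] {
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(query, d.vector) }))
    .sort((x, y) => y.score - x.score);
}

// Toy vectors standing in for real embeddings.
const ranked = rankBySimilarity(
  [1, 0],
  [
    { id: "b", vector: [0, 1] },
    { id: "c", vector: [-1, 0] },
    { id: "a", vector: [0.9, 0.1] },
  ]
);
console.log(ranked.map((r) => r.id));
```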
Architecture
A decoder-only transformer (Qwen3 backbone, 8B parameters) fine-tuned for embedding with instruction-aware contrastive learning. It produces dense vectors of up to 4096 dimensions with Matryoshka support, so shorter prefixes of the full vector remain usable as embeddings. The 32K context window uses RoPE position encoding.
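Matryoshka support means a shorter embedding can be obtained by keeping a prefix of the full vector and re-normalizing it. The sketch below shows that operation on a toy vector. The function name `truncateEmbedding` is hypothetical, and a hosted service may perform this step server-side when a smaller output dimension is requested.

```typescript
// Keeps the first `dim` components of an embedding and rescales the
// result to unit L2 length so cosine and dot-product scores stay comparable.
function truncateEmbedding(vector: number[], dim: number): number[] {
  if (!Number.isInteger(dim) || dim < 1 || dim > vector.length) {
    throw new RangeError(`dim must be an integer in [1, ${vector.length}]`);
  }
  const head = vector.slice(0, dim);
  const norm = Math.sqrt(head.reduce((sum, x) => sum + x * x, 0));
  // An all-zero prefix cannot be normalized; return it unchanged.
  return norm === 0 ? head : head.map((x) => x / norm);
}

// Toy 4-dimensional vector standing in for a 4096-dimensional embedding.
const full = [3, 4, 12, 0];
const small = truncateEmbedding(full, 2);
console.log(small);
```

The trade-off is storage and search cost against retrieval quality. Assuming 32-bit floats, a 4096-dimension vector takes 16 KiB and a 1024-dimension vector takes 4 KiB, so smaller dimensions are worth benchmarking on your own data before committing.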
Mixpeek SDK Integration
import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

await mx.collections.ingest({
  collection_id: "my-collection",
  source: { url: "https://example.com/report.pdf" },
  feature_extractors: [
    {
      name: "text_embedding",
      version: "v1",
      params: {
        model_id: "Qwen/Qwen3-Embedding-8B",
        embedding_dim: 1024
      }
    }
  ]
});
Capabilities
- 100+ language support with native multilingual training
- 32K context window for full-document embedding
- Instruction-tuned task conditioning for query optimization
- Matryoshka flexible dimensionality (64–4096)
- #1 on MTEB Multilingual benchmark
Use Cases on Mixpeek
Benchmarks
| Dataset | Metric | Score | Source |
|---|---|---|---|
| MTEB Multilingual | Score | 70.58 | Qwen, 2026 — MTEB Leaderboard |
| MTEB English | Score | 72.3 | Qwen, 2026 — Model Card |
Performance
Specification
Research Paper
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models
arxiv.org
Build a pipeline with Qwen3-Embedding-8B
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.