Qwen3-Embedding-4B

by Qwen

Top-ranked multilingual text embedding with 100+ languages and 32K context

2.6M downloads/month

298 likes

4.0B params


Identifiers

Model ID

Qwen/Qwen3-Embedding-4B

Feature URI

mixpeek://text_extractor@v1/qwen3_embedding_4b_v1

Overview

Qwen3-Embedding-4B is the mid-size model in the Qwen3 Embedding family, sitting between the 0.6B and 8B variants. It scores 69.45 on the MTEB multilingual leaderboard, placing it among the top-ranked models, and performs strongly across text retrieval, code retrieval, classification, clustering, and bitext mining. It balances strong embedding quality with moderate compute requirements.

On Mixpeek, Qwen3-Embedding-4B is the recommended text embedding model for production pipelines that need best-in-class multilingual retrieval quality. It powers semantic search over transcripts, documents, and extracted text across 100+ languages.
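At query time, semantic search over these embeddings comes down to ranking documents by their similarity to the query vector. The sketch below is a minimal, hedged illustration. It assumes the query and document vectors were already produced by Qwen3-Embedding-4B. The document IDs and toy 3-dimensional vectors are hypothetical, and a production system would use a vector index rather than a linear scan.

```typescript
// Minimal sketch: rank documents by cosine similarity to a query embedding.
// Assumes embeddings were already produced by Qwen3-Embedding-4B; the vectors
// below are hypothetical toy values, not real model output.

type Ranked = { id: string; score: number };

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as dissimilar to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function rankDocuments(
  query: number[],
  docs: Record<string, number[]>,
  topK: number,
): Ranked[] {
  return Object.entries(docs)
    .map(([id, vec]) => ({ id, score: cosineSimilarity(query, vec) }))
    .sort((x, y) => y.score - x.score) // highest similarity first
    .slice(0, topK);
}

// Usage with toy 3-dimensional vectors (real embeddings are far larger).
const results = rankDocuments(
  [1, 0, 0],
  { transcript_en: [0.9, 0.1, 0], doc_de: [0.7, 0.7, 0], doc_ja: [0, 0, 1] },
  2,
);
console.log(JSON.stringify(results.map((r) => r.id))); // ["transcript_en","doc_de"]
```

If the vectors are already unit-normalized, a plain dot product yields the same ranking and skips the norm computation.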

Architecture

Dense transformer built on the Qwen3-4B foundation model and trained with the same three-stage pipeline as the 0.6B variant: unsupervised pre-training, supervised fine-tuning, and model merging. Matryoshka Representation Learning lets you choose any output dimension from 32 up to the full 2560. The model is also instruction-aware: prepending a short task instruction to each query tailors the embedding to that task.
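Both features can be sketched on the client side. This is a minimal illustration under stated assumptions, not Qwen's or Mixpeek's implementation. The hypothetical `truncateEmbedding` keeps the first `dim` components of a full Matryoshka embedding and re-normalizes them. The hypothetical `formatQuery` builds an instruction-prefixed query using the `Instruct: ...\nQuery:...` template that we believe the Hugging Face model card documents; verify it there before relying on it. The source does not say whether Mixpeek's extractor exposes a dimension or instruction parameter.

```typescript
// Minimal sketch, not Qwen's or Mixpeek's implementation.
// (1) Matryoshka-style truncation: keep the first `dim` components of a full
//     embedding and re-normalize to unit length so scores stay comparable.
// (2) Instruction-aware queries: prefix the query with a task instruction.
//     The template is assumed from the Hugging Face model card; verify it.

function truncateEmbedding(full: number[], dim: number): number[] {
  if (!Number.isInteger(dim) || dim < 1 || dim > full.length) {
    throw new RangeError(`dim must be an integer in [1, ${full.length}]`);
  }
  const head = full.slice(0, dim);
  const norm = Math.sqrt(head.reduce((sum, x) => sum + x * x, 0));
  // Leave an all-zero prefix unchanged to avoid dividing by zero.
  return norm === 0 ? head : head.map((x) => x / norm);
}

function formatQuery(task: string, query: string): string {
  // Assumption: documents are embedded as-is; only queries get the prefix.
  return `Instruct: ${task}\nQuery:${query}`;
}

const truncated = truncateEmbedding([3, 4, 12], 2);
console.log(JSON.stringify(truncated)); // [0.6,0.8]
console.log(
  formatQuery(
    "Given a web search query, retrieve relevant passages that answer the query",
    "what is MTEB?",
  ),
);
```

Re-normalizing after truncation matters because a prefix of a unit vector is generally shorter than unit length. Without it, dot-product scores computed at different dimensions would not be comparable.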

Mixpeek SDK Integration

import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

// Managed: create a collection over a bucket; Mixpeek runs this model's extractor
const collection = await mx.collections.create({
  namespace_id: "my-namespace",
  collection_name: "my-collection",
  source: { type: "bucket", bucket_ids: ["bkt_your_bucket"] },
  feature_extractor: {
    feature_extractor_name: "text_embedding",
    version: "v1",
    parameters: { model_id: "Qwen/Qwen3-Embedding-4B" },
  },
});

Capabilities

Top-ranked on the MTEB multilingual leaderboard (69.45)
100+ language support with state-of-the-art multilingual transfer
Flexible embedding dimensions from 32 to 2560 (Matryoshka)
32K token context window for long documents
Strong performance on code retrieval and classification tasks

Use Cases on Mixpeek

Production-grade multilingual semantic search across document collections

RAG pipeline embedding backend for enterprise knowledge bases

Cross-lingual document matching and deduplication at scale
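Cross-lingual deduplication can be approximated by grouping documents whose embeddings lie close together. The sketch below shows a hypothetical greedy pass. It assumes unit-normalized vectors, so the dot product equals cosine similarity, and uses a 0.9 threshold chosen only for illustration; tune it on your own labelled pairs. The function name and document IDs are hypothetical.

```typescript
// Minimal sketch: greedy near-duplicate grouping by cosine similarity.
// Assumes unit-normalized embeddings (so dot product == cosine similarity);
// the 0.9 threshold is a hypothetical value to tune on your own data.

function dot(a: number[], b: number[]): number {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

function groupNearDuplicates(
  ids: string[],
  vectors: number[][],
  threshold = 0.9,
): string[][] {
  // Each group is represented by the vector of its first member.
  const groups: { representative: number[]; members: string[] }[] = [];
  ids.forEach((id, i) => {
    const match = groups.find((g) => dot(g.representative, vectors[i]) >= threshold);
    if (match) {
      match.members.push(id);
    } else {
      groups.push({ representative: vectors[i], members: [id] });
    }
  });
  return groups.map((g) => g.members);
}

// Toy unit vectors: English and French copies of one notice, plus an unrelated doc.
const groups = groupNearDuplicates(
  ["notice_en", "notice_fr", "invoice"],
  [[1, 0], [0.98, Math.sqrt(1 - 0.98 ** 2)], [0, 1]],
);
console.log(JSON.stringify(groups)); // [["notice_en","notice_fr"],["invoice"]]
```

Each document is compared against one representative per group, so the cost grows with the number of groups. At scale, an approximate-nearest-neighbour index would replace the linear `find`.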

Benchmarks

Dataset	Metric	Score	Source
MTEB Multilingual	Avg Score	69.45	Qwen3-Embedding paper, June 2025
MTEB Retrieval (en)	nDCG@10	Top-tier among open models	Qwen3-Embedding paper, June 2025
Code Retrieval	MRR	Best among 4B-class models	Qwen3-Embedding paper, June 2025
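The sketch below shows how the two ranking metrics in the table are computed, for readers unfamiliar with them. It assumes binary relevance labels and is illustrative only. Benchmark harnesses may use graded relevance and other conventions, and the query and document IDs are hypothetical.

```typescript
// Minimal sketch of MRR and nDCG@k with binary relevance labels.
// Benchmark suites may differ in details (e.g. graded relevance).

function reciprocalRank(ranked: string[], relevant: Set<string>): number {
  const idx = ranked.findIndex((id) => relevant.has(id));
  return idx === -1 ? 0 : 1 / (idx + 1); // 1 / rank of first relevant hit
}

function meanReciprocalRank(runs: { ranked: string[]; relevant: Set<string> }[]): number {
  if (runs.length === 0) return 0;
  return runs.reduce((sum, r) => sum + reciprocalRank(r.ranked, r.relevant), 0) / runs.length;
}

function ndcgAtK(ranked: string[], relevant: Set<string>, k = 10): number {
  // DCG: a relevant hit at 0-based position i contributes 1 / log2(i + 2).
  let dcg = 0;
  ranked.slice(0, k).forEach((id, i) => {
    if (relevant.has(id)) dcg += 1 / Math.log2(i + 2);
  });
  // Ideal DCG: every relevant item ranked at the top.
  let idcg = 0;
  for (let i = 0; i < Math.min(relevant.size, k); i++) idcg += 1 / Math.log2(i + 2);
  return idcg === 0 ? 0 : dcg / idcg;
}

const query = { ranked: ["d3", "d1", "d7"], relevant: new Set(["d1"]) };
console.log(meanReciprocalRank([query])); // 0.5
console.log(ndcgAtK(query.ranked, query.relevant)); // 1 / log2(3), about 0.6309
```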