jina-embeddings-v5-text-small
by jinaai
Highest-scoring sub-1B multilingual embedding model with task-specific LoRA adapters
jinaai/jina-embeddings-v5-text-small
mixpeek://text_extractor@v1/jina_embeddings_v5_small_v1
Overview
Jina Embeddings v5 Text Small is a 677M-parameter multilingual text embedding model built on the Qwen3-0.6B-Base backbone. It achieves the highest MTEB English v2 score (71.7) among all multilingual models under 1B parameters by combining embedding distillation from the larger 4B variant with four task-specific LoRA adapters for retrieval, similarity, clustering, and classification.
On Mixpeek, jina-embeddings-v5-text-small is well suited to multilingual text embedding at scale: it matches the retrieval quality of the 3.8B-parameter v4 model while being about 5.6x smaller. Its 32K-token context length and Matryoshka dimension flexibility (1024 down to 32) suit both long-document and cost-constrained pipelines across 119+ languages.
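To make the cost side of the Matryoshka trade-off concrete, here is a rough storage estimate for raw vectors at several dimensions. It is a back-of-the-envelope sketch: `indexSizeBytes` is a hypothetical helper, and it assumes uncompressed float32 storage (4 bytes per value) with no index overhead, which may not match how any particular vector store lays out its data.

```typescript
// Rough raw-vector storage estimate.
// Assumption: uncompressed float32 values (4 bytes each), no index overhead.
function indexSizeBytes(numVectors: number, dims: number, bytesPerValue: number = 4): number {
  return numVectors * dims * bytesPerValue;
}

const numVectors = 1e6; // illustrative corpus size, not a figure from the model card
for (const dims of [1024, 256, 32]) {
  const gigabytes = indexSizeBytes(numVectors, dims) / 1e9;
  console.log(`${dims} dims: ${gigabytes.toFixed(3)} GB`);
}
```

Under these assumptions, storage scales linearly with the number of dimensions, so truncating from 1024 to 32 dimensions cuts raw vector storage by a factor of 32. Retrieval quality at the smaller sizes should be checked on your own data.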
Architecture
Qwen3-0.6B-Base backbone with last-token pooling. 677M parameters. Four independent task-specific LoRA adapters (retrieval, similarity, clustering, classification) trained on frozen backbone weights. Supports 32K context via adjusted RoPE base frequencies. Matryoshka truncation from 1024 to 32 dimensions.
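The pooling and truncation steps described above can be sketched on plain arrays. This is an illustration of the general techniques, not the model's actual implementation: the function names are hypothetical, the hidden states are assumed to arrive as one vector per token alongside a 0/1 attention mask, and re-normalizing after truncation is a common convention assumed here rather than something the model card states.

```typescript
// Last-token pooling: use the hidden state of the final non-padding token
// as the sequence embedding. The attention mask marks real tokens with 1.
function lastTokenPool(hiddenStates: number[][], attentionMask: number[]): number[] {
  const lastIndex = attentionMask.lastIndexOf(1);
  if (lastIndex < 0) throw new Error("attention mask contains no real tokens");
  return hiddenStates[lastIndex];
}

// Scale a vector to unit L2 norm (a zero vector is returned unchanged).
function l2Normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? v.slice() : v.map((x) => x / norm);
}

// Matryoshka truncation: keep the first `dims` components, then re-normalize
// so cosine and dot-product similarity remain comparable (assumed convention).
function truncateMatryoshka(embedding: number[], dims: number): number[] {
  if (!Number.isInteger(dims) || dims < 1 || dims > embedding.length) {
    throw new RangeError(`dims must be an integer between 1 and ${embedding.length}`);
  }
  return l2Normalize(embedding.slice(0, dims));
}
```

For a real embedding, `truncateMatryoshka(embedding, 256)` would return a unit-length 256-dimensional prefix of the 1024-dimensional vector.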
Mixpeek SDK Integration
```typescript
import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

await mx.collections.ingest({
  collection_id: "my-collection",
  source: { url: "https://example.com/document.pdf" },
  feature_extractors: [
    {
      name: "text_embedding",
      version: "v1",
      params: {
        model_id: "jinaai/jina-embeddings-v5-text-small"
      }
    }
  ]
});
```
Capabilities
- 71.7 average on MTEB English v2 (highest among multilingual models under 1B parameters)
- 1024-dimensional embeddings with Matryoshka truncation to 32-dim
- 32K token context length via RoPE
- 119+ language support
- Task-specific LoRA adapters for optimal per-task performance
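As a toy illustration of how task-specific LoRA adapters can share one frozen backbone, the sketch below applies the standard LoRA update, y = Wx + (alpha / r) · B(Ax), to a single linear layer. The types, function names, matrix sizes, and `alpha` values are all hypothetical and chosen for readability; the real adapters' ranks, scaling, and target layers are not given here.

```typescript
type Matrix = number[][];

// Multiply a matrix (rows x cols) by a vector of length cols.
function matVec(m: Matrix, v: number[]): number[] {
  return m.map((row) => row.reduce((sum, w, i) => sum + w * v[i], 0));
}

// A low-rank adapter: a is (rank x dIn), b is (dOut x rank).
interface LoraAdapter {
  a: Matrix;
  b: Matrix;
  alpha: number;
}

// Frozen weights stay untouched; the adapter adds a scaled low-rank correction.
function loraForward(frozenW: Matrix, adapter: LoraAdapter, x: number[]): number[] {
  const rank = adapter.a.length;
  const scale = adapter.alpha / rank;
  const base = matVec(frozenW, x);
  const delta = matVec(adapter.b, matVec(adapter.a, x));
  return base.map((y, i) => y + scale * delta[i]);
}

// The same frozen weights with two different (made-up) task adapters.
const frozenW: Matrix = [[1, 0], [0, 1]];
const retrievalAdapter: LoraAdapter = { a: [[1, 1]], b: [[1], [2]], alpha: 2 };
const clusteringAdapter: LoraAdapter = { a: [[1, -1]], b: [[0.5], [0.5]], alpha: 2 };

const x = [1, 2];
console.log(loraForward(frozenW, retrievalAdapter, x));
console.log(loraForward(frozenW, clusteringAdapter, x));
```

Because only the small `a` and `b` matrices differ per task, switching tasks means swapping an adapter rather than loading a separate model.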
Use Cases on Mixpeek
Benchmarks
| Dataset | Metric | Score | Source |
|---|---|---|---|
| MTEB English v2 (avg) | Score | 71.7 | Jina AI, 2025 — Model Card |
| MMTEB (multilingual, task-level avg) | Score | 67.0 | Jina AI, 2025 — Model Card |
| BEIR (retrieval) | nDCG@10 | 56.67 | Jina AI, 2025 — Model Card |
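For readers unfamiliar with the retrieval metric in the table, the following is a minimal sketch of nDCG@k using the linear-gain form (relevance divided by log2 of rank + 1). The function names are hypothetical, and it assumes you pass both the relevance labels of the ranked results and the labels of all judged documents for the query. Benchmark suites may differ in details such as the gain function, and the table's score appears to be reported on a 0 to 100 scale, whereas this function returns a value between 0 and 1.

```typescript
// Discounted cumulative gain over the top k results (linear gain).
function dcgAtK(relevances: number[], k: number): number {
  return relevances
    .slice(0, k)
    .reduce((sum, rel, i) => sum + rel / Math.log2(i + 2), 0);
}

// nDCG@k: DCG of the system ranking divided by the DCG of an ideal ranking
// built from every judged document for the query.
function ndcgAtK(rankedRelevances: number[], allJudgedRelevances: number[], k: number): number {
  const ideal = [...allJudgedRelevances].sort((a, b) => b - a);
  const idealDcg = dcgAtK(ideal, k);
  return idealDcg === 0 ? 0 : dcgAtK(rankedRelevances, k) / idealDcg;
}

// Example: two relevant documents exist; the system ranks them 1st and 3rd.
console.log(ndcgAtK([1, 0, 1], [1, 1], 10).toFixed(4));
```

A benchmark-level score is typically the mean of this per-query value across all queries in the dataset.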
Performance
Specification
Research Paper
jina-embeddings-v5-text: Task-Targeted Embedding Distillation
arxiv.org
Build a pipeline with jina-embeddings-v5-text-small
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.