jina-embeddings-v4
by jinaai
Universal multimodal multilingual embeddings with task-specific LoRA adapters
jinaai/jina-embeddings-v4
mixpeek://image_extractor@v1/jina_embeddings_v4
Overview
Jina Embeddings v4 is a 3.8B-parameter multimodal embedding model built on the Qwen2.5-VL-3B-Instruct backbone. It unifies text and image representations through a shared pathway, supporting both single-vector (2048-dim, truncatable to 128) and multi-vector (128-dim per token) output modes for late-interaction retrieval.
Three task-specific LoRA adapters (60M parameters each) optimize performance for retrieval, text-matching, and code search without modifying the frozen backbone. On Mixpeek, jina-embeddings-v4 powers cross-modal search across documents with tables, charts, and mixed-media content, excelling where visual layout matters as much as text.
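The adapter idea described above can be illustrated with a small sketch. It shows the general LoRA formulation, in which a frozen weight matrix W is left untouched and each task adds its own low-rank update B·A on top. The 2x2 shapes, the numeric values, the `alpha` scaling, and the adapter dictionary layout are toy assumptions for illustration only; they are not the model's real weights or its loading API.

```python
# Illustrative LoRA sketch: y = W x + (alpha / r) * B (A x), with W frozen.
# All shapes and values are toy assumptions, not taken from jina-embeddings-v4.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def lora_forward(w, adapter, x):
    """Apply the frozen weight w plus one low-rank adapter update to x."""
    base = matvec(w, x)  # frozen backbone path
    low_rank = matvec(adapter["B"], matvec(adapter["A"], x))  # rank-r path
    scale = adapter["alpha"] / adapter["r"]
    return [b + scale * d for b, d in zip(base, low_rank)]

# Frozen "backbone" weight, shared by every task.
W = [[1.0, 0.0],
     [0.0, 1.0]]

# One rank-1 adapter per task (A is r x d_in, B is d_out x r).
ADAPTERS = {
    "retrieval":     {"A": [[1.0, 0.0]], "B": [[0.5], [0.0]], "alpha": 1.0, "r": 1},
    "text-matching": {"A": [[0.0, 1.0]], "B": [[0.0], [0.5]], "alpha": 1.0, "r": 1},
    "code":          {"A": [[1.0, 1.0]], "B": [[0.25], [0.25]], "alpha": 1.0, "r": 1},
}

x = [2.0, 4.0]
outputs = {task: lora_forward(W, adapter, x) for task, adapter in ADAPTERS.items()}
```

The same input produces a different output per task while `W` never changes, which is why switching tasks only requires swapping a small adapter rather than reloading the backbone.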
Architecture
The model uses the Qwen2.5-VL-3B-Instruct backbone, whose vision encoder converts images into token sequences that follow the same pathway as text. It has two output modes: single-vector (2048-dim via mean pooling) and multi-vector (128-dim per token via projection layers). Three LoRA adapters (60M parameters each) sit on top of the frozen backbone, one each for retrieval, text matching, and code search.
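To make the two output modes concrete, here is a minimal sketch of how each one is typically scored. Mean pooling collapses token vectors into one vector for ordinary similarity search, while late interaction keeps every token vector and scores with a MaxSim sum (the ColBERT-style formulation, assumed here as the usual choice for multi-vector retrieval). The 2-dim toy vectors stand in for the real 2048-dim and 128-dim outputs, and the helper names are hypothetical.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(vector, vector))
    return [x / norm for x in vector] if norm else list(vector)

def mean_pool(token_vectors):
    """Single-vector mode: average all token vectors into one vector."""
    count = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(tok[i] for tok in token_vectors) / count for i in range(dim)]

def maxsim_score(query_tokens, doc_tokens):
    """Multi-vector mode: for each query token, take its best-matching
    document token similarity, then sum over query tokens."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

# Toy unit-length token vectors (assumed values for illustration).
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # covers both query tokens
doc_b = [[1.0, 0.0], [1.0, 0.0]]              # covers only the first

single_query = l2_normalize(mean_pool(query_tokens))
score_a = maxsim_score(query_tokens, doc_a)
score_b = maxsim_score(query_tokens, doc_b)
```

Here `doc_a` outscores `doc_b` because it has a matching token for every query token. Multi-vector mode keeps that per-token detail at the cost of storing one vector per token.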
Mixpeek SDK Integration
from mixpeek import Mixpeek

mx = Mixpeek(api_key="YOUR_KEY")
mx.ingest(
    collection_id="mixed-media-docs",
    source="s3://reports/",
    extractors=[
        {
            "type": "visual_embedding",
            "model": "jinaai/jina-embeddings-v4",
            "output_feature": "multimodal_embedding",
        }
    ],
)
Capabilities
- Multimodal: text and image in a shared embedding space
- 2048-dimensional single-vector or 128-dim multi-vector output
- Task-specific LoRA adapters for retrieval, matching, and code
- Matryoshka dimensions (2048 down to 128)
- Strong on visually rich documents: tables, charts, diagrams
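The Matryoshka property listed above means a stored embedding can be shortened after the fact. A minimal sketch, assuming the common practice of keeping the leading components and re-normalizing to unit length before computing cosine similarity; the 4-dim vector is a toy stand-in for a 2048-dim embedding, and `truncate_embedding` is a hypothetical helper, not part of any SDK.

```python
import math

def truncate_embedding(vector, dim):
    """Keep the first `dim` components and L2-renormalize the result."""
    if not 0 < dim <= len(vector):
        raise ValueError("dim must be between 1 and len(vector)")
    head = vector[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head

# Toy unit-length embedding (assumed values for illustration).
full = [0.6, 0.0, 0.8, 0.0]
short = truncate_embedding(full, 2)
```

Shorter vectors reduce index size and search cost, usually with some loss in retrieval quality, so the target dimension is worth validating on your own data.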
Use Cases on Mixpeek
Benchmarks
| Dataset | Metric | Score | Source |
|---|---|---|---|
| MTEB-en (text retrieval) | nDCG@10 | 55.97 | Jina AI, 2025 — jina-embeddings-v4 paper |
| CLIP Benchmark (cross-modal) | Score | 84.11 | Jina AI, 2025 — jina-embeddings-v4 paper |
| LongEmbed | Score | 67.11 | Jina AI, 2025 — jina-embeddings-v4 paper |
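For readers unfamiliar with the nDCG@10 metric in the first row, the sketch below shows one common formulation: linear gain with a log2 rank discount, normalized by the ideal ordering. It assumes the relevance list contains every judged document for the query, and it returns a value between 0 and 1, whereas the table appears to report scores on a 0 to 100 scale. This is not the evaluation code used in the paper.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k results (linear gain)."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal else 0.0

# Relevance labels in ranked order (1 = relevant, 0 = not relevant).
perfect = ndcg_at_k([1, 1, 0])
imperfect = ndcg_at_k([1, 0, 1])
```

A ranking that places all relevant results first scores 1.0, and pushing a relevant result further down lowers the score.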
Performance
Specification
Research Paper
jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval
arxiv.org
Build a pipeline with jina-embeddings-v4
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.