jina-embeddings-v4

by jinaai

Universal multimodal multilingual embeddings with task-specific LoRA adapters

615K downloads/month

529 likes

3.8B params


Identifiers

Model ID

jinaai/jina-embeddings-v4

Feature URI

mixpeek://image_extractor@v1/jina_embeddings_v4

Overview

Jina Embeddings v4 is a 3.8B-parameter multimodal embedding model built on the Qwen2.5-VL-3B-Instruct backbone. It unifies text and image representations through a shared pathway and supports two output modes: a single vector per input (2048-dim, truncatable to 128) and multi-vector output (128-dim per token) for late-interaction retrieval.
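To make the "truncatable to 128" point concrete, here is a minimal sketch of shortening a single-vector embedding on the client side. It assumes the usual Matryoshka convention of keeping the leading components and re-normalizing; the helper names (l2Normalize, truncateEmbedding, cosine) are hypothetical and not part of the Mixpeek SDK, and the input is a stand-in array rather than real model output.

```typescript
// Hypothetical helpers for illustration only; not a Mixpeek or Jina API.

/** Return a copy of `v` scaled to unit L2 length. */
function l2Normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  if (norm === 0) return v.slice(); // nothing to scale
  return v.map((x) => x / norm);
}

/** Keep the first `dim` components, then re-normalize (assumed Matryoshka convention). */
function truncateEmbedding(v: number[], dim: number): number[] {
  if (!Number.isInteger(dim) || dim < 1 || dim > v.length) {
    throw new RangeError(`dim must be an integer in [1, ${v.length}]`);
  }
  return l2Normalize(v.slice(0, dim));
}

/** Cosine similarity of two equal-length vectors. */
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new RangeError("length mismatch");
  const na = l2Normalize(a);
  const nb = l2Normalize(b);
  return na.reduce((sum, x, i) => sum + x * nb[i], 0);
}

// Stand-in 2048-dim vector (synthetic values, not real embeddings).
const full = Array.from({ length: 2048 }, (_, i) => Math.sin(i + 1));
const small = truncateEmbedding(full, 128);
console.log(small.length);
```

Vectors truncated this way should only be compared with other vectors truncated to the same dimension.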

Three task-specific LoRA adapters (60M parameters each) optimize performance for retrieval, text-matching, and code search without modifying the frozen backbone. On Mixpeek, jina-embeddings-v4 powers cross-modal search across documents with tables, charts, and mixed-media content, excelling where visual layout matters as much as text.
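The adapter idea can be sketched in a few lines. This is the generic LoRA formulation (a frozen weight matrix plus a scaled low-rank update), not Jina's actual implementation; the task keys, matrix shapes, and scale values below are toy assumptions chosen so the arithmetic is easy to follow.

```typescript
// Generic LoRA sketch with toy numbers; all names and values are illustrative.
type Matrix = number[][];
type Task = "retrieval" | "text-matching" | "code";

interface LoraAdapter {
  a: Matrix;     // down-projection, shape r x in
  b: Matrix;     // up-projection, shape out x r
  scale: number; // scaling factor applied to the low-rank update
}

function matVec(m: Matrix, x: number[]): number[] {
  return m.map((row) => row.reduce((sum, w, i) => sum + w * x[i], 0));
}

/** y = W x + scale * B (A x). W is read but never modified. */
function loraForward(w: Matrix, adapter: LoraAdapter, x: number[]): number[] {
  const base = matVec(w, x);
  const delta = matVec(adapter.b, matVec(adapter.a, x));
  return base.map((y, i) => y + adapter.scale * delta[i]);
}

// One shared frozen weight (2x2 identity here) and one rank-1 adapter per task.
const frozenW: Matrix = [[1, 0], [0, 1]];
const adapters: Record<Task, LoraAdapter> = {
  "retrieval": { a: [[1, 0]], b: [[0], [1]], scale: 0.5 },
  "text-matching": { a: [[0, 0]], b: [[0], [0]], scale: 1 },
  "code": { a: [[0, 1]], b: [[1], [0]], scale: 1 },
};

const input = [2, 3];
const byTask = {
  retrieval: loraForward(frozenW, adapters["retrieval"], input),
  textMatching: loraForward(frozenW, adapters["text-matching"], input),
  code: loraForward(frozenW, adapters["code"], input),
};
console.log(byTask);
```

The same input yields a different output per task while frozenW stays untouched, which is why adapters can be swapped at request time without keeping three copies of the backbone.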

Architecture

The Qwen2.5-VL-3B-Instruct backbone uses a vision encoder to convert images into tokens. There are two output modes: single-vector (2048-dim via mean pooling) and multi-vector (128-dim per token via projection layers). Three LoRA adapters (60M parameters each) sit on top of the frozen backbone, one each for retrieval, text-matching, and code search.
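The difference between the two output modes shows up at scoring time. The sketch below contrasts mean pooling (one vector per input) with late-interaction scoring over per-token vectors. It assumes ColBERT-style MaxSim as the late-interaction score, which this page does not confirm, and uses 2-dim toy vectors in place of real 2048-dim or 128-dim outputs; all function names are hypothetical.

```typescript
// Toy sketch: mean pooling vs. assumed MaxSim late interaction.
type Vec = number[];

/** Average token vectors into one vector (single-vector mode). */
function meanPool(tokens: Vec[]): Vec {
  if (tokens.length === 0) throw new RangeError("no token vectors");
  const dim = tokens[0].length;
  const out = new Array<number>(dim).fill(0);
  for (const t of tokens) {
    if (t.length !== dim) throw new RangeError("ragged token vectors");
    for (let i = 0; i < dim; i++) out[i] += t[i] / tokens.length;
  }
  return out;
}

function dot(a: Vec, b: Vec): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

/** For each query token, take its best-matching document token, then sum. */
function maxSimScore(query: Vec[], doc: Vec[]): number {
  if (doc.length === 0) throw new RangeError("empty document");
  let total = 0;
  for (const q of query) {
    let best = -Infinity;
    for (const d of doc) best = Math.max(best, dot(q, d));
    total += best;
  }
  return total;
}

const queryTokens: Vec[] = [[1, 0], [0, 1]];
const docA: Vec[] = [[1, 0], [0, 1]]; // covers both query tokens
const docB: Vec[] = [[1, 0], [1, 0]]; // covers only the first

const pooledQuery = meanPool(queryTokens);
const scoreA = maxSimScore(queryTokens, docA);
const scoreB = maxSimScore(queryTokens, docB);
console.log(pooledQuery, scoreA, scoreB);
```

Multi-vector scoring keeps token-level matches that pooling averages away, at the cost of storing one vector per token instead of one per input.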

Mixpeek SDK Integration

import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

// Managed: create a collection over a bucket; Mixpeek runs this model's extractor
const collection = await mx.collections.create({
  namespace_id: "my-namespace",
  collection_name: "my-collection",
  source: { type: "bucket", bucket_ids: ["bkt_your_bucket"] },
  feature_extractor: {
    feature_extractor_name: "visual_embedding",
    version: "v1",
    parameters: { model_id: "jinaai/jina-embeddings-v4" },
  },
});

Capabilities

Multimodal: text and image in a shared embedding space
2048-dimensional single-vector or 128-dim multi-vector output
Task-specific LoRA adapters for retrieval, matching, and code
Matryoshka dimensions (2048 down to 128)
Strong on visually rich documents: tables, charts, diagrams

Use Cases on Mixpeek

Cross-modal document retrieval where layout and visuals matter (charts, infographics)

Multilingual semantic search across mixed-media collections

Code search and retrieval with the dedicated code LoRA adapter

Benchmarks

Dataset	Metric	Score	Source
MTEB-en (text retrieval)	nDCG@10	55.97	Jina AI, 2025: jina-embeddings-v4 paper
CLIP Benchmark (cross-modal)	Score	84.11	Jina AI, 2025: jina-embeddings-v4 paper
LongEmbed	Score	67.11	Jina AI, 2025: jina-embeddings-v4 paper
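
For readers unfamiliar with the nDCG@10 metric in the first row, here is a minimal sketch of how it is computed for one query. It assumes the linear-gain variant (gain equals the relevance label) with a log2 position discount; benchmark harnesses may differ in details, and scores in the table appear to be reported on a 0 to 100 scale rather than 0 to 1. The function names are hypothetical.

```typescript
// nDCG@k sketch (linear gain, log2 discount); illustrative only.

/** Discounted cumulative gain over the top k results. */
function dcgAtK(relevances: number[], k: number): number {
  return relevances
    .slice(0, k)
    .reduce((sum, rel, i) => sum + rel / Math.log2(i + 2), 0);
}

/**
 * ranked: relevance labels of the returned results, in rank order.
 * judged: relevance labels of all judged documents for the query.
 */
function ndcgAtK(ranked: number[], judged: number[], k: number): number {
  const ideal = judged.slice().sort((a, b) => b - a);
  const idealDcg = dcgAtK(ideal, k);
  return idealDcg === 0 ? 0 : dcgAtK(ranked, k) / idealDcg;
}

// Two relevant documents exist; compare a perfect ranking with one
// that places the second relevant document at rank 3.
const perfect = ndcgAtK([1, 1, 0], [1, 1, 0], 10);
const imperfect = ndcgAtK([1, 0, 1], [1, 1, 0], 10);
console.log(perfect, imperfect);
```

A benchmark score is then the mean of this value over all queries.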