jina-embeddings-v5-omni-small

by jinaai

True omni-modal embeddings: text, image, audio, and video in one vector space

84K downloads/month

105 likes

1.6B params

Identifiers

Model ID

jinaai/jina-embeddings-v5-omni-small

Feature URI

mixpeek://image_extractor@v1/jina_embeddings_v5_omni_small

Overview

Jina Embeddings v5 Omni Small is a 2B-parameter embedding model that accepts text, images, audio, and video as input and produces 1024-dimensional vectors in a shared embedding space. Because all vectors live in the same space, you can index a video and then query it with text, an image, or an audio clip.
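To make the shared-space idea concrete, here is a minimal sketch of cross-modal ranking by cosine similarity. It assumes you already have embedding vectors from the model; the `IndexedItem` type and the `cosineSimilarity` and `rankBySimilarity` helpers are hypothetical names for illustration, not part of any Jina or Mixpeek API.

```typescript
// Hypothetical record for an indexed item; the modality is informational only,
// since vectors from every modality are compared the same way.
interface IndexedItem {
  id: string;
  modality: "text" | "image" | "audio" | "video";
  vector: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid division by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank indexed items against a query vector, best match first.
function rankBySimilarity(
  query: number[],
  items: IndexedItem[],
): Array<{ id: string; modality: string; score: number }> {
  return items
    .map((item) => ({
      id: item.id,
      modality: item.modality,
      score: cosineSimilarity(query, item.vector),
    }))
    .sort((x, y) => y.score - x.score);
}
```

The query vector can come from any supported modality. A text query and a video segment are scored with the same function because both are embedded into the same 1024-dimensional space.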

The model aligns with jina-embeddings-v5-text, so text-only queries remain high quality. It supports Matryoshka representation learning, allowing you to truncate embeddings to smaller dimensions (512, 256) with graceful quality degradation.
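As a rough illustration of Matryoshka truncation, the sketch below keeps the leading dimensions of an embedding and re-normalizes the result to unit length. The `truncateEmbedding` helper is a hypothetical name, and re-normalizing after truncation is an assumption based on common practice for cosine-similarity search, not something this page specifies.

```typescript
// Keep the first `dims` components of an embedding and L2-normalize the result.
// Assumes the model was trained so that leading dimensions carry the most information.
function truncateEmbedding(vector: number[], dims: number): number[] {
  if (!Number.isInteger(dims) || dims <= 0 || dims > vector.length) {
    throw new RangeError(`dims must be an integer in [1, ${vector.length}]`);
  }
  const head = vector.slice(0, dims);
  const norm = Math.sqrt(head.reduce((sum, x) => sum + x * x, 0));
  // Leave an all-zero vector untouched to avoid dividing by zero.
  return norm === 0 ? head : head.map((x) => x / norm);
}
```

For example, `truncateEmbedding(fullVector, 256)` would turn a 1024-dimensional vector into a 256-dimensional one. Query and index vectors should be truncated to the same size before comparison.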

Architecture

The model is based on a multimodal encoder in which separate modality-specific preprocessors feed into a shared transformer backbone. It supports Matryoshka dimensions (1024, 512, 256) and is available in GGUF format for llama.cpp deployment.

Mixpeek SDK Integration

import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

// Managed: create a collection over a bucket; Mixpeek runs this model's extractor
const collection = await mx.collections.create({
  namespace_id: "my-namespace",
  collection_name: "my-collection",
  source: { type: "bucket", bucket_ids: ["bkt_your_bucket"] },
  feature_extractor: {
    feature_extractor_name: "visual_embeddings",
    version: "v1",
    parameters: { model_id: "jinaai/jina-embeddings-v5-omni-small" },
  },
});