jina-embeddings-v5-omni-small
by jinaai
True omni-modal embeddings: text, image, audio, and video in one vector space
jinaai/jina-embeddings-v5-omni-small
mixpeek://image_extractor@v1/jina_embeddings_v5_omni_small
Overview
Jina Embeddings v5 Omni Small is a 2B-parameter embedding model that accepts text, images, audio, and video as input and produces 1024-dimensional vectors in a shared embedding space. This means you can index a video and then query it with text, an image, or an audio clip, because all vectors live in the same space.
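To make the shared-space idea concrete, here is a minimal sketch of cross-modal ranking by cosine similarity. The vectors are tiny hand-written stand-ins, not real model output, and the names `IndexedItem`, `cosineSimilarity`, and `rankBySimilarity` are hypothetical helpers rather than part of any SDK. Cosine similarity is assumed here as the comparison metric.

```typescript
// Toy 4-dimensional vectors stand in for real 1024-dimensional embeddings.
// All names below are illustrative assumptions, not an SDK API.
type Modality = "text" | "image" | "audio" | "video";

interface IndexedItem {
  id: string;
  modality: Modality;
  vector: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every indexed item against the query, highest similarity first.
function rankBySimilarity(query: number[], items: IndexedItem[]) {
  return items
    .map((item) => ({
      id: item.id,
      modality: item.modality,
      score: cosineSimilarity(query, item.vector),
    }))
    .sort((x, y) => y.score - x.score);
}

// Items of different modalities share one index because they share one space.
const index: IndexedItem[] = [
  { id: "clip-1", modality: "video", vector: [0.9, 0.1, 0.0, 0.1] },
  { id: "photo-1", modality: "image", vector: [0.1, 0.9, 0.2, 0.0] },
  { id: "sound-1", modality: "audio", vector: [0.0, 0.2, 0.9, 0.1] },
];

// Pretend this vector came from embedding a text query.
const textQueryVector = [0.8, 0.2, 0.1, 0.0];
const ranked = rankBySimilarity(textQueryVector, index);
console.log(ranked.map((r) => `${r.id} (${r.modality})`));
```

Because every modality maps into the same space, the ranking function never needs to know which modality produced the query or the indexed item.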
The model aligns with jina-embeddings-v5-text, so text-only queries remain high quality. It supports Matryoshka representation learning, allowing you to truncate embeddings to smaller dimensions (512, 256) with graceful quality degradation.
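Matryoshka truncation can be sketched as keeping the leading components of a full embedding. The example below is a sketch under stated assumptions: `truncateEmbedding` is a hypothetical helper, the input vector is synthetic, and re-normalizing to unit length after truncation is a common practice assumed here rather than something the source specifies.

```typescript
// Dimensions listed for this model in the source text.
const SUPPORTED_DIMS = [1024, 512, 256] as const;
type MatryoshkaDim = (typeof SUPPORTED_DIMS)[number];

// Keep the first `dim` components, then rescale to unit length
// (the rescaling step is an assumption, useful for cosine or dot-product search).
function truncateEmbedding(embedding: number[], dim: MatryoshkaDim): number[] {
  if (embedding.length < dim) {
    throw new Error(`Embedding has ${embedding.length} dims, cannot truncate to ${dim}`);
  }
  const head = embedding.slice(0, dim);
  const norm = Math.sqrt(head.reduce((sum, v) => sum + v * v, 0));
  if (norm === 0) return head;
  return head.map((v) => v / norm);
}

// Synthetic 1024-dimensional vector standing in for real model output.
const fullEmbedding = Array.from({ length: 1024 }, (_, i) => Math.sin(i + 1));
const smallEmbedding = truncateEmbedding(fullEmbedding, 256);
console.log(smallEmbedding.length); // 256
```

Queries and indexed items should be truncated to the same dimension before comparison, since vectors of different lengths cannot be scored against each other.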
Architecture
The model is based on a multimodal encoder in which separate modality-specific preprocessors feed into a shared transformer backbone. It supports Matryoshka dimensions (1024, 512, 256) and is available in GGUF format for llama.cpp deployment.
Mixpeek SDK Integration
import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

await mx.collections.ingest({
  collection_id: "my-collection",
  source: { url: "https://example.com/video.mp4" },
  feature_extractors: [
    {
      feature: "visual_embeddings",
      model: "jinaai/jina-embeddings-v5-omni-small"
    }
  ]
});
Capabilities
- Accepts text, images, audio, and video as embedding input
- 1024-dimensional output aligned across all modalities
- Matryoshka dimensions for size-quality tradeoff
- Compatible with jina-embeddings-v5-text vector space
- GGUF format available for edge deployment
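The size side of the size-quality tradeoff is simple arithmetic. The sketch below assumes vectors are stored as uncompressed 32-bit floats (4 bytes per component) and uses a hypothetical collection of one million vectors. Real vector stores add index overhead and may quantize, so treat the numbers as a rough lower bound.

```typescript
// Assumption: each component is stored as a float32 (4 bytes), with no index overhead.
const BYTES_PER_FLOAT32 = 4;

// Raw storage needed for `numVectors` embeddings of dimension `dim`.
function indexSizeBytes(numVectors: number, dim: number): number {
  return numVectors * dim * BYTES_PER_FLOAT32;
}

// Hypothetical collection size, used only for illustration.
const NUM_VECTORS = 1_000_000;

for (const dim of [1024, 512, 256]) {
  const mib = indexSizeBytes(NUM_VECTORS, dim) / (1024 * 1024);
  console.log(`${dim} dims: ${mib.toFixed(1)} MiB for ${NUM_VECTORS} vectors`);
}
```

Under these assumptions, a single 1024-dimensional vector occupies 4096 bytes, and each halving of the dimension halves the raw storage.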
Research Paper
Jina Embeddings v5
arxiv.org
Build a pipeline with jina-embeddings-v5-omni-small
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.