jina-embeddings-v5-omni-nano
by jinaai
Compact omni-modal embedding model for text, images, video, and audio in one vector space
jinaai/jina-embeddings-v5-omni-nano
mixpeek://image_extractor@v1/jina_embeddings_v5_omni_nano
Overview
Jina Embeddings v5 Omni Nano is the smallest model in the Jina v5 omni family. It maps text, images, video frames, and audio into a single shared vector space. At ~239M parameters, it is small enough for edge devices and high-throughput pipelines.
The model shares its text embedding space with jina-v5-text, so existing text indexes remain backwards-compatible: you can add multimodal content alongside them without re-embedding the text already indexed. This makes it the lowest-friction path to cross-modal search.
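Because every modality lands in one space, a single cosine-similarity search can rank legacy text entries and new media entries together. The sketch below is a minimal, hypothetical illustration: the four-dimensional vectors and item names are invented stand-ins for real model outputs, and a production index would use an approximate nearest-neighbor store rather than sorting every entry.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank(query, index):
    """Return index keys ordered from most to least similar to the query."""
    return sorted(index, key=lambda name: cosine(query, index[name]), reverse=True)

# Hypothetical toy embeddings; real ones would come from the model.
index = {
    "legacy_text_doc": [0.9, 0.1, 0.0, 0.0],  # entry from an existing text index
    "product_photo":   [0.8, 0.2, 0.1, 0.0],  # newly added image entry
    "podcast_clip":    [0.0, 0.1, 0.9, 0.3],  # newly added audio entry
}
text_query = [1.0, 0.0, 0.0, 0.0]

print(rank(text_query, index))  # ['legacy_text_doc', 'product_photo', 'podcast_clip']
```

One query vector scores all three entries with the same metric, which is what a shared space buys: no per-modality index or score calibration is needed in this toy setup.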
Architecture
Multimodal transformer encoder with separate input projections for text, image, video, and audio modalities. All modalities project into a shared embedding space. Matryoshka representation learning enables flexible output dimensions.
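Matryoshka representation learning trains the leading components of an embedding to carry the most information, so a shorter vector is usually obtained by keeping the first `dim` values and re-normalizing. The helper below sketches that common recipe under that assumption; `matryoshka_truncate` is a hypothetical name, and the output dimensions this model actually supports are not listed on this page.

```python
import math

def matryoshka_truncate(vec, dim):
    """Keep the first `dim` components of an embedding and L2-renormalize them."""
    if not 0 < dim <= len(vec):
        raise ValueError(f"dim must be in 1..{len(vec)}, got {dim}")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        raise ValueError("truncated vector has zero norm")
    # Re-normalize so cosine similarity and dot product stay equivalent.
    return [x / norm for x in head]

full = [0.5, 0.5, 0.5, 0.5]  # hypothetical unit-length embedding
short = matryoshka_truncate(full, 2)
print(short)  # two components, each about 0.7071
```

Re-normalizing matters when the index scores with a dot product: without it, truncated vectors would have norms below 1 and scores would shrink unevenly.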
Mixpeek SDK Integration
from mixpeek import Mixpeek

mx = Mixpeek(api_key="YOUR_KEY")

mx.ingest(
    collection_id="media-library",
    source="s3://assets/",
    extractors=[
        {
            "type": "visual_embedding",
            "model": "jinaai/jina-embeddings-v5-omni-nano",
            "output_feature": "omni_embedding",
        }
    ],
)
Capabilities
- Omni-modal: text, images, video, audio in one space
- Backwards-compatible with jina-v5-text indexes
- ~239M parameters for edge/high-throughput deployment
- Matryoshka dimensions for flexible storage
- Apache 2.0 license
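The storage benefit of Matryoshka dimensions comes down to simple arithmetic: raw float32 vectors cost `num_vectors * dim * 4` bytes. The sketch below assumes float32 storage and uses 768, 256, and 64 purely as illustrative sizes; they are not confirmed output dimensions for this model.

```python
def index_size_bytes(num_vectors, dim, bytes_per_value=4):
    """Raw vector storage in bytes; 4 bytes per value assumes float32."""
    return num_vectors * dim * bytes_per_value

# Illustrative dimensions only; check the model's supported Matryoshka sizes.
for dim in (768, 256, 64):
    mib = index_size_bytes(1_000_000, dim) / 2**20
    print(f"{dim:>4} dims x 1M vectors: {mib:,.1f} MiB")
```

Ignoring index overhead such as graph links and metadata, going from 768 to 64 dimensions in this example cuts raw vector storage 12-fold.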
Benchmarks
| Dataset | Metric | Score | Source |
|---|---|---|---|
| Cross-modal retrieval | Recall@10 | Competitive with the 677M-parameter variant | Jina AI, May 2026 |
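Recall@10 is commonly computed as the fraction of queries whose ground-truth match appears among the top 10 retrieved results. The sketch below assumes exactly one relevant item per query and uses invented result lists; the evaluation protocol behind the Jina AI figure may differ.

```python
def recall_at_k(results, k=10):
    """Fraction of queries whose single ground-truth item appears in the top k.

    `results` is a list of (ranked_ids, relevant_id) pairs, one per query.
    """
    if not results:
        raise ValueError("results must not be empty")
    hits = sum(1 for ranked_ids, relevant_id in results if relevant_id in ranked_ids[:k])
    return hits / len(results)

# Hypothetical retrieval output for two text-to-image queries.
results = [
    (["img3", "img7", "img1"], "img7"),  # match found at rank 2
    (["img2", "img5", "img9"], "img4"),  # match not retrieved
]
print(recall_at_k(results, k=10))  # 0.5
```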
Research Paper
Jina Embeddings v5 Omni: Multimodal Embeddings for Text, Image, Audio, and Video
arxiv.org
Build a pipeline with jina-embeddings-v5-omni-nano
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.