WorldSeek-Omni-2B-Preview
by WorldSeek-AI
Compact any-to-any omni model for text, image, video, and audio perception
WorldSeek-AI/WorldSeek-Omni-2B-Preview
mixpeek://video_extractor@v1/worldseek_omni_2b_preview_v1
Overview
WorldSeek Omni 2B Preview is a compact any-to-any model that accepts text, image, video, and audio inputs. It is built on Qwen language and Qwen3-ASR speech-recognition components and is positioned for general multimodal understanding rather than a single, isolated extraction task.
On Mixpeek, it is relevant for agent perception workflows that need one compact model to inspect a retrieved image, listen to a clip, or reason over a short video segment before deciding the next tool call.
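The perceive-then-decide step described above could be sketched roughly as follows. This is only an illustration: the `Observation` shape, the tool names, and `chooseNextTool` are hypothetical and are not part of the Mixpeek SDK or the model's interface. The sketch assumes the model has already produced a free-text description of the clip or image, and it routes on simple keyword matches.

```typescript
// Hypothetical types: neither Mixpeek nor the model card defines these.
type Modality = "image" | "audio" | "video";

interface Observation {
  modality: Modality;
  // Free-text description, assumed to come from the omni model.
  description: string;
}

type ToolCall =
  | { tool: "search_knowledge_base"; query: string }
  | { tool: "escalate_to_human"; reason: string }
  | { tool: "fetch_more_context"; modality: Modality }
  | { tool: "none" };

// Pick the next tool call from a single observation using keyword rules.
function chooseNextTool(obs: Observation): ToolCall {
  const text = obs.description.trim();

  // Nothing usable was perceived: ask for more of the same modality.
  if (text.length === 0) {
    return { tool: "fetch_more_context", modality: obs.modality };
  }
  // A visible or spoken failure: look it up.
  if (/\b(error|crash|failed|failure)\b/i.test(text)) {
    return { tool: "search_knowledge_base", query: text };
  }
  // Signs of urgency: hand off to a person.
  if (/\b(urgent|cancel|refund)\b/i.test(text)) {
    return { tool: "escalate_to_human", reason: text };
  }
  return { tool: "none" };
}

const nextCall = chooseNextTool({
  modality: "video",
  description: "Speaker reports the export failed twice",
});
console.log(nextCall.tool);
```

In a real agent the keyword rules would likely be replaced by a planner or a second model call; the point here is only that the perception output becomes the input to the routing decision.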
Architecture
A Transformer-based any-to-any model built on Qwen and Qwen3-ASR base components. The model card carries text, image, video, and audio tags and lists the Apache 2.0 license.
Mixpeek SDK Integration
import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

await mx.collections.ingest({
  collection_id: "agent-observations",
  source: { url: "https://example.com/support-call-with-screen-share.mp4" },
  feature_extractors: [
    {
      feature: "scene_caption",
      model: "WorldSeek-AI/WorldSeek-Omni-2B-Preview"
    }
  ]
});
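When several clips are ingested, it may help to build the request object in one place and validate it before calling the SDK. The sketch below makes no network calls and does not import the SDK. The field names are copied from the call above, while `IngestRequest`, `FeatureExtractor`, and `buildIngestRequest` are hypothetical helper names introduced here. Only `scene_caption` appears in the source, so any other feature name passed in is an unverified assumption.

```typescript
// Request shape mirrors the ingest call above; the type names are illustrative only.
interface FeatureExtractor {
  feature: string;
  model: string;
}

interface IngestRequest {
  collection_id: string;
  source: { url: string };
  feature_extractors: FeatureExtractor[];
}

const MODEL_ID = "WorldSeek-AI/WorldSeek-Omni-2B-Preview";

// Build a request for one media URL, rejecting obviously bad input early.
function buildIngestRequest(
  collectionId: string,
  sourceUrl: string,
  features: string[] = ["scene_caption"],
): IngestRequest {
  if (collectionId.trim().length === 0) {
    throw new Error("collectionId must not be empty");
  }
  if (!/^https?:\/\//i.test(sourceUrl)) {
    throw new Error(`source URL must be http(s): ${sourceUrl}`);
  }
  if (features.length === 0) {
    throw new Error("at least one feature is required");
  }
  return {
    collection_id: collectionId,
    source: { url: sourceUrl },
    // One extractor entry per requested feature, all using the same model.
    feature_extractors: features.map((feature) => ({ feature, model: MODEL_ID })),
  };
}
```

The returned object could then be passed to `mx.collections.ingest(...)` as in the snippet above, assuming the SDK accepts that shape unchanged.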
Capabilities
- Text, image, video, and audio input support
- Compact 2B-class omni model
- Any-to-any task framing
- Apache 2.0 license
Use Cases on Mixpeek
Specification
Research Paper
WorldSeek Omni 2B Preview
arxiv.org
Build a pipeline with WorldSeek-Omni-2B-Preview
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.