WorldSeek-Omni-2B-Preview

by WorldSeek-AI

Compact any-to-any omni model for text, image, video, and audio perception

21 downloads/month

2B parameters (preview)

Identifiers

Model ID

WorldSeek-AI/WorldSeek-Omni-2B-Preview

Feature URI

mixpeek://video_extractor@v1/worldseek_omni_2b_preview_v1
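The Feature URI above appears to follow the shape mixpeek://<extractor>@<version>/<feature>. That layout is inferred from this single example rather than from a published specification, so the following is only a minimal sketch of how such a URI could be split into its parts. The parseFeatureUri helper and the FeatureUri type are hypothetical names introduced for illustration and are not part of the Mixpeek SDK.

```typescript
// Hypothetical helper (not part of the Mixpeek SDK).
// Assumes the URI shape mixpeek://<extractor>@<version>/<feature>,
// inferred from the single example on this page.
interface FeatureUri {
  extractor: string;
  version: string;
  feature: string;
}

function parseFeatureUri(uri: string): FeatureUri {
  // Capture groups: 1 = extractor, 2 = version, 3 = feature name.
  const match = /^mixpeek:\/\/([^@\/]+)@([^\/]+)\/(.+)$/.exec(uri);
  if (match === null) {
    throw new Error(`Not a recognizable feature URI: ${uri}`);
  }
  return { extractor: match[1], version: match[2], feature: match[3] };
}

const parsed = parseFeatureUri(
  "mixpeek://video_extractor@v1/worldseek_omni_2b_preview_v1",
);
console.log(parsed.extractor, parsed.version, parsed.feature);
```

Splitting the URI this way makes it easier to log or compare the extractor name and version separately from the feature name.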

Overview

WorldSeek Omni 2B Preview is a compact any-to-any model that accepts text, image, video, and audio inputs. It is built from Qwen language-model and Qwen3-ASR speech-recognition components, and it targets general multimodal understanding rather than a single, isolated extraction task.

On Mixpeek, it is relevant for agent perception workflows that need one compact model to inspect a retrieved image, listen to a clip, or reason over a short video segment before deciding the next tool call.
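As a rough illustration of the perception step described above, an agent could pick what to ask the model based on the type of the retrieved asset. This is a sketch under stated assumptions: the Modality type, the modalityFromMime and planPerception helpers, and the instruction strings are all hypothetical, and no actual model or Mixpeek call is made here.

```typescript
// Hypothetical routing sketch: none of these names come from the Mixpeek SDK
// or from the model card. It only decides which kind of perception to request.
type Modality = "image" | "audio" | "video" | "text";

interface PerceptionRequest {
  modality: Modality;
  instruction: string;
}

// Map a MIME type such as "video/mp4" to one of the four input modalities.
function modalityFromMime(mimeType: string): Modality {
  const topLevel = mimeType.split("/")[0].toLowerCase();
  switch (topLevel) {
    case "image":
      return "image";
    case "audio":
      return "audio";
    case "video":
      return "video";
    default:
      return "text"; // Fall back to text for anything else.
  }
}

// Build the request the agent would hand to the model before its next tool call.
function planPerception(mimeType: string): PerceptionRequest {
  const instructions: Record<Modality, string> = {
    image: "Describe what is visible in the retrieved image.",
    audio: "Summarize what is said or heard in the clip.",
    video: "Summarize what happens in the short video segment.",
    text: "Summarize the retrieved text.",
  };
  const modality = modalityFromMime(mimeType);
  return { modality, instruction: instructions[modality] };
}

console.log(planPerception("video/mp4"));
```

Keeping the routing in one small function means the agent can use a single model for every modality and only vary the instruction it sends.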

Architecture

The model is a Transformer-based any-to-any model built on Qwen and Qwen3-ASR base components. Its model card carries text, image, video, and audio tags and lists an Apache 2.0 license.

Mixpeek SDK Integration

import { Mixpeek } from "mixpeek";

// "API_KEY" is a placeholder; substitute your own Mixpeek API key
const mx = new Mixpeek({ apiKey: "API_KEY" });

// Managed: create a collection over a bucket; Mixpeek runs this model's extractor
const collection = await mx.collections.create({
  namespace_id: "my-namespace",
  collection_name: "my-collection",
  source: { type: "bucket", bucket_ids: ["bkt_your_bucket"] },
  feature_extractor: {
    feature_extractor_name: "scene_caption",
    version: "v1",
    parameters: { model_id: "WorldSeek-AI/WorldSeek-Omni-2B-Preview" },
  },
});