MiniMax-M3

by MiniMaxAI

Agent-native MoE vision-language model with native video understanding at 1M context

200K downloads/month

428B (23B active) params

Identifiers

Model ID

MiniMaxAI/MiniMax-M3

Feature URI

mixpeek://video_extractor@v1/minimax_m3_vl_v1
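The Feature URI above appears to pack three pieces into one string: an extractor name, a version, and a feature name. As a minimal sketch, assuming that reading of the format is correct, the helper below splits it apart. `FeatureUri`, `parseFeatureUri`, and the field names are hypothetical and are not part of the Mixpeek SDK.

```typescript
// Hypothetical shape: the source shows only the URI string, not these field names.
interface FeatureUri {
  extractor: string; // e.g. "video_extractor"
  version: string;   // e.g. "v1"
  feature: string;   // e.g. "minimax_m3_vl_v1"
}

// Splits "mixpeek://<extractor>@<version>/<feature>" into its parts.
// The pattern is an assumption inferred from the single example URI above.
function parseFeatureUri(uri: string): FeatureUri {
  const match = /^mixpeek:\/\/([^@/]+)@([^/]+)\/(.+)$/.exec(uri);
  if (match === null) {
    throw new Error(`Not a recognizable feature URI: ${uri}`);
  }
  const [, extractor, version, feature] = match;
  return { extractor, version, feature };
}

const parsed = parseFeatureUri("mixpeek://video_extractor@v1/minimax_m3_vl_v1");
console.log(parsed);
```

A regular expression is used here instead of the built-in `URL` class because `URL` would treat the part before `@` as a username, which does not match how the string seems to be intended.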

Overview

MiniMax-M3 is a sparse mixture-of-experts vision-language model with about 428B total parameters, roughly 23B of them active per token. It was trained natively on text, images, and video from the start rather than bolting vision onto a text LLM. Its headline feature is MiniMax Sparse Attention (MSA), which cuts per-token attention compute to about 1/20 of dense attention and delivers 9x prefill and 15x decode speedups at a 1M-token context, so the model can reason over long videos and multi-document sessions in one pass.

On Mixpeek, MiniMax-M3 is a strong scene-understanding extractor for video and image collections: it produces grounded descriptions, answers questions about frames, and drives agentic pipelines where an agent inspects footage, decides what matters, and stores the result as searchable metadata. Its long context makes it a fit for whole-clip understanding rather than isolated frames.

Architecture

Sparse MoE transformer, ~428B total / ~23B active parameters, natively multimodal (text, image, video). MiniMax Sparse Attention (MSA) reduces attention compute and memory so the model sustains a 1M-token context with large prefill/decode speedups over dense attention. Custom modeling code; served via Transformers.
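To make the "total versus active parameters" distinction concrete, here is a minimal sketch of generic top-k expert routing, the mechanism sparse MoE models commonly use so that only a few experts run per token. It is not MiniMax-M3's actual router: the four experts, the logits, and `k = 2` are made-up illustration values, and `softmax` and `routeTopK` are hypothetical helper names. Only the 428B and 23B figures come from the text above.

```typescript
// Numerically stable softmax over router logits.
function softmax(xs: number[]): number[] {
  const max = Math.max(...xs);
  const exps = xs.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Generic top-k routing: keep the k most probable experts for a token
// and renormalize their weights so they sum to 1.
// Illustrative only; the real router design is not described in the source.
function routeTopK(
  logits: number[],
  k: number,
): { expert: number; weight: number }[] {
  const top = softmax(logits)
    .map((p, expert) => ({ expert, p }))
    .sort((a, b) => b.p - a.p)
    .slice(0, k);
  const norm = top.reduce((acc, t) => acc + t.p, 0);
  return top.map((t) => ({ expert: t.expert, weight: t.p / norm }));
}

// Figures quoted in the text: ~428B total parameters, ~23B active per token.
const activeFraction = 23 / 428;
console.log(`active share of parameters: ${(activeFraction * 100).toFixed(1)}%`);

// Toy token with four experts (made-up logits); only two of them are selected.
const routes = routeTopK([0.1, 2.0, -1.0, 1.5], 2);
console.log(routes);
```

With the quoted figures, roughly 5.4% of the parameters participate in any single token's forward pass, which is why per-token compute tracks the 23B figure and not the 428B one.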

Mixpeek SDK Integration

import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

// Managed: create a collection over a bucket; Mixpeek runs this model's extractor
const collection = await mx.collections.create({
  namespace_id: "my-namespace",
  collection_name: "my-collection",
  source: { type: "bucket", bucket_ids: ["bkt_your_bucket"] },
  feature_extractor: {
    feature_extractor_name: "scene_description",
    version: "v1",
    parameters: { model_id: "MiniMaxAI/MiniMax-M3" },
  },
});