Kimi-VL-A3B-Thinking-2506
by moonshotai
Efficient MoE reasoning VLM with 2.8B activated parameters and SOTA video understanding
moonshotai/Kimi-VL-A3B-Thinking-2506
mixpeek://image_extractor@v1/moonshotai_kimi_vl_a3b_v1
Overview
Kimi-VL-A3B-Thinking is Moonshot AI's efficient Mixture-of-Experts (MoE) vision-language model. It activates only about 2.8B of its 16B total parameters per token. It achieves state-of-the-art video understanding among open-source models and supports native-resolution images of up to 3.2 megapixels and a 131K-token context window.
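You may want to control how oversized images are handled instead of relying on whatever resizing the pipeline applies. The following minimal sketch shrinks an image size to fit a pixel budget while preserving aspect ratio. The `MAX_PIXELS` value and the `fitToPixelBudget` helper are assumptions for illustration. They are not part of Mixpeek or Kimi-VL.

```typescript
// Assumed cap: "up to 3.2 megapixels" read as 3,200,000 pixels. The exact limit
// enforced by MoonViT or by Mixpeek's preprocessing may differ.
const MAX_PIXELS = 3_200_000;

interface Size {
  width: number;
  height: number;
}

/**
 * Returns a downscaled integer size with (approximately) the same aspect ratio whose
 * pixel count does not exceed maxPixels. Sizes already within budget are returned unchanged.
 */
function fitToPixelBudget(size: Size, maxPixels: number = MAX_PIXELS): Size {
  const pixels = size.width * size.height;
  if (pixels <= maxPixels) return { ...size };
  // Scaling both sides by s scales the pixel count by s^2, so s = sqrt(budget / pixels).
  const scale = Math.sqrt(maxPixels / pixels);
  let width = Math.max(1, Math.floor(size.width * scale));
  let height = Math.max(1, Math.floor(size.height * scale));
  // Guard against floating-point rounding pushing the product just over budget.
  while (width * height > maxPixels && (width > 1 || height > 1)) {
    if (width >= height) width--;
    else height--;
  }
  return { width, height };
}

console.log(fitToPixelBudget({ width: 4000, height: 3000 })); // a 12 MP photo
```

With this assumed budget, a 4000×3000 (12 MP) photo becomes 2065×1549. Images that are already under the cap, such as 1920×1080, pass through unchanged.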
On Mixpeek, Kimi-VL powers high-quality scene captioning, visual reasoning, and OCR extraction at a fraction of the per-token compute of dense 7B+ models. Only about 2.8B parameters are active per token, so its MoE architecture is especially cost-effective for batch processing of large video libraries.
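A common rule of thumb makes the compute claim concrete: a decoder forward pass costs roughly 2 FLOPs per active parameter per token. The sketch below applies that rule to the ~2.8B activated parameters and compares the result with a hypothetical dense 7B model. It is an approximation, not a measured figure.

```typescript
// Rule of thumb (an approximation, not a Mixpeek or Moonshot figure):
// a decoder forward pass costs about 2 FLOPs per *active* parameter per token.
function forwardFlopsPerToken(activeParams: number): number {
  return 2 * activeParams;
}

const kimiActive = 2.8e9; // ~2.8B activated parameters (from the model card)
const dense7B = 7e9; // a hypothetical dense 7B model for comparison

const ratio = forwardFlopsPerToken(kimiActive) / forwardFlopsPerToken(dense7B);
console.log(`Relative per-token decoder compute: ${(ratio * 100).toFixed(0)}%`);
```

The result is about 40% of the dense model's per-token decoder compute. The estimate ignores attention cost over long contexts and the vision encoder. All 16B parameters must still be held in memory, so the savings apply to compute, not to weight storage.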
Architecture
Mixture-of-Experts VLM:
- Vision encoder: MoonViT (native resolution, up to 3.2M pixels)
- Projector: MLP
- Language decoder: Moonlight-16B-A3B MoE
- Parameters: 16B total / ~2.8B activated
- Context: 131K tokens max
- Training: long-CoT SFT followed by reinforcement learning, with thinking tokens reduced by 20%
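The "~2.8B activated" figure comes from sparse expert routing. For each token, a router scores the experts, and only the few top-scoring experts are evaluated. The sketch below shows one common variant on toy experts: top-k selection followed by a softmax over the selected logits. The expert count, `k`, and the `moeForward` helper are illustrative assumptions, not Moonlight-16B-A3B's actual configuration.

```typescript
// Generic top-k Mixture-of-Experts routing for a single token.
// Expert count, k, and the toy experts below are illustrative assumptions.
type Expert = (x: number[]) => number[];

function softmax(logits: number[]): number[] {
  const max = Math.max(...logits);
  const exps = logits.map((l) => Math.exp(l - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

/** Runs only the k highest-scoring experts and mixes their outputs by gate weights. */
function moeForward(
  x: number[],
  routerLogits: number[],
  experts: Expert[],
  k: number,
): { output: number[]; used: number[] } {
  // Indices of the top-k router logits, highest first.
  const used = routerLogits
    .map((logit, i) => ({ logit, i }))
    .sort((a, b) => b.logit - a.logit)
    .slice(0, k)
    .map((e) => e.i);
  // Softmax over the selected logits only, so the k weights sum to 1.
  const weights = softmax(used.map((i) => routerLogits[i]));
  const output: number[] = new Array(x.length).fill(0);
  used.forEach((expertIdx, j) => {
    const y = experts[expertIdx](x); // unselected experts are never evaluated
    for (let d = 0; d < x.length; d++) output[d] += weights[j] * y[d];
  });
  return { output, used };
}

// Toy setup: 8 experts, expert e scales its input by (e + 1); each token uses 2.
const experts: Expert[] = Array.from({ length: 8 }, (_, e) => (x: number[]) => x.map((v) => v * (e + 1)));
const { output, used } = moeForward([1, 2], [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3], experts, 2);
console.log(used, output);
```

Here only experts 1 and 4 run, so 2 of the 8 experts' parameters are used for this token. The same idea at scale is why per-token compute tracks the activated count rather than the total.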
Mixpeek SDK Integration
```typescript
import { Mixpeek } from "mixpeek";

// Replace API_KEY with your Mixpeek API key.
const mx = new Mixpeek({ apiKey: "API_KEY" });

// Top-level await requires an ES module context.
await mx.collections.ingest({
  collection_id: "video-library",
  source: { url: "https://example.com/training-session.mp4" },
  feature_extractors: [
    {
      feature: "scene_caption",
      model: "moonshotai/Kimi-VL-A3B-Thinking-2506",
    },
  ],
});
```
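For the batch-processing use case mentioned in the overview, a hypothetical helper can build one request per video URL, reusing the fields from the SDK example. It then groups the requests into fixed-size batches. This sketch only constructs request objects and does not call the Mixpeek API. `buildIngestRequests`, `chunk`, and the batch size are assumptions.

```typescript
// Hypothetical helper: builds one ingest request per video URL, mirroring the
// request fields used in the SDK example. No Mixpeek API call is made here.
interface IngestRequest {
  collection_id: string;
  source: { url: string };
  feature_extractors: { feature: string; model: string }[];
}

const KIMI_VL_MODEL = "moonshotai/Kimi-VL-A3B-Thinking-2506";

function buildIngestRequests(collectionId: string, urls: string[]): IngestRequest[] {
  return urls.map((url) => ({
    collection_id: collectionId,
    source: { url },
    feature_extractors: [{ feature: "scene_caption", model: KIMI_VL_MODEL }],
  }));
}

/** Splits items into batches of at most `size` so requests can be submitted in waves. */
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error("size must be >= 1");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

const urls = [
  "https://example.com/a.mp4",
  "https://example.com/b.mp4",
  "https://example.com/c.mp4",
];
const batches = chunk(buildIngestRequests("video-library", urls), 2);
console.log(batches.length); // 2
```

Each batch could then be submitted with `Promise.all(batch.map((req) => mx.collections.ingest(req)))`, assuming `ingest` accepts exactly the shape shown in the SDK example. A bounded batch size keeps the number of in-flight requests predictable.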
Capabilities
- State-of-the-art video understanding among open-source models (65.2 on VideoMMMU)
- Only 2.8B activated parameters (MoE efficiency)
- Native high-resolution image support up to 3.2 megapixels
- 131K-token context window for long documents
- Strong OCR (869 on OCRBench) and GUI grounding (91.4 on ScreenSpot-V2)
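Long inputs can be kept inside the 131K-token window by packing consecutive segments greedily into windows that stay under a token budget. This minimal sketch assumes that "131K" means 131,072 tokens and that the caller already has per-segment token counts. `packSegments` and the 16K output reserve are hypothetical.

```typescript
// Assumed budget: "131K" read as 131,072 tokens (128 × 1024). Check the model's
// actual max length and leave headroom for the prompt and generated output.
const CONTEXT_TOKENS = 131_072;

/**
 * Greedily packs consecutive segments (e.g. pages or transcript chunks) into windows
 * whose total token count stays within `budget`. Returns segment indices per window.
 */
function packSegments(tokenCounts: number[], budget: number = CONTEXT_TOKENS): number[][] {
  const windows: number[][] = [];
  let current: number[] = [];
  let used = 0;
  tokenCounts.forEach((count, i) => {
    if (count > budget) throw new Error(`segment ${i} alone exceeds the budget`);
    if (used + count > budget) {
      // Close the current window and start a new one with this segment.
      windows.push(current);
      current = [];
      used = 0;
    }
    current.push(i);
    used += count;
  });
  if (current.length > 0) windows.push(current);
  return windows;
}

// Three 50K-token sections, with hypothetical headroom reserved for the output.
const reserveForOutput = 16_000;
console.log(packSegments([50_000, 50_000, 50_000], CONTEXT_TOKENS - reserveForOutput));
```

Reserving headroom matters for a Thinking model because the reasoning trace and the final answer are generated in the same context window as the input.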
Benchmarks
| Dataset | Metric | Score | Source |
|---|---|---|---|
| VideoMMMU | Accuracy | 65.2 | Moonshot AI, 2025 (arXiv:2504.07491) |
| MMMU | Pass@1 | 64.0 | Moonshot AI, 2025 (arXiv:2504.07491) |
| MathVision | Pass@1 | 56.9 | Moonshot AI, 2025 (arXiv:2504.07491) |
Research Paper
Kimi-VL Technical Report (arxiv.org)
Build a pipeline with Kimi-VL-A3B-Thinking-2506
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
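The sketch below is a toy stand-in for a retrieval stage. It ranks scene captions by the fraction of query terms each one contains. The `SceneCaption` record shape is hypothetical because the output schema of the `scene_caption` extractor isn't described here. Production search would typically use embeddings or BM25 rather than raw term overlap.

```typescript
// Hypothetical record shape for scene captions; the real extractor output may differ.
interface SceneCaption {
  videoUrl: string;
  startSec: number;
  caption: string;
}

function tokenize(text: string): Set<string> {
  return new Set<string>(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

/** Toy retrieval stage: ranks captions by the fraction of query terms they contain. */
function searchCaptions(query: string, scenes: SceneCaption[], topK = 3): SceneCaption[] {
  const q = tokenize(query);
  if (q.size === 0) return [];
  return scenes
    .map((scene) => {
      const terms = tokenize(scene.caption);
      let hits = 0;
      q.forEach((t) => {
        if (terms.has(t)) hits++;
      });
      return { scene, score: hits / q.size };
    })
    .filter((r) => r.score > 0)
    // Higher score first; ties broken by earlier start time.
    .sort((a, b) => b.score - a.score || a.scene.startSec - b.scene.startSec)
    .slice(0, topK)
    .map((r) => r.scene);
}

const video = "https://example.com/training-session.mp4";
const scenes: SceneCaption[] = [
  { videoUrl: video, startSec: 0, caption: "Instructor introduces the safety checklist" },
  { videoUrl: video, startSec: 95, caption: "Trainee operates a forklift in the warehouse" },
  { videoUrl: video, startSec: 210, caption: "Group reviews the forklift safety checklist" },
];
console.log(searchCaptions("forklift safety", scenes).map((s) => s.startSec));
```

For the query "forklift safety", the scene at 210 s matches both terms and ranks first. The two partial matches follow, ordered by start time.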