Voxtral-Mini-4B-Realtime-2602
by mistralai
Open-source realtime streaming speech-to-text with sub-500ms latency across 13 languages
mistralai/Voxtral-Mini-4B-Realtime-2602
mixpeek://transcription@v1/mistral_voxtral_mini_4b_v1
Overview
Voxtral Mini 4B Realtime is among the first open-source speech models to achieve offline-comparable accuracy with sub-500ms latency. Its natively streaming architecture pairs a causal audio encoder (~0.6B params) with a Ministral-3-based LLM decoder (~3.4B params), both using sliding window attention for constant-memory streaming inference.
On Mixpeek, Voxtral powers realtime and near-realtime transcription of audio and video content across 13 languages. The transcription delay is configurable from 240ms to 2.4s to trade speed against accuracy: shorter delays suit live subtitling, longer delays suit batch processing.
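To make the constant-memory claim concrete, here is a minimal sketch of a bounded cache that keeps only the most recent frames. It is an illustration of the general idea, not Voxtral's actual implementation: the class name `SlidingWindowCache`, the `window` parameter, and the use of integers as stand-in frames are all assumptions.

```python
from collections import deque


class SlidingWindowCache:
    """Hypothetical helper: retains only the most recent `window` frames."""

    def __init__(self, window: int) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        # A deque with maxlen drops its oldest item when a new one overflows it,
        # so memory stays bounded no matter how long the stream runs.
        self._frames = deque(maxlen=window)

    def append(self, frame) -> None:
        self._frames.append(frame)

    def visible(self) -> list:
        """Frames that attention could still look at, oldest first."""
        return list(self._frames)

    def __len__(self) -> int:
        return len(self._frames)


cache = SlidingWindowCache(window=4)
for t in range(10):  # integers stand in for audio frames
    cache.append(t)
print(len(cache), cache.visible())  # prints: 4 [6, 7, 8, 9]
```

After ten frames the cache still holds four entries, which is the property that lets a streaming model run on arbitrarily long audio without growing its state.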
Architecture
Voxtral uses a two-component streaming architecture: (1) a causal transformer audio encoder (~0.6B params, 32 layers) and (2) a Ministral-3-based LLM decoder (~3.4B params, 26 layers). Both components use sliding window attention for streaming, and the transcription delay is configurable from 240ms to 2.4s.
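The combination of causal and sliding window attention can be pictured as a mask over query and key positions. The sketch below builds such a mask in plain Python under simple assumptions: the function name `sliding_window_causal_mask` is hypothetical, and the sequence length and window size are toy values rather than the model's real configuration.

```python
def sliding_window_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[q][k] is True when query position q may attend to key position k.

    Causal: k must not be in the future (k <= q).
    Sliding window: k must be among the last `window` positions (q - k < window).
    """
    return [
        [0 <= q - k < window for k in range(seq_len)]
        for q in range(seq_len)
    ]


mask = sliding_window_causal_mask(seq_len=5, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
# prints:
# x....
# xx...
# xxx..
# .xxx.
# ..xxx
```

Each row has at most `window` allowed positions, so the per-step attention cost stays fixed as the stream grows.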
Mixpeek SDK Integration
    from mixpeek import Mixpeek

    mx = Mixpeek(api_key="YOUR_KEY")

    mx.ingest(
        collection_id="meeting-recordings",
        source="s3://audio/",
        extractors=[
            {
                "type": "transcription",
                "model": "mistralai/Voxtral-Mini-4B-Realtime-2602",
                "output_feature": "transcript",
            }
        ],
    )
Capabilities
- Realtime streaming transcription with <500ms latency
- Support for 13 languages, including English, Spanish, French, and German
- Configurable latency/accuracy tradeoff (240ms-2.4s delay)
- Natively streaming architecture (no chunking workarounds)
- Open-source under the Apache 2.0 license
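The configurable delay listed above can be handled defensively on the client side. The following is a small sketch that clamps a requested delay to the 240ms to 2.4s range stated in this document. The helper `clamp_delay_ms` and the preset names are hypothetical, and nothing here reflects a real Mixpeek or Voxtral parameter.

```python
MIN_DELAY_MS = 240    # lower bound stated in this document
MAX_DELAY_MS = 2400   # upper bound stated in this document

# Hypothetical presets; 480 and 2400 are the two delays reported in the benchmarks.
PRESETS_MS = {"live_subtitles": 480, "batch": 2400}


def clamp_delay_ms(requested_ms: int) -> int:
    """Return the requested delay, limited to the supported range."""
    return max(MIN_DELAY_MS, min(MAX_DELAY_MS, requested_ms))


print(clamp_delay_ms(100))                   # prints: 240
print(clamp_delay_ms(PRESETS_MS["batch"]))   # prints: 2400
```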
Benchmarks
| Dataset | Metric | Score | Source |
|---|---|---|---|
| FLEURS (13 languages, 480ms) | Average WER | 8.72% | Mistral AI, Feb 2026 — Voxtral Realtime paper |
| FLEURS English (480ms) | WER | 4.90% | Mistral AI, Feb 2026 — Voxtral Realtime paper |
| FLEURS (13 languages, 2.4s) | Average WER | 6.73% | Mistral AI, Feb 2026 — Voxtral Realtime paper |
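For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal illustration: the function name `word_error_rate` is an assumption, it splits on whitespace only, and it skips the text normalization that published benchmarks typically apply, so it will not reproduce the scores in the table. The final lines compute the relative WER reduction between the 480ms and 2.4s averages reported above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution (sat -> sit) and one deletion (mat) over 5 reference words.
print(word_error_rate("the cat sat on mat", "the cat sit on"))  # prints: 0.4

# Relative reduction from the 480ms average (8.72%) to the 2.4s average (6.73%).
relative_reduction = (8.72 - 6.73) / 8.72
print(f"{relative_reduction:.1%}")  # prints: 22.8%
```

On the reported averages, raising the delay from 480ms to 2.4s lowers WER by roughly a fifth in relative terms, which is the tradeoff the configurable delay exposes.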
Research Paper
Voxtral Realtime
arxiv.org
Build a pipeline with Voxtral-Mini-4B-Realtime-2602
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.