granite-speech-4.1-2b-plus
by ibm-granite
Speaker-attributed ASR — diarization, word timestamps, and keyword biasing in 2B
ibm-granite/granite-speech-4.1-2b-plus
mixpeek://transcription@v1/ibm_granite_speech_41_2b_plus_v1
Overview
Granite Speech 4.1 2B Plus extends the base Granite Speech model with speaker attribution, word-level timestamp alignment (38.8 ms mean deviation), and keyword biasing, all in a single 2B-parameter model. Pipeline approaches chain separate ASR and diarization models. This model instead produces speaker-labeled, timestamped transcripts in one decoding pass.
With a Word Diarization Error Rate (WDER) of 0.9% on the FISHER dataset, it delivers production-grade speaker attribution. Keyword biasing lets you improve recognition of domain-specific terms (product names, technical jargon) without fine-tuning. On Mixpeek, it powers meeting transcription and call analytics pipelines where speaker identity and precise timing matter.
Architecture
Autoregressive encoder-decoder (2B parameters) with multi-task training heads for ASR, speaker attribution, and timestamp alignment. It supports keyword biasing via attention-based shallow fusion and can be served natively with vLLM.
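To make the shallow-fusion idea more concrete, here is a minimal TypeScript sketch of generic keyword biasing during greedy decoding. Before the argmax, any token that would start or continue a keyword gets a fixed bonus `lambda` added to its log-probability. The token strings and the names `biasScores`, `argmax`, and `historyEndsWith` are hypothetical. The constant-bonus scheme is an assumed simplification for illustration. It is not IBM's attention-based implementation.

```typescript
// Hypothetical sketch of keyword biasing via shallow fusion (not IBM's
// implementation): add a fixed bonus `lambda` to the log-probability of any
// token that starts or continues a keyword, given the tokens decoded so far.

type TokenScores = Map<string, number>;

/** True if `history` ends with `prefix`; an empty prefix always matches. */
function historyEndsWith(history: string[], prefix: string[]): boolean {
  if (prefix.length > history.length) return false;
  const offset = history.length - prefix.length;
  return prefix.every((tok, i) => history[offset + i] === tok);
}

/** Fused score = log P(token) + lambda if the token extends a keyword prefix. */
function biasScores(
  logProbs: TokenScores,
  history: string[],
  keywords: string[][],
  lambda: number,
): TokenScores {
  const boosted = new Set<string>();
  for (const kw of keywords) {
    // If kw[0..k) was just decoded, kw[k] is a valid continuation (k = 0: keyword start).
    for (let k = 0; k < kw.length; k++) {
      if (historyEndsWith(history, kw.slice(0, k))) boosted.add(kw[k]);
    }
  }
  const fused: TokenScores = new Map();
  logProbs.forEach((lp, tok) => fused.set(tok, boosted.has(tok) ? lp + lambda : lp));
  return fused;
}

/** Greedy decoding step: return the highest-scoring token. */
function argmax(scores: TokenScores): string {
  let best = "";
  let bestScore = -Infinity;
  scores.forEach((s, tok) => {
    if (s > bestScore) {
      best = tok;
      bestScore = s;
    }
  });
  return best;
}

// Usage: after decoding "mix", the model slightly prefers "peak",
// but biasing toward the keyword ["mix", "peek"] flips the choice.
const stepScores: TokenScores = new Map([
  ["peak", Math.log(0.5)],
  ["peek", Math.log(0.3)],
]);
console.log(argmax(stepScores)); // → peak
console.log(argmax(biasScores(stepScores, ["mix"], [["mix", "peek"]], 1.0))); // → peek
```

A constant bonus is the simplest possible choice. Practical context-biasing systems commonly also withdraw the bonus when a partial keyword match fails, and this sketch omits that step.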
Mixpeek SDK Integration
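```typescript
import { Mixpeek } from "mixpeek";

// Replace "API_KEY" with your Mixpeek API key.
const mx = new Mixpeek({ apiKey: "API_KEY" });

// Ingest a recording and transcribe it with Granite Speech 4.1 2B Plus.
// Top-level await assumes an ES module context.
await mx.collections.ingest({
  collection_id: "my-collection",
  source: { url: "https://example.com/meeting.mp4" },
  feature_extractors: [
    {
      name: "transcription",
      version: "v1",
      params: {
        model_id: "ibm-granite/granite-speech-4.1-2b-plus",
        enable_diarization: true, // attach speaker labels to words
        keywords: ["Mixpeek", "RAG", "embeddings"], // terms to bias recognition toward
      },
    },
  ],
});
```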
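Downstream meeting or call analytics usually needs speaker turns rather than individual words. The sketch below merges consecutive words from the same speaker into turns. It assumes a hypothetical per-word record with `text`, `speaker`, `start`, and `end` fields in seconds. The actual Mixpeek transcription output schema is not documented here and may use different names. `toTurns` and `formatTime` are illustrative helpers.

```typescript
// Hypothetical word record; the real Mixpeek transcription output schema
// may use different field names.
interface Word {
  text: string;
  speaker: string;
  start: number; // seconds
  end: number; // seconds
}

interface Turn {
  speaker: string;
  start: number;
  end: number;
  text: string;
}

/** Merge consecutive words from the same speaker into a single turn. */
function toTurns(words: Word[]): Turn[] {
  const turns: Turn[] = [];
  for (const w of words) {
    const last = turns[turns.length - 1];
    if (last !== undefined && last.speaker === w.speaker) {
      last.end = w.end; // extend the current turn
      last.text += " " + w.text;
    } else {
      turns.push({ speaker: w.speaker, start: w.start, end: w.end, text: w.text });
    }
  }
  return turns;
}

/** Format seconds as mm:ss.sss. */
function formatTime(t: number): string {
  const minutes = Math.floor(t / 60);
  const seconds = (t - minutes * 60).toFixed(3).padStart(6, "0");
  return `${String(minutes).padStart(2, "0")}:${seconds}`;
}

// Usage with made-up sample data.
const sample: Word[] = [
  { text: "Welcome", speaker: "SPK_1", start: 0.0, end: 0.42 },
  { text: "everyone.", speaker: "SPK_1", start: 0.42, end: 0.9 },
  { text: "Thanks.", speaker: "SPK_2", start: 1.2, end: 1.6 },
];
for (const turn of toTurns(sample)) {
  console.log(`[${formatTime(turn.start)}-${formatTime(turn.end)}] ${turn.speaker}: ${turn.text}`);
}
// [00:00.000-00:00.900] SPK_1: Welcome everyone.
// [00:01.200-00:01.600] SPK_2: Thanks.
```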
Capabilities
- Joint ASR + speaker diarization in one pass
- Word-level timestamps (38.8 ms mean deviation)
- Keyword biasing without fine-tuning
- WDER of 0.9% on the FISHER dataset
- Apache 2.0 license, vLLM-ready
Benchmarks
| Dataset / task | Metric | Score | Source |
|---|---|---|---|
| FISHER (speaker diarization) | WDER | 0.9% | IBM, 2026 (Model Card) |
| Word timestamp alignment | Mean deviation | 38.8 ms | IBM, 2026 (Model Card) |
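The two metrics in the table can be sketched roughly as follows. WDER is commonly defined as (S_IS + C_IS) / (S + C). In that formula, S and C count substituted and correctly recognized words, and the "_IS" terms count those words that received the wrong speaker label. The sketch makes two simplifying assumptions. First, reference and hypothesis words are already aligned one-to-one, so only correct or substituted pairs appear. Second, hypothesis speaker labels are already mapped onto reference labels. The card does not say exactly how the 38.8 ms figure is computed, so the mean absolute word start-time error is an assumed stand-in. `AlignedWord`, `wder`, and `meanStartDeviationMs` are hypothetical names.

```typescript
// Simplified metric sketch. Assumes words are pre-aligned one-to-one
// (correct or substituted pairs only) and hypothesis speaker labels are
// already mapped onto reference labels.
interface AlignedWord {
  refSpeaker: string;
  hypSpeaker: string;
  refStart: number; // seconds
  hypStart: number; // seconds
}

/** WDER = (S_IS + C_IS) / (S + C): share of aligned words with the wrong speaker. */
function wder(words: AlignedWord[]): number {
  if (words.length === 0) return 0;
  const wrong = words.filter((w) => w.refSpeaker !== w.hypSpeaker).length;
  return wrong / words.length;
}

/** Assumed timestamp metric: mean absolute start-time deviation, in milliseconds. */
function meanStartDeviationMs(words: AlignedWord[]): number {
  if (words.length === 0) return 0;
  const total = words.reduce((sum, w) => sum + Math.abs(w.hypStart - w.refStart), 0);
  return (total / words.length) * 1000;
}

// Usage with made-up sample data: one of four words has the wrong speaker.
const aligned: AlignedWord[] = [
  { refSpeaker: "A", hypSpeaker: "A", refStart: 0.0, hypStart: 0.02 },
  { refSpeaker: "A", hypSpeaker: "A", refStart: 0.5, hypStart: 0.46 },
  { refSpeaker: "B", hypSpeaker: "A", refStart: 1.0, hypStart: 1.03 },
  { refSpeaker: "B", hypSpeaker: "B", refStart: 1.4, hypStart: 1.41 },
];
console.log(wder(aligned)); // → 0.25
console.log(meanStartDeviationMs(aligned).toFixed(1)); // → 25.0
```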
Research Paper
Granite Speech 4.1: Speaker-Attributed ASR
arxiv.org
Build a pipeline with granite-speech-4.1-2b-plus
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.