    HF · Transcription · Apache 2.0

    granite-4.0-1b-speech

    by ibm-granite

    #1 Open ASR Leaderboard at 1B — edge-deployable multilingual transcription

    120K downloads/month
    1B params
    Identifiers
    Model ID: ibm-granite/granite-4.0-1b-speech
    Feature URI: mixpeek://transcription@v1/ibm_granite_40_1b_speech_v1
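The feature URI above appears to follow a `mixpeek://<feature>@<version>/<model>` layout. The sketch below splits a URI into those three parts. The layout is an assumption inferred from this single example, and `parseFeatureUri` and `FeatureUri` are hypothetical names, not part of the Mixpeek SDK.

```typescript
// Hypothetical helper: the URI layout is inferred from one example, not from
// official documentation.
interface FeatureUri {
  feature: string;
  version: string;
  model: string;
}

function parseFeatureUri(uri: string): FeatureUri {
  // Assumed shape: mixpeek://<feature>@<version>/<model>
  const match = /^mixpeek:\/\/([^@\/]+)@([^\/]+)\/(.+)$/.exec(uri);
  if (!match) {
    throw new Error(`unrecognized feature URI: ${uri}`);
  }
  const [, feature, version, model] = match;
  return { feature, version, model };
}

const parsed = parseFeatureUri(
  "mixpeek://transcription@v1/ibm_granite_40_1b_speech_v1"
);
```

Here `parsed.feature` is `"transcription"`, `parsed.version` is `"v1"`, and `parsed.model` is `"ibm_granite_40_1b_speech_v1"`.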

    Overview

    Granite 4.0 1B Speech is the smallest model to reach #1 on the HuggingFace Open ASR Leaderboard. At just 1B parameters, it achieves a 1.42% word error rate (WER) on LibriSpeech Clean and a 5.52% average WER across the leaderboard benchmarks, while transcribing at roughly 280x realtime on an A100 GPU.

    It supports English and Japanese, and accepts a keyword list to bias recognition toward domain-specific vocabulary. The compact size makes it ideal for edge deployment, serverless functions, and cost-sensitive pipelines where Whisper Large v3 (1.5B) is too heavy. On Mixpeek, it serves as the default transcription model for latency-sensitive and high-volume audio processing.

    Architecture

    Compact encoder-decoder (1B parameters) optimized for throughput. Supports keyword biasing via attention-based shallow fusion. English + Japanese language support.
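To give a feel for what keyword biasing does, here is a toy sketch that rescores an n-best list by adding a fixed bonus for each keyword a hypothesis contains. This is plain n-best rescoring, a simpler stand-in for the attention-based shallow fusion mentioned above, and not the model's actual mechanism. The `Hypothesis` type, the `rescoreWithKeywords` function, the bonus value, and the example scores are all made up for illustration.

```typescript
interface Hypothesis {
  text: string;
  logProb: number; // decoder score; higher is better
}

// Toy contextual biasing: boost each hypothesis by `bonus` per matched keyword,
// then return the highest-scoring hypothesis.
function rescoreWithKeywords(
  hyps: Hypothesis[],
  keywords: string[],
  bonus: number
): Hypothesis {
  if (hyps.length === 0) {
    throw new Error("need at least one hypothesis");
  }
  const scored = hyps.map((h) => {
    const words = new Set(h.text.toLowerCase().split(/\s+/));
    const hits = keywords.filter((k) => words.has(k.toLowerCase())).length;
    return { text: h.text, logProb: h.logProb + bonus * hits };
  });
  return scored.reduce((best, h) => (h.logProb > best.logProb ? h : best));
}

// Illustrative scores only: the unbiased decoder slightly prefers "mix peak".
const nBest: Hypothesis[] = [
  { text: "schedule the mix peak demo", logProb: -4.1 },
  { text: "schedule the mixpeek demo", logProb: -4.6 },
];

const unbiased = rescoreWithKeywords(nBest, [], 1.0);
const biased = rescoreWithKeywords(nBest, ["Mixpeek"], 1.0);
```

With an empty keyword list the first hypothesis wins. With `"Mixpeek"` in the list, the bonus lifts the second hypothesis above it.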

    Mixpeek SDK Integration

    import { Mixpeek } from "mixpeek";

    // Replace "API_KEY" with your Mixpeek API key.
    const mx = new Mixpeek({ apiKey: "API_KEY" });

    // Ingest an audio file and transcribe it with granite-4.0-1b-speech.
    await mx.collections.ingest({
      collection_id: "my-collection",
      source: { url: "https://example.com/podcast.mp3" },
      feature_extractors: [{
        name: "transcription",
        version: "v1",
        params: {
          model_id: "ibm-granite/granite-4.0-1b-speech"
        }
      }]
    });

    Capabilities

    • #1 on HuggingFace Open ASR Leaderboard at release
    • LibriSpeech Clean WER: 1.42%
    • 280x realtime factor on GPU
    • Keyword list biasing for domain vocabulary
    • Apache 2.0 license, only 1B parameters

    Use Cases on Mixpeek

    • High-volume audio transcription at minimal compute cost
    • Edge ASR for mobile and embedded devices
    • Serverless transcription in latency-sensitive pipelines
    • Cost-efficient batch processing of large audio archives

    Benchmarks

    Dataset                       Metric   Score    Source
    LibriSpeech Clean             WER      1.42%    IBM, 2026 (Model Card)
    Open ASR Leaderboard (avg)    WER      5.52%    IBM, 2026 (Model Card)
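For readers unfamiliar with the metric, WER is the number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The sketch below computes it with a word-level edit distance. It only lowercases and splits on whitespace; benchmark harnesses typically apply heavier text normalization, so this will not reproduce the scores above. The function names are illustrative.

```typescript
// Lowercase and split on whitespace. Real evaluation pipelines usually
// normalize punctuation, numbers, and spelling variants as well.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
}

// WER = word-level Levenshtein distance / number of reference words.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = tokenize(reference);
  const hyp = tokenize(hypothesis);
  if (ref.length === 0) {
    throw new Error("reference must contain at least one word");
  }
  // prev[j] holds the edit distance between the first i-1 reference words
  // and the first j hypothesis words.
  let prev: number[] = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr.push(
        Math.min(
          prev[j] + 1, // deletion
          curr[j - 1] + 1, // insertion
          prev[j - 1] + cost // match or substitution
        )
      );
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}

// One substituted word out of six reference words.
const wer = wordErrorRate("the cat sat on the mat", "the cat sat on a mat");
```

In this example `wer` is 1/6, or about 16.7%. A 1.42% WER corresponds to roughly one word error per 70 reference words.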

    Performance

    Input Size: Variable-length audio
    GPU Latency: ~0.21 s per minute of audio (A100, RTFx 280)
    GPU Throughput: ~280x realtime (A100)
    GPU Memory: ~2.5 GB
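The latency and throughput figures are two views of the same number: processing time is audio duration divided by the realtime factor (RTFx). The small sketch below reproduces the per-minute latency and extends it to a hypothetical 1,000-hour archive. It assumes the quoted RTFx holds for the whole workload, which ignores batching, I/O, and cold-start overhead.

```typescript
// Processing time implied by a realtime factor: audio duration / RTFx.
function processingSeconds(audioSeconds: number, rtfx: number): number {
  if (rtfx <= 0) {
    throw new Error("rtfx must be positive");
  }
  return audioSeconds / rtfx;
}

// One minute of audio at RTFx 280: about 0.21 seconds.
const perMinute = processingSeconds(60, 280);

// A hypothetical 1,000-hour archive at the same rate: about 3.6 GPU-hours.
const archiveGpuHours = processingSeconds(1000 * 3600, 280) / 3600;
```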

    Specification

    Framework: HF
    Organization: ibm-granite
    Feature: Transcription
    Output: text + timestamps
    Modalities: video, audio
    Retriever: Transcript Search
    Parameters: 1B
    License: Apache 2.0
    Downloads/mo: 120K

    Research Paper

    Granite 4.0 Speech (arxiv.org)

    Build a pipeline with granite-4.0-1b-speech

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
