    HF · Transcription · apache-2.0

    whisper-large-v3

    by openai

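    Robust multilingual speech recognition. The original Whisper models were trained on 680K hours of audio; large-v3 was trained on about 1M hours of weakly labeled audio plus 4M hours of pseudo-labeled audio.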

    6.1M downloads/month
    5,445 likes
    1.55B params
    Identifiers
    Model ID
    openai/whisper-large-v3
    Feature URI
    mixpeek://transcription@v1/openai_whisper_large_v3
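
    The Feature URI above appears to pack three parts into one string: a feature name, a version, and a model slug. The sketch below splits it apart. The layout `mixpeek://<feature>@<version>/<slug>` and the slug rule are inferred from the single example on this page and are assumptions, not a published Mixpeek specification; `parseFeatureUri` and `modelIdToSlug` are hypothetical helper names.

```typescript
// Hypothetical helpers: the URI layout is inferred from one example on this
// page, not taken from a Mixpeek specification.
interface FeatureUri {
  feature: string;   // e.g. "transcription"
  version: string;   // e.g. "v1"
  modelSlug: string; // e.g. "openai_whisper_large_v3"
}

function parseFeatureUri(uri: string): FeatureUri {
  // Capture <feature>, <version>, and <slug> from mixpeek://<feature>@<version>/<slug>
  const match = /^mixpeek:\/\/([^@\/]+)@([^\/]+)\/(.+)$/.exec(uri);
  if (match === null) {
    throw new Error(`Unrecognized feature URI: ${uri}`);
  }
  const [, feature, version, modelSlug] = match;
  return { feature, version, modelSlug };
}

// Assumed slug rule: every character outside [A-Za-z0-9] becomes "_".
function modelIdToSlug(modelId: string): string {
  return modelId.replace(/[^A-Za-z0-9]/g, "_");
}
```

    Under these assumptions, `modelIdToSlug("openai/whisper-large-v3")` yields the slug used in the URI above.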

    Overview

    Whisper is a general-purpose speech recognition model trained on a large, diverse audio dataset. It supports multilingual transcription, translation, and language identification. The large-v3 variant is a 1.55B-parameter checkpoint; the Whisper paper reports that models in this family approach human accuracy and robustness on English speech.

    On Mixpeek, Whisper powers audio transcription for video and audio content, generating timestamped text that enables full-text search across spoken content.
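
    To illustrate why timestamped text makes spoken content searchable, here is a minimal sketch of a keyword lookup over transcript segments. The `TranscriptSegment` shape and the `searchSegments` helper are hypothetical; Mixpeek's actual response schema and retrieval logic may differ.

```typescript
// Hypothetical segment shape; the real Mixpeek output schema may differ.
interface TranscriptSegment {
  start: number; // seconds from the beginning of the media
  end: number;   // seconds from the beginning of the media
  text: string;
}

// Return every segment whose text contains all query terms (case-insensitive).
function searchSegments(segments: TranscriptSegment[], query: string): TranscriptSegment[] {
  const terms = query.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
  if (terms.length === 0) {
    return [];
  }
  return segments.filter((seg) => {
    const haystack = seg.text.toLowerCase();
    return terms.every((term) => haystack.indexOf(term) !== -1);
  });
}

const demoSegments: TranscriptSegment[] = [
  { start: 0.0, end: 4.2, text: "Welcome to the quarterly review." },
  { start: 4.2, end: 9.8, text: "Revenue grew in the third quarter." },
];

// Each hit carries its start time, so a player could seek straight to it.
const hits = searchSegments(demoSegments, "revenue quarter");
```

    A production search stage would typically use an inverted index or embeddings rather than a linear scan; the point here is only that each match maps back to a position in the media.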

    Architecture

    Encoder-decoder Transformer with 32 encoder layers and 32 decoder layers. It processes audio in 30-second segments, each converted to a 128-channel log-mel spectrogram (earlier Whisper checkpoints use 80 channels). Training uses a multi-task format with special tokens for timestamps, language, and task type.
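
    The input sizes and the special-token format can be made concrete with a little arithmetic. The sketch below assumes Whisper's usual front-end settings (16 kHz audio, a hop of 160 samples, and a stride-2 convolution in the encoder); the mel-bin count is left as a parameter (128 for large-v3, 80 for earlier checkpoints). `chunkCount` assumes plain non-overlapping windows, which only approximates real long-form decoding. All function names are illustrative.

```typescript
const SAMPLE_RATE_HZ = 16000; // Whisper resamples input audio to 16 kHz
const CHUNK_SECONDS = 30;     // fixed segment length
const HOP_LENGTH = 160;       // samples between spectrogram frames (10 ms)

interface ChunkShape {
  samples: number;          // raw samples per 30-second segment
  melFrames: number;        // spectrogram frames per segment
  melBins: number;          // log-mel channels
  encoderPositions: number; // sequence length after the stride-2 convolution
}

function chunkShape(melBins: number): ChunkShape {
  const samples = SAMPLE_RATE_HZ * CHUNK_SECONDS; // 480000
  const melFrames = samples / HOP_LENGTH;         // 3000
  const encoderPositions = melFrames / 2;         // 1500
  return { samples, melFrames, melBins, encoderPositions };
}

// Approximate number of 30-second windows, assuming no overlap.
function chunkCount(durationSeconds: number): number {
  return Math.max(1, Math.ceil(durationSeconds / CHUNK_SECONDS));
}

// Special tokens that open the decoder sequence and select language and task.
function decoderPrompt(
  language: string,
  task: "transcribe" | "translate",
  timestamps: boolean,
): string[] {
  const prompt = ["<|startoftranscript|>", `<|${language}|>`, `<|${task}|>`];
  if (!timestamps) {
    prompt.push("<|notimestamps|>"); // suppresses timestamp token prediction
  }
  return prompt;
}
```

    For example, `chunkShape(128)` describes a 128 x 3000 spectrogram per segment, and `decoderPrompt("en", "transcribe", true)` gives the three-token prefix for timestamped English transcription.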

    Mixpeek SDK Integration

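    import { Mixpeek } from "mixpeek";
    
    // Replace "API_KEY" with your own key; avoid committing real keys to source control.
    const mx = new Mixpeek({ apiKey: "API_KEY" });
    
    // Ingest a video and run Whisper transcription on its audio track.
    await mx.collections.ingest({
      collection_id: "my-collection",
      source: { url: "https://example.com/video.mp4" },
      feature_extractors: [{
        name: "audio_transcription",
        version: "v1",
        params: {
          // Model ID as listed under Identifiers
          model_id: "openai/whisper-large-v3"
        }
      }]
    });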
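
    When ingesting many files, it can help to build the request objects separately from the SDK call. The sketch below mirrors the payload shape from the snippet above. The `FeatureExtractor` and `IngestRequest` type names and the `buildTranscriptionRequests` helper are hypothetical; they are not known to be exported by the Mixpeek SDK.

```typescript
// Shapes copied from the snippet above; the type names are illustrative only.
interface FeatureExtractor {
  name: string;
  version: string;
  params: { model_id: string };
}

interface IngestRequest {
  collection_id: string;
  source: { url: string };
  feature_extractors: FeatureExtractor[];
}

// Build one ingest payload per media URL, all using the same Whisper extractor.
function buildTranscriptionRequests(collectionId: string, urls: string[]): IngestRequest[] {
  return urls.map((url) => ({
    collection_id: collectionId,
    source: { url },
    feature_extractors: [
      {
        name: "audio_transcription",
        version: "v1",
        params: { model_id: "openai/whisper-large-v3" },
      },
    ],
  }));
}

const requests = buildTranscriptionRequests("my-collection", [
  "https://example.com/video.mp4",
  "https://example.com/podcast.mp3",
]);
```

    Each element of `requests` has the same shape as the object passed to `mx.collections.ingest` above, so the two could presumably be combined in a loop.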

    Capabilities

    • Transcription in 99+ languages, plus translation of speech into English
    • Word-level timestamps
    • Robust to background noise, accents, and domain-specific vocabulary
    • Automatic language detection

    Use Cases on Mixpeek

    • Transcribe video libraries for full-text search
    • Generate subtitles and closed captions at scale
    • Call center analytics: search call recordings by content
    • Podcast and webinar content indexing
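
    For the subtitle use case, timestamped segments map naturally onto the SubRip (SRT) caption format. The following is a minimal sketch, assuming segments arrive as `{ start, end, text }` with times in seconds; that shape and the helper names are hypothetical rather than Mixpeek's documented output.

```typescript
// Hypothetical caption segment; times are in seconds.
interface CaptionSegment {
  start: number;
  end: number;
  text: string;
}

// Left-pad a non-negative integer with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) {
    s = "0" + s;
  }
  return s;
}

// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function srtTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Render numbered SRT cues separated by blank lines.
function toSrt(segments: CaptionSegment[]): string {
  return segments
    .map((seg, i) =>
      `${i + 1}\n${srtTimestamp(seg.start)} --> ${srtTimestamp(seg.end)}\n${seg.text.trim()}\n`)
    .join("\n");
}
```

    Real caption pipelines usually also split long segments and cap line length; this sketch covers only the timestamp formatting and cue numbering.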

    Specification

    Framework: HF
    Organization: openai
    Feature: Transcription
    Output: text + timestamps
    Modalities: video, audio
    Retriever: Transcript Search
    Parameters: 1.55B
    License: apache-2.0
    Downloads/mo: 6.1M
    Likes: 5,445

    Research Paper

    Robust Speech Recognition via Large-Scale Weak Supervision

    arxiv.org

    Build a pipeline with whisper-large-v3

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
