
    wav2vec2-large-960h

    by facebook

    Self-supervised speech representations for automatic speech recognition

    19K downloads/month · 34 likes · 317M parameters
    Identifiers

    Model ID: facebook/wav2vec2-large-960h
    Feature URI: mixpeek://transcription@v1/facebook_wav2vec2_large_v1

    Overview

    Wav2Vec 2.0 learns speech representations from raw audio through self-supervised pre-training, then fine-tunes with a small amount of labeled data. The 960h variant is fine-tuned on the full LibriSpeech dataset.

    On Mixpeek, Wav2Vec2 provides an alternative to Whisper for English transcription, with strong performance on clear speech and a smaller memory footprint.

    Architecture

    CNN feature encoder (7 convolutional layers) followed by a 24-layer Transformer. Self-supervised pre-training uses contrastive loss over quantized speech representations. Fine-tuned with CTC loss.
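The CTC head mentioned above emits one token distribution per audio frame; turning those frame-level predictions into text uses greedy CTC decoding (collapse consecutive repeats, then drop the blank token). A minimal sketch of that collapse step, with a hypothetical `ctcGreedyDecode` helper and an assumed blank ID of 0:

```typescript
// Greedy CTC decoding: take the argmax token ID per frame, collapse
// consecutive repeats, then remove blanks. A blank between two
// identical tokens keeps them as separate output tokens.
function ctcGreedyDecode(frameIds: number[], blankId = 0): number[] {
  const out: number[] = [];
  let prev = -1;
  for (const id of frameIds) {
    if (id !== prev && id !== blankId) out.push(id);
    prev = id;
  }
  return out;
}

// Frames [0, 5, 5, 0, 5, 3, 3, 0] decode to [5, 5, 3]: the second 5
// survives because a blank separates the two runs of 5s.
```

Real decoders map the resulting IDs through the model's character vocabulary and may add a language model, but the collapse rule is the core of CTC inference.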

    Mixpeek SDK Integration

    import { Mixpeek } from "mixpeek";
    
    const mx = new Mixpeek({ apiKey: "API_KEY" });
    
    // Ingest an audio file and transcribe it with wav2vec2-large-960h
    await mx.collections.ingest({
      collection_id: "my-collection",
      source: { url: "https://example.com/podcast.mp3" },
      feature_extractors: [{
        name: "audio_transcription",
        version: "v1",
        params: {
          model_id: "facebook/wav2vec2-large-960h"
        }
      }]
    });

    Capabilities

    • Self-supervised pre-training on unlabeled audio
    • Strong English ASR performance
    • Raw waveform input (no spectrogram needed)
    • Efficient fine-tuning with limited labeled data
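Because the model takes raw waveform input rather than spectrograms, preprocessing reduces to decoding audio into normalized float samples (wav2vec2 expects 16 kHz mono). A minimal sketch of that step, assuming signed 16-bit PCM input and a hypothetical `pcm16ToFloat` helper:

```typescript
// Convert signed 16-bit PCM samples to floats in [-1, 1), the raw
// waveform format wav2vec2-style models consume directly.
function pcm16ToFloat(samples: Int16Array): Float32Array {
  const out = new Float32Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    out[i] = samples[i] / 32768; // 2^15 = full scale of int16
  }
  return out;
}
```

Resampling to 16 kHz and downmixing to mono, if needed, would happen before this step; on Mixpeek that preprocessing is handled by the extractor.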

    Use Cases on Mixpeek

    • English-focused transcription workflows
    • Low-resource language adaptation with limited training data
    • Audio content indexing for search and discovery

    Specification

    Framework: HF
    Organization: facebook
    Feature: Transcription
    Output: text + timestamps
    Modalities: video, audio
    Retriever: Transcript Search
    Parameters: 317M
    License: apache-2.0
    Downloads/mo: 19K
    Likes: 34

    Research Paper

    wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations

    arxiv.org

    Build a pipeline with wav2vec2-large-960h

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
