    HF · Speaker Diarization · MIT

    speaker-diarization-3.1

    by pyannote

    Who spoke when: neural speaker diarization pipeline

    13.8M downloads/month
    1,601 likes
    18M parameters

    Identifiers

    Model ID: pyannote/speaker-diarization-3.1
    Feature URI: mixpeek://transcription@v1/pyannote_diarization_v3

    Overview

    Pyannote's speaker diarization pipeline segments audio into speaker-homogeneous regions, determining "who spoke when" without requiring prior knowledge of the number or identity of speakers.
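
    To make "who spoke when" concrete, here is a minimal sketch. It assumes diarization output can be represented as a list of `{ speaker, start, end }` segments in seconds; the `SpeakerSegment` type and the `whoSpokeAt` helper are hypothetical illustrations, not part of the Mixpeek SDK. Speaker labels are anonymous: the pipeline tells speakers apart but does not identify them.

```typescript
// Hypothetical shape for one diarization region: an anonymous speaker
// label plus start/end times in seconds.
interface SpeakerSegment {
  speaker: string;
  start: number; // seconds
  end: number;   // seconds
}

// Return the labels of every speaker active at time t (seconds).
// More than one label means overlapping speech at that instant.
function whoSpokeAt(segments: SpeakerSegment[], t: number): string[] {
  const active = segments
    .filter((s) => s.start <= t && t < s.end)
    .map((s) => s.speaker);
  // Drop duplicate labels and sort for a stable result.
  return active.filter((label, i) => active.indexOf(label) === i).sort();
}

const segments: SpeakerSegment[] = [
  { speaker: "SPEAKER_00", start: 0.0, end: 4.2 },
  { speaker: "SPEAKER_01", start: 3.8, end: 9.5 },
  { speaker: "SPEAKER_00", start: 10.1, end: 12.0 },
];

console.log(whoSpokeAt(segments, 2.0)); // → ["SPEAKER_00"]
console.log(whoSpokeAt(segments, 4.0)); // → ["SPEAKER_00", "SPEAKER_01"]
console.log(whoSpokeAt(segments, 9.8)); // → [] (nobody is speaking)
```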

    On Mixpeek, speaker diarization enriches transcription data with speaker labels, enabling queries like "find all segments where Speaker A talks about budgets."
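
    One way to attach speaker labels to a transcript is to give each transcript segment the speaker whose diarization turns overlap it the most. The sketch below assumes both inputs carry start/end timestamps in seconds; the types and the `assignSpeakers` helper are hypothetical, and the max-overlap rule is a common heuristic, not a description of how Mixpeek performs the merge internally.

```typescript
// Hypothetical shapes: diarization turns and ASR transcript segments.
interface SpeakerTurn { speaker: string; start: number; end: number; }
interface TranscriptSegment { text: string; start: number; end: number; }
interface LabeledSegment extends TranscriptSegment { speaker: string | null; }

// Length of the intersection of [aStart, aEnd) and [bStart, bEnd), or 0.
function overlap(aStart: number, aEnd: number, bStart: number, bEnd: number): number {
  return Math.max(0, Math.min(aEnd, bEnd) - Math.max(aStart, bStart));
}

// Label each transcript segment with the speaker whose turns overlap it most.
// Segments that overlap no turn get speaker = null.
function assignSpeakers(transcript: TranscriptSegment[], turns: SpeakerTurn[]): LabeledSegment[] {
  return transcript.map((seg) => {
    // Sum overlap per speaker, since one speaker may have several turns.
    const totals: Record<string, number> = {};
    for (const turn of turns) {
      const o = overlap(seg.start, seg.end, turn.start, turn.end);
      if (o > 0) totals[turn.speaker] = (totals[turn.speaker] ?? 0) + o;
    }
    let best: string | null = null;
    for (const speaker of Object.keys(totals)) {
      if (best === null || totals[speaker] > totals[best]) best = speaker;
    }
    return { ...seg, speaker: best };
  });
}

const turns: SpeakerTurn[] = [
  { speaker: "SPEAKER_00", start: 0, end: 6 },
  { speaker: "SPEAKER_01", start: 6, end: 14 },
];
const transcript: TranscriptSegment[] = [
  { text: "Let's review the budget for Q3.", start: 0.5, end: 5.5 },
  { text: "The budget is tight this year.", start: 5.0, end: 12.0 },
  { text: "Agreed.", start: 12.5, end: 13.5 },
];

const labeled = assignSpeakers(transcript, turns);
// "Find all segments where SPEAKER_01 talks about budgets."
const hits = labeled.filter((s) => s.speaker === "SPEAKER_01" && /budget/i.test(s.text));
console.log(hits.map((s) => s.text)); // → ["The budget is tight this year."]
```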

    Architecture

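    Three-stage pipeline: (1) local speaker segmentation with a PyanNet-style model (SincNet + LSTM + feed-forward layers) trained with a powerset multi-class cross-entropy loss, which lets it detect overlapping speech; (2) speaker embedding extraction with a WeSpeaker ResNet34 model; (3) agglomerative clustering of the embeddings to assign a consistent speaker label across the whole recording. Version 3.1 runs entirely in PyTorch.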
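
    The clustering stage can be illustrated with a simplified sketch of agglomerative clustering over speaker embeddings: start with one cluster per embedding, repeatedly merge the closest pair (centroid linkage, cosine distance), and stop once the closest pair is farther apart than a threshold. The number of clusters left is the estimated speaker count. The toy 2-D vectors and the 0.5 threshold are assumptions for illustration; the real pipeline works on high-dimensional embeddings with a tuned threshold, and its implementation details differ.

```typescript
type Vec = number[];

// Cosine distance between two vectors: 1 - cos(angle between them).
function cosineDistance(a: Vec, b: Vec): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Element-wise mean of a non-empty list of vectors.
function centroid(vs: Vec[]): Vec {
  const c = vs[0].map(() => 0);
  for (const v of vs) for (let i = 0; i < v.length; i++) c[i] += v[i] / vs.length;
  return c;
}

// Centroid-linkage agglomerative clustering. Returns a cluster index per input.
function agglomerate(embeddings: Vec[], threshold: number): number[] {
  const clusters: number[][] = embeddings.map((_, i) => [i]);
  while (clusters.length > 1) {
    const cents = clusters.map((c) => centroid(c.map((i) => embeddings[i])));
    let bestI = -1, bestJ = -1, bestD = Infinity;
    for (let i = 0; i < clusters.length; i++) {
      for (let j = i + 1; j < clusters.length; j++) {
        const d = cosineDistance(cents[i], cents[j]);
        if (d < bestD) { bestD = d; bestI = i; bestJ = j; }
      }
    }
    if (bestD > threshold) break; // remaining clusters are treated as distinct speakers
    clusters[bestI] = clusters[bestI].concat(clusters[bestJ]);
    clusters.splice(bestJ, 1);
  }
  const labels = embeddings.map(() => -1);
  clusters.forEach((members, k) => members.forEach((i) => (labels[i] = k)));
  return labels;
}

// Toy 2-D "embeddings": two tight groups pointing in different directions.
const embeddings: Vec[] = [
  [1.0, 0.1], [0.9, 0.15], [1.0, 0.05], // voice A
  [0.1, 1.0], [0.05, 0.9],              // voice B
];
console.log(agglomerate(embeddings, 0.5)); // → [0, 0, 0, 1, 1]
```

    Raising the threshold merges more aggressively (fewer estimated speakers); lowering it splits more.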

    Mixpeek SDK Integration

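    import { Mixpeek } from "mixpeek";
    
    // Replace API_KEY with your Mixpeek API key (preferably loaded from an
    // environment variable rather than hard-coded).
    const mx = new Mixpeek({ apiKey: "API_KEY" });
    
    // Ingest a recording and run speaker diarization on its audio track.
    // Top-level await requires an ES module context.
    await mx.collections.ingest({
      collection_id: "my-collection",
      source: { url: "https://example.com/meeting.mp4" },
      feature_extractors: [{
        name: "speaker_diarization",
        version: "v1",
        params: {
          // Hugging Face model ID of the diarization pipeline
          model_id: "pyannote/speaker-diarization-3.1"
        }
      }]
    });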

    Capabilities

    • Automatic speaker count estimation
    • Overlapping speech detection
    • Speaker embedding extraction
    • Fine-tunable on custom speaker data
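
    Overlap detection itself happens inside the segmentation model, but once speaker segments are available, the regions where two or more speakers talk at once can be recovered with a sweep over segment boundaries. This sketch assumes the same hypothetical `{ speaker, start, end }` segment shape used above and that a single speaker's own segments never overlap each other; `overlapRegions` is an illustrative helper, not a pyannote or Mixpeek function.

```typescript
interface SpeakerSegment { speaker: string; start: number; end: number; }
interface Region { start: number; end: number; }

// Return the time regions where two or more segments are active at once.
function overlapRegions(segments: SpeakerSegment[]): Region[] {
  // +1 when a segment starts, -1 when it ends. At equal timestamps, ends sort
  // before starts so back-to-back turns are not counted as overlap.
  const events: [number, number][] = [];
  for (const s of segments) events.push([s.start, +1], [s.end, -1]);
  events.sort((a, b) => a[0] - b[0] || a[1] - b[1]);

  const regions: Region[] = [];
  let active = 0;
  let overlapStart = 0;
  for (const [time, delta] of events) {
    const before = active;
    active += delta;
    if (before < 2 && active >= 2) overlapStart = time;                          // overlap begins
    if (before >= 2 && active < 2) regions.push({ start: overlapStart, end: time }); // overlap ends
  }
  return regions;
}

const segments: SpeakerSegment[] = [
  { speaker: "SPEAKER_00", start: 0.0, end: 4.2 },
  { speaker: "SPEAKER_01", start: 3.8, end: 9.5 },
  { speaker: "SPEAKER_00", start: 10.1, end: 12.0 },
  { speaker: "SPEAKER_02", start: 11.0, end: 11.5 },
];
console.log(overlapRegions(segments)); // → [{ start: 3.8, end: 4.2 }, { start: 11, end: 11.5 }]
```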

    Use Cases on Mixpeek

    • Meeting transcription with speaker attribution
    • Interview and podcast analysis: attribute quotes to speakers
    • Call center analytics: separate agent and customer speech
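
    For call center analytics, a basic metric is each speaker's share of talk time. This hypothetical sketch sums segment durations per label. Diarization labels such as `SPEAKER_00` are anonymous, so deciding which label is the agent and which is the customer requires information the pipeline does not provide; overlapping speech counts toward both speakers.

```typescript
interface SpeakerSegment { speaker: string; start: number; end: number; }

// Total speaking time per speaker label, in seconds.
function talkTime(segments: SpeakerSegment[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const s of segments) {
    totals[s.speaker] = (totals[s.speaker] ?? 0) + (s.end - s.start);
  }
  return totals;
}

// Each speaker's fraction of total speaker time (fractions sum to 1).
function talkShare(segments: SpeakerSegment[]): Record<string, number> {
  const totals = talkTime(segments);
  const labels = Object.keys(totals);
  let sum = 0;
  for (const label of labels) sum += totals[label];
  const share: Record<string, number> = {};
  for (const label of labels) share[label] = sum > 0 ? totals[label] / sum : 0;
  return share;
}

// Hypothetical two-party call; which label is the agent is unknown here.
const call: SpeakerSegment[] = [
  { speaker: "SPEAKER_00", start: 0, end: 30 },
  { speaker: "SPEAKER_01", start: 30, end: 50 },
  { speaker: "SPEAKER_00", start: 50, end: 80 },
  { speaker: "SPEAKER_01", start: 80, end: 100 },
];

console.log(talkTime(call));  // → { SPEAKER_00: 60, SPEAKER_01: 40 }
console.log(talkShare(call)); // → { SPEAKER_00: 0.6, SPEAKER_01: 0.4 }
```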

    Specification

    Framework: HF
    Organization: pyannote
    Feature: Speaker Diarization
    Output: speaker segments
    Modalities: video, audio
    Retriever: Speaker Filter
    Parameters: 18M
    License: MIT
    Downloads/mo: 13.8M
    Likes: 1,601
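
    Speaker segments are commonly exchanged as RTTM, the plain-text format used by standard diarization scoring tools; pyannote can also export its results in this format. The sketch below serializes hypothetical `{ speaker, start, end }` segments into RTTM lines; the `toRttm` helper is illustrative, and Mixpeek's own output schema may differ.

```typescript
interface SpeakerSegment { speaker: string; start: number; end: number; }

// One RTTM line per segment, space-separated fields:
// SPEAKER <file-id> <channel> <onset> <duration> <NA> <NA> <speaker> <NA> <NA>
// Onset and duration are in seconds.
function toRttm(fileId: string, segments: SpeakerSegment[]): string {
  return segments
    .map((s) =>
      [
        "SPEAKER", fileId, "1",
        s.start.toFixed(3), (s.end - s.start).toFixed(3),
        "<NA>", "<NA>", s.speaker, "<NA>", "<NA>",
      ].join(" "))
    .join("\n");
}

const rttm = toRttm("meeting", [
  { speaker: "SPEAKER_00", start: 0.5, end: 4.2 },
  { speaker: "SPEAKER_01", start: 3.8, end: 9.5 },
]);
console.log(rttm);
// SPEAKER meeting 1 0.500 3.700 <NA> <NA> SPEAKER_00 <NA> <NA>
// SPEAKER meeting 1 3.800 5.700 <NA> <NA> SPEAKER_01 <NA> <NA>
```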

    Research Paper

    Powerset multi-class cross entropy loss for neural speaker diarization (arXiv)
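
    The paper's central idea is to treat each frame as one multi-class prediction over subsets of speakers (a "powerset") instead of independent per-speaker yes/no outputs, so overlapping speech becomes just another class trained with ordinary cross entropy. Assuming at most 3 speakers per chunk and at most 2 speaking at once (the setting described for pyannote/segmentation-3.0), there are 7 classes. This sketch enumerates those classes and decodes one frame's class scores back into per-speaker activity; `powersetClasses` and `decodeFrame` are hypothetical helpers, not pyannote functions.

```typescript
// Enumerate every subset of {0, ..., numSpeakers-1} with at most maxOverlap
// members, ordered by size: [], [0], [1], ..., [0, 1], ...
function powersetClasses(numSpeakers: number, maxOverlap: number): number[][] {
  let classes: number[][] = [[]];
  let frontier: number[][] = [[]];
  for (let size = 1; size <= maxOverlap; size++) {
    const next: number[][] = [];
    for (const subset of frontier) {
      // Extend each subset only with larger indices to avoid duplicates.
      const startAt = subset.length ? subset[subset.length - 1] + 1 : 0;
      for (let s = startAt; s < numSpeakers; s++) next.push(subset.concat(s));
    }
    classes = classes.concat(next);
    frontier = next;
  }
  return classes;
}

// Pick the most likely powerset class for one frame and expand it into a
// multi-label vector (1 = speaker active). Two ones mean overlapping speech.
function decodeFrame(classScores: number[], classes: number[][], numSpeakers: number): number[] {
  let best = 0;
  for (let c = 1; c < classScores.length; c++) {
    if (classScores[c] > classScores[best]) best = c;
  }
  const labels: number[] = [];
  for (let s = 0; s < numSpeakers; s++) labels.push(classes[best].indexOf(s) >= 0 ? 1 : 0);
  return labels;
}

const classes = powersetClasses(3, 2);
console.log(classes.length); // → 7
// Class order: [], [0], [1], [2], [0,1], [0,2], [1,2]
console.log(decodeFrame([0.01, 0.05, 0.02, 0.02, 0.85, 0.03, 0.02], classes, 3)); // → [1, 1, 0]
```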

    Build a pipeline with speaker-diarization-3.1

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
