speaker-diarization-3.1

by pyannote

Who spoke when — end-to-end neural speaker diarization

13.8M downloads/month

1,601 likes

18M params

Identifiers

Model ID

pyannote/speaker-diarization-3.1

Feature URI

mixpeek://transcription@v1/pyannote_diarization_v3

Overview

Pyannote's speaker diarization pipeline segments audio into speaker-homogeneous regions, determining "who spoke when" without requiring prior knowledge of the number or identity of speakers.
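To make "who spoke when" concrete, the sketch below represents a diarization result as a list of speaker turns and sums the talk time per speaker. The SpeakerTurn shape, the talkTimeBySpeaker helper, and the sample values are assumptions made for illustration; they are not taken from the pyannote or Mixpeek APIs.

```typescript
// Hypothetical shape for one diarization turn (times in seconds).
interface SpeakerTurn {
  start: number;
  end: number;
  speaker: string;
}

// Sum the duration of every turn per speaker label.
// Overlapping speech is counted once for each speaker involved.
function talkTimeBySpeaker(turns: SpeakerTurn[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const turn of turns) {
    totals[turn.speaker] = (totals[turn.speaker] ?? 0) + (turn.end - turn.start);
  }
  return totals;
}

// Illustrative data: the first two turns overlap between 3.5 s and 4.0 s.
const exampleTurns: SpeakerTurn[] = [
  { start: 0.0, end: 4.0, speaker: "SPEAKER_00" },
  { start: 3.5, end: 9.0, speaker: "SPEAKER_01" },
  { start: 9.0, end: 12.0, speaker: "SPEAKER_00" },
];

const talkTime = talkTimeBySpeaker(exampleTurns);
console.log(talkTime);
```

The labels are anonymous: diarization separates speakers from one another but does not identify who they are.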

On Mixpeek, speaker diarization enriches transcription data with speaker labels, enabling queries like "find all segments where Speaker A talks about budgets."
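One plausible way to attach speaker labels to a transcript is to give each transcript segment the speaker whose turns overlap it the most, then filter on speaker and keyword. The sketch below shows that idea only; the types, function names, and sample data are hypothetical, and it does not describe how Mixpeek implements enrichment or querying.

```typescript
// Hypothetical shapes, not taken from the Mixpeek or pyannote APIs.
interface DiarizedTurn { start: number; end: number; speaker: string; }
interface TranscriptSegment { start: number; end: number; text: string; }
interface LabeledSegment extends TranscriptSegment { speaker: string | null; }

// Length in seconds of the intersection of two intervals (0 if disjoint).
function overlapSeconds(aStart: number, aEnd: number, bStart: number, bEnd: number): number {
  return Math.max(0, Math.min(aEnd, bEnd) - Math.max(aStart, bStart));
}

// Label each transcript segment with the speaker whose turns overlap it most.
// Segments that overlap no turn get a null speaker.
function labelSegments(segments: TranscriptSegment[], turns: DiarizedTurn[]): LabeledSegment[] {
  return segments.map((seg) => {
    const totals: Record<string, number> = {};
    for (const turn of turns) {
      const shared = overlapSeconds(seg.start, seg.end, turn.start, turn.end);
      if (shared > 0) totals[turn.speaker] = (totals[turn.speaker] ?? 0) + shared;
    }
    let best: string | null = null;
    let bestShared = 0;
    for (const speaker of Object.keys(totals)) {
      if (totals[speaker] > bestShared) {
        best = speaker;
        bestShared = totals[speaker];
      }
    }
    return { ...seg, speaker: best };
  });
}

// Case-insensitive keyword match restricted to a single speaker.
function findBySpeakerAndKeyword(labeled: LabeledSegment[], speaker: string, keyword: string): LabeledSegment[] {
  const needle = keyword.toLowerCase();
  return labeled.filter(
    (seg) => seg.speaker === speaker && seg.text.toLowerCase().indexOf(needle) !== -1
  );
}

// Illustrative data only.
const turns: DiarizedTurn[] = [
  { start: 0.0, end: 4.0, speaker: "SPEAKER_00" },
  { start: 4.0, end: 9.0, speaker: "SPEAKER_01" },
  { start: 9.0, end: 12.0, speaker: "SPEAKER_00" },
];
const segments: TranscriptSegment[] = [
  { start: 0.5, end: 3.5, text: "Let's review the budget for Q3." },
  { start: 4.2, end: 8.0, text: "The budget looks tight." },
  { start: 9.5, end: 11.0, text: "Agreed, moving on." },
];

const labeled = labelSegments(segments, turns);
const hits = findBySpeakerAndKeyword(labeled, "SPEAKER_00", "budget");
console.log(hits);
```

Maximum overlap is a simple assignment rule; a segment that straddles a speaker change is attributed entirely to the dominant speaker.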

Architecture

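The pipeline runs in three stages: (1) a segmentation model based on PyanNet (SincNet + LSTM + feedforward layers), (2) speaker embedding extraction using a WeSpeaker ResNet34 model (pyannote/wespeaker-voxceleb-resnet34-LM; ECAPA-TDNN was the embedding model in the earlier 2.x pipelines), and (3) agglomerative clustering to assign embeddings to speakers. The pipeline also detects overlapping speech.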
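The clustering stage can be illustrated with a toy example. The sketch below runs average-linkage agglomerative clustering over cosine distances and stops merging once the closest pair of clusters is farther apart than a threshold. It is a simplified illustration under assumed values: the 2-D vectors, the 0.5 threshold, and the choice of average linkage are assumptions, not the pipeline's actual embeddings or configuration.

```typescript
// Cosine distance between two vectors of equal length.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Average-linkage distance between two clusters given as lists of embedding indices.
function averageLinkage(c1: number[], c2: number[], emb: number[][]): number {
  let sum = 0;
  for (const i of c1) {
    for (const j of c2) sum += cosineDistance(emb[i], emb[j]);
  }
  return sum / (c1.length * c2.length);
}

// Start with one cluster per embedding and repeatedly merge the closest pair
// until the closest pair is farther apart than `threshold`.
// Returns one cluster label per embedding.
function agglomerativeCluster(emb: number[][], threshold: number): number[] {
  const clusters: number[][] = emb.map((_, i) => [i]);
  while (clusters.length > 1) {
    let bestA = -1;
    let bestB = -1;
    let bestDist = Infinity;
    for (let a = 0; a < clusters.length; a++) {
      for (let b = a + 1; b < clusters.length; b++) {
        const d = averageLinkage(clusters[a], clusters[b], emb);
        if (d < bestDist) {
          bestDist = d;
          bestA = a;
          bestB = b;
        }
      }
    }
    if (bestDist > threshold) break;
    clusters[bestA] = clusters[bestA].concat(clusters[bestB]);
    clusters.splice(bestB, 1);
  }
  const labels: number[] = emb.map(() => -1);
  clusters.forEach((members, label) => {
    for (const i of members) labels[i] = label;
  });
  return labels;
}

// Toy 2-D "embeddings": two vectors near each axis, standing in for two speakers.
const embeddings: number[][] = [
  [1.0, 0.0],
  [0.9, 0.1],
  [0.0, 1.0],
  [0.1, 0.9],
];
const labels = agglomerativeCluster(embeddings, 0.5);
console.log(labels);
```

Because merging stops at a distance threshold rather than at a fixed cluster count, the number of speakers does not need to be known in advance.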

Mixpeek SDK Integration

import { Mixpeek } from "mixpeek";

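// Replace "API_KEY" with your Mixpeek API key.
const mx = new Mixpeek({ apiKey: "API_KEY" });

// Ingest a media file and run the speaker diarization extractor on it.
await mx.collections.ingest({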
  collection_id: "my-collection",
  source: { url: "https://example.com/meeting.mp4" },
  feature_extractors: [{
    name: "speaker_diarization",
    version: "v1",
    params: {
      model_id: "pyannote/speaker-diarization-3.1"
    }
  }]
});