Community-1 speaker diarization

This pipeline ingests mono audio sampled at 16kHz and outputs speaker diarization.

  • stereo or multi-channel audio files are automatically downmixed to mono by averaging the channels.
  • audio files sampled at a different rate are resampled to 16kHz automatically upon loading.
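
For reference, this preprocessing is roughly equivalent to the following torchaudio sketch (the pipeline already does it internally; it is shown here only for illustration):

import torchaudio

# load audio as a (channel, time) tensor
waveform, sample_rate = torchaudio.load("audio.wav")

# downmix to mono by averaging channels
if waveform.shape[0] > 1:
    waveform = waveform.mean(dim=0, keepdim=True)

# resample to 16kHz if needed
if sample_rate != 16000:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16000)
    sample_rate = 16000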

The main improvements brought by Community-1 are:

  • improved speaker assignment and counting
  • simpler reconciliation with transcription timestamps, thanks to the new exclusive speaker diarization
  • easy offline use (i.e. without internet connection)
  • optional hosting on the pyannoteAI cloud

Setup

  1. Install the library: pip install pyannote.audio
  2. Accept the pyannote/speaker-diarization-community-1 user conditions on Hugging Face
  3. Create an access token at hf.co/settings/tokens

Quick start

# download the pipeline from Hugging Face
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-community-1", 
    token="{huggingface-token}")

# run the pipeline locally on your computer
output = pipeline("audio.wav")

# print the predicted speaker diarization 
for turn, speaker in output.speaker_diarization:
    print(f"{speaker} speaks between t={turn.start:.3f}s and t={turn.end:.3f}s")

Benchmark

Out of the box, Community-1 outperforms the legacy speaker-diarization-3.1 pipeline on almost every benchmark below.

We report diarization error rates (in %) on a large collection of academic benchmarks (fully automatic processing, no forgiveness collar, and no skipping of overlapping speech).

Benchmark (last updated in 2025-09)   legacy (3.1)   community-1   precision-2
AISHELL-4                                     12.2          11.7          11.4
AliMeeting (channel 1)                        24.5          20.3          15.2
AMI (IHM)                                     18.8          17.0          12.9
AMI (SDM)                                     22.7          19.9          15.6
AVA-AVD                                       49.7          44.6          37.1
CALLHOME (part 2)                             28.5          26.7          16.6
DIHARD 3 (full)                               21.4          20.2          14.7
Ego4D (dev.)                                  51.2          46.8          39.0
MSDWild                                       25.4          22.8          17.3
RAMC                                          22.2          20.8          10.5
REPERE (phase 2)                               7.9           8.9           7.4
VoxConverse (v0.3)                            11.2          11.2           8.5

The Precision-2 model is even more accurate (see the last column of the table above) and can be tested as follows:

  1. Create an API key on the pyannoteAI dashboard (free credits included)
  2. Change one line of code:
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained(
-    "pyannote/speaker-diarization-community-1", token="{huggingface-token}")
+    "pyannote/speaker-diarization-precision-2", token="{pyannoteAI-api-key}")
diarization = pipeline("audio.wav")  # runs on pyannoteAI servers

Processing on GPU

pyannote.audio pipelines run on CPU by default. You can send them to GPU with the following lines:

import torch
pipeline.to(torch.device("cuda"))
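
If the same code may run on machines without a GPU, a minimal sketch that falls back to CPU when CUDA is unavailable:

import torch

# use the GPU when available, otherwise stay on CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
pipeline.to(device)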

Processing from memory

Pre-loading audio files in memory may result in faster processing:

import torchaudio

waveform, sample_rate = torchaudio.load("audio.wav")
output = pipeline({"waveform": waveform, "sample_rate": sample_rate})

Monitoring progress

Hooks are available to monitor the progress of the pipeline:

from pyannote.audio.pipelines.utils.hook import ProgressHook
with ProgressHook() as hook:
    output = pipeline("audio.wav", hook=hook)

Controlling the number of speakers

In case the number of speakers is known in advance, one can use the num_speakers option:

output = pipeline("audio.wav", num_speakers=2)

One can also provide lower and/or upper bounds on the number of speakers using min_speakers and max_speakers options:

output = pipeline("audio.wav", min_speakers=2, max_speakers=5)

Exclusive speaker diarization

On top of the regular speaker diarization, the Community-1 pretrained pipeline returns a new exclusive speaker diarization, available as output.exclusive_speaker_diarization.

This feature is backported from our latest commercial model. It simplifies the reconciliation between fine-grained speaker diarization timestamps and (sometimes less precise) transcription timestamps.
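
Since speech turns do not overlap in the exclusive output (hence the name), each transcribed word can be assigned to exactly one speaker. A minimal sketch, assuming the exclusive output iterates like the regular one in the quick start above:

# print the exclusive speaker diarization
for turn, speaker in output.exclusive_speaker_diarization:
    print(f"{speaker} speaks between t={turn.start:.3f}s and t={turn.end:.3f}s")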

Offline use

  1. In the terminal, download the pipeline to disk:
# make sure git-lfs is installed (https://git-lfs.com)
git lfs install

# create a directory on disk
mkdir /path/to/directory

# when prompted for a password, use an access token with write permissions.
# generate one from your settings: https://huggingface.co/settings/tokens
git clone https://hf.co/pyannote/speaker-diarization-community-1 /path/to/directory/pyannote-speaker-diarization-community-1
  2. In Python, use the pipeline without an internet connection:
# load pipeline from disk (works without internet connection)
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained('/path/to/directory/pyannote-speaker-diarization-community-1')

# run the pipeline locally on your computer
output = pipeline("audio.wav")

Citations

  1. Speaker segmentation model
@inproceedings{Plaquet23,
  author={Alexis Plaquet and Hervé Bredin},
  title={{Powerset multi-class cross entropy loss for neural speaker diarization}},
  year={2023},
  booktitle={Proc. INTERSPEECH 2023},
}
  2. Speaker embedding model
@inproceedings{Wang2023,
  title={Wespeaker: A research and production oriented speaker embedding learning toolkit},
  author={Wang, Hongji and Liang, Chengdong and Wang, Shuai and Chen, Zhengyang and Zhang, Binbin and Xiang, Xu and Deng, Yanlei and Qian, Yanmin},
  booktitle={ICASSP 2023, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1--5},
  year={2023},
  organization={IEEE}
}
  3. Speaker clustering
@article{Landini2022,
  author={Landini, Federico and Profant, J{\'a}n and Diez, Mireia and Burget, Luk{\'a}{\v{s}}},
  title={{Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: theory, implementation and analysis on standard tasks}},
  year={2022},
  journal={Computer Speech \& Language},
}

Acknowledgment

Training and tuning were made possible thanks to GENCI, on the Jean Zay supercomputer.
