# Community-1 speaker diarization
This pipeline ingests mono audio sampled at 16kHz and outputs speaker diarization:
- stereo or multi-channel audio files are automatically downmixed to mono by averaging the channels
- audio files sampled at a different rate are automatically resampled to 16kHz upon loading
The main improvements brought by Community-1 are:
- improved speaker assignment and counting
- simpler reconciliation with transcription timestamps, thanks to the new exclusive speaker diarization
- easy offline use (i.e. without internet connection)
- (optionally) hosted on pyannoteAI cloud
## Setup

- `pip install pyannote.audio`
- Accept user conditions on hf.co/pyannote/speaker-diarization-community-1
- Create an access token at hf.co/settings/tokens
## Quick start

```python
# download the pipeline from Hugging Face
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-community-1",
    token="{huggingface-token}")

# run the pipeline locally on your computer
output = pipeline("audio.wav")

# print the predicted speaker diarization
for turn, speaker in output.speaker_diarization:
    print(f"{speaker} speaks between t={turn.start:.3f}s and t={turn.end:.3f}s")
```
## Benchmark

Out of the box, Community-1 clearly outperforms the legacy speaker-diarization-3.1 pipeline.
We report diarization error rates (in %) on a large collection of academic benchmarks (fully automatic processing: no forgiveness collar, no skipping of overlapped speech).
| Benchmark (last updated 2025-09) | legacy (3.1) | community-1 | precision-2 |
|---|---|---|---|
| AISHELL-4 | 12.2 | 11.7 | 11.4 | 
| AliMeeting (channel 1) | 24.5 | 20.3 | 15.2 | 
| AMI (IHM) | 18.8 | 17.0 | 12.9 | 
| AMI (SDM) | 22.7 | 19.9 | 15.6 | 
| AVA-AVD | 49.7 | 44.6 | 37.1 | 
| CALLHOME (part 2) | 28.5 | 26.7 | 16.6 | 
| DIHARD 3 (full) | 21.4 | 20.2 | 14.7 | 
| Ego4D (dev.) | 51.2 | 46.8 | 39.0 | 
| MSDWild | 25.4 | 22.8 | 17.3 | 
| RAMC | 22.2 | 20.8 | 10.5 | 
| REPERE (phase2) | 7.9 | 8.9 | 7.4 | 
| VoxConverse (v0.3) | 11.2 | 11.2 | 8.5 | 
The premium pyannoteAI Precision-2 model is even more accurate and can be tested in two steps:
- Create an API key on the pyannoteAI dashboard (free credits included)
- Change one line of code, as shown in the diff below

```diff
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained(
-    "pyannote/speaker-diarization-community-1", token="{huggingface-token}")
+    "pyannote/speaker-diarization-precision-2", token="{pyannoteAI-api-key}")
diarization = pipeline("audio.wav")  # runs on pyannoteAI servers
```
## Processing on GPU

pyannote.audio pipelines run on CPU by default.
You can send them to GPU with the following lines:

```python
import torch

pipeline.to(torch.device("cuda"))
```
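If the same script must also run on CPU-only machines, a minimal fallback sketch (plain PyTorch, not part of the pipeline API) is:

```python
import torch

# use the GPU when available, otherwise stay on CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
pipeline.to(device)
```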
## Processing from memory

Pre-loading audio files in memory may result in faster processing:

```python
import torchaudio

waveform, sample_rate = torchaudio.load("audio.wav")
output = pipeline({"waveform": waveform, "sample_rate": sample_rate})
```
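The pipeline downmixes and resamples audio automatically (see above), but if you prefer to normalize the waveform yourself before inference, a sketch using torchaudio could look like this (the file name is illustrative):

```python
import torchaudio
import torchaudio.functional as F

waveform, sample_rate = torchaudio.load("stereo_audio.wav")

# downmix multi-channel audio to mono by averaging the channels
if waveform.shape[0] > 1:
    waveform = waveform.mean(dim=0, keepdim=True)

# resample to the 16kHz expected by the pipeline
if sample_rate != 16000:
    waveform = F.resample(waveform, sample_rate, 16000)
    sample_rate = 16000

output = pipeline({"waveform": waveform, "sample_rate": sample_rate})
```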
## Monitoring progress

Hooks are available to monitor the progress of the pipeline:

```python
from pyannote.audio.pipelines.utils.hook import ProgressHook

with ProgressHook() as hook:
    output = pipeline("audio.wav", hook=hook)
```
## Controlling the number of speakers

In case the number of speakers is known in advance, one can use the `num_speakers` option:

```python
output = pipeline("audio.wav", num_speakers=2)
```

One can also provide lower and/or upper bounds on the number of speakers using the `min_speakers` and `max_speakers` options:

```python
output = pipeline("audio.wav", min_speakers=2, max_speakers=5)
```
## Exclusive speaker diarization

On top of the regular speaker diarization, the Community-1 pretrained pipeline also returns an exclusive speaker diarization (in which at most one speaker is active at any given time), available as `output.exclusive_speaker_diarization`.
This feature is backported from our latest commercial model and simplifies the reconciliation between fine-grained speaker diarization timestamps and (sometimes not so precise) transcription timestamps.
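For instance, assuming `output.exclusive_speaker_diarization` is a standard `pyannote.core.Annotation` and that word-level timestamps come from a separate transcription system (the `words` list below is purely hypothetical), reconciliation could look like this:

```python
from pyannote.core import Segment

exclusive = output.exclusive_speaker_diarization

# hypothetical word-level timestamps from your own transcription system
words = [(0.5, 0.9, "hello"), (1.0, 1.4, "world")]

for start, end, text in words:
    # crop() keeps the part of the diarization overlapping the word;
    # argmax() returns the speaker with the longest duration in that span
    speaker = exclusive.crop(Segment(start, end)).argmax()
    print(f"{speaker}: {text}")
```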
Offline use
- In the terminal, copy the pipeline on disk:
# make sure git-lfs is installed (https://git-lfs.com)
git lfs install
# create a directory on disk
mkdir /path/to/directory
# when prompted for a password, use an access token with write permissions.
# generate one from your settings: https://huggingface.co/settings/tokens
git clone https://hf.co/pyannote/speaker-diarization-community-1 /path/to/directory/pyannote-speaker-diarization-community-1
- In Python, use the pipeline without internet connection:
# load pipeline from disk (works without internet connection)
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained('/path/to/directory/pyannote-speaker-diarization-community-1')
# run the pipeline locally on your computer
output = pipeline("audio.wav")
## Citations

- Speaker segmentation model

```bibtex
@inproceedings{Plaquet23,
  author={Alexis Plaquet and Hervé Bredin},
  title={{Powerset multi-class cross entropy loss for neural speaker diarization}},
  year={2023},
  booktitle={Proc. INTERSPEECH 2023},
}
```

- Speaker embedding model

```bibtex
@inproceedings{Wang2023,
  author={Wang, Hongji and Liang, Chengdong and Wang, Shuai and Chen, Zhengyang and Zhang, Binbin and Xiang, Xu and Deng, Yanlei and Qian, Yanmin},
  title={{Wespeaker: A research and production oriented speaker embedding learning toolkit}},
  booktitle={ICASSP 2023, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1--5},
  year={2023},
  organization={IEEE}
}
```

- Speaker clustering

```bibtex
@article{Landini2022,
  author={Landini, Federico and Profant, J{\'a}n and Diez, Mireia and Burget, Luk{\'a}{\v{s}}},
  title={{Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: theory, implementation and analysis on standard tasks}},
  year={2022},
  journal={Computer Speech \& Language},
}
```
## Acknowledgment

Training and tuning were made possible thanks to GENCI, using the Jean Zay supercomputer.