speaker diarization // speaker recognition // speaker segmentation // voice activity detection // overlapped speech detection // speaker change detection
pyannote.audio is an open-source toolkit for speaker diarization.
Pretrained pipelines reach state-of-the-art performance on most academic benchmarks.
Training is made possible thanks to the Jean Zay supercomputer.
pyannoteAI provides even better and faster enterprise options, which can be tried for free on our playground.
Diarization error rate (DER, in %; lower is better) of the pretrained pipelines v2.1 and v3.1 and of pyannoteAI:

Benchmark | v2.1 | v3.1 | pyannoteAI |
---|---|---|---|
AISHELL-4 | 14.1 | 12.2 | 11.2 |
AliMeeting (channel 1) | 27.4 | 24.4 | 19.3 |
AMI (IHM) | 18.9 | 18.8 | 15.8 |
AMI (SDM) | 27.1 | 22.4 | 19.3 |
AVA-AVD | 66.3 | 50.0 | 44.8 |
CALLHOME (part 2) | 31.6 | 28.4 | 19.8 |
DIHARD 3 (full) | 26.9 | 21.7 | 16.8 |
Earnings21 | 17.0 | 9.4 | 9.1 |
Ego4D (dev.) | 61.5 | 51.2 | 44.0 |
MSDWild | 32.8 | 25.3 | 19.8 |
RAMC | 22.5 | 22.2 | 11.1 |
REPERE (phase2) | 8.2 | 7.8 | 7.6 |
VoxConverse (v0.3) | 11.2 | 11.3 | 9.8 |
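To make the metric in the table concrete, here is a minimal plain-Python sketch of frame-level DER. It is not the pyannote.audio or pyannote.metrics implementation. DER is the total duration of missed speech, false alarms, and speaker confusion divided by the total reference speech duration, after hypothesis speakers are mapped one-to-one to reference speakers so that agreement is maximised. Assumptions for this sketch: segments are `(start, end, label)` tuples in seconds, time is discretised into frames of `step` seconds, no forgiveness collar is applied, overlapping speech is scored, and the best mapping is found by brute force, which only scales to a handful of speakers. The helper names `to_frames` and `diarization_error_rate` are hypothetical. The exact evaluation settings behind the numbers above are not stated here.

```python
from itertools import permutations


def to_frames(segments, step):
    """Map each frame index to the set of speaker labels active in that frame."""
    frames = {}
    for start, end, label in segments:
        for i in range(round(start / step), round(end / step)):
            frames.setdefault(i, set()).add(label)
    return frames


def diarization_error_rate(reference, hypothesis, step=0.01):
    """Frame-level DER = (missed + false alarm + confusion) / total reference speech.

    Hypothetical helper: no collar, overlapping speech is scored, and the
    speaker mapping is brute-forced (fine for a few speakers only).
    """
    ref = to_frames(reference, step)
    hyp = to_frames(hypothesis, step)
    total = sum(len(labels) for labels in ref.values())
    if total == 0:
        raise ValueError("reference contains no speech")

    # Count frames where reference speaker r and hypothesis speaker h are both active.
    cooccur = {}
    for i in ref.keys() & hyp.keys():
        for r in ref[i]:
            for h in hyp[i]:
                cooccur[(r, h)] = cooccur.get((r, h), 0) + 1

    # Find the one-to-one speaker mapping that maximises matched frames.
    ref_labels = sorted({label for _, _, label in reference})
    hyp_labels = sorted({label for _, _, label in hypothesis})
    if len(ref_labels) <= len(hyp_labels):
        pairings = (zip(ref_labels, p) for p in permutations(hyp_labels, len(ref_labels)))
    else:
        pairings = (zip(p, hyp_labels) for p in permutations(ref_labels, len(hyp_labels)))
    matched = max(sum(cooccur.get(pair, 0) for pair in pairing) for pairing in pairings)

    missed = false_alarm = paired = 0
    for i in ref.keys() | hyp.keys():
        n_ref, n_hyp = len(ref.get(i, ())), len(hyp.get(i, ()))
        missed += max(0, n_ref - n_hyp)       # reference speakers with no hypothesis slot
        false_alarm += max(0, n_hyp - n_ref)  # hypothesis speakers with no reference slot
        paired += min(n_ref, n_hyp)
    # Paired speaker slots that do not agree under the best mapping count as confusion.
    confusion = paired - matched
    return (missed + false_alarm + confusion) / total
```

For example, if a 20 s recording has speaker A for the first 10 s and speaker B for the last 10 s, and a system assigns the last 5 s of B to A's cluster, then 5 s out of 20 s are confused and DER is 0.25 (25 %). A real scorer would replace the brute-force search with an assignment solver such as the Hungarian algorithm.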
Using high-end NVIDIA hardware,