DiariZen
DiariZen is a speaker diarization toolkit driven by AudioZen and Pyannote 3.1.
This repository hosts a pre-trained DiariZen model. The EEND component is built on WavLM-Base+ and Conformer layers. The model was trained on far-field, single-channel audio from the public datasets AMI, AISHELL-4, and AliMeeting. Please follow the DiariZen installation instructions before use.
from diarizen.pipelines.inference import DiariZenPipeline
# load pre-trained model
diar_pipeline = DiariZenPipeline.from_pretrained("BUT-FIT/diarizen-meeting-base")
# apply diarization pipeline
diar_results = diar_pipeline('audio.wav')
# print results
for turn, _, speaker in diar_results.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker_{speaker}")
# load pre-trained model and save RTTM result
diar_pipeline = DiariZenPipeline.from_pretrained(
    "BUT-FIT/diarizen-meeting-base",
    rttm_out_dir='.'
)
# apply diarization pipeline
diar_results = diar_pipeline('audio.wav', sess_name='session_name')
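Once an RTTM file has been saved, a common next step is to summarize how long each speaker talked. The following is a minimal sketch, not part of DiariZen: the helpers `parse_rttm` and `speaking_time` are hypothetical names introduced here, and the example lines are made-up data in the standard RTTM `SPEAKER` layout (file ID, channel, onset, duration, then the speaker label in the eighth field). The name and location of the file that DiariZen actually writes are not assumed; the sketch works on any iterable of RTTM lines.

```python
from collections import defaultdict


def parse_rttm(lines):
    """Return a list of (start, end, speaker) tuples from RTTM SPEAKER lines."""
    segments = []
    for line in lines:
        fields = line.split()
        # Skip blank lines and non-SPEAKER record types.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        start = float(fields[3])      # onset in seconds
        duration = float(fields[4])   # duration in seconds
        segments.append((start, start + duration, fields[7]))
    return segments


def speaking_time(segments):
    """Sum segment durations per speaker (overlapped speech counts for each speaker)."""
    totals = defaultdict(float)
    for start, end, speaker in segments:
        totals[speaker] += end - start
    return dict(totals)


# Made-up RTTM lines for illustration only.
example_lines = [
    "SPEAKER session_name 1 0.00 2.50 <NA> <NA> 0 <NA> <NA>",
    "SPEAKER session_name 1 2.50 1.25 <NA> <NA> 1 <NA> <NA>",
    "SPEAKER session_name 1 4.00 3.00 <NA> <NA> 0 <NA> <NA>",
]

segments = parse_rttm(example_lines)
totals = speaking_time(segments)
for speaker, seconds in sorted(totals.items()):
    print(f"speaker_{speaker}: {seconds:.2f}s")
```

To use this on a real result, pass an open file handle for the saved RTTM file to `parse_rttm` in place of `example_lines`.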
Diarization Error Rate (DER, %):
--------------------------------------------------------------
System       Collar    AMI     AISHELL-4    AliMeeting
--------------------------------------------------------------
Pyannote3    0s        21.1    13.9         22.8
             0.25s     13.7     7.7         13.6
--------------------------------------------------------------
Proposed     0s        15.4    11.7         17.6
             0.25s      9.8     5.9         10.2
--------------------------------------------------------------
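To make the gap between the two systems easier to compare across datasets, the absolute DER values can be turned into relative reductions. The sketch below copies the numbers from the table above; the dictionary layout and the helper `relative_reduction` are illustrative choices made here, not part of DiariZen.

```python
# DER values (%) copied from the results table, keyed by system and collar.
DER = {
    "Pyannote3": {
        "0s": {"AMI": 21.1, "AISHELL-4": 13.9, "AliMeeting": 22.8},
        "0.25s": {"AMI": 13.7, "AISHELL-4": 7.7, "AliMeeting": 13.6},
    },
    "Proposed": {
        "0s": {"AMI": 15.4, "AISHELL-4": 11.7, "AliMeeting": 17.6},
        "0.25s": {"AMI": 9.8, "AISHELL-4": 5.9, "AliMeeting": 10.2},
    },
}


def relative_reduction(baseline, proposed):
    """Relative DER reduction of `proposed` over `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


for collar in ("0s", "0.25s"):
    for dataset, baseline in DER["Pyannote3"][collar].items():
        proposed = DER["Proposed"][collar][dataset]
        reduction = relative_reduction(baseline, proposed)
        print(f"collar={collar:<5} {dataset:<10} relative reduction: {reduction:.1f}%")
```

A relative figure normalizes by the baseline error, so it puts datasets with very different absolute DER levels on a comparable scale.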
@inproceedings{han2025leveraging,
  title={Leveraging self-supervised learning for speaker diarization},
  author={Han, Jiangyu and Landini, Federico and Rohdin, Johan and Silnova, Anna and Diez, Mireia and Burget, Luk{\'a}{\v{s}}},
  booktitle={Proc. ICASSP},
  year={2025}
}