KBLab/kb-whisper-large · Diarization

26 days ago

Hello,

Thanks for providing this! Are there any plans to support diarization (speaker change recognition)?

I tried the latest GGML checkpoint with Whisper.cpp, which has a -tdrz flag that should diarize, but I think the model itself needs to support it.

fredde1

19 days ago

would also be very interested in this!! (Y)

Lauler

National Library of Sweden / KBLab org 10 days ago

edited 10 days ago

Hi

We will not support diarization as part of the Whisper model's inherent functionality. As far as I can see the support in Whisper.cpp is quite experimental and limited to only the English version of whisper.small.

If you want to run our models with diarization, I recommend a multi-step pipeline. WhisperX's README has an example of how to achieve this with the WhisperX library. Load our Whisper and Wav2vec2 models for the first two steps (transcription and alignment), then run the diarization step as described in WhisperX.
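To illustrate how the last step of such a pipeline fits together, here is a minimal sketch in plain Python of merging diarization output with transcribed segments by largest time overlap. This is not WhisperX's actual implementation. The `Segment` and `SpeakerTurn` classes and the `assign_speakers` function are hypothetical names made up for this example, and the timestamps are invented.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Segment:
    """A transcribed segment with timestamps in seconds (hypothetical type)."""
    start: float
    end: float
    text: str
    speaker: Optional[str] = None


@dataclass
class SpeakerTurn:
    """One diarization turn: who spoke between start and end (hypothetical type)."""
    start: float
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: List[Segment], turns: List[SpeakerTurn]) -> List[Segment]:
    """Label each segment with the speaker whose turns overlap it the most.

    Segments that overlap no turn keep speaker=None.
    """
    for seg in segments:
        totals: Dict[str, float] = {}
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + shared
        seg.speaker = max(totals, key=totals.get) if totals else None
    return segments


# Invented example data: three transcribed segments and two speaker turns.
segments = [
    Segment(0.0, 2.0, "Hej"),
    Segment(2.0, 5.0, "Hej hej"),
    Segment(10.0, 11.0, "Tack"),
]
turns = [
    SpeakerTurn(0.0, 2.5, "SPEAKER_00"),
    SpeakerTurn(2.5, 6.0, "SPEAKER_01"),
]

labeled = assign_speakers(segments, turns)
for seg in labeled:
    print(f"[{seg.start:.1f}-{seg.end:.1f}] {seg.speaker}: {seg.text}")
```

In a real pipeline the segments would come from the transcription and alignment steps and the turns from a separate diarization model, which is why the Whisper model itself does not need diarization support.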

Lauler changed discussion status to closed 10 days ago