Can it be used as a standalone model for speaker identification?
Hi there, great work on speech cloning. Can the embedding model be used on its own for speaker diarization, without needing the larger models?
+1
@Navanit-AI what? Are you saying you have the same question, or are you saying "yes, it can be used as a standalone model for speaker diarization"?
Yes, they were trained for speaker identification. We sourced them from here: https://github.com/VoxBlink2/ScriptsForVoxBlink2/tree/main/asv
Thank you @gabrielclark3330 +1
@supercharge19
I am also working on speaker diarization. How do you use this model: with pyannote, or something else?
If possible, could you share some reference links?
Thank you
I don't like pyannote.
I haven't found time to try this model yet, but I believe that, like a text embedding model, it creates vectors for small audio patches (tokens, if you prefer). So use it like one.
My plan, though: create clusters of audio chunks of different sizes and compare their embeddings. If two chunks match, treat them as the same speaker.
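A rough sketch of that compare-and-cluster idea, assuming each audio chunk has already been turned into an embedding vector by the model. The function names, the greedy strategy, and the 0.7 threshold are all illustrative assumptions, not anything taken from this repo; a real pipeline would more likely use agglomerative or spectral clustering.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def greedy_cluster(embeddings, threshold=0.7):
    """Assign each embedding to the most similar existing cluster, or open a new one.

    `threshold` is a hypothetical value; it would need tuning on real data.
    Returns one integer cluster label per embedding, in input order.
    """
    centroid_sums = []  # running sum per cluster; cosine ignores scale, so no division is needed
    labels = []
    for emb in embeddings:
        best_idx, best_sim = None, threshold
        for idx, centroid in enumerate(centroid_sums):
            sim = cosine_similarity(emb, centroid)
            if sim >= best_sim:
                best_idx, best_sim = idx, sim
        if best_idx is None:
            # No cluster is similar enough: treat this chunk as a new speaker.
            centroid_sums.append(list(emb))
            labels.append(len(centroid_sums) - 1)
        else:
            centroid_sums[best_idx] = [c + e for c, e in zip(centroid_sums[best_idx], emb)]
            labels.append(best_idx)
    return labels
```

With toy 2-D vectors, `greedy_cluster([[1, 0], [0.9, 0.1], [0, 1]])` puts the first two chunks in one cluster and the third in another. Real speaker embeddings have many more dimensions, but the comparison works the same way.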
I have to build diarization for audio between two people, mainly an agent and a user. I have been thinking about approaches but haven't been able to find anything good.
While working on this project, before any good model was available, I built my own, but it reached only 60% accuracy, which is not good enough. The NeMo model was good, but it is resource heavy. This model, however, is small, and I am sure it can work well. If you follow the NeMo pipeline but use this model instead of theirs, it would be a good start. Try this and share the results.
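To make the "NeMo pipeline with a different embedding model" idea concrete: as I understand it, that style of pipeline splits speech into short windows, embeds each window, clusters the embeddings, and merges neighbouring windows with the same label into speaker turns. Below is a minimal sketch of only the windowing and merging steps. The helper names, the 1.5 s window, and the 0.75 s hop are assumptions for illustration; the voice activity detection and embedding steps are left out, and the labels are assumed to come from whatever clustering you run on the embeddings.

```python
def sliding_windows(duration, window=1.5, hop=0.75):
    """Return (start, end) times in seconds that cover [0, duration] with overlapping windows."""
    windows = []
    start = 0.0
    while start < duration:
        end = min(start + window, duration)
        windows.append((start, end))
        if end >= duration:
            break
        start += hop
    return windows

def merge_segments(windows, labels):
    """Merge consecutive windows that share a label into (start, end, label) speaker turns.

    When the label changes inside an overlap, the boundary is placed at the
    midpoint of the overlapping region (a simple heuristic, not NeMo's method).
    """
    segments = []
    for (start, end), label in zip(windows, labels):
        if segments and segments[-1][2] == label:
            # Same speaker as the previous window: extend the current turn.
            segments[-1] = (segments[-1][0], end, label)
            continue
        if segments and start < segments[-1][1]:
            # Speaker change inside an overlap: split it at the midpoint.
            prev_start, prev_end, prev_label = segments[-1]
            boundary = (prev_end + start) / 2
            segments[-1] = (prev_start, boundary, prev_label)
            start = boundary
        segments.append((start, end, label))
    return segments
```

For a 3-second clip, `sliding_windows(3.0)` yields three windows; with labels `[0, 0, 1]` (say, agent then user), `merge_segments` returns two turns that meet at 1.875 s.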
Thank you for sharing.
Yes, I have tried three frameworks (NeMo, SpeechBrain, and pyannote), and sorry to say all of them have their drawbacks and are not up to the mark. I will have to look into fine-tuning.
The only problem for me with NeMo was that it requires an Nvidia GPU; otherwise (on CPU) it will not work on large files. As for SpeechBrain, I forget what the issue was and why I dropped it. And I hate the other one, so I wrote my own models, but sadly never got beyond 60% accuracy with them. This looks promising, but I can only test it when I am free (probably two months from now).