Zyphra/Zonos-v0.1-speaker-embedding


Can it be used for speaker identification task as standalone model?

by supercharge19 - opened 11 days ago

11 days ago

Hi there, great work on speech cloning. But can this embedding model be used on its own for the speaker diarization task, without needing the larger models?

Navanit-AI

10 days ago

supercharge19

9 days ago

@Navanit-AI what? Are you saying you have the same question, or are you saying "yes, it can be used as a standalone model for speaker diarization"?

gabrielclark3330

Zyphra org 9 days ago

Yes, they were trained for speaker identification. We sourced them from here: https://github.com/VoxBlink2/ScriptsForVoxBlink2/tree/main/asv
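For readers wondering what "used for speaker identification" usually looks like with an embedding model: a common approach is to compare a query embedding against enrolled speaker embeddings by cosine similarity. The sketch below is only an illustration under assumptions. It uses toy 3-dimensional vectors in place of real model output, and the `identify_speaker` helper and the 0.5 threshold are hypothetical, not part of this model or the VoxBlink2 scripts.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_speaker(query, enrolled, threshold=0.5):
    """Return (name, score) for the closest enrolled speaker.

    `enrolled` maps speaker names to reference embeddings.
    The name is None when the best score is below `threshold`
    (an assumed value; tune it on your own data).
    """
    best_name, best_score = None, -1.0
    for name, reference in enrolled.items():
        score = cosine_similarity(query, reference)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


# Toy embeddings standing in for real model output.
enrolled = {"agent": [1.0, 0.0, 0.0], "user": [0.0, 1.0, 0.0]}
name, score = identify_speaker([0.9, 0.1, 0.0], enrolled)
unknown, unknown_score = identify_speaker([0.0, 0.0, 1.0], enrolled)
```

In practice the vectors would come from running the embedding model on an enrollment clip and a query clip, and the threshold would be chosen from a held-out set.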

supercharge19

9 days ago

Thank you @gabrielclark3330 +1

supercharge19 changed discussion status to closed 9 days ago

Navanit-AI

8 days ago

@supercharge19 I am also working on speaker diarization. How do you use this model: with pyannote, or some other way?
If possible, could you share some reference links?
Thank you

supercharge19

8 days ago

I don't like pyannote.

I have not found time to try this model. However, I believe that, like a text embedding model, it creates vectors for small audio patches (tokens, if you prefer). So, use it like one.

My plan, though: create clusters of audio chunks of different sizes and compare them. If they match, then it is good.
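One way to read "chunks of different sizes" is as overlapping sliding windows at several lengths, each of which would then be embedded and compared. Here is a minimal sketch of just the windowing step. The window lengths, the 50% overlap, and both function names are illustrative assumptions, not something taken from this model.

```python
def sliding_windows(num_samples, sample_rate, window_s, hop_s):
    """Yield (start_sample, end_sample) pairs covering the audio.

    Windows that would run past the end of the audio are dropped.
    """
    window = int(round(window_s * sample_rate))
    hop = int(round(hop_s * sample_rate))
    if window <= 0 or hop <= 0:
        raise ValueError("window_s and hop_s must be positive")
    start = 0
    while start + window <= num_samples:
        yield start, start + window
        start += hop


def multiscale_windows(num_samples, sample_rate,
                       window_sizes_s=(1.5, 1.0, 0.5), overlap=0.5):
    """Build one list of windows per window length (in seconds)."""
    return {
        size: list(sliding_windows(num_samples, sample_rate,
                                   size, size * (1.0 - overlap)))
        for size in window_sizes_s
    }


# Three seconds of 16 kHz audio (assumed sample rate).
scales = multiscale_windows(num_samples=48000, sample_rate=16000)
```

Each `(start, end)` pair would be used to slice the waveform before passing the slice to the embedding model. Shorter windows give finer time resolution but noisier embeddings, which is the usual reason for trying more than one size.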

Navanit-AI

7 days ago

I have to build diarization for audio with two speakers, mainly an agent and a user. I have been looking at options but have not been able to find anything good.

supercharge19

6 days ago

While working on this project, before any good model was available, I created my own, which reached only 60% accuracy. That is not substantial. The NeMo model was good, but it is resource-heavy. This model, however, is small, and I am sure it can be wonderful. If you follow the NeMo pipeline but use this model instead of theirs, it would be a good start. Try this and share the results.
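A clustering-based diarization pipeline of that kind roughly goes: detect speech, cut it into windows, embed each window, cluster the embeddings, then merge neighbouring windows with the same label into turns. The sketch below covers only the last two steps for the two-speaker case. It is a toy stand-in written for illustration, not NeMo's clustering code: `cluster_two_speakers` and `merge_turns` are hypothetical names, and the 2-dimensional vectors replace real embeddings.

```python
import math


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cluster_two_speakers(embeddings, iterations=10):
    """Label each embedding 0 or 1 with a tiny cosine-based 2-means."""
    vecs = [_normalize(e) for e in embeddings]
    if len(vecs) < 2:
        return [0] * len(vecs)
    # Seed with the first vector and the vector least similar to it.
    far = min(range(len(vecs)), key=lambda i: _dot(vecs[0], vecs[i]))
    centroids = [vecs[0], vecs[far]]
    labels = []
    for _ in range(iterations):
        new_labels = [
            0 if _dot(v, centroids[0]) >= _dot(v, centroids[1]) else 1
            for v in vecs
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for k in (0, 1):
            members = [v for v, lab in zip(vecs, labels) if lab == k]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[k] = _normalize(mean)
    return labels


def merge_turns(windows, labels):
    """Merge consecutive same-label windows into (start, end, label)."""
    turns = []
    for (start, end), label in zip(windows, labels):
        if turns and turns[-1][2] == label:
            prev_start, prev_end, _ = turns[-1]
            turns[-1] = (prev_start, max(prev_end, end), label)
        else:
            turns.append((start, end, label))
    return turns


# Toy data: five one-second windows, two clearly separated speakers.
windows = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
embeddings = [[1.0, 0.1], [0.9, 0.0], [0.1, 1.0], [0.0, 0.9], [1.0, 0.05]]
labels = cluster_two_speakers(embeddings)
turns = merge_turns(windows, labels)
```

The labels are arbitrary cluster ids, so deciding which cluster is the agent and which is the user needs an extra step, for example comparing each cluster centroid with an enrolled agent embedding.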

Navanit-AI

6 days ago

Thank you for sharing.
Yes, I have tried three frameworks: NeMo, SpeechBrain, and pyannote. Sorry to say, all of them have their drawbacks and are not up to the mark. I will have to think about fine-tuning instead.

supercharge19

5 days ago

The only problem for me with NeMo was that it requires an Nvidia GPU; on CPU it will not work on large files. As for SpeechBrain, I forget what it was doing and why I left it. And I hate the other one, so I wrote my own models, but sadly never got beyond 60% accuracy with them. This looks promising, but I can only test it when I am free (probably two months from now).
