I'm releasing the speech version of Gemma-3!

#46
by junnei - opened

Hi, thanks for your cool project!
I'm here to share a project I've been working on: a speech version of Gemma-3!

Here is the model: Gemma-3-MM

I've applied a Speech Adapter to the Gemma-3 model, inspired by the Phi-4-multimodal-instruct approach. The results are pretty interesting: for example, a CER/WER of 4.47/8.49 on CoVoST2 ASR (English) and a BLEU score of 29.83 on zero-shot English-Korean AST.
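
For anyone unfamiliar with the metrics: CER and WER are the edit distance between the reference transcript and the model output, divided by the reference length, counted in characters and words respectively (and usually reported as a percentage). Below is a minimal pure-Python sketch of that standard definition. It is only an illustration and not necessarily the evaluation script behind the numbers above; the function names are hypothetical, and it assumes plain whitespace tokenization with no text normalization.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: drop a reference token
                curr[j - 1] + 1,     # insertion: extra hypothesis token
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    print(f"WER: {wer(ref, hyp):.2%}")
    print(f"CER: {cer(ref, hyp):.2%}")
```

Real evaluations typically also normalize case and punctuation before scoring, which can change the numbers noticeably.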

Surprisingly, when no speech data is provided, the model functions exactly like the original, with no impact on performance.

I'd love to hear your thoughts on this! Any tips or suggestions for further improvements?

I also wanted to check on the Gemma 3 license. Since this is an adaptation of the original model, are there any potential issues I should be aware of regarding the use and sharing of this speech version? I just want to make sure I'm staying within the guidelines.

Looking forward to your feedback!
