Are you going to release the training code as well (either pretraining or training from scratch)?
I would like to see a technical report; most companies that release open-weight AI models don't publish their training pipeline.
A detailed technical report would be great for contributing to the research.
Addendum: the SoundStorm paper linked in the model card header tells us what influenced their work.
Also, quote: "Our work was heavily inspired by SoundStorm, Parakeet, and Descript Audio Codec"
Gives us some clues for now.
Glancing at the code, you can tell that Dia generally follows the Parakeet recipe:
- Trained on labeled conversational audio with [S1] and [S2], like Parakeet
- Input speech is tokenized into Descript Audio Codec tokens, with 9 streams
- Transformer backbone is an encoder-decoder architecture
  - Text is input into the encoder
  - The decoder takes multi-stream input via some form of delay mechanism
- Probably not initialized from a pre-trained LLM (?), since they use RoPE embeddings and cross-attention. Not 100% sure about this one.
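To make the "delay mechanism" point more concrete, here is a minimal sketch of one common way to feed several codec streams into a decoder: stream k is shifted right by some number of steps and the gaps are padded. This is an illustration only, not Dia's actual code. The function names, the `PAD` id, and the one-step-per-stream delays are all assumptions; the thread only says "some form of delay mechanism".

```python
# Hypothetical sketch of a delay pattern over parallel codec token streams.
# PAD, apply_delay and revert_delay are illustrative names, not Dia's API.

PAD = -1  # assumed padding id; a real model would use a reserved token id


def apply_delay(codes, delays, pad=PAD):
    """Shift each stream right by its delay and pad to a common length.

    codes  : list of K streams, each a list of T token ids
    delays : list of K non-negative integer delays (one per stream)
    Returns K streams, each of length T + max(delays).
    """
    if len(codes) != len(delays):
        raise ValueError("need exactly one delay per stream")
    num_steps = len(codes[0])
    total = num_steps + max(delays)
    delayed = []
    for stream, d in zip(codes, delays):
        if len(stream) != num_steps:
            raise ValueError("all streams must have the same length")
        # d pads in front, then the tokens, then pads up to the common length
        delayed.append([pad] * d + list(stream) + [pad] * (total - num_steps - d))
    return delayed


def revert_delay(delayed, delays, num_steps):
    """Undo apply_delay: slice each stream back to its original window."""
    return [stream[d:d + num_steps] for stream, d in zip(delayed, delays)]


if __name__ == "__main__":
    # 9 streams (as in the list above), 4 time steps, toy token ids
    codes = [[k * 10 + t for t in range(4)] for k in range(9)]
    delays = list(range(9))  # assumed: stream k is delayed by k steps
    delayed = apply_delay(codes, delays)
    for row in delayed:
        print(row)
    assert revert_delay(delayed, delays, 4) == codes
```

With a pattern like this, the decoder can predict one token per stream at every step, while each later stream sees the earlier streams' tokens for the same audio frame before it has to predict its own.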
Hello,
I have written some training code. I am not 100% sure it is correct, but training is running. The code may inspire someone.
https://github.com/nari-labs/dia/issues/48
Thanks for all the interest. We do not have plans to release the training code yet, but we will be providing a brief technical report for those interested! What wanchichen said is generally correct: we train a Parakeet-style encoder-decoder from scratch.
Whenever I hear about or see Style2 TTS, it turns out to be a messy, non-working project. WTF...
That is why I don't work on Style2 TTS. It is just a research project where nothing works; I spent a month training with no luck...
Try OrpheusTTS or OuteTTS instead: they are much better, easy to train, and work as expected. I am not going to spend my time on bullshit half-finished projects.
If you see Style2 TTS, stay far away... it is just a demo some idi... prepared...
https://github.com/anan235/dia-multilingual/issues/5
Anything related to Style2 TTS is just a fake demo with no training code at all; even the Style2 TTS code itself is broken... I have spent a month on it... Style2 TTS is not a project, it is a hobby toy... Don't release fake demos...