Are you going to release the training code as well (either pretraining or training from scratch)?

by arpitsh018 - opened Apr 22

Discussion

arpitsh018

Apr 22

Are you going to release the training code as well (either pretraining or training from scratch)?

Impulse2000

Apr 22

•

edited Apr 22

I would like to see a technical report. Most companies that release open-weight AI models don't publish their training pipeline.

A detailed technical report would be great for contributing to the research.

Addendum: the SoundStorm paper linked in the model card header tells us what influenced their work.

Also, quote: "Our work was heavily inspired by SoundStorm, Parakeet, and Descript Audio Codec"

Gives us some clues for now.

wanchichen

Apr 22

•

edited Apr 22

Glancing at the code, you can tell that Dia generally follows the Parakeet recipe:

- Trained on labeled conversational audio with [S1] and [S2] speaker tags, like Parakeet
- Input speech is tokenized into Descript Audio Codec tokens, with 9 streams
- Transformer backbone is an encoder-decoder architecture
  - text is input into the encoder
  - the decoder takes multi-stream input via some form of delay mechanism
  - probably not initialized from a pre-trained LLM (?), since they use RoPE embeddings and cross-attention. Not 100% sure about this one.
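To make the "delay mechanism" point concrete, here is a minimal sketch of a codebook delay pattern in plain Python. It is not taken from the Dia code: the per-stream offsets (`delays`), the `PAD` id, and the helper names `apply_delay` and `revert_delay` are all assumptions for illustration, and the real model may use different offsets across its 9 streams.

```python
from typing import List

PAD = -1  # hypothetical padding id, chosen only for this illustration


def apply_delay(codes: List[List[int]], delays: List[int], pad: int = PAD) -> List[List[int]]:
    """Shift stream c to the right by delays[c] steps and pad the gaps.

    codes is indexed as codes[stream][time]; all streams have equal length.
    The output is longer than the input by max(delays) steps.
    """
    num_steps = len(codes[0])
    extra = max(delays)
    delayed = []
    for stream, delay in zip(codes, delays):
        row = [pad] * delay + list(stream) + [pad] * (extra - delay)
        assert len(row) == num_steps + extra
        delayed.append(row)
    return delayed


def revert_delay(delayed: List[List[int]], delays: List[int]) -> List[List[int]]:
    """Undo apply_delay by slicing each stream back to its original window."""
    num_steps = len(delayed[0]) - max(delays)
    return [row[d:d + num_steps] for row, d in zip(delayed, delays)]


# Toy example with 3 streams instead of 9, and assumed offsets 0, 1, 2.
codes = [[10, 11, 12], [20, 21, 22], [30, 31, 32]]
delays = [0, 1, 2]
delayed = apply_delay(codes, delays)
restored = revert_delay(delayed, delays)
```

The idea is that at each decoder step the model predicts one token per stream, but later streams lag behind earlier ones, so each stream can condition on the streams before it for the same audio frame.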

Karayakar

Apr 23

Hello,
I have created training code. I'm not 100% sure it is correct, but training is running. The code might inspire someone.

https://github.com/nari-labs/dia/issues/48

NariLabs

Nari Labs org Apr 23

•

edited Apr 23

Thanks for all the interest. We do not have plans to release the training code yet, but we will be providing a brief technical report for those interested! What wanchichen said is generally correct: we train a Parakeet-style encoder-decoder from scratch.

NariLabs changed discussion status to closed Apr 23

Karayakar

Apr 25

Whenever I hear about or see StyleTTS 2, it is a messy, non-working project. WTF...
That is why I don't work on StyleTTS 2. It is just a research project where nothing works; I spent a month trying to train it with no luck...

Try OrpheusTTS or OuteTTS: they are much better and easy to train, and they work as expected. I am not going to spend my time on bullshit half-finished projects.
If you see StyleTTS 2, stay far away... it is just a demo some idi... prepared...

https://github.com/anan235/dia-multilingual/issues/5
https://github.com/nari-labs/dia/issues/48

Karayakar

Apr 25

"Thanks for all the interest. We do not have plans to release the training code yet, but we will be providing a brief technical report for those interested! What wanchichen said is generally correct: we train a Parakeet-style encoder-decoder from scratch."

Anything related to StyleTTS 2 is just a fake demo with no training code at all; even the StyleTTS 2 code itself is broken... I have spent a month on it... StyleTTS 2 is not a project, it is a hobby toy... Don't release fake demos...

Viewegger

Apr 25

https://github.com/stlohrey/dia-finetuning
