Generate voice from text using reference audio
Generate realistic audio from text descriptions
A Step Towards Music Generation Foundation Model