A unified multimodal understanding and generation model.
Zero-shot voice cloning with Llasa 3B (Unofficial Demo)
Blind vote on HF TTS models!
Generate images with Switti
Scalable and Versatile 3D Generation from images
Upgraded to v1.0!
MaskGCT TTS Demo
3D/4D Scenes from a Single Image w/ Controllable Video Diff