Convert vocals to match reference audio
Generate 3D texture from text
Generate 3D texture from image
Generate audio from podcast scripts