Generate a talking face video from an image and audio
Generate a sound effect that matches a video shot