Explainable-Vision-Language-Model
Generate a video visualizing how a model attends to an image while generating text