arxiv:2408.12480

Generate a video visualizing how a multimodal model attends to an image while generating text
Analyze images and answer questions
Chat with images and text
Chat with an AI model using text and an image
Chat with a model about images