Generate descriptions from images using masks
Chat with an AI that understands text and images
Generate responses to video or image inputs