Request evaluation for a speech model
Scalable and Versatile 3D Generation from images
Generate click coordinates from an image and an instruction