Extreme Super-Resolution via Scale Autoregression
Generate descriptions from images using masks
Interact with an agent to perform web-based tasks