Convert GUI screens to structured elements
Generate spatial audio from images (and optionally text)
Greet someone by name!
Protein, molecule & more...
Generate music from text descriptions
Display OmniParser link and logo
Describe image contents with prompts
Restore degraded audio using a Transformer-based model
Segment images using text prompts