GUI-Vid is the first VideoLLM with powerful GUI-oriented capabilities, retrained on the GUI-World dataset.
It was presented in the paper "GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents".
See the GitHub repository for how to use GUI-Vid for GUI understanding tasks.