Strong Vision Language Model trained with VisualWebInstruct
State-of-the-art VLM for solving multimodal reasoning problems