Grounding Computer Use Agents on Human Demonstrations Paper • 2511.07332 • Published Nov 10, 2025 • 105
WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation Paper • 2508.16763 • Published Aug 22, 2025 • 2
BigCharts-R1: Enhanced Chart Reasoning with Visual Reinforcement Finetuning Paper • 2508.09804 • Published Aug 13, 2025
Scope: Selective Cross-modal Orchestration of Visual Perception Experts Paper • 2510.12974 • Published Oct 14, 2025
WebMMU Collection WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation • 2 items • Updated Sep 16, 2025 • 2
CulturalFrames: Assessing Cultural Expectation Alignment in Text-to-Image Models and Evaluation Metrics Paper • 2506.08835 • Published Jun 10, 2025
Controlling Multimodal LLMs via Reward-guided Decoding Paper • 2508.11616 • Published Aug 15, 2025 • 7