ScratchEval: Are GPT-4o Smarter than My Child? Evaluating Large Multimodal Models with Visual Programming Challenges Paper • 2411.18932 • Published Nov 28, 2024 • 1
ScratchEval: Are GPT-4o Smarter than My Child? Evaluating Large Multimodal Models with Visual Programming Challenges Paper • 2411.18932 • Published Nov 28, 2024 • 1
view article Article Tiny Agents: a MCP-powered agent in 50 lines of code By julien-c • 25 days ago • 245
view article Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge By NormalUhr • Feb 7 • 136
view article Article ✴️ ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use By Ziyang and 1 other • Jan 3 • 18
view article Article ✴️ ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use By Ziyang and 1 other • Jan 3 • 18
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation Paper • 2411.13281 • Published Nov 20, 2024 • 22 • 5
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation Paper • 2411.13281 • Published Nov 20, 2024 • 22
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation Paper • 2411.13281 • Published Nov 20, 2024 • 22