GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior Paper • 2506.08012 • Published 4 days ago • 7
BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation Paper • 2506.07530 • Published 5 days ago • 18
Adaptive Classifier-Free Guidance via Dynamic Low-Confidence Masking Paper • 2505.20199 • Published 19 days ago • 2 • 2
LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning Paper • 2505.16933 • Published 23 days ago • 30
LaViDa: A Large Diffusion Language Model for Multimodal Understanding Paper • 2505.16839 • Published 23 days ago • 12
Reinforcement Learning for Reasoning in Large Language Models with One Training Example Paper • 2504.20571 • Published Apr 29 • 94
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? Paper • 2504.13837 • Published Apr 18 • 128
InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners Paper • 2504.14239 • Published Apr 19 • 13