On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation Paper • 2603.22117 • Published 4 days ago • 23
WorldCache: Content-Aware Caching for Accelerated Video World Models Paper • 2603.22286 • Published 4 days ago • 4
HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning Paper • 2603.17024 • Published 10 days ago • 104
MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild Paper • 2603.17187 • Published 10 days ago • 131
XSkill: Continual Learning from Experience and Skills in Multimodal Agents Paper • 2603.12056 • Published 15 days ago • 32
WebVR: Benchmarking Multimodal LLMs for WebPage Recreation from Videos via Human-Aligned Visual Rubrics Paper • 2603.13391 • Published 17 days ago • 19
$OneMillion-Bench: How Far are Language Agents from Human Experts? Paper • 2603.07980 • Published 19 days ago • 27
Proact-VL: A Proactive VideoLLM for Real-Time AI Companions Paper • 2603.03447 • Published 24 days ago • 37
AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios Paper • 2602.23166 • Published 29 days ago • 44
How Controllable Are Large Language Models? A Unified Evaluation across Behavioral Granularities Paper • 2603.02578 • Published 25 days ago • 25
RubricBench: Aligning Model-Generated Rubrics with Human Standards Paper • 2603.01562 • Published 26 days ago • 61
K-Search: LLM Kernel Generation via Co-Evolving Intrinsic World Model Paper • 2602.19128 • Published Feb 22 • 7
DataChef: Cooking Up Optimal Data Recipes for LLM Adaptation via Reinforcement Learning Paper • 2602.11089 • Published Feb 11 • 18
CLI-Gym: Scalable CLI Task Generation via Agentic Environment Inversion Paper • 2602.10999 • Published Feb 11 • 10
When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning Paper • 2602.10560 • Published Feb 11 • 30
Code2World: A GUI World Model via Renderable Code Generation Paper • 2602.09856 • Published Feb 10 • 201
SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning Paper • 2602.08234 • Published Feb 9 • 72
Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models Paper • 2602.02185 • Published Feb 2 • 117