DAPO: An Open-Source LLM Reinforcement Learning System at Scale Paper • 2503.14476 • Published 10 days ago • 108
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't Paper • 2503.16219 • Published 8 days ago • 45
PEBench: A Fictitious Dataset to Benchmark Machine Unlearning for Multimodal Large Language Models Paper • 2503.12545 • Published 12 days ago • 5
Aligning Multimodal LLM with Human Preference: A Survey Paper • 2503.14504 • Published 10 days ago • 21
MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification Paper • 2503.12505 • Published 13 days ago • 9
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization Paper • 2503.12937 • Published 12 days ago • 27
BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing Paper • 2503.13434 • Published 11 days ago • 24
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning Paper • 2503.07459 • Published 18 days ago • 15
MoC: Mixtures of Text Chunking Learners for Retrieval-Augmented Generation System Paper • 2503.09600 • Published 16 days ago • 4
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL Paper • 2503.07536 • Published 18 days ago • 80
Quantization for OpenAI's Whisper Models: A Comparative Analysis Paper • 2503.09905 • Published 16 days ago • 6
Slamming: Training a Speech Language Model on One GPU in a Day Paper • 2502.15814 • Published Feb 19 • 68
SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering? Paper • 2502.12115 • Published Feb 17 • 43
Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents Paper • 2502.11357 • Published Feb 17 • 10
System Message Generation for User Preferences using Open-Source Models Paper • 2502.11330 • Published Feb 17 • 15
OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning Paper • 2502.11271 • Published Feb 16 • 16