VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use Paper • 2509.01055 • Published 9 days ago • 63
Self-Rewarding Vision-Language Model via Reasoning Decomposition Paper • 2508.19652 • Published 14 days ago • 82
IssueBench: Millions of Realistic Prompts for Measuring Issue Bias in LLM Writing Assistance Paper • 2502.08395 • Published Feb 12
NeuralOS: Towards Simulating Operating Systems via Neural Generative Models Paper • 2507.08800 • Published Jul 11 • 79
POSS: Position Specialist Generates Better Draft for Speculative Decoding Paper • 2506.03566 • Published Jun 4 • 6
StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs Paper • 2505.20139 • Published May 26 • 18
QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design Paper • 2505.16175 • Published May 22 • 42
Learn to Reason Efficiently with Adaptive Length-based Reward Shaping Paper • 2505.15612 • Published May 21 • 34
General-Reasoner: Advancing LLM Reasoning Across All Domains Paper • 2505.14652 • Published May 20 • 23
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing Paper • 2406.08464 • Published Jun 12, 2024 • 72
WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries Paper • 2407.17468 • Published Jul 24, 2024
CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation Paper • 2504.00043 • Published Mar 30 • 10
CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation Paper • 2504.00043 • Published Mar 30 • 10
Information-Guided Identification of Training Data Imprint in (Proprietary) Large Language Models Paper • 2503.12072 • Published Mar 15