Pre-Trained Policy Discriminators are General Reward Models Paper • 2507.05197 • Published 11 days ago • 36
Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better Paper • 2506.09040 • Published Jun 10 • 36
GeometryZero: Improving Geometry Solving for LLM with Group Contrastive Policy Optimization Paper • 2506.07160 • Published Jun 8 • 3
Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles Paper • 2505.19914 • Published May 26 • 44
TimeArena: Shaping Efficient Multitasking Language Agents in a Time-Aware Simulation Paper • 2402.05733 • Published Feb 8, 2024
SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals Paper • 2406.04784 • Published Jun 7, 2024 • 2
DAPO: An Open-Source LLM Reinforcement Learning System at Scale Paper • 2503.14476 • Published Mar 18 • 134
Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning Paper • 2504.13914 • Published Apr 10 • 3
A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models Paper • 2505.07591 • Published May 12 • 11
FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models Paper • 2505.02735 • Published May 5 • 32
IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs Paper • 2504.15415 • Published Apr 21 • 22
LLM-DA: Data Augmentation via Large Language Models for Few-Shot Named Entity Recognition Paper • 2402.14568 • Published Feb 22, 2024 • 1
VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search Paper • 2504.09130 • Published Apr 12 • 12
TransMamba: Flexibly Switching between Transformer and Mamba Paper • 2503.24067 • Published Mar 31 • 21
COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values Paper • 2504.05535 • Published Apr 7 • 44
Implicit Reasoning in Transformers is Reasoning through Shortcuts Paper • 2503.07604 • Published Mar 10 • 23