Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning Paper • 2506.01939 • Published Jun 2, 2025 • 177
BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs Paper • 2505.13529 • Published May 18, 2025 • 11
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published Feb 20, 2025 • 106
The Lessons of Developing Process Reward Models in Mathematical Reasoning Paper • 2501.07301 • Published Jan 13, 2025 • 100
ProcessBench: Identifying Process Errors in Mathematical Reasoning Paper • 2412.06559 • Published Dec 9, 2024 • 85
Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks Paper • 2407.02855 • Published Jul 3, 2024 • 13
Prompt-Driven LLM Safeguarding via Directed Representation Optimization Paper • 2401.18018 • Published Jan 31, 2024 • 1
CASE: Aligning Coarse-to-Fine Cognition and Affection for Empathetic Response Generation Paper • 2208.08845 • Published Aug 18, 2022
PsyQA: A Chinese Dataset for Generating Long Counseling Text for Mental Health Support Paper • 2106.01702 • Published Jun 3, 2021
On Large Language Models' Selection Bias in Multi-Choice Questions Paper • 2309.03882 • Published Sep 7, 2023
Exploring Prompt-based Few-shot Learning for Grounded Dialog Generation Paper • 2109.06513 • Published Sep 14, 2021
Click: Controllable Text Generation with Sequence Likelihood Contrastive Learning Paper • 2306.03350 • Published Jun 6, 2023