How Instruction and Reasoning Data shape Post-Training: Data Quality through the Lens of Layer-wise Gradients Paper • 2504.10766 • Published Apr 2025 • 14 upvotes
C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing Paper • 2504.07964 • Published Apr 2025 • 58 upvotes
Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill? Paper • 2504.06514 • Published Apr 2025 • 33 upvotes
Efficient Reinforcement Finetuning via Adaptive Curriculum Learning Paper • 2504.05520 • Published Apr 2025 • 9 upvotes
Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base Paper • 2503.23361 • Published Mar 2025 • 6 upvotes
Difficulty Estimation Math Datasets Collection • We perform difficulty estimation on popular math datasets. • 5 items • Updated Apr 2025 • 1 upvote
On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective Paper • 2502.14296 • Published Feb 20, 2025 • 46 upvotes
Detecting and Filtering Unsafe Training Data via Data Attribution Paper • 2502.11411 • Published Feb 17, 2025 • 1 upvote
WildFeedback: Aligning LLMs With In-situ User Interactions And Feedback Paper • 2408.15549 • Published Aug 28, 2024 • 1 upvote
How Susceptible are Large Language Models to Ideological Manipulation? Paper • 2402.11725 • Published Feb 18, 2024 • 1 upvote
Can Language Model Moderators Improve the Health of Online Discourse? Paper • 2311.10781 • Published Nov 16, 2023 • 1 upvote
CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data Annotation Paper • 2310.15638 • Published Oct 24, 2023 • 1 upvote
CLIMB: A Benchmark of Clinical Bias in Large Language Models Paper • 2407.05250 • Published Jul 7, 2024 • 2 upvotes
Safer-Instruct: Aligning Language Models with Automated Preference Data Paper • 2311.08685 • Published Nov 15, 2023 • 1 upvote