Dian Yu

yudian

https://scholar.google.com/citations?user=ERdzqyYAAAAJ&hl=en

AI & ML interests

NLP

Recent Activity

upvoted a paper 10 days ago

Complex Logical Instruction Generation

authored a paper about 1 month ago

DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning

authored a paper about 1 month ago

One Token to Fool LLM-as-a-Judge

Organizations

None yet

upvoted a paper 10 days ago

Complex Logical Instruction Generation

Paper • 2508.09125 • Published 12 days ago • 38

authored 2 papers about 1 month ago

DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning

Paper • 2504.11456 • Published Apr 15 • 13

One Token to Fool LLM-as-a-Judge

Paper • 2507.08794 • Published Jul 11 • 31

upvoted a paper about 1 month ago

One Token to Fool LLM-as-a-Judge

Paper • 2507.08794 • Published Jul 11 • 31

commented on a paper about 1 month ago

One Token to Fool LLM-as-a-Judge

Paper • 2507.08794 • Published Jul 11 • 31

liked a dataset about 1 month ago

sarosavo/Master-RM

Viewer • Updated Jul 15 • 180k • 269 • 7

liked a model about 1 month ago

sarosavo/Master-RM

Text Classification • 8B • Updated Jul 15 • 415 • 14

authored a paper 5 months ago

Expanding RL with Verifiable Rewards Across Diverse Domains

Paper • 2503.23829 • Published Mar 31 • 24

upvoted a paper 5 months ago

Expanding RL with Verifiable Rewards Across Diverse Domains

Paper • 2503.23829 • Published Mar 31 • 24

upvoted a collection 5 months ago

RLVR

Collection

Model and data for 'Expanding RL with Verifiable Rewards Across Diverse Domains' • 3 items • Updated Mar 31 • 13

authored 3 papers 5 months ago

Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs

Paper • 2412.21187 • Published Dec 30, 2024 • 42

OpenCharacter: Training Customizable Role-Playing LLMs with Large-Scale Synthetic Personas

Paper • 2501.15427 • Published Jan 26 • 6

Improving LLM General Preference Alignment via Optimistic Online Mirror Descent

Paper • 2502.16852 • Published Feb 24

upvoted a paper 7 months ago

Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

Paper • 2501.18585 • Published Jan 30 • 61

authored a paper 7 months ago

Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

Paper • 2501.18585 • Published Jan 30 • 61

upvoted a paper 8 months ago

Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs

Paper • 2412.21187 • Published Dec 30, 2024 • 42

upvoted a paper 11 months ago

DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search

Paper • 2410.03864 • Published Oct 4, 2024 • 12

liked a model about 1 year ago

deepseek-ai/DeepSeek-Prover-V1.5-RL

7B • Updated Aug 29, 2024 • 6.1k • 65

upvoted a collection about 1 year ago

Reinforcement Learning (RL / RLHF)

Collection

19 items • Updated Oct 22, 2024 • 1

authored a paper about 1 year ago

Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning

Paper • 2407.00617 • Published Jun 30, 2024 • 7