Haitao Mi

haitaominlp

https://scholar.google.com.sg/citations?user=G3OMbFSm858C&hl=en

AI & ML interests

Large Language Models

Recent Activity

upvoted a paper about 13 hours ago

Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training

Paper • 2508.00414 • Published 4 days ago • 51

upvoted 2 papers 20 days ago

Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination

Paper • 2507.10532 • Published 22 days ago • 83

One Token to Fool LLM-as-a-Judge

Paper • 2507.08794 • Published 25 days ago • 31

upvoted a paper 26 days ago

Towards Solving More Challenging IMO Problems via Decoupled Reasoning and Proving

Paper • 2507.06804 • Published 28 days ago • 15

upvoted a paper 3 months ago

Group Think: Multiple Concurrent Reasoning Agents Collaborating at Token Level Granularity

Paper • 2505.11107 • Published May 16 • 29

authored a paper 3 months ago

MPS-Prover: Advancing Stepwise Theorem Proving by Multi-Perspective Search and Data Curation

Paper • 2505.10962 • Published May 16 • 8

upvoted 2 papers 3 months ago

MPS-Prover: Advancing Stepwise Theorem Proving by Multi-Perspective Search and Data Curation

Paper • 2505.10962 • Published May 16 • 8

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Paper • 2504.13837 • Published Apr 18 • 134

liked a dataset 4 months ago

zwhe99/DeepMath-103K

Viewer • Updated May 29 • 103k • 4.55k • 207

upvoted 2 papers 4 months ago

DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning

Paper • 2504.11456 • Published Apr 15 • 13

Expanding RL with Verifiable Rewards Across Diverse Domains

Paper • 2503.23829 • Published Mar 31 • 24

liked a dataset 4 months ago

virtuoussy/Math-RLVR

Viewer • Updated Apr 16 • 782k • 71 • 9

authored a paper 4 months ago

Expanding RL with Verifiable Rewards Across Diverse Domains

Paper • 2503.23829 • Published Mar 31 • 24

liked a Space 6 months ago

2.94k

The Ultra-Scale Playbook

🌌

The ultimate guide to training LLM on large GPU Clusters

authored a paper 6 months ago

Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

Paper • 2501.18585 • Published Jan 30 • 61

authored a paper 10 months ago

Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning

Paper • 2410.06508 • Published Oct 9, 2024 • 11

upvoted 2 papers 10 months ago

Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning

Paper • 2410.06508 • Published Oct 9, 2024 • 11

DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search

Paper • 2410.03864 • Published Oct 4, 2024 • 12

authored a paper 10 months ago

HDFlow: Enhancing LLM Complex Problem-Solving with Hybrid Thinking and Dynamic Workflows

Paper • 2409.17433 • Published Sep 25, 2024 • 9

upvoted a paper 10 months ago