Bingxiang He

hbx

https://hbx-hbx.github.io/

AI & ML interests

NLP

Recent Activity

upvoted a paper 6 days ago

SSRL: Self-Search Reinforcement Learning

upvoted a paper 12 days ago

UserBench: An Interactive Gym Environment for User-Centric Agents

liked a model 18 days ago

openbmb/MiniCPM-V-4

Organizations

None yet

upvoted a paper 6 days ago

SSRL: Self-Search Reinforcement Learning

Paper • 2508.10874 • Published 10 days ago • 87

upvoted a paper 12 days ago

UserBench: An Interactive Gym Environment for User-Centric Agents

Paper • 2507.22034 • Published 26 days ago • 29

liked a model 18 days ago

openbmb/MiniCPM-V-4

Image-Text-to-Text • 4B • Updated 12 days ago • 11k • 456

upvoted a paper 3 months ago

MiniCPM4: Ultra-Efficient LLMs on End Devices

Paper • 2506.07900 • Published Jun 9 • 90

upvoted a collection 3 months ago

MiniCPM4

Collection

MiniCPM4: Ultra-Efficient LLMs on End Devices • 22 items • Updated 17 days ago • 72

upvoted a paper 3 months ago

The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models

Paper • 2505.22617 • Published May 28 • 129

upvoted 2 papers 4 months ago

TTRL: Test-Time Reinforcement Learning

Paper • 2504.16084 • Published Apr 22 • 120

ToolRL: Reward is All Tool Learning Needs

Paper • 2504.13958 • Published Apr 16 • 46

authored a paper 7 months ago

Process Reinforcement through Implicit Rewards

Paper • 2502.01456 • Published Feb 3 • 62

upvoted a paper 7 months ago

Process Reinforcement through Implicit Rewards

Paper • 2502.01456 • Published Feb 3 • 62

liked a model 8 months ago

PRIME-RL/Eurus-2-7B-PRIME

Text Generation • 8B • Updated Feb 19 • 1.11k • 62

upvoted an article 8 months ago

Process Reinforcement through Implicit Rewards

Article • and 1 other • Jan 3 • 29

upvoted a paper 9 months ago

Free Process Rewards without Process Labels

Paper • 2412.01981 • Published Dec 2, 2024 • 35

upvoted a collection over 1 year ago

Eurus

Collection

Advancing LLM Reasoning Generalists with Preference Trees • 11 items • Updated 17 days ago • 25

updated a dataset over 1 year ago

hbx/IN3

Viewer • Updated Feb 20, 2024 • 1.37k • 30 • 7

updated a model over 1 year ago

hbx/Mistral-Interact

Text Generation • Updated Feb 20, 2024 • 8 • 3

updated a dataset over 1 year ago

hbx/IN3-interaction

Viewer • Updated Feb 20, 2024 • 2.53k • 25 • 3

liked a model over 1 year ago

hbx/Mistral-Interact

Text Generation • Updated Feb 20, 2024 • 8 • 3
