garyzhang

xiaoniqiu

garyzhang99

AI & ML interests

LLM, Agents

Recent Activity

updated a dataset 6 days ago

datajuicer/geometry_sft

Viewer • Updated 6 days ago • 300 • 7

published a dataset 6 days ago

datajuicer/geometry_sft

Viewer • Updated 6 days ago • 300 • 7

upvoted a paper 23 days ago

Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends

Paper • 2509.24203 • Published Sep 29 • 7

upvoted an article about 1 month ago

Gaia2 and ARE: Empowering the community to study agents

Article • Sep 22 • 116

updated a dataset about 1 month ago

datajuicer/Trinity-ToolAce-SFT-split

Viewer • Updated Sep 19 • 498 • 19

published a dataset about 1 month ago

datajuicer/Trinity-ToolAce-SFT-split

Viewer • Updated Sep 19 • 498 • 19

updated a dataset about 1 month ago

datajuicer/Trinity-ToolAce-RL-split

Viewer • Updated Sep 19 • 4.93k • 25

published a dataset about 1 month ago

datajuicer/Trinity-ToolAce-RL-split

Viewer • Updated Sep 19 • 4.93k • 25

commented 2 papers 2 months ago

On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting

Paper • 2508.11408 • Published Aug 15 • 8

On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting

Paper • 2508.11408 • Published Aug 15 • 8

authored a paper 2 months ago

On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting

Paper • 2508.11408 • Published Aug 15 • 8

liked a dataset 2 months ago

Jarrodbarnes/arc-loan-underwriting-trinity-rft-v2

Viewer • Updated Jun 29 • 200 • 32 • 4

upvoted a paper 2 months ago

On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting

Paper • 2508.11408 • Published Aug 15 • 8

upvoted a paper 5 months ago

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

Paper • 2506.01939 • Published Jun 2 • 185

upvoted a paper 7 months ago

Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute

Paper • 2503.23803 • Published Mar 31 • 8

liked 2 datasets 11 months ago

nebius/SWE-agent-trajectories

Viewer • Updated Dec 23, 2024 • 80k • 568 • 65

nebius/SWE-bench-extra

Viewer • Updated May 28 • 6.38k • 109 • 45

upvoted a paper about 1 year ago

Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?

Paper • 2405.05904 • Published May 9, 2024 • 6

upvoted 2 papers over 1 year ago

Very Large-Scale Multi-Agent Simulation in AgentScope

Paper • 2407.17789 • Published Jul 25, 2024 • 33

OLMo: Accelerating the Science of Language Models

Paper • 2402.00838 • Published Feb 1, 2024 • 84
