Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective
Fan Zhou
koalazf99
AI & ML interests
Deep Learning; Natural Language Processing; Foundation Models
🐙 OctoThinker
Mid-training Incentivizes Reinforcement Learning Scaling
OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling
Paper • 2506.20512
OctoThinker/MegaMath-Web-Pro-Max
Dataset • 69.2M rows
OctoThinker/OctoThinker-8B-Long-Base
Text Generation • 8B
OctoThinker/OctoThinker-8B-Hybrid-Base
Text Generation • 8B
🫐 ProX Projects
Collection for: "Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale"
🧙 Guru
Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective
💎 MegaMath
An Open Math Pre-training Dataset with 370B Tokens.