🧙 Guru: Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective
Paper • 2506.14965 • Published Jun 17 • 49
LLM360/guru-RL-92k • Viewer • Updated 18 days ago • 91.9k • 1.01k • 19
LLM360/guru-7B • Text Generation • 8B • Updated Jun 19 • 1.19k • 1
LLM360/guru-32B • Text Generation • 33B • Updated Jun 19 • 20
🐙 OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling
Paper • 2506.20512 • Published Jun 25 • 46
OctoThinker/MegaMath-Web-Pro-Max • Viewer • Updated Jul 6 • 69.2M • 8.49k • 35
OctoThinker/OctoThinker-8B-Long-Base • Text Generation • 8B • Updated Jul 6 • 15
OctoThinker/OctoThinker-8B-Hybrid-Base • Text Generation • 8B • Updated Jul 6 • 143 • 2