OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling Paper • 2506.20512 • Published 8 days ago • 42
SWE-Factory: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks Paper • 2506.10954 • Published 21 days ago • 51
Learn to Reason Efficiently with Adaptive Length-based Reward Shaping Paper • 2505.15612 • Published May 21 • 33
Article wHy DoNt YoU jUsT uSe ThE lLaMa ToKeNiZeR?? By catherinearnett • Sep 27, 2024 • 46
Article I trained a Language Model to schedule events with GRPO! By anakin87 • Apr 29 • 80
ReadyAi/5000-podcast-conversations-with-metadata-and-embedding-dataset • Updated 13 days ago • 11.9k • 7.83k • 7