Collections

Collections including paper arxiv:2506.20512
VisionLM
Collection by about 18 hours ago
🐙 OctoThinker
Mid-training Incentivizes Reinforcement Learning Scaling
Psychology
Collection by 16 days ago
OctoThinker-Llama-8B Family
What makes a base language model suitable for RL? Through controlled experiments, we identify key factors then leverage them to scale up mid-training.