
OctoThinker/Llama_32_3B_finemath_4p_bs4M_seq8k_20B
Text Generation
What makes a base language model suitable for RL? Through controlled experiments, we identify the key factors and then leverage them to scale up mid-training.