Mid-training Analysis Checkpoints (Llama-3.2-3B)

OctoThinker's Collections

updated Jul 7

What makes a base language model suitable for RL? Through controlled experiments, we identify the key factors and then leverage them to scale up mid-training.

OctoThinker/Llama_32_3B_finemath_4p_bs4M_seq8k_20B
Text Generation • Updated Jul 7

OctoThinker/Llama_32_3B_megamath_web_pro_bs4M_seq8k_20B
Text Generation • Updated Jul 7

OctoThinker/Llama_32_3B_megamath_web_pro_max_bs4M_seq8k_20B
Text Generation • Updated Jul 7

OctoThinker/Llama_32_3B_megamath_web_pro_megamath_synth_qa_31_bs4M_seq8k_20B
Updated Jul 3

OctoThinker/Llama_32_3B_megamath_web_pro_megamath_synth_qa_91_bs4M_seq8k_20B
Text Generation • Updated Jul 7

OctoThinker/Llama_32_3B_megamath_web_pro_megamath_synth_qa_general_ins_89_10_1_bs4M_seq8k_20B
Text Generation • Updated Jul 7

OctoThinker/Llama_32_3B_megamath_web_pro_open_r1_longcot_91_bs4M_seq8k_20B
Text Generation • Updated Jul 7

OctoThinker/Llama_32_3B_megamath_web_pro_open_r1_longcot_general_ins_89_10_1_bs4M_seq8k_20B
Text Generation • Updated Jul 7

OctoThinker/Llama_32_3B_megamath_web_pro_open_r1_longcot_general_ins_89_10_1_bs4M_seq16k_20B
Updated Jul 3

OctoThinker/Llama_32_3B_megamath_web_pro_max_bs4M_seq8k_100B
Text Generation • Updated Jul 7
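
The checkpoint names above share a visible pattern: a base-model prefix, a data-mixture label, and three trailing fields (`bs…`, `seq…`, and a final `…B`). The sketch below splits a repository id into those parts so the checkpoints can be grouped or compared programmatically. The field meanings are assumptions, since this page does not define them: `bs` is read as batch size, `seq` as sequence length, and the final field as the training token budget. The helper `parse_checkpoint` and the `CheckpointName` type are hypothetical names introduced only for illustration.

```python
import re
from typing import NamedTuple


class CheckpointName(NamedTuple):
    """Fields recovered from a checkpoint name (meanings are assumed)."""
    base: str      # base-model prefix, e.g. "Llama_32_3B"
    data_mix: str  # data-mixture label, e.g. "megamath_web_pro_max"
    batch: str     # value after "bs", assumed to be the batch size
    seq_len: str   # value after "seq", assumed to be the sequence length
    budget: str    # final field, assumed to be the token budget


# Pattern inferred from the names listed in this collection only.
_NAME = re.compile(
    r"^(?P<base>Llama_32_3B)_"
    r"(?P<data_mix>.+)_"
    r"bs(?P<batch>\d+M)_"
    r"seq(?P<seq_len>\d+k)_"
    r"(?P<budget>\d+B)$"
)


def parse_checkpoint(repo_id: str) -> CheckpointName:
    """Split a repo id such as 'OctoThinker/<name>' into its name fields."""
    name = repo_id.split("/")[-1]  # drop the organization prefix, if any
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized checkpoint name: {repo_id!r}")
    return CheckpointName(**match.groupdict())
```

Calling `parse_checkpoint` on each repository id in the list makes it easy to, for example, select every checkpoint that shares one data mixture but differs in the trailing fields. A name that does not fit the inferred pattern raises `ValueError` rather than being silently misparsed.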