Mid-training Analysis Checkpoints (Llama-3.2-3B)

updated Jul 7

What makes a base language model suitable for RL? Through controlled experiments, we identify the key factors and then leverage them to scale up mid-training.

  • OctoThinker/Llama_32_3B_finemath_4p_bs4M_seq8k_20B

    Text Generation • Updated Jul 7

  • OctoThinker/Llama_32_3B_megamath_web_pro_bs4M_seq8k_20B

    Text Generation • Updated Jul 7

  • OctoThinker/Llama_32_3B_megamath_web_pro_max_bs4M_seq8k_20B

    Text Generation • Updated Jul 7

  • OctoThinker/Llama_32_3B_megamath_web_pro_megamath_synth_qa_31_bs4M_seq8k_20B

    Updated Jul 3

  • OctoThinker/Llama_32_3B_megamath_web_pro_megamath_synth_qa_91_bs4M_seq8k_20B

    Text Generation • Updated Jul 7

  • OctoThinker/Llama_32_3B_megamath_web_pro_megamath_synth_qa_general_ins_89_10_1_bs4M_seq8k_20B

    Text Generation • Updated Jul 7

  • OctoThinker/Llama_32_3B_megamath_web_pro_open_r1_longcot_91_bs4M_seq8k_20B

    Text Generation • Updated Jul 7

  • OctoThinker/Llama_32_3B_megamath_web_pro_open_r1_longcot_general_ins_89_10_1_bs4M_seq8k_20B

    Text Generation • Updated Jul 7

  • OctoThinker/Llama_32_3B_megamath_web_pro_open_r1_longcot_general_ins_89_10_1_bs4M_seq16k_20B

    Updated Jul 3

  • OctoThinker/Llama_32_3B_megamath_web_pro_max_bs4M_seq8k_100B

    Text Generation • Updated Jul 7
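
The checkpoint names above appear to follow a shared pattern, so a small parser can help when comparing runs. The sketch below assumes, without confirmation from this page, that the segment after `Llama_32_3B` names the mid-training data mix, that `bs4M` is a batch size, that `seq8k` or `seq16k` is a sequence length, and that the trailing `20B` or `100B` is a training token budget. The function `parse_checkpoint_name` and its field names are hypothetical and used only for illustration.

```python
import re

# Assumed layout: <org>/<base model>_<data mix>_bs<batch>_seq<length>_<tokens>.
# The field meanings are inferred from the names, not documented on this page.
NAME_PATTERN = re.compile(
    r"^(?P<org>[^/]+)/(?P<base>Llama_32_3B)_(?P<data_mix>.+)"
    r"_bs(?P<batch>\d+[A-Za-z])_seq(?P<seq_len>\d+k)_(?P<tokens>\d+B)$"
)


def parse_checkpoint_name(repo_id: str) -> dict[str, str]:
    """Split a repository name into its assumed components."""
    match = NAME_PATTERN.match(repo_id)
    if match is None:
        raise ValueError(f"unrecognized checkpoint name: {repo_id}")
    return match.groupdict()


if __name__ == "__main__":
    # Names copied verbatim from the listing above.
    for repo_id in (
        "OctoThinker/Llama_32_3B_finemath_4p_bs4M_seq8k_20B",
        "OctoThinker/Llama_32_3B_megamath_web_pro_max_bs4M_seq8k_100B",
    ):
        print(parse_checkpoint_name(repo_id))
```

Grouping the parsed results by `data_mix` or `tokens` is one way to line up checkpoints that differ in a single factor, if the assumed reading of the names holds.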