R1-GRPO-Math-Python-Code-Experiments

updated May 11

LoRA and full fine-tuning experiments on R1 distills to generate Python code for math problems

  • 0-hero/r1-7B-grpo-v3.3-epoch-3

    8B • Updated Mar 28 • 1

  • 0-hero/r1-7B-grpo-v3.3-epoch-2

    8B • Updated Mar 28 • 2

  • 0-hero/r1-7B-grpo-v3.3-epoch-1

    8B • Updated Mar 28 • 1

  • 0-hero/r1-7B-grpo-v3.2-epoch-1

    8B • Updated Mar 27 • 1

  • 0-hero/r1-7B-grpo-v3.2-epoch-2

    8B • Updated Mar 27 • 1

  • 0-hero/r1-14B-grpo-v3.1-epoch-2

    15B • Updated Mar 26 • 1

  • 0-hero/r1-14B-grpo-v3.1-epoch-1

    15B • Updated Mar 26 • 1

  • 0-hero/r1-7B-grpo-v3.1-epoch-3

    8B • Updated Mar 24 • 1

  • 0-hero/r1-7B-grpo-v3.1-epoch-2

    8B • Updated Mar 24 • 1

  • 0-hero/r1-7B-grpo-v2-temp-1.0-60

    8B • Updated Mar 23 • 2

  • 0-hero/r1-14B-math-grpo-165

    15B • Updated Mar 12 • 1

  • 0-hero/r1-14B-math-grpo-80

    15B • Updated Mar 11 • 1

  • 0-hero/r1-7B-grpo-850

    8B • Updated Mar 10 • 2

  • 0-hero/r1-7B-grpo-710

    8B • Updated Mar 10 • 1

  • 0-hero/r1-7B-grpo-610

    8B • Updated Mar 10 • 1

  • 0-hero/r1-7B-grpo-80

    8B • Updated Mar 10 • 1

  • 0-hero/R1-7B-MATH-GRPO-FULL

    8B • Updated Mar 9 • 1

  • 0-hero/R1-14B-GRPO

    15B • Updated Mar 8 • 3

  • 0-hero/r1-7b-grpo-full

    8B • Updated Mar 6 • 2

  • 0-hero/r1-8b-grpo-full

    Updated Mar 6