# LLaMA-3.2-1B-Instruct Post-trained with GRPO (from DeepSeek)

This model is a version of LLaMA-3.2-1B-Instruct post-trained with GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced by DeepSeek, on the GSM8K math dataset.
## Model Details

- Base Model: LLaMA-3.2-1B-Instruct
- Training Data: `openai/gsm8k` (see the GRPO reproduction sketch after this list)
- Post-training Steps: 1000
- Checkpoint: `checkpoint-1000/`
- Framework: Hugging Face `transformers`
- Usage: mathematical reasoning
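The card does not publish the training recipe. As a rough illustration of how a comparable run could be set up, here is a minimal, hypothetical sketch using the `GRPOTrainer` from Hugging Face `trl`; the reward function, hyperparameters, and base-model id are assumptions, not the author's actual configuration (only the dataset and the 1000-step budget come from the card).

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# GSM8K gold answers end with "#### <number>"; keep the question as the
# prompt and the final number as a separate column for the reward function.
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(
    lambda x: {"prompt": x["question"], "gold": x["answer"].split("####")[-1].strip()}
)

def correctness_reward(completions, gold, **kwargs):
    # Hypothetical reward: 1.0 if the gold answer appears in the completion.
    # trl passes extra dataset columns (here `gold`) as keyword arguments.
    return [1.0 if g in c else 0.0 for c, g in zip(completions, gold)]

training_args = GRPOConfig(output_dir="Llama-3.2-1B-GRPO-gsm8k", max_steps=1000)
trainer = GRPOTrainer(
    model="meta-llama/Llama-3.2-1B-Instruct",  # assumed base checkpoint
    reward_funcs=correctness_reward,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

GRPO samples several completions per prompt and uses the reward differences within each group as the advantage signal, which is why a single scalar reward function is all the trainer needs.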
## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "accuracy-maker/Llama-3.2-1B-GRPO-gsm8k"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

input_text = "What is the capital of France?"

# Build the chat prompt with the model's chat template and generate a reply.
messages = [{"role": "user", "content": input_text}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
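The original snippet calls a `generate_with_stream` helper that is not defined in the card. A minimal sketch of what such a helper could look like, assuming the standard `transformers` `TextStreamer` API and the `model`, `tokenizer`, and `input_text` defined above:

```python
from transformers import TextStreamer

def generate_with_stream(prompt: str, max_new_tokens: int = 256) -> None:
    """Stream generated tokens to stdout as they are produced."""
    messages = [{"role": "user", "content": prompt}]
    input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
    streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    model.generate(input_ids, streamer=streamer, max_new_tokens=max_new_tokens)

generate_with_stream(input_text)
```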