# LLaMA-3.2-1B-Instruct Post-trained with GRPO (from DeepSeek)

This model is a version of LLaMA-3.2-1B-Instruct post-trained with GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced by DeepSeek, on the GSM8K math dataset.
## Model Details

- Base Model: LLaMA-3.2-1B-Instruct
- Training Data: `openai/gsm8k` (see the GRPO reproduction sketch after this list)
- Post-training Steps: 1000
- Checkpoint: `checkpoint-1000/`
- Framework: Hugging Face `transformers`
- Usage: mathematical reasoning
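The card does not publish the training recipe. As a rough illustration of how a comparable run could be set up, here is a minimal, hypothetical sketch using the `GRPOTrainer` from Hugging Face `trl`; the reward function, hyperparameters, and base-model id are assumptions, not the author's actual configuration (only the dataset and the 1000-step budget come from the card).

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# GSM8K gold answers end with "#### <number>"; keep the question as the
# prompt and the final number as a separate column for the reward function.
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(
    lambda x: {"prompt": x["question"], "gold": x["answer"].split("####")[-1].strip()}
)

def correctness_reward(completions, gold, **kwargs):
    # Hypothetical reward: 1.0 if the gold answer appears in the completion.
    # trl passes extra dataset columns (here `gold`) as keyword arguments.
    return [1.0 if g in c else 0.0 for c, g in zip(completions, gold)]

training_args = GRPOConfig(output_dir="Llama-3.2-1B-GRPO-gsm8k", max_steps=1000)
trainer = GRPOTrainer(
    model="meta-llama/Llama-3.2-1B-Instruct",  # assumed base checkpoint
    reward_funcs=correctness_reward,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

GRPO samples several completions per prompt and uses the reward differences within each group as the advantage signal, which is why a single scalar reward function is all the trainer needs.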
## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "accuracy-maker/Llama-3.2-1B-GRPO-gsm8k"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

input_text = "What is the capital of France?"

# Build the chat prompt with the model's chat template and generate a reply.
messages = [{"role": "user", "content": input_text}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
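The original snippet calls a `generate_with_stream` helper that is not defined in the card. A minimal sketch of what such a helper could look like, assuming the standard `transformers` `TextStreamer` API and the `model`, `tokenizer`, and `input_text` defined above:

```python
from transformers import TextStreamer

def generate_with_stream(prompt: str, max_new_tokens: int = 256) -> None:
    """Stream generated tokens to stdout as they are produced."""
    messages = [{"role": "user", "content": prompt}]
    input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
    streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    model.generate(input_ids, streamer=streamer, max_new_tokens=max_new_tokens)

generate_with_stream(input_text)
```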