Metin/LLaMA-3-8B-Math-Majority-Vote-GRPO
Tags: Text Generation, Transformers, Safetensors, English, llama, text-generation-inference, unsloth, trl, grpo, test-time-reinforcement-learning, conversational
arXiv: 2504.16084
License: llama3
Commit History (branch: main)
285253d - Update README.md (verified; Metin committed on May 18)
9885575 - Update README.md (verified; Metin committed on May 18)
fcb5e07 - Update README.md (verified; Metin committed on May 18)
5f99087 - Update README.md (verified; Metin committed on May 18)
0a728d9 - Upload llama_clones.png (verified; Metin committed on May 18)
bb87305 - Update README.md (verified; Metin committed on May 18)
5bcefe6 - Trained with Unsloth (verified; Metin committed on May 14)
52635a2 - Upload tokenizer (verified; Metin committed on May 14)
7eef5be - Upload README.md with huggingface_hub (verified; Metin committed on May 14)
5b1ba9e - initial commit (verified; Metin committed on May 14)