---
base_model: ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
- grpo
- test-time-reinforcement-learning
license: llama3
language:
- en
---
# LLaMA-3-8B-Math-Majority-Vote-GRPO
Metin/LLaMA-3-8B-Math-Majority-Vote-GRPO is a version of ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1 trained with [Test-Time Reinforcement Learning (TTRL)](https://arxiv.org/abs/2504.16084). It was trained on Turkish math word problems using the GRPO method and a majority-vote reward function.
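To make the idea of a majority-vote reward concrete, here is a minimal sketch, not the training code used for this model. It assumes that several completions are sampled for the same problem, that a final answer string has already been extracted from each one (`None` when extraction fails), and that the most common answer serves as the pseudo-label. The function names, the tie-breaking behaviour, and the use of the population standard deviation in the GRPO-style normalisation are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def majority_vote_rewards(answers):
    """Give 1.0 to answers that match the most common answer, else 0.0.

    `None` marks a completion whose answer could not be extracted; it
    never counts as a vote and always receives 0.0.
    """
    votes = Counter(a for a in answers if a is not None)
    if not votes:
        return [0.0] * len(answers)
    # On a tie, Counter.most_common keeps the answer that appeared first.
    pseudo_label, _ = votes.most_common(1)[0]
    return [1.0 if a == pseudo_label else 0.0 for a in answers]


def group_relative_advantages(rewards, eps=1e-6):
    """Normalise rewards within one group of samples (GRPO-style)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical answers extracted from six sampled completions
sampled_answers = ["26.620", "26.620", "26.000", None, "26.620", "24.200"]
rewards = majority_vote_rewards(sampled_answers)
advantages = group_relative_advantages(rewards)
print(rewards)
print(advantages)
```

No ground-truth answer appears anywhere in this sketch: completions that agree with the majority receive a positive advantage and the others a negative one.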
## Training Info
- **Base Model**: [Turkish-Llama-8b-DPO-v0.1](https://huggingface.co/ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1)
- **Training Data**: 2,000 open-ended math word problems. No proprietary data was included.
- **Training Time**: 13 hours on a single L40S
- **LoRA Configs**:
  - lora_r: 16
  - lora_alpha: 16
  - lora_dropout: 0
  - lora_target_linear: true
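As a rough illustration of what these LoRA settings imply, the sketch below counts the adapter parameters and computes the LoRA scaling factor. It assumes the standard Llama-3-8B layer shapes (hidden size 4096, MLP size 14336, key/value projection width 1024, 32 decoder layers), which are not stated in this card, and it assumes that `lora_target_linear: true` targets the seven linear projections in each decoder block but not the output head.

```python
# Values taken from the LoRA configs listed above
LORA_R = 16
LORA_ALPHA = 16

# Assumed Llama-3-8B architecture (not stated in this card)
NUM_LAYERS = 32
LINEAR_SHAPES = {  # (input_features, output_features) per decoder block
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}


def lora_params(d_in, d_out, r):
    """LoRA adds A (r x d_in) and B (d_out x r) next to each frozen weight."""
    return r * (d_in + d_out)


per_layer = sum(lora_params(i, o, LORA_R) for i, o in LINEAR_SHAPES.values())
total_trainable = per_layer * NUM_LAYERS
scaling = LORA_ALPHA / LORA_R  # factor applied to the low-rank update

print(f"{total_trainable:,}")  # 41,943,040
print(scaling)                 # 1.0
```

Under these assumptions the adapters hold about 42 million trainable parameters, a small fraction of the 8B base model.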
The goal was to train a model that reasons before giving its answer, without using any labels or ground-truth answers during training. It uses the template below:
```xml
...
```
For more information, see [my blog post](https://metinusta.github.io/post.html?slug=test-time-reinforcement-learning) about this model.
## How to use
1. Install vLLM
```bash
pip install vllm
```
2. Run inference
```python
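from vllm import LLM, SamplingParams

# Load the model from the Hugging Face Hub
llm = LLM(model="Metin/LLaMA-3-8B-Math-Majority-Vote-GRPO")

# max_tokens is an assumed value; vLLM's default limit of 16 tokens
# would cut the reasoning short
sampling_params = SamplingParams(temperature=0.5, max_tokens=1024)

# The prompt is in Turkish: think about the problem, write the reasoning and
# the result inside the template's tags, and format the result with "." as
# the thousands separator and "," as the decimal separator (e.g. 1.450,02)
SYSTEM_PROMPT = """
Sana verilen matematik problemi hakkında düşün ve çözümü bul.
Düşüncelerini ve arasına yaz.
Sonucu ise ve arasına yaz. Sonucu yazarken sadece rakamları, noktayı ve virgülü kullan. Noktayı binlik ayracı, virgülü ise ondalık ayracı olarak kullanmalısın. Örnek: 1.450,02
"""

conversation = [
    {
        "role": "system",
        "content": SYSTEM_PROMPT
    },
    {
        "role": "user",
        # "The population is 20,000 and grows by 10% every year.
        # What will the population be after three years?"
        "content": "Nüfus 20.000'dir. Nüfus her yıl %10 artmaktadır. Buna göre üç yıl sonra nüfus kaç olur?"
    }
]

outputs = llm.chat(
    conversation,
    sampling_params=sampling_params,
    use_tqdm=False
)

# The completion is plain text that follows the template above
result = outputs[0].outputs[0].text
print(result)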
```
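The system prompt asks for the result in Turkish number formatting, with `.` as the thousands separator and `,` as the decimal separator. The sketch below shows one way to turn such a string into a number once the answer has been extracted from the completion. The helper name `parse_turkish_number` is hypothetical and not part of this model or of vLLM. The example also works out the expected answer to the sample question, 20,000 × 1.1³ = 26,620.

```python
from decimal import Decimal


def parse_turkish_number(text):
    """Convert a string such as '1.450,02' to a Decimal."""
    # Drop thousands separators, then switch the decimal comma to a point.
    cleaned = text.strip().replace(".", "").replace(",", ".")
    return Decimal(cleaned)


# Expected answer to the sample question: 10% growth per year for three years
expected = Decimal("20000") * Decimal("1.1") ** 3

print(parse_turkish_number("1.450,02"))            # 1450.02
print(parse_turkish_number("26.620") == expected)  # True
```

`Decimal` is used here rather than `float` so that the comparison with the expected value is exact.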
## Citation
```
@article{Metin,
title={Metin/LLaMA-3-8B-Math-Majority-Vote-GRPO},
author={Metin Usta},
year={2024},
url={https://huggingface.co/Metin/LLaMA-3-8B-Math-Majority-Vote-GRPO}
}
```