---
base_model: ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
- grpo
- test-time-reinforcement-learning
license: llama3
language:
- en
---

# LLaMA-3-8B-Math-Majority-Vote-GRPO

Metin/LLaMA-3-8B-Math-Majority-Vote-GRPO is a version of ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1 trained with [Test Time Reinforcement Learning (TTRL)](https://arxiv.org/abs/2504.16084). It was trained on Turkish math word problems using the GRPO method and a majority vote reward function.

## Training Info

- **Base Model**: [Turkish-Llama-8b-DPO-v0.1](https://huggingface.co/ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1)
- **Training Data**: 2,000 open-ended math word problems. No proprietary data was included.
- **Training Time**: 13 hours on a single L40S
- **LoRA Configs**:
  - lora_r: 16
  - lora_alpha: 16
  - lora_dropout: 0
  - lora_target_linear: true

The goal was to train a model that can reason before generating its answer, without using any labels or ground-truth answers. It uses the template below:

```xml
...
```

For more information, please see [my blog post](https://metinusta.github.io/post.html?slug=test-time-reinforcement-learning) about this model.

## How to use

1. Install vLLM

```bash
pip install vllm
```

2. Run inference

```python
from vllm import LLM, SamplingParams

llm = LLM(model="Metin/LLaMA-3-8B-Math-Majority-Vote-GRPO")
sampling_params = SamplingParams(temperature=0.5)

# Turkish system prompt: asks the model to think about the math problem first,
# then write the result using only digits, with "." as the thousands separator
# and "," as the decimal separator (example: 1.450,02).
SYSTEM_PROMPT = """
Sana verilen matematik problemi hakkında düşün ve çözümü bul.
Düşüncelerini ve arasına yaz.
Sonucu ise ve arasına yaz.
Sonucu yazarken sadece rakamları, noktayı ve virgülü kullan.
Noktayı binlik ayracı, virgülü ise ondalık ayracı olarak kullanmalısın.
Örnek: 1.450,02
"""

conversation = [
    {
        "role": "system",
        "content": SYSTEM_PROMPT
    },
    {
        # "The population is 20,000 and grows by 10% every year.
        # What will the population be after three years?"
        "role": "user",
        "content": "Nüfus 20.000'dir. Nüfus her yıl %10 artmaktadır. Buna göre üç yıl sonra nüfus kaç olur?"
    }
]

outputs = llm.chat(
    conversation,
    sampling_params=sampling_params,
    use_tqdm=False
)

# The completion is plain text following the template above, not JSON,
# so print it directly instead of passing it to json.loads.
result = outputs[0].outputs[0].text
print(result)
```

# Citation

```
@article{Metin,
  title={Metin/LLaMA-3-8B-Math-Majority-Vote-GRPO},
  author={Metin Usta},
  year={2024},
  url={https://huggingface.co/Metin/LLaMA-3-8B-Math-Majority-Vote-GRPO}
}
```
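
# Appendix: Majority vote reward sketch

The card says the model was trained with a majority vote reward and no ground-truth labels. The sketch below illustrates that general idea and is not the training code used for this model. It assumes the final answer string has already been extracted from each sampled completion, and that answers follow the number format requested in the system prompt ("." for thousands, "," for decimals). The function names `parse_turkish_number` and `majority_vote_rewards` are hypothetical.

```python
import re
from collections import Counter
from decimal import Decimal
from typing import List, Optional

# Digits with optional "."-separated thousands groups and an optional ","
# decimal part, e.g. "1.450,02" or "26620".
_NUMBER_RE = re.compile(r"-?\d+(?:\.\d{3})*(?:,\d+)?")


def parse_turkish_number(text: str) -> Optional[Decimal]:
    """Return the numeric value of a Turkish-formatted number, or None if malformed."""
    cleaned = text.strip()
    if not _NUMBER_RE.fullmatch(cleaned):
        return None
    # Drop thousands separators, then switch the decimal comma to a point.
    return Decimal(cleaned.replace(".", "").replace(",", "."))


def majority_vote_rewards(answers: List[str]) -> List[float]:
    """Give reward 1.0 to answers that match the most common parsed value.

    The majority answer acts as a pseudo-label, so no ground truth is needed.
    Unparseable answers always receive 0.0. Ties are broken in favour of the
    value that appears first in the list.
    """
    parsed = [parse_turkish_number(a) for a in answers]
    valid = [p for p in parsed if p is not None]
    if not valid:
        return [0.0] * len(answers)
    majority, _count = Counter(valid).most_common(1)[0]
    return [1.0 if p is not None and p == majority else 0.0 for p in parsed]


if __name__ == "__main__":
    # Four hypothetical sampled answers to the population problem above
    # (20.000 growing by 10% for three years gives 26.620).
    samples = ["26.620", "26.620,00", "26.000", "yok"]
    print(majority_vote_rewards(samples))
```

Comparing parsed `Decimal` values rather than raw strings lets "26.620" and "26.620,00" count as the same vote. In a GRPO setup, rewards like these would be computed per group of completions sampled for the same prompt.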