---
license: apache-2.0
datasets:
- RefinedNeuro/Qwen3-Reasoning-Distill-Q-A-Dataset
language:
- tr
- en
base_model:
- ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1
library_name: transformers
tags:
- unsloth
- llama
- trl
- reasoning
- billingual
- stem
---


## Model Overview

**RefinedNeuro/RN_TR_R2** is a Turkish‐language reasoning model fine-tuned from
`ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1` using the GRPO method on distilled Q&A data of Qwen3 model. It excels at open-ended reasoning tasks in Turkish, particularly STEM, history, and cultural questions.

* **🌐 Available on:**: [https://ollama.com/RefinedNeuro/RN\_TR\_R2](https://ollama.com/RefinedNeuro/RN_TR_R2)
  
---

## Evaluation

![image/png](https://cdn-uploads.huggingface.co/production/uploads/680e40dae19765db2aeb9d80/hyOZSykSa609rDDzPpml2.png)

Benchmark: [RN\_TR\_R2\_Benchmark\_Results](https://huggingface.co/datasets/RefinedNeuro/RN_TR_R2_Benchmark_Results)
(Open-ended Turkish culture & reasoning questions)

| Model Name                              | Score (%) | Description                                                 |
| --------------------------------------- | --------- | ----------------------------------------------------------- |
| **RefinedNeuro/RN_TR_R2**             | **82.4**  | Upgraded from RN_TR_R1 via GRPO on distilled Qwen3 data   |
| Qwen3-8B                                | 76.8      | Off-the-shelf Qwen3-8B                                      |
| RefinedNeuro/RN\_TR\_R1                 | 64.8      | Baseline Turkish-fine-tuned model                           |
| ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1 | 63.2      | Turkish-adapted Llama-8B via direct preference optimization |
| Meta-Llama-3.1-8B-Instruct              | 63.2      | Meta’s Llama-3.1-8B with instruct tuning                    |

RN_TR_R2 achieves a **17.6 pp** improvement over the RN_TR_R1 baseline.

---

## Intended Use

* **Primary use cases**: Question answering, step-by-step reasoning, educational tools in Turkish (math, physics, chemistry, biology, geometry, trigonometry, statistics, history, culture).
* **Not recommended**: Generating creative fiction or tasks requiring memorized facts outside its training scope.
* **Known limitations**: While no systematic failure modes are identified, RN_TR_R2 may hallucinate like all LLMs.

---

## Training Data

* **Base model**: `ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1`
* **Fine-tuning dataset**: [RefinedNeuro/Qwen3-Reasoning-Distill-Q-A-Dataset](https://huggingface.co/datasets/RefinedNeuro/Qwen3-Reasoning-Distill-Q-A-Dataset)

  * **Domains**: Mathematics, Physics, Chemistry, Biology, Geometry, Trigonometry, Statistics
  * **Grade levels**: 6th–12th grade
  * **Prompt variations** (e.g., long/detailed vs. short/easy; answer must be a single number)
  * **Size**: 13 000 examples (\~9 million tokens)
  * **License**: Creative Commons

---

## Training Configuration

* **Method**: GRPO (Generalized Reinforcement-Preference Optimization) via `unsloth`
* **Hardware**: 2× RTX 3090 GPUs for 1 week (\~336 GPU-hours)
* **Batch size**: 1
* **Context length**: 4 096 tokens (max\_seq\_length=8 192 at inference)
* **Optimizer & scheduler**: Cosine LR schedule, peak LR=5×10⁻⁶
* **Library**: `unsloth==2025.3.18`

---

## Usage Example

```bash
pip install unsloth==2025.3.18
```

```python
from unsloth import FastLanguageModel
from transformers import TextStreamer

# Load the model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="RefinedNeuro/RN_TR_R2",
    max_seq_length=8192,
    load_in_4bit=True,
)

FastLanguageModel.for_inference(model)

SYSTEM_PROMPT = """
Sana sorulan sorulara cevap verirken, adım adım düşün ve gerekçelendir.
Önce <think></think> tagları arasında adım adım düşün.
Sonra düşünme adımlarını özetle.
En son \\boxed{} içerisinde sorunun cevabını ver.
"""

def answer_question(question: str):
    messages = [
        {'role': 'system', 'content': SYSTEM_PROMPT},
        {'role': 'user',   'content': question},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
    ).to("cuda")

    streamer = TextStreamer(tokenizer)
    print("-" * 60)
    _ = model.generate(
        input_ids=inputs,
        streamer=streamer,
        max_new_tokens=2048,
        temperature=0.6,
        top_p=0.1,
    )

# Examples
answer_question("Türkiye'nin başkenti neredir?")
answer_question("6 ve 2'nin çarpımı nedir?")
```

---

## Sample Q\&A

```
> **Q:** f(x) = 3x⁴ - 2x³ + 5x - 7 fonksiyonunun x = 1 noktasındaki türevini bulunuz.
>
> **A:** <think>

1. Adım: f(x) = 3x⁴ - 2x³ + 5x - 7 fonksiyonunun türevini bulmak için, önce f(x)'nin türevini bulmamız gerekmektedir.

2. Adım: Türevi bulmak için fonksiyonu parçalara ayırarak analiz edeceğiz.

f(x) = 3x⁴ - 2x³ + 5x - 7

3. Adım: Türevi bulmak için her terimi ayrı ayrı türevini buluyoruz.

- 3x⁴: Türevi 12x³'dür.
- -2x³: Türevi -6x²'dir.
- 5x: Türevi 5'dir.
- -7: Türevi 0'dır (çünkü sabit bir sayıdır).

4. Adım: Türevi bulmak için elde edilen türevleri birleştiriyoruz.

f'(x) = 12x³ - 6x² + 5

5. Adım: Şimdi x = 1 yerine koyarak f'(x)'nin türevini hesaplayacağız.

f'(1) = 12(1)³ - 6(1)² + 5

f'(1) = 12(1) - 6(1) + 5

f'(1) = 12 - 6 + 5

f'(1) = 11

6. Adım: x = 1 noktasındaki türevi 11 olarak bulduk.

\boxed{11}

</think>

\boxed{11}

```

---

## Citation

If you use this model, please cite:

```bibtex
@misc{farsak2025rn_tr_r2,
  title        = {RefinedNeuro/RN\_TR\_R2: A Turkish Reasoning Model},
  author       = {Mehmet Can Farsak and Serhat Atayeter and RefinedNeuro},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/RefinedNeuro/RN_TR_R2}},
  license      = {Apache-2.0}
}
```

---

## License

Apache 2.0 — the most permissive open-source license.

---

## Authors

* **Authors**: Mehmet Can Farsak, Serhat Atayeter, RefinedNeuro