---
license: apache-2.0
datasets:
- RefinedNeuro/Qwen3-Reasoning-Distill-Q-A-Dataset
language:
- tr
- en
base_model:
- ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1
library_name: transformers
tags:
- unsloth
- llama
- trl
- reasoning
- bilingual
- stem
---
## Model Overview
**RefinedNeuro/RN_TR_R2** is a Turkish-language reasoning model fine-tuned from
`ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1` with GRPO on Q&A data distilled from Qwen3. It excels at open-ended reasoning tasks in Turkish, particularly STEM, history, and culture questions.
* **🌐 Available on:** [https://ollama.com/RefinedNeuro/RN\_TR\_R2](https://ollama.com/RefinedNeuro/RN_TR_R2)
---
## Evaluation

Benchmark: [RN\_TR\_R2\_Benchmark\_Results](https://huggingface.co/datasets/RefinedNeuro/RN_TR_R2_Benchmark_Results)
(Open-ended Turkish culture & reasoning questions)

| Model Name | Score (%) | Description |
| --------------------------------------- | --------- | ------------------------------------------------------------ |
| **RefinedNeuro/RN_TR_R2** | **82.4** | Upgraded from RN_TR_R1 via GRPO on distilled Qwen3 data |
| Qwen3-8B | 76.8 | Off-the-shelf Qwen3-8B |
| RefinedNeuro/RN_TR_R1 | 64.8 | Baseline Turkish fine-tuned model |
| ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1 | 63.2 | Turkish-adapted Llama-8B via direct preference optimization |
| Meta-Llama-3.1-8B-Instruct | 63.2 | Meta's Llama-3.1-8B with instruct tuning |

RN_TR_R2 achieves a **17.6 percentage-point** improvement over the RN_TR_R1 baseline.
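
The card does not spell out the scoring procedure. For the questions whose answer is a single number, one plausible approach is an exact match on the final `\boxed{}` answer; a minimal sketch with hypothetical `completions` and `references` inputs:

```python
import re

def extract_boxed(text: str) -> str | None:
    """Return the contents of the last \\boxed{...} in a completion, if any."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", text)
    return matches[-1].strip() if matches else None

def exact_match_score(completions: list[str], references: list[str]) -> float:
    """Percentage of completions whose boxed answer equals the reference."""
    hits = sum(
        extract_boxed(completion) == reference.strip()
        for completion, reference in zip(completions, references)
    )
    return 100.0 * hits / len(references)

# One of the two sample completions matches its reference -> 50.0
print(exact_match_score(["Adımlar... \\boxed{11}", "\\boxed{7}"], ["11", "8"]))
```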
---
## Intended Use
* **Primary use cases**: Question answering, step-by-step reasoning, educational tools in Turkish (math, physics, chemistry, biology, geometry, trigonometry, statistics, history, culture).
* **Not recommended**: Generating creative fiction or tasks requiring memorized facts outside its training scope.
* **Known limitations**: No systematic failure modes have been identified, but, like all LLMs, RN_TR_R2 may hallucinate.
---
## Training Data
* **Base model**: `ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1`
* **Fine-tuning dataset**: [RefinedNeuro/Qwen3-Reasoning-Distill-Q-A-Dataset](https://huggingface.co/datasets/RefinedNeuro/Qwen3-Reasoning-Distill-Q-A-Dataset) (a loading sketch follows this list)
* **Domains**: Mathematics, Physics, Chemistry, Biology, Geometry, Trigonometry, Statistics
* **Grade levels**: 6th–12th grade
* **Prompt variations**: e.g., long/detailed vs. short/easy phrasings, and constrained formats where the answer must be a single number
* **Size**: 13,000 examples (\~9 million tokens)
* **License**: Creative Commons
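
The dataset is public, so it can be inspected before any training or evaluation run. A minimal sketch using the `datasets` library; the printed output shows the actual column names:

```python
from datasets import load_dataset

# Load the Qwen3 distillation dataset used for GRPO fine-tuning
dataset = load_dataset("RefinedNeuro/Qwen3-Reasoning-Distill-Q-A-Dataset", split="train")

print(dataset)     # column names and row count
print(dataset[0])  # a single Q&A example
```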
---
## Training Configuration
* **Method**: GRPO (Group Relative Policy Optimization) via `unsloth` (a training-setup sketch follows this list)
* **Hardware**: 2× RTX 3090 GPUs for 1 week (\~336 GPU-hours)
* **Batch size**: 1
* **Context length**: 4,096 tokens during training (`max_seq_length=8192` at inference)
* **LR schedule**: cosine, peak LR = 5×10⁻⁶
* **Library**: `unsloth==2025.3.18`
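
The exact training script is not published. The sketch below shows how a comparable GRPO run could be set up with `trl`'s `GRPOTrainer` (a plain-`transformers` stand-in for the authors' `unsloth` setup), reusing the hyperparameters listed above; the `question` field name and the reward function are assumptions for illustration:

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# GRPOTrainer expects a "prompt" column; "question" is an assumed field name.
dataset = load_dataset("RefinedNeuro/Qwen3-Reasoning-Distill-Q-A-Dataset", split="train")
dataset = dataset.map(lambda row: {"prompt": row["question"]})

def reward_has_boxed_answer(completions, **kwargs):
    # Hypothetical reward: favor completions that produce a \boxed{} answer.
    return [1.0 if "\\boxed{" in completion else 0.0 for completion in completions]

config = GRPOConfig(
    output_dir="rn_tr_r2_grpo",
    learning_rate=5e-6,             # peak LR reported above
    lr_scheduler_type="cosine",     # cosine schedule reported above
    per_device_train_batch_size=1,  # batch size reported above
    max_completion_length=2048,
)

trainer = GRPOTrainer(
    model="ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1",  # the base model named above
    reward_funcs=reward_has_boxed_answer,
    args=config,
    train_dataset=dataset,
)
trainer.train()
```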
---
## Usage Example
```bash
pip install unsloth==2025.3.18
```
```python
from unsloth import FastLanguageModel
from transformers import TextStreamer

# Load the model and tokenizer in 4-bit for inference
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="RefinedNeuro/RN_TR_R2",
    max_seq_length=8192,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

# Turkish system prompt: "When answering the questions you are asked, think
# step by step and justify your reasoning. First think step by step between
# the tags. Then summarize your reasoning steps. Finally, give the answer to
# the question inside \boxed{}."
SYSTEM_PROMPT = """
Sana sorulan sorulara cevap verirken, adım adım düşün ve gerekçelendir.
Önce tagları arasında adım adım düşün.
Sonra düşünme adımlarını özetle.
En son \\boxed{} içerisinde sorunun cevabını ver.
"""

def answer_question(question: str):
    messages = [
        {'role': 'system', 'content': SYSTEM_PROMPT},
        {'role': 'user', 'content': question},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
    ).to("cuda")
    streamer = TextStreamer(tokenizer)
    print("-" * 60)
    _ = model.generate(
        input_ids=inputs,
        streamer=streamer,
        max_new_tokens=2048,
        temperature=0.6,
        top_p=0.1,
    )

# Examples
answer_question("Türkiye'nin başkenti neresidir?")  # "What is the capital of Türkiye?"
answer_question("6 ve 2'nin çarpımı nedir?")        # "What is the product of 6 and 2?"
```
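
`answer_question` only streams tokens to stdout. For programmatic use you may want the completion back as a string; a small non-streaming variant, reusing the `model`, `tokenizer`, and `SYSTEM_PROMPT` objects from the block above:

```python
def generate_answer(question: str) -> str:
    """Non-streaming variant of answer_question: return the completion text."""
    messages = [
        {'role': 'system', 'content': SYSTEM_PROMPT},
        {'role': 'user', 'content': question},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
    ).to("cuda")
    outputs = model.generate(
        input_ids=inputs,
        max_new_tokens=2048,
        temperature=0.6,
        top_p=0.1,
    )
    # Drop the prompt tokens and decode only the newly generated text
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
```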
---
## Sample Q&A

**Q:** f(x) = 3x⁴ - 2x³ + 5x - 7 fonksiyonunun x = 1 noktasındaki türevini bulunuz.
*("Find the derivative of f(x) = 3x⁴ - 2x³ + 5x - 7 at x = 1.")*

**A:**

```
1. Adım: f(x) = 3x⁴ - 2x³ + 5x - 7 fonksiyonunun x = 1 noktasındaki türevini bulmak için önce f'(x)'i bulmamız gerekmektedir.
2. Adım: Türevi bulmak için fonksiyonu parçalara ayırarak analiz edeceğiz.
   f(x) = 3x⁴ - 2x³ + 5x - 7
3. Adım: Her terimin türevini ayrı ayrı buluyoruz.
   - 3x⁴: Türevi 12x³'tür.
   - -2x³: Türevi -6x²'dir.
   - 5x: Türevi 5'tir.
   - -7: Türevi 0'dır (çünkü sabit bir sayıdır).
4. Adım: Elde edilen türevleri birleştiriyoruz.
   f'(x) = 12x³ - 6x² + 5
5. Adım: Şimdi x = 1 yerine koyarak f'(1)'i hesaplıyoruz.
   f'(1) = 12(1)³ - 6(1)² + 5
   f'(1) = 12 - 6 + 5
   f'(1) = 11
6. Adım: x = 1 noktasındaki türevi 11 olarak bulduk.
\boxed{11}
```
---
## Citation
If you use this model, please cite:
```bibtex
@misc{farsak2025rn_tr_r2,
  title        = {RefinedNeuro/RN\_TR\_R2: A Turkish Reasoning Model},
  author       = {Mehmet Can Farsak and Serhat Atayeter and RefinedNeuro},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/RefinedNeuro/RN_TR_R2}},
  license      = {Apache-2.0}
}
```
---
## License
Apache 2.0, a permissive open-source license.
---
## Authors
* Mehmet Can Farsak, Serhat Atayeter, RefinedNeuro