---
license: apache-2.0
datasets:
- RefinedNeuro/Qwen3-Reasoning-Distill-Q-A-Dataset
language:
- tr
- en
base_model:
- ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1
library_name: transformers
tags:
- unsloth
- llama
- trl
- reasoning
- bilingual
- stem
---

## Model Overview

**RefinedNeuro/RN_TR_R2** is a Turkish-language reasoning model fine-tuned from `ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1` with the GRPO method on Q&A data distilled from Qwen3. It excels at open-ended reasoning tasks in Turkish, particularly STEM, history, and culture questions.

* **🌐 Available on:** [https://ollama.com/RefinedNeuro/RN_TR_R2](https://ollama.com/RefinedNeuro/RN_TR_R2)

---

## Evaluation

![image/png](https://cdn-uploads.huggingface.co/production/uploads/680e40dae19765db2aeb9d80/hyOZSykSa609rDDzPpml2.png)

Benchmark: [RN_TR_R2_Benchmark_Results](https://huggingface.co/datasets/RefinedNeuro/RN_TR_R2_Benchmark_Results) (open-ended Turkish culture & reasoning questions)

| Model Name                              | Score (%) | Description                                                 |
| --------------------------------------- | --------- | ----------------------------------------------------------- |
| **RefinedNeuro/RN_TR_R2**               | **82.4**  | Upgraded from RN_TR_R1 via GRPO on distilled Qwen3 data     |
| Qwen3-8B                                | 76.8      | Off-the-shelf Qwen3-8B                                      |
| RefinedNeuro/RN_TR_R1                   | 64.8      | Baseline Turkish-fine-tuned model                           |
| ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1 | 63.2      | Turkish-adapted Llama-8B via direct preference optimization |
| Meta-Llama-3.1-8B-Instruct              | 63.2      | Meta's Llama-3.1-8B with instruct tuning                    |

RN_TR_R2 achieves a **17.6 pp** improvement over the RN_TR_R1 baseline.

---

## Intended Use

* **Primary use cases**: Question answering, step-by-step reasoning, and educational tools in Turkish (math, physics, chemistry, biology, geometry, trigonometry, statistics, history, culture).
* **Not recommended**: Generating creative fiction, or tasks requiring memorized facts outside its training scope.
* **Known limitations**: No systematic failure modes have been identified, but like all LLMs, RN_TR_R2 may hallucinate.

---

## Training Data

* **Base model**: `ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1`
* **Fine-tuning dataset**: [RefinedNeuro/Qwen3-Reasoning-Distill-Q-A-Dataset](https://huggingface.co/datasets/RefinedNeuro/Qwen3-Reasoning-Distill-Q-A-Dataset)
* **Domains**: Mathematics, physics, chemistry, biology, geometry, trigonometry, statistics
* **Grade levels**: 6th–12th grade
* **Prompt variations**: e.g., long/detailed vs. short/easy phrasings; prompts that require the answer to be a single number
* **Size**: 13,000 examples (~9 million tokens)
* **License**: Creative Commons

---

## Training Configuration

* **Method**: GRPO (Group Relative Policy Optimization) via `unsloth`; see the sketch after this list
* **Hardware**: 2× RTX 3090 GPUs for 1 week (~336 GPU-hours)
* **Batch size**: 1
* **Context length**: 4096 tokens during training (`max_seq_length=8192` at inference)
* **Optimizer & scheduler**: Cosine LR schedule, peak LR = 5×10⁻⁶
* **Library**: `unsloth==2025.3.18`
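
The card lists only high-level GRPO settings; the exact reward functions and trainer wiring are not published. The sketch below shows what a GRPO run with `unsloth` and `trl`'s `GRPOTrainer` can look like under those settings. The LoRA rank, `num_generations`, the `prompt`/`answer` column names, and the boxed-answer reward are illustrative assumptions, not the authors' actual recipe.

```python
# Illustrative GRPO training sketch; not the exact RN_TR_R2 recipe.
# Assumes the dataset exposes "prompt" and "answer" columns and that a
# simple boxed-answer match is used as the reward signal.
import re

from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1",
    max_seq_length=4096,  # training context length from the card
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # assumed LoRA rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset(
    "RefinedNeuro/Qwen3-Reasoning-Distill-Q-A-Dataset", split="train"
)

def boxed_answer_reward(prompts, completions, answer, **kwargs):
    """Hypothetical reward: 1.0 when the \\boxed{} answer matches the reference."""
    rewards = []
    for completion, reference in zip(completions, answer):
        match = re.search(r"\\boxed\{(.+?)\}", completion)
        correct = match is not None and match.group(1).strip() == str(reference).strip()
        rewards.append(1.0 if correct else 0.0)
    return rewards

trainer = GRPOTrainer(
    model=model,
    processing_class=tokenizer,
    reward_funcs=[boxed_answer_reward],
    args=GRPOConfig(
        learning_rate=5e-6,             # peak LR from the card
        lr_scheduler_type="cosine",     # schedule from the card
        per_device_train_batch_size=4,  # must be divisible by num_generations in recent trl
        num_generations=4,              # completions sampled per prompt (assumed)
        max_completion_length=2048,     # assumed
    ),
    train_dataset=dataset,
)
trainer.train()
```

In practice, GRPO rewards for reasoning models typically combine answer correctness with format checks (e.g., presence of the think tags), scored over several sampled completions per prompt.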
""" def answer_question(question: str): messages = [ {'role': 'system', 'content': SYSTEM_PROMPT}, {'role': 'user', 'content': question}, ] inputs = tokenizer.apply_chat_template( messages, tokenize=True, add_generation_prompt=True, return_tensors="pt" ).to("cuda") streamer = TextStreamer(tokenizer) print("-" * 60) _ = model.generate( input_ids=inputs, streamer=streamer, max_new_tokens=2048, temperature=0.6, top_p=0.1, ) # Examples answer_question("Türkiye'nin başkenti neredir?") answer_question("6 ve 2'nin çarpımı nedir?") ``` --- ## Sample Q\&A ``` > **Q:** f(x) = 3x⁴ - 2x³ + 5x - 7 fonksiyonunun x = 1 noktasındaki türevini bulunuz. > > **A:** 1. Adım: f(x) = 3x⁴ - 2x³ + 5x - 7 fonksiyonunun türevini bulmak için, önce f(x)'nin türevini bulmamız gerekmektedir. 2. Adım: Türevi bulmak için fonksiyonu parçalara ayırarak analiz edeceğiz. f(x) = 3x⁴ - 2x³ + 5x - 7 3. Adım: Türevi bulmak için her terimi ayrı ayrı türevini buluyoruz. - 3x⁴: Türevi 12x³'dür. - -2x³: Türevi -6x²'dir. - 5x: Türevi 5'dir. - -7: Türevi 0'dır (çünkü sabit bir sayıdır). 4. Adım: Türevi bulmak için elde edilen türevleri birleştiriyoruz. f'(x) = 12x³ - 6x² + 5 5. Adım: Şimdi x = 1 yerine koyarak f'(x)'nin türevini hesaplayacağız. f'(1) = 12(1)³ - 6(1)² + 5 f'(1) = 12(1) - 6(1) + 5 f'(1) = 12 - 6 + 5 f'(1) = 11 6. Adım: x = 1 noktasındaki türevi 11 olarak bulduk. \boxed{11} \boxed{11} ``` --- ## Citation If you use this model, please cite: ```bibtex @misc{farsak2025rn_tr_r2, title = {RefinedNeuro/RN\_TR\_R2: A Turkish Reasoning Model}, author = {Mehmet Can Farsak and Serhat Atayeter and RefinedNeuro}, year = {2025}, howpublished = {\url{https://huggingface.co/RefinedNeuro/RN_TR_R2}}, license = {Apache-2.0} } ``` --- ## License Apache 2.0 — the most permissive open-source license. --- ## Authors * **Authors**: Mehmet Can Farsak, Serhat Atayeter, RefinedNeuro