eth-nlped/TutorRL-7B


rd211 committed on May 27

Commit dc63a67 · verified · 1 Parent(s): a07d8d2

Create README.md

Files changed (1)

README.md +69 -0

README.md ADDED

	@@ -0,0 +1,69 @@

+---
+library_name: transformers
+license: apache-2.0
+license_link: https://github.com/eth-lre/PedagogicalRL/blob/main/LICENSE
+pipeline_tag: text-generation
+base_model:
+- Qwen/Qwen2.5-7B-Instruct
+tags:
+- math-tutor
+- grpo
+---
+# TutorRL-7B
+## Overview
+**TutorRL-7B** is a fine-tuned variant of [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct), trained to act as a math **tutor** rather than a solver. It is aligned to pedagogical principles using **reinforcement learning (GRPO)** in a synthetic multi-turn classroom setting, without requiring any human-labeled data.
+This model was developed as part of the research project [*From Problem-Solving to Teaching Problem-Solving*](https://arxiv.org/abs/2505.15607), which proposes a scalable, annotation-free approach to training LLMs as **educational tutors**. Instead of directly answering questions, the model is optimized to scaffold reasoning, guide students through Socratic questioning, and withhold final solutions when doing so benefits learning.
+Repository: [https://github.com/eth-lre/PedagogicalRL](https://github.com/eth-lre/PedagogicalRL)
+## Intended Use
+This model is intended for use in:
+* Interactive math tutoring
+* Socratic dialogue generation
+* Research on educational alignment of LLMs
+* Safe and indirect teaching in problem-solving contexts
+## Example Usage
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+model_id = "eth-nlped/TutorRL-7B"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
+messages = [
+    {"role": "user", "content": "Can you help me solve 3x + 5 = 20?"}
+]
+# add_generation_prompt=True appends the assistant header so the model answers as the tutor.
+prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+outputs = model.generate(**inputs, max_new_tokens=512)
+# Decode only the newly generated tokens, skipping the echoed prompt.
+reply_ids = outputs[0][inputs["input_ids"].shape[1]:]
+print(tokenizer.decode(reply_ids, skip_special_tokens=True))
+```
+> Note: This model does **not** generate `<think>` blocks. If you want planning-based reasoning, use the [TutorRL-7B-think](https://huggingface.co/eth-nlped/TutorRL-7B-think) variant instead.
+## Citation
+If you use this model or build upon the training framework, please cite:
+```bibtex
+@misc{dinucujianu2025problemsolvingteachingproblemsolvingaligning,
+  title={From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning},
+  author={David Dinucu-Jianu and Jakub Macina and Nico Daheim and Ido Hakimi and Iryna Gurevych and Mrinmaya Sachan},
+  year={2025},
+  eprint={2505.15607},
+  archivePrefix={arXiv},
+  primaryClass={cs.CL},
+  url={https://arxiv.org/abs/2505.15607}
+}
+```