---
library_name: transformers
license: apache-2.0
license_link: https://github.com/eth-lre/PedagogicalRL/blob/main/LICENSE
pipeline_tag: text-generation
base_model:
- Qwen/Qwen2.5-7B-Instruct
tags:
- math-tutor
- grpo
---

# TutorRL-7B

## Overview

**TutorRL-7B** is a fine-tuned variant of [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct), trained to act as a math **tutor** rather than a solver. It is aligned with pedagogical principles using **reinforcement learning (GRPO)** in a synthetic multi-turn classroom setting, without requiring any human-labeled data.

This model was developed as part of the research project [*From Problem-Solving to Teaching Problem-Solving*](https://arxiv.org/abs/2505.15607), which proposes a scalable, annotation-free approach to training LLMs as **educational tutors**. Instead of directly answering questions, the model is optimized to scaffold reasoning, guide students through Socratic questioning, and withhold final solutions when that benefits learning.

Repository: [https://github.com/eth-lre/PedagogicalRL](https://github.com/eth-lre/PedagogicalRL)

## Intended Use

This model is intended for use in:

* Interactive math tutoring
* Socratic dialogue generation
* Research on educational alignment of LLMs
* Safe and indirect teaching in problem-solving contexts

## Example Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "eth-nlped/TutorRL-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "user", "content": "Can you help me solve 3x + 5 = 20?"}
]

# Build the prompt with the chat template; add_generation_prompt=True appends the
# assistant turn so the model responds as the tutor instead of continuing the user message.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

> Note: This model does **not** generate `<think>` blocks. If you want planning-based reasoning, use the [TutorRL-7B-think](https://huggingface.co/eth-nlped/TutorRL-7B-think) variant instead.
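
Because the model is trained for multi-turn tutoring, a natural next step is to append the tutor's reply and the student's next message, then generate again. The snippet below is a minimal sketch that continues from the example above (it reuses `tokenizer`, `model`, `messages`, `inputs`, and `outputs`); the student follow-up text and the slicing used to isolate the newly generated reply are illustrative, not part of the original example.

```python
# Minimal multi-turn sketch (continues from the example above).
# Keep only the newly generated tokens as the tutor's reply.
reply = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)

# Append the tutor's reply and a hypothetical student follow-up, then generate the next turn.
messages.append({"role": "assistant", "content": reply})
messages.append({"role": "user", "content": "I subtracted 5 and got 3x = 15. What should I do next?"})

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```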

## Citation

If you use this model or build upon the training framework, please cite:

```bibtex
@misc{dinucujianu2025problemsolvingteachingproblemsolvingaligning,
      title={From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning},
      author={David Dinucu-Jianu and Jakub Macina and Nico Daheim and Ido Hakimi and Iryna Gurevych and Mrinmaya Sachan},
      year={2025},
      eprint={2505.15607},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.15607}
}
```