vinhnx90 committed a39307d (verified) · 1 parent: 259106b

Update README.md

Files changed (1):
  1. README.md +137 -24
README.md CHANGED
@@ -6,45 +6,43 @@ tags:
  - generated_from_trainer
  - trl
  - grpo
  licence: license
  datasets:
  - openai/gsm8k
  ---
- # Gemma Text Generation

- A simple, configurable Python script for generating text using the Gemma-3-1B "Thinking" model from Hugging Face.
-
- ## Features
-
- - Device auto-detection (CPU, CUDA, MPS)
- - Command-line interface for easy configuration
- - Reusable text generation function
- - Multiple example prompts
- - Support for chat-formatted input

  ## Model Information

- This project uses `vinhnx90/gemma3-1b-thinking`, which is a fine-tuned version of `google/gemma-3-1b-it`. The model was trained using [TRL](https://github.com/huggingface/trl) with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).

  ### Training Approach

- This model was fine-tuned with Reinforcement Learning to enhance reasoning capabilities:

  - Used reasoning chains from [OpenAI's GSM8K dataset](https://huggingface.co/datasets/openai/gsm8k)
  - Implemented GRPO reward functions
  - Based on [Will Brown's approach](https://gist.github.com/willccbb/4676755236bb08cab5f4e54a0475d6fb)
  - Training implementation from [Ben Burtenshaw's Colab](https://x.com/ben_burtenshaw/status/1900202583056068663)

- The model is available on HuggingFace: [vinhnx90/gemma3-1b-thinking](https://huggingface.co/vinhnx90/gemma3-1b-thinking)

  ### Training Details

  - **Base Model**: google/gemma-3-1b-it
  - **Library**: transformers
  - **Training Method**: GRPO (from DeepSeekMath paper)
  - **Framework Versions**:
  - TRL: 0.16.0.dev0
  - Transformers: 4.50.0.dev0
  - Pytorch: 2.5.1+cu124
  - Datasets: 3.3.2
  - Tokenizers: 0.21.0
@@ -54,6 +52,7 @@ The model is available on HuggingFace: [vinhnx90/gemma3-1b-thinking](https://hug
  ```
  torch
  transformers
  ```

  ## Installation
@@ -62,32 +61,134 @@ transformers
  2. Install the required packages:

  ```bash
- pip install torch transformers
  ```

  ## Usage


- ### Available Arguments

- | Argument | Description | Default |
- |----------|-------------|---------|
- | `--prompt` | Input text for generation | "If you had a time machine..." |
- | `--model` | Hugging Face model name | "vinhnx90/gemma3-1b-thinking" |
- | `--device` | Computing device (cpu, cuda, mps, or None for auto) | Auto-detect |
- | `--max-tokens` | Maximum number of new tokens to generate | 128 |

- ### Quick Start Example

  ```python
  from transformers import pipeline

  question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
- generator = pipeline("text-generation", model="vinhnx90/gemma3-1b-thinking", device="cuda")
- output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
  print(output["generated_text"])
  ```

  ## Citations

  ### Implementation References
@@ -117,6 +218,18 @@ print(output["generated_text"])
  }
  ```

  ## License

  This project is licensed under the same license as the base model.

  - generated_from_trainer
  - trl
  - grpo
+ - peft
+ - adapter
  licence: license
  datasets:
  - openai/gsm8k
  ---
+ # Gemma Text Generation with PEFT Adapter

+ A simple, configurable Python script for generating text using the Gemma-3-1B "Thinking" PEFT adapter model from Hugging Face.

  ## Model Information

+ This project uses `vinhnx90/gemma3-1b-thinking`, which is a PEFT (Parameter-Efficient Fine-Tuning) adapter for `google/gemma-3-1b-it`. Unlike a full model, this is a lightweight adapter that works alongside the base model, making it easier to distribute and use with limited resources.
+
+ The model was trained using [TRL](https://github.com/huggingface/trl) with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).

  ### Training Approach

+ This adapter was fine-tuned with Reinforcement Learning to enhance reasoning capabilities:

  - Used reasoning chains from [OpenAI's GSM8K dataset](https://huggingface.co/datasets/openai/gsm8k)
  - Implemented GRPO reward functions
  - Based on [Will Brown's approach](https://gist.github.com/willccbb/4676755236bb08cab5f4e54a0475d6fb)
  - Training implementation from [Ben Burtenshaw's Colab](https://x.com/ben_burtenshaw/status/1900202583056068663)

+ The adapter is available on HuggingFace: [vinhnx90/gemma3-1b-thinking](https://huggingface.co/vinhnx90/gemma3-1b-thinking)

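The GRPO reward functions mentioned above are not included in this card. As a rough sketch, a correctness-style reward for GSM8K under TRL's reward-function interface could look like the following (the function name, the `answers` column, and the answer-extraction rule are illustrative assumptions, not the actual training code):

```python
import re

def correctness_reward(completions, answers, **kwargs):
    """Illustrative GRPO reward: 1.0 if the last number in a completion
    matches the reference GSM8K answer, otherwise 0.0."""
    rewards = []
    for completion, answer in zip(completions, answers):
        # Treat the last number in the generated text as the model's answer.
        numbers = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
        predicted = numbers[-1] if numbers else None
        rewards.append(1.0 if predicted == str(answer).strip() else 0.0)
    return rewards
```

In practice, recipes like Will Brown's combine several such rewards, for example answer correctness plus adherence to a reasoning/answer output format.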
  ### Training Details

  - **Base Model**: google/gemma-3-1b-it
  - **Library**: transformers
  - **Training Method**: GRPO (from DeepSeekMath paper)
+ - **PEFT Method**: LoRA (Low-Rank Adaptation)
  - **Framework Versions**:
  - TRL: 0.16.0.dev0
  - Transformers: 4.50.0.dev0
+ - PEFT: 0.9.0
  - Pytorch: 2.5.1+cu124
  - Datasets: 3.3.2
  - Tokenizers: 0.21.0

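The card names LoRA as the PEFT method but does not spell out the adapter hyperparameters. For orientation only, creating a LoRA adapter with `peft` generally looks like the sketch below; the rank, alpha, dropout, and target modules are placeholders rather than the values used for this adapter (those are recorded in the adapter's `adapter_config.json` on the Hub).

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("google/gemma-3-1b-it")

# Placeholder hyperparameters, not the settings used to train this adapter.
lora_config = LoraConfig(
    r=16,                # LoRA rank
    lora_alpha=32,       # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices are trainable
```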
  ```
  torch
  transformers
+ peft
  ```

  ## Installation

  2. Install the required packages:

  ```bash
+ pip install torch transformers peft
  ```

  ## Usage

+ ### Running with PEFT Adapter

+ Since this is a PEFT adapter, you need to load both the base model and the adapter:

+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from peft import PeftModel, PeftConfig
+
+ # Load the base model and tokenizer
+ base_model_id = "google/gemma-3-1b-it"
+ tokenizer = AutoTokenizer.from_pretrained(base_model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     base_model_id,
+     device_map="auto",  # Automatically determine the device
+     torch_dtype="auto"  # Use the appropriate precision
+ )
+
+ # Load the PEFT adapter
+ adapter_model_id = "vinhnx90/gemma3-1b-thinking"
+ model = PeftModel.from_pretrained(model, adapter_model_id)
+
+ # Generate text
+ prompt = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+ inputs = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
+ outputs = model.generate(
+     inputs,
+     max_new_tokens=128,
+     do_sample=True,
+     temperature=0.7,
+     top_p=0.9,
+ )
+
+ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+ print(response)
+ ```
+
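If you would rather serve a single standalone model than keep the base-plus-adapter pair at inference time, `peft` can also merge the LoRA weights into the base model. A minimal sketch of that optional step (not part of this README; the output directory name is arbitrary):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "google/gemma-3-1b-it"
adapter_model_id = "vinhnx90/gemma3-1b-thinking"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = PeftModel.from_pretrained(
    AutoModelForCausalLM.from_pretrained(base_model_id),
    adapter_model_id,
)

# Fold the LoRA weights into the base model so the result behaves like a
# plain transformers model (no peft dependency needed when serving it).
merged = model.merge_and_unload()
merged.save_pretrained("gemma3-1b-thinking-merged")
tokenizer.save_pretrained("gemma3-1b-thinking-merged")
```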
+ ### Chat Format Example
+
+ For chat-formatted inputs:
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from peft import PeftModel
+
+ # Load the base model and tokenizer
+ base_model_id = "google/gemma-3-1b-it"
+ tokenizer = AutoTokenizer.from_pretrained(base_model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     base_model_id,
+     device_map="auto",
+     torch_dtype="auto"
+ )
+
+ # Load the PEFT adapter
+ adapter_model_id = "vinhnx90/gemma3-1b-thinking"
+ model = PeftModel.from_pretrained(model, adapter_model_id)
+
+ # Prepare chat messages
+ messages = [
+     {"role": "user", "content": "Calculate the area of a circle with radius 5cm"}
+ ]
+
+ # Format messages for the model
+ prompt = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+
+ # Generate response
+ inputs = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
+ outputs = model.generate(
+     inputs,
+     max_new_tokens=256,
+     do_sample=True,
+     temperature=0.7,
+ )
+
+ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+ print(response)
+ ```
+
+ ### Using the Pipeline API

+ For a simpler approach (note: this may download the full adapter model):

  ```python
  from transformers import pipeline

+ # Initialize the pipeline with the adapter model
+ generator = pipeline(
+     "text-generation",
+     model="vinhnx90/gemma3-1b-thinking",
+     model_kwargs={"device_map": "auto", "torch_dtype": "auto"}
+ )
+
+ # Generate text
  question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+ output = generator(
+     [{"role": "user", "content": question}],
+     max_new_tokens=128,
+     do_sample=True,
+     temperature=0.7,
+     return_full_text=False
+ )[0]
+
  print(output["generated_text"])
  ```

+ ## Available Command-Line Arguments
+
+ If you use the command-line script, the following arguments are available:
+
+ | Argument | Description | Default |
+ |----------|-------------|---------|
+ | `--prompt` | Input text for generation | "If you had a time machine..." |
+ | `--base-model` | Hugging Face base model name | "google/gemma-3-1b-it" |
+ | `--adapter` | Hugging Face adapter model name | "vinhnx90/gemma3-1b-thinking" |
+ | `--device` | Computing device (cpu, cuda, mps, or auto) | "auto" |
+ | `--max-tokens` | Maximum number of new tokens to generate | 128 |
+ | `--temperature` | Sampling temperature | 0.7 |
+ | `--top-p` | Top-p sampling parameter | 0.9 |
+
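The command-line script itself is not part of this diff. A minimal `argparse` skeleton consistent with the table above might look like this (hypothetical; the truncated default prompt is kept as a placeholder):

```python
import argparse

def parse_args():
    parser = argparse.ArgumentParser(
        description="Generate text with the Gemma-3-1B 'Thinking' PEFT adapter"
    )
    parser.add_argument("--prompt", default="If you had a time machine...",
                        help="Input text for generation")
    parser.add_argument("--base-model", default="google/gemma-3-1b-it",
                        help="Hugging Face base model name")
    parser.add_argument("--adapter", default="vinhnx90/gemma3-1b-thinking",
                        help="Hugging Face adapter model name")
    parser.add_argument("--device", default="auto", choices=["cpu", "cuda", "mps", "auto"],
                        help="Computing device")
    parser.add_argument("--max-tokens", type=int, default=128,
                        help="Maximum number of new tokens to generate")
    parser.add_argument("--temperature", type=float, default=0.7,
                        help="Sampling temperature")
    parser.add_argument("--top-p", type=float, default=0.9,
                        help="Top-p sampling parameter")
    return parser.parse_args()

if __name__ == "__main__":
    print(parse_args())
```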
  ## Citations

  ### Implementation References

  }
  ```

+ ### PEFT
+ ```bibtex
+ @misc{peft,
+   title = {{PEFT: Parameter-Efficient Fine-Tuning of Billion-Scale Models on Low-Resource Hardware}},
+   author = {Younes Belkada and Thomas Wang and Yasmine Manar and Ajay Brahmakshatriya and Huu Nguyen and Yongwei Zhou and Soumya Batra and Neil Band and Romi Ponciano and Suraj Patil and Colin Raffel and Siddhartha Kamalakara and Enrico Shippole and Vesselin Popov and Lewis Tunstall and Brian Mugo and Patrick von Platen and Clémentine Fourrier and Surya Dantuluri and Luke Vilnis and Adam P. Saxton},
+   year = 2023,
+   journal = {GitHub repository},
+   publisher = {GitHub},
+   howpublished = {\url{https://github.com/huggingface/peft}}
+ }
+ ```
+
  ## License

  This project is licensed under the same license as the base model.