Commit 735c4b8 by zary0 · verified · 1 parent: cb1c5ef

Update README

Files changed (1)
  1. README.md +35 -7
README.md CHANGED
@@ -3,7 +3,6 @@ base_model: unsloth/gpt-oss-20b-unsloth-bnb-4bit
3
  tags:
4
  - text-generation-inference
5
  - transformers
6
- - unsloth
7
  - gpt_oss
8
  - trl
9
  license: apache-2.0
@@ -11,12 +10,41 @@ language:
11
  - en
12
  ---
13
 
14
- # Uploaded model
 
15
 
16
- - **Developed by:** zary0
17
- - **License:** apache-2.0
18
- - **Finetuned from model :** unsloth/gpt-oss-20b-unsloth-bnb-4bit
19
 
20
- This gpt_oss model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
21
 
22
- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
 
3
  tags:
4
  - text-generation-inference
5
  - transformers
 
6
  - gpt_oss
7
  - trl
8
  license: apache-2.0
 
10
  - en
11
  ---
12
 
13
+ # Overview
14
+ gpt-oss-12b-4bit — Unsloth LoRA Adapter
15
 
16
+ # Training
 
 
17
 
18
+ Unsloth + QLoRA (4‑bit) + TRL GRPO (reinforcement learning)
19
 
20
+ # QuickStart
21
+
22
+ ```python
23
+ from transformers import TextStreamer  # `model` and `tokenizer` are assumed to be loaded already
24
+ messages = [
25
+     {"role": "system", "content": "reasoning language: French\n\nYou are a helpful assistant that can solve mathematical problems."},
26
+     {"role": "user", "content": "Solve x^5 + 3x^4 - 10 = 3."},
27
+ ]
28
+ inputs = tokenizer.apply_chat_template(
29
+     messages,
30
+     add_generation_prompt = True,
31
+     return_tensors = "pt",
32
+     return_dict = True,
33
+     reasoning_effort = "medium",
34
+ ).to(model.device)
35
+ _ = model.generate(**inputs, max_new_tokens = 2048, streamer = TextStreamer(tokenizer))
36
+ ```
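
Editor's sketch, not part of the commit: the QuickStart hard-codes the reasoning language and the question inside `messages`. A small helper can build the same structure for other inputs. `build_messages` and its parameter names are hypothetical and only mirror the system-prompt format shown in the snippet above. Loading `model` and `tokenizer` is still left to the reader.

```python
def build_messages(question, reasoning_language="French"):
    """Return a chat message list in the same shape as the QuickStart snippet."""
    # The system prompt copies the QuickStart wording: a "reasoning language"
    # line, a blank line, then the assistant instruction.
    system_prompt = (
        f"reasoning language: {reasoning_language}\n\n"
        "You are a helpful assistant that can solve mathematical problems."
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]


# Reproduces the exact messages used in the QuickStart.
messages = build_messages("Solve x^5 + 3x^4 - 10 = 3.")
```

The resulting list can be passed to `tokenizer.apply_chat_template` exactly as in the QuickStart. Changing `reasoning_language` only changes the first line of the system prompt.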
37
+
38
+
39
+ # Acknowledgements
40
+
41
+ gpt‑oss authors and maintainers
42
+
43
+ Unsloth / PEFT / TRL / Transformers / Datasets communities
44
+
45
+
46
+ # Contact
47
+
48
+ Author: Ryota Ozawa (zawatti)
49
+
50
+ X (Twitter): zawattizawawa