zary0/gpt-oss-20b-4bit-grpo

text-generation-inference

zary0 committed on Aug 13

Commit 735c4b8 · verified · 1 Parent(s): cb1c5ef

Update README

Files changed (1)

README.md +35 -7

@@ -3,7 +3,6 @@ base_model: unsloth/gpt-oss-20b-unsloth-bnb-4bit
 tags:
 - text-generation-inference
 - transformers
-- unsloth
 - gpt_oss
 - trl
 license: apache-2.0
@@ -11,12 +10,41 @@ language:
 - en
 ---
-# Uploaded  model
-- **Developed by:** zary0
-- **License:** apache-2.0
-- **Finetuned from model :** unsloth/gpt-oss-20b-unsloth-bnb-4bit
-This gpt_oss model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
-[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

+# Overview
+gpt-oss-20b-4bit: Unsloth LoRA Adapter
+# Training
+Unsloth + QLoRA (4‑bit) + TRL GRPO (reinforcement learning)
+# QuickStart
+```python
+from transformers import TextStreamer
+# `model` and `tokenizer` are assumed to be loaded already; loading is not shown in this README.
+messages = [
+    {"role": "system", "content": "reasoning language: French\n\nYou are a helpful assistant that can solve mathematical problems."},
+    {"role": "user", "content": "Solve x^5 + 3x^4 - 10 = 3."},
+]
+# Render the messages with the chat template and move the tensors to the model's device.
+inputs = tokenizer.apply_chat_template(
+    messages,
+    add_generation_prompt = True,
+    return_tensors = "pt",
+    return_dict = True,
+    reasoning_effort = "medium",
+).to(model.device)
+# Stream the generated tokens to stdout as they are produced.
+_ = model.generate(**inputs, max_new_tokens = 2048, streamer = TextStreamer(tokenizer))
+```
+# Acknowledgements
+- gpt‑oss authors and maintainers
+- Unsloth / PEFT / TRL / Transformers / Datasets communities
+# Contact
+- Author: Ryota Ozawa (zawatti)
+- X (Twitter): zawattizawawa
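
The QuickStart snippet in this diff uses `model` and `tokenizer` without showing how they are created. The sketch below is one possible way to fill that gap, not the author's confirmed procedure. It assumes the repository loads through Unsloth's `FastLanguageModel.from_pretrained`, that the repo id is the one in the page header, and that a `max_seq_length` of 4096 is acceptable. The helper names `build_messages` and `load_model_and_tokenizer` are hypothetical and exist only for illustration.

```python
# Repo id taken from the page header; treating it as directly loadable is an assumption.
ADAPTER_REPO = "zary0/gpt-oss-20b-4bit-grpo"


def build_messages(question, reasoning_language="French"):
    """Build chat messages in the same shape as the QuickStart example."""
    system = (
        f"reasoning language: {reasoning_language}\n\n"
        "You are a helpful assistant that can solve mathematical problems."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]


def load_model_and_tokenizer(repo_id=ADAPTER_REPO, max_seq_length=4096):
    """Hypothetical loader. Assumes Unsloth is installed and a suitable GPU is available."""
    # Imported lazily so that build_messages can be used without Unsloth installed.
    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=repo_id,
        max_seq_length=max_seq_length,  # assumed value, not stated in the README
        load_in_4bit=True,  # matches the 4-bit setup named in the Training section
    )
    return model, tokenizer
```

With these helpers, `model, tokenizer = load_model_and_tokenizer()` followed by `messages = build_messages("Solve x^5 + 3x^4 - 10 = 3.")` would stand in for the undefined names before the `apply_chat_template` call in the QuickStart.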