AdamLucek
/

Qwen2.5-3B-Instruct-GRPO-2K-GSM8K-GGUF

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

AdamLucek commited on 26 days ago

Commit

3950051

·

verified ·

1 Parent(s): 8a4d7de

Update README.md

Files changed (1) hide show

README.md +19 -5

README.md CHANGED Viewed

@@ -11,12 +11,26 @@ language:
 - en
 ---
-# Uploaded  model
-- **Developed by:** AdamLucek
-- **License:** apache-2.0
-- **Finetuned from model :** unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit
-This qwen2 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
 [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

 - en
 ---
+# AdamLucek/Qwen2.5-3B-Instruct-GRPO-2K-GSM8K-GGUF
+Conversions of [AdamLucek/Qwen2.5-3B-Instruct-GRPO-2K-GSM8K](https://huggingface.co/AdamLucek/Qwen2.5-3B-Instruct-GRPO-2K-GSM8K) into `q8_0`, `q4_k_m` and `q5_k_m` GGUF formats. See original model card for additional details.
+This model is a GRPO fine-tuned version of [unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit](https://huggingface.co/unsloth/Qwen2.5-3B-Instruct-bnb-4bit) on a subset of 2,000 examples from [openai/gsm8k](https://huggingface.co/datasets/openai/gsm8k) using [Unsloth](https://github.com/unslothai/unsloth).
+# Usage
+For best performance, use the below system prompt:
+```python
+SYSTEM_PROMPT = """
+Respond in the following format:
+<reasoning>
+...
+</reasoning>
+<answer>
+...
+</answer>
+"""
+```
 [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)