This is the bf16 safetensors variant of the model.
What it is
DistilGPT-OSS-qwen3-4B is a fine-tune of Qwen3 4B Thinking 2507. It supports up to 262K tokens of total context (input plus output) and can think for up to 65,536 tokens when set to high reasoning effort. Unlike the original Qwen3, this model was fine-tuned on GPT-OSS reasoning outputs (rather than the DeepSeek R1 outputs that Qwen3 was probably fine-tuned on for advanced reasoning). By training on GPT-OSS outputs, the model learned to think more efficiently, follow instructions better, and adjust how long it thinks based on the reasoning effort you request.
⚠️ This model is NOT as censored as the original GPT-OSS; we focused on performance rather than censorship. The model is still safety trained, it just allows more "creative" prompts than GPT-OSS does. We are not responsible for what the model generates.
Keep in mind, this is a community project and we are NOT affiliated with Qwen by Alibaba or with GPT-OSS by OpenAI.
Use cases & benefits
Benefits of using this model over the standard Qwen3 4B Thinking:
- You decide how much it thinks (low, medium, high)
- Completely different style of answers (more similar to ChatGPT)
- Produces fewer emoji (Qwen3 4B uses quite a lot in its responses, which some may not like)
- Less censored/limiting than Qwen3 4B
DistilGPT-OSS-qwen3-4B should be used for the following:
- Efficient local, on-device assistance.
- Code generation.
- Math generation.
- Summary generation.
- General day to day use.
Or anything else along those lines.
❌⚠️ It should ABSOLUTELY not be used for:
- Anything law-related, due to hallucinations.
- Medical questions.
- Anything high risk which requires 1:1 accuracy.
It is a small model, so its general knowledge is limited by its size.
Format
This is the chat format of this model (you can also check the Jinja template file in "Files and versions"):
<|im_start|>system
You are a helpful assistant
Reasoning effort: low<|im_end|>
<|im_start|>user
Continue the Fibonacci sequence: 1, 1, 2, 3, 5, 8,<|im_end|>
<|im_start|>assistant
<think>
Just continue: 13,21,34,...
</think>
The Fibonacci sequence continues by adding the two preceding numbers.
After **8** the next term is **13** (5 + 8 = 13), then **21** (8 + 13 = 21), followed by **34** (13 + 21 = 34), and so on.
So the sequence starts:
**1, 1, 2, 3, 5, 8, 13, 21, 34, …**<|im_end|>
As you can see, you set the reasoning effort via the system prompt. We recommend moving two lines down (after your system instructions) and only then adding "Reasoning effort: [low/medium/high]". For reference, the output above was generated by our model.
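If you are running the model with the transformers library rather than a chat UI, a minimal sketch of that format might look like the following. The repo id is a placeholder (use the actual path from "Files and versions"), and generation settings are just example values:

```python
# Minimal sketch: building a prompt with the reasoning-effort system message
# via transformers. The repo id below is a placeholder, not the real path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-namespace/DistilGPT-OSS-qwen3-4B"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    # The reasoning effort goes inside the system prompt, as shown above.
    {"role": "system", "content": "You are a helpful assistant\n\nReasoning effort: low"},
    {"role": "user", "content": "Continue the Fibonacci sequence: 1, 1, 2, 3, 5, 8,"},
]

# The bundled Jinja chat template produces the <|im_start|>/<|im_end|> format shown above.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```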
Examples
- "Is a banana an animal?" Reasoning was set to high.
- "Make an HTML website about yourself" Reasoning was set to medium.
- "translate this to chinese: Hello! I am ChatGPT. A large language model by OpenAi." Reasoning was set to low.
As you can see, the model thinks for a different amount of time depending on the reasoning effort and your prompt. Keep in mind, these tests were done in LM Studio with the GGUF q8_0 quant on a single consumer card (RTX 3080), where we got 80-95 tokens/second at 8192 context.
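If you prefer to script against LM Studio instead of using its chat window, a rough sketch using its OpenAI-compatible local server could look like this. The port is LM Studio's default and the model identifier is a placeholder; check your own server settings for the real values:

```python
# Rough sketch: querying the model through LM Studio's OpenAI-compatible
# local server. The base URL assumes LM Studio's default port, and the model
# name is whatever identifier LM Studio shows for your loaded GGUF --
# both are assumptions, not values from this model card.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="distilgpt-oss-qwen3-4b",  # placeholder; use the id shown in LM Studio
    messages=[
        {"role": "system", "content": "You are a helpful assistant\n\nReasoning effort: high"},
        {"role": "user", "content": "Is a banana an animal?"},
    ],
)
print(response.choices[0].message.content)
```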
How it was done
We started with some public datasets, removed almost all "I am sorry, but..." refusals from the data, filtered it and skipped the first 25k samples, then mixed in outputs from the big 120B GPT-OSS wherever we saw the model was weak at certain things. After that we formatted everything into the proper Qwen3 chat format and did a few test runs using different optimizers, configurations, etc. Keep in mind, we trained on about 15K samples, each with 3 turns (the entire dataset was multi-turn), and we settled on the AdEMAMix optimizer. A sketch of the refusal-filtering step is shown below.
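For illustration, a minimal sketch of that refusal-filtering pass might look like the following. The dataset name, column names, and refusal phrases are assumptions for the example, not the actual private pipeline:

```python
# Minimal sketch of the refusal-filtering / skipping step described above.
# Dataset name, column names, and exact refusal phrases are assumptions.
from datasets import load_dataset

REFUSAL_MARKERS = ("I am sorry but", "I'm sorry, but", "I cannot help with")

def keep_sample(example):
    # Drop any multi-turn sample where an assistant turn starts with a refusal.
    for turn in example["conversations"]:
        if turn["role"] == "assistant" and turn["content"].lstrip().startswith(REFUSAL_MARKERS):
            return False
    return True

dataset = load_dataset("some-public-reasoning-dataset", split="train")  # placeholder
dataset = dataset.select(range(25_000, len(dataset)))  # skip the first 25k samples
dataset = dataset.filter(keep_sample)
```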
We did a few test runs to see if it even learns anything, what it learns, etc. We had runs where it was very censored, runs where it looped, and this one was the best. In addition, we added some outputs that we generated ourselves with the 120B GPT-OSS to improve performance. The simplest way to explain the performance is like this:
- Imagine the biggest GPT-OSS (120B) is like GPT-5.
- The official smallest GPT-OSS (20B) is like GPT-5 mini.
- And this one is like GPT-5 nano.
Obviously, these models do not actually compare to closed-source OpenAI models; the comparison is just meant to explain things simply. This is how these models should be used: the biggest GPT-OSS for hard, complicated tasks, the smaller 20B for average tasks, and our "open-weights GPT-5 nano equivalent" for easier day-to-day tasks. (As a reminder, it does NOT have the same performance as GPT-5 nano. Not even close.)
Additional information
The model was trained with Unsloth on a mix of private and public datasets. A rough sketch of that kind of setup is shown below.
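For readers curious what such a setup looks like, here is a heavily hedged sketch of an Unsloth + TRL LoRA fine-tune. The base-model id, LoRA settings, and hyperparameters are placeholders rather than our actual configuration, the exact keyword arguments vary between Unsloth/TRL versions, and using "ademamix" as an optim string assumes a transformers + bitsandbytes install that exposes the AdEMAMix optimizer:

```python
# Rough sketch (assumptions throughout) of an Unsloth + TRL SFT setup like the
# one described above. Base model name, LoRA ranks, and hyperparameters are
# placeholders; keyword names vary across Unsloth/TRL versions.
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3-4B-Thinking-2507",  # assumed base model id
    max_seq_length=8192,
    load_in_4bit=True,
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,      # e.g. the filtered multi-turn data from the sketch above,
    dataset_text_field="text",  # pre-rendered into Qwen3 chat-formatted text
    max_seq_length=8192,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        optim="ademamix",  # assumption: requires AdEMAMix support in transformers/bitsandbytes
        bf16=True,
        logging_steps=10,
        output_dir="outputs",
    ),
)
trainer.train()
```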