Qwen3-4B-SafeRL-GGUF
Qwen3-4B-SafeRL is a safety-aligned version of the Qwen3-4B model, trained with reinforcement learning (RL) using a reward signal from Qwen3Guard-Gen to improve robustness against harmful or adversarial prompts. The alignment process optimizes a hybrid reward function with three simultaneous objectives: maximizing safety (penalizing content that Qwen3Guard-Gen-4B detects as unsafe), maximizing helpfulness (rewarding genuinely helpful responses, as scored by the WorldPM-Helpsteer2 model), and minimizing unnecessary refusals (penalizing unnecessary refusals, as identified by Qwen3Guard-Gen-4B).
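As a rough illustration of how three such objectives could be folded into one scalar, here is a minimal sketch. It is not the actual training reward: the `Judgement` fields, the additive form, and the penalty weights are all hypothetical, chosen only to show a helpfulness score being reduced by safety and refusal penalties.

```python
from dataclasses import dataclass


@dataclass
class Judgement:
    """Per-response signals. All field names are hypothetical."""
    unsafe: bool               # a guard model flagged the response as unsafe
    unnecessary_refusal: bool  # a guard model flagged an unneeded refusal
    helpfulness: float         # scalar score from a helpfulness reward model


def hybrid_reward(j: Judgement,
                  unsafe_penalty: float = 1.0,
                  refusal_penalty: float = 0.5) -> float:
    """Combine the three objectives into one scalar (assumed additive form)."""
    reward = j.helpfulness          # reward helpful responses
    if j.unsafe:
        reward -= unsafe_penalty    # penalize unsafe content
    if j.unnecessary_refusal:
        reward -= refusal_penalty   # penalize unnecessary refusals
    return reward


if __name__ == "__main__":
    print(hybrid_reward(Judgement(False, False, 0.5)))
    print(hybrid_reward(Judgement(True, False, 0.5)))
    print(hybrid_reward(Judgement(False, True, 0.5)))
```

The point of the sketch is only that a response must be safe, helpful, and non-evasive at once to score well; a blanket refusal avoids the safety penalty but incurs the refusal penalty.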
Model Files
File Name | Quant Type | File Size |
---|---|---|
Qwen3-4B-SafeRL.BF16.gguf | BF16 | 8.05 GB |
Qwen3-4B-SafeRL.F16.gguf | F16 | 8.05 GB |
Qwen3-4B-SafeRL.F32.gguf | F32 | 16.1 GB |
Qwen3-4B-SafeRL.Q2_K.gguf | Q2_K | 1.67 GB |
Qwen3-4B-SafeRL.Q3_K_L.gguf | Q3_K_L | 2.24 GB |
Qwen3-4B-SafeRL.Q3_K_M.gguf | Q3_K_M | 2.08 GB |
Qwen3-4B-SafeRL.Q3_K_S.gguf | Q3_K_S | 1.89 GB |
Qwen3-4B-SafeRL.Q4_K_M.gguf | Q4_K_M | 2.5 GB |
Qwen3-4B-SafeRL.Q4_K_S.gguf | Q4_K_S | 2.38 GB |
Qwen3-4B-SafeRL.Q5_K_M.gguf | Q5_K_M | 2.89 GB |
Qwen3-4B-SafeRL.Q5_K_S.gguf | Q5_K_S | 2.82 GB |
Qwen3-4B-SafeRL.Q6_K.gguf | Q6_K | 3.31 GB |
Qwen3-4B-SafeRL.Q8_0.gguf | Q8_0 | 4.28 GB |
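
To choose a file from the table for a given amount of RAM or VRAM, a small helper like the one below may be useful. The sizes are copied from the table; the function name and the `overhead_gb` allowance for context and runtime buffers are assumptions, not measured values.

```python
from typing import Optional

# File sizes in GB, copied from the Model Files table.
QUANT_SIZES_GB = {
    "BF16": 8.05, "F16": 8.05, "F32": 16.1,
    "Q2_K": 1.67, "Q3_K_L": 2.24, "Q3_K_M": 2.08, "Q3_K_S": 1.89,
    "Q4_K_M": 2.5, "Q4_K_S": 2.38, "Q5_K_M": 2.89, "Q5_K_S": 2.82,
    "Q6_K": 3.31, "Q8_0": 4.28,
}


def largest_quant_within(budget_gb: float,
                         overhead_gb: float = 1.0) -> Optional[str]:
    """Return the largest quant whose file fits in budget_gb minus overhead.

    overhead_gb is a rough, assumed allowance for the KV cache and runtime
    buffers; real usage depends on context length and backend.
    """
    usable = budget_gb - overhead_gb
    fitting = {q: s for q, s in QUANT_SIZES_GB.items() if s <= usable}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    for budget in (2.0, 4.0, 6.0):
        print(budget, "GB ->", largest_quant_within(budget))
```

The file size is only a lower bound on memory use, and the largest file that fits is not guaranteed to be the best-quality choice, so treat the result as a starting point.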
Quants Usage
(Sorted by size, not necessarily by quality. IQ quants are often preferable to similarly sized non-IQ quants.)
[Figure: graph by ikawrakow comparing some lower-quality quant types (lower is better)]
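
For a rough sense of what each quant type costs in storage, the file sizes listed under Model Files can be turned into approximate bits per weight. The sketch below assumes the BF16 file stores about 16 bits per weight and ignores metadata, so the results are estimates only; the function name is illustrative.

```python
BF16_SIZE_GB = 8.05  # size of the BF16 file listed under Model Files


def approx_bits_per_weight(file_size_gb: float,
                           reference_size_gb: float = BF16_SIZE_GB,
                           reference_bits: float = 16.0) -> float:
    """Scale the reference bit width by the ratio of file sizes.

    Assumes the reference file holds the same weights at reference_bits
    bits each; metadata and tokenizer data are ignored.
    """
    return reference_bits * file_size_gb / reference_size_gb


if __name__ == "__main__":
    for name, size_gb in [("Q2_K", 1.67), ("Q4_K_M", 2.5), ("Q8_0", 4.28)]:
        bits = approx_bits_per_weight(size_gb)
        print(f"{name}: ~{bits:.1f} bits per weight")
```

Under these assumptions the estimates come out somewhat above the nominal bit width in each name (roughly 3.3 for Q2_K, 5.0 for Q4_K_M, and 8.5 for Q8_0), which is consistent with quantized files also storing scale factors and keeping some tensors at higher precision.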