Smoothie-Qwen3-4B-F32-GGUF

Smoothie Qwen is a lightweight adjustment tool that smooths token probabilities in Qwen and similar models, enhancing balanced multilingual generation capabilities. For more details, please refer to https://github.com/dnotitia/smoothie-qwen.

Model Files

Filename	Size	Format	Description
Smoothie-Qwen3-4B.BF16.gguf	8.05 GB	BF16	Brain Float 16-bit quantization
Smoothie-Qwen3-4B.F16.gguf	8.05 GB	F16	Half precision (16-bit) floating point
Smoothie-Qwen3-4B.F32.gguf	16.1 GB	F32	Full precision (32-bit) floating point
Smoothie-Qwen3-4B.Q2_K.gguf	1.67 GB	Q2_K	2-bit quantization with K-quant
Smoothie-Qwen3-4B.Q3_K_L.gguf	2.24 GB	Q3_K_L	3-bit quantization (Large) with K-quant
Smoothie-Qwen3-4B.Q3_K_M.gguf	2.08 GB	Q3_K_M	3-bit quantization (Medium) with K-quant
Smoothie-Qwen3-4B.Q3_K_S.gguf	1.89 GB	Q3_K_S	3-bit quantization (Small) with K-quant
Smoothie-Qwen3-4B.Q4_K_M.gguf	2.5 GB	Q4_K_M	4-bit quantization (Medium) with K-quant
Smoothie-Qwen3-4B.Q4_K_S.gguf	2.38 GB	Q4_K_S	4-bit quantization (Small) with K-quant
Smoothie-Qwen3-4B.Q5_K_M.gguf	2.89 GB	Q5_K_M	5-bit quantization (Medium) with K-quant
Smoothie-Qwen3-4B.Q5_K_S.gguf	2.82 GB	Q5_K_S	5-bit quantization (Small) with K-quant
Smoothie-Qwen3-4B.Q6_K.gguf	3.31 GB	Q6_K	6-bit quantization with K-quant
Smoothie-Qwen3-4B.Q8_0.gguf	4.28 GB	Q8_0	8-bit quantization

Recommended Usage

Q4_K_M or Q5_K_M: Best balance of quality and performance for most users
Q6_K or Q8_0: Higher quality, larger file sizes
Q2_K or Q3_K_S: Fastest inference, lower quality
F16 or BF16: High quality, requires more VRAM
F32: Highest quality, requires significant VRAM

Quants Usage

(sorted by size, not necessarily quality. IQ-quants are often preferable over similar sized non-IQ quants)

Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):

prithivMLmods
/

Smoothie-Qwen3-4B-F32-GGUF

Smoothie-Qwen3-4B-F32-GGUF

Model Files

Recommended Usage

Quants Usage

Model tree for prithivMLmods/Smoothie-Qwen3-4B-F32-GGUF