Qwen3-4B-Esper3-F32-GGUF

Esper 3 is a specialist model built on Qwen 3, designed for coding, architecture, and DevOps reasoning. It has been fine-tuned on our proprietary DevOps, architecture, and code reasoning dataset generated with DeepSeek R1. This tuning also improves its general and creative reasoning, making it effective in general conversation as well as in problem-solving. Its small size makes Esper 3 well suited to fast inference on local desktops, mobile devices, and high-speed server environments.

Model Files

| Filename | Size | Format | Description |
|---|---|---|---|
| Qwen3-4B-Esper3.BF16.gguf | 8.05 GB | BF16 | Brain floating point (16-bit) |
| Qwen3-4B-Esper3.F16.gguf | 8.05 GB | F16 | Half precision (16-bit) floating point |
| Qwen3-4B-Esper3.F32.gguf | 16.1 GB | F32 | Full precision (32-bit) floating point |
| Qwen3-4B-Esper3.Q2_K.gguf | 1.67 GB | Q2_K | 2-bit quantization with K-quant |
| Qwen3-4B-Esper3.Q3_K_L.gguf | 2.24 GB | Q3_K_L | 3-bit quantization (Large) with K-quant |
| Qwen3-4B-Esper3.Q3_K_M.gguf | 2.08 GB | Q3_K_M | 3-bit quantization (Medium) with K-quant |
| Qwen3-4B-Esper3.Q3_K_S.gguf | 1.89 GB | Q3_K_S | 3-bit quantization (Small) with K-quant |
| Qwen3-4B-Esper3.Q4_K_M.gguf | 2.5 GB | Q4_K_M | 4-bit quantization (Medium) with K-quant |
| Qwen3-4B-Esper3.Q4_K_S.gguf | 2.38 GB | Q4_K_S | 4-bit quantization (Small) with K-quant |
| Qwen3-4B-Esper3.Q5_K_M.gguf | 2.89 GB | Q5_K_M | 5-bit quantization (Medium) with K-quant |
| Qwen3-4B-Esper3.Q5_K_S.gguf | 2.82 GB | Q5_K_S | 5-bit quantization (Small) with K-quant |
| Qwen3-4B-Esper3.Q6_K.gguf | 3.31 GB | Q6_K | 6-bit quantization with K-quant |
| Qwen3-4B-Esper3.Q8_0.gguf | 4.28 GB | Q8_0 | 8-bit quantization |
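
The file sizes above can be turned into a rough "effective bits per weight" figure, which helps explain why a nominally 4-bit file is larger than 4 bits times the parameter count. The sketch below is only an approximation: it assumes "GB" means 10^9 bytes, uses the 4.02B parameter count reported on the model page, and ignores file metadata. The names `PARAMS`, `SIZES_GB`, and `bits_per_weight` are illustrative, not part of any tool.

```python
# Rough effective bits per weight for each listed file.
# Assumptions: 1 GB = 10**9 bytes; 4.02e9 parameters; metadata ignored.
PARAMS = 4.02e9

# Sizes in GB, copied from the Model Files table.
SIZES_GB = {
    "Q2_K": 1.67, "Q3_K_S": 1.89, "Q3_K_M": 2.08, "Q3_K_L": 2.24,
    "Q4_K_S": 2.38, "Q4_K_M": 2.5, "Q5_K_S": 2.82, "Q5_K_M": 2.89,
    "Q6_K": 3.31, "Q8_0": 4.28, "F16": 8.05, "BF16": 8.05, "F32": 16.1,
}


def bits_per_weight(size_gb: float, params: float = PARAMS) -> float:
    """Convert a file size in GB to average bits stored per parameter."""
    return size_gb * 1e9 * 8 / params


if __name__ == "__main__":
    # Print the files from smallest to largest with their estimate.
    for name, size in sorted(SIZES_GB.items(), key=lambda kv: kv[1]):
        print(f"{name:7s} {size:5.2f} GB  ~{bits_per_weight(size):.2f} bits/weight")
```

The quantized files come out above their nominal bit width, which is consistent with quantized formats storing per-block scale data alongside the weights and, typically, keeping some tensors at higher precision.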

Recommended Usage

  • Q4_K_M or Q5_K_M: Best balance of quality and performance for most users
  • Q6_K or Q8_0: Higher quality, larger file sizes
  • Q2_K or Q3_K_S: Fastest inference, lower quality
  • F16 or BF16: High quality, requires more VRAM
  • F32: Highest quality, requires significant VRAM
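
As a rough illustration of the guidance above, the following sketch picks the largest listed file that fits a given memory budget and then downloads it. It makes several assumptions: a larger file is treated as a proxy for higher quality, the `headroom_gb` default of 1.0 is an arbitrary allowance for context cache and runtime overhead, and the helper names (`pick_quant`, `gguf_filename`, `fetch`) are hypothetical. The download step uses `hf_hub_download` from the `huggingface_hub` package, imported lazily so the selection logic works without it installed.

```python
from typing import Optional

REPO_ID = "prithivMLmods/Qwen3-4B-Esper3-F32-GGUF"

# File sizes in GB, copied from the Model Files table.
SIZES_GB = {
    "Q2_K": 1.67, "Q3_K_S": 1.89, "Q3_K_M": 2.08, "Q3_K_L": 2.24,
    "Q4_K_S": 2.38, "Q4_K_M": 2.5, "Q5_K_S": 2.82, "Q5_K_M": 2.89,
    "Q6_K": 3.31, "Q8_0": 4.28, "F16": 8.05, "BF16": 8.05, "F32": 16.1,
}


def pick_quant(budget_gb: float, headroom_gb: float = 1.0) -> Optional[str]:
    """Return the largest listed quant whose file fits in the budget.

    headroom_gb is an assumed allowance for everything that is not
    model weights; tune it for your context length and runtime.
    """
    usable = budget_gb - headroom_gb
    fitting = {q: s for q, s in SIZES_GB.items() if s <= usable}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


def gguf_filename(quant: str) -> str:
    """Build the filename using the naming pattern seen in the table."""
    return f"Qwen3-4B-Esper3.{quant}.gguf"


def fetch(quant: str) -> str:
    """Download one file from the Hub and return its local cache path."""
    from huggingface_hub import hf_hub_download  # requires huggingface_hub
    return hf_hub_download(repo_id=REPO_ID, filename=gguf_filename(quant))


if __name__ == "__main__":
    choice = pick_quant(budget_gb=6.0)
    print("Selected:", choice)
    # Uncomment to download (several GB of network traffic):
    # print(fetch(choice))
```

File size is only a lower bound on memory use, so treat the result as a starting point and fall back to a smaller quant if loading fails or inference is too slow.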

Quants Usage

(Sorted by size, not necessarily by quality. IQ-quants are often preferable to similar-sized non-IQ quants.)

Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):

[Figure: image.png, quant type comparison graph by ikawrakow]

Format: GGUF
Model size: 4.02B params
Architecture: qwen3
Model tree for prithivMLmods/Qwen3-4B-Esper3-F32-GGUF

Base model: Qwen/Qwen3-4B-Base
Finetuned: Qwen/Qwen3-4B
Quantized (10): this model