# GLM-4.5-Air-GGUF

This repository contains several custom GGUF quantizations of GLM-4.5-Air, for use with llama.cpp (a minimal usage sketch follows the table):

| Filename | Size (GiB) | Average BPW |
| --- | ---: | ---: |
| GLM-4.5-Air-Q8_0-FFN-IQ3_S-IQ3_S-Q5_0.gguf | 57.43 | 4.47 |
| GLM-4.5-Air-Q8_0-FFN-IQ4_XS-IQ4_XS-Q5_0.gguf | 63.86 | 4.97 |
| GLM-4.5-Air-Q8_0-FFN-Q4_K-Q4_K-Q5_1.gguf | 67.82 | 5.27 |
| GLM-4.5-Air-Q8_0-FFN-Q4_K-Q4_K-Q8_0.gguf | 77.71 | 6.04 |
| GLM-4.5-Air-Q8_0-FFN-Q5_K-Q5_K-Q8_0.gguf | 85.63 | 6.66 |
| GLM-4.5-Air-Q8_0-FFN-Q6_K-Q6_K-Q8_0.gguf | 94.04 | 7.31 |
| GLM-4.5-Air-Q8_0.gguf | 109.39 | 8.50 |
| GLM-4.5-Air-bf16.gguf | 205.81 | 16.00 |
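
As a minimal sketch of getting one of these files running: the snippet below fetches a quantization with `huggingface_hub` and loads it through llama-cpp-python (the Python bindings for llama.cpp). It assumes a build recent enough to support the `glm4moe` architecture; the chosen filename, context size, and prompt are illustrative, not a recommendation.

```python
# Fetch one of the quantizations listed above, then load and prompt it.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

path = hf_hub_download(
    repo_id="ddh0/GLM-4.5-Air-GGUF",
    filename="GLM-4.5-Air-Q8_0-FFN-IQ4_XS-IQ4_XS-Q5_0.gguf",
)

llm = Llama(
    model_path=path,
    n_gpu_layers=-1,  # offload all layers to the GPU; set 0 for CPU-only
    n_ctx=8192,       # context window; lower it to reduce memory use
)

out = llm("Briefly explain mixture-of-experts language models.", max_tokens=256)
print(out["choices"][0]["text"])
```

The same file also works with the stock llama.cpp binaries (llama-cli, llama-server) by passing it via `-m`.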

These quantizations use Q8_0 for all tensors by default; only the dense FFN block and the conditional (routed) experts are downgraded, as reflected in the filenames. The shared expert is always kept at Q8_0.
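
To see this scheme for yourself, the GGUF header can be inspected with the `gguf` Python package (`pip install gguf`), which is maintained alongside llama.cpp. A small sketch, assuming llama.cpp's usual tensor naming (routed-expert tensors contain `exps`, the shared expert `shexp`); `path` is the file downloaded in the snippet above:

```python
# Print each FFN tensor's name and quantization type so the per-tensor
# scheme (routed experts downgraded, shared expert kept at Q8_0) is visible.
from gguf import GGUFReader

reader = GGUFReader(path)  # path from the download snippet above
for t in reader.tensors:
    if "ffn" in t.name:
        print(f"{t.name:45s} {t.tensor_type.name}")
```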

The model has roughly 110B parameters and runs under llama.cpp's `glm4moe` architecture.