# GLM-4.5-Air-GGUF

This repository contains several custom GGUF quantizations of GLM-4.5-Air, for use with llama.cpp:
| Filename | Size (GiB) | Average BPW |
|---|---|---|
| GLM-4.5-Air-Q8_0-FFN-IQ3_S-IQ3_S-Q5_0.gguf | 57.43 | 4.47 |
| GLM-4.5-Air-Q8_0-FFN-IQ4_XS-IQ4_XS-Q5_0.gguf | 63.86 | 4.97 |
| GLM-4.5-Air-Q8_0-FFN-Q4_K-Q4_K-Q5_1.gguf | 67.82 | 5.27 |
| GLM-4.5-Air-Q8_0-FFN-Q4_K-Q4_K-Q8_0.gguf | 77.71 | 6.04 |
| GLM-4.5-Air-Q8_0-FFN-Q5_K-Q5_K-Q8_0.gguf | 85.63 | 6.66 |
| GLM-4.5-Air-Q8_0-FFN-Q6_K-Q6_K-Q8_0.gguf | 94.04 | 7.31 |
| GLM-4.5-Air-Q8_0.gguf | 109.39 | 8.50 |
| GLM-4.5-Air-bf16.gguf | 205.81 | 16.00 |
These quantizations use Q8_0 for all tensors by default; only the dense FFN block and the conditional (routed) experts are downgraded to the quantization types named in the filename. The shared expert is always kept in Q8_0.
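The "Average BPW" column can be sanity-checked from the file sizes alone. A minimal sketch, assuming the bf16 file is exactly 16.00 bits per weight (so the total parameter count is its byte size divided by 2); the small residual against the table comes from the sizes being rounded to two decimals:

```python
# Effective bits per weight = file size in bits / parameter count.
# Assumption: parameter count is derived from the bf16 file, which
# stores 2 bytes (16 bits) per weight.
GIB = 1024**3

BF16_SIZE_GIB = 205.81                 # from the table above
PARAMS = BF16_SIZE_GIB * GIB / 2       # bf16: 2 bytes per weight

def avg_bpw(size_gib: float) -> float:
    """Effective bits per weight for a GGUF file of the given size."""
    return size_gib * GIB * 8 / PARAMS

# Rounding of the listed sizes introduces roughly +/-0.01 BPW of slack.
for size in (57.43, 109.39, 205.81):
    print(f"{size:7.2f} GiB -> {avg_bpw(size):.2f} BPW")
```

Note that BPW here is averaged over all weights, so a mixed quantization such as the Q8_0/IQ3_S files lands between the bit widths of its component types.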
Model tree for ddh0/GLM-4.5-Air-GGUF:
- Base model: zai-org/GLM-4.5-Air