# GLM-4.5-Air-GGUF

This repository contains several custom GGUF quantizations of GLM-4.5-Air, for use with llama.cpp (a minimal usage sketch follows the table):

| Filename | Size (GiB) | Average BPW |
| --- | ---: | ---: |
| GLM-4.5-Air-Q8_0-FFN-IQ3_S-IQ3_S-Q5_0.gguf | 57.43 | 4.47 |
| GLM-4.5-Air-Q8_0-FFN-IQ4_XS-IQ4_XS-Q5_0.gguf | 63.86 | 4.97 |
| GLM-4.5-Air-Q8_0-FFN-Q4_K-Q4_K-Q5_1.gguf | 67.82 | 5.27 |
| GLM-4.5-Air-Q8_0-FFN-Q4_K-Q4_K-Q8_0.gguf | 77.71 | 6.04 |
| GLM-4.5-Air-Q8_0-FFN-Q5_K-Q5_K-Q8_0.gguf | 85.63 | 6.66 |
| GLM-4.5-Air-Q8_0-FFN-Q6_K-Q6_K-Q8_0.gguf | 94.04 | 7.31 |
| GLM-4.5-Air-Q8_0.gguf | 109.39 | 8.50 |
| GLM-4.5-Air-bf16.gguf | 205.81 | 16.00 |
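
As a minimal sketch of getting one of these files running: the snippet below fetches a quantization with `huggingface_hub` and loads it through llama-cpp-python (the Python bindings for llama.cpp). It assumes a build recent enough to support the `glm4moe` architecture; the chosen filename, context size, and prompt are illustrative, not a recommendation.

```python
# Fetch one of the quantizations listed above, then load and prompt it.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

path = hf_hub_download(
    repo_id="ddh0/GLM-4.5-Air-GGUF",
    filename="GLM-4.5-Air-Q8_0-FFN-IQ4_XS-IQ4_XS-Q5_0.gguf",
)

llm = Llama(
    model_path=path,
    n_gpu_layers=-1,  # offload all layers to the GPU; set 0 for CPU-only
    n_ctx=8192,       # context window; lower it to reduce memory use
)

out = llm("Briefly explain mixture-of-experts language models.", max_tokens=256)
print(out["choices"][0]["text"])
```

The same file also works with the stock llama.cpp binaries (llama-cli, llama-server) by passing it via `-m`.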

These quantizations use Q8_0 for all tensors by default; only the dense FFN block and the conditional (routed) experts are downgraded, as reflected in the filenames. The shared expert is always kept at Q8_0.
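
To see this scheme for yourself, the GGUF header can be inspected with the `gguf` Python package (`pip install gguf`), which is maintained alongside llama.cpp. A small sketch, assuming llama.cpp's usual tensor naming (routed-expert tensors contain `exps`, the shared expert `shexp`); `path` is the file downloaded in the snippet above:

```python
# Print each FFN tensor's name and quantization type so the per-tensor
# scheme (routed experts downgraded, shared expert kept at Q8_0) is visible.
from gguf import GGUFReader

reader = GGUFReader(path)  # path from the download snippet above
for t in reader.tensors:
    if "ffn" in t.name:
        print(f"{t.name:45s} {t.tensor_type.name}")
```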

The model has roughly 110B parameters and runs under llama.cpp's `glm4moe` architecture.