ddh0/GLM-4.5-3.34bpw.gguf

This repository contains a custom 3.34 bits-per-weight (bpw) GGUF quantization of GLM-4.5, for use with llama.cpp. The attention, output, embedding, and shared-expert tensors are kept at high precision (Q8_0), the dense FFN tensors at IQ4_XS, and the routed-expert tensors, which hold the bulk of the parameters, at IQ3_XXS. The recipe below reproduces the quant:

IMATRIX=~/imatrices/zai-org_GLM-4.5-imatrix.gguf
TYPE_EMBD=Q8_0
TYPE_SHEXP=Q8_0
TYPE_FFN_GATE=IQ4_XS
TYPE_FFN_UP=IQ4_XS
TYPE_FFN_DOWN=IQ4_XS
TYPE_FFN_GATE_EXPS=IQ3_XXS
TYPE_FFN_UP_EXPS=IQ3_XXS
TYPE_FFN_DOWN_EXPS=IQ3_XXS
TYPE_ATTN_K=Q8_0
TYPE_ATTN_Q=Q8_0
TYPE_ATTN_V=Q8_0
TYPE_ATTN_O=Q8_0
TYPE_OUTPUT=Q8_0
TYPE_DEFAULT=Q8_0
SRC_GGUF=~/gguf/GLM-4.5-bf16.gguf
DST_GGUF=~/gguf/GLM-4.5-3.34bpw.gguf

llama-quantize \
--imatrix $IMATRIX \
--token-embedding-type $TYPE_EMBD \
--tensor-type ffn_gate=$TYPE_FFN_GATE \
--tensor-type ffn_up=$TYPE_FFN_UP \
--tensor-type ffn_down=$TYPE_FFN_DOWN \
--tensor-type ffn_gate_shexp=$TYPE_SHEXP \
--tensor-type ffn_up_shexp=$TYPE_SHEXP \
--tensor-type ffn_down_shexp=$TYPE_SHEXP \
--tensor-type ffn_gate_exps=$TYPE_FFN_GATE_EXPS \
--tensor-type ffn_up_exps=$TYPE_FFN_UP_EXPS \
--tensor-type ffn_down_exps=$TYPE_FFN_DOWN_EXPS \
--tensor-type attn_k=$TYPE_ATTN_K \
--tensor-type attn_q=$TYPE_ATTN_Q \
--tensor-type attn_v=$TYPE_ATTN_V \
--tensor-type attn_output=$TYPE_ATTN_O \
--output-tensor-type $TYPE_OUTPUT \
$SRC_GGUF $DST_GGUF $TYPE_DEFAULT $(nproc)
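As a rough sanity check, the effective bits-per-weight of a mix like this can be estimated from the per-type bit widths in llama.cpp (Q8_0 = 8.5 bpw, IQ4_XS = 4.25 bpw, IQ3_XXS = 3.0625 bpw, block overhead included). The parameter fractions below are illustrative assumptions, not values measured from GLM-4.5:

```python
# Rough effective-bpw estimate for a mixed-precision GGUF quant.
# Bits-per-weight for each llama.cpp quant type, including block overhead.
BPW = {"Q8_0": 8.5, "IQ4_XS": 4.25, "IQ3_XXS": 3.0625}

# Hypothetical parameter-count fractions per tensor group. These are
# illustrative guesses for a large MoE model, not measured from GLM-4.5.
fractions = {
    "routed_experts": (0.90, "IQ3_XXS"),  # ffn_*_exps dominate the count
    "dense_ffn":      (0.04, "IQ4_XS"),   # ffn_gate / ffn_up / ffn_down
    "attn_and_shexp": (0.05, "Q8_0"),     # attention + shared experts
    "embd_and_out":   (0.01, "Q8_0"),     # token embedding + output
}

# Weighted average over the assumed fractions.
effective_bpw = sum(frac * BPW[qtype] for frac, qtype in fractions.values())
print(f"estimated effective bpw: {effective_bpw:.2f}")
```

With these assumed fractions the estimate lands near 3.4 bpw, in the same ballpark as the 3.34 bpw in the repo name; the exact figure depends on the model's real per-tensor parameter counts.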
Model size: 358B params
Architecture: glm4moe


Model tree for ddh0/GLM-4.5-3.34bpw.gguf

Base model: zai-org/GLM-4.5