ddh0/GLM-4.5-3.34bpw.gguf
This repository contains a custom 3.34 bits-per-weight (bpw) GGUF quantization of GLM-4.5, for use with llama.cpp. It was produced with the following recipe:
```shell
IMATRIX=~/imatrices/zai-org_GLM-4.5-imatrix.gguf

TYPE_EMBD=Q8_0
TYPE_SHEXP=Q8_0
TYPE_FFN_GATE=IQ4_XS
TYPE_FFN_UP=IQ4_XS
TYPE_FFN_DOWN=IQ4_XS
TYPE_FFN_GATE_EXPS=IQ3_XXS
TYPE_FFN_UP_EXPS=IQ3_XXS
TYPE_FFN_DOWN_EXPS=IQ3_XXS
TYPE_ATTN_K=Q8_0
TYPE_ATTN_Q=Q8_0
TYPE_ATTN_V=Q8_0
TYPE_ATTN_O=Q8_0
TYPE_OUTPUT=Q8_0
TYPE_DEFAULT=Q8_0

SRC_GGUF=~/gguf/GLM-4.5-bf16.gguf
DST_GGUF=~/gguf/GLM-4.5-3.34bpw.gguf

# Note: $IMATRIX is defined above but was not passed to the command as
# originally written; it is applied here via the --imatrix flag.
llama-quantize \
    --imatrix $IMATRIX \
    --token-embedding-type $TYPE_EMBD \
    --tensor-type ffn_gate=$TYPE_FFN_GATE \
    --tensor-type ffn_up=$TYPE_FFN_UP \
    --tensor-type ffn_down=$TYPE_FFN_DOWN \
    --tensor-type ffn_gate_shexp=$TYPE_SHEXP \
    --tensor-type ffn_up_shexp=$TYPE_SHEXP \
    --tensor-type ffn_down_shexp=$TYPE_SHEXP \
    --tensor-type ffn_gate_exps=$TYPE_FFN_GATE_EXPS \
    --tensor-type ffn_up_exps=$TYPE_FFN_UP_EXPS \
    --tensor-type ffn_down_exps=$TYPE_FFN_DOWN_EXPS \
    --tensor-type attn_k=$TYPE_ATTN_K \
    --tensor-type attn_q=$TYPE_ATTN_Q \
    --tensor-type attn_v=$TYPE_ATTN_V \
    --tensor-type attn_output=$TYPE_ATTN_O \
    --output-tensor-type $TYPE_OUTPUT \
    $SRC_GGUF $DST_GGUF $TYPE_DEFAULT $(nproc)
```
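As a rough sanity check on the recipe, the overall bpw is the element-count-weighted average of the per-tensor quant types. A minimal sketch, using the real bits-per-weight of the llama.cpp types in the recipe (Q8_0 = 8.5, IQ4_XS = 4.25, IQ3_XXS = 3.0625) but an illustrative parameter split, not the actual GLM-4.5 tensor sizes:

```python
# Bits-per-weight of the llama.cpp quant types used in the recipe above.
BPW = {"Q8_0": 8.5, "IQ4_XS": 4.25, "IQ3_XXS": 3.0625}

def average_bpw(elems_by_type: dict) -> float:
    """Element-count-weighted average bits per weight across tensors."""
    total_bits = sum(BPW[t] * n for t, n in elems_by_type.items())
    return total_bits / sum(elems_by_type.values())

# Illustrative split (NOT the real GLM-4.5 tensor sizes): in a large MoE
# model the routed-expert FFN tensors hold most of the parameters, so the
# IQ3_XXS experts dominate and pull the average down toward ~3.3 bpw.
print(round(average_bpw({"IQ3_XXS": 90, "IQ4_XS": 6, "Q8_0": 4}), 3))  # 3.351
```

This is why the expert tensors get the aggressive IQ3_XXS type while the small, quality-sensitive attention and embedding tensors can stay at Q8_0 with little effect on the total size.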
Model tree for ddh0/GLM-4.5-3.34bpw.gguf
- Base model: zai-org/GLM-4.5