Model Output Confusion

#1
by anrgct - opened

Hello, I downloaded iq3m and iq2m, but the model can only output normally in the beginning; the rest is all jumbled.

I can confirm that after a while it starts repeating sentences. I will create some higher quants.


Adding `--override-kv glm4.rope.dimension_count=int:64` seems to resolve the repeating issue. Example: `./llama.cpp/llama-cli -m GLM-Z1-9B-0414-iq3_m.gguf -cnv -ub 63 -b 63 -t 4 --override-kv glm4.rope.dimension_count=int:64`, taken from https://github.com/ggml-org/llama.cpp/issues/12946. I will look into whether adding this key to the GGUF during conversion will fix it without needing the extra option.
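For anyone hitting the same issue, a quick way to check whether the key is already present in a GGUF file is llama.cpp's `gguf_dump.py` script; the same runtime override also works with `llama-server`. This is a sketch only: the script path and model filename below are illustrative, and may differ in your checkout.

```shell
# Inspect the GGUF metadata to see whether glm4.rope.dimension_count is set
# (gguf_dump.py ships in llama.cpp's gguf-py/scripts; model filename is illustrative)
python llama.cpp/gguf-py/scripts/gguf_dump.py GLM-Z1-9B-0414-iq3_m.gguf | grep rope

# The same override can be passed to llama-server until the key is baked
# into the GGUF at conversion time
./llama.cpp/llama-server -m GLM-Z1-9B-0414-iq3_m.gguf \
    --override-kv glm4.rope.dimension_count=int:64
```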

The new GGUF model is performing well, and the iq2m accuracy is also quite good! Thank you!

anrgct changed discussion status to closed


Thank you for bringing this issue to my attention.
