Model Output Confusion

#1
by anrgct - opened

Hello, I downloaded iq3m and iq2m, but the model can only output normally in the beginning; the rest is all jumbled.

I can confirm that after a while it starts repeating sentences. I will create some higher quants.


Adding `--override-kv glm4.rope.dimension_count=int:64` seems to resolve the repeating issue. Example: `./llama.cpp/llama-cli -m GLM-Z1-9B-0414-iq3_m.gguf -cnv -ub 63 -b 63 -t 4 --override-kv glm4.rope.dimension_count=int:64`, taken from https://github.com/ggml-org/llama.cpp/issues/12946. I will look into whether adding this key to the GGUF during conversion will fix it without needing the extra option.
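For anyone hitting the same issue, a quick way to check whether the key is already present in a GGUF file is llama.cpp's `gguf_dump.py` script; the same runtime override also works with `llama-server`. This is a sketch only: the script path and model filename below are illustrative, and may differ in your checkout.

```shell
# Inspect the GGUF metadata to see whether glm4.rope.dimension_count is set
# (gguf_dump.py ships in llama.cpp's gguf-py/scripts; model filename is illustrative)
python llama.cpp/gguf-py/scripts/gguf_dump.py GLM-Z1-9B-0414-iq3_m.gguf | grep rope

# The same override can be passed to llama-server until the key is baked
# into the GGUF at conversion time
./llama.cpp/llama-server -m GLM-Z1-9B-0414-iq3_m.gguf \
    --override-kv glm4.rope.dimension_count=int:64
```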

The new GGUF model is performing well, and the iq2m accuracy is also quite good! Thank you!

anrgct changed discussion status to closed


Thank you for bringing this issue to my attention.
