Broken results
#2
opened by RamoreRemora
Tried Q4_K_L on llama.cpp b5133 server and it's unusable due to extreme repetition issues. Something seems clearly broken.
I can confirm that I'm experiencing this as well on the IQ4_XS quant. I tried the Q4_K_M quant and it was broken as well.
This has been opened as an issue against llama.cpp on GitHub:
Shoot... hopefully it's a small fix, ideally not the quants! But if it is, I'll remake them promptly!
Just as a note, see https://www.reddit.com/r/LocalLLaMA/comments/1jzn9wj/comment/mn7iv7f
By using these arguments: `--flash-attn -ctk q4_0 -ctv q4_0 --ctx-size 16384 --override-kv tokenizer.ggml.eos_token_id=int:151336 --override-kv glm4.rope.dimension_count=int:64 --jinja`
I was able to make the IQ4_XS quant work well for me on the latest build of llama.cpp.
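For anyone who wants to copy-paste, here is the workaround as a full server invocation. This is just a sketch: the model filename is a placeholder for whatever GGUF you downloaded, and you may want a different `--ctx-size` depending on your VRAM.

```shell
# Hypothetical invocation; substitute your actual GGUF path for --model.
llama-server \
  --model ./GLM-4-IQ4_XS.gguf \
  --flash-attn \
  -ctk q4_0 -ctv q4_0 \
  --ctx-size 16384 \
  --override-kv tokenizer.ggml.eos_token_id=int:151336 \
  --override-kv glm4.rope.dimension_count=int:64 \
  --jinja
```

Note that `-ctk q4_0 -ctv q4_0` quantizes the KV cache and requires `--flash-attn`; the two `--override-kv` flags patch the EOS token id and the RoPE dimension count in the model's metadata at load time, which is what fixes the repetition.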