Broken results

#2
by RamoreRemora - opened

Tried Q4_K_L on llama.cpp b5133 server and it's unusable due to extreme repetition issues. Something seems clearly broken.

I can confirm that I'm experiencing this as well on the IQ4_XS quant. I tried the Q4_K_M quant and it was broken too.

This has been opened as an issue on the llama.cpp GitHub:

https://github.com/ggml-org/llama.cpp/issues/12946

Shoot.. hopefully it's a small fix, ideally not the quant! But if it is I'll remake it promptly!

Just as a note, see https://www.reddit.com/r/LocalLLaMA/comments/1jzn9wj/comment/mn7iv7f

By using these arguments: `--flash-attn -ctk q4_0 -ctv q4_0 --ctx-size 16384 --override-kv tokenizer.ggml.eos_token_id=int:151336 --override-kv glm4.rope.dimension_count=int:64 --jinja` I was able to make the IQ4_XS quant work well for me on the latest build of llama.cpp.
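For reference, those flags dropped into a full server launch might look something like this. This is only a sketch: the model filename is a placeholder (not from this thread), and the rest are the exact flags quoted above.

```shell
# Sketch of a llama-server launch with the workaround flags above.
# The GGUF filename below is a placeholder; substitute your own.
./llama-server \
  -m GLM-4-IQ4_XS.gguf \
  --flash-attn \
  -ctk q4_0 -ctv q4_0 \
  --ctx-size 16384 \
  --override-kv tokenizer.ggml.eos_token_id=int:151336 \
  --override-kv glm4.rope.dimension_count=int:64 \
  --jinja
```

The two `--override-kv` flags patch metadata baked into the GGUF at load time (the EOS token id and the RoPE dimension count), which is why they can paper over a broken conversion without requantizing.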
