Model is over replying to user request

#1
by Narutoouz - opened

I tested q8 gguf quants of this model. It is replying random things for a simple hi message. I thought the issue was with quantisationm but I tried another q8 quant of same model, it also shows same behaviour. It is not issue with llama.cpp , because 8bit mlx model also showed same behaviour. Here , I am showing the supporting images.
Screenshot 2025-07-09 at 10.18.33 AM.png

Screenshot 2025-07-09 at 10.38.36 AM.png

There is similar issue with 32b model also. I don't if it is the jinja template or what is causing the issue?
Screenshot 2025-07-09 at 10.47.32 AM.png

Narutoouz changed discussion status to closed

Sign up or log in to comment