unsloth/Qwen3-235B-A22B-128K-GGUF · UD-Q3_K_XL sometimes gives gibberish when using it via API (SillyTavern). UD-Q4_K_XL works fine.

Hi there, thanks for always doing these great quants!

I was testing some small conversations via SillyTavern, but sometimes I get an infinite "GGGGGGGGGGGGGGGG" or an infinite "Blocky Blocky Blocky". Once the gibberish appears there, the internal API server keeps producing it as well.

This is with UD-Q3_K_XL loaded fully in VRAM (128 GB total VRAM).

The model works fine in all cases with UD-Q4_K_XL, offloading about 20 GB to CPU.

What could be the reason? I compiled from source at the latest commit with:

cmake -B build -DGGML_CUDA=ON -DGGML_CUDA_FA_ALL_QUANTS=ON -DGGML_BLAS=OFF -DCMAKE_CUDA_ARCHITECTURES="86;89;120"
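
In case it helps anyone trying to reproduce this, here is a minimal sketch for flagging the kind of runaway repetition described above (endless "G" or "Blocky Blocky ...") in a response string. The function name `is_degenerate` and its thresholds are hypothetical and chosen only for illustration; it is not part of SillyTavern or the API server, and it only inspects text you pass to it.

```python
def is_degenerate(text: str, min_repeats: int = 8, max_unit: int = 16) -> bool:
    """Return True if `text` ends with a short unit repeated many times.

    Hypothetical helper: `min_repeats` and `max_unit` are arbitrary
    illustrative thresholds, not values taken from any tool.
    """
    stripped = text.rstrip()
    # Try every candidate unit length, shortest first.
    for unit_len in range(1, max_unit + 1):
        needed = unit_len * min_repeats
        if len(stripped) < needed:
            break  # longer units would need even more text
        tail = stripped[-needed:]
        unit = tail[:unit_len]
        # The tail is degenerate if it is exactly `unit` repeated.
        if unit * min_repeats == tail:
            return True
    return False


if __name__ == "__main__":
    print(is_degenerate("G" * 16))
    print(is_degenerate("Blocky " * 10))
    print(is_degenerate("Hello, how are you today?"))
```

Running each reply from a test conversation through a check like this makes it easier to note how often the Q3 quant degenerates compared with the Q4 one. It will also flag legitimate endings such as a long run of dots, so treat a hit as a prompt to look at the output rather than as proof.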
