IQ4_XS vs Q4K_M/S weighted?
So are these Q4_K quants like the old unweighted ones, and thus worse than IQ4? Or are they like the new quants, shown in red in the ppl graph you posted? Very few people seem to be making the distinction, so I'm not sure what I should be downloading.
For some reason we don't have imatrix quants for this one. I assume they failed, as they did for many Qwen3-235B-A22B models. I would just visit https://hf.tst.eu/model#Smoothie-Qwen3-235B-A22B-GGUF and download the best quality one that fits.
We've got IQ4_XS... isn't the point of the "I" imatrix? That page says I may as well grab IQ4 over Q4_K_S though, so I guess I'll change my download queue.
No, those are I-quants, not imatrix quants. I-quants trade higher inference compute cost for better quality per size, while imatrix quants use an importance matrix to keep the important parts of the model at higher precision and the unimportant parts at lower precision, which has only benefits, at no cost beyond the imatrix computation itself, a one-time cost at quantization time. The imatrix version of IQ4_XS would be i1-IQ4_XS. An I-quant can be either a static or an imatrix quant. Maybe
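To make the two axes concrete: whether something is an I-quant (the IQ prefix in the llama.cpp quant type) is independent of whether it was made with an imatrix (the i1- prefix in this repo's naming convention, not something llama.cpp itself uses). A toy classifier over hypothetical file labels:

```python
# Toy illustration: the I-quant axis (quant format) is independent of the
# imatrix axis (whether an importance matrix was used at quantization time).
# "i1-" is this team's naming convention for imatrix quants, not llama.cpp's.

def describe_quant(label: str) -> str:
    weighted = "imatrix" if label.startswith("i1-") else "static"
    fmt = label.removeprefix("i1-")
    kind = "I-quant" if fmt.startswith("IQ") else "K-quant/other"
    return f"{fmt}: {kind}, {weighted}"

for label in ["Q4_K_S", "IQ4_XS", "i1-IQ4_XS", "i1-Q4_K_M"]:
    print(describe_quant(label))
# Q4_K_S: K-quant/other, static
# IQ4_XS: I-quant, static
# i1-IQ4_XS: I-quant, imatrix
# i1-Q4_K_M: K-quant/other, imatrix
```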
@mradermacher
can check why this model failed imatrix computation, but I don't think we keep such logs, as llmc why Smoothie-Qwen3-235B-A22B
doesn't show anything.
Thanks, that clears it up. The smaller size probably balances out here by letting more of the model be offloaded. I have also seen imatrix quants without any i1 label, so I will check more closely now. The HF GGUF preview should list the dataset used now, right?
We label our imatrix quants with i1 because i1 is the name of the dataset we use to compute the importance matrix. In my opinion, the only other person creating high-quality imatrix quants is bartowski, who uploads only imatrix quants. If you have the choice between bartowski's quants and ours, I recommend ours: our dataset is double the size, with the first half being bartowski's imatrix dataset.
@Lockout if you want imatrix quants, I will provide them. Just say so explicitly :)
I haven't even tried imatrix quants for this one, because it's big (likely requiring manual set-up) and they would likely fail. To me, Qwen3 is a disaster, and I secretly wait for a fix, even though it's likely not coming, since I have already been waiting since Qwen2. As for the dataset: newer quants (like this one, or anything less than about half a year old) record the name of the imatrix dataset (you can use the quant viewer on HF to see it for most quants). The name is simply the filename used, but I made sure every significant change in the imatrix data is reflected in a different name.
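The dataset name mentioned above is stored as ordinary GGUF metadata. Here is a rough pure-stdlib sketch of pulling string-valued keys out of a GGUF header; the key name `quantize.imatrix.dataset` is, to the best of my knowledge, what llama.cpp's quantize tool writes, and real files also contain array-typed values that this toy reader simply skips past:

```python
import struct

# Minimal sketch of reading string-valued GGUF metadata, e.g. the
# "quantize.imatrix.dataset" key recorded at quantization time.
# Handles scalar and string value types only; arrays are out of scope,
# so this illustrates the header layout rather than being a full reader.

# GGUF value type id -> size in bytes (scalar types only)
_SCALAR_SIZES = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}

def read_string_metadata(path: str) -> dict[str, str]:
    out: dict[str, str] = {}
    with open(path, "rb") as f:
        assert f.read(4) == b"GGUF", "not a GGUF file"
        _version, _n_tensors, n_kv = struct.unpack("<IQQ", f.read(20))
        for _ in range(n_kv):
            (key_len,) = struct.unpack("<Q", f.read(8))
            key = f.read(key_len).decode("utf-8")
            (vtype,) = struct.unpack("<I", f.read(4))
            if vtype == 8:  # string: uint64 length + utf-8 bytes
                (slen,) = struct.unpack("<Q", f.read(8))
                out[key] = f.read(slen).decode("utf-8")
            elif vtype in _SCALAR_SIZES:
                f.read(_SCALAR_SIZES[vtype])  # skip scalar value
            else:
                break  # arrays etc.: stop here, this is only a sketch
    return out
```

Something like `read_string_metadata("model.gguf").get("quantize.imatrix.dataset")` would then show the dataset name for an imatrix quant, and nothing for a static one.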
And as for the I in quant names: it confuses everybody. I think it originally was meant to stand for imatrix, since that was their purpose, but indeed they do not depend on an imatrix and never have. Maybe the I stands for something else (integer? but everything is an integer :)
@nicoboss we should have logs for failed imatrix quants for a while now (let me check: starting on Feb 18 with Qwen2.5-32B-GrimReaper-Instruct). It should be greppable with llmc why.
The current IQ4_XS quant works fine in text completion. I like it over the non-smoothie version. Probably no more point in squeezing this grape.
Wouldn't call Qwen3 a disaster, but it's not really groundbreaking stuff either. Qwen is the kind of AI house that deems it fine for their models to say PewDiePie is a female dentist from Tumblr.