Q8_0 vs fp16

Opened by AIGUYCONTENT

The Q8_0 quant is roughly half the size of the fp16 one. I'm on an M4 MacBook Pro with 48GB of RAM. I'm pretty sure I can run fp16, but it doesn't leave much room for context. Is there much to lose by downloading Q8_0 instead of fp16?
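Why is Q8_0 "about half"? In llama.cpp/ggml, Q8_0 stores weights in blocks of 32 int8 values plus one fp16 scale per block. That works out to (32 × 8 + 16) / 32 = 8.5 bits per weight, versus 16 for fp16, so the weights come to about 53% of the fp16 size. The sketch below does that arithmetic. The parameter count is hypothetical because the thread never names the model, and the estimate ignores file metadata and any tensors a quantizer keeps at higher precision.

```python
# Bits per weight for the two formats discussed in the thread.
# fp16: 16 bits per weight.
# Q8_0 (llama.cpp/ggml): blocks of 32 int8 weights plus one fp16 scale,
# i.e. (32 * 8 + 16) / 32 = 8.5 bits per weight.
FP16_BPW = 16.0
Q8_0_BPW = (32 * 8 + 16) / 32

GIB = 2**30


def weights_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weight tensors in GiB.

    Ignores GGUF metadata and any tensors left at higher precision.
    """
    return n_params * bits_per_weight / 8 / GIB


# Hypothetical model size: the thread does not say how many parameters it has.
N_PARAMS = 20e9

fp16_gib = weights_gib(N_PARAMS, FP16_BPW)
q8_gib = weights_gib(N_PARAMS, Q8_0_BPW)
print(f"fp16: {fp16_gib:.1f} GiB, Q8_0: {q8_gib:.1f} GiB, ratio {q8_gib / fp16_gib:.3f}")
```

The ratio does not depend on the parameter count: it is always 8.5 / 16 ≈ 0.53.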

Mradermacher told me a few months ago that the difference between Q8 and Q5_M is negligible, and that on limited-RAM setups you gain more from the extra context a lower quant frees up. At least, that's what I recall him saying; I could have misunderstood.
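The "lower quant buys more context" trade-off can be roughed out as memory left after loading the weights, divided by the KV-cache cost of one token. In a standard transformer, each token stores one K and one V vector per layer per KV head. The sketch below is a back-of-the-envelope estimate only, and every number in it is a hypothetical assumption: the architecture (layers, KV heads, head dimension), the 20B parameter count, the 40 GiB memory budget on a 48 GB Mac, and ~5.5 bits per weight as a stand-in for a Q5-class quant such as the Q5_M mentioned above (actual K-quant sizes vary by model). It also ignores compute buffers and other runtime overhead.

```python
# Rough context budget for different quant levels.
# All model and memory numbers below are hypothetical illustrations.
GIB = 2**30


def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    """KV-cache bytes for one token: K and V for every layer and KV head.

    bytes_per_elem=2 assumes an fp16 cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def max_context_tokens(budget_gib: float, n_params: float, bpw: float,
                       per_token: int) -> int:
    """Tokens of KV cache that fit after the weights are loaded (0 if none)."""
    weight_bytes = n_params * bpw / 8
    free_bytes = budget_gib * GIB - weight_bytes
    return max(0, int(free_bytes // per_token))


N_PARAMS = 20e9    # hypothetical model size
BUDGET_GIB = 40.0  # hypothetical memory available to the model on a 48 GB Mac
per_tok = kv_bytes_per_token(n_layers=56, n_kv_heads=8, head_dim=128)

quants = {
    "fp16": 16.0,
    "Q8_0": 8.5,
    "Q5-class (~5.5 bpw, assumed)": 5.5,
}
results = {name: max_context_tokens(BUDGET_GIB, N_PARAMS, bpw, per_tok)
           for name, bpw in quants.items()}

for name, tokens in results.items():
    print(f"{name:>30}: ~{tokens:,} tokens of context")
```

Whatever the exact numbers, the shape of the result is the same: every bit per weight you drop frees memory that goes straight into KV cache. Whether that is worth it depends on how much quality the lower quant actually costs.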


Yeah, personally I wouldn't bother with fp16. I include it only for people who are obsessed with max size; in practice Q8_0 gets you basically indistinguishable performance.
