EXL2 Quants

#1
by cactopus - opened

Awesome! Can't wait to try them out, EXL2 quants when? O(∩_∩)O

Ready.Art org

I'll ask my friend about making them. My GPU is booked right now.

Greetings! I see that the exl2 quants are still not up. Are they on the way? If not, I may quantize the model myself. Would that be okay to upload if I do?
What environment do you use to quantize? Would the default 0.2.9 be okay? What about the calibration dataset? Do you use any other than the default?

Ready.Art org

A lot of the volunteers doing EXL2 quants stopped because EXL3 is emerging. You're more than welcome to pick up the slack. Default is fine. At 8 bits per head I usually do 8bpw for RunPod; 5.5bpw and 5bpw for 24GB; 4bpw and 3.5bpw for 16GB; and 2.5bpw for 12GB.
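If you end up running the conversions yourself, those bpw targets can be sketched as a loop over exllamav2's convert.py. The model paths below are placeholders, and the flags (`-b` bits per weight, `-hb` head bits, `-cf` output directory) should be double-checked against your exllamav2 version; this is a dry-run sketch, not a definitive recipe.

```shell
# Minimal sketch: quantize at each bpw target with exllamav2's convert.py.
# Paths are placeholders; convert.py ships with the exllamav2 repo.
MODEL_DIR=./model-fp16            # placeholder: original FP16 model directory
for BPW in 8.0 5.5 5.0 4.0 3.5 2.5; do
  OUT_DIR=./model-${BPW}bpw-exl2
  echo python convert.py \
    -i "$MODEL_DIR" \
    -o ./work \
    -cf "$OUT_DIR" \
    -b "$BPW" \
    -hb 8                         # 8 bits per head, as mentioned above
  # Drop the leading 'echo' to actually run each conversion.
done
```

Leaving out `-c` uses the default calibration dataset, which is what was suggested above.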

I'm also excited for ExLlamaV3, of course, but for most users, including me, the v2 version is still the way to go. I'll quantize the model, then. Expect 4, 4.5, 5, 5.5, 6, and 8bpw quants. Should I create my own repos or commit to yours?

Ready.Art org

I'm personally okay with it either way. Would have to ask FrenzyBiscuit but unless he complains I don't think it should be a problem.
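If you go with your own repos, uploading a finished quant can be sketched with the huggingface-cli. The repo id and directory below are placeholders, and this assumes huggingface_hub is installed and you have already run `huggingface-cli login`.

```shell
# Sketch of uploading a finished quant to your own Hugging Face repo.
# Repo id and local directory are placeholders.
REPO=your-username/Model-5.5bpw-exl2   # placeholder repo id
QUANT_DIR=./model-5.5bpw-exl2          # placeholder local quant directory
echo huggingface-cli upload "$REPO" "$QUANT_DIR" .
# Drop the leading 'echo' to actually upload.
```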

Ready.Art org

If you've got the spare compute this new one really needs quants too https://huggingface.co/ReadyArt/Broken-Tutu-24B

Alright, I'll start quantizing then. Meanwhile you can sort out how it should be done; there's no hurry.

Apparently, someone has already quantized it: https://huggingface.co/models?search=Broken-Tutu-24b-exl2
Though I'll be glad to help quantize other models to exl2 if needed. My specs are enough to quantize models up to 24B (perhaps more; I haven't really tested it yet).

Ready.Art org

Awesome. I guess I asked for quants that weren't needed before my coffee, lol. I'll make a collection for those.
Yeah, I pinged Frenzy about it. Why don't you come hang out with us in the Discord thread? https://discord.com/channels/1238219753324281886/1332443910559105146
