EXL2 Quants

#1
by cactopus - opened

Awesome! Can't wait to try them out, EXL2 quants when? O(∩_∩)O

Ready.Art org

I'll ask my friend about making them. My GPU is booked right now.

Greetings! I see that the exl2 quants are still not up. Are they on the way? If not, I may quantize the model myself. Would that be okay to upload if I do?
What environment do you use to quantize? Would the default 0.2.9 be okay? What about the calibration dataset? Do you use any other than the default?

Ready.Art org

A lot of the volunteers doing EXL2 quants stopped because EXL3 is emerging. You're more than welcome to pick up the slack. Default is fine. At 8 bits per head I usually do 8bpw for RunPod; 5.5bpw and 5bpw for 24GB; 4bpw and 3.5bpw for 16GB; and 2.5bpw for 12GB.
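If you end up running the conversions yourself, those bpw targets can be sketched as a loop over exllamav2's convert.py. The model paths below are placeholders, and the flags (`-b` bits per weight, `-hb` head bits, `-cf` output directory) should be double-checked against your exllamav2 version; this is a dry-run sketch, not a definitive recipe.

```shell
# Minimal sketch: quantize at each bpw target with exllamav2's convert.py.
# Paths are placeholders; convert.py ships with the exllamav2 repo.
MODEL_DIR=./model-fp16            # placeholder: original FP16 model directory
for BPW in 8.0 5.5 5.0 4.0 3.5 2.5; do
  OUT_DIR=./model-${BPW}bpw-exl2
  echo python convert.py \
    -i "$MODEL_DIR" \
    -o ./work \
    -cf "$OUT_DIR" \
    -b "$BPW" \
    -hb 8                         # 8 bits per head, as mentioned above
  # Drop the leading 'echo' to actually run each conversion.
done
```

Leaving out `-c` uses the default calibration dataset, which is what was suggested above.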

I'm also excited for ExLlamaV3, of course, but for most users, including me, the v2 version is still the way to go. I'll quantize the model, then. Expect 4, 4.5, 5, 5.5, 6, and 8bpw quants. Should I create my own repos or commit to yours?

Ready.Art org

I'm personally okay with it either way. Would have to ask FrenzyBiscuit but unless he complains I don't think it should be a problem.
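If you go with your own repos, uploading a finished quant can be sketched with the huggingface-cli. The repo id and directory below are placeholders, and this assumes huggingface_hub is installed and you have already run `huggingface-cli login`.

```shell
# Sketch of uploading a finished quant to your own Hugging Face repo.
# Repo id and local directory are placeholders.
REPO=your-username/Model-5.5bpw-exl2   # placeholder repo id
QUANT_DIR=./model-5.5bpw-exl2          # placeholder local quant directory
echo huggingface-cli upload "$REPO" "$QUANT_DIR" .
# Drop the leading 'echo' to actually upload.
```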

Ready.Art org

If you've got the spare compute this new one really needs quants too https://huggingface.co/ReadyArt/Broken-Tutu-24B

Alright, I'll start quantizing then. Meanwhile you can sort out how it should be done; there's no hurry.

Apparently, someone has already quantized it: https://huggingface.co/models?search=Broken-Tutu-24b-exl2
Though I'll be glad to help quantize other models to exl2 if needed. My specs are enough to quantize models up to 24B (perhaps more; I haven't really tested it yet).

Ready.Art org

Awesome. I guess I asked for quants that weren't needed before my coffee, lol. I'll make a collection for those.
Yeah, I pinged Frenzy about it. Why don't you come hang out with us in the Discord thread? https://discord.com/channels/1238219753324281886/1332443910559105146
