Latest updates?

#10
by Dampfinchen - opened

Hello, I've noticed you have updated the model quite a few times, but I'm not sure why.

The first update was one day ago, and the latest happened two hours ago.

What were the reasons for these updates? Perhaps an update history would be useful here. Thanks for your great work!

I would be very curious to know as well. Do I need to redownload the model?

No update is required. The weights only changed ever so slightly, for minuscule improvements in accuracy, so you can re-download if you want.
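If you do want to refresh, a minimal sketch using `huggingface_hub` (the repo and filename below are placeholders, swap in the actual quant you use):

```python
from huggingface_hub import hf_hub_download

# Hypothetical repo and filename -- substitute the actual Unsloth repo
# and the quant file you actually use.
path = hf_hub_download(
    repo_id="unsloth/SOME-MODEL-GGUF",
    filename="SOME-MODEL-Q4_K_M.gguf",
)
# Downloads are cached by revision, so re-running this only fetches the
# file again if it has actually changed on the Hub.
print(path)
```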

We basically improved how we calculate the imatrix. It was already accounted for previously; the calculation just changed slightly.
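For anyone curious what the imatrix is: it's a per-weight importance estimate accumulated from calibration data, which the quantizer uses to weight rounding error. A rough sketch of the idea (not the actual llama.cpp implementation):

```python
import numpy as np

# Toy calibration activations feeding one linear layer:
# shape (n_tokens, n_features).
rng = np.random.default_rng(0)
acts = rng.standard_normal((4096, 512)).astype(np.float32)

# Per-column importance: mean squared activation over the calibration
# tokens -- essentially the statistic llama.cpp's llama-imatrix tool
# accumulates for each tensor.
importance = (acts ** 2).mean(axis=0)

# When quantizing a weight matrix W of shape (n_out, n_features), the
# quantizer minimizes importance-weighted rounding error instead of
# plain MSE, so columns that see large activations are rounded more
# carefully.
def weighted_quant_error(W: np.ndarray, W_q: np.ndarray) -> float:
    return float(((W - W_q) ** 2 * importance).sum())
```

A better calibration dataset changes those importance estimates, which is why the quantized weights shift slightly even though nothing about the base model changed.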

CC: @Dampfinchen @Mushoz @fakezeta @RachidAR

And here I am, religiously updating my llama.cpp and the 30B from Unsloth almost every morning now. I share your OCD for squeezing out whatever performance improvements are possible, because why not?

I wish model providers would do more frequent updates as well. Every release has its weaknesses and areas for improvement, which a 3.1 or 3.2 could help mitigate before the next big version is trained! I think Llama 3 did this well last year, culminating in Llama 3.3 70B, which was a chef's kiss.

But anyway, thanks for being slightly obsessed with optimization, guys; your new Dynamic Quants 2.0 are just what the community needed after having no updates since the "K_M/K_S" quants came online.

Unsloth AI org

Thank you for the praise, we really appreciate it! 🤗

@shimmyshimmer
Hey, just bumping this one since you uploaded the GGUFs again in the last 12 hours or so, and I dutifully downloaded them all and updated my llama.cpp. Any reason I should re-test the models with the new GGUFs (I have a bunch of private evaluations I run), or is it a minor update? Just curious what's changed :)

Also, just FYI: llama.cpp build b5328 (the one that introduces flash attention for DeepSeek on Ampere, i.e. RTX 3000 series and above) broke flash attention on my mobile RTX 2070, so hopefully they fix that soon! From that build onwards, all models output gibberish for me with flash attention enabled.

There is a PR to fix flash attention on Turing cards - https://github.com/ggml-org/llama.cpp/pull/13415

Should be fixed when merged to main!
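Until it lands, the workaround is to run without flash attention: drop the `-fa` flag on the llama.cpp CLI, or, if you're on the Python bindings, something like this (assuming llama-cpp-python; the model path is a placeholder):

```python
from llama_cpp import Llama

# Hypothetical model path -- point at whichever GGUF you downloaded.
llm = Llama(
    model_path="./models/SOME-MODEL-Q4_K_M.gguf",
    n_gpu_layers=-1,   # offload all layers to the GPU
    flash_attn=False,  # keep flash attention off on Turing until the fix lands
)
out = llm("Hello,", max_tokens=16)
print(out["choices"][0]["text"])
```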

@shimmyshimmer I also noticed an update some 13 hours ago. Can you give us some feedback, no matter how dry or minor?

Unsloth AI org

@YearZero @sidran

A new, greatly improved calibration dataset, plus Q5, Q6, etc. XL quants.
