Latest updates?

#10
by Dampfinchen - opened

Hello, I've noticed you have updated the model quite a few times, but I'm not sure why.

The first update was one day ago, and the latest happened two hours ago.

What were the reasons for these updates? Perhaps an update history would be useful here. Thanks for your great work!

I would be very curious to know as well. Do I need to redownload the model?

No update is required. The weights only changed ever so slightly, for minuscule improvements in accuracy, so you can re-download if you want.
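If you do want to refresh, a minimal sketch using `huggingface_hub` (the repo and filename below are placeholders, swap in the actual quant you use):

```python
from huggingface_hub import hf_hub_download

# Hypothetical repo and filename -- substitute the actual Unsloth repo
# and the quant file you actually use.
path = hf_hub_download(
    repo_id="unsloth/SOME-MODEL-GGUF",
    filename="SOME-MODEL-Q4_K_M.gguf",
)
# Downloads are cached by revision, so re-running this only fetches the
# file again if it has actually changed on the Hub.
print(path)
```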

We basically improved how we calculate the imatrix. It was already accounted for previously; the calculation just changed slightly.
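For anyone curious what the imatrix is: it's a per-weight importance estimate accumulated from calibration data, which the quantizer uses to weight rounding error. A rough sketch of the idea (not the actual llama.cpp implementation):

```python
import numpy as np

# Toy calibration activations feeding one linear layer:
# shape (n_tokens, n_features).
rng = np.random.default_rng(0)
acts = rng.standard_normal((4096, 512)).astype(np.float32)

# Per-column importance: mean squared activation over the calibration
# tokens -- essentially the statistic llama.cpp's llama-imatrix tool
# accumulates for each tensor.
importance = (acts ** 2).mean(axis=0)

# When quantizing a weight matrix W of shape (n_out, n_features), the
# quantizer minimizes importance-weighted rounding error instead of
# plain MSE, so columns that see large activations are rounded more
# carefully.
def weighted_quant_error(W: np.ndarray, W_q: np.ndarray) -> float:
    return float(((W - W_q) ** 2 * importance).sum())
```

A better calibration dataset changes those importance estimates, which is why the quantized weights shift slightly even though nothing about the base model changed.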

CC: @Dampfinchen @Mushoz @fakezeta @RachidAR

And here I am, religiously updating my llama.cpp and the 30B from Unsloth almost every morning now. I share your OCD for squeezing out whatever performance improvements are possible, because why not?

I wish model providers would do more frequent updates as well. Every release has its weaknesses and areas for improvement, which a 3.1 or 3.2 could help mitigate before the next big version is trained! I think Llama 3 did this well last year, culminating in Llama 3.3 70B, which was a chef's kiss.

But anyway, thanks for being slightly obsessed with optimization, guys; your new Dynamic Quants 2.0 are just what the community needed after having no updates since the "K_M/K_S" quants came online.

Unsloth AI org

Thank you for the praise, we really appreciate it! 🤗

@shimmyshimmer
Hey, just bumping this one since you uploaded the GGUFs again in the last 12 hours or so, and I dutifully downloaded them all and updated my llama.cpp. Any reason I should re-test the models with the new GGUFs (I have a bunch of private evaluations I run), or is it a minor update? Just curious what's changed :)

Also, just FYI: llama.cpp build b5328 (the one that introduces flash attention for DeepSeek on Ampere, i.e. RTX 3000 series and above) broke flash attention on my mobile RTX 2070, so hopefully they fix that soon! From that build onwards, all models output gibberish for me with flash attention enabled.

There is a PR to fix flash attention on Turing cards - https://github.com/ggml-org/llama.cpp/pull/13415

Should be fixed when merged to main!
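Until it lands, the workaround is to run without flash attention: drop the `-fa` flag on the llama.cpp CLI, or, if you're on the Python bindings, something like this (assuming llama-cpp-python; the model path is a placeholder):

```python
from llama_cpp import Llama

# Hypothetical model path -- point at whichever GGUF you downloaded.
llm = Llama(
    model_path="./models/SOME-MODEL-Q4_K_M.gguf",
    n_gpu_layers=-1,   # offload all layers to the GPU
    flash_attn=False,  # keep flash attention off on Turing until the fix lands
)
out = llm("Hello,", max_tokens=16)
print(out["choices"][0]["text"])
```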

@shimmyshimmer I also noticed an update some 13 hours ago. Can you give us some feedback, no matter how dry or minor?

Unsloth AI org

@YearZero @sidran

A new, greatly improved calibration dataset, plus Q5, Q6, etc. XL quants.
