_TEMPORARY tag in name
Hey folks,
Thanks again for quantizing my models!
To avoid having test models quantized (this one is not a test, lol, it's a normal release), I added a "_TEMPORARY" suffix to the models currently in testing.
Maybe you can add an exclusion to your script so they are not quantized.
Once vetted, these _TEMPORARY models will be versioned properly or deleted.
That's a very good idea, thanks, that will help focus the effort on the right models :) And yes, it's easy to modify the script if the volume ever grows enough to need it (we are still selecting all models manually, after filtering).
Which is how I know you have been very productive recently :)
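Something along these lines would probably do it on our side, by the way (a minimal sketch only; it just assumes the selection step sees a plain list of repo names, which may not match the real script):

```python
# Minimal sketch (assumed structure, not the real script): drop anything whose
# repo name ends in "_TEMPORARY" before it reaches the quantization queue.
def drop_temporary(repo_names):
    """Keep only models that are not marked as temporary test models."""
    return [name for name in repo_names if not name.endswith("_TEMPORARY")]

# Hypothetical usage:
candidates = ["SomeAuthor/CoolMerge-70B", "SomeAuthor/Experiment-3B_TEMPORARY"]
print(drop_temporary(candidates))  # -> ['SomeAuthor/CoolMerge-70B']
```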
Oh yeah, I got curious about that stock merge technique that even a total noob can use, and the first few experiments I made at 3B convinced me to attack the queen categories. :D
And well, it's going quite alright, especially with some help with quants! :)
Yeah, unfortunately (fortunately?), everybody seems to have found out about this and now hammers out 70Bs and 123Bs as if on a factory floor. And that is the size where I feel I should provide imatrix quants, which makes things four times more expensive. And my very influential teammate thinks that waiting a while to see whether a model is well liked is for pussies.
Fascinating times.
Yeah, it's really a mergefest atm.
On my side, I'm trying to create "blocks" or "bricks", i.e. good merges that can be used as Lego pieces to improve any existing finetune, simply by incorporating them as secondary models in the stock merge.
I made the first Smarteaz V1 block (3.3 Llamas + Nemotron) on a lucky shot, and now I'm working on the 3.1 block (Hermes, Tess, and a few other abliterated models). Then I use those two as the sidekicks of a main model with a particular coloration (Dobby, Eva, etc.).
Ultimately, I merged 3 "empowered models" into one to blend the colors, while retaining all the smarts of the 2 blocks I added to each of them.
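For concreteness, one of those "empowering" merges could be expressed as a mergekit model_stock config, roughly like this (all model names below are placeholders rather than my actual repos, and the choice of base_model anchor depends on the recipe):

```python
# Sketch only: write a mergekit "model_stock" config and run it with the
# mergekit-yaml CLI. Model names are placeholders, not real repositories.
from pathlib import Path

config = """\
merge_method: model_stock
base_model: meta-llama/Llama-3.1-70B-Instruct   # anchor model (assumption)
models:
  - model: SomeAuthor/Colored-Main-70B          # the "colored" main model (e.g. a Dobby/Eva finetune)
  - model: MyOrg/Smarteaz-3.3-Block-70B         # block 1: the 3.3 Llamas + Nemotron brick
  - model: MyOrg/Smarteaz-3.1-Block-70B         # block 2: the 3.1 Hermes/Tess brick
dtype: bfloat16
"""

Path("empower.yml").write_text(config)
# then: mergekit-yaml empower.yml ./empowered-model
```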
What I hypothesised/understood of the stock merge is that 3 is both the minimal and the optimal number of models for a merge. With more models, you dilute them into a soup, because you go from a "triangulation", which makes sense geometrically (rough formula below), to quadrangulations or more, which inevitably create massive losses.
Also, mixing finetuned instruct models and finetuned base models turned out to be a good way to obtain a further perplexity drop (a major objective for me) while retaining most of the models' capabilities.
The stock merge is a jewel imho.
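For reference, my (rough) reading of the Model Stock paper is that each merged layer gets pulled toward the anchor weights by an angle-based ratio, roughly:

$$ w_{\text{merged}} = t\,\bar{w}_k + (1 - t)\,w_0, \qquad t = \frac{k\cos\theta}{1 + (k-1)\cos\theta} $$

where $w_0$ is the anchor (base) model, $\bar{w}_k$ the average of the $k$ finetuned weights, and $\theta$ the (assumed shared) angle between their deltas from $w_0$. That is the "triangulation" intuition above: with a couple of finetunes plus the anchor the angle is well constrained, while the shared-angle assumption has to carry more and more models beyond that.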
So I'm experimenting to get as close as I can to a base Llama 3.1 in terms of perplexity, while retaining the instruct capabilities and the finetunes' improvements/coloration. I'm getting somewhere so far: I dropped perplexity by 10% compared to most instruct finetunes, and raised the ARC scores of the "empowered models" by 5-10% compared to the good baseline of the 70B models, and this with only abliterated models, with the exception of the "main model" onto which my blocks are merged. The coloration of the main model is also retained to a large extent, and I'm optimistic about applying that "empowering" treatment to existing merges as well, which are, after all... models.
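(To be clear about what I mean by the perplexity numbers: just the exponential of the mean token loss on a held-out text. A minimal sketch, with a placeholder model name and eval file, and no sliding window for long texts:)

```python
# Minimal sketch of the perplexity check (exp of mean token cross-entropy).
# Model name and eval corpus are placeholders.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "SomeAuthor/Empowered-Llama-3.1-70B"   # placeholder repo
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.bfloat16, device_map="auto"
)

text = open("eval.txt").read()                       # placeholder eval text
ids = tok(text, return_tensors="pt").input_ids[:, :4096].to(model.device)
with torch.no_grad():
    loss = model(input_ids=ids, labels=ids).loss     # mean loss per token
print("perplexity:", math.exp(loss.item()))
```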
Lego LLMs, wow, if that works, that will indeed be absolutely great. I can't wait to see what we will end up with :)