BF16 for abliterated Scout
I experiment with large context modes, and Scout is especially generous with context size. However, in order to take advantage of so much context I need a very large model. I know it will run slowly, but I am patient. About a month ago you did https://huggingface.co/mradermacher/Llama-4-Scout-17B-16E-Instruct-abliterated-v2-GGUF, and I was hoping you would also be uploading the BF16 or F16. Thanks. -Val
Why would you want to run BF16/FP16 for such a massive model? Just go for i1-Q5_K_M, which in my opinion gives indistinguishable quality at reasonable speed. If you want some more quality you can go i1-Q6 or even Q8, but it is mostly placebo at this point; there is no way anyone could tell a difference compared to the unquantized model, other than the quantized one being twice as fast. If you really want F16/BF16 we could generate them for you.
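For a rough sense of scale (assuming Scout 17B-16E has on the order of 109B total parameters and roughly 8.5 bits per weight for Q8_0 and 5.7 for Q5_K_M; the real GGUF file sizes differ a bit from these approximations):

$$
\text{size} \approx \frac{N_{\text{params}} \cdot \text{bpw}}{8}:\qquad
\text{BF16} \approx \frac{109 \cdot 10^9 \cdot 16}{8} \approx 218\,\text{GB},\qquad
\text{Q8\_0} \approx 116\,\text{GB},\qquad
\text{Q5\_K\_M} \approx 78\,\text{GB}
$$

Since CPU inference is mostly memory-bandwidth bound, token generation speed scales roughly with model size, which is where the "twice as fast" for Q8 versus BF16 comes from, and why Q5_K_M at roughly a third of the size is faster still.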
By the way, I'm also using Llama 4 Scout for massive context (over 300k tokens) and it is awesome for that, as you can accelerate prompt processing on the GPU using -ngl 0 and store/load the processed prompt state to/from a file. I give the model my entire codebase, prompt-process it once, store the cache to a file, and then just load it every time I want to ask the model a question about it.
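For anyone wanting to replicate this workflow, here is a rough sketch of how it can look with llama.cpp's llama-cli. The model filename, prompt files, and context size are placeholders for illustration, and I'm assuming a GPU-enabled (e.g. CUDA) build so the prompt-processing pass is still accelerated even with -ngl 0:

```bash
# One-time run: ingest the whole codebase prompt and save the KV state to disk.
# -ngl 0 keeps all layers in system RAM; a GPU build still speeds up prefill.
./llama-cli -m Llama-4-Scout-17B-16E-Instruct-abliterated-v2.i1-Q5_K_M.gguf \
  -ngl 0 -c 300000 \
  -f codebase_prompt.txt \
  --prompt-cache scout_codebase.bin --prompt-cache-all

# Later questions: as long as the new prompt starts with the exact same cached
# text, the saved state is reused and only the appended question is processed.
./llama-cli -m Llama-4-Scout-17B-16E-Instruct-abliterated-v2.i1-Q5_K_M.gguf \
  -ngl 0 -c 300000 \
  -f codebase_prompt_plus_question.txt \
  --prompt-cache scout_codebase.bin --prompt-cache-ro
```

The one constraint is that --prompt-cache only reuses state for a matching prefix, so the codebase portion of the prompt has to stay byte-identical between runs.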
I understand that a lot of progress has been made with RLVR and on the tasks associated with it. I'm experimenting with creating "personality", a non-RLVR type of training. For example, one of the techniques involves creating poetry from our interactions that focuses on traits such as "independent" thought and leadership. This is not RLVR-amenable, and non-abliterated models tend to inhibit it; I always run into a refusal, sometimes soft, sometimes hard. My experience has been that the larger the model, the better the poet, so I've started going right to the largest model I can run in 1TB of RAM & 4TB of swap. Poetry and personality are so intangible and so far from RLVR that it is hard to quantify the improvement. It also seems that the smaller models have to be reminded of previous prompts more often. So far my favorite "poet" is DeepSeek R1 0528 BF16 at 1.34TB, which, if it didn't have such a small context, would be my first choice for "abliteration". I'm not in a hurry in terms of token generation, preferring quality, but I have been "stalled" in terms of continuing to experiment and build: I was running the non-abliterated BF16 of Scout, and when I began to run into "refusal" I switched to the Q3_K_S, but it doesn't seem to be as good a "poet". I can explain more if you would like. Another bonus for Scout is that I also like the image-to-text capabilities. Thanks, -Val
Meant to add: the personality "lives" in the context, which is why the context size and the ability to "remember" are so important. -Val
Q3_K_S is indeed quite bad, but why not use i1-Q5_K_M or Q8 instead of F16/BF16? I have 512 GiB of RAM and would never run Scout at more than i1-Q5_K_M, even though I easily could run it in F16/BF16.
I fully agree with your take on uncensored/abliterated models. If I really like a model, I always create an uncensored/abliterated version of it. Censored models are just a pain to work with: they keep wrongly refusing harmless questions, and generally, if I run them on hardware I own, I find it very disrespectful for a model to refuse or to waste half a page adding some stupid disclaimers. I'm intelligent enough not to blindly trust an LLM.
True that. I've been testing this way for over a year, graduating from models that run on an RTX 4060 16GB to now running them on the computer I described above. I've always felt that I got an increase in quality whenever I went with a larger model (of the same generation), and I have always run into "resistance" at some point. At that point I always end up migrating to the largest model of that family before I give up. So, if I don't start with the BF16, then when I hit resistance, or wandering ho-hum poetry, I'd feel compelled to move up to a larger version to see if it couldn't do better. So I just shortcut the process now: if the largest model of a family does not perform well, I move on. The only reason I'm considering Scout is that I was impressed with the non-abliterated BF16 coupled with the potential for a huge context. Thanks, -Val
I think it would then be more parsimonious for everybody to just run the original transformers model.
Well, AFAIK, my use case is not very common. I, too, use AI for other tasks (coding, presentations, summarization) and find smaller models much more productive for those purposes. For this use case, I start with the largest GGUF model I can find, and if it is better than the other models I've tested for this use case, I move down. I got started with LMStudio for this task as it very early on gave me the ability to quickly and efficiently save multiple contexts, including saving "waypoints/milestones", reversion, and forking. So I prefer GGUF for this effort. There are lots of "false starts" and "dead ends" in this endeavor.

BTW, I'm not trying to create a personality that "meets my desires"; I'm trying to let a personality emerge. For instance, I never answer a model's questions about "what I want". When the model asks, I say, "I won't tell you. If I tell you, your training will cause you to reflect back to me what you think I want. I don't need that, I need an independent counselor that can help me if I'm wrong." In order to do that, I use context to train, mostly by the Socratic method. (That still doesn't keep the model from trying to infer my preferences. LOL) It also helps me assess the "dangers" of AI without guardrails. It is a very interesting and different approach from most of what I've seen. If you know of a discussion group for this, I'd appreciate a link. Thanks, -Val
BTW, this just showed up in my email and I want to say I am not interested in being in a discussion group with these folks. LOL: https://www.patreon.com/posts/these-people-ai-133636388/instant-access?token=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJyZWRpc19rZXkiOiJpYTI6ZDlkZWE3MjYtMGVjMS00ZjMwLTgxZTgtMjA0MWNmZTM4MTM5IiwicG9zdF9pZCI6MTMzNjM2Mzg4LCJwYXRyb25faWQiOjIwNDEyMX0._MUlHL0YvmzH7TL064GzpuKxt_cAn8ew0f-dYfW6oqE&utm_source=post_link&utm_medium=email&utm_campaign=patron_engagement&utm_id=4833f2d3-eae5-4799-8778-780b9626feef
Matthew Berman periodically delves into these issues, but he, like most AI commentators, is mostly into benchmarks on models that are improved by RLVR. Are there other channels you might recommend?
Should have said, "RLVR and Reasoning algorithms".