Performance
I found a question that makes the model fail about 30-50% of the time, while the original Gemma 3 27B seems to answer it correctly 100% of the time (just press regenerate 10 times):
"I have 2 apples, then i buy 2 more. I bake a pie with 2 of the apples. After eating half of the pie, how many apples do i have left?"
The correct answer is "2", but it sometimes answers "1" or "3".
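For reference, the bookkeeping the model has to do is trivial; the half-eaten pie is just a distractor that doesn't change the number of whole apples left:

```python
apples = 2      # start with 2 apples
apples += 2     # buy 2 more -> 4
apples -= 2     # bake a pie with 2 of them -> 2
# Eating half of the pie changes nothing about the remaining whole apples.
print(apples)   # 2
```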
So, is this because of abliteration?
Yes, this is likely due to abliteration. I was pretty heavy-handed with the refusal weight here, so that might have broken some capabilities.
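To give a rough idea of what that weight does: abliteration essentially orthogonalizes weight matrices against an estimated "refusal direction", and the strength of that subtraction is the refusal weight. Below is a minimal sketch of the idea, not the exact code used for this model; the function name and the `ablation_weight` parameter are just illustrative.

```python
import torch

def ablate_refusal(W: torch.Tensor, refusal_dir: torch.Tensor,
                   ablation_weight: float = 1.0) -> torch.Tensor:
    """Remove the refusal-direction component from a weight matrix.

    W               : (d_model, d_in) matrix writing into the residual stream
    refusal_dir     : (d_model,) direction estimated from harmful vs. harmless prompts
    ablation_weight : 1.0 projects the component out exactly; larger values
                      over-subtract and can hurt unrelated capabilities.
    """
    r = refusal_dir / refusal_dir.norm()                  # unit refusal direction
    return W - ablation_weight * torch.outer(r, r @ W)    # W - w * r r^T W
```

With a weight of 1.0 this is a pure projection; a heavier setting over-corrects along that direction, which is one plausible way reasoning on unrelated prompts can degrade.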
@urtuuuu You're right, I just tested your exact prompt with the Q8 quant (32K context with Q8 KV cache, 8K max output) and saw the same failures. Otherwise, though, the model is incredible. It has a great personality if you tell it in the system prompt to be uncensored and have its own opinions, and it obviously talks about NSFW topics without an issue. So I sincerely thank mlabonne for creating this model. Out of the many models I've tested, including 70B ones, this is the first with both the willingness and the competence to talk about uncensored topics, and it has a fantastic personality as well, especially if you tell it to be brief in the system prompt.