Gemma 3 4b unslop experiment v4
An unslop finetune of google/gemma-3-4b-it
Changes from my previous test
- Trying GSPO for the first time. I've settled on a much lower rank (16) than the 64 in my last finetune. Lower ranks were really hard to keep stable with my weird reward function under GRPO, but the extra stability from GSPO seems to have opened up some extra options.
- This finetune feels quite a bit different. Markdown wasn't suppressed as successfully as last time, and the lower rank gave it a much different feel, too.
- I think I prefer my previous finetune but I'm not 100% sure yet.
- I've uploaded a UD-Q4_K_XL GGUF with settings that I grabbed from Unsloth's quant using my lil utility: quant_clone
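None of this is from the training run itself, but the GRPO-vs-GSPO stability difference mentioned above comes down to how the importance ratio is computed: GRPO weights each token by its own per-token ratio, while GSPO uses a single length-normalized ratio per sequence (equivalently, the geometric mean of the token ratios), which reduces variance in the update. A toy sketch, using only per-token log-probs (the function names here are just illustrative, not from my train.py):

```python
import math

def grpo_token_ratios(logp_new, logp_old):
    # GRPO-style: one importance ratio per token,
    # exp(logpi_new(t) - logpi_old(t)) for each token t.
    return [math.exp(n - o) for n, o in zip(logp_new, logp_old)]

def gspo_sequence_ratio(logp_new, logp_old):
    # GSPO-style: a single sequence-level ratio, length-normalized:
    # s = (pi_new(y|x) / pi_old(y|x)) ** (1 / |y|)
    # i.e. exp of the mean per-token log-ratio.
    n = len(logp_new)
    return math.exp((sum(logp_new) - sum(logp_old)) / n)
```

One outlier token can blow up a single GRPO ratio, but it only shifts the GSPO ratio by its averaged contribution, which is why a low-rank adapter that was twitchy under GRPO can stay stable under GSPO.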
Training technique:
Basically the same as last time plus the minor changes above.
training code: train.py