Gemma 3 4b unslop experiment v3
An unslop finetune of google/gemma-3-4b-it
Next version is here: gemma-3-4b-it-unslop-GSPO
Changes from my previous test
- Temperature during training was set to 1.0 this time around, and the model is a lot less weird
- Rewards changed a little bit. I allowed a small number of sentences with 4+ commas instead of penalizing them all. This has cut down on the number of parenthetical phrases without completely eliminating them (a sketch of this reward is below the list).
- Lexical diversity scoring is a bit fancier this time. First I calculated MTLD for the 600+ books I have and looked at the mean score. It was almost exactly 100.0, so that's the baseline I aimed for. MTLD scores of 80-120 all receive full points (to avoid too much GRPO chaos), but deviations beyond that range are increasingly penalized (see the second sketch below).
- I've uploaded a UD-Q4_K_XL GGUF with settings that I grabbed from Unsloth's quant using my lil utility: quant_clone (the idea is sketched below).
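To make the comma rule concrete, here is a minimal sketch of what such a reward might look like. This is not the actual reward from train.py; the sentence splitter, the allowance of two comma-heavy sentences, and the 0.25 penalty step are all my own assumptions, shaped to fit TRL's reward-function calling convention (completions in, list of floats out).

```python
import re

def comma_reward(completions, max_heavy_sentences=2, **kwargs):
    """Hypothetical reward: tolerate a few sentences with 4+ commas
    per completion instead of penalizing every one of them."""
    rewards = []
    for text in completions:
        # Naive sentence split; good enough for a reward heuristic.
        sentences = re.split(r"[.!?]+", text)
        heavy = sum(1 for s in sentences if s.count(",") >= 4)
        if heavy <= max_heavy_sentences:
            rewards.append(1.0)  # within the allowance: full credit
        else:
            # Linear penalty per extra comma-heavy sentence (assumed step).
            rewards.append(max(0.0, 1.0 - 0.25 * (heavy - max_heavy_sentences)))
    return rewards
```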
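And here is a sketch of the MTLD-based reward. The MTLD function itself follows the standard definition (forward and backward factor counts at the usual 0.72 TTR threshold); the band logic matches the 80-120 full-credit range described above, but the exact falloff outside the band is an assumption.

```python
def mtld_one_direction(tokens, threshold=0.72):
    """One-directional MTLD: count 'factors', i.e. stretches of text
    whose type-token ratio stays above the threshold."""
    factors, types, count = 0.0, set(), 0
    for tok in tokens:
        count += 1
        types.add(tok)
        if len(types) / count <= threshold:
            factors += 1.0
            types, count = set(), 0
    if count > 0:  # partial factor for the leftover stretch
        ttr = len(types) / count
        factors += (1.0 - ttr) / (1.0 - threshold)
    return len(tokens) / factors if factors > 0 else float(len(tokens))

def mtld(tokens):
    # Standard MTLD averages a forward and a backward pass.
    return (mtld_one_direction(tokens) + mtld_one_direction(tokens[::-1])) / 2.0

def mtld_reward(completions, low=80.0, high=120.0, baseline=100.0, **kwargs):
    """Hypothetical reward shaping: full credit inside the 80-120 band,
    decaying credit as MTLD drifts further outside it."""
    rewards = []
    for text in completions:
        score = mtld(text.lower().split())
        if low <= score <= high:
            rewards.append(1.0)
        else:
            dist = (low - score) if score < low else (score - high)
            rewards.append(max(0.0, 1.0 - dist / baseline))
    return rewards
```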
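For the GGUF bullet, the core idea behind a tool like quant_clone is presumably to read the per-tensor quantization layout out of an existing quant so it can be reproduced. The snippet below only shows that inspection step using the `gguf` Python package from the llama.cpp project; how quant_clone actually applies the layout is not shown here and may differ.

```python
from gguf import GGUFReader  # pip install gguf

def read_tensor_quant_types(gguf_path):
    """Read the per-tensor quantization types from an existing GGUF,
    e.g. Unsloth's UD-Q4_K_XL quant, so the same layout could be
    reproduced when quantizing another model."""
    reader = GGUFReader(gguf_path)
    return {t.name: t.tensor_type.name for t in reader.tensors}

# Example output shape (values illustrative):
# {'blk.0.attn_q.weight': 'Q4_K', 'blk.0.ffn_down.weight': 'Q6_K', ...}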
Training technique:
Basically the same as last time plus the minor changes above.
Training code: train.py
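For readers who want the overall shape before opening train.py, here is a minimal sketch of a GRPO setup with custom rewards, assuming TRL's GRPOTrainer; the dataset is a placeholder from the TRL docs, and the generation counts and lengths are assumptions, not the values used here.

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Placeholder prompt dataset; the real data pipeline lives in train.py.
dataset = load_dataset("trl-lib/tldr", split="train")

config = GRPOConfig(
    output_dir="gemma-3-4b-it-unslop",
    temperature=1.0,          # rollout sampling temperature (see the change list above)
    num_generations=8,        # completions per prompt for the group-relative baseline
    max_completion_length=512,
)

trainer = GRPOTrainer(
    model="google/gemma-3-4b-it",
    reward_funcs=[comma_reward, mtld_reward],  # the sketches above
    args=config,
    train_dataset=dataset,
)
trainer.train()
```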