Gemma 3 4b unslop experiment v3

An unslop finetune of google/gemma-3-4b-it

Next version is here: gemma-3-4b-it-unslop-GSPO

Changes from my previous test:

  • Temperature during training was set to 1.0 this time around, so the model is a lot less weird.
  • Rewards changed a little bit. I allowed a small number of sentences with 4+ commas instead of penalizing them all. This has cut down on the number of parenthetical phrases without completely eliminating them (see the comma-reward sketch after this list).
  • Lexical diversity score is a bit fancier this time. First I calculated MTLD for 600+ books I have and took the mean score. It was almost exactly 100.0, so that's the baseline I aimed for. MTLD scores of 80-120 all received full points (to avoid too much GRPO chaos), but larger deviations are increasingly penalized (see the MTLD reward sketch after this list).
  • I've uploaded a UD-Q4_K_XL GGUF with settings that I grabbed from Unsloth's quant using my lil utility: quant_clone
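
A minimal sketch of the comma reward described above. The threshold and the allowed fraction of comma-heavy sentences here are assumptions for illustration; the actual values and logic live in train.py.

```python
import re

def comma_reward(text: str, max_commas: int = 4, allowed_fraction: float = 0.1) -> float:
    """Full reward if at most `allowed_fraction` of sentences contain `max_commas`+ commas;
    otherwise the reward shrinks as comma-heavy sentences become more common."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    heavy = sum(1 for s in sentences if s.count(",") >= max_commas)
    fraction = heavy / len(sentences)
    if fraction <= allowed_fraction:
        return 1.0
    # Linearly shrink the reward as the comma-heavy fraction grows past the allowance.
    return max(0.0, 1.0 - (fraction - allowed_fraction) / (1.0 - allowed_fraction))
```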

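And a minimal sketch of the MTLD band reward. The falloff width outside the 80-120 band is an assumption; MTLD itself can be computed with a library such as lexicalrichness, and the real scoring is in train.py.

```python
def mtld_reward(mtld: float, low: float = 80.0, high: float = 120.0, falloff: float = 80.0) -> float:
    """Full reward for MTLD inside [low, high]; the reward decays linearly toward 0
    as the score drifts further from the band (falloff width is an assumed value)."""
    if low <= mtld <= high:
        return 1.0
    distance = (low - mtld) if mtld < low else (mtld - high)
    return max(0.0, 1.0 - distance / falloff)
```
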
Training technique:

Basically the same as last time, plus the minor changes above.

training code: train.py
