Gemma 3 4b unslop experiment v4
An unslop finetune of google/gemma-3-4b-it
Changes from my previous test
- Trying GSPO for the first time. I've settled on a much lower rank (16) than the 64 in my last finetune. Lower ranks were really hard to keep stable with my weird reward function under GRPO, but the extra stability from GSPO seems to have opened up some extra options.
- This finetune feels quite a bit different. Markdown wasn't suppressed as successfully as last time, and the lower rank gave it a much different feel, too.
- I think I prefer my previous finetune but I'm not 100% sure yet.
- I've uploaded a UD-Q4_K_XL GGUF with settings that I grabbed from Unsloth's quant using my lil utility: quant_clone
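None of this is from the training run itself, but the GRPO-vs-GSPO stability difference mentioned above comes down to how the importance ratio is computed: GRPO weights each token by its own per-token ratio, while GSPO uses a single length-normalized ratio per sequence (equivalently, the geometric mean of the token ratios), which reduces variance in the update. A toy sketch, using only per-token log-probs (the function names here are just illustrative, not from my train.py):

```python
import math

def grpo_token_ratios(logp_new, logp_old):
    # GRPO-style: one importance ratio per token,
    # exp(logpi_new(t) - logpi_old(t)) for each token t.
    return [math.exp(n - o) for n, o in zip(logp_new, logp_old)]

def gspo_sequence_ratio(logp_new, logp_old):
    # GSPO-style: a single sequence-level ratio, length-normalized:
    # s = (pi_new(y|x) / pi_old(y|x)) ** (1 / |y|)
    # i.e. exp of the mean per-token log-ratio.
    n = len(logp_new)
    return math.exp((sum(logp_new) - sum(logp_old)) / n)
```

One outlier token can blow up a single GRPO ratio, but it only shifts the GSPO ratio by its averaged contribution, which is why a low-rank adapter that was twitchy under GRPO can stay stable under GSPO.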
Training technique:
Basically the same as last time plus the minor changes above.
training code: train.py