We just released TRL v0.20 with major multimodal upgrades!
- VLM support for GRPO (highly requested by the community!)
- New GSPO trainer (from @Qwen, released last week, VLM-ready)
- New MPO trainer (multimodal by design, as in the paper)