nynorsk_first_test_GRPO / model.safetensors

Commit History

GRPO model (assistant split heuristic reward)
d12c18a
verified

pere committed on