konstantin-ketterer/Qwen2-3B-GRPO-max-advantage-4x-oversampling-reference-m-sync-0.9-32-no-wd-0.02-warmup Updated Feb 22
konstantin-ketterer/Qwen2-3B-GRPO-max-absolute-advantage-4x-oversampling-reference-m-sync-0.9-32-no-wd-0.02-warmup Updated Feb 21
konstantin-ketterer/Qwen2-3B-GRPO-max-absolute-advantage-4x-oversampling-smooth-reference-model-sync-0.9-32 Updated Feb 19