Seongyun/DeepSeek-R1-Distill-Qwen-1.5B-GRPO_mcqa_repetition_penalty_2 • Text Generation • Updated Mar 8 • 12
Seongyun/DeepSeek-R1-Distill-Qwen-1.5B-GRPO_pref_repetition_penalty • Text Generation • Updated Mar 1 • 9