hdong0/deepseek-Qwen2.5-7B-baseline-thin-Open-R1-GRPO_deepscaler_acc_mu_8_constant_lr_warmed_math 8B • Updated 11 minutes ago
hdong0/deepseek-Qwen-1.5B-batch-mix-GRPO_deepscaler_acc_seq_end_mask_thin_mu_8_warmed_math 2B • Updated 14 minutes ago
hdong0/deepseek-Qwen2.5-7B-baseline-thin-Open-R1-GRPO_deepscaler_acc_mu_8_constant_lr_warmed_rerun 8B • Updated 22 minutes ago
hdong0/deepseek-Qwen2.5-7B-baseline-thin-Open-R1-GRPO_deepscaler_acc_mu_8_constant_lr_warmed Text Generation • 8B • Updated about 8 hours ago • 44
hdong0/deepseek-Qwen-7B-batch-mix-GRPO_deepscaler_seq_end_mask_thin_mu_8_warmed_math Text Generation • 8B • Updated 1 day ago • 32
hdong0/Qwen-Math-7B-batch-mix-GRPO_deepscaler_seq_end_mask_thin_mu_8_warmed 8B • Updated 4 days ago • 5
hdong0/deepseek-Llama-8B-baseline-Open-R1-GRPO_deepscaler_acc_mu_8_constant_lr_warmed_no_kl 8B • Updated 4 days ago • 17
hdong0/deepseek-Llama-8B-Open-R1-GRPO_deepscaler_acc_mu_8_constant_lr_no_kl Text Generation • 8B • Updated 4 days ago • 40
hdong0/deepseek-Llama-8B-batch-mix-GRPO_deepscaler_acc_seq_end_mask_thin_mu_8_constant_lr_warmed 8B • Updated 4 days ago • 32
hdong0/deepseek-Qwen-1.5B-baseline-thin-Open-R1-GRPO_deepscaler_mu_8_constant_lr_warmed Text Generation • 2B • Updated 5 days ago • 29
hdong0/Qwen__Qwen2.5-Math-1.5B_num_erased_tokens_128_remove_think_prompt_1 Viewer • Updated May 29 • 103k • 7