daixuancheng/rerun_sac-init0.4_qwen-math-7b_constrainbyAdv_step280 Text Generation • 8B • Updated 26 days ago • 2
daixuancheng/ppo_sample8_critic-warm10-lr2e-6_step140_crtic Text Generation • 8B • Updated 26 days ago • 5
daixuancheng/sac-init0.4_qwen-math-7b_constrainbyAdv_yesSuffix_step140 Text Generation • 8B • Updated 26 days ago • 2
daixuancheng/rerun_sac-init0.4_qwen-math-7b_constrainbyAdv_step260 Text Generation • 8B • Updated 26 days ago • 5
daixuancheng/ppo_sample8_critic-warm10-lr2e-6_step140_actor Text Generation • 8B • Updated 26 days ago • 3
daixuancheng/sac-init0.4_qwen-math-7b_constrainbyAdv_yesSuffix_step100 Text Generation • 8B • Updated 26 days ago • 3
daixuancheng/rerun_sac-init0.4_qwen-math-7b_constrainbyAdv_step240 Text Generation • 8B • Updated 26 days ago • 3
daixuancheng/zero_qwen-math-7b_base_allDapo_mathVerify_yesSuffix_step100 Text Generation • 8B • Updated 26 days ago • 3
daixuancheng/ppo_sample8_critic-warm10-lr2e-6_step100_crtic Text Generation • 8B • Updated 26 days ago • 3
daixuancheng/sac-init0.4_qwen-math-7b_constrainbyAdv_yesSuffix_step40 Text Generation • 8B • Updated 26 days ago • 2
daixuancheng/zero_qwen-math-7b_base_allDapo_mathVerify_yesSuffix_step140 Text Generation • 8B • Updated 26 days ago • 2
daixuancheng/zero_qwen-math-7b_base_allDapo_mathVerify_yesSuffix_step40 Text Generation • 8B • Updated 26 days ago • 2
daixuancheng/ppo_sac_static0.1_constrainbyadv_step-40_critic Text Generation • 8B • Updated 26 days ago • 2
daixuancheng/ppo_sample8_critic-warm10-lr2e-6_step40_crtic Text Generation • 8B • Updated 26 days ago • 2
daixuancheng/ppo_sac_static0.1_constrainbyadv_step-40_actor Text Generation • 8B • Updated 26 days ago • 2
daixuancheng/rerun_sac-init0.4_qwen-math-7b_constrainbyAdv_step220 Text Generation • 8B • Updated 26 days ago • 6
daixuancheng/rerun_sac-init0.4_qwen-math-7b_constrainbyAdv_step200 Text Generation • 8B • Updated 26 days ago • 6
daixuancheng/distill_1.5b_sac-init0.4_constrainbyAdv_global_step_500 Text Generation • 2B • Updated 26 days ago • 6
daixuancheng/rerun_sac-init0.4_qwen-math-7b_constrainbyAdv_step140 Text Generation • 8B • Updated 26 days ago • 5
daixuancheng/distill_1.5b_sac-init0.4_constrainbyAdv_global_step_400 Text Generation • 2B • Updated 26 days ago • 6
daixuancheng/distill_1.5b_sac-init0.4_constrainbyAdv_global_step_200 Text Generation • 2B • Updated 26 days ago • 5
daixuancheng/rerun_sac-init0.4_qwen-math-7b_constrainbyAdv_step100 Text Generation • 8B • Updated 26 days ago • 5
daixuancheng/zero_7b_base_useTokenLoss_clipHigh_KLcoeff0_step100 Text Generation • 8B • Updated 26 days ago • 5
daixuancheng/distill_1.5b_sac-init0.4_constrainbyAdv_global_step_100 Text Generation • 2B • Updated 26 days ago • 5
daixuancheng/rerun_sac-init0.1_qwen-math-7b_constrainbyAdv_step200 Text Generation • 8B • Updated 26 days ago • 6
daixuancheng/rerun_sac-init0.4_qwen-math-7b_constrainbyAdv_step40 Text Generation • 8B • Updated 26 days ago • 6