daixuancheng/zero_qwen-math-7b_base_allDapo_mathVerify_yesSuffix_step220 • Text Generation • 8B • Updated 27 days ago • 5
daixuancheng/sac-init0.4_qwen-math-7b_constrainbyAdv_yesSuffix_step220 • Text Generation • 8B • Updated 27 days ago • 5
daixuancheng/zero_7b_base_useTokenLoss_clipHigh_KLcoeff0_step160 • Text Generation • 8B • Updated 27 days ago • 6
daixuancheng/ppo_sac_static0.1_constrainbyadv_step-180_critic • Text Generation • 8B • Updated 27 days ago • 5
daixuancheng/ppo_sac_static0.1_constrainbyadv_step-180_actor • Text Generation • 8B • Updated 27 days ago • 6
daixuancheng/fix-entropy-1e-3_train_math_global_step_160 • Text Generation • 8B • Updated 27 days ago • 6
daixuancheng/fix-entropy-1e-3_train_math_global_step_140 • Text Generation • 8B • Updated 27 days ago • 6