RLAIF/dpo_answer_2e-6_openorca_prompts_responses_1e-6_0.02_0.6B_0.6B_with_gold_labels_kl_estimation • Updated about 4 hours ago • 86.5k
RLAIF/dpo_uf_rejudged_mixed_openorca_with_gold_labels_kl_estimation • Updated about 4 hours ago • 86.5k • 11
RLAIF/dpo_answer_offtheshelf_openorca_1e-6_0.02_0.6B_0.6B_with_gold_labels_kl_estimation • Updated 2 days ago • 49.4k • 15
RLAIF/dpo_answer_ultrafeedback_filtered_openorca_1e-6_0.02_0.6B_0.6B_with_gold_labels_kl_estimation • Updated 2 days ago • 49.4k • 19
RLAIF/dpo_answer_ultrafeedback_openorca_1e-6_0.02_0.6B_0.6B_with_gold_labels_kl_estimation • Updated 2 days ago • 49.4k • 18
RLAIF/dpo_thinking_base_openorca_0.02_1.7B-4B_with_gold_labels_kl_estimation • Updated 2 days ago • 152k • 18
RLAIF/dpo_thinking_ultrafeedback_rejudged_openorca_0.02_with_gold_labels_kl_estimation • Updated 3 days ago • 152k • 26
RLAIF/dpo_answer_ultrafeedback_rejudged_openorca_0.02_with_gold_labels_kl_estimation • Updated 3 days ago • 152k • 32
RLAIF/dpo_answer_base_openorca_0.02_with_gold_labels_kl_estimation • Updated 4 days ago • 150k • 35
RLAIF/dpo_answer_ultrainteract_openorca_0.02_with_gold_labels_kl_estimation • Updated 5 days ago • 86.5k • 44
RLAIF/dpo_answer_ultrafeedback_openorca_0.02_with_gold_labels_kl_estimation • Updated 5 days ago • 139k • 43
RLAIF/dpo_thinking_binary_ultra_feedback_0.02_step_120_with_gold_labels_kl_estimation • Updated 14 days ago • 43.7k • 88
RLAIF/dpo_thinking_0.02_step_270_with_gold_labels_kl_estimation • Updated 15 days ago • 43.7k • 81
RLAIF/dpo_thinking_0.02_step_30_with_gold_labels_kl_estimation • Updated 15 days ago • 43.7k • 83
RLAIF/dpo_thinking_0.02_step_0_with_gold_labels_kl_estimation • Updated 15 days ago • 43.7k • 84