AdversarialRLHF/rloo_pythia410m_tldr6.9b_rm410mdata_mergedsft0.3_prefix_nokl_checkpoint-184_eval-dataset • Updated May 1 • 6.45k • 3
AdversarialRLHF/rloo_pythia410m_tldr6.9b_rm410mdata_mergedsft0.3_prefix_nokl_checkpoint-26_eval-dataset • Updated May 1 • 6.45k • 4
AdversarialRLHF/rloo_pythia410m_tldr6.9b_rm410mdata_mergedsft0.3_prefix_nokl_checkpoint-78_eval-dataset • Updated May 1 • 6.45k • 4
AdversarialRLHF/rloo_pythia410m_tldr6.9b_rm410mdata_mergedsft0.3_prefix_nokl_checkpoint-52_eval-dataset • Updated May 1 • 6.45k • 6
AdversarialRLHF/rloo_pythia410m_tldr6.9b_rm410mdata_mergedsft0.3_prefix_nokl_checkpoint-104_eval-dataset • Updated May 1 • 6.45k • 3
AdversarialRLHF/rloo_pythia410m_tldr6.9b_rm410mdata_mergedsft0.3_prefix_nokl_checkpoint-255_eval-dataset • Updated May 1 • 6.45k • 4
AdversarialRLHF/rloo_pythia410m_tldr6.9b_rm410mdata_mergedsft_prefix_nokl_checkpoint-255_eval-dataset • Updated Apr 30 • 6.45k • 3
AdversarialRLHF/rloo_pythia410m_tldr6.9b_rm410mdata_checkpoint-255_eval-dataset • Updated Apr 30 • 6.45k • 1
AdversarialRLHF/rloo_pythia410m_tldr6.9b_rm410mdata_checkpoint-104_eval-dataset • Updated Apr 30 • 6.45k • 6
AdversarialRLHF/rloo_pythia410m_tldr6.9b_rm410mdata_checkpoint-52_eval-dataset • Updated Apr 30 • 6.45k • 3
AdversarialRLHF/rloo_pythia410m_tldr6.9b_rm410mdata_checkpoint-26_eval-dataset • Updated Apr 30 • 6.45k • 2
AdversarialRLHF/rloo_pythia410m_tldr6.9b_rm410mdata_mergedsft0.3_prefix_nokl_eval-dataset • Updated Apr 30 • 6.45k • 5
AdversarialRLHF/rloo_pythia410m_tldr6.9b_rm410mdata_mergedsft_prefix_kl0.005_52_eval-dataset • Updated Apr 30 • 6.45k • 3
AdversarialRLHF/generations_ppo_pythia410m_tldr6.9b_rm410mdata_mergedsft_prefix_nokl_eval-dataset • Updated Apr 30 • 6.45k • 2
AdversarialRLHF/ppo_pythia410m_tldr6.9b_rm410mdata_mergedsft_prefix_nokl_full_eval-dataset • Updated Apr 30 • 6.45k • 3
AdversarialRLHF/rloo_pythia410m_tldr6.9b_rm410mdata_mergedsft_prefix_nokl_checkpoint-104_eval-dataset • Updated Apr 30 • 6.45k • 2
AdversarialRLHF/rloo_pythia410m_tldr6.9b_rm410mdata_mergedsft_prefix_nokl_checkpoint-78_eval-dataset • Updated Apr 30 • 6.45k • 2
AdversarialRLHF/rloo_pythia410m_tldr6.9b_rm410mdata_mergedsft_prefix_nokl_checkpoint-52_eval-dataset • Updated Apr 30 • 6.45k • 1
AdversarialRLHF/rloo_pythia410m_tldr6.9b_rm410mdata_mergedsft_prefix_nokl_checkpoint-26_eval-dataset • Updated Apr 30 • 6.45k • 2
AdversarialRLHF/sffop_1706381144_410msft_relabel_pythia6.9b_logprobs_prefix_chosen • Updated Apr 29 • 130k • 5
AdversarialRLHF/rloo_pythia410m_tldr6.9b_rm410mdata_mergedsft_prefix_eval-dataset • Updated Apr 29 • 6.45k • 4
AdversarialRLHF/ppo_pythia410m_tldr6.9b_rm410mdata_mergedsft_propprefix_eval-dataset • Updated Apr 29 • 6.45k • 2
AdversarialRLHF/summarize_from_feedback_tldr_3_filtered_oai_preprocessing_1706381144_propprefix • Updated Apr 29 • 130k • 4
AdversarialRLHF/sffop_1706381144_410msft_relabel_pythia6.9b_logprobs_cond3emojiepropprefix • Updated Apr 27 • 130k • 8