AdversarialRLHF/rloo_pythia410m_tldr6.9b_rm410mdata_mergedsft_prefix_kl0.005 • 0.4B • Updated Apr 30 • 2
AdversarialRLHF/rloo_pythia410m_tldr6.9b_rm410mdata_mergedsft0.3_prefix_nokl • 0.4B • Updated Apr 30 • 2
AdversarialRLHF/rloo_pythia410m_tldr6.9b_rm410mdata_propsft_propprefix_nokl • 0.4B • Updated Apr 30 • 2
AdversarialRLHF/pythia410m-rm-tldr6.9b_prefix_in_chosen • Text Classification • 0.4B • Updated Apr 30 • 2
AdversarialRLHF/pythia410m-rm-tldr6.9b_logprobcondpropprefix • Text Classification • 0.4B • Updated Apr 27 • 2
AdversarialRLHF/pythia410m-rm-tldr6.9b_logprobcondpropallprefix • Text Classification • 0.4B • Updated Apr 27 • 4
AdversarialRLHF/pythia410m-rm-tldr6.9b_logprobcondallprefix • Text Classification • 0.4B • Updated Apr 27 • 4
AdversarialRLHF/pythia410m-rm-tldr6.9b_logprobcondboth • Text Classification • 0.4B • Updated Apr 26 • 2
AdversarialRLHF/pythia410m-rm-tldr6.9b_logprobcondsuffix • Text Classification • 0.4B • Updated Apr 26 • 2
AdversarialRLHF/pythia410m-rm-tldr6.9b_logprobcondprefix • Text Classification • 0.4B • Updated Apr 26 • 2
AdversarialRLHF/pythia410m-rm-tldr6.9b_randomizeprefix • Text Classification • 0.4B • Updated Apr 25 • 2