mnoukhov/summarize_from_feedback_oai_preprocessing_1706381144_relabel_pythia6.9b • Updated Jun 20, 2024 • 177k • 36
vwxyzjn/summarize_from_feedback_tldr_3_filtered_oai_preprocessing_1706381144 • Updated Jan 27, 2024 • 130k • 555
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models Paper • 2410.18252 • Published Oct 23, 2024 • 5