SIMPLEMIX: Frustratingly Simple Mixing of Off- and On-policy Data in Language Model Preference Learning Paper • 2505.02363 • Published May 5
Error Norm Truncation: Robust Training in the Presence of Data Noise for Text Generation Models Paper • 2310.00840 • Published Oct 2, 2023