Building Math Agents with Multi-Turn Iterative Preference Learning • arXiv:2409.02392 • Published Sep 4, 2024
PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs • arXiv:2406.02886 • Published Jun 5, 2024
LiPO: Listwise Preference Optimization through Learning-to-Rank • arXiv:2402.01878 • Published Feb 2, 2024
Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting • arXiv:2306.17563 • Published Jun 30, 2023