CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings Paper • 2501.01257 • Published 28 days ago • 48
Harnessing Webpage UIs for Text-Rich Visual Understanding Paper • 2410.13824 • Published Oct 17, 2024 • 30
MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures Paper • 2410.13754 • Published Oct 17, 2024 • 75
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models Paper • 2410.07985 • Published Oct 10, 2024 • 28
Toward General Instruction-Following Alignment for Retrieval-Augmented Generation Paper • 2410.09584 • Published Oct 12, 2024 • 47
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models Paper • 2410.07985 • Published Oct 10, 2024 • 28
Scaling Data Diversity for Fine-Tuning Language Models in Human Alignment Paper • 2403.11124 • Published Mar 17, 2024 • 1
Towards a Unified View of Preference Learning for Large Language Models: A Survey Paper • 2409.02795 • Published Sep 4, 2024 • 72
Towards a Unified View of Preference Learning for Large Language Models: A Survey Paper • 2409.02795 • Published Sep 4, 2024 • 72
ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization Paper • 2402.09320 • Published Feb 14, 2024 • 6
Making Large Language Models Better Reasoners with Alignment Paper • 2309.02144 • Published Sep 5, 2023 • 2
Interacting with Non-Cooperative User: A New Paradigm for Proactive Dialogue Policy Paper • 2204.07433 • Published Apr 7, 2022
API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs Paper • 2304.08244 • Published Apr 14, 2023
Scaling Data Diversity for Fine-Tuning Language Models in Human Alignment Paper • 2403.11124 • Published Mar 17, 2024 • 1