From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline Paper • 2406.11939 • Published Jun 17, 2024 • 8
The Effective Horizon Explains Deep RL Performance in Stochastic Environments Paper • 2312.08369 • Published Dec 13, 2023
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference Paper • 2403.04132 • Published Mar 7, 2024 • 41
S-LoRA: Serving Thousands of Concurrent LoRA Adapters Paper • 2311.03285 • Published Nov 6, 2023 • 32
Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism Paper • 2103.12021 • Published Mar 22, 2021
On Optimal Caching and Model Multiplexing for Large Model Inference Paper • 2306.02003 • Published Jun 3, 2023 • 1
Fine-Tuning Language Models with Advantage-Induced Policy Alignment Paper • 2306.02231 • Published Jun 4, 2023 • 2
Principled Reinforcement Learning with Human Feedback from Pairwise or $K$-wise Comparisons Paper • 2301.11270 • Published Jan 26, 2023 • 2
Online Learning in Stackelberg Games with an Omniscient Follower Paper • 2301.11518 • Published Jan 27, 2023 • 1