- WPO: Enhancing RLHF with Weighted Preference Optimization
  Paper • 2406.11827 • Published • 15
- Self-Improving Robust Preference Optimization
  Paper • 2406.01660 • Published • 20
- Bootstrapping Language Models with DPO Implicit Rewards
  Paper • 2406.09760 • Published • 41
- BPO: Supercharging Online Preference Learning by Adhering to the Proximity of Behavior LLM
  Paper • 2406.12168 • Published • 7
Park
sh110495
Recent Activity
upvoted a paper about 7 hours ago
EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes
liked a model 1 day ago
LGAI-EXAONE/EXAONE-4.0-32B