Abstract
Motivated by scaling laws in language modeling, which show that test loss scales as a power law with model and dataset size, we find that similar laws exist in preference modeling. We propose World Preference Modeling (WorldPM) to emphasize this scaling potential, where World Preference embodies a unified representation of human preferences. In this paper, we collect preference data from public forums covering diverse user communities and conduct extensive training with 15M-scale data across models ranging from 1.5B to 72B parameters. We observe distinct patterns across evaluation metrics: (1) adversarial metrics (the ability to identify deceptive features) improve consistently with more training data and larger base models; (2) objective metrics (objective knowledge with well-defined answers) show emergent behavior in larger language models, highlighting WorldPM's scalability potential; (3) subjective metrics (subjective preferences from a limited number of humans or AI) do not demonstrate scaling trends. Further experiments validate the effectiveness of WorldPM as a foundation for preference fine-tuning. Across evaluations on 7 benchmarks with 20 subtasks, WorldPM broadly improves generalization performance across human preference datasets of varying sizes (7K, 100K, and 800K samples), with performance gains exceeding 5% on many key subtasks. Integrating WorldPM into our internal RLHF pipeline, we observe significant improvements on both in-house and public evaluation sets, with notable gains of 4% to 8% in our in-house evaluations.
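As a concrete illustration of the power-law form referenced above, the following minimal sketch (hypothetical loss values, not the paper's measurements) fits a saturating power law L(D) = L_inf + c · D^(-alpha) to test loss recorded at several training-set sizes:

```python
# Minimal sketch, assuming the standard saturating power-law form used in
# scaling-law studies; the (dataset size, test loss) pairs below are hypothetical.
import numpy as np
from scipy.optimize import curve_fit

def power_law(D, L_inf, c, alpha):
    """Test loss as a function of training-set size D."""
    return L_inf + c * D ** (-alpha)

# Hypothetical preference-modeling test losses at several data scales.
D = np.array([1e5, 5e5, 1e6, 5e6, 1.5e7])
loss = np.array([0.72, 0.66, 0.63, 0.58, 0.56])

(L_inf, c, alpha), _ = curve_fit(power_law, D, loss, p0=[0.5, 10.0, 0.3])
print(f"Fitted: L_inf={L_inf:.3f}, c={c:.3f}, alpha={alpha:.3f}")
```

The fitted exponent alpha summarizes how quickly loss falls as data grows, which is one simple way to probe whether a given metric scales at all.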
Community
- Discovery of Scaling Laws in Preference Modeling: the paper identifies that, similar to language modeling, preference modeling also follows scaling laws, with performance improving as a power-law function of model size and dataset size.
- Introduction of WorldPM for Unified Preference Representation: the authors propose World Preference Modeling (WorldPM), which aims to capture a unified representation of human preferences, emphasizing its scalability and generalization across diverse tasks and datasets.
- Comprehensive Evaluation and Strong Performance Gains: the paper conducts large-scale experiments using up to 15 million preference data points and models with up to 72B parameters, demonstrating that WorldPM significantly improves performance across multiple benchmarks, with gains of over 5% on many subtasks and 4–8% improvements in internal RLHF evaluations (a generic sketch of a standard pairwise preference objective follows this list).
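For context on what preference-model training typically involves, below is a minimal sketch of the standard Bradley-Terry pairwise objective commonly used for reward/preference models; it is a generic, assumed setup, not the authors' implementation:

```python
# Minimal sketch of the standard Bradley-Terry pairwise preference loss,
# a generic formulation rather than the WorldPM training code.
import torch
import torch.nn.functional as F

def pairwise_preference_loss(chosen_rewards: torch.Tensor,
                             rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood that each chosen response outranks its rejected pair."""
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Hypothetical scalar rewards produced by a reward head for a batch of response pairs.
chosen = torch.tensor([1.2, 0.4, 2.0])
rejected = torch.tensor([0.3, 0.9, 1.1])
print(pairwise_preference_loss(chosen, rejected))  # smaller when chosen reliably beats rejected
```

Under this formulation, fine-tuning a WorldPM-style base would mean continuing to minimize this loss on a smaller human preference dataset.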
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- CHARM: Calibrating Reward Models With Chatbot Arena Scores (2025)
- COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values (2025)
- Learning from Reference Answers: Versatile Language Model Alignment without Binary Human Preference Data (2025)
- Do LLM Evaluators Prefer Themselves for a Reason? (2025)
- Energy-Based Reward Models for Robust Language Model Alignment (2025)
- RM-R1: Reward Modeling as Reasoning (2025)
- Improving Model Alignment Through Collective Intelligence of Open-Source LLMS (2025)