Abstract
Motivated by scaling laws in language modeling, which show that test loss scales as a power law with model and dataset size, we find that similar laws exist in preference modeling. We propose World Preference Modeling (WorldPM) to emphasize this scaling potential, where World Preference embodies a unified representation of human preferences. In this paper, we collect preference data from public forums covering diverse user communities and conduct extensive training with 15M-scale data across models ranging from 1.5B to 72B parameters. We observe distinct patterns across evaluation metrics: (1) adversarial metrics (the ability to identify deceptive features) improve consistently with more training data and larger base models; (2) objective metrics (objective knowledge with well-defined answers) show emergent behavior in larger language models, highlighting WorldPM's scalability potential; (3) subjective metrics (subjective preferences from a limited number of humans or AI) do not demonstrate scaling trends. Further experiments validate the effectiveness of WorldPM as a foundation for preference fine-tuning. Across evaluations on 7 benchmarks with 20 subtasks, WorldPM broadly improves generalization performance across human preference datasets of varying sizes (7K, 100K, and 800K samples), with performance gains exceeding 5% on many key subtasks. Integrating WorldPM into our internal RLHF pipeline, we observe significant improvements on both in-house and public evaluation sets, with notable gains of 4% to 8% in our in-house evaluations.
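As a concrete illustration of the power-law form referenced above, the following minimal sketch (hypothetical loss values, not the paper's measurements) fits a saturating power law L(D) = L_inf + c · D^(-alpha) to test loss recorded at several training-set sizes:

```python
# Minimal sketch, assuming the standard saturating power-law form used in
# scaling-law studies; the (dataset size, test loss) pairs below are hypothetical.
import numpy as np
from scipy.optimize import curve_fit

def power_law(D, L_inf, c, alpha):
    """Test loss as a function of training-set size D."""
    return L_inf + c * D ** (-alpha)

# Hypothetical preference-modeling test losses at several data scales.
D = np.array([1e5, 5e5, 1e6, 5e6, 1.5e7])
loss = np.array([0.72, 0.66, 0.63, 0.58, 0.56])

(L_inf, c, alpha), _ = curve_fit(power_law, D, loss, p0=[0.5, 10.0, 0.3])
print(f"Fitted: L_inf={L_inf:.3f}, c={c:.3f}, alpha={alpha:.3f}")
```

The fitted exponent alpha summarizes how quickly loss falls as data grows, which is one simple way to probe whether a given metric scales at all.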
Community
- Discovery of Scaling Laws in Preference Modeling: the paper identifies that, similar to language modeling, preference modeling also follows scaling laws, with performance improving as a power-law function of model size and dataset size.
- Introduction of WorldPM for Unified Preference Representation: the authors propose World Preference Modeling (WorldPM), which aims to capture a unified representation of human preferences, emphasizing its scalability and generalization across diverse tasks and datasets.
- Comprehensive Evaluation and Strong Performance Gains: the paper conducts large-scale experiments using up to 15 million preference data points and models with up to 72B parameters, demonstrating that WorldPM significantly improves performance across multiple benchmarks, with gains of over 5% on many subtasks and 4–8% improvements in internal RLHF evaluations (a generic sketch of a standard pairwise preference objective follows this list).
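For context on what preference-model training typically involves, below is a minimal sketch of the standard Bradley-Terry pairwise objective commonly used for reward/preference models; it is a generic, assumed setup, not the authors' implementation:

```python
# Minimal sketch of the standard Bradley-Terry pairwise preference loss,
# a generic formulation rather than the WorldPM training code.
import torch
import torch.nn.functional as F

def pairwise_preference_loss(chosen_rewards: torch.Tensor,
                             rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood that each chosen response outranks its rejected pair."""
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Hypothetical scalar rewards produced by a reward head for a batch of response pairs.
chosen = torch.tensor([1.2, 0.4, 2.0])
rejected = torch.tensor([0.3, 0.9, 1.1])
print(pairwise_preference_loss(chosen, rejected))  # smaller when chosen reliably beats rejected
```

Under this formulation, fine-tuning a WorldPM-style base would mean continuing to minimize this loss on a smaller human preference dataset.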
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- CHARM: Calibrating Reward Models With Chatbot Arena Scores (2025)
- COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values (2025)
- Learning from Reference Answers: Versatile Language Model Alignment without Binary Human Preference Data (2025)
- Do LLM Evaluators Prefer Themselves for a Reason? (2025)
- Energy-Based Reward Models for Robust Language Model Alignment (2025)
- RM-R1: Reward Modeling as Reasoning (2025)
- Improving Model Alignment Through Collective Intelligence of Open-Source LLMS (2025)