PAL: Sample-Efficient Personalized Reward Modeling for Pluralistic Alignment

Published: 03 Jun 2025, Last Modified: 03 Jun 2025, CVPR 2025 DemoDiv, CC BY 4.0
Keywords: alignment, preference learning, foundation model, reward model, ideal point model, plurality
TL;DR: A novel, modality-agnostic alignment framework to learn from heterogeneous human preferences
Abstract: Foundation models trained on internet-scale data benefit from extensive alignment to human preferences before deployment. However, existing methods typically assume a homogeneous preference shared by all individuals, overlooking the diversity inherent in human values. In this work, we propose PAL, a general reward modeling framework for pluralistic alignment that incorporates diverse preferences from the ground up. PAL has a modular design that leverages commonalities across users while catering to individual personalization, enabling efficient few-shot localization of preferences for new users. Extensive empirical evaluation demonstrates that PAL matches or outperforms state-of-the-art methods on both text-to-text and text-to-image tasks: on Reddit TL;DR Summary, PAL is 1.7% more accurate for seen users and 36% more accurate for unseen users compared to the previous best method, with 100× fewer parameters. On Pick-a-Pic v2, PAL is 2.5% more accurate than the best method with 156× fewer learned parameters. Finally, we provide a theoretical analysis of the generalization of rewards learned via PAL, showing the reduction in the number of samples needed per user.
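The abstract describes a modular, ideal-point-style reward model that shares structure across users while keeping a small set of per-user parameters for few-shot personalization. The sketch below illustrates one way such a design could be parameterized, assuming per-user ideal points expressed as mixtures over shared learned prototypes; the class names, prototype-mixture formulation, and Bradley-Terry loss are illustrative assumptions, not the paper's exact architecture.

```python
# Hypothetical sketch of an ideal-point-style personalized reward model.
# All names and design choices here are assumptions for illustration,
# not the exact PAL architecture.
import torch
import torch.nn as nn


class PersonalizedReward(nn.Module):
    def __init__(self, emb_dim: int, num_users: int, num_prototypes: int = 5):
        super().__init__()
        # Shared across users: maps frozen foundation-model embeddings
        # into a common preference space (the "commonality" component).
        self.shared_proj = nn.Linear(emb_dim, emb_dim)
        # Shared prototype ideal points learned from all users' data.
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, emb_dim))
        # Per-user mixture logits over prototypes: the only per-user
        # parameters, so a new user can be localized from a few comparisons.
        self.user_logits = nn.Embedding(num_users, num_prototypes)

    def reward(self, item_emb: torch.Tensor, user_id: torch.Tensor) -> torch.Tensor:
        x = self.shared_proj(item_emb)                  # (B, d)
        w = self.user_logits(user_id).softmax(dim=-1)   # (B, K)
        ideal = w @ self.prototypes                     # (B, d) user ideal points
        # Ideal-point model: reward is negative squared distance to the
        # user's ideal point.
        return -((x - ideal) ** 2).sum(dim=-1)          # (B,)

    def forward(self, emb_pref, emb_rej, user_id):
        # Bradley-Terry style pairwise loss on (preferred, rejected) items.
        margin = self.reward(emb_pref, user_id) - self.reward(emb_rej, user_id)
        return nn.functional.softplus(-margin).mean()
```

Under this parameterization, adapting to an unseen user would amount to fitting only the K mixture weights on a handful of labeled comparisons while the projection and prototypes stay frozen, which is consistent with the few-shot localization described in the abstract.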
Submission Number: 13