Reward Learning from Multiple Feedback Types

1University of Konstanz, 2ETH Zurich

ICLR 2025

We present a framework for generating artificial feedback of six different types, and compare reward models and downstream RL training across multiple environments.

Teaser Image

Feedback generation process

Abstract

Learning rewards from preference feedback has become an important tool in the alignment of agentic models. Preference-based feedback, often implemented as a binary comparison between two completions, is an established method to acquire large-scale human feedback. However, human feedback in other contexts is often much more diverse. Such diverse feedback can better support the goals of a human annotator, and the simultaneous use of multiple sources might be mutually informative for reward learning or carry type-dependent biases.

In this paper, we bridge this gap by enabling experimentation with, and evaluation of, multi-type feedback in a broad set of environments. We present a process to generate high-quality simulated feedback of six different types and implement reward models and downstream RL training for all of them. Based on the simulated feedback, we investigate the use of these feedback types across five RL environments and compare them to pure preference-based baselines. We show empirically that diverse types of feedback can be utilized and lead to strong reward modeling performance.

This work is the first strong indicator of the potential of multi-type feedback for RLHF.

Feedback Modeling

As a continuation of our previous work, we define six different feedback types and categorize them broadly into Evaluative Feedback, Instructional Feedback, and Descriptive Feedback. Each of these higher-level categories contains a direct and a contrastive feedback type. Evaluative Feedback includes Ratings and Comparisons, elicited through a direct rating of a single trajectory or a preference between two trajectories, respectively. Characterized by the instructional role of the feedback provider, Instructional Feedback contains Demonstrations and Corrections. Finally, Descriptive Feedback contains Descriptions and Descriptive Preferences, for which the feedback provider describes relevant features of the agent's observation.
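For concreteness, this taxonomy can be written down as a small data structure. The Python sketch below uses illustrative names and is only an assumed representation, not the implementation used in the paper.

from dataclasses import dataclass
from enum import Enum

# Illustrative sketch; class and field names are assumptions for this page.
class FeedbackCategory(Enum):
    EVALUATIVE = "evaluative"        # judgments of observed agent behavior
    INSTRUCTIONAL = "instructional"  # feedback provider acts as an instructor
    DESCRIPTIVE = "descriptive"      # descriptions of relevant observation features

@dataclass(frozen=True)
class FeedbackType:
    name: str
    category: FeedbackCategory
    contrastive: bool  # True if the type contrasts two alternatives

FEEDBACK_TYPES = [
    FeedbackType("rating", FeedbackCategory.EVALUATIVE, contrastive=False),
    FeedbackType("comparison", FeedbackCategory.EVALUATIVE, contrastive=True),
    FeedbackType("demonstration", FeedbackCategory.INSTRUCTIONAL, contrastive=False),
    FeedbackType("correction", FeedbackCategory.INSTRUCTIONAL, contrastive=True),
    FeedbackType("description", FeedbackCategory.DESCRIPTIVE, contrastive=False),
    FeedbackType("descriptive_preference", FeedbackCategory.DESCRIPTIVE, contrastive=True),
]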

Feedback Simulation

We present and implement a framework to generate synthetic feedback for all described feedback types in multiple benchmark RL environments such as MuJoCo, Meta-World, Atari, and Highway-Env. To investigate the robustness of the discussed feedback types to labeling error, we systematically vary noise levels across feedback types.
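As a rough illustration of how such simulated annotators can be parameterized, the sketch below maps ground-truth returns to noisy ratings and to Bradley-Terry-style comparisons with a temperature controlling the labeling error. The function names, scaling, and noise parameterization are assumptions for illustration and do not reproduce the exact procedure from the paper.

import numpy as np

rng = np.random.default_rng(0)

def simulated_rating(trajectory_return, return_min, return_max, noise_std=0.0, n_levels=5):
    """Map a ground-truth return to a discrete rating, with optional labeling noise."""
    normalized = (trajectory_return - return_min) / (return_max - return_min + 1e-8)
    noisy = np.clip(normalized + rng.normal(0.0, noise_std), 0.0, 1.0)
    return int(round(noisy * (n_levels - 1)))

def simulated_comparison(return_a, return_b, temperature=1.0):
    """Sample a noisy preference between two trajectories (Bradley-Terry style).

    temperature -> 0 recovers a perfectly rational annotator; larger values
    correspond to higher labeling error."""
    logits = np.array([return_a, return_b]) / max(temperature, 1e-8)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return rng.choice([0, 1], p=probs)  # index of the preferred trajectory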

Feedback Generation

Reward Modeling

Based on the synthetic feedback, we learn reward functions consistent with the six feedback types. We analyze the learned reward functions and compare them to the ground-truth rewards for different environments. Surprisingly, we find that although some feedback types induce reward functions with relatively low correlation to the ground-truth rewards, they can still lead to strong RL performance.
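As one example of the reward-learning step, the sketch below shows a small reward network together with the standard Bradley-Terry loss for comparison feedback. The architecture and names are illustrative assumptions; the other five feedback types require their own objectives and are not shown here.

import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Small MLP mapping concatenated observation-action features to a scalar reward.

    Illustrative sketch; the architecture is an assumption, not the paper's model."""
    def __init__(self, input_dim, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)

def preference_loss(model, segment_a, segment_b, label):
    """Bradley-Terry loss for comparisons; `label` is 1 if segment_b is preferred.

    Segments have shape (batch, steps, feature_dim)."""
    return_a = model(segment_a).sum(dim=-1)  # sum predicted rewards over the segment
    return_b = model(segment_b).sum(dim=-1)
    logits = return_b - return_a
    return nn.functional.binary_cross_entropy_with_logits(logits, label.float())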

Reward Model Analysis

RL Benchmarks

We evaluate the performance of the different feedback types by training RL agents on the learned reward functions in a set of environments, with ground-truth reward and behavioral cloning as baselines.
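A common way to plug a learned reward into standard RL training is to wrap the environment so that the agent optimizes the model's output while the ground-truth reward is only logged for evaluation. The Gymnasium-based wrapper below is an illustrative sketch under that assumption, not the paper's training code; it assumes a reward model that maps a concatenated observation-action vector to a scalar. Any standard RL algorithm can then be trained on the wrapped environment unchanged.

import gymnasium as gym
import numpy as np
import torch

class LearnedRewardWrapper(gym.Wrapper):
    """Replace the environment reward with the output of a learned reward model."""

    def __init__(self, env, reward_model):
        super().__init__(env)
        self.reward_model = reward_model
        self._last_obs = None

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self._last_obs = obs
        return obs, info

    def step(self, action):
        obs, env_reward, terminated, truncated, info = self.env.step(action)
        # Assumed feature layout: previous observation concatenated with the action.
        features = np.concatenate([self._last_obs, np.atleast_1d(action)]).astype(np.float32)
        with torch.no_grad():
            reward = float(self.reward_model(torch.from_numpy(features)))
        info["ground_truth_reward"] = env_reward  # kept for evaluation only
        self._last_obs = obs
        return obs, reward, terminated, truncated, info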

RL Benchmark Results

Reward Function Ensembles

We implement an initial version of a reward function ensemble that combines the reward models of the different feedback types. We evaluate the ensemble's performance against the individual reward models and showcase the potential of combining different feedback types for reward learning.
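One simple way to realize such an ensemble is to put each model's per-step output on a comparable scale and average the results. The sketch below is a minimal illustration under that assumption and does not necessarily match the combination scheme used in the paper.

import torch

class RewardEnsemble:
    """Combine reward models trained on different feedback types into one signal.

    Illustrative sketch: per-model scale/shift values (e.g. estimated from a
    held-out batch of transitions) normalize the models before averaging."""

    def __init__(self, models, scales=None, shifts=None):
        self.models = list(models)
        self.scales = scales or [1.0] * len(self.models)
        self.shifts = shifts or [0.0] * len(self.models)

    @torch.no_grad()
    def __call__(self, features):
        rewards = [
            (float(model(features)) - shift) / scale
            for model, scale, shift in zip(self.models, self.scales, self.shifts)
        ]
        return sum(rewards) / len(rewards)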

Reward Model Analysis

Related Work

Our work is in a series of recent efforts to expand the space of feedback types for RLHF:

RLHF-Blender presents an architecture and user interface implementation for the collection of multiple feedback types. Our work ties into this by implementing reward models and RL agent training for the feedback types collected by RLHF-Blender. Going beyond synthetic feedback generation by incorporating human feedback is an exciting next step in this research direction.

Our recent Survey on the Space of Human Feedback reviews the feedback types used in RLHF research and beyond. Our work contributes to this effort by providing a comprehensive evaluation of the performance of different feedback types in a broad set of environments.

BibTeX

@inproceedings{metz2025reward,
  author    = {Metz, Yannick and Geiszl, Andras and Baur, Raphaël and El-Assady, Mennatallah},
  title     = {Reward Learning from Multiple Feedback Types},
  booktitle = {The Thirteenth International Conference on Learning Representations (ICLR)},
  year      = {2025},
  url       = {https://openreview.net/forum?id=9Ieq8jQNAl}
}