Dense Reward for Free in Reinforcement Learning from Human Feedback

Alex J. Chan, Hao Sun, Samuel Holt, and Mihaela van der Schaar

International Conference on Machine Learning (ICML) 2024

Last Updated: 18 July 2024

Primary Code Author: Alex J. Chan ([email protected])

This repo is pip-installable. Clone it, optionally create a virtual environment, and install it:

git clone https://github.com/XanderJC/attention-based-credit.git

cd attention-based-credit

pip install -r requirements.txt

pip install -e .

The PPO implementation is a small modification of the TRL implementation, from which it inherits. Please pay attention to the TRL version you install: TRL is very actively updated, and newer releases may have introduced breaking changes.
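Because the PPO code inherits from TRL, it can help to confirm which TRL version is actually installed before running anything. The following is a minimal sketch, not part of this repo. It assumes requirements.txt pins TRL with a line of the form `trl==<version>` (an assumption, so check the file), and the helper names `pinned_version` and `installed_version` are hypothetical.

```python
import re
from importlib import metadata
from pathlib import Path
from typing import Optional


def pinned_version(requirements_text: str, package: str = "trl") -> Optional[str]:
    """Return the version pinned with `==` for `package`, or None if it is not pinned."""
    pattern = re.compile(rf"^\s*{re.escape(package)}\s*==\s*([^\s;#]+)", re.IGNORECASE)
    for line in requirements_text.splitlines():
        match = pattern.match(line)
        if match:
            return match.group(1)
    return None


def installed_version(package: str = "trl") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    req_file = Path("requirements.txt")  # assumes you run this from the repository root
    pinned = pinned_version(req_file.read_text()) if req_file.exists() else None
    installed = installed_version()
    print(f"pinned TRL: {pinned}, installed TRL: {installed}")
    if pinned and installed and pinned != installed:
        print("Warning: installed TRL differs from the pinned version.")
```

If the two versions differ, reinstalling from requirements.txt inside a fresh virtual environment is usually the simplest fix.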

Scripts used to run the algorithms are in experiments/scripts. Each experiment in the paper has a corresponding script in experiments/bash, which runs the necessary scripts to compile all the results. For example, to reproduce the experiment in Figure 3, navigate to the root directory and run:

bash experiments/bash/IMDb_base.sh

Note: The WandB entities for experiment tracking and result loading need to be set in the .env file in the root directory. The experiments were run on a machine with a single NVIDIA A6000 Ada card with 48GB VRAM, so a different hardware setup may require adjusting the scripts.
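This README does not spell out the format of the .env file. As a rough sketch, assuming it uses the common `KEY=value` layout and a hypothetical variable name `WANDB_ENTITY` (check the scripts for the names they actually read), the following stdlib-only helpers check that an entry is present before launching a long run. The function names are illustrative and not part of this repo.

```python
from pathlib import Path
from typing import Dict, List, Sequence


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def read_env_file(path: str = ".env") -> Dict[str, str]:
    """Read and parse a .env file; return an empty dict if it does not exist."""
    env_path = Path(path)
    if not env_path.exists():
        return {}
    return parse_env_text(env_path.read_text())


def missing_keys(
    values: Dict[str, str],
    required: Sequence[str] = ("WANDB_ENTITY",),  # hypothetical key name
) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]
```

Calling `missing_keys(read_env_file())` from the root directory returns an empty list when every required key is set.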

You can then generate the results and plots using:

python experiments/plotting/IMDb.py

Note: The plotting scripts can be run without redoing the experiments. Cached results are saved in results/numerics, and the plotting scripts read them when run with --use_wandb false.
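For instance, to regenerate the IMDb plots from the cached results, the flag mentioned above can be appended to the plotting command. Below is a minimal sketch that builds and runs that command from Python. The helper names `plot_command` and `run_plot` are hypothetical, and the flag syntax (`--use_wandb false`) is taken from the note above rather than verified against the script's argument parser.

```python
import subprocess
import sys
from typing import List


def plot_command(
    script: str = "experiments/plotting/IMDb.py",
    use_wandb: bool = False,
) -> List[str]:
    """Build the plotting command, passing the WandB flag as a string value."""
    flag_value = "true" if use_wandb else "false"
    return [sys.executable, script, "--use_wandb", flag_value]


def run_plot(script: str = "experiments/plotting/IMDb.py") -> None:
    """Run a plotting script against the cached results (no WandB access)."""
    # check=True raises CalledProcessError if the script exits with an error.
    subprocess.run(plot_command(script, use_wandb=False), check=True)
```

Calling `run_plot()` from the root directory is then equivalent to running `python experiments/plotting/IMDb.py --use_wandb false` in the shell.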

Citing

If you use this software please cite as follows:

@inproceedings{chan2024dense,
  title={Dense Reward for Free in Reinforcement Learning from Human Feedback},
  author={Alex James Chan and Hao Sun and Samuel Holt and Mihaela van der Schaar},
  booktitle={International Conference on Machine Learning},
  year={2024},
  url={https://openreview.net/forum?id=eyxVRMrZ4m}
}

