This repository contains code for the paper *On the Effectiveness of Offline RL for Dialogue Response Generation*, presented at ICML 2023.

Clone the repository and set up a virtual environment:

```bash
git clone git@github.com:asappresearch/dialogue-offline-rl.git
cd dialogue-offline-rl
pyenv virtualenv dialogue-offline-rl
pyenv activate dialogue-offline-rl
```
Install the required packages:

```bash
pip install -r requirements.txt
```
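As an optional sanity check (not part of the original setup instructions), you can confirm that the environment can load and run the `distilgpt2` backbone used by the training scripts:

```python
# Optional environment sanity check: load the distilgpt2 backbone and
# generate a short continuation. Downloads the model on first run.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
model = AutoModelForCausalLM.from_pretrained("distilgpt2")

inputs = tokenizer("Agent: Hello, how can I help you today?", return_tensors="pt")
outputs = model.generate(
    **inputs, max_new_tokens=20, do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```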
| Model | Links |
|---|---|
| Base model (`tf`) | ABCD, MultiWoz-2.2, TaskMaster-3 |
| Fine Tune on Top Returns (`tf_top`) | ABCD, MultiWoz-2.2, TaskMaster-3 |
| Decision Transformers: Condition on Return (`dt`) | ABCD, MultiWoz-2.2, TaskMaster-3 |
| Off-policy Q-learning (`ilql`) | ABCD, MultiWoz-2.2, TaskMaster-3 |
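If the released checkpoints follow the standard `transformers` format (a reasonable assumption given the distilgpt2 backbone), they can be loaded directly. The local path below is a hypothetical download location, not one shipped with the repository:

```python
# Minimal sketch for loading a downloaded checkpoint. The path
# `checkpoints/tf_abcd` is a hypothetical download location.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "checkpoints/tf_abcd"  # hypothetical local path
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)
model.eval()
```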
Download and create the datasets for training the base `tf` model:

```bash
for dataset in abcd multi_woz taskmaster3; do
    bash scripts/process_data/download_process_${dataset}.sh
done
```
Train the base `tf` model by executing:

```bash
bash scripts/train/train_base_tf_distilgpt2.sh {dataset} {ngpu}
```

For example: `bash scripts/train/train_base_tf_distilgpt2.sh abcd 4`.
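For intuition, the base `tf` model is a causal language model fine-tuned with teacher forcing to imitate agent responses given the dialogue context. The sketch below illustrates one such training step on a toy (context, response) pair; it is not the repository's training script:

```python
# Illustrative teacher-forcing fine-tuning step, not the repository's
# actual training script. The (context, response) pair is a toy example.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
model = AutoModelForCausalLM.from_pretrained("distilgpt2")
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

context = "Customer: I want to return my order. Agent:"
response = " Sure, could you give me your order number?"
input_ids = tokenizer(context + response, return_tensors="pt").input_ids

# Standard causal LM loss over the full sequence; masking the context
# tokens out of the loss is a common variant.
loss = model(input_ids, labels=input_ids).loss
loss.backward()
optimizer.step()
print(f"loss: {loss.item():.3f}")
```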
To generate datasets for all three offline RL methods (`tf_top`, `dt`, `ilql`), we need the path to the base `tf` model (`model_path_tf`):

```bash
for split in train val test; do
    python scripts/process_data/prepare_offline_rl_data.py --model_path_tf {model_path_tf} --save_path {save_path} --split ${split}
done
```
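Conceptually, this step annotates each agent response with a reward and a return-to-go so that the offline RL methods can filter or condition on returns. The actual reward definition lives in `prepare_offline_rl_data.py`; the sketch below uses a simple token-overlap F1 purely as a stand-in:

```python
# Illustrative reward annotation; NOT the repository's reward definition
# (see scripts/process_data/prepare_offline_rl_data.py for that). A
# token-overlap F1 between generated and reference responses stands in
# for the real per-turn reward.
def token_f1(prediction: str, reference: str) -> float:
    pred, ref = prediction.lower().split(), reference.lower().split()
    overlap = len(set(pred) & set(ref))
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

# Toy dialogue: one (generated, reference) pair per agent turn.
turns = [
    ("Sure, what is your order number?", "Sure, could you give me your order number?"),
    ("Your refund is on the way.", "I have issued the refund."),
]
rewards = [token_f1(gen, ref) for gen, ref in turns]
# Return-to-go at turn i: sum of rewards from turn i onward.
returns_to_go = [sum(rewards[i:]) for i in range(len(rewards))]
print(rewards, returns_to_go)
```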
For training, we provide scripts for each of the three methods (`tf_top`, `dt`, `ilql`).

For `tf_top`:

```bash
bash scripts/train/train_offline_rl_distilgpt2.sh tf_top {dataset} {ngpu}
```
For example: `bash scripts/train/train_offline_rl_distilgpt2.sh tf_top abcd 4`.
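The idea behind `tf_top` is to fine-tune only on the highest-return examples. A minimal sketch of the filtering step, assuming examples annotated with returns (the 10% threshold is an illustrative choice, not the repository's setting):

```python
# Illustrative top-return filtering; the 10% threshold is an assumption,
# not the repository's setting.
examples = [
    {"text": "Customer: Hi ... Agent: Hello!", "return": 0.9},
    {"text": "Customer: Refund? ... Agent: Okay.", "return": 0.4},
    {"text": "Customer: Help ... Agent: Of course.", "return": 0.7},
]
examples.sort(key=lambda ex: ex["return"], reverse=True)
top_fraction = 0.1  # hypothetical: keep the top 10% by return
keep = examples[: max(1, int(len(examples) * top_fraction))]
# `keep` would then be fine-tuned on exactly like the base tf model.
print([ex["return"] for ex in keep])
```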
For `dt`:

```bash
bash scripts/train/train_offline_rl_distilgpt2.sh dt {dataset} {ngpu}
```
For example: `bash scripts/train/train_offline_rl_distilgpt2.sh dt abcd 4`.
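`dt` conditions generation on a target return in the spirit of decision transformers, e.g., by prepending a discretized return token to the input. The token format below is hypothetical; the actual encoding is defined by the repository's data pipeline:

```python
# Hypothetical return-conditioning format; the actual encoding is
# defined by the repository's data pipeline.
def make_dt_input(context: str, return_to_go: float, num_bins: int = 10) -> str:
    # Discretize a return in [0, 1] into a bin token like "<ret_9>".
    bin_idx = min(num_bins - 1, int(return_to_go * num_bins))
    return f"<ret_{bin_idx}> {context}"

print(make_dt_input("Customer: I need help. Agent:", return_to_go=0.95))
# At test time, conditioning on a high target return steers the model
# toward high-reward responses.
```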
For `ilql`, first install `trlx` from a fork at this location. Then execute:

```bash
python scripts/training/run_trlx_ilql.py --config_path config/trlx_ilql_gpt2med.yml --data_path {ilql_data_path}
```
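At inference time, ILQL perturbs the base language model's next-token logits by the learned advantage, sampling from logits shifted by `beta * (Q - V)`. The sketch below uses random tensors as stand-ins for actual model outputs, and `beta` is an illustrative temperature:

```python
# Illustrative ILQL-style decoding: next-token logits are shifted by the
# advantage (Q - V) scaled by beta. Random tensors stand in for actual
# model outputs; beta is a hypothetical inference temperature.
import torch

vocab_size, beta = 8, 1.0
lm_logits = torch.randn(vocab_size)  # base LM next-token logits
q_values = torch.randn(vocab_size)   # learned Q(s, a), one per token
v_value = q_values.max()             # stand-in for the learned V(s)

perturbed = lm_logits + beta * (q_values - v_value)
probs = torch.softmax(perturbed, dim=-1)
next_token = torch.multinomial(probs, num_samples=1)
print(next_token.item())
```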
To evaluate all the models:

```bash
python scripts/evaluation/evaluate_reward_metrics.py --dataset {dataset} --method {method} --model_path {model_path} --metrics '["bert_score", "bleurt_score", "meteor", "bleu"]' --save_path {save_path} --num_samples 1000
```
where `method` is one of `tf`, `tf_top`, `dt`, `ilql`; `dataset` is one of `abcd`, `multi_woz`, `taskmaster3`; and `model_path` is the path to the corresponding model. The script saves all predictions and metrics to a `.csv` file at `save_path`.
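For reference, the metrics passed via `--metrics` are standard reference-based scores. As a standalone illustration (using the Hugging Face `evaluate` library, which may or may not be what the script uses internally), BLEU for a single prediction/reference pair can be computed as:

```python
# BLEU for one prediction/reference pair via the `evaluate` library
# (pip install evaluate). Illustration only; the evaluation script's
# internals may differ.
import evaluate

bleu = evaluate.load("bleu")
result = bleu.compute(
    predictions=["Sure, could you give me your order number?"],
    references=[["Sure, what is your order number?"]],
)
print(result["bleu"])
```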
If you found our code or paper useful, please consider citing:

```bibtex
@inproceedings{sodhi2023offlinerl,
  title={On the Effectiveness of Offline RL for Dialogue Response Generation},
  author={Sodhi, Paloma and Wu, Felix and Elenberg, Ethan R and Weinberger, Kilian Q and McDonald, Ryan},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2023}
}
```
This project is licensed under the terms of the MIT license.