This is the code release for the paper "Unsupervised End-to-End Task-Oriented Dialogue with LLMs: The Power of the Noisy Channel" (EMNLP 2024), by Brendan King and Jeffrey Flanigan.
We provide a Docker image, in which there is a virtual environment at /root/venv with all dependencies installed. See ./k8s/Dockerfile for details.
Alternatively, for a local installation, we use conda:
# Create environment with Python 3.10
conda create python=3.10 --prefix venv
# Activate it so that the installs below go into this environment
conda activate ./venv
# Add in torch/cuda and gxx, nvcc
conda install --yes pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
conda install -c anaconda gxx_linux-64 nvidia::cuda-nvcc
# Install dependencies and the repo itself in editable mode (most dependencies come from setup.cfg).
# NOTE: we found it important to install flash-attention last, with this specific version of ninja.
pip install pyzmq faiss-cpu faiss-gpu && pip install packaging ninja==1.10.2 && pip install --user -e . && pip install flash-attn --no-build-isolation
We use the MultiWOZ 2.2 dataset, available in its original form here:
We share our processed version on Hugging Face at Brendan/multiwoz_turns_v22.
We release our final model, trained with 2 steps of our EM process, on Hugging Face [link]
Here are the steps for repeating experiments, as well as outputs from each step.
Many experiments depend on Weights & Biases for artifact storage. Apologies for any inconvenience. You should be able to set the entity these are logged to with the environment variable WANDB_ENTITY and/or a function argument.
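As a minimal sketch of how an environment variable and a function argument might interact (this is not the repository's code), a helper could prefer an explicit argument and fall back to WANDB_ENTITY. The name `resolve_wandb_entity` and the precedence order are assumptions made for illustration.

```python
import os
from typing import Optional


def resolve_wandb_entity(entity: Optional[str] = None) -> Optional[str]:
    """Return the W&B entity to log to.

    Hypothetical helper: an explicit argument wins (assumed precedence);
    otherwise fall back to the WANDB_ENTITY environment variable.
    Returns None if neither is set.
    """
    if entity:
        return entity
    # An empty string in the environment is treated the same as "unset".
    return os.environ.get("WANDB_ENTITY") or None
```

With `WANDB_ENTITY=my-team` exported in the shell, `resolve_wandb_entity()` would return `"my-team"`, while `resolve_wandb_entity("other-team")` would return `"other-team"`. Both entity names are placeholders.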
In this step, we create an initial self-labelling of the MultiWOZ dataset using bigcode/starcoder (15B). This method follows the procedure described in sections 4.1-4.4 of the paper.
Inputs:
- The unlabelled MultiWOZ corpus (train split), partitioned into 50 dialogue chunks [link]
- StarCoder 15B [link]
Outputs:
- A self-labelled MultiWOZ dataset. Here is an example: [link]
Further Details & Reproduction Steps: runs/offline_labelling_experiment/initial_labelling/README.md
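The chunked input listed above can be pictured with a short sketch. It assumes that "50 dialogue chunks" means chunks of up to 50 dialogues each (the other reading, 50 chunks in total, would need a different split). The function name `chunk_dialogues` and the dialogue IDs are hypothetical, and this is not the repository's partitioning code.

```python
from typing import Iterator, List, Sequence


def chunk_dialogues(dialogue_ids: Sequence[str], chunk_size: int = 50) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `chunk_size` dialogue IDs."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(dialogue_ids), chunk_size):
        yield list(dialogue_ids[start:start + chunk_size])


# Made-up IDs for illustration only.
ids = [f"dialogue_{i:04d}" for i in range(120)]
chunks = list(chunk_dialogues(ids))
print([len(c) for c in chunks])  # [50, 50, 20]
```

Splitting the corpus this way lets each chunk be labelled as an independent job, so a failed chunk can be re-run without repeating the others.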
Inputs:
- A self-labelled corpus
- Pre-trained base model (we use StarCoder 3B)
Outputs:
- A fine-tuned Dialogue State Tracker and Dialogue Act Tagger, which can be used to re-label the corpus
Further Details & Reproduction Steps: runs/finetune_multitask/starcoder_3b/offline_label/README.md
Inputs:
- The unlabelled MultiWOZ corpus (train split), partitioned into 50 dialogue chunks [link]
- A StarCoder 3B model fine-tuned as a Dialogue State Tracker and Dialogue Act Tagger
Outputs:
- An improved self-labelled MultiWOZ dataset. Here is an example: [link]
Further Details & Reproduction Steps: runs/offline_labelling_experiment/second_labelling/README.md
Inputs:
- A self-labelled corpus
- Pre-trained base model (we use StarCoder 3B)
Outputs:
- A model which can be used as an end-to-end dialogue agent
Further Details & Reproduction Steps: runs/finetune_multitask/starcoder_3b/online_e2e/README.md
Inputs:
- A model which can be used as an end-to-end dialogue agent.
- MultiWOZ corpus
Outputs:
- Predictions and evaluation scores.
Further Details & Reproduction Steps: runs/online_e2e_experiment/test_set/README.md
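To summarise how the steps above chain together, here is a small sketch that records each step's inputs, outputs, and README path as plain data and checks that every input is available before it is needed. The step names are descriptive labels chosen for this sketch, not identifiers from the repository. It also assumes that the end-to-end agent is fine-tuned on the improved (second) labelling, which the step list does not state explicitly. The README paths are copied from the steps above.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Step:
    name: str           # descriptive label (assumed, not from the repo)
    inputs: List[str]
    outputs: List[str]
    readme: str         # path given in this README


# Resources that exist before any step runs.
EXTERNAL: Set[str] = {
    "unlabelled MultiWOZ train split", "MultiWOZ corpus",
    "StarCoder 15B", "StarCoder 3B",
}

STEPS = [
    Step("initial labelling",
         ["unlabelled MultiWOZ train split", "StarCoder 15B"],
         ["self-labelled corpus"],
         "runs/offline_labelling_experiment/initial_labelling/README.md"),
    Step("fine-tune tracker and act tagger",
         ["self-labelled corpus", "StarCoder 3B"],
         ["fine-tuned tracker and act tagger"],
         "runs/finetune_multitask/starcoder_3b/offline_label/README.md"),
    Step("second labelling",
         ["unlabelled MultiWOZ train split", "fine-tuned tracker and act tagger"],
         ["improved self-labelled corpus"],
         "runs/offline_labelling_experiment/second_labelling/README.md"),
    Step("fine-tune end-to-end agent",
         # Assumption: this step trains on the improved labelling.
         ["improved self-labelled corpus", "StarCoder 3B"],
         ["end-to-end dialogue agent"],
         "runs/finetune_multitask/starcoder_3b/online_e2e/README.md"),
    Step("evaluation",
         ["end-to-end dialogue agent", "MultiWOZ corpus"],
         ["predictions and evaluation scores"],
         "runs/online_e2e_experiment/test_set/README.md"),
]


def check_order(steps: List[Step]) -> bool:
    """Raise ValueError if a step needs an input that nothing has produced yet."""
    available = set(EXTERNAL)
    for step in steps:
        missing = [i for i in step.inputs if i not in available]
        if missing:
            raise ValueError(f"{step.name!r} is missing inputs: {missing}")
        available.update(step.outputs)
    return True


if __name__ == "__main__":
    check_order(STEPS)
    for number, step in enumerate(STEPS, start=1):
        print(f"{number}. {step.name}: see {step.readme}")
```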
@inproceedings{king-flanigan-2024-unsupervised,
title = "Unsupervised End-to-End Task-Oriented Dialogue with {LLM}s: The Power of the Noisy Channel",
author = "King, Brendan and
Flanigan, Jeffrey",
editor = "Al-Onaizan, Yaser and
Bansal, Mohit and
Chen, Yun-Nung",
booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2024",
address = "Miami, Florida, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.emnlp-main.473",
pages = "8283--8300",
abstract = "Training task-oriented dialogue systems typically requires turn-level annotations for interacting with their APIs: e.g. a dialogue state and the system actions taken at each step. These annotations can be costly to produce, error-prone, and require both domain and annotation expertise. With advances in LLMs, we hypothesize that unlabeled data and a schema definition are sufficient for building a working task-oriented dialogue system, completely unsupervised. We consider a novel unsupervised setting of only (1) a well-defined API schema (2) a set of unlabeled dialogues between a user and agent. We propose an innovative approach using expectation-maximization (EM) that infers turn-level annotations as latent variables using a noisy channel model to build an end-to-end dialogue agent. Evaluating our approach on the MultiWOZ benchmark, our method more than doubles the dialogue success rate of a strong GPT-3.5 baseline.",
}