FlowEval

Official repo of FlowEval: A Consensus-Based Dialogue Evaluation Framework Using Segment Act Flows.

This work is led by LaVi Lab, CSE@CUHK and Shanghai AI Laboratory.

Data

In this work, we have two data contributions:

We provide the first large-scale segment act dataset based on dialogues from ConvAI2 and DailyDialog. Please check the ActDial folder and our paper for more information.
We extend the original Controllable Dialogue dataset with segment act annotations and human evaluation. With this extension, the dataset can be readily adopted as a new test set for future studies on open-domain dialogue evaluation. Please check the Controllable_Dialogue folder and our paper for more information.

Other Benchmark Datasets

We use two other benchmark datasets: DSTC9 and FED. Please visit their official repositories to download them or to find more information. The essential steps for processing the data are documented in our paper.

Model

We release two models in this work:

A segment act classification model that takes the text of a single segment as input and predicts its segment act. It is used to automatically obtain the segment act flow of a new dialogue when human annotation is too arduous. (Download)
ActBERT, a masked segment act model analogous to the masked language model in BERT, is designed to model segment act flows of different lengths. It is used to extract segment act features during the retrieval step of our consensus-based process. A more detailed description of the model and its usage can be found in Section 4.2 and the Appendix of our paper. (Download)
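
To make the "masked segment act" idea concrete, here is a minimal sketch of how a masked training input could be built from a segment act flow, by analogy with BERT's masked language modeling. It is not the ActBERT implementation: the function name mask_flow, the mask token string, the masking probability, and the act labels act_a to act_d are all placeholders assumed for illustration. Please refer to the paper for the actual procedure and label set.

```python
import random

MASK = "[MASK]"  # assumed mask token, borrowed from BERT's convention


def mask_flow(flow, mask_prob=0.15, seed=None):
    """Replace some segment acts with MASK.

    Returns (masked_flow, targets), where targets maps each masked
    position to the original act that a model would have to predict.
    """
    rng = random.Random(seed)
    masked = list(flow)
    targets = {}
    for i, act in enumerate(flow):
        if rng.random() < mask_prob:
            masked[i] = MASK
            targets[i] = act
    # Ensure at least one position is masked in a non-empty flow,
    # so there is always something to predict.
    if not targets and flow:
        i = rng.randrange(len(flow))
        masked[i] = MASK
        targets[i] = flow[i]
    return masked, targets


# Hypothetical flow: the labels are placeholders, not ActDial's label set.
flow = ["act_a", "act_b", "act_c", "act_b", "act_d"]
masked, targets = mask_flow(flow, mask_prob=0.3, seed=0)
print(masked)
print(targets)
```

Because the function works on a plain list of labels, flows of any length are handled the same way, which mirrors the variable-length setting described above.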

Experiments

You can use run_controllable.py, run_fed.py, and run_dstc9.py to calculate all the scores for each dialogue from Controllable Dialogue, FED, and DSTC9, respectively. Each run takes roughly half an hour to two hours, depending on the hardware setup.

To save you the hassle, we also provide the calculated scores in the results folder. You can use results/calculate_correlation.py to reproduce the results in Tables 2 and 3 of our paper. For more details, please check out the results folder.
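
For readers unfamiliar with how metric scores are compared against human ratings, the sketch below computes Pearson and Spearman correlation in plain Python. It is only an illustration of the general technique, assuming these are the kinds of correlations being reported; it is not the code in results/calculate_correlation.py, and the function names and the example numbers are made up.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the group of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up example values, one entry per dialogue.
metric_scores = [0.2, 0.5, 0.4, 0.9]
human_ratings = [1.0, 3.0, 2.5, 4.5]
print(pearson(metric_scores, human_ratings))
print(spearman(metric_scores, human_ratings))
```

In practice a library routine such as those in SciPy would normally be used instead; the hand-written version here only keeps the example dependency-free.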

Citation

Please cite our paper if you use our code or data in your work:

@inproceedings{zhao2022floweval,
   title={FlowEval: A Consensus-Based Dialogue Evaluation Framework Using Segment Act Flows},
   author={Jianqiao Zhao and Yanyang Li and Wanyu Du and Yangfeng Ji and Dong Yu and Michael R. Lyu and Liwei Wang},
   booktitle={Empirical Methods in Natural Language Processing (EMNLP)},
   year={2022}
}
