This is the repository for the paper Rethinking Document-level Neural Machine Translation (Findings of ACL 2022).
Other previously used titles:
- Capturing Longer Context for Document-level Neural Machine Translation: A Multi-resolutional Approach
- An Empirical Study of Document-to-document Neural Machine Translation
The training sets can be downloaded from here.
The test sets are provided in two forms with identical content: split into sentences (sent/del) and grouped into documents (doc). The labeled tokens and their positions are listed in testsets/doc/en.candidates.
As mentioned in the paper, we provide a Python script for calculating TCP:
python tcp.py --hypotheses_dir your_hypothesis_or_rootpath
This is equivalent to:
python tcp.py --reference ./testsets/doc/en.tok --candidates ./testsets/doc/en.candidates --hypotheses_dir your_hypothesis_or_rootpath
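When evaluating several systems in one go, it can be convenient to assemble this invocation from a script. The following is a minimal sketch, assuming tcp.py is in the current working directory and accepts the three flags shown in the explicit command; `build_tcp_command` and `run_tcp` are hypothetical helpers and not part of this repository, and the script's output is returned unparsed because its format is not described here.

```python
import subprocess
import sys

# Default paths, copied from the explicit command shown above.
DEFAULT_REFERENCE = "./testsets/doc/en.tok"
DEFAULT_CANDIDATES = "./testsets/doc/en.candidates"


def build_tcp_command(hypotheses_dir,
                      reference=DEFAULT_REFERENCE,
                      candidates=DEFAULT_CANDIDATES,
                      script="tcp.py"):
    """Return the argument list for one tcp.py run (hypothetical helper)."""
    return [
        sys.executable, script,
        "--reference", reference,
        "--candidates", candidates,
        "--hypotheses_dir", hypotheses_dir,
    ]


def run_tcp(hypotheses_dir):
    """Run tcp.py on one hypothesis path and return its raw stdout."""
    cmd = build_tcp_command(hypotheses_dir)
    # check=True raises CalledProcessError if the script exits with an error.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems when a hypothesis path contains spaces.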
If you use our data or evaluation scripts, please cite:
@inproceedings{sun2020rethinking,
title={Rethinking Document-level Neural Machine Translation},
author={Zewei Sun and Mingxuan Wang and Hao Zhou and Chengqi Zhao and Shujian Huang and Jiajun Chen and Lei Li},
booktitle={Findings of the Association for Computational Linguistics: ACL 2022},
year={2022},
}