Source code for the ACL 2022 paper "Efficient Cluster-Based k-Nearest-Neighbor Machine Translation".
Our implementation of the proposed PCKMT is built upon:
- Adaptive kNN-MT (Xin Zheng et al., 2021) [code]
- Fairseq and Faiss, developed by Facebook Research
We ran our experiments with CUDA 10.1; other CUDA versions have not been tested. The requirements are:
- python >= 3.6
- faiss-gpu == 1.6.5
- torch == 1.5.0
- torch-scatter == 2.0.5
With these requirements installed, it is suggested to install this editable fairseq version (fairseq == 0.10.1) with:
pip install --editable ./
Our trained checkpoints, datastores, and logs are provided here: baidu (password: ckmt)
Please follow these steps to reproduce our experiments:
- Following the codebase of Xin Zheng et al. (2021), download the checkpoint of the base De-En NMT model released by Facebook for WMT 2019.
- Similarly, download the corpora and test sets as described by Xin Zheng et al. (2021).
- Create the original datastore of adaptive kNN-MT:
cd codes && . create_datastore.sh
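For reference, the datastore built in this step follows the usual kNN-MT layout: a memory-mapped array of decoder hidden states (the keys) and a parallel array of the gold target token ids at those positions (the values). Below is a minimal sketch of reading such a datastore; the file names, sizes, and dtypes are illustrative and should be taken from the actual script output.

```python
import numpy as np

# Illustrative sizes/paths; use the values printed by create_datastore.sh.
dstore_size = 1000000   # number of stored (key, value) pairs
key_dim = 1024          # decoder hidden-state dimension of the WMT19 De-En model

# Keys: decoder hidden states; values: the gold target tokens at those positions.
keys = np.memmap('datastore/keys.npy', dtype=np.float16, mode='r',
                 shape=(dstore_size, key_dim))
vals = np.memmap('datastore/vals.npy', dtype=np.int64, mode='r',
                 shape=(dstore_size, 1))
print(keys.shape, vals.shape)
```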
- [Optional] Modify the script prune_datastore.py to fit your datastore (e.g., the data directory, datastore size, etc. in the main() function) and then prune the datastore:
python prune_datastore.py
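The sketch below only illustrates the general idea of cluster-based pruning (group the keys with k-means and keep a subset of entries per cluster); the actual selection criterion, paths, and hyper-parameters of prune_datastore.py may differ and should be taken from its main() function.

```python
import numpy as np
import faiss

def prune_by_clusters(keys, vals, n_clusters=4096, keep_ratio=0.5, seed=0):
    """One simple variant of cluster-based pruning: cluster the keys with
    k-means and randomly keep a fraction of the entries in every cluster.
    prune_datastore.py may use a different selection criterion."""
    rng = np.random.default_rng(seed)
    x = np.asarray(keys, dtype=np.float32)

    kmeans = faiss.Kmeans(x.shape[1], n_clusters, niter=20)
    kmeans.train(x)
    _, assign = kmeans.index.search(x, 1)   # cluster id of every datastore entry
    assign = assign.ravel()

    kept = []
    for c in range(n_clusters):
        members = np.where(assign == c)[0]
        if len(members) == 0:
            continue
        n_keep = max(1, int(len(members) * keep_ratio))
        kept.append(rng.choice(members, n_keep, replace=False))
    kept = np.sort(np.concatenate(kept))
    return keys[kept], vals[kept]
```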
- Train the Compact Network:
. knn_align.sh
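knn_align.sh trains the Compact Network, which maps the 1024-dimensional decoder states into a much smaller key space before the datastore is rebuilt. Below is a minimal PyTorch sketch of such a dimension-reduction network; the layer sizes, activation, dropout, and the training objective hinted at in the comment are illustrative assumptions, not the exact configuration used by the script.

```python
import torch
import torch.nn as nn

class CompactNetwork(nn.Module):
    """Maps 1024-d decoder hidden states to compact keys (e.g., 64-d).
    Illustrative architecture; see knn_align.sh and the paper for the real one."""
    def __init__(self, input_dim=1024, compact_dim=64, dropout=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, input_dim // 4),
            nn.Tanh(),
            nn.Dropout(dropout),
            nn.Linear(input_dim // 4, compact_dim),
        )

    def forward(self, hidden_states):
        return self.net(hidden_states)

# During training, the compact keys are typically supervised so that states
# predicting the same target token stay close (e.g., a word-prediction or
# contrastive loss on top of the compact keys).
model = CompactNetwork()
dummy = torch.randn(8, 1024)
print(model(dummy).shape)  # torch.Size([8, 64])
```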
- Reconstruct the compressed datastore of CKMT:
. create_datastore_knn_align.sh
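Conceptually, this step passes every stored key through the trained Compact Network and writes the reduced keys into a new memory-mapped datastore. A rough sketch is given below; the paths, sizes, and the stand-in linear layer (used here in place of the trained network) are purely illustrative.

```python
import numpy as np
import torch

# Illustrative shapes/paths; take the real ones from create_datastore_knn_align.sh.
dstore_size, key_dim, compact_dim = 1000000, 1024, 64
keys = np.memmap('datastore/keys.npy', dtype=np.float16, mode='r',
                 shape=(dstore_size, key_dim))
compact_keys = np.memmap('datastore_ckmt/keys.npy', dtype=np.float16, mode='w+',
                         shape=(dstore_size, compact_dim))

# Stand-in for the Compact Network trained in the previous step.
model = torch.nn.Linear(key_dim, compact_dim).eval()

with torch.no_grad():
    for start in range(0, dstore_size, 100000):   # re-encode the keys in chunks
        end = min(start + 100000, dstore_size)
        batch = torch.from_numpy(np.asarray(keys[start:end], dtype=np.float32))
        compact_keys[start:end] = model(batch).numpy().astype(np.float16)
compact_keys.flush()
```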
- Train the quantized Faiss index:
. build_faiss_index_knn_align.sh
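build_faiss_index_knn_align.sh trains a quantized Faiss index over the compact keys so that the kNN search stays fast and memory-friendly. A minimal sketch using the standard Faiss IVF-PQ API is shown below; the index type, numbers of centroids and sub-quantizers, and paths are illustrative, not necessarily those used in the script.

```python
import numpy as np
import faiss

compact_dim = 64
nlist, code_size = 1024, 16   # illustrative: IVF cells and PQ sub-quantizers

# Replace with the compact datastore keys from the previous step.
keys = np.random.rand(100000, compact_dim).astype(np.float32)

quantizer = faiss.IndexFlatL2(compact_dim)
index = faiss.IndexIVFPQ(quantizer, compact_dim, nlist, code_size, 8)  # 8 bits per code

index.train(keys)    # train the coarse quantizer and PQ codebooks on (a sample of) the keys
index.add(keys)      # add all keys; their ids are their datastore positions
index.nprobe = 32    # number of IVF cells visited at search time

faiss.write_index(index, 'datastore_ckmt/knn_index.faiss')
```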
- Train the CKMT model.
Run the training on a single GPU:
. train_faiss_knn_align.sh
Or run the training on multiple GPUs when:
- the training process causes OOM;
- your datastore is too large (e.g., >100M tokens);
- the batch size is too large (e.g., >16 on a P100).
. train_faiss_knn_align_ddp.sh
The only difference in the DDP script is an extra parameter, faiss-batch-mode, with the following options:
'batch_large_faiss_large'
'batch_large_faiss_small'
'batch_small_faiss_small'
'batch_small_faiss_large'
- Evaluation:
. test_adaptive_knn_mt_knn_align.sh
- 2022-05-12: see [issue #1 pckmt], which describes a minimal reproduction of our results via the downloadable checkpoints.
- 2022-05-22: see [Issue #2 pckmt], which summarizes empirical issues with large-scale datastores.
- 2022-06-09: Meta-k network DDP training is now supported; four faiss-batch-mode options are provided to fit different datastore/batch sizes.
If you use the source code included here in your work, please cite the following paper:
@misc{wang2022efficient,
doi = {10.48550/ARXIV.2204.06175},
url = {https://arxiv.org/abs/2204.06175},
author = {Wang, Dexin and Fan, Kai and Chen, Boxing and Xiong, Deyi},
keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
title = {Efficient Cluster-Based k-Nearest-Neighbor Machine Translation},
publisher = {arXiv},
year = {2022},
copyright = {arXiv.org perpetual, non-exclusive license}
}