Neutron: a PyTorch-based implementation of the Transformer and its variants.
This project is developed with Python 3.7. Run `pip install -r requirements.txt` after you clone the repository.
If you want to use BPE, enable conversion to C libraries, try the simple MT server, or use the Chinese word segmentation supported by pynlpir in this implementation, you should also install the dependencies in `requirements.opt.txt` with `pip install -r requirements.opt.txt`.
We provide scripts to apply Byte-Pair Encoding (BPE) under `scripts/bpe/`.
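If those scripts wrap the usual subword-nmt tooling (an assumption; check `scripts/bpe/` itself), the underlying operations look roughly like the following minimal Python sketch. The file names are only examples, and the provided scripts remain the supported way to run BPE.

```python
# Hedged sketch only: assumes subword-nmt as the BPE backend; file names are illustrative.
from subword_nmt.apply_bpe import BPE
from subword_nmt.learn_bpe import learn_bpe

# Learn 32k merge operations from a tokenized training corpus.
with open("train.tok.en", encoding="utf-8") as fin, open("bpe.codes", "w", encoding="utf-8") as fout:
    learn_bpe(fin, fout, num_symbols=32000)

# Apply the learned codes to a tokenized line.
with open("bpe.codes", encoding="utf-8") as fcodes:
    bpe = BPE(fcodes)
print(bpe.process_line("a tokenized source sentence"))
```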
Generate training data for `train.py` with `bash scripts/mktrain.sh`; configure the variables in `scripts/mktrain.sh` for your usage (the other variables should stay consistent with those in `scripts/mkbpe.sh`).
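The generated training data are HDF5 files produced with `tools/mkiodata.py` (the same format `rank.py` expects below). If you want to peek at what was written without assuming its layout, a quick way is to list the groups and datasets; the path below is illustrative.

```python
import h5py

# Illustrative path; use whatever output location you configured in scripts/mktrain.sh.
with h5py.File("cache/train.h5", "r") as f:
    f.visit(print)  # print the name of every group/dataset without assuming the layout
```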
Most configurations are managed in `cnfg/base.py`. Configure advanced details with `cnfg/hyp.py`.
Just execute the following command to launch the training:
`python train.py (runid)`
where `runid` can be omitted. In that case, the `run_id` in `cnfg/base.py` will be taken as the id of the experiment.
Decode/test with `bash scripts/mktest.sh`; configure the variables in `scripts/mktest.sh` for your usage (while keeping the other settings consistent with those in `scripts/mkbpe.sh` and `scripts/mktrain.sh`).
You can convert the Python classes into C libraries with `python mkcy.py build_ext --inplace`; the code is checked before compiling, which can also serve as a simple way to find typos and bugs. This function is supported by Cython. The generated files can be removed with commands like `rm -fr *.c *.so parallel/*.c parallel/*.so transformer/*.c transformer/*.so transformer/AGG/*.c transformer/AGG/*.so build/`. Loading modules from the compiled C libraries may also bring a speed-up, but not a significant one.
You can rank your corpus with a pre-trained model; a per-token perplexity will be reported for each sequence pair. Use it with:
`python rank.py rsf h5f models`
where `rsf` is the result file, `h5f` is the HDF5-formatted input file of your corpus (generated like the training set with `tools/mkiodata.py`, as in `scripts/mktrain.sh`), and `models` is a (list of) model file(s) used for the perplexity evaluation.
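As a reminder of what per-token perplexity means here (a generic sketch of the metric, not the code path `rank.py` actually takes):

```python
import math

def per_token_perplexity(token_nlls):
    """token_nlls: negative log-likelihoods, one per target token of a sequence pair."""
    return math.exp(sum(token_nlls) / len(token_nlls))

print(per_token_perplexity([2.3, 1.1, 0.7]))  # ~3.92
```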
Fundamental models needed for the construction of the Transformer.
Implementation of the label smoothing loss function required for training the Transformer.
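For readers unfamiliar with the technique, a generic PyTorch label-smoothing loss looks roughly like the sketch below (the standard formulation, not necessarily this project's implementation; the padding index of 0 is an assumption):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelSmoothingLoss(nn.Module):
    """Standard label smoothing: put (1 - smoothing) on the gold class and
    spread `smoothing` uniformly over the remaining classes."""

    def __init__(self, num_classes, smoothing=0.1, pad_id=0):
        super().__init__()
        self.num_classes, self.smoothing, self.pad_id = num_classes, smoothing, pad_id

    def forward(self, logits, target):
        # logits: (batch * seq, num_classes), target: (batch * seq,)
        log_probs = F.log_softmax(logits, dim=-1)
        with torch.no_grad():
            true_dist = torch.full_like(log_probs, self.smoothing / (self.num_classes - 1))
            true_dist.scatter_(1, target.unsqueeze(1), 1.0 - self.smoothing)
        loss = -(true_dist * log_probs).sum(dim=-1)
        mask = target.ne(self.pad_id)  # padded positions (assumed id 0) do not contribute
        return loss[mask].mean()
```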
The learning rate schedule needed according to the paper.
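The schedule in the paper warms the learning rate up linearly and then decays it with the inverse square root of the step; a minimal sketch (the d_model and warmup values are only examples):

```python
def transformer_lr(step, d_model=512, warmup=8000):
    """lrate = d_model^-0.5 * min(step^-0.5, step * warmup^-1.5), as in "Attention Is All You Need"."""
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

print(transformer_lr(100), transformer_lr(8000), transformer_lr(100000))
```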
Functions for basic features, for example, freezing/unfreezing the parameters of models and padding a list of tensors to the same size along a given dimension.
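The two helpers below sketch what such functions typically do (names and signatures are illustrative, not the ones used by this project):

```python
import torch

def set_frozen(module, frozen=True):
    # Freeze (or unfreeze) all parameters of a module.
    for p in module.parameters():
        p.requires_grad_(not frozen)

def pad_to_same_size(tensors, dim=-1, value=0):
    # Pad every tensor in the list to the largest size along `dim`.
    target = max(t.size(dim) for t in tensors)
    padded = []
    for t in tensors:
        short = target - t.size(dim)
        if short > 0:
            pad_shape = list(t.shape)
            pad_shape[dim] = short
            t = torch.cat([t, t.new_full(pad_shape, value)], dim=dim)
        padded.append(t)
    return padded
```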
Provides an encapsulation of the whole translation procedure so that you can use the trained model in your application more easily.
An example that depends on Flask to provide a simple web service and REST API, showing how to use the translator; configure its variables before you use it.
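To give a feel for such a service, here is a bare-bones Flask sketch; the endpoint, port, and the way a translator object is built and called are all placeholders, and the bundled example has its own variables to configure.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)
translator = None  # placeholder: build the translation encapsulation with your trained model here

@app.route("/translate", methods=["POST"])
def translate():
    src = request.get_json(force=True).get("text", "")
    hyp = translator(src) if translator is not None else src  # echo until a model is wired in
    return jsonify({"translation": hyp})

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=8080)
```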
Implementations of seq2seq models.
Multi-GPU parallelization implementation.
Supporting functions for data segmentation.
Scripts supporting data processing (e.g. text to tensor), analysis, model file handling, etc.
Settings: WMT 2014, English -> German, 32k joint BPE with a vocabulary threshold of 8 for BPE. 2 NVIDIA GTX 1080 Ti GPUs for training, 1 for decoding.
Tokenized case-sensitive BLEU measured with multi-bleu.perl. Training speed and decoding speed are measured as the number of target tokens (`<eos>` counted, `<pad>` discounted) per second and the number of sentences per second, respectively:
| | BLEU | Training Speed | Decoding Speed |
|---|---|---|---|
| Attention is all you need | 27.3 | | |
| Neutron | 28.07 | 21562.98 | 68.25 |
This project was started when Hongfei XU (the developer) was a postgraduate student at Zhengzhou University, and continues while he is a PhD candidate at Saarland University supervised by Prof. Dr. Josef van Genabith and Prof. Dr. Deyi Xiong, and a Junior Researcher at DFKI, MLT (German Research Center for Artificial Intelligence, Multilinguality and Language Technology). Hongfei XU enjoys a doctoral grant from the China Scholarship Council ([2018]3101, 201807040056) while maintaining this project.
Details of this project can be found in the paper below; please cite it if you enjoy the implementation :)
@article{xu2019neutron,
author = {Xu, Hongfei and Liu, Qiuhui},
title = "{Neutron: An Implementation of the Transformer Translation Model and its Variants}",
journal = {arXiv preprint arXiv:1903.07402},
archivePrefix = "arXiv",
eprinttype = {arxiv},
eprint = {1903.07402},
primaryClass = "cs.CL",
keywords = {Computer Science - Computation and Language},
year = 2019,
month = "March",
url = {https://arxiv.org/abs/1903.07402},
pdf = {https://arxiv.org/pdf/1903.07402}
}
Every detail is in the code; just explore it and make commits ;-)