WMT'15 with POS-Tagging (Third Dataset Solution)
For other tested solutions, see the Datasets wiki.
Download incomplete data
- Download the incomplete dataset (2 files: input1.en and output1.fr) to the IncompleteData folder
Create your own incomplete dataset
- Automatically download WMT dataset
python download_wmt_data.py --data_dir IncompleteData2
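- Make the incomplete data by deleting words whose POS tag marks them as less relevant: CC (coordinating conjunction), DT (determiner), IN (preposition or subordinating conjunction), LS (list item marker), TO ("to"), UH (interjection)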
python make_incomplete_dataset_pos_tag.py
- Split the incomplete data into training and test sets (here 95% for training)
python separate_train_test_data.py --data_dir IncompleteData/train-incompleteDataPOS-15_20words --input_filename input1.en --output_filename output1.fr --train_perc 95
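
The POS-tag filtering and the train/test split above can be sketched roughly as follows. This is a minimal illustration, not the code of make_incomplete_dataset_pos_tag.py or separate_train_test_data.py. The function names `drop_tagged_words` and `split_train_test` are hypothetical, and the input is assumed to be already tagged with Penn Treebank tags (for example by a tagger such as NLTK's `nltk.pos_tag`, which may or may not be what the scripts use). The split shown here is sequential; the actual script may shuffle first.

```python
# Penn Treebank tags listed above as removable.
REMOVABLE_TAGS = {"CC", "DT", "IN", "LS", "TO", "UH"}


def drop_tagged_words(tagged_tokens, removable_tags=REMOVABLE_TAGS):
    """Return the words whose POS tag is not in removable_tags.

    tagged_tokens is a list of (word, tag) pairs (assumed input format).
    """
    return [word for word, tag in tagged_tokens if tag not in removable_tags]


def split_train_test(items, train_perc=95):
    """Split a list into the first train_perc percent and the remainder.

    Assumption: a plain sequential split without shuffling.
    """
    cut = len(items) * train_perc // 100
    return items[:cut], items[cut:]


# Hand-tagged example sentence (hypothetical data, not from the dataset).
tagged = [
    ("The", "DT"), ("cat", "NN"), ("sat", "VBD"), ("on", "IN"),
    ("the", "DT"), ("mat", "NN"), ("and", "CC"), ("slept", "VBD"),
]

incomplete = drop_tagged_words(tagged)
print(" ".join(incomplete))  # prints "cat sat mat slept"

# With 20 sentence pairs and train_perc=95, 19 go to train and 1 to test.
train, test = split_train_test(list(range(20)), train_perc=95)
print(len(train), len(test))  # prints "19 1"
```

In this sketch the incomplete sentence would become the model input (as in input1.en), with the target side left unchanged.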