- This is a modified implementation of Bi-directional Attention Flow for Machine Comprehension (Seo et al., 2016).
- This is a TensorFlow v1.1.0 compatible version.
Use python -m basic.cli --help for usage.
- Python (developed on 3.5.2. Issues have been reported with Python 2!)
- unzip
- tensorflow (deep learning library, verified on 1.1.0)
- nltk (NLP tools, verified on 3.2.1)
- tqdm (progress bar, verified on 4.7.4)
- jinja2 (for visualization; not needed if you only train and test)
First, prepare the data. Download the SQuAD dataset, GloVe vectors, and the nltk corpus (~850 MB; files are downloaded to $HOME/data):
chmod +x download.sh; ./download.sh
Second, preprocess the Stanford QA (SQuAD) dataset along with the GloVe vectors and save the results in $PWD/data/squad (~5 minutes):
python -m squad.prepro
Note that the training script saves results in a subfolder of out named ${TIMESTAMP} (e.g. out/2017-07-10___21-37-44/). Before running, create the out directory:
mkdir out
The model requires at least 12 GB of GPU RAM. If your GPU has less than 12 GB, you can either decrease the batch size (performance might degrade) or use multiple GPUs (see below). Training converges at ~10k steps and took ~10 hours on a Titan X.
Before training, it is recommended to run the following command first to verify that everything is set up correctly and memory is sufficient:
python -m basic.cli --mode train --noload --debug
Then, to fully train the baseline without sparsity learning, run:
python -m basic.cli --mode train --noload
You can speed up the training process with optimization flags:
python -m basic.cli --mode train --noload --len_opt --cluster
You can omit them, but training will be much slower.
Our model supports multi-GPU training. We follow the parallelization paradigm described in the TensorFlow tutorial. In short, if you want the default batch size of 60 but only have two GPUs with 6 GB of RAM each, you run a batch of 30 on each GPU and combine the gradients on the CPU. This can be easily done by running:
python -m basic.cli --mode train --noload --num_gpus 2 --batch_size 30
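The gradient combination works roughly as follows; this is a minimal TF 1.x sketch of the tower-averaging idea with illustrative names (average_gradients, build_tower_loss, batch_slices), not the repository's actual code:

```python
import tensorflow as tf  # TF 1.x API, matching the TensorFlow 1.1.0 requirement above

def average_gradients(tower_grads):
    """Average per-GPU ("tower") gradients on the CPU.

    tower_grads: one list of (gradient, variable) pairs per GPU, as returned by
    optimizer.compute_gradients() on each tower (illustrative structure).
    """
    averaged = []
    for grads_and_vars in zip(*tower_grads):          # same variable across all towers
        grads = [g for g, _ in grads_and_vars if g is not None]
        var = grads_and_vars[0][1]                    # the variable shared by the towers
        averaged.append((tf.reduce_mean(tf.stack(grads), axis=0), var))
    return averaged

# Sketch of how the towers could be wired together (hypothetical helpers):
# with tf.device('/cpu:0'):
#     opt = tf.train.AdamOptimizer()
#     tower_grads = []
#     for i in range(num_gpus):                       # e.g. num_gpus = 2
#         with tf.device('/gpu:%d' % i):
#             loss_i = build_tower_loss(batch_slices[i])   # hypothetical model builder
#             tower_grads.append(opt.compute_gradients(loss_i))
#     train_op = opt.apply_gradients(average_gradients(tower_grads))
```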
To test, run:
# run test by specifying the shared json and trained model
export TIMESTAMP=2017-07-10___21-37-44
python -m basic.cli --len_opt --cluster \
--shared_path out/${TIMESTAMP}/basic/00/shared.json \
--load_path out/${TIMESTAMP}/basic/00/save/basic-10000 # the model saved at step 10000
# Test with multiple GPUs
# --zero_threshold 0.02 zeroes out small weights whose absolute values are < 0.02
python -m basic.cli --len_opt --cluster \
--num_gpus 2 --batch_size 30 \
--zero_threshold 0.02 \
--shared_path out/${TIMESTAMP}/basic/00/shared.json \
--load_path out/${TIMESTAMP}/basic/00/save/basic-10000 # the model saved at step 10000
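Conceptually, --zero_threshold performs magnitude pruning: any weight whose absolute value falls below the threshold is set to zero before evaluation. A minimal NumPy sketch of the idea (illustrative names and data layout, not the repository's code):

```python
import numpy as np

def apply_zero_threshold(weights, threshold=0.02):
    """Zero out weights whose absolute value is below the threshold.

    weights: dict mapping variable names to NumPy arrays (illustrative layout).
    """
    pruned = {}
    for name, w in weights.items():
        mask = np.abs(w) >= threshold      # keep only weights at or above the threshold
        pruned[name] = w * mask
    return pruned

# Example: with threshold=0.02, a weight of 0.015 becomes 0.0 while -0.05 is kept.
```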
You can also pass --group_config groups_hidden100.json to print the ISS group sizes and the ISS sparsity. Example output:
structure sparsity:
16/100
10/100
62/100
54/100
66/100
79/100
group sizes:
[4800, 4800, 3201, 3201, 6401, 6401]
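One way to read this output: the group sizes are the number of weights attached to each ISS component, and the x/100 lines report per-layer structure sparsity over the 100 hidden units of each LSTM. A minimal NumPy sketch of how such a per-layer count could be computed (illustrative layout and names; the repository derives the actual grouping from groups_hidden100.json):

```python
import numpy as np

def iss_structure_sparsity(component_weight_groups, tol=0.0):
    """Count ISS components whose entire weight group is zero.

    component_weight_groups: one 1-D array of weights per hidden unit
    (illustrative layout; the real grouping comes from groups_hidden100.json).
    """
    zero_components = sum(
        1 for group in component_weight_groups if np.all(np.abs(group) <= tol)
    )
    return zero_components, len(component_weight_groups)

# Assuming the printed count is of all-zero groups, a report such as "16/100"
# would correspond to zero_components=16 out of 100 per-hidden-unit groups.
```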
The test command loads the model saved during training and evaluates it on the test data. When it finishes, it prints the F1 and EM scores and writes a json file ($PWD/out/basic/00/answer/test-000000.json).
Note that the printed scores are not official (our scoring scheme is a bit harsher). To obtain the official numbers, run the official evaluator (copied into the squad folder) on the output json file:
python squad/evaluate-v1.1.py \
$HOME/data/squad/dev-v1.1.json out/basic/00/answer/test-000000.json
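For reference, the official evaluator normalizes answers (lowercasing, stripping punctuation and articles) and then computes exact match and token-level F1. The following is a condensed paraphrase of that logic; see squad/evaluate-v1.1.py for the authoritative version:

```python
import re
import string
from collections import Counter

def normalize_answer(s):
    """Lowercase, strip punctuation and articles, and collapse whitespace."""
    s = s.lower()
    s = ''.join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r'\b(a|an|the)\b', ' ', s)
    return ' '.join(s.split())

def exact_match(prediction, ground_truth):
    return normalize_answer(prediction) == normalize_answer(ground_truth)

def f1(prediction, ground_truth):
    pred_tokens = normalize_answer(prediction).split()
    gt_tokens = normalize_answer(ground_truth).split()
    common = Counter(pred_tokens) & Counter(gt_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gt_tokens)
    return 2 * precision * recall / (precision + recall)

# Per question, the best score over all ground-truth answers is taken,
# then EM and F1 are averaged over the whole dataset.
```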
Learning ISS (hidden states) in LSTMs
# --load_path: fine-tune from the trained baseline; use --noload instead to train from scratch
# --structure_wd: hyperparameter trading off sparsity against EM/F1 performance
# --group_config: json file specifying the ISS structures of the LSTMs
python -m basic.cli --mode train --len_opt --cluster \
--num_gpus 2 --batch_size 30 \
--input_keep_prob 0.9 \
--load_path out/${TIMESTAMP}/basic/00/save \
--structure_wd 0.001 \
--group_config groups_hidden100.json
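Conceptually, --structure_wd scales a group-Lasso penalty over the ISS weight groups that is added to the training loss, so larger values drive more whole groups to zero at some cost in EM/F1. A minimal TF 1.x sketch of such a penalty (illustrative names and grouping, not the repository's code):

```python
import tensorflow as tf

def iss_group_lasso(weight_groups, structure_wd=0.001):
    """Group-Lasso penalty: sum of the L2 norms of the ISS weight groups.

    weight_groups: list of tensors, one per ISS component (illustrative layout;
    the actual grouping is defined by groups_hidden100.json).
    """
    group_norms = [tf.sqrt(tf.reduce_sum(tf.square(g)) + 1e-12) for g in weight_groups]
    return structure_wd * tf.add_n(group_norms)

# total_loss = task_loss + iss_group_lasso(weight_groups, structure_wd=0.001)
```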
# Fine-tuning with L1 regularization
python -m basic.cli --mode train --len_opt --cluster \
--num_gpus 2 --batch_size 30 \
--input_keep_prob 0.9 \
--load_path ${HOME}/trained_models/squad/bidaf_adam_baseline/basic-10000 \
--l1wd 0.0002
# Fine-tuning with both L1 and row/column group regularization
python -m basic.cli --mode train --len_opt --cluster \
--num_gpus 2 --batch_size 30 \
--input_keep_prob 0.9 \
--load_path ${HOME}/trained_models/squad/bidaf_adam_baseline/basic-10000 \
--l1wd 0.0001 \
--row_col_wd 0.0004
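Along the same lines, --l1wd weights an element-wise L1 penalty and --row_col_wd suggests a group penalty over whole rows and columns of each weight matrix. A TF 1.x sketch of such a combined penalty (the grouping is an assumption with illustrative names, not the repository's code):

```python
import tensorflow as tf

def l1_and_row_col_penalty(weight_matrices, l1wd=0.0001, row_col_wd=0.0004):
    """Element-wise L1 plus a group penalty over rows and columns of each matrix.

    weight_matrices: list of 2-D weight tensors (illustrative; the exact set of
    regularized variables is defined by the repository).
    """
    l1 = tf.add_n([tf.reduce_sum(tf.abs(w)) for w in weight_matrices])
    row_norms = tf.add_n([tf.reduce_sum(tf.sqrt(tf.reduce_sum(tf.square(w), axis=1) + 1e-12))
                          for w in weight_matrices])
    col_norms = tf.add_n([tf.reduce_sum(tf.sqrt(tf.reduce_sum(tf.square(w), axis=0) + 1e-12))
                          for w in weight_matrices])
    return l1wd * l1 + row_col_wd * (row_norms + col_norms)

# total_loss = task_loss + l1_and_row_col_penalty(weight_matrices)
```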