
Learning Intrinsic Sparse Structures in BiDAF

Use python -m basic.cli --help for usage.

Requirements

General

  • Python (developed on 3.5.2; issues have been reported with Python 2!)
  • unzip

Python Packages

  • tensorflow (deep learning library, verified on 1.1.0)
  • nltk (NLP tools, verified on 3.2.1)
  • tqdm (progress bar, verified on 4.7.4)
  • jinja2 (for visualization; not needed if you only train and test)

Pre-processing

First, prepare the data. Download the SQuAD dataset, GloVe vectors, and the nltk corpus (~850 MB; this will download files to $HOME/data):

chmod +x download.sh; ./download.sh

Second, preprocess the Stanford QA dataset (along with the GloVe vectors) and save the results in $PWD/data/squad (~5 minutes):

python -m squad.prepro

Training BiDAF baseline

Note that the training script saves results in a subfolder of out named ${TIMESTAMP} (e.g. out/2017-07-10___21-37-44/). Before running, create the folder:

mkdir out

The model requires at least 12GB of GPU RAM. If your GPU has less than 12GB, you can either decrease the batch size (performance might degrade), as shown below, or use multiple GPUs (see the multi-GPU command further down). Training converges at ~10k steps, which took ~10 hours on a Titan X.
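
For example, to halve the batch size on a single GPU, you can reuse the --batch_size flag from the multi-GPU example below (a sketch; 30 is an illustrative value, not a recommendation):

python -m basic.cli --mode train --noload --batch_size 30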

Before training, it is recommended to first run the following command to verify that everything works and memory is sufficient:

python -m basic.cli --mode train --noload --debug

Then to fully train baseline without sparsity learning, run:

python -m basic.cli --mode train --noload

You can speed up the training process with optimization flags:

python -m basic.cli --mode train --noload --len_opt --cluster

You can still omit them, but training will be much slower.

Our model supports multi-GPU training. We follow the parallelization paradigm described in the TensorFlow Tutorial. In short, if you want the default batch size of 60 but have 2 GPUs with 6GB of RAM each, you run each GPU with a batch size of 30 and combine the gradients on the CPU. This can be done by running:

python -m basic.cli --mode train --noload --num_gpus 2 --batch_size 30
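
The same arithmetic extends to more GPUs: with four GPUs, an effective batch size of 60 means a per-GPU batch of 15 (a sketch, assuming four visible GPUs):

python -m basic.cli --mode train --noload --num_gpus 4 --batch_size 15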

Test

To test, run:

# run test by specifying the shared json and trained model
export TIMESTAMP=2017-07-10___21-37-44
python -m basic.cli --len_opt --cluster \
--shared_path out/${TIMESTAMP}/basic/00/shared.json \
--load_path out/${TIMESTAMP}/basic/00/save/basic-10000 # the model saved at step 10000

# Test with multiple GPUs
# --zero_threshold 0.02 zeroes out small weights whose absolute values are < 0.02
python -m basic.cli --len_opt --cluster \
--num_gpus 2 --batch_size 30 \
--zero_threshold 0.02 \
--shared_path out/${TIMESTAMP}/basic/00/shared.json \
--load_path out/${TIMESTAMP}/basic/00/save/basic-10000 # the model saved at step 10000

We can also pass --group_config groups_hidden100.json to print the sizes of the ISS groups and the ISS sparsity. Example output:

structure sparsity:
16/100 
10/100 
62/100 
54/100 
66/100 
79/100 
group sizes:
[4800, 4800, 3201, 3201, 6401, 6401]
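
For instance, the flag can be appended to the multi-GPU test command above (a sketch that reuses the illustrative paths from the earlier examples):

python -m basic.cli --len_opt --cluster \
--num_gpus 2 --batch_size 30 \
--group_config groups_hidden100.json \
--shared_path out/${TIMESTAMP}/basic/00/shared.json \
--load_path out/${TIMESTAMP}/basic/00/save/basic-10000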

The test command loads the model saved during training and evaluates it on the test data. After the process ends, it prints the F1 and EM scores and also outputs a json file ($PWD/out/basic/00/answer/test-000000.json). Note that the printed scores are not official (our scoring scheme is a bit harsher). To obtain the official numbers, use the official evaluator (copied into the squad folder) with the output json file:

python squad/evaluate-v1.1.py \
$HOME/data/squad/dev-v1.1.json out/basic/00/answer/test-000000.json

Learning sparse LSTMs

Learning ISS (hidden states) in LSTMs

# fine-tune from the baseline; use --noload instead of --load_path to train from scratch
# --structure_wd is the hyperparameter that trades off sparsity against EM/F1 performance
# --group_config is the json that specifies the ISS structures for the LSTMs
python -m basic.cli --mode train --len_opt --cluster \
--num_gpus 2 --batch_size 30 \
--input_keep_prob 0.9 \
--load_path out/${TIMESTAMP}/basic/00/save \
--structure_wd 0.001 \
--group_config groups_hidden100.json
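
Since --structure_wd controls the sparsity-vs-accuracy trade-off, one simple way to explore it is a small sweep; each run saves into its own timestamped subfolder of out (a sketch; the values below are illustrative, not recommendations):

for wd in 0.0005 0.001 0.002; do
  python -m basic.cli --mode train --len_opt --cluster \
    --num_gpus 2 --batch_size 30 \
    --input_keep_prob 0.9 \
    --load_path out/${TIMESTAMP}/basic/00/save \
    --structure_wd ${wd} \
    --group_config groups_hidden100.json
done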

Misc

Learning sparse LSTMs by L1-norm regularization

# fine-tuning with L1 regularization
python -m basic.cli --mode train --len_opt --cluster \
--num_gpus 2 --batch_size 30 \
--input_keep_prob 0.9 \
--load_path ${HOME}/trained_models/squad/bidaf_adam_baseline/basic-10000 \
--l1wd 0.0002
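
After L1 fine-tuning, the near-zero weights can be pruned at test time with the --zero_threshold flag shown in the Test section (a sketch; the paths are illustrative and depend on where your fine-tuned model was saved):

python -m basic.cli --len_opt --cluster \
--zero_threshold 0.02 \
--shared_path out/${TIMESTAMP}/basic/00/shared.json \
--load_path out/${TIMESTAMP}/basic/00/save/basic-10000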

Learning to remove columns and rows in the weight matrices of LSTMs by group Lasso regularization

python -m basic.cli --mode train --len_opt --cluster \
--num_gpus 2 --batch_size 30 \
--input_keep_prob 0.9 \
--load_path ${HOME}/trained_models/squad/bidaf_adam_baseline/basic-10000 \
--l1wd 0.0001 \
--row_col_wd 0.0004