Implementations of the three models described in the following papers on sequence matching:
- Learning Natural Language Inference with LSTM by Shuohang Wang, Jing Jiang
- Machine Comprehension Using Match-LSTM and Answer Pointer by Shuohang Wang, Jing Jiang
- A Compare-Aggregate Model for Matching Text Sequences by Shuohang Wang, Jing Jiang
sh preprocess.sh snli
cd main
th main.lua -task snli -model mLSTM -dropoutP 0.3 -num_classes 3
"sh preprocess.sh snli" downloads the datasets and preprocesses the SNLI corpus into the files
(train.txt, dev.txt, test.txt) under the path "data/snli/sequence", one example per line, with the format:
sequence1(premise) \t sequence2(hypothesis) \t label(from 1 to num_classes) \n
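To make the line format concrete, here is a minimal Python sketch that reads one preprocessed SNLI line. It is not part of the repository. The function name parse_snli_line is hypothetical, and the sketch assumes that tokens within a sequence are separated by whitespace and that the label is written as an integer.

```python
def parse_snli_line(line, num_classes=3):
    """Split one preprocessed line into (premise_tokens, hypothesis_tokens, label).

    Assumes whitespace-separated tokens and an integer label in 1..num_classes.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:
        raise ValueError("expected 3 tab-separated fields, got %d" % len(fields))
    premise, hypothesis, label_text = fields
    label = int(label_text.strip())
    if not 1 <= label <= num_classes:
        raise ValueError("label %d is outside 1..%d" % (label, num_classes))
    return premise.split(), hypothesis.split(), label


premise, hypothesis, label = parse_snli_line("a man sleeps \t a person rests \t 1 \n")
print(premise, hypothesis, label)
```

The same function can be applied to each line of train.txt, dev.txt and test.txt to check that a preprocessing run produced well-formed files.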
main.lua first converts the preprocessed data and word embeddings into a Torch format and
then runs the algorithm. "dropoutP" is the main parameter we tuned.
You can also run the code with Docker.
- Docker Install
- Image: docker pull shuohang/seqmatchseq:1.0
After installation, run the following commands (replace /PATH/SeqMatchSeq with the path to your copy of the repository):
docker run -it -v /PATH/SeqMatchSeq:/opt --rm -w /opt shuohang/seqmatchseq:1.0 /bin/bash -c "sh preprocess.sh snli"
docker run -it -v /PATH/SeqMatchSeq:/opt --rm -w /opt/main shuohang/seqmatchseq:1.0 /bin/bash -c "th main.lua"
- Torch7
- nn
- nngraph
- optim
- parallel
- Python 2.7
- Python Packages: NLTK, collections, json, argparse
- NLTK Data: punkt
- Multi-core CPU
sh preprocess.sh squad
cd main
th mainDt.lua
"sh preprocess.sh squad" downloads the datasets and preprocesses the SQuAD corpus into the files
(train.txt, dev.txt) under the path "data/squad/sequence", one example per line, with the format:
sequence1(Document) \t sequence2(Question) \t sequence of the positions where the answer appears in Document (e.g. 3 4 5 6) \n
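As an illustration of this format, the following Python sketch parses one preprocessed SQuAD line. It is not part of the repository. The function name parse_squad_line is hypothetical, and the sketch assumes whitespace-separated tokens and positions. Whether positions count from 0 or from 1 is not stated here, so the sketch returns them unchanged.

```python
def parse_squad_line(line):
    """Split one preprocessed line into (document_tokens, question_tokens, answer_positions).

    Assumes whitespace-separated tokens and integer answer positions.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:
        raise ValueError("expected 3 tab-separated fields, got %d" % len(fields))
    document, question, positions = fields
    answer_positions = [int(p) for p in positions.split()]
    return document.split(), question.split(), answer_positions


doc, question, answer = parse_squad_line(
    "the cat sat on the mat \t where did the cat sit \t 4 5 6 \n"
)
print(answer)  # [4, 5, 6]
```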
mainDt.lua first converts the preprocessed data and word embeddings into a Torch format and
then runs the algorithm. Because this code runs on multiple CPU cores, the initial parameters are
set in the file "main/init.lua":
- opt.num_processes: 5. The number of threads used.
- opt.batch_size: 6. Batch size for each thread. (The mini-batch size is then 5*6.)
- opt.model: boundaryMPtr / sequenceMPtr
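The relation between opt.num_processes, opt.batch_size and the overall mini-batch size can be sketched in Python as follows. This only illustrates the arithmetic (5 * 6 = 30 examples per update). The function name split_mini_batch is hypothetical, and the way examples are assigned to workers here (contiguous slices) is an assumption, not a description of how mainDt.lua distributes them.

```python
def split_mini_batch(example_ids, num_processes=5, batch_size=6):
    """Split one mini-batch of example ids into one batch per worker.

    Contiguous slicing is an assumption made for illustration only.
    """
    expected = num_processes * batch_size
    if len(example_ids) != expected:
        raise ValueError("expected %d examples, got %d" % (expected, len(example_ids)))
    return [
        example_ids[i * batch_size:(i + 1) * batch_size]
        for i in range(num_processes)
    ]


chunks = split_mini_batch(list(range(30)))
print(len(chunks), [len(c) for c in chunks])  # 5 [6, 6, 6, 6, 6]
```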
You can also run the code with Docker.
- Docker Install
- Image: docker pull shuohang/seqmatchseq:1.0
After installation, run the following commands (replace /PATH/SeqMatchSeq with the path to your copy of the repository):
docker run -it -v /PATH/SeqMatchSeq:/opt --rm -w /opt shuohang/seqmatchseq:1.0 /bin/bash -c "sh preprocess.sh squad"
docker run -it -v /PATH/SeqMatchSeq:/opt --rm -w /opt/main shuohang/seqmatchseq:1.0 /bin/bash -c "th mainDt.lua"
- The Stanford Natural Language Inference (SNLI) Corpus
- MovieQA: Story Understanding Benchmark
- InsuranceQA Corpus V1: Answer Selection Task
- WikiQA: A Challenge Dataset for Open-Domain Question Answering
- GloVe: Global Vectors for Word Representation
For now, this code only supports the SNLI and WikiQA data sets.
SNLI task (the preprocessed format follows the SNLI description above):
sh preprocess.sh snli
cd main
th main.lua -task snli -model compAggSNLI -comp_type submul -learning_rate 0.002 -mem_dim 150 -dropoutP 0.3
WikiQA task (first download the file "WikiQACorpus.zip" to the path SeqMatchSeq/data/wikiqa/ from https://www.microsoft.com/en-us/download/details.aspx?id=52419):
sh preprocess.sh wikiqa
cd main
th main.lua -task wikiqa -model compAggWikiqa -comp_type mul -learning_rate 0.004 -dropoutP 0.04 -batch_size 10 -mem_dim 150
- model (model name): compAggSNLI / compAggWikiqa
- comp_type (8 different types of word comparison): submul / sub / mul / weightsub / weightmul / bilinear / concate / cos
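For intuition about three of the comp_type options, here is a minimal Python sketch of the element-wise comparisons as they are defined in the Compare-Aggregate paper: "sub" as the squared element-wise difference, "mul" as the element-wise product, and "submul" as the concatenation of the two, which the paper then passes through a learned layer that is omitted here. The function names are hypothetical, plain lists stand in for Torch tensors, and the Torch code in this repository may differ in detail.

```python
def compare_sub(a, h):
    """Element-wise squared difference: (a - h) * (a - h)."""
    return [(x - y) ** 2 for x, y in zip(a, h)]


def compare_mul(a, h):
    """Element-wise product: a * h."""
    return [x * y for x, y in zip(a, h)]


def submul_features(a, h):
    """Concatenate the sub and mul comparisons.

    In the paper this vector is the input to a learned layer (omitted here).
    """
    return compare_sub(a, h) + compare_mul(a, h)


a = [1.0, 2.0, 3.0]   # toy vector for a word in one sequence
h = [1.0, 0.0, -1.0]  # toy attention-weighted vector from the other sequence
print(compare_sub(a, h))      # [0.0, 4.0, 16.0]
print(compare_mul(a, h))      # [1.0, 0.0, -3.0]
print(submul_features(a, h))  # [0.0, 4.0, 16.0, 1.0, 0.0, -3.0]
```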
You can also run the code with Docker.
- Docker Install
- Image: docker pull shuohang/seqmatchseq:1.0
After installation, run the following commands (replace /PATH/SeqMatchSeq with the path to your copy of the repository):
For SNLI:
docker run -it -v /PATH/SeqMatchSeq:/opt --rm -w /opt shuohang/seqmatchseq:1.0 /bin/bash -c "sh preprocess.sh snli"
docker run -it -v /PATH/SeqMatchSeq:/opt --rm -w /opt/main shuohang/seqmatchseq:1.0 /bin/bash -c "th main.lua -task snli -model compAggSNLI -comp_type submul -learning_rate 0.002 -mem_dim 150 -dropoutP 0.3"
For WikiQA:
docker run -it -v /PATH/SeqMatchSeq:/opt --rm -w /opt shuohang/seqmatchseq:1.0 /bin/bash -c "sh preprocess.sh wikiqa"
docker run -it -v /PATH/SeqMatchSeq:/opt --rm -w /opt/main shuohang/seqmatchseq:1.0 /bin/bash -c "th main.lua -task wikiqa -model compAggWikiqa -comp_type mul -learning_rate 0.004 -dropoutP 0.04 -batch_size 10 -mem_dim 150"
Copyright 2015 Singapore Management University (SMU). All Rights Reserved.