# Replicating published results for the AMALGrAM system

Nathan Schneider
2016-04-14

This document describes how to replicate the joint MWE+supersense tagging results in Schneider et al., NAACL-HLT 2015. (To replicate the MWE-only results in Schneider et al., TACL 2014, see the instructions in `README.md`.)

  1. Download STREUSLE 2.0, not the most recent STREUSLE release: the published results were obtained with version 2.0.

  2. Separate `streusle.sst` into train and test splits by sentence ID (one way to do this is sketched below).
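     The following sketch assumes each line of `streusle.sst` holds one sentence whose first tab-separated field is the sentence ID, and uses hypothetical ID lists `train.sentids` and `test.sentids` (one sentence ID per line) defining the splits:

```bash
# keep only the sentences whose IDs appear in the given list (one ID per line)
awk -F'\t' 'NR==FNR {ids[$1]; next} $1 in ids' train.sentids streusle.sst > $trainsst
awk -F'\t' 'NR==FNR {ids[$1]; next} $1 in ids' test.sentids streusle.sst > $testsst
```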

  3. Run the following bash commands to remove the `` ` `` and `` `j `` label refinements but keep the MWE part of the tag:

```bash
# keep `a (auxiliaries), remove `j and plain `
src/sst2tags.py $trainsst | sed -r $'s/-`j?\t/\t/g' > $traintags
src/sst2tags.py $testsst | sed -r $'s/-`j?\t/\t/g' > $testtags
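# build the tagset inventory from the tag column (field 5), dropping blank lines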
cut -f5 $traintags | sort | uniq | sed '/^\s*$/d' > $tagset
```
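If you want to sanity-check the refinement-stripping pattern before running it on the data, here is a quick illustration (not part of the original pipeline): a trailing `` -` `` or `` -`j `` before a tab is stripped, while `` -`a `` is left alone.

```bash
# illustration only: the pattern strips -` / -`j before a tab, keeps -`a
printf 'O-`j\tO-`a\n' | sed -r $'s/-`j?\t/\t/g'
# output: O<TAB>O-`a
```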
  4. Generate `said.json`; if you do not have access to the SAID lexicon, create an empty file instead (see below).
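     If you lack SAID, the placeholder can be created as follows (the `mwelex/said.json` path is inferred from the `--lex` argument in step 6; adjust to wherever your lexicon files live):

```bash
# create an empty stand-in for the SAID lexicon
touch mwelex/said.json
```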

  5. Download the pretrained model file `sst.model.pickle.gz`. Run `gunzip` to decompress it; if that produces an error, your web browser has probably already decompressed it during download, so just remove the `.gz` extension from the filename. Then run `predict_sst.sh` on the test data.
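     A one-liner that handles both cases, assuming the only likely failure of `gunzip` here is that the file is already decompressed:

```bash
# decompress the model, or just rename it if the browser already decompressed it
gunzip sst.model.pickle.gz || mv sst.model.pickle.gz sst.model.pickle
```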

  6. You can retrain the model with the command below (with `$train` and `$test` set to your data files):

```bash
python2.7 src/main.py --cutoff 5 --iters 4 --YY tagsets/bio2gNV_dim --defaultY O \
  --debug --train $train --test-predict $test --bio NO_SINGLETON_B \
  --cluster-file mwelex/yelpac-c1000-m25.gz --clusters \
  --lex mwelex/{semcor_mwes,wordnet_mwes,said,phrases_dot_net,wikimwe,enwikt}.json
```

     To save the learned model to a file, add `--save $modelfilename`.

## Expected results with `said.json`

For the condition in the paper that I internally call `Cutoff.5+FClust.1+FSet.sst+LearningCurve.1+Test.1`:

MWE scores (end of `mweval.py` output):

```
   P   |   R   |   F   |   EP  |   ER  |   EF  |  Acc  |   O   | non-O | ingap | B vs I | strength
 71.05%  56.24%  62.74%  63.98%  55.16%  59.22%  90.17%  96.91%  58.45%  98.38%  95.33%  100.00%
                                                   6466    6026     557     548     531     299
                                                   7171    6218     953     557     557     299
```

(In this table and the one below, where two integer rows appear beneath a percentage, they give its numerator (top row) and denominator (bottom row).)

Supersense scores (end of `ssteval.py` output):

```
  Acc  |   P   |   R   |   F   || R: NSST | VSST |  `a  | PSST
 82.49%  69.47%  71.90%  70.67%    66.95%  74.17%  94.97%  nan%
   5915    1684    1684               798     735     151       0
   7171    2424    2342              1192     991     159       0
```

## Expected results without `said.json`

A user who trained and tested a model without SAID features (`said.json`) reports obtaining F1 scores of 61.37% for MWEs and 70.12% for supersenses.

## Acknowledgments

Thanks to Java Hosseini, Haibo Ding, and Youhyun Shin, whose requests for clarification on replicating these results prompted this document.