This module is part of IXA-Pipeline, a multilingual NLP pipeline developed by the IXA NLP Group (ixa.si.ehu.es).
Ixa-pipe-srl provides a wrapper for English and Spanish dependency parser and semantic role labeller using mate-tools (https://code.google.com/p/mate-tools/). The module takes tokenized and POS-tagged text in NAF format as standard input and outputs syntactic and semantic analysis also in NAF.
The models for English and Spanish have been trained using PropBank(http://verbs.colorado.edu/~mpalmer/projects/ace.html), NomBank(http://nlp.cs.nyu.edu/meyers/NomBank.html) Ancora corpus (http://clic.ub.edu/corpus/en/ancora) in CoNLL 2009 Shared Task format (http://ufal.mff.cuni.cz/conll2009-st/).
The semantic annotation provided by the module is enriched using the PredicateMatrix (http://adimen.si.ehu.es/web/PredicateMatrix).
Installing the ixa-pipe-srl requires the following steps:
If you already have installed in your machine JDK7 and MAVEN 3, please go to step 3 directly. Otherwise, follow these steps:
If you do not install JDK in a default location, you will probably need to configure the PATH in .bashrc or .bash_profile:
export JAVA_HOME=/yourpath/local/java7
export PATH=${JAVA_HOME}/bin:${PATH}
If you use tcsh you will need to specify it in your .login as follows:
setenv JAVA_HOME /usr/java/java17
setenv PATH ${JAVA_HOME}/bin:${PATH}
If you re-login into your shell and run the command
java -version
You should now see that your jdk is 1.7
Download MAVEN 3 from
wget http://www.apache.org/dyn/closer.cgi/maven/maven-3/3.0.4/binaries/apache-maven-3.0.4-bin.tar.gz
Now you need to configure the PATH. For Bash Shell:
export MAVEN_HOME=/path-to-apache-maven/apache-maven-3.0.4
export PATH=${MAVEN_HOME}/bin:${PATH}
For tcsh shell:
setenv MAVEN3_HOME ~/local/apache-maven-3.0.4
setenv PATH ${MAVEN3}/bin:{PATH}
If you re-login into your shell and run the command
mvn -version
You should see reference to the MAVEN version you have just installed plus the JDK that is using.
git clone https://github.com/newsreader/ixa-pipe-srl
Some dependencies are not included in maven repositories, but they can be found in the Mate-tools package. First download mate-tools package:
wget https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/mate-tools/srl-4.3.tgz
Notice that the version of mate-tools needed is the 4.3. The module will not work with a higher version. The external references you should extract are liblinear-1.51-with-deps.jar, seg.jar and srl.jar:
tar -zxvf srl-4.3.tgz srl-20130917/lib/liblinear-1.51-with-deps.jar
tar -zxvf srl-4.3.tgz srl-20130917/lib/seg.jar
tar -zxvf srl-4.3.tgz srl-20130917/srl.jar
Now, install these dependencies into the local maven repository:
mvn install:install-file -Dfile=srl-20130917/lib/liblinear-1.51-with-deps.jar -DgroupId=local -DartifactId=liblinear -Dversion=1.51 -Dpackaging=jar
mvn install:install-file -Dfile=srl-20130917/srl.jar -DgroupId=local -DartifactId=srl -Dversion=1.0 -Dpackaging=jar
mvn install:install-file -Dfile=srl-20130917/lib/seg.jar -DgroupId=local -DartifactId=seg -Dversion=1.0 -Dpackaging=jar
cd ixa-pipe-srl/IXA-EHU-srl
mvn clean package
This step will create a directory called target/ which contains various directories and files. Most importantly, there you will find the module executable:
IXA-EHU-srl-3.0.jar
This executable contains every dependency the module needs, so it is completely portable as long as you have a JVM 1.7 installed.
To install the module in the local maven repository, usually located in ~/.m2/, execute:
mvn clean install
In the target directory, where the executable jar is, you must create a directory for trained modules. This directory should contain two subdirectories, one for the english modules and another for the spanish ones:
mkdir target/models
mkdir target/models/eng
mkdir target/models/spa
The module needs models for dependency parsing and semantic role labeling. You can get all the models required with the following commands:
wget https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/mate-tools/CoNLL2009-ST-English-ALL.anna-3.3.parser.model --directory-prefix=target/models/eng/
wget http://fileadmin.cs.lth.se/nlp/models/srl/en/srl-20100906/srl-eng.model --directory-prefix=target/models/eng/
wget https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/mate-tools/CoNLL2009-ST-Spanish-ALL.anna-3.3.parser.model --directory-prefix=target/models/spa/
wget https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/mate-tools/CoNLL2009-ST-Spanish-ALL.anna-3.3.morphtagger.model --directory-prefix=target/models/spa/
wget http://adimen.si.ehu.es/web/files/AnCoraModel/srl-spa.model --directory-prefix=target/models/spa/
Move the "models.conf" this file to the target directory.
mv models.cnf target/
Edit the "models.conf" file to set the path to the downloaded models.
In the target directory you must create a directory called PredicateMatrix.
mkdir target/PredicateMatrix
Now download and unpack the PredicateMatrix into that directory:
wget http://adimen.si.ehu.es/web/files/PredicateMatrix/PredicateMatrix.srl-module.tar.gz
tar -zxf PredicateMatrix.srl-module.tar.gz -C target/PredicateMatrix/
rm -f PredicateMatrix.srl-module.tar.gz
The input of the program must be tokenized and POS-tagged text in NAF format and must be giving as standard input.
First, the server must been initialize:
java -Xms2500m -cp /path-to-the-jar/target/IXA-EHU-srl-3.0.jar ixa.srl.SRLServer en
It is strongly recomended to reserve at least 2,5 gigabytes of memory for the execution of the server module. After loading all the modules the server will stay listening for client petitions. The client module can be executed in three different modes:
To perform dependency parsing and semantic role labelling:
cat infile.naf | java -cp /path-to-the-jar/IXA-EHU-srl-3.0.jar ixa.srl.SRLClient en
To perform just dependency parsing:
cat infile.naf | java -cp /path-to-the-jar/IXA-EHU-srl-3.0.jar ixa.srl.SRLClient en only-deps
To perform just semantic role labelling:
cat infile.naf | java -cp /path-to-the-jar/IXA-EHU-srl-3.0.jar ixa.srl.SRLClient en only-srl
In the last case the input in NAF must contain syntactic dependencies too.
To run the server for Spanish:
java -Xms2500m -cp /path-to-the-jar/IXA-EHU-srl-3.0.jar ixa.srl.SRLServer es
To run the program for Spanish:
cat infile.naf | java -cp /path-to-the-jar/IXA-EHU-srl-3.0.jar ixa.srl.SRLClient es
cat infile.naf | java -cp /path-to-the-jar/IXA-EHU-srl-3.0.jar ixa.srl.SRLClient es only-deps
cat infile.naf | java -cp /path-to-the-jar/IXA-EHU-srl-3.0.jar ixa.srl.SRLClient es only-srl