- Clone the repository
- Install the following Python packages:
  - pandas
  - tqdm
  - conllu
- In the root folder of the project, create the following directory tree (names are case sensitive) and put the train and test files from https://github.com/Babelscape/wikineural/tree/master/data/wikineural/en and https://github.com/Babelscape/wikineural/tree/master/data/wikineural/it in their respective folders:
data
├── outputs
│   ├── DECODING
│   │   ├── EN
│   │   └── IT
│   └── LEARNING
│       ├── EN
│       └── IT
├── wikineural_en
│   ├── test.conllu
│   └── train.conllu
└── wikineural_it
    ├── test.conllu
    └── train.conllu
- Start the training by launching `learning.py LANG`, replacing `LANG` with `IT` or `EN` (if no argument is passed, it defaults to `IT`). The results of the training will be saved in `data/outputs/DECODING/LANG/` in CSV format.
- Launch the decoding with `main.py LANG BASELINE`, replacing `LANG` with `IT` or `EN` and `BASELINE` with `baseline` to calculate the NER tags with the baseline method. To skip the baseline, omit the `BASELINE` argument. The results will be saved as CSV files in `data/outputs/DECODING/LANG`.
- To get the NER tags of the 3 sentences present in the slides, launch `decode.py`. The result will be printed to the console in the CoNLL-U format.
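
The directory tree from the setup steps can also be created with a short script instead of by hand. The following is a minimal sketch and is not part of the repository: the names `create_data_tree` and `missing_data_files` are hypothetical, and the paths are copied from the tree above on the assumption that the script is run from the project root. It only creates folders and reports which data files are still missing; it does not download anything.

```python
from pathlib import Path

# Directories copied from the tree in the setup steps (case sensitive).
DIRECTORIES = [
    "data/outputs/DECODING/EN",
    "data/outputs/DECODING/IT",
    "data/outputs/LEARNING/EN",
    "data/outputs/LEARNING/IT",
    "data/wikineural_en",
    "data/wikineural_it",
]

# Data files that have to be downloaded manually from the WikiNEural repository.
DATA_FILES = [
    "data/wikineural_en/train.conllu",
    "data/wikineural_en/test.conllu",
    "data/wikineural_it/train.conllu",
    "data/wikineural_it/test.conllu",
]


def create_data_tree(root="."):
    """Create every directory of the expected tree under `root`.

    Existing directories are left untouched.
    """
    root = Path(root)
    for rel in DIRECTORIES:
        (root / rel).mkdir(parents=True, exist_ok=True)


def missing_data_files(root="."):
    """Return the relative paths of the data files not yet present under `root`."""
    root = Path(root)
    return [rel for rel in DATA_FILES if not (root / rel).is_file()]
```

Calling `create_data_tree()` from the project root builds the folders, and `missing_data_files()` then lists the `.conllu` files that still need to be copied in before training.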