A multi-task learning CNN-RNN model combined with task-optimized phonetic features to predict the Lemma, POS category, Gender, Number, Person, Case, and Tense-aspect-mood (TAM) of Hindi words.
git clone [email protected]:Saurav0074/morph_analyzer.git
cd morph_analyzer
The file main.py takes the following command-line arguments:
Argument | Values | Required | Specification |
---|---|---|---|
lang | hindi, urdu | Yes | Language |
mode | train, test, predict (predict needs no gold labels) | Yes | Training, testing, or prediction. |
phonetic | True/1/yes/y/t or False/0/no/n/f | No (default=False) | Whether to use MOO-driven phonological features. |
freezing | Same values as phonetic | No (default=False) | Whether to use progressive freezing for training (see FreezeOut). |
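The boolean spellings listed for phonetic can be handled with a small argparse type function. The following is only a sketch of how such a command line might be parsed, not the actual contents of main.py: the names str2bool and build_parser are hypothetical, and it assumes that freezing accepts the same spellings as phonetic.

```python
import argparse


def str2bool(value):
    """Map the boolean spellings from the table to a Python bool (hypothetical helper)."""
    text = str(value).strip().lower()
    if text in {"true", "1", "yes", "y", "t"}:
        return True
    if text in {"false", "0", "no", "n", "f"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean value, got {value!r}")


def build_parser():
    """Build a parser mirroring the documented arguments (assumed layout, not the repo's code)."""
    parser = argparse.ArgumentParser(description="Sketch of the main.py command line")
    parser.add_argument("--lang", choices=["hindi", "urdu"], required=True)
    parser.add_argument("--mode", choices=["train", "test", "predict"], required=True)
    parser.add_argument("--phonetic", type=str2bool, default=False)
    parser.add_argument("--freezing", type=str2bool, default=False)
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["--lang", "urdu", "--mode", "train", "--phonetic", "true", "--freezing", "y"]
)
print(args.lang, args.mode, args.phonetic, args.freezing)
```

Using a type function rather than action="store_true" keeps the explicit `--phonetic true` form used in the commands further down.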
The train and test modes operate on the standard train-test split specified by the HDTB and UDTB datasets (see the datasets README), while predict uses the text provided manually in src/[lang]_predict_data/.
Training:
>>> python main.py --lang urdu --mode train --phonetic true --freezing true #train
Testing:
>>> python main.py --lang urdu --mode test --phonetic true --freezing true #test
Predicting:
>>> python main.py --lang urdu --mode predict --phonetic true --freezing true #predict
For prediction, the plain text should be provided in src/[lang]_predict_data/test_data.txt.
For the test mode:
- The predicted roots and features, as well as their gold-labelled counterparts, are written to separate files within output/[lang]/: roots.txt, feature_0.txt, ..., feature_6.txt.
- Micro-averaged precision-recall graphs are stored in graph_outputs/[lang]/.
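For readers unfamiliar with micro-averaging, the sketch below shows the general technique: each (example, class) pair is treated as one binary decision, all decisions are pooled across classes, and precision and recall are computed as the score threshold is lowered. This is an illustration of the standard approach, not the plotting code from this repo. The function name micro_average_pr_curve and the input layout (one dict of class scores per example) are assumptions.

```python
def micro_average_pr_curve(gold, scores, classes):
    """Return (precision, recall) points for a micro-averaged PR curve.

    gold:    list of gold class labels, one per example
    scores:  list of dicts mapping class label -> predicted score
    classes: all class labels of the task
    (Hypothetical input layout, chosen for illustration.)
    """
    # One-vs-rest binarisation, pooled over all classes (the "micro" part).
    pairs = [
        (example_scores[c], 1 if c == label else 0)
        for label, example_scores in zip(gold, scores)
        for c in classes
    ]
    pairs.sort(key=lambda p: p[0], reverse=True)

    total_pos = sum(is_pos for _, is_pos in pairs)
    points = []
    tp = fp = 0
    for i, (score, is_pos) in enumerate(pairs):
        tp += is_pos
        fp += 1 - is_pos
        # Emit a point only after all decisions sharing this score are counted.
        if i + 1 == len(pairs) or pairs[i + 1][0] != score:
            points.append((tp / (tp + fp), tp / total_pos))
    return points


# Toy data: three examples, two classes.
toy_gold = ["N", "V", "N"]
toy_scores = [
    {"N": 0.9, "V": 0.1},
    {"N": 0.4, "V": 0.6},
    {"N": 0.3, "V": 0.7},
]
curve = micro_average_pr_curve(toy_gold, toy_scores, ["N", "V"])
for precision, recall in curve:
    print(f"precision={precision:.2f} recall={recall:.2f}")
```

Because counts are pooled before dividing, frequent classes weigh more heavily in the curve than rare ones, which is the main difference from macro-averaging.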
For the predict mode, all the predictions (i.e., roots + features) are written to output/[lang]/predictions.txt.
Micro-averaged precision-recall curves for each class, arranged by increasing F1 score:
If this repo was helpful in your research, consider citing our work:
@article{jha2018multi,
title={Multi Task Deep Morphological Analyzer: Context Aware Joint Morphological Tagging and Lemma Prediction},
author={Jha, Saurav and Sudhakar, Akhilesh and Singh, Anil Kumar},
journal={arXiv preprint arXiv:1811.08619},
year={2018}
}