A Keras-based multi-tasking morphological analyzer for Hindi and Urdu

srvCodes/morph_analyzer

Multi-Task Deep Morph Analyzer

A multi-task learning CNN-RNN model that, combined with task-optimized phonetic features, predicts the lemma, POS category, gender, number, person, case, and tense-aspect-mood (TAM) of Hindi and Urdu words.
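Multi-task learning trains one shared network against several objectives at once. A common way to combine the per-task objectives, assumed here rather than taken from this repository or its paper, is a weighted sum of cross-entropy losses over the tagging heads. The minimal sketch below uses plain Python probability lists in place of Keras tensors; cross_entropy, joint_loss, and the equal default weights are hypothetical, and lemma prediction (a sequence output) is left out.

```python
from __future__ import annotations

import math


def cross_entropy(probs: list[float], gold: int) -> float:
    """Negative log-likelihood of the gold class under one task's softmax output."""
    return -math.log(probs[gold])


def joint_loss(outputs: dict[str, list[float]],
               gold: dict[str, int],
               weights: dict[str, float] | None = None) -> float:
    """Weighted sum of per-task losses (e.g. gender, number, person, case, TAM).

    Every task weight defaults to 1.0; this weighting scheme is an assumption.
    """
    weights = weights or {}
    return sum(weights.get(task, 1.0) * cross_entropy(outputs[task], gold[task])
               for task in outputs)


# Usage: two tagging heads, one prediction each.
outputs = {"gender": [0.9, 0.1], "number": [0.5, 0.5]}
gold = {"gender": 0, "number": 1}
loss = joint_loss(outputs, gold)
```

In Keras the same idea is usually expressed by giving the model one named output per task and passing a per-output loss dictionary and loss_weights to compile().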

Framework

[Figure: overview of the model framework]

Getting started

Clone the repository

git clone git@github.com:Saurav0074/morph_analyzer.git
cd morph_analyzer

Provide the arguments

The file main.py takes the following command-line arguments:

| Argument | Values | Required | Description |
|---|---|---|---|
| --lang | hindi, urdu | Yes | Language to process. |
| --mode | train, test, predict | Yes | Train the model, evaluate it on the test split, or predict on raw text (predict needs no gold labels). |
| --phonetic | True/1/yes/y/t or False/0/no/n/f | No (default: False) | Use the MOO-driven phonological features. |
| --freezing | Boolean, e.g. true or false | No (default: False) | Use progressive freezing during training (see FreezeOut). |
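The flags in this table can be parsed with the standard-library argparse module. The sketch below is an illustration under assumptions, not the contents of main.py: the helper names str2bool and build_parser are hypothetical, and it assumes the flags are spelled --lang, --mode, --phonetic, and --freezing, as in the sample commands.

```python
import argparse


def str2bool(value: str) -> bool:
    """Map the boolean spellings listed for --phonetic onto True/False (case-insensitive)."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes", "y", "t"}:
        return True
    if lowered in {"false", "0", "no", "n", "f"}:
        return False
    # argparse turns this into a usage error and exits with status 2.
    raise argparse.ArgumentTypeError(f"expected a boolean value, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the argument table."""
    parser = argparse.ArgumentParser(description="Multi-task morphological analyzer (sketch)")
    parser.add_argument("--lang", required=True, choices=["hindi", "urdu"])
    parser.add_argument("--mode", required=True, choices=["train", "test", "predict"])
    parser.add_argument("--phonetic", type=str2bool, default=False)
    parser.add_argument("--freezing", type=str2bool, default=False)
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

With this parser, --phonetic true and --phonetic t both yield True, while an unrecognized spelling such as --phonetic maybe is rejected with a usage error.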
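FreezeOut (Brock et al., 2017) speeds up training by giving each layer its own cosine-annealed learning rate that reaches zero at a layer-specific point in training, after which the layer is frozen; earlier layers freeze first. The simplified sketch below assumes linearly spaced freeze points between t0 and 1 and measures progress as the fraction of training completed. freeze_points and layer_lr are hypothetical names, and how this repository maps the schedule onto its own layers is not shown here.

```python
import math


def freeze_points(num_layers: int, t0: float = 0.5) -> list:
    """Fraction of training after which each layer freezes, spaced linearly from t0 to 1."""
    if num_layers == 1:
        return [1.0]
    return [t0 + (1.0 - t0) * i / (num_layers - 1) for i in range(num_layers)]


def layer_lr(base_lr: float, progress: float, freeze_at: float) -> float:
    """Cosine-anneal a layer's learning rate to zero at its freeze point.

    A return value of 0.0 means the layer is frozen and needs no further updates.
    """
    if progress >= freeze_at:
        return 0.0
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress / freeze_at))


# Usage: learning rates of a 3-layer stack at 40% of training.
schedule = freeze_points(3, t0=0.5)
rates = [layer_lr(0.1, 0.4, t) for t in schedule]
```

With three layers and t0=0.5, the layers freeze at 50%, 75%, and 100% of training. In a training loop one would set each layer's learning rate from layer_lr and stop updating that layer (for example, by marking it non-trainable) once the rate reaches 0.0.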

The train and test modes use the standard train-test splits of the HDTB and UDTB datasets (see the datasets README), while predict runs on text you provide in src/[lang]_predict_data/.

Sample run commands:

Training:

python main.py --lang urdu --mode train --phonetic true --freezing true

Testing:

python main.py --lang urdu --mode test --phonetic true --freezing true

Predicting:

python main.py --lang urdu --mode predict --phonetic true --freezing true

For prediction, put the plain text in src/[lang]_predict_data/test_data.txt.
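A minimal sketch for locating and reading that input file is shown below. It assumes UTF-8 text with one sentence per line and whitespace tokenization; predict_input_path and read_predict_sentences are hypothetical helpers, and the repository's own preprocessing may differ.

```python
from __future__ import annotations

from pathlib import Path


def predict_input_path(lang: str, root: str | Path = ".") -> Path:
    """Build src/[lang]_predict_data/test_data.txt relative to the repository root."""
    return Path(root) / "src" / f"{lang}_predict_data" / "test_data.txt"


def read_predict_sentences(lang: str, root: str | Path = ".") -> list[list[str]]:
    """Return one list of whitespace-separated tokens per non-empty line (assumed format)."""
    text = predict_input_path(lang, root).read_text(encoding="utf-8")
    return [line.split() for line in text.splitlines() if line.strip()]
```

Run from the repository root, read_predict_sentences("hindi") would return the tokens of src/hindi_predict_data/test_data.txt, skipping blank lines.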

Outputs

For the test mode:

  • The predicted roots and features, together with their gold-labelled counterparts, are written to separate files in output/[lang]/: roots.txt and feature_0.txt, ..., feature_6.txt.
  • Micro-averaged precision-recall graphs are stored in graph_outputs/[lang]/.

For the predict mode, all the predictions (i.e., roots + features) are written to: output/[lang]/predictions.txt.

Graph outputs

Micro-averaged precision-recall curves for each class, arranged by increasing F1 score:

[Figure: micro-averaged precision-recall curves]
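Micro-averaging pools every (sample, class) decision into one binary problem: each gold label is binarized one-vs-rest, all scores are ranked together, and precision and recall are recomputed at each distinct score threshold. The pure-Python sketch below illustrates that standard recipe, plus an F1 helper for ordering classes; micro_pr_curve, f1, and the input layout (gold class indices plus per-class score lists) are assumptions, not this repository's plotting code.

```python
from __future__ import annotations


def micro_pr_curve(gold: list[int], scores: list[list[float]]) -> list[tuple[float, float]]:
    """Return (recall, precision) points of a micro-averaged one-vs-rest PR curve.

    gold[i] is the true class index of sample i; scores[i][c] is the score for class c.
    Every (sample, class) pair becomes one binary decision.
    """
    pairs = [(s, int(c == g)) for g, row in zip(gold, scores) for c, s in enumerate(row)]
    pairs.sort(key=lambda p: p[0], reverse=True)
    total_pos = sum(label for _, label in pairs)
    points, tp, fp = [], 0, 0
    for i, (score, label) in enumerate(pairs):
        tp += label
        fp += 1 - label
        # Emit a point only after all pairs sharing this score have been counted.
        if i + 1 == len(pairs) or pairs[i + 1][0] != score:
            points.append((tp / total_pos, tp / (tp + fp)))
    return points


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Given per-class (precision, recall) pairs in a dict, sorted(per_class, key=lambda c: f1(*per_class[c])) lists the classes from lowest to highest F1, the ordering used in the figure.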

Citation

If this repository helps your research, please consider citing our work:

@article{jha2018multi,
  title={Multi Task Deep Morphological Analyzer: Context Aware Joint Morphological Tagging and Lemma Prediction},
  author={Jha, Saurav and Sudhakar, Akhilesh and Singh, Anil Kumar},
  journal={arXiv preprint arXiv:1811.08619},
  year={2018}
}
