Multi-Task Deep Morph Analyzer

A multi-task CNN-RNN model that uses task-optimized phonetic features to predict the lemma, POS category, gender, number, person, case, and tense-aspect-mood (TAM) of Hindi words.

Framework

Getting started

Clone the repository

git clone git@github.com:Saurav0074/morph_analyzer.git
cd morph_analyzer

Provide the arguments

The file main.py takes the following command-line arguments:

Argument	Values	Required	Description
lang	hindi, urdu	Yes	Language to process.
mode	train, test, predict	Yes	Train, test, or predict (predict requires no gold labels).
phonetic	True/1/yes/y/t or False/0/no/n/f	No (default=`False`)	Whether to use MOO-driven phonological features.
freezing	True/1/yes/y/t or False/0/no/n/f	No (default=`False`)	Whether to use progressive freezing during training (see FreezeOut).

The train and test modes operate on the standard train-test split specified by the HDTB and UDTB datasets (see the datasets README), while predict uses the text provided manually in src/[lang]_predict_data/.
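To make the accepted argument values concrete, here is a minimal sketch of how a command line like the one described above could be parsed with argparse. It is an illustration only: the helper names str2bool and build_parser are hypothetical, and the actual parsing code in main.py may be organized differently.

```python
import argparse


def str2bool(value):
    """Map the boolean spellings listed in the argument table to a bool."""
    text = str(value).strip().lower()
    if text in {"true", "1", "yes", "y", "t"}:
        return True
    if text in {"false", "0", "no", "n", "f"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean value, got {value!r}")


def build_parser():
    # Hypothetical reconstruction of the CLI; main.py may define it differently.
    parser = argparse.ArgumentParser(description="Morph analyzer CLI (sketch)")
    parser.add_argument("--lang", choices=["hindi", "urdu"], required=True)
    parser.add_argument("--mode", choices=["train", "test", "predict"], required=True)
    parser.add_argument("--phonetic", type=str2bool, default=False)
    parser.add_argument("--freezing", type=str2bool, default=False)
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list so the sketch runs without a shell.
    args = build_parser().parse_args(
        ["--lang", "urdu", "--mode", "train", "--phonetic", "yes"]
    )
    print(args.lang, args.mode, args.phonetic, args.freezing)
```

Using a converter function rather than a plain store_true flag is what allows spellings such as yes, t, or 0 to be passed explicitly on the command line.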

Sample run commands:

Training:

$ python main.py --lang urdu --mode train --phonetic true --freezing true  # train

Testing:

$ python main.py --lang urdu --mode test --phonetic true --freezing true  # test

Predicting:

$ python main.py --lang urdu --mode predict --phonetic true --freezing true  # predict

For prediction, provide the plain text in src/[lang]_predict_data/test_data.txt.
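As a convenience, the input file for predict mode could be prepared from Python. The sketch below only builds the path pattern stated above and writes UTF-8 text to it. The helper names predict_input_path and write_predict_input are hypothetical, and the expected layout of the text (for example, one sentence per line) is not specified here, so check the repository before relying on it.

```python
from pathlib import Path


def predict_input_path(lang, root="."):
    # Path pattern taken from the README: src/[lang]_predict_data/test_data.txt
    return Path(root) / "src" / f"{lang}_predict_data" / "test_data.txt"


def write_predict_input(text, lang, root="."):
    """Write plain text to the predict-mode input file and return its path."""
    path = predict_input_path(lang, root)
    # Create src/[lang]_predict_data/ if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    # Hindi and Urdu text needs an explicit Unicode encoding.
    path.write_text(text, encoding="utf-8")
    return path
```

Calling write_predict_input(text, "hindi") from the repository root would place the text where the predict mode is documented to read it.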

Outputs

For the test mode:

The predicted roots and features, along with their gold-labelled counterparts, are written to separate files in output/[lang]/: roots.txt, feature_0.txt, ..., feature_6.txt.
Micro-averaged precision-recall graphs are stored in graph_outputs/[lang]/.
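For readers unfamiliar with micro-averaging, the following sketch shows how one point of a micro-averaged precision-recall curve can be computed: every (sample, class) pair is treated as a binary decision, and true positives, false positives, and false negatives are pooled across classes before the ratios are taken. The function name and the toy data are hypothetical, and the repository's own plotting code may compute the curves differently.

```python
def micro_precision_recall(gold, scores, threshold):
    """Return (precision, recall) pooled over all classes.

    gold:   list of gold class indices, one per sample
    scores: list of per-class score lists, one per sample
    """
    tp = fp = fn = 0
    for label, row in zip(gold, scores):
        for cls, score in enumerate(row):
            predicted = score >= threshold
            actual = cls == label
            if predicted and actual:
                tp += 1
            elif predicted and not actual:
                fp += 1
            elif actual:
                fn += 1
    # With no positive predictions, precision is conventionally set to 1.0.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy data (made up for illustration): 3 samples, 3 classes.
    gold = [0, 1, 2]
    scores = [[0.8, 0.1, 0.1], [0.3, 0.6, 0.1], [0.5, 0.2, 0.3]]
    for threshold in (0.3, 0.5):
        print(threshold, micro_precision_recall(gold, scores, threshold))
```

Sweeping the threshold over a range of values and plotting the resulting pairs yields the full curve.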

For the predict mode, all the predictions (i.e., roots + features) are written to: output/[lang]/predictions.txt.

Graph outputs

Micro-averaged precision-recall curves for each class, arranged by increasing F1 score:

Citation

If this repo was helpful in your research, consider citing our work:

@article{jha2018multi,
  title={Multi Task Deep Morphological Analyzer: Context Aware Joint Morphological Tagging and Lemma Prediction},
  author={Jha, Saurav and Sudhakar, Akhilesh and Singh, Anil Kumar},
  journal={arXiv preprint arXiv:1811.08619},
  year={2018}
}

