MANASLU8/CoreNLPRusModels

CoreNLPRusModels

Stanford Tagger and NN Dependency Parser Models for the Russian Language

  1. Parser models
  2. Tagger models and lemmatization resources

Getting Started with the Pipeline for the Russian Language

  1. Clone CoreNLP from the project repository.

  2. Download the lemmatization resource 'dict.tsv' and the tagger and parser models using the links in the 'CoreNLPRusModels' section above.

  3. Build the project and run the Launcher (edu.stanford.nlp.international.russian.process.Launcher).
    The Launcher requires the following parameters:

  • -tagger - path to the POS-tagging model russian-ud-pos.tagger;
  • -taggerMF - path to the POS-tagging model russian-ud-mf.tagger, which outputs POS tags with inflectional morphological features (according to UD v.2); the parsing model reuses these features;
  • -mf - when this flag is set, inflectional morphology is written to the FEATS field of the CoNLL annotations;
  • -parser - path to the dependency parser model, whose inventory of syntactic relations follows UD v.2; it is best to start with the model nndep.rus.modelMFWiki100HS400_80.txt.gz, which uses embeddings trained on a Wikipedia dump;
  • -pLemmaDict - path to dict.tsv, preferably placed in the /CoreNLP/src/edu/stanford/nlp/international/russian/process directory;
  • -pText - path to the input file, encoded in UTF-8, for example /home/filepath/input_file.txt;
  • -pResults - path to the output '.conll' file, written in CoNLL-U format.
  4. Example of running from the console:
java -Xmx8g edu.stanford.nlp.international.russian.process.Launcher -tagger russian-ud-pos.tagger -taggerMF russian-ud-mf.tagger -pLemmaDict src/edu/stanford/nlp/international/russian/process/dict.tsv -parser nndep.rus.modelMFWiki100HS400_80.txt.gz -pText input.txt -pResults output.conll -mf 
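
When the pipeline is driven from a script, the command line above can be assembled programmatically. The following is a minimal Python sketch, not part of this repository: the function names `build_launcher_command` and `missing_inputs` are hypothetical, only the flags listed above are used, and it assumes the compiled CoreNLP classes are already on the Java classpath. It passes -mf without a value, as the console example does, and rejects heap sizes below the 5 GB minimum stated in the requirements.

```python
from pathlib import Path
from typing import Iterable, List

# Main class named in step 3 of this README.
LAUNCHER = "edu.stanford.nlp.international.russian.process.Launcher"


def missing_inputs(paths: Iterable[str]) -> List[str]:
    """Return the paths that do not point to an existing file."""
    return [str(p) for p in paths if not Path(p).is_file()]


def build_launcher_command(tagger: str, tagger_mf: str, lemma_dict: str,
                           parser: str, text: str, results: str,
                           write_feats: bool = True,
                           heap_gb: int = 8) -> List[str]:
    """Assemble the Launcher call as an argument list (hypothetical helper).

    Assumption: the CoreNLP classes are on the classpath, as in the
    console example, so no classpath option is added here.
    """
    if heap_gb < 5:
        raise ValueError("the README asks for at least 5 GB of JVM heap")
    cmd = [
        "java", f"-Xmx{heap_gb}g", LAUNCHER,
        "-tagger", str(tagger),
        "-taggerMF", str(tagger_mf),
        "-pLemmaDict", str(lemma_dict),
        "-parser", str(parser),
        "-pText", str(text),
        "-pResults", str(results),
    ]
    if write_feats:
        # Passed as a bare flag, mirroring the console example.
        cmd.append("-mf")
    return cmd
```

A caller might first check `missing_inputs([tagger, tagger_mf, lemma_dict, parser, text])` and then hand the returned list to `subprocess.run(cmd, check=True)`; using an argument list rather than a single string avoids shell quoting problems with paths that contain spaces.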

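The output file is in CoNLL-U format, so it can be post-processed with a few lines of code. The sketch below is an illustration in Python and is not shipped with the models: `read_conllu` and `parse_feats` are hypothetical helper names, and the code assumes the Launcher writes the standard ten tab-separated CoNLL-U columns with a blank line between sentences.

```python
from typing import Dict, Iterable, Iterator, List

# The ten standard CoNLL-U columns, in order.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


def read_conllu(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield each sentence as a list of token dictionaries."""
    sentence: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Comment lines carry sentence-level metadata; skipped here.
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(cols)}")
        sentence.append(dict(zip(FIELDS, cols)))
    if sentence:
        # The last sentence may not be followed by a blank line.
        yield sentence


def parse_feats(feats: str) -> Dict[str, str]:
    """Split a FEATS value such as 'Case=Nom|Number=Sing' into a dict."""
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))
```

For instance, iterating over `read_conllu(open("output.conll", encoding="utf-8"))` and calling `parse_feats(token["feats"])` on each token gives access to the morphological features that are written when -mf is set.
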
Other Requirements

  • Java 1.8
  • allocate at least 5 GB to the JVM: -Xmx5g
  • input file encoding: UTF-8
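
Because the input must be UTF-8, it can help to verify the encoding before starting a long run. The following Python sketch shows one possible check; `is_valid_utf8` and `transcode_to_utf8` are hypothetical helpers, and the source encoding passed to the second function (for example `cp1251`, which is common for older Russian text files) is an assumption the caller has to supply.

```python
from pathlib import Path


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def transcode_to_utf8(src: str, dst: str, source_encoding: str) -> None:
    """Rewrite src as UTF-8 in dst.

    Assumption: the caller knows the real source encoding; a wrong
    guess will silently produce garbled text rather than an error.
    """
    text = Path(src).read_bytes().decode(source_encoding)
    # Write bytes directly so that line endings are left untouched.
    Path(dst).write_bytes(text.encode("utf-8"))
```

A typical use is to call `is_valid_utf8(Path("input.txt").read_bytes())` and convert the file only when the check fails.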

If you find the pipeline useful in your research, please consider citing our paper:

@inproceedings{DBLP:conf/kesw/KovriguinaSSP17,
  author    = {Liubov Kovriguina and
               Ivan Shilin and
               Alexander Shipilo and
               Alina Putintseva},
  title     = {Russian Tagging and Dependency Parsing Models for Stanford CoreNLP
               Natural Language Toolkit},
  booktitle = {Knowledge Engineering and Semantic Web - 8th International Conference,
               {KESW} 2017, Szczecin, Poland, November 8-10, 2017, Proceedings},
  pages     = {101--111},
  year      = {2017},
  doi       = {10.1007/978-3-319-69548-8\_8}
}
