ptaszynski/cabocha-extractor

cabocha-extractor

[DESCRIPTION]

A tool for preprocessing Japanese text data for use in machine learning. It uses MeCab for tokenization and part-of-speech tagging, and CaboCha for shallow and deep parsing.

[USAGE]

bash main.sh input_file.exe

[DEPENDENCIES]

MeCab, the MeCab Perl binding, and CaboCha.
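
Since main.sh depends on all three of these being installed, it can help to check for them before the first run. The following is a minimal sketch, not part of the repository. It assumes the executables are on PATH under their usual names `mecab` and `cabocha`, and that the Perl binding is loadable as the module `MeCab`. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report which of the listed dependencies appear to be missing.
# Assumption: the executables are named `mecab` and `cabocha`, and the
# Perl binding is the module `MeCab`. Adjust if your installation differs.

# Print every command name from the arguments that is not found on PATH.
missing_cmds() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# Print the module name if Perl cannot load it (or Perl itself is absent).
missing_perl_module() {
  perl -M"$1" -e 1 >/dev/null 2>&1 || printf '%s\n' "$1"
}

# Collect everything that looks missing, one name per line.
missing_deps() {
  missing_cmds mecab cabocha perl
  missing_perl_module MeCab
}
```

Calling `missing_deps` prints one name per line for each dependency it could not find. Empty output suggests the environment is ready for `bash main.sh`.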
