Program to convert a Kana Kanji Conversion Input to Pronunciation

gologo13/kkci2pron


kkci2pron (Kana-Kanji Conversion Input to Pronunciation)

Contribution

kkci2pron converts a Japanese yomi (reading), more precisely a Kana-Kanji conversion input, into its pronunciation.

With this program, you can generate a speaking-style corpus from a writing-style corpus annotated with word boundaries and Japanese yomi, and then build a speaking-style language model from it. Combining this language model with the large domain-independent corpus CSJ (the Corpus of Spontaneous Japanese) can improve the accuracy of a speech recognition system, as reported in [1].

This program was developed by Yohei Yamaguchi when he was a graduate student. If you have any questions, please contact him.

Installation

$ git clone https://github.com/gologo13/kkci2pron.git

You must install Kyfd (the Kyoto Fst Decoder) before running kkci2pron.

Configuration

Edit config.xml to set up Kyfd before running kkci2pron.

Usage

$ cat sample.txt
私/ワタシ は/ハ 太郎/タロウ です/デス
気温/キオン 変動/ヘンドウ
$ perl bin/kkci2pron.pl < sample.txt
私/ワタシ は/ワ 太郎/タロー です/デス
気温/キオン 変動/ヘンドー
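To see which readings the conversion actually changed, the input and output lines can be compared unit by unit. The following is a minimal Python sketch, not part of kkci2pron itself; the helper names `split_units` and `changed_units` are hypothetical, and it assumes the output keeps the same number of units in the same order as the input, as in the sample above.

```python
def split_units(line):
    """Split one sentence into (word, reading) pairs."""
    pairs = []
    for unit in line.split(" "):
        # Each unit looks like word/reading; split on the first slash.
        word, _, reading = unit.partition("/")
        pairs.append((word, reading))
    return pairs


def changed_units(input_line, output_line):
    """Return (word, yomi, pronunciation) for units whose reading changed."""
    before = split_units(input_line)
    after = split_units(output_line)
    # Assumption: the converter emits exactly one output unit per input unit.
    if len(before) != len(after):
        raise ValueError("input and output have different numbers of units")
    return [(w, y, p) for (w, y), (_, p) in zip(before, after) if y != p]


if __name__ == "__main__":
    for word, yomi, pron in changed_units(
        "私/ワタシ は/ハ 太郎/タロウ です/デス",
        "私/ワタシ は/ワ 太郎/タロー です/デス",
    ):
        print(f"{word}: {yomi} -> {pron}")
```

For the first sample sentence this lists は (ハ to ワ) and 太郎 (タロウ to タロー), the two units where the pronunciation differs from the yomi.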

Input Format

The input text must follow this format.

text := sentence + '\n'(newline character) + sentence + … + sentence

sentence := unit + ' '(space) + unit + … + unit

unit := word + '/'(slash) + yomi

word := (Japanese Full-width Character)+

yomi := (Japanese Full-width Katakana Character)+

The input text must also be encoded in UTF-8.
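A quick pre-check of the input can catch format mistakes before the text is passed to kkci2pron. The following is a minimal Python sketch of the grammar above, not part of the tool; `check_sentence` and `KATAKANA` are hypothetical names. It assumes that "Full-width Katakana" means the code points U+30A1 to U+30FA plus the long-vowel mark U+30FC, and it only checks that each word is non-empty rather than verifying that it is full-width.

```python
import re

# Assumption: full-width Katakana is U+30A1..U+30FA plus the long-vowel mark U+30FC.
KATAKANA = re.compile(r"[\u30A1-\u30FA\u30FC]+")


def check_sentence(sentence):
    """Return a list of problems found in one line; empty if it looks valid."""
    problems = []
    # Units are separated by exactly one space, so a double space
    # produces an empty unit and is reported below.
    for i, unit in enumerate(sentence.split(" "), start=1):
        word, sep, yomi = unit.partition("/")
        if not sep or not word or "/" in yomi:
            problems.append(f"unit {i}: expected word/yomi, got {unit!r}")
        elif not KATAKANA.fullmatch(yomi):
            problems.append(f"unit {i}: yomi {yomi!r} is not all Katakana")
    return problems


if __name__ == "__main__":
    print(check_sentence("私/ワタシ は/ハ 太郎/タロウ です/デス"))
    print(check_sentence("私/わたし"))
```

Reading the file with `open(path, encoding="utf-8")` before calling `check_sentence` on each line also surfaces encoding problems, since Python raises `UnicodeDecodeError` when the bytes are not valid UTF-8.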

License

MIT License. Please see the LICENSE file for details.

Reference

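[1] Yohei Yamaguchi, Shinsuke Mori, and Tatsuya Kawahara.
Language model adaptation for lecture speech recognition using Kana-Kanji conversion logs (in Japanese).
The 18th Annual Meeting of the Association for Natural Language Processing (NLP2012), Hiroshima, March 2012.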
