LmCSC (Language Model-based Chinese Spelling Check)

This is an implementation of a Chinese spelling check system.

Quick Links

About

The system mainly consists of the following three parts:

  • A Tri-gram Language Model
  • A confusion set
  • Other sources
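
To illustrate how a confusion set and a trigram language model can work together, here is a minimal sketch. It is not the code used in this repository: the toy corpus, the confusion set entries, the crude smoothing, and all function names are assumptions made up for the example. The confusion set proposes replacement characters, and the trigram score decides whether a replacement makes the sentence more plausible.

```python
import math
from collections import Counter

# Hypothetical toy data, for illustration only.
CORPUS = ["我在家里吃饭", "他在家里看书", "我在公园里跑步"]
CONFUSION_SET = {"再": ["在"], "在": ["再"], "理": ["里"]}


def trigrams(sentence):
    """Return character trigrams, padded with start (^) and end ($) marks."""
    padded = "^^" + sentence + "$"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


TRIGRAM_COUNTS = Counter(t for s in CORPUS for t in trigrams(s))
TOTAL = sum(TRIGRAM_COUNTS.values())


def score(sentence):
    """Crude add-one style log score; higher means more plausible.

    This is not a normalized probability. It is only meant for comparing
    candidate sentences of the same length.
    """
    return sum(math.log((TRIGRAM_COUNTS[t] + 1) / (TOTAL + 1))
               for t in trigrams(sentence))


def candidates(sentence):
    """Yield every sentence obtained by one confusion-set substitution."""
    for i, ch in enumerate(sentence):
        for alt in CONFUSION_SET.get(ch, []):
            yield sentence[:i] + alt + sentence[i + 1:]


def correct(sentence):
    """Greedily apply the best single substitution until the score stops improving."""
    current = sentence
    while True:
        best = max(candidates(current), key=score, default=current)
        if score(best) <= score(current):
            return current
        current = best


print(correct("我再家里吃饭"))
```

In a real system the toy counts would be replaced by a trained language model and a much larger confusion set.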

Demo

Installation

In addition to the standard Python libraries, a few extra packages must be installed to run the system. The required packages are listed in requirements.txt. Run the following commands to clone the repository and install LmCSC:

git clone https://github.com/wdimmy/LmCSC.git
cd LmCSC; pip install -r requirements.txt; python setup.py develop

Note: requirements.txt includes a subset of all the possible required packages. Depending on what you want to run, you might need to install an extra package.

You can train the language model yourself using kenlm, or download our pre-trained model by running:

chmod +x ./download.sh
./download.sh

Note: we provide two versions of the trained model:

kenlm_3.bin (about 13 GB): https://pan.baidu.com/s/1g7LL_sLs-ra2l9VxeDp-9w Extraction code: 0u3q

kenlm_3_small.bin (about 3 GB): https://pan.baidu.com/s/1mMVVHmNtM_FXLJ5yIiRX7Q Extraction code: 91qj

The larger model works better but needs more disk space.
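
Once a model file is available, it can be loaded with the kenlm Python module and used to compare candidate sentences. The following is a minimal sketch rather than this project's own API: the helper names (tokenize, load_model, pick_best) are hypothetical, the model path in the usage comment is an assumption about where the file ends up, and it assumes the model was trained on space-separated characters. If the model was trained on segmented words instead, tokenize would need to change.

```python
def tokenize(sentence):
    """Assumption: the model expects characters separated by spaces."""
    return " ".join(sentence)


def load_model(path):
    """Load a binary or ARPA model with the kenlm Python module."""
    import kenlm  # imported lazily so the helpers below work without it
    return kenlm.Model(path)


def pick_best(model, sentences):
    """Return the sentence with the highest language model score.

    kenlm's Model.score returns a log10 probability, so higher is better.
    """
    return max(
        sentences,
        key=lambda s: model.score(tokenize(s), bos=True, eos=True),
    )


# Example usage (the path is an assumption, adjust it to your setup):
# model = load_model("kenlm_3_small.bin")
# print(pick_best(model, ["我再家里吃饭", "我在家里吃饭"]))
```

Any object with a compatible score method can be passed to pick_best, which makes it easy to try the ranking logic before downloading the large model files.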