LmCSC (Language Model-based Chinese Spelling Check)

This is an implementation of a Chinese spelling check system.

Quick Links

About
Demo
Installation

About

The system mainly consists of the following three parts:

A Tri-gram Language Model
A confusion set
Other sources
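
To illustrate how a tri-gram language model and a confusion set can work together, here is a minimal sketch. It is not the code used in LmCSC: it assumes character-level tri-grams, uses add-one smoothing in place of a KenLM model, and tries only one substitution at a time. The function names, the toy corpus, and the toy confusion set are all made up for this example.

```python
import math
from collections import Counter


def train_trigram_counts(corpus):
    """Count character tri-grams and their two-character contexts."""
    tri, ctx = Counter(), Counter()
    for sentence in corpus:
        chars = ["<s>", "<s>"] + list(sentence) + ["</s>"]
        for i in range(2, len(chars)):
            tri[tuple(chars[i - 2:i + 1])] += 1
            ctx[tuple(chars[i - 2:i])] += 1
    return tri, ctx


def log_prob(sentence, tri, ctx, vocab_size):
    """Add-one smoothed tri-gram log-probability of a sentence."""
    chars = ["<s>", "<s>"] + list(sentence) + ["</s>"]
    total = 0.0
    for i in range(2, len(chars)):
        num = tri[tuple(chars[i - 2:i + 1])] + 1
        den = ctx[tuple(chars[i - 2:i])] + vocab_size
        total += math.log(num / den)
    return total


def correct(sentence, confusion, tri, ctx, vocab_size):
    """Try each single-character substitution from the confusion set
    and keep the variant the language model scores highest."""
    best = sentence
    best_score = log_prob(sentence, tri, ctx, vocab_size)
    for i, ch in enumerate(sentence):
        for cand in confusion.get(ch, ()):
            trial = sentence[:i] + cand + sentence[i + 1:]
            score = log_prob(trial, tri, ctx, vocab_size)
            if score > best_score:
                best, best_score = trial, score
    return best


# Toy data, invented for illustration only.
corpus = ["我在家里", "他在学校", "我再说一次"]
confusion = {"再": ["在"], "在": ["再"]}
vocab_size = len({c for s in corpus for c in s}) + 1  # +1 for </s>
tri, ctx = train_trigram_counts(corpus)

print(correct("我再家里", confusion, tri, ctx, vocab_size))
```

The confusion set limits the search to plausible replacements (characters that are commonly mistaken for each other), and the language model decides whether a replacement makes the sentence more likely.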

Demo

Installation

Besides the commonly pre-installed Python libraries, a few additional packages must be installed to run the system. The required packages are listed in requirements.txt. Run the following commands to clone the repository and install LmCSC:

git clone https://github.com/wdimmy/LmCSC.git
cd LmCSC; pip install -r requirements.txt; python setup.py develop

Note: requirements.txt includes a subset of all the possible required packages. Depending on what you want to run, you might need to install an extra package.

You can train the language model yourself using kenlm, or download our pre-trained model by running:

chmod +x ./download.sh
./download.sh

Note: we provide two versions of the model:

kenlm_3.bin (about 13 GB): https://pan.baidu.com/s/1g7LL_sLs-ra2l9VxeDp-9w Extraction code: 0u3q

kenlm_3_small.bin (about 3 GB): https://pan.baidu.com/s/1mMVVHmNtM_FXLJ5yIiRX7Q Extraction code: 91qj

The larger model gives better results.
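
Once a model file is available, it can be queried from Python. The sketch below assumes that the downloaded file is a standard KenLM binary and that the kenlm Python module is installed. The helper names are hypothetical and are not part of LmCSC. How sentences should be tokenized for this model is not stated here, so the helpers take a ready-made token list.

```python
def load_model(path):
    """Load a KenLM binary model from `path` (requires the kenlm module)."""
    import kenlm  # imported lazily so the helpers below work without it
    return kenlm.Model(path)


def score_tokens(model, tokens):
    """Return the log10 probability of a token sequence.

    KenLM expects a single string with tokens separated by spaces.
    """
    return model.score(" ".join(tokens), bos=True, eos=True)


def rank_candidates(model, candidates):
    """Sort candidate token sequences from most to least probable."""
    return sorted(candidates,
                  key=lambda toks: score_tokens(model, toks),
                  reverse=True)
```

For example, `rank_candidates(load_model("kenlm_3_small.bin"), [tokens_a, tokens_b])` returns the candidates ordered by model score, where the path is whatever location the model file was saved to.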
