This is an implementation of a Chinese spelling check system.
The system mainly consists of the following three parts:
- A Tri-gram Language Model
- A confusion set
- Other sources
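To illustrate how a language model and a confusion set can work together, here is a minimal sketch. It is not the LmCSC implementation: the function names (`make_toy_scorer`, `correct_sentence`), the greedy search, and the toy trigram-counting scorer are all assumptions made for illustration. A real setup would replace the toy scorer with log-probabilities from the tri-gram language model.

```python
from typing import Callable, Dict, Iterable, Set


def make_toy_scorer(corpus: Iterable[str]) -> Callable[[str], float]:
    """Hypothetical stand-in for a tri-gram language model.

    It scores a sentence by counting how many of its character
    trigrams were seen in the corpus. Higher is better.
    """
    known: Set[str] = set()
    for line in corpus:
        known.update(line[i:i + 3] for i in range(len(line) - 2))

    def score(sentence: str) -> float:
        return float(sum(sentence[i:i + 3] in known
                         for i in range(len(sentence) - 2)))

    return score


def correct_sentence(sentence: str,
                     confusion_set: Dict[str, Set[str]],
                     score: Callable[[str], float]) -> str:
    """Greedy left-to-right correction (an illustrative assumption).

    For each character, try every confusable character and keep the
    replacement only if it strictly improves the sentence score.
    """
    chars = list(sentence)
    best = score(sentence)
    for i, original in enumerate(sentence):
        best_char = original
        # sorted() keeps the result deterministic when scores tie
        for candidate in sorted(confusion_set.get(original, ())):
            chars[i] = candidate
            candidate_score = score("".join(chars))
            if candidate_score > best:
                best, best_char = candidate_score, candidate
        chars[i] = best_char
    return "".join(chars)


if __name__ == "__main__":
    scorer = make_toy_scorer(["我今天很高兴"])
    confusions = {"心": {"兴", "新"}}
    print(correct_sentence("我今天很高心", confusions, scorer))
```

The confusion set limits the candidates to characters that are plausibly confused with the original, so the language model only has to rank a handful of alternatives per position.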
Besides the standard Python libraries, a few additional packages must be installed to run the system. The required packages are listed in requirements.txt. Run the following commands to clone the repository and install LmCSC:
git clone https://github.com/wdimmy/LmCSC.git
cd LmCSC; pip install -r requirements.txt; python setup.py develop
Note: requirements.txt covers only a subset of the packages you may need. Depending on what you want to run, you might have to install additional packages.
You can train the language model yourself using kenlm, or download our pre-trained model by running:
chmod +x ./download.sh
./download.sh
Note: we provide two versions of the trained model:
- kenlm_3.bin (about 13 GB): https://pan.baidu.com/s/1g7LL_sLs-ra2l9VxeDp-9w Extraction code: 0u3q
- kenlm_3_small.bin (about 3 GB): https://pan.baidu.com/s/1mMVVHmNtM_FXLJ5yIiRX7Q Extraction code: 91qj
The larger model works better but takes more disk space.
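Once a model file is available, it can be loaded with the kenlm Python module. The sketch below is a hedged example, not code from this repository: the helper names `to_char_tokens` and `load_scorer` are hypothetical, the model path is whatever location you saved the file to, and splitting the sentence into space-separated characters assumes the model was trained on character-level tokens. If the model was trained on word-segmented text, segment the input the same way instead.

```python
import os
from typing import Callable


def to_char_tokens(sentence: str) -> str:
    """Separate characters with spaces, dropping existing whitespace.

    Assumption: the language model uses single characters as tokens.
    """
    return " ".join(ch for ch in sentence if not ch.isspace())


def load_scorer(model_path: str) -> Callable[[str], float]:
    """Load a kenlm binary model and return a sentence-scoring function."""
    if not os.path.exists(model_path):
        raise FileNotFoundError(model_path)

    import kenlm  # imported lazily so the helper above works without kenlm

    model = kenlm.Model(model_path)

    def score(sentence: str) -> float:
        # kenlm returns a log10 probability; bos/eos add sentence boundaries
        return model.score(to_char_tokens(sentence), bos=True, eos=True)

    return score


# Example usage (the path is an assumption; adjust it to your download location):
# score = load_scorer("kenlm_3_small.bin")
# print(score("我今天很高兴"))
```

Scores are log probabilities, so values closer to zero indicate sentences the model considers more likely.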