NodeJieba 简体中文

Introduction

NodeJieba provides Chinese word segmentation for Node.js, based on CppJieba.

Install

npm install nodejieba

Or install through the cnpm mirror registry instead of the default npm registry:

npm install nodejieba --registry=https://registry.npmmirror.com --nodejieba_binary_host_mirror=https://npm.taobao.org/mirrors/nodejieba

Usage

var nodejieba = require("nodejieba");
var result = nodejieba.cut("南京市长江大桥");
console.log(result);
//["南京市","长江大桥"]

See details in test/demo.js
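
As the example above shows, cut returns a plain array of strings, so the result can be post-processed with ordinary JavaScript. The following is a minimal sketch of counting word frequencies. countWords is a hypothetical helper, not part of NodeJieba, and the sample tokens are written out by hand rather than produced by a live call.

```javascript
// Hypothetical helper (not part of NodeJieba): counts how often each
// token occurs in an array such as the one returned by nodejieba.cut().
function countWords(tokens) {
  var counts = new Map();
  tokens.forEach(function (token) {
    counts.set(token, (counts.get(token) || 0) + 1);
  });
  return counts;
}

// Hand-written tokens in the same shape as the output of cut().
// With the library installed, nodejieba.cut(text) could be passed instead.
var tokens = ["南京市", "长江大桥", "南京市"];
var counts = countWords(tokens);
console.log(counts.get("南京市"));   // 2
console.log(counts.get("长江大桥")); // 1
```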

Initialization

Initialization is optional. If it is skipped, the default dictionaries are loaded automatically the first time cut is called.
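
This load-on-first-use behavior can be pictured as a small wrapper. The sketch below only illustrates the general pattern. The names withLazyLoad and loadDictionaries are hypothetical, the "segmenter" is a stand-in that splits on spaces, and none of this is NodeJieba's actual source.

```javascript
// Illustration only: wraps a function so that a one-time setup step runs
// before its first call. All names here are hypothetical.
function withLazyLoad(load, fn) {
  var loaded = false;
  return function () {
    if (!loaded) {
      load();
      loaded = true;
    }
    return fn.apply(this, arguments);
  };
}

var loadCalls = 0;
// Stand-in for an expensive dictionary load.
function loadDictionaries() {
  loadCalls += 1;
}

// Stand-in "segmenter" that just splits on spaces.
var lazyCut = withLazyLoad(loadDictionaries, function (text) {
  return text.split(" ");
});

lazyCut("a b");
lazyCut("c d");
console.log(loadCalls); // 1: the setup ran once, on the first call
```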

To load the default dictionaries explicitly, call:

nodejieba.load();

This is roughly equivalent to the internal call:

nodejieba.load({
  dict: './dict/jieba.dict.utf8',
  hmmDict: './dict/hmm_model.utf8',
  userDict: './dict/userdict.utf8',
  idfDict: './dict/idf.utf8',
  stopWordDict: './dict/stop_words.utf8',
});

If a dictionary parameter is missing, its default value is used.
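
The fallback rule can be illustrated as a merge of the caller's options over the defaults. This is a minimal sketch, assuming a hypothetical helper resolveDictOptions and the relative paths listed above. The library's real implementation and resolved paths may differ.

```javascript
// Default values copied from the load() call shown above.
var DEFAULTS = {
  dict: "./dict/jieba.dict.utf8",
  hmmDict: "./dict/hmm_model.utf8",
  userDict: "./dict/userdict.utf8",
  idfDict: "./dict/idf.utf8",
  stopWordDict: "./dict/stop_words.utf8"
};

// Hypothetical helper: any key missing from `options` keeps its default.
function resolveDictOptions(options) {
  return Object.assign({}, DEFAULTS, options || {});
}

// "./my_words.utf8" is a made-up path used only for illustration.
var resolved = resolveDictOptions({ userDict: "./my_words.utf8" });
console.log(resolved.userDict); // ./my_words.utf8
console.log(resolved.dict);     // ./dict/jieba.dict.utf8
```

With the library itself, the corresponding call would pass only the keys to override, for example nodejieba.load({ userDict: ... }).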

Dictionary description

  • dict: the main dictionary, with word weights and lexical (part-of-speech) tags; using the default dictionary is recommended
  • hmmDict: the hidden Markov model; using the default dictionary is recommended
  • userDict: the user dictionary; adapting it to your use case is recommended
  • idfDict: IDF (inverse document frequency) data for keyword extraction
  • stopWordDict: the list of stop words for keyword extraction

POS Tagging

var nodejieba = require("nodejieba");
console.log(nodejieba.tag("红掌拨清波"));
//[ { word: '红掌', tag: 'n' },
//  { word: '拨', tag: 'v' },
//  { word: '清波', tag: 'n' } ]

See details in test/demo.js
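
Since tag returns an array of { word, tag } objects, as in the output above, selecting words by part of speech is a filter followed by a map. This is a minimal sketch. wordsWithTags is a hypothetical helper, and the sample data is copied from the output above rather than produced by a live call.

```javascript
// Hypothetical helper: keeps the words whose tag appears in `wanted`.
function wordsWithTags(tagged, wanted) {
  return tagged
    .filter(function (item) { return wanted.indexOf(item.tag) !== -1; })
    .map(function (item) { return item.word; });
}

// Sample data copied from the tag() output shown above.
var tagged = [
  { word: "红掌", tag: "n" },
  { word: "拨", tag: "v" },
  { word: "清波", tag: "n" }
];

var nouns = wordsWithTags(tagged, ["n"]);
console.log(nouns); // [ '红掌', '清波' ]
```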

Keyword Extractor

var nodejieba = require("nodejieba");
var topN = 4;
console.log(nodejieba.extract("升职加薪,当上CEO,走上人生巅峰。", topN));
//[ { word: 'CEO', weight: 11.739204307083542 },
//  { word: '升职', weight: 10.8561552143 },
//  { word: '加薪', weight: 10.642581114 },
//  { word: '巅峰', weight: 9.49395840471 } ]

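// Note: the sample output below lists five entries and includes words ('不用', '多久')
// that do not occur in this input, so it appears to come from a longer sentence.
console.log(nodejieba.textRankExtract("升职加薪,当上CEO,走上人生巅峰。", topN));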
//[ { word: '当上', weight: 1 },
//  { word: '不用', weight: 0.9898479330698993 },
//  { word: '多久', weight: 0.9851260595435759 },
//  { word: '加薪', weight: 0.9830464899847804 },
//  { word: '升职', weight: 0.9802777682279076 } ]

See details in test/demo.js
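
In the sample output above, the two extractors report weights on different scales: the extract weights are around 10, while the textRankExtract weights are at most 1. If the two lists need to be compared, one option is to rescale each so that its largest weight becomes 1. This is a minimal sketch. normalizeWeights is a hypothetical helper, and the sample entries are copied from the output above.

```javascript
// Hypothetical helper: rescales weights so that the largest becomes 1.
// Returns new objects and leaves the input array untouched.
function normalizeWeights(keywords) {
  var max = keywords.reduce(function (m, k) {
    return Math.max(m, k.weight);
  }, 0);
  return keywords.map(function (k) {
    return { word: k.word, weight: max === 0 ? 0 : k.weight / max };
  });
}

// First two entries copied from the extract() sample output above.
var sample = [
  { word: "CEO", weight: 11.739204307083542 },
  { word: "升职", weight: 10.8561552143 }
];

var normalized = normalizeWeights(sample);
console.log(normalized[0].weight);     // 1
console.log(normalized[1].weight < 1); // true
```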

Testing

The tests pass on the following Node.js versions:

  • node v10
  • node v12
  • node v14
  • node v15

Use Cases

Similar projects

Performance

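NodeJieba is intended to be the fastest of the available Node.js modules for Chinese word segmentation. A benchmark post is available in Mandarin: Jieba中文分词系列性能评测 (a performance evaluation of the Jieba Chinese word segmentation family).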

Online Demo

http://cppjieba-webdemo.herokuapp.com/ (Chrome is recommended)

Contact

Email: [email protected]

Author

Contributors

Code Contributors

This project exists thanks to all the people who contribute. [Contribute].

Financial Contributors

Become a financial contributor and help us sustain our community. [Contribute]

Individuals

Organizations

Support this project with your organization. Your logo will show up here with a link to your website. [Contribute]