Word Sense Disambiguation by Chinese Word Net

Chinese word sense disambiguation is known to be a very difficult problem because Chinese is a complicated language: a word can have dozens or even hundreds of meanings depending on the context. Manually labeling word senses is labor-intensive and inefficient.

In this project, we aim to solve this problem with a state-of-the-art BERT model. It gives large performance gains and reaches roughly 82% accuracy on the Chinese word sense disambiguation problem.

Prerequisites

Word segmentation and POS tagging are preferred but not required.

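Suppose we have m sentences and the i-th sentence has $n_i$ words. The input is then a list of m sentences, where each sentence is a list of $n_i$ [target, pos] pairs:

list_of_sentence[ list_of_word[ [target, pos] * $n_i$ ] * m ]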

The following is an example with 2 sentences. Input data should be formatted as follows:

  tagged=[[["他","Nh"],["由","P"],["昏沈","VH"],["的","DE"],["睡夢","Na"],["中","Ng"],["醒來","VH"],["，","COMMACATEGORY"]],
   [["臉","Na"],["上","Ncd"],["濕涼","VH"],["的","DE"],["騷動","Nv"],["是","SHI"],["淚","Na"],["。","PERIODCATEGORY"]]]
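Because the tagger expects this exact nesting, it can help to check the input before passing it on. The following is a minimal sketch of such a check; `validate_tagged` is a hypothetical helper written for illustration and is not part of this repository.

```python
def validate_tagged(tagged):
    """Return True if `tagged` is a list of sentences of [word, pos] pairs.

    Hypothetical helper for illustration; raises ValueError on the first
    element that does not match the expected nesting.
    """
    if not isinstance(tagged, list):
        raise ValueError("tagged must be a list of sentences")
    for i, sentence in enumerate(tagged):
        if not isinstance(sentence, list):
            raise ValueError(f"sentence {i} is not a list")
        for j, token in enumerate(sentence):
            is_pair = (
                isinstance(token, (list, tuple))
                and len(token) == 2
                and all(isinstance(part, str) for part in token)
            )
            if not is_pair:
                raise ValueError(f"sentence {i}, token {j} is not a [word, pos] pair")
    return True


# Two short sentences in the same shape as the example above.
example = [
    [["他", "Nh"], ["由", "P"]],
    [["淚", "Na"], ["。", "PERIODCATEGORY"]],
]
validate_tagged(example)
```

A token with a missing POS tag, such as `["他"]`, makes the check raise `ValueError` with the position of the offending token.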

How to get sense

In your project root directory, install the dependencies from the shell:

  pip install -U gdown
  git clone https://github.com/lopentu/CwnSenseTagger
  pip install -q CwnGraph transformers

Then, in Python, add the package directory to the import path and create the tagger:

  import sys
  if "dwsd-beta" not in sys.path:
      sys.path.append("dwsd-beta")

  from dotted_wsd import DottedWsdTagger
  tagger = DottedWsdTagger()

Basic Query
```
  tagger.wsd_tag("<打>電話")[0]
```
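The query above marks the word to disambiguate with angle brackets. Assuming that convention holds in general (the README shows only this one example), a query string could be built from a plain sentence and a character span as sketched below; `mark_target` is a hypothetical helper, not part of `dotted_wsd`.

```python
def mark_target(text, start, end):
    """Wrap text[start:end] in angle brackets to mark the target word.

    Hypothetical helper; the bracket convention is inferred from the
    single "<打>電話" example and is an assumption.
    """
    if not 0 <= start < end <= len(text):
        raise ValueError("start/end must delimit a non-empty span inside text")
    return f"{text[:start]}<{text[start:end]}>{text[end:]}"


# Mark the first character of "打電話" as the target word.
query = mark_target("打電話", 0, 1)
```

The resulting string could then be passed to `tagger.wsd_tag(query)` in the same way as the literal query above.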
Query with the output of word segmentation and POS tagging
```
  tagger.sense_tag_per_sentence(tagged[0])
```

A Pipeline of ckip-transformers and the CWN sense tagger

See dwsd.ipynb
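The notebook itself is not reproduced here. As a rough sketch of the glue step in such a pipeline, assume the segmenter and the POS tagger each return one list per sentence, with words and tags in parallel. The hypothetical helper `to_tagged` below zips them into the [word, pos] format described under the input format; the sample lists are hand-written stand-ins, not real tagger output.

```python
def to_tagged(ws, pos):
    """Combine parallel word and POS lists into [[word, pos], ...] per sentence.

    Hypothetical helper; `ws` and `pos` are assumed to be lists of
    per-sentence lists of equal length.
    """
    if len(ws) != len(pos):
        raise ValueError("ws and pos must contain the same number of sentences")
    tagged = []
    for i, (words, tags) in enumerate(zip(ws, pos)):
        if len(words) != len(tags):
            raise ValueError(f"sentence {i}: word and tag counts differ")
        tagged.append([[word, tag] for word, tag in zip(words, tags)])
    return tagged


# Stand-in output for one sentence (hand-written, for illustration only).
ws = [["他", "由", "昏沈", "的", "睡夢", "中", "醒來", "，"]]
pos = [["Nh", "P", "VH", "DE", "Na", "Ng", "VH", "COMMACATEGORY"]]
tagged = to_tagged(ws, pos)
```

Each element of `tagged` can then be passed to `tagger.sense_tag_per_sentence`, as in the query shown earlier.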

Acknowledgement

We thank Po-Wen Chen and Yu-Yu Wu for their contributions to model development.

