Merge pull request #7 from NLPIR-team/dev
Add  Modules merge to master
yangyaofei authored Dec 4, 2020
2 parents d26a5fc + 8caf6a4 commit 81e9e79
Showing 55 changed files with 1,490 additions and 106 deletions.
33 changes: 29 additions & 4 deletions README.md
@@ -71,7 +71,32 @@ easier to use.

## TODO feature

- [] Native ICTCLAS
- [×] All Native NLPIR modules
- [×] High level API for All modules
- [×] Samples and Tutorial
- [] Native ICTCLAS
- [] All Native NLPIR modules
- [] High level API for All modules
- [] Samples and Tutorial

## Supported Table

| | Native | Native Doc | Native Test | High-Level | High-Level Doc | High-Level Test | Tutorial |
| ---- | :----: | :----: | :----: | :----: | :----: | :----: | :----: |
| ICTCLAS | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| NewWordFinder | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
| KeyExtract | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
| Summary | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
| SentimentNew | ✓ | ✓ | ✓ | | | | |
| SentimentAnalysis | ✓ | ✓ | ✓ | | | | |
| Classify | ✓ | ✓ | ✓ | | | | |
| DeepClassify | ✓ | ✓ | ✓ | | | | |
| Cluster | ✓ | | | | | | |
| DocCompare | | | | | | | |
| DocExtractor | | | | | | | |
| DocParser | | | | | | | |
| iEncoder | | | | | | | |
| HTMLPaser | | | | | | | |
| KeyScanner | ✓ | | | | | | |
| RedupRemover | | | | | | | |
| SpellChecker | | | | | | | |
| SplitSentence | | | | | | | |
| TextSimilarity | ✓ | | | | | | |
| Word2vec | | | | | | | |
34 changes: 33 additions & 1 deletion docs/nlpir.native.rst
@@ -25,6 +25,14 @@ nlpir.native.new\_word\_finder module
   :undoc-members:
   :show-inheritance:

nlpir.native.summary module
-------------------------------------

.. automodule:: nlpir.native.summary
   :members:
   :undoc-members:
   :show-inheritance:

nlpir.native.key\_extract module
-------------------------------------

@@ -33,8 +41,32 @@ nlpir.native.key\_extract module
   :undoc-members:
   :show-inheritance:

nlpir.native.deep_classifier module
-------------------------------------

.. automodule:: nlpir.native.deep_classifier
   :members:
   :undoc-members:
   :show-inheritance:

nlpir.native.classifier module
-------------------------------------

.. automodule:: nlpir.native.classifier
   :members:
   :undoc-members:
   :show-inheritance:

nlpir.native.sentiment module
-------------------------------------

.. automodule:: nlpir.native.sentiment
   :members:
   :undoc-members:
   :show-inheritance:

nlpir.native.nlpir\_base module
-------------------------------
---------------------------------------

.. automodule:: nlpir.native.nlpir_base
   :members:
35 changes: 34 additions & 1 deletion docs/nlpir.rst
@@ -25,10 +25,43 @@ nlpir.ictclas module
   :undoc-members:
   :show-inheritance:

nlpir.new\_word\_finder module
-------------------------------

.. automodule:: nlpir.new_word_finder
   :members:
   :undoc-members:
   :show-inheritance:

nlpir.key\_extract module
----------------------------

.. automodule:: nlpir.key_extract
   :members:
   :undoc-members:
   :show-inheritance:

nlpir.summary module
-----------------------

.. automodule:: nlpir.summary
   :members:
   :undoc-members:
   :show-inheritance:


nlpir.tools module
------------------
--------------------

.. automodule:: nlpir.tools
   :members:
   :undoc-members:
   :show-inheritance:

nlpir.exception module
------------------------

.. automodule:: nlpir.exception
   :members:
   :undoc-members:
   :show-inheritance:
9 changes: 9 additions & 0 deletions docs/tutorial.rst
@@ -18,6 +18,8 @@ NLPIR itself is paid shareware; it is free to use for research and individual users.
    from nlpir import tools
    tools.update_license()

Alternatively, run the ``nlpir_update`` command to update the license.

=====================
Custom init config
=====================
@@ -126,3 +128,10 @@ ICTCLAS supports two ways of adding dictionary entries; one is a directly persisted addition that
However, this method can only delete everything at once; it cannot delete just some of the words. Use it with caution.


====================================
New word finder
====================================

====================================
Summary
====================================
Binary file not shown.
Binary file not shown.
Binary file not shown.
2 changes: 2 additions & 0 deletions nlpir/Data/DeepClassifier/Channel1_dc_class.pdat

Large diffs are not rendered by default.

Binary file not shown.
Binary file added nlpir/Data/DeepClassifier/Channel1_dc_model.dat
Binary file not shown.
Binary file added nlpir/Data/DeepClassifier/Channel1_dc_train.dat
Binary file not shown.
98 changes: 93 additions & 5 deletions nlpir/__init__.py
@@ -5,16 +5,13 @@
import logging
import sys
import functools
from .exception import NLPIRException

__version__ = "0.0.1"
__version__ = "0.0.3"
PACKAGE_DIR = os.path.abspath(os.path.dirname(__file__))
logger = logging.getLogger("nlpir")


class NLPIRException(Exception):
    pass


def clean_logs(data_path: typing.Optional[str] = None, include_current: bool = False):
    """
    Clean logs
@@ -102,3 +99,94 @@ def init_setting(
    init_module.__data__ = data_path if data_path is not None else init_module.__data__
    init_module.__license_code__ = license_code if license_code is not None else init_module.__license_code__
    return init_module


def import_dict(word_list: list, instance) -> list:
    """
    Temporarily add words to the dictionary; they are lost when the program restarts.
    Use :func:`save_user_dict` to persist them, :func:`clean_user_dict` to
    delete all temporary words, or :func:`delete_user_word` to delete some of them.
    A persisted dictionary cannot be cleaned with the functions above; use
    :func:`clean_saved_user_dict` for that, but note that it deletes every user
    dictionary, including those saved in the past.
    Each entry in `word_list` can be a bare word, in which case its POS is `n`. A custom POS
    can be given by writing the entry as `word pos`.
    :param instance: instance to execute the function
    :param word_list: list of words to add to NLPIR
    :return: the words that failed to be added to NLPIR
    """
    if not hasattr(instance, "add_user_word"):
        raise NLPIRException("This instance does not support this method")
    fail_list = list()
    for word in word_list:
        if 0 != instance.add_user_word(word):
            # Record the word that failed, not the whole input list.
            fail_list.append(word)
    return fail_list


def clean_user_dict(instance) -> bool:
    """
    Clean the whole temporary dictionary; see :func:`import_dict` for details.
    :param instance: instance to execute the function
    :return: success or not
    """
    if not hasattr(instance, "clean_user_word"):
        raise NLPIRException("This instance does not support this method")
    return instance.clean_user_word() == 0


def delete_user_word(word_list: list, instance):
    """
    Delete words from the temporary dictionary; see :func:`import_dict` for details.
    :param instance: instance to execute the function
    :param word_list: list of words to delete
    """
    if not hasattr(instance, "del_usr_word"):
        raise NLPIRException("This instance does not support this method")
    for word in word_list:
        instance.del_usr_word(word)


def save_user_dict(instance) -> bool:
    """
    Save the temporary dictionary to Data; see :func:`import_dict` for details.
    :param instance: instance to execute the function
    :return: success or not
    """
    if not hasattr(instance, "save_the_usr_dic"):
        raise NLPIRException("This instance does not support this method")
    return 1 == instance.save_the_usr_dic()


def clean_saved_user_dict():
    """
    Delete the user dictionaries from disk by emptying these files:
    1. ``Data/FieldDict.pdat``
    2. ``Data/FieldDict.pos``
    3. ``Data/FieldDict.wordlist``
    4. ``Data/UserDefinedDict.lst``
    5. ``Data/UserDict.pdat`` (used by key_extract)
    :return: whether the deletion succeeded
    """
    try:
        # for ictclas
        with open(os.path.join(PACKAGE_DIR, "Data/FieldDict.pdat"), 'w') as f:
            f.write("")
        with open(os.path.join(PACKAGE_DIR, "Data/FieldDict.pos"), 'w') as f:
            f.write("")
        with open(os.path.join(PACKAGE_DIR, "Data/FieldDict.wordlist"), 'w') as f:
            f.write("")
        with open(os.path.join(PACKAGE_DIR, "Data/UserDefinedDict.lst"), 'w') as f:
            f.write("")
        # for key_extract
        with open(os.path.join(PACKAGE_DIR, "Data/UserDict.pdat"), 'w') as f:
            f.write("")
        return True
    except OSError:
        return False
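
The helpers added to `nlpir/__init__.py` rely on duck typing: each one checks that the instance it receives exposes the required method before calling it. A minimal, self-contained sketch of that pattern follows. `FakeSegmenter` and its rejection rule are hypothetical stand-ins for a native NLPIR wrapper, and the helper and exception are re-declared locally so the snippet runs without the `nlpir` package or its native libraries.

```python
class NLPIRException(Exception):
    """Local stand-in for nlpir.exception.NLPIRException."""


class FakeSegmenter:
    """Hypothetical instance: accepts any non-empty entry, rejects empty ones."""

    def __init__(self):
        self.words = []

    def add_user_word(self, word: str) -> int:
        # Follow the convention visible in the diff: 0 means success.
        if not word.strip():
            return 1
        self.words.append(word)
        return 0


def import_dict(word_list: list, instance) -> list:
    """Add words through ``instance`` and return the entries that failed."""
    if not hasattr(instance, "add_user_word"):
        raise NLPIRException("This instance does not support this method")
    return [word for word in word_list if instance.add_user_word(word) != 0]


segmenter = FakeSegmenter()
failed = import_dict(["区块链 n", "", "深度学习"], segmenter)
print(failed)           # the empty string is the only rejected entry
print(segmenter.words)  # the two accepted entries

try:
    import_dict(["word"], object())  # object() has no add_user_word
except NLPIRException as exc:
    print("rejected:", exc)
```

Passing the instance explicitly is what lets one helper serve every module whose native class implements the same method names.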
4 changes: 4 additions & 0 deletions nlpir/exception.py
@@ -0,0 +1,4 @@
# coding : utf-8

class NLPIRException(Exception):
    pass
58 changes: 15 additions & 43 deletions nlpir/ictclas.py
@@ -2,11 +2,11 @@
"""
high-level toolbox for Chinese Word Segmentation
"""
from nlpir import get_instance as __get_instance__
from nlpir import native, PACKAGE_DIR
import typing
import re
import os
import typing
import nlpir
from nlpir import get_instance as __get_instance__
from nlpir import native

# class and class instance
__cls__ = native.ictclas.ICTCLAS
@@ -89,80 +89,52 @@ def process_to_generator(text: str, pos_tag: bool) -> typing.Generator:
@__get_instance__
def import_dict(word_list: list) -> list:
    """
    Temporarily add words to the dictionary; they are lost when the program restarts.
    Use :func:`save_user_dict` to persist them, :func:`clean_user_dict` to
    delete all temporary words, or :func:`delete_user_word` to delete some of them.
    A persisted dictionary cannot be cleaned with the functions above; use
    :func:`clean_saved_user_dict` for that, but note that it deletes every user
    dictionary, including those saved in the past.
    Each entry in `word_list` can be a bare word, in which case its POS is `n`. A custom POS
    can be given by writing the entry as `word pos`.
    See :func:`nlpir.import_dict`
    :param word_list: list of words to add to NLPIR
    :return: the words that failed to be added to NLPIR
    """
    fail_list = list()
    for word in word_list:
        if 0 != __instance__.add_user_word(word):
            fail_list.append(word)
    return fail_list
    return nlpir.import_dict(word_list=word_list, instance=__instance__)


@__get_instance__
def clean_user_dict() -> bool:
    """
    Clean the whole temporary dictionary; see :func:`import_dict` for details.
    See :func:`nlpir.clean_user_dict`
    :return: success or not
    """
    return __instance__.clean_user_word() == 0
    return nlpir.clean_user_dict(instance=__instance__)


@__get_instance__
def delete_user_word(word_list: list):
    """
    Delete words from the temporary dictionary; see :func:`import_dict` for details.
    See :func:`nlpir.delete_user_word`
    :param word_list: list of words to delete
    """
    for word in word_list:
        __instance__.del_usr_word(word)
    return nlpir.delete_user_word(word_list=word_list, instance=__instance__)


@__get_instance__
def save_user_dict() -> bool:
    """
    Save the temporary dictionary to Data; see :func:`import_dict` for details.
    See :func:`nlpir.save_user_dict`
    :return: success or not
    """
    return 1 == __instance__.save_the_usr_dic()
    return nlpir.save_user_dict(instance=__instance__)


@__get_instance__
def clean_saved_user_dict():
    """
    Delete the user dictionaries from disk by emptying these files:
    1. ``Data/FieldDict.pdat``
    2. ``Data/FieldDict.pos``
    3. ``Data/FieldDict.wordlist``
    4. ``Data/UserDefinedDict.lst``
    See :func:`nlpir.clean_saved_user_dict`
    :return: whether the deletion succeeded
    """
    try:
        with open(os.path.join(PACKAGE_DIR, "Data/FieldDict.pdat"), 'w') as f:
            f.write("")
        with open(os.path.join(PACKAGE_DIR, "Data/FieldDict.pos"), 'w') as f:
            f.write("")
        with open(os.path.join(PACKAGE_DIR, "Data/FieldDict.wordlist"), 'w') as f:
            f.write("")
        with open(os.path.join(PACKAGE_DIR, "Data/UserDefinedDict.lst"), 'w') as f:
            f.write("")
        return True
    except OSError:
        return False
    return nlpir.clean_saved_user_dict()
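
`clean_saved_user_dict` empties the dictionary files by opening each one in write mode instead of unlinking it. The sketch below illustrates that truncation technique in isolation, assuming nothing about NLPIR beyond what the diff shows: `truncate_files` is a hypothetical helper name, and a temporary directory stands in for the package's `Data` folder.

```python
import os
import tempfile


def truncate_files(base_dir: str, relative_paths: list) -> bool:
    """Empty each file under ``base_dir``; return False on the first OSError."""
    try:
        for rel in relative_paths:
            # Opening in "w" mode truncates an existing file (or creates it).
            with open(os.path.join(base_dir, rel), "w"):
                pass
        return True
    except OSError:
        return False


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "Data"))
    target = os.path.join(tmp, "Data", "UserDefinedDict.lst")
    with open(target, "w", encoding="utf-8") as f:
        f.write("区块链 n\n")

    ok = truncate_files(tmp, ["Data/UserDefinedDict.lst"])
    size_after = os.path.getsize(target)
    # A missing parent directory raises FileNotFoundError, a subclass of OSError.
    missing_dir = truncate_files(tmp, ["NoSuchDir/FieldDict.pdat"])

print(ok, size_after, missing_dir)
```

One consequence of this approach is that a file that did not exist before is created empty, so the files are always present afterwards as long as their directory exists.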


@__get_instance__