Loading a user dictionary has no effect, and the entity is not recognized #45

Open
ShuGao0810 opened this issue Sep 29, 2018 · 3 comments
ShuGao0810 commented Sep 29, 2018

Hi, while using foolnltk I found that loading a user dictionary has no effect, and I don't know what is causing it. Details:
Environment: Windows 10 + Python 3.6

fool.analysis('阿里收购饿了么')
Returns: ([[('阿里', 'nz'), ('收购', 'v'), ('饿', 'v'), ('了', 'y'), ('么', 'y')]], [[(0, 3, 'company', '阿里')]])

User dictionary format:
饿了么 10

fool.load_userdict(path)
fool.analysis('阿里收购饿了么')
Returns: ([[('阿里', 'nz'), ('收购', 'v'), ('饿', 'v'), ('了', 'y'), ('么', 'y')]], [[(0, 3, 'company', '阿里')]])

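Loading the user dictionary seems to have no effect: "饿了么" is still split apart during segmentation, and it is not picked up by entity recognition either.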

@rockyzhengwu
Owner

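@ShuGao0810 Thanks for the feedback. The user dictionary currently takes effect during word segmentation, but analysis does not support it yet. I will fix this later.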

@xrzlizheng

How can I load a jieba-format dictionary?
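
One possible approach, as a sketch only: jieba dictionary entries have the form `word [freq] [pos]`, with the last two fields optional, while the user dictionary shown earlier in this thread uses `word weight`. A small converter could bridge the two. The helper names `convert_jieba_line` and `convert_jieba_dict` and the default weight of 10 are assumptions made for illustration and are not part of foolnltk. Whether jieba frequencies make sensible foolnltk weights has not been tested.

```python
from pathlib import Path

DEFAULT_WEIGHT = 10  # assumption: arbitrary weight for entries that carry no frequency


def convert_jieba_line(line, default_weight=DEFAULT_WEIGHT):
    """Turn one jieba-style entry into a 'word weight' line, or None if the line is blank."""
    parts = line.split()
    if not parts:
        return None
    word = parts[0]
    weight = default_weight
    # jieba lets the frequency be omitted, so the second field may be a POS tag instead
    if len(parts) > 1 and parts[1].isdecimal():
        weight = int(parts[1])
    return f"{word} {weight}"


def convert_jieba_dict(src, dst, default_weight=DEFAULT_WEIGHT):
    """Rewrite a jieba-format dictionary file as 'word weight' lines."""
    # utf-8-sig also accepts files that start with a byte order mark
    lines = Path(src).read_text(encoding="utf-8-sig").splitlines()
    converted = (convert_jieba_line(line, default_weight) for line in lines)
    body = "\n".join(entry for entry in converted if entry)
    Path(dst).write_text(body + "\n", encoding="utf-8")
```

The converted file could then be passed to `fool.load_userdict(path)` as in the original report.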

yu45020 commented Dec 14, 2018

@ShuGao0810
A possible workaround: modify __init__.py.
The change to ner is copied from cut.

This change doesn't seem to work, though ><

def ner(text, ignore=False):
    text = _check_input(text, ignore)
    if not text:
        return [[]]
    res = LEXICAL_ANALYSER.ner(text)
-    return res
+    new_words = []
+    if _DICTIONARY.sizes != 0:
+        for sent, words in zip(text, res):
+            words = _mearge_user_words(sent, words)
+            new_words.append(words)
+    else:
+        new_words = res
+    return new_words


def analysis(text, ignore=False):
    text = _check_input(text, ignore)
    if not text:
        return [[]], [[]]
-    res = LEXICAL_ANALYSER.analysis(text)
-    return res
+    word_inf = pos_cut(text)
+    ners = ner(text)
+    return word_inf, ners
a = ['阿里收购饿了么']
fool.load_userdict('foolnltk_userdict.txt')
# fool.delete_userdict()
print(fool.cut(a))
[['阿里', '收购', '饿了么']]

print(fool.analysis(a))
([[('阿里', 'nz'), ('收购', 'v'), ('饿了么', 'nz')]], [['阿里收购', '饿了么']])

@rockyzhengwu
This looks like a typo in __init__.py:

_mearge_user_words -- change to --> _merge_user_words
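
To make the idea of merging user-dictionary words into a segmentation concrete, here is a minimal standalone sketch. It is not foolnltk's implementation: the function name `merge_user_words` is hypothetical, dictionary weights are ignored, and a plain greedy longest match is used instead. It also assumes the tokens concatenate back to the original text. Note that it works on word lists only. The ner result in the original report is a list of `(start, end, type, text)` tuples, so a word-level merge step does not fit it directly, which may be why the ner output above looks wrong.

```python
def merge_user_words(text, tokens, user_words):
    """Re-cut `tokens` so that every user-dictionary match in `text` is a single token."""
    # Collect the boundaries produced by the original segmentation.
    cuts = {0}
    pos = 0
    for tok in tokens:
        pos += len(tok)
        cuts.add(pos)

    # Scan left to right, preferring the longest user word at each position.
    by_length = sorted(user_words, key=len, reverse=True)
    i = 0
    while i < len(text):
        match = next((w for w in by_length if w and text.startswith(w, i)), None)
        if match is None:
            i += 1
            continue
        end = i + len(match)
        # Drop boundaries inside the match and force boundaries at both ends.
        cuts = {c for c in cuts if not i < c < end} | {i, end}
        i = end

    ordered = sorted(cuts)
    return [text[a:b] for a, b in zip(ordered, ordered[1:])]


words = merge_user_words("阿里收购饿了么", ["阿里", "收购", "饿", "了", "么"], {"饿了么"})
print(words)  # ['阿里', '收购', '饿了么']
```

Working on boundary positions rather than on the token list keeps the sketch short, and it still behaves sensibly when a user word straddles an existing token boundary.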
