forked from candlewill/Bots
chinese segmentation and gdb debugger
1 parent
b9fe83a
commit ad195df
Showing 22 changed files with 420 additions and 128 deletions.
5 changes: 3 additions & 2 deletions
BOTDATA/TEST/error_correction.top → BOTDATA/TEST/error_correct.top
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
@@ -3,7 +3,7 @@ topic: ~segment keep repeat ( 分词 )

t: 请输入测试语句:

-u: (看 _*)
+u: ( 分词 _* )
使用撇号: \n
'_0 \n
没有撇号: \n
Binary file not shown.
@@ -0,0 +1,17 @@
break MainLoop()
break PerformChat
break PerformChatGivenTopic
break ProcessInputFile
break /letv/workspace/Bots/ChatScript-7.3/SRC/mainSystem.cpp:1441
break /letv/workspace/Bots/ChatScript-7.3/SRC/mainSystem.cpp:1471
break ProcessInputFile
break FinishVolley
break MainLoop()
break PerformChat
break PerformChatGivenTopic
break ProcessInputFile
break /letv/workspace/Bots/ChatScript-7.3/SRC/mainSystem.cpp:1441
break /letv/workspace/Bots/ChatScript-7.3/SRC/mainSystem.cpp:1471
break ProcessInputFile
break FinishVolley
break CNPreprocess(char*)
@@ -0,0 +1,38 @@
#include <cstring>
#include <string>
#include <vector>

#include "cppjieba/Jieba.hpp"

const char* const DICT_PATH = "./privatecode/Jieba/DICT/jieba.dict.utf8";
const char* const HMM_PATH = "./privatecode/Jieba/DICT/hmm_model.utf8";
const char* const USER_DICT_PATH = "./privatecode/Jieba/DICT/user.dict.utf8";
const char* const IDF_PATH = "./privatecode/Jieba/DICT/idf.utf8";
const char* const STOP_WORD_PATH = "./privatecode/Jieba/DICT/stop_words.utf8";

cppjieba::Jieba cpp_jieba(DICT_PATH,
                          HMM_PATH,
                          USER_DICT_PATH,
                          IDF_PATH,
                          STOP_WORD_PATH);

// Segments Chinese input into space-separated words with cppjieba.
// Empty input and lines starting with ":" or " :" (such as ChatScript
// ":" commands) are returned unchanged.
char* CNPreprocess(char* incoming)
{
    if (incoming == NULL || incoming[0] == '\0' ||
        !strncmp(incoming, " :", 2) || !strncmp(incoming, ":", 1))
        return incoming;

    std::vector<std::string> words;
    cpp_jieba.Cut(std::string(incoming), words, true);  // true enables the HMM model

    // The result must outlive this call, so it is kept in a static buffer
    // rather than in alloca() memory, which is released when the function
    // returns. The buffer is overwritten by the next call and is not
    // thread-safe.
    static std::string segmented;
    segmented = limonp::Join(words.begin(), words.end(), " ");
    return &segmented[0];
}
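A `char*` returned from a function like `CNPreprocess` has to point at storage that outlives the call, which stack memory from `alloca()` does not. The following is a minimal sketch of one way to do that, assuming a single-threaded caller and a C++11 compiler. `SplitChars`, `JoinWithSpaces`, and `SegmentForChat` are hypothetical names introduced for illustration; `SplitChars` is a toy stand-in for `cppjieba::Jieba::Cut` so the sketch compiles without the library.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for cppjieba::Jieba::Cut: emits one "word" per byte
// so the sketch builds without cppjieba. Real segmentation would go here.
static std::vector<std::string> SplitChars(const std::string& text) {
    std::vector<std::string> pieces;
    for (char c : text) pieces.push_back(std::string(1, c));
    return pieces;
}

// Joins words with single spaces, the same shape of output the diff gets
// from limonp::Join(words.begin(), words.end(), " ").
static std::string JoinWithSpaces(const std::vector<std::string>& words) {
    std::string joined;
    for (std::size_t i = 0; i < words.size(); ++i) {
        if (i > 0) joined += ' ';
        joined += words[i];
    }
    return joined;
}

// Hypothetical analogue of CNPreprocess. Empty input and lines starting with
// ":" or " :" pass through untouched; everything else is segmented.
char* SegmentForChat(char* incoming) {
    if (incoming == nullptr || incoming[0] == '\0' ||
        std::strncmp(incoming, " :", 2) == 0 || incoming[0] == ':') {
        return incoming;
    }
    // Function-local static: the storage survives the return, unlike
    // alloca() memory. It is overwritten on the next call and is not
    // thread-safe, so a caller that keeps the text must copy it.
    static std::string buffer;
    buffer = JoinWithSpaces(SplitChars(incoming));
    return &buffer[0];
}
```

Calling `SegmentForChat` on a mutable buffer holding `abc` yields `a b c`, while a buffer holding `:build 0` comes back as the same pointer. An alternative design is to return `std::string` by value and let the caller own the result, at the cost of changing the `char*` signature declared in the header.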
@@ -0,0 +1,6 @@
#ifndef PREPROCESSH
#define PREPROCESSH

char* CNPreprocess(char * incoming);

#endif