Merge pull request #1 from yahoojapan/mnagaya/refactor
Trivial changes
y-yuyano authored Mar 14, 2018
2 parents d7f9372 + 3551cf1 commit 61d3640
Showing 3 changed files with 18 additions and 17 deletions.
10 changes: 5 additions & 5 deletions README.md
@@ -54,11 +54,11 @@ You can configure package by <config name="language.lib.kuromoji.kuromoji">
 |parameter|type|default|description|
 |:--------|:---|:------|:----------|
 |mode|string|search|mode of Kuromoji (normal OR search OR extended)|
-|kanji.length_threshold|int|2|TODO|
-|kanji.penalty|int|3000|TODO|
-|other.length_threshold|int|7|TODO|
-|other.penalty|int|1700|TODO|
-|nakaguro_split|bool|false|TODO|
+|kanji.length_threshold|int|2|length threshold above which kanji tokens are penalized during the Viterbi search (expert feature).|
+|kanji.penalty|int|3000|additional cost for kanji tokens that are longer than the length threshold (expert feature).|
+|other.length_threshold|int|7|length threshold above which non-kanji tokens are penalized during the Viterbi search (expert feature).|
+|other.penalty|int|1700|additional cost for non-kanji tokens that are longer than the length threshold (expert feature).|
+|nakaguro_split|bool|false|whether to split unknown words on the middle dot character (U+30FB KATAKANA MIDDLE DOT)|
 |user_dict|string|-|path of user dictionary|
 |tokenlist_name|string|default|target specialtokens name|
 |all_language|bool|false|apply kuromoji tokenizer to all languages or only Japanese|
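
As a rough illustration of how the length thresholds and penalties in the table above could interact, the following Java sketch computes an extra cost for a single token. It is not the plugin's or Kuromoji's actual code: the class `LengthPenaltySketch`, its method names, and the assumption that the penalty grows with each character beyond the threshold are all hypothetical. Only the four default values come from the table.

```java
// Hypothetical sketch: none of these names come from the plugin or from Kuromoji.
final class LengthPenaltySketch {
    // Default values copied from the parameter table.
    static final int KANJI_LENGTH_THRESHOLD = 2;
    static final int KANJI_PENALTY = 3000;
    static final int OTHER_LENGTH_THRESHOLD = 7;
    static final int OTHER_PENALTY = 1700;

    // True when every code point of the surface form is a Han ideograph.
    static boolean isKanjiOnly(String surface) {
        return !surface.isEmpty()
                && surface.codePoints()
                          .allMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
    }

    // Extra cost added to a token during the lattice search.
    // Assumption: the cost scales with the number of characters over the threshold.
    static int penalty(String surface) {
        int length = surface.codePointCount(0, surface.length());
        if (isKanjiOnly(surface)) {
            return Math.max(0, length - KANJI_LENGTH_THRESHOLD) * KANJI_PENALTY;
        }
        return Math.max(0, length - OTHER_LENGTH_THRESHOLD) * OTHER_PENALTY;
    }

    public static void main(String[] args) {
        System.out.println(penalty("東京"));               // 2 kanji, at the threshold
        System.out.println(penalty("関西国際空港"));         // 6 kanji, 4 over the threshold
        System.out.println(penalty("コミュニケーション"));    // 9 katakana, 2 over the threshold
    }
}
```

Under these assumptions a long kanji compound becomes more expensive than its shorter parts, which nudges the search toward splitting it. Raising a threshold or lowering a penalty weakens that effect.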
15 changes: 8 additions & 7 deletions pom.xml
@@ -14,7 +14,14 @@
   <packaging>container-plugin</packaging>
 
   <name>kuromoji-linguistics</name>
-  <url>http://maven.apache.org</url>
+  <url>https://github.com/yahoojapan/vespa-kuromoji-linguistics</url>
+
+  <licenses>
+    <license>
+      <name>The Apache License, Version 2.0</name>
+      <url>http://www.apache.org/licenses/LICENSE-2.0.txt</url>
+    </license>
+  </licenses>
 
   <properties>
     <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
@@ -55,12 +62,6 @@
       <version>${vespa.version}</version>
       <scope>provided</scope>
     </dependency>
-    <dependency>
-      <groupId>com.yahoo.vespa</groupId>
-      <artifactId>container-di</artifactId>
-      <version>${vespa.version}</version>
-      <scope>provided</scope>
-    </dependency>
     <dependency>
       <groupId>com.yahoo.vespa</groupId>
       <artifactId>linguistics</artifactId>
@@ -37,11 +37,11 @@
  * <table>
  * <tr><th>parameter</th><th>default</th><th>description</th></tr>
  * <tr><td>mode</td><td>search</td><td>mode of Kuromoji (normal|search|extended)</td></tr>
- * <tr><td>kanji.length_threshold</td><td>2</td><td>TODO</td></tr>
- * <tr><td>kanji.penalty</td><td>3000</td><td>TODO</td></tr>
- * <tr><td>other.length_threshold</td><td>7</td><td>TODO</td></tr>
- * <tr><td>other.penalty</td><td>1700</td><td>TODO</td></tr>
- * <tr><td>nakaguro_split</td><td>false</td><td>TODO</td></tr>
+ * <tr><td>kanji.length_threshold</td><td>2</td><td>length threshold above which kanji tokens are penalized during the Viterbi search (expert feature).</td></tr>
+ * <tr><td>kanji.penalty</td><td>3000</td><td>additional cost for kanji tokens that are longer than the length threshold (expert feature).</td></tr>
+ * <tr><td>other.length_threshold</td><td>7</td><td>length threshold above which non-kanji tokens are penalized during the Viterbi search (expert feature).</td></tr>
+ * <tr><td>other.penalty</td><td>1700</td><td>additional cost for non-kanji tokens that are longer than the length threshold (expert feature).</td></tr>
+ * <tr><td>nakaguro_split</td><td>false</td><td>whether to split unknown words on the middle dot character (U+30FB KATAKANA MIDDLE DOT)</td></tr>
  * <tr><td>user_dict</td><td>-</td><td>path of user dictionary</td></tr>
  * <tr><td>tokenlist_name</td><td>default</td><td>target specialtokens name</td></tr>
  * <tr><td>all_language</td><td>false</td><td>apply kuromoji tokenizer to all languages</td></tr>
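
To make the `nakaguro_split` description more concrete, here is a small Java sketch of what splitting a word on U+30FB might look like. The class `NakaguroSplitSketch` and its `split` method are hypothetical names for illustration. The sketch treats every input as an unknown word and drops the dot itself, both of which are assumptions; the real tokenizer may behave differently on either point.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not the plugin's or Kuromoji's implementation.
final class NakaguroSplitSketch {
    // U+30FB KATAKANA MIDDLE DOT, as named in the parameter description.
    static final char MIDDLE_DOT = '\u30FB';

    // Returns the word unchanged when the flag is off; otherwise splits it on
    // the middle dot and discards empty pieces (assumption: the dot is dropped).
    static List<String> split(String word, boolean nakaguroSplit) {
        List<String> parts = new ArrayList<>();
        if (!nakaguroSplit) {
            parts.add(word);
            return parts;
        }
        StringBuilder current = new StringBuilder();
        for (char c : word.toCharArray()) {
            if (c == MIDDLE_DOT) {
                if (current.length() > 0) {
                    parts.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("ジョン・スミス", false));
        System.out.println(split("ジョン・スミス", true));
    }
}
```

With the flag off (the default in the table) the name stays a single token candidate; with it on, each half is handed on separately.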