[BUG] Fatal error when using analyzer-nori and opensearch 2.15 #15895

Closed
davidshimjs opened this issue Sep 11, 2024 · 4 comments
Labels
bug, Plugins

Comments

@davidshimjs

davidshimjs commented Sep 11, 2024

Describe the bug

OpenSearch 2.15 throws a fatal error when the nori analyzer tokenizes Hangul text (analyzer-nori v1).
The error message and stack trace are below.

[2024-09-11T06:23:28,756][ERROR][o.o.b.OpenSearchUncaughtExceptionHandler] [23b074df6fbb0c1e295e0f6e5e71f4c2] fatal error in thread [opensearch[23b074df6fbb0c1e295e0f6e5e71f4c2][write][T#2]], exiting
java.lang.NoSuchMethodError: 'void org.apache.lucene.util.fst.FST.<init>(org.apache.lucene.store.DataInput, org.apache.lucene.store.DataInput, org.apache.lucene.util.fst.Outputs)'
	at org.apache.lucene.analysis.ko.dict.TokenInfoDictionary.<init>(TokenInfoDictionary.java:115)
	at org.apache.lucene.analysis.ko.dict.TokenInfoDictionary.<init>(TokenInfoDictionary.java:43)
	at org.apache.lucene.analysis.ko.dict.TokenInfoDictionary$SingletonHolder.<clinit>(TokenInfoDictionary.java:139)
	at org.apache.lucene.analysis.ko.dict.TokenInfoDictionary.getInstance(TokenInfoDictionary.java:131)
	at org.apache.lucene.analysis.ko.KoreanTokenizer.<init>(KoreanTokenizer.java:186)
	at org.opensearch.index.analysis.NoriTokenizerFactory.create(NoriTokenizerFactory.java:110)
	at org.opensearch.index.analysis.CustomAnalyzer.createComponents(CustomAnalyzer.java:112)
	at org.apache.lucene.analysis.AnalyzerWrapper.createComponents(AnalyzerWrapper.java:120)
	at org.apache.lucene.analysis.AnalyzerWrapper.createComponents(AnalyzerWrapper.java:120)
	at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:193)
	at org.apache.lucene.document.Field.tokenStream(Field.java:491)
	at org.apache.lucene.index.IndexingChain$PerField.invertTokenStream(IndexingChain.java:1199)
	at org.apache.lucene.index.IndexingChain$PerField.invert(IndexingChain.java:1183)
	at org.apache.lucene.index.IndexingChain.processField(IndexingChain.java:731)
	at org.apache.lucene.index.IndexingChain.processDocument(IndexingChain.java:609)
	at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:263)
	at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:425)
	at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1558)
	at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1843)
	at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1483)
	at org.opensearch.index.engine.InternalEngine.addDocs(InternalEngine.java:1281)
	at org.opensearch.index.engine.InternalEngine.indexIntoLucene(InternalEngine.java:1217)
	at org.opensearch.index.engine.InternalEngine.index(InternalEngine.java:1011)
	at org.opensearch.index.shard.IndexShard.index(IndexShard.java:1277)
	at org.opensearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:1222)
	at org.opensearch.index.shard.IndexShard.applyIndexOperationOnPrimary(IndexShard.java:1113)
	at org.opensearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:625)
	at org.opensearch.action.bulk.TransportShardBulkAction$2.doRun(TransportShardBulkAction.java:471)
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
	at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59)
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:950)
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
	at __PATH__(ThreadPoolExecutor.java:1144)
	at __PATH__(ThreadPoolExecutor.java:642)
	at __PATH__(Thread.java:1583)
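
The first two lines of the trace carry most of the information. In general, a `NoSuchMethodError` means the calling class was compiled against a method signature that is not present in the class loaded at runtime. Here that points to the Lucene Korean analysis classes expecting an `FST` constructor that the running Lucene does not provide. The following is a minimal Python sketch for pulling the missing signature and the calling frame out of a log like the one above. The function name `parse_no_such_method` and the regular expressions are illustrative assumptions, not part of OpenSearch or Lucene.

```python
import re

# Matches the JVM message format: NoSuchMethodError: '<return type> <owner>.<method>(<params>)'
_ERROR_RE = re.compile(
    r"NoSuchMethodError: '(?P<ret>\S+) (?P<owner>[\w.$]+)\.(?P<method>[\w<>$]+)"
    r"\((?P<params>[^)]*)\)'"
)
# Matches a stack frame line such as "\tat pkg.Class.method(File.java:12)"
_FRAME_RE = re.compile(r"^\s*at (?P<frame>[^\s(]+)\(", re.MULTILINE)


def parse_no_such_method(log_text):
    """Return the missing method and its first caller, or None if no error is found.

    Hypothetical helper for log triage; it only inspects text.
    """
    error = _ERROR_RE.search(log_text)
    if error is None:
        return None
    # The first frame after the message is the code that expected the method.
    frame = _FRAME_RE.search(log_text, error.end())
    params = [p.strip() for p in error.group("params").split(",") if p.strip()]
    return {
        "owner": error.group("owner"),
        "method": error.group("method"),
        "params": params,
        "caller": frame.group("frame") if frame else None,
    }
```

Applied to the log above, the `owner` field would name the class that lacks the method and `caller` the class that expected it, which is usually enough to tell which two artifacts disagree on a library version.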

Related component

Plugins

To Reproduce

GET /_analyze
{
"analyzer": "nori",
"text": "테스트"
}
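
For anyone who wants to script the reproduction, here is a minimal Python sketch of the same request using only the standard library. The base URL, the helper names `build_analyze_request` and `analyze`, and the use of `POST` (the Analyze API accepts both `GET` and `POST`) are assumptions for illustration. Authentication, which a managed AWS domain would require, is left out.

```python
import json
import urllib.request


def build_analyze_request(base_url, text, analyzer="nori"):
    """Build an _analyze request equivalent to the one in this report.

    Hypothetical helper; base_url is whatever endpoint the cluster exposes.
    """
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(base_url, text, analyzer="nori", timeout=10):
    """Send the request and return the list of token strings from the response."""
    req = build_analyze_request(base_url, text, analyzer)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return [t["token"] for t in json.load(resp)["tokens"]]
```

Calling `analyze("http://localhost:9200", "테스트")` against a healthy node should return a list of tokens. On an affected node the call may instead fail with a connection error, since the log shows the process exiting.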

Expected behavior

It should be tokenized without errors.

Additional Details

Plugins
analyzer-nori 1.0.0

Host/Environment (please complete the following information):
AWS OpenSearch 2.15

davidshimjs added the bug and untriaged labels Sep 11, 2024
@dblock
Member

dblock commented Sep 11, 2024

This is not good. @davidshimjs care to turn this into a failing test (maybe YAML REST test)?

@kj-sas

kj-sas commented Sep 30, 2024

FYI: using OpenSearch 2.16 with the nori analyzer, I was able to analyze and search the string mentioned above, "테스트".

@dblock
Member

dblock commented Sep 30, 2024

[Catch All Triage - 1, 2, 3, 4]

dblock removed the untriaged label Sep 30, 2024
@davidshimjs
Author

It works on AWS OpenSearch 2.17 with analyzer-nori 2.0.0.

3 participants