Greek stemmer crashes the server #2888

Closed
3 of 5 tasks
subnix opened this issue Dec 29, 2024 · 6 comments · Fixed by #2892
@subnix
Contributor

subnix commented Dec 29, 2024

Bug Description:

A specific sequence of Greek characters causes the server to crash when the Greek stemmer (libstemmer_el) is enabled.

MRE:

CREATE TABLE test (data text indexed) morphology='libstemmer_el' charset_table='non_cjk';
INSERT INTO test (data) VALUES ('ισαισα'); -- crash
select * from test where match('ισαισα'); -- crash

Crash dump:

------- FATAL: CRASH DUMP -------
[Sun Dec 29 14:48:49.377 2024] [    1]

--- crashed SphinxQL request dump ---
select * from test where match('ισαισα')
--- request dump end ---
--- local index:test
Manticore 6.3.8 d17bd2b6b@24112202
Handling signal 11
-------------- backtrace begins here ---------------
Program compiled with Clang 16.0.6
Configured with flags: Configured with these definitions: -DDISTR_BUILD=jammy -DUSE_SYSLOG=1 -DWITH_GALERA=1 -DWITH_RE2=1 -DWITH_RE2_FORCE_STATIC=1 -DWITH_STEMMER=1 -DWITH_STEMMER_FORCE_STATIC=1 -DWITH_NLJSON=1 -DWITH_UNIALGO=1 -DWITH_ICU=1 -DWITH_ICU_FORCE_STATIC=1 -DWITH_SSL=1 -DWITH_ZLIB=1 -DWITH_ZSTD=1 -DDL_ZSTD=1 -DZSTD_LIB=libzstd.so.1 -DWITH_CURL=1 -DDL_CURL=1 -DCURL_LIB=libcurl.so.4 -DWITH_ODBC=1 -DDL_ODBC=1 -DODBC_LIB=libodbc.so.2 -DWITH_EXPAT=1 -DDL_EXPAT=1 -DEXPAT_LIB=libexpat.so.1 -DWITH_ICONV=1 -DWITH_MYSQL=1 -DDL_MYSQL=1 -DMYSQL_LIB=libmysqlclient.so.21 -DWITH_POSTGRESQL=1 -DDL_POSTGRESQL=1 -DPOSTGRESQL_LIB=libpq.so.5 -DLOCALDATADIR=/var/lib/manticore -DFULL_SHARE_DIR=/usr/share/manticore
Built on Linux x86_64 for Linux aarch64 (jammy)
Stack bottom = 0xffff74023f80, thread stack size = 0x20000
Trying manual backtrace:
Frame pointer is null, manual backtrace failed (did you build with -fomit-frame-pointer?)
Trying system backtrace:
begin of system symbols:
searchd(_Z12sphBacktraceib+0x220)[0xaaaae8038a30]
searchd(_ZN11CrashLogger11HandleCrashEi+0x338)[0xaaaae7ebe668]
linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0xffff8d3247a0]
/lib/aarch64-linux-gnu/libc.so.6(+0x97b94)[0xffff8c8e7b94]
searchd(_ZNK20TemplateDictTraits_c8StemByIdEPhi+0x10c)[0xaaaae8e6f158]
searchd(_ZNK20TemplateDictTraits_c13ApplyStemmersEPh+0x244)[0xaaaae8e6efec]
searchd(_ZN11CSphDictCRCIL7CRCALGO1EE9GetWordIDEPh+0x5c)[0xaaaae8e6d260]
searchd(_ZN10XQParser_t8GetTokenEP7YYSTYPE+0x7d0)[0xaaaae8025f00]
searchd(_Z7yyparseP10XQParser_t+0x154)[0xaaaae8022938]
searchd(_ZN10XQParser_t5ParseER9XQQuery_tPKcPK9CSphQueryRK17CSphRefcountedPtrI13ISphTokenizerEPK10CSphSchemaRKS7_I8CSphDictERK17CSphIndexSettingsPK8BitVec_TIjLi128EE+0x340)[0xaaaae8026c38]
searchd(_ZNK18QueryParserPlain_c10ParseQueryER9XQQuery_tPKcPK9CSphQuery17CSphRefcountedPtrI13ISphTokenizerES9_PK10CSphSchemaRKS7_I8CSphDictERK17CSphIndexSettingsPK8BitVec_TIjLi128EE+0x6c)[0xaaaae802d714]
searchd(_ZNK9RtIndex_c10MultiQueryER15CSphQueryResultRK9CSphQueryRK11VecTraits_TIP15ISphMatchSorterERK18CSphMultiQueryArgs+0x1170)[0xaaaae8129fec]
searchd(_ZNK13CSphIndexStub12MultiQueryExEiPK9CSphQueryP15CSphQueryResultPP15ISphMatchSorterRK18CSphMultiQueryArgs+0x8c)[0xaaaae800af20]
searchd(+0xecea74)[0xaaaae7f1ea74]
searchd(+0x11f7158)[0xaaaae8247158]
searchd(_ZN7Threads4Coro8ExecuteNEiOSt8functionIFvvEE+0x74)[0xaaaae8f552b4]
searchd(_ZN15SearchHandler_c16RunLocalSearchesEv+0x5dc)[0xaaaae7ed154c]
searchd(_ZN15SearchHandler_c9RunSubsetEii+0x498)[0xaaaae7ed2d5c]
searchd(_ZN15SearchHandler_c10RunQueriesEv+0xc0)[0xaaaae7ecfb44]
searchd(_Z17HandleMysqlSelectR11RowBuffer_iR15SearchHandler_c+0x140)[0xaaaae7ef4160]
searchd(_ZN15ClientSession_c7ExecuteESt4pairIPKciER11RowBuffer_i+0x12b4)[0xaaaae7f04df4]
searchd(_Z8SqlServeSt10unique_ptrI16AsyncNetBuffer_cSt14default_deleteIS0_EE+0x1718)[0xaaaae7e2e19c]
searchd(_Z10MultiServeSt10unique_ptrI16AsyncNetBuffer_cSt14default_deleteIS0_EESt4pairIitE7Proto_e+0x50)[0xaaaae7e299d0]
searchd(+0xdda560)[0xaaaae7e2a560]
searchd(_ZZN7Threads11CoRoutine_c13CreateContextESt8functionIFvvEESt4pairIN5boost7context13stack_contextENS_14StackFlavour_EEEENUlNS6_6detail10transfer_tEE_8__invokeESB_+0x30)[0xaaaae8f5838c]
searchd(make_fcontext+0x18)[0xaaaae8f9731c]
Trying boost backtrace:
 0# sphBacktrace(int, bool) in searchd
 1# CrashLogger::HandleCrash(int) in searchd
 2# __kernel_rt_sigreturn in linux-vdso.so.1
 3# 0x0000FFFF8C8E7B94 in /lib/aarch64-linux-gnu/libc.so.6
 4# TemplateDictTraits_c::StemById(unsigned char*, int) const in searchd
 5# TemplateDictTraits_c::ApplyStemmers(unsigned char*) const in searchd
 6# CSphDictCRC<(CRCALGO)1>::GetWordID(unsigned char*) in searchd
 7# XQParser_t::GetToken(YYSTYPE*) in searchd
 8# yyparse(XQParser_t*) in searchd
 9# XQParser_t::Parse(XQQuery_t&, char const*, CSphQuery const*, CSphRefcountedPtr<ISphTokenizer> const&, CSphSchema const*, CSphRefcountedPtr<CSphDict> const&, CSphIndexSettings const&, BitVec_T<unsigned int, 128> const*) in searchd
10# QueryParserPlain_c::ParseQuery(XQQuery_t&, char const*, CSphQuery const*, CSphRefcountedPtr<ISphTokenizer>, CSphRefcountedPtr<ISphTokenizer>, CSphSchema const*, CSphRefcountedPtr<CSphDict> const&, CSphIndexSettings const&, BitVec_T<unsigned int, 128> const*) const in searchd
11# RtIndex_c::MultiQuery(CSphQueryResult&, CSphQuery const&, VecTraits_T<ISphMatchSorter*> const&, CSphMultiQueryArgs const&) const in searchd
12# CSphIndexStub::MultiQueryEx(int, CSphQuery const*, CSphQueryResult*, ISphMatchSorter**, CSphMultiQueryArgs const&) const in searchd
13# 0x0000AAAAE7F1EA74 in searchd
14# 0x0000AAAAE8247158 in searchd
15# Threads::Coro::ExecuteN(int, std::function<void ()>&&) in searchd
16# SearchHandler_c::RunLocalSearches() in searchd
17# SearchHandler_c::RunSubset(int, int) in searchd
18# SearchHandler_c::RunQueries() in searchd
19# HandleMysqlSelect(RowBuffer_i&, SearchHandler_c&) in searchd
20# ClientSession_c::Execute(std::pair<char const*, int>, RowBuffer_i&) in searchd
21# SqlServe(std::unique_ptr<AsyncNetBuffer_c, std::default_delete<AsyncNetBuffer_c> >) in searchd
22# MultiServe(std::unique_ptr<AsyncNetBuffer_c, std::default_delete<AsyncNetBuffer_c> >, std::pair<int, unsigned short>, Proto_e) in searchd
23# 0x0000AAAAE7E2A560 in searchd
24# Threads::CoRoutine_c::CreateContext(std::function<void ()>, std::pair<boost::context::stack_context, Threads::StackFlavour_E>)::{lambda(boost::context::detail::transfer_t)#1}::__invoke(boost::context::detail::transfer_t) in searchd
25# make_fcontext in searchd

-------------- backtrace ends here ---------------

The problem is in this code snippet:

const sb_symbol* sStemmed = sb_stemmer_stem ( pStemmer, (sb_symbol*)pWord, (int)strlen ( (const char*)pWord ) );
int iStemmedLen = sb_stemmer_length ( pStemmer );
memcpy ( pWord, sStemmed, iStemmedLen );

sb_stemmer_stem returns NULL, causing a subsequent call to memcpy to crash.
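
A minimal sketch of the kind of guard that would avoid the crash, assuming the caller can simply keep the original token when stemming fails. `ApplyStemResult` and its `uCapacity` parameter are hypothetical names for illustration and are not taken from the Manticore sources; in real code `sStemmed` would be the return value of `sb_stemmer_stem` and `iStemmedLen` the value of `sb_stemmer_length`. The null terminator written at the end is also an assumption about what the caller expects.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not Manticore's actual code): overwrite pWord with the
// stem only if the stemmer produced one. Returns true when pWord was replaced,
// false when it was left untouched.
inline bool ApplyStemResult ( unsigned char * pWord, std::size_t uCapacity,
	const unsigned char * sStemmed, int iStemmedLen )
{
	assert ( pWord );

	// A NULL result means the stemmer failed; keep the original word.
	if ( !sStemmed || iStemmedLen<0 )
		return false;

	// Refuse to write past the end of the word buffer (stem plus terminator).
	if ( (std::size_t)iStemmedLen + 1 > uCapacity )
		return false;

	// memmove rather than memcpy, in case the two buffers ever overlap.
	std::memmove ( pWord, sStemmed, (std::size_t)iStemmedLen );
	pWord[iStemmedLen] = '\0';
	return true;
}
```

With a guard like this, a word the stemmer cannot process is indexed and searched unstemmed instead of bringing the server down.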

Manticore Search Version:

6.3.8 d17bd2b6b@24112202

Operating System Version:

docker (Linux bf00ec55bdd0 6.10.14-linuxkit #1 SMP Thu Oct 24 19:28:55 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux)

Have you tried the latest development version?

No

Internal Checklist:

To be completed by the assignee. Check off tasks that have been completed or are not applicable.

  • Implementation completed
  • Tests developed
  • Documentation updated
  • Documentation reviewed
  • Changelog updated
@subnix subnix added the bug label Dec 29, 2024
@subnix
Contributor Author

subnix commented Dec 29, 2024

libstemmer’s changelog for version 2.2.0 includes some notes about the Greek stemmer. I built manticore using the new version of libstemmer and reproduced the issue. Thus, updating the library doesn't solve the issue.

It appears that libstemmer returns NULL in case of any error. Therefore, we should handle NULL values returned by sb_stemmer_stem and log them as errors. If this is acceptable, I can submit a pull request.
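
A rough sketch of what "handle NULL and log it as an error" could look like, assuming the check produces a message that the caller then writes to the log. `StemOutcome_t` and `CheckStemResult` are hypothetical names, not existing Manticore or libstemmer APIs; `sStemmed` stands in for the value returned by `sb_stemmer_stem`.

```cpp
#include <string>

// Hypothetical result type for illustration only.
struct StemOutcome_t
{
	bool m_bOk = true;
	std::string m_sError; // empty when stemming succeeded
};

// sStemmed stands in for the return value of sb_stemmer_stem().
// On NULL, builds an error message that the caller can pass to its logger.
inline StemOutcome_t CheckStemResult ( const char * sWord, const unsigned char * sStemmed )
{
	StemOutcome_t tRes;
	if ( !sStemmed )
	{
		tRes.m_bOk = false;
		tRes.m_sError = std::string ( "stemmer returned NULL for word '" )
			+ ( sWord ? sWord : "" ) + "'";
	}
	return tRes;
}
```

Returning the message rather than logging inside the helper leaves the choice of log level and destination to the caller.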

@tomatolog
Contributor

Yes, please submit a PR.

I'm just worried that the TemplateDictTraits_c::StemById function does not set an error message on any error. It might be better to investigate why TemplateDictTraits_c::InitMorph creates the stemmer and to pass the possible error further.

@subnix
Contributor Author

subnix commented Dec 30, 2024

I'm just worried that the TemplateDictTraits_c::StemById function does not set an error message on any error. It might be better to investigate why TemplateDictTraits_c::InitMorph creates the stemmer and to pass the possible error further.

I've looked at the call stack... This requires changing the signatures of many methods and sounds like a major refactoring. Does it make sense? From my perspective, it's a really rare case that indicates a potential bug in libstemmer. Therefore, I'd consider handling this just in case, reporting the bug to libstemmer, and waiting for a fix.

@tomatolog
Contributor

Do you have a test that reproduces this issue, or does it happen only with the new Greek stemmer?

@tomatolog
Contributor

I've just added a test case to test 271 at c82ade4 to make sure this and similar crashes stay covered when that code changes in the future.

@subnix
Contributor Author

subnix commented Jan 6, 2025

Do you have a test that reproduces this issue, or does it happen only with the new Greek stemmer?

No, I don't. Also, I haven't found any other similar cases in the libstemmer issues.

I've just added a test case to test 271 at c82ade4 to make sure this and similar crashes stay covered when that code changes in the future.

I wasn't certain about adding tests, since this is a bug in a third-party library that should be fixed someday, which would make the test redundant.

Perhaps we should keep the issue open until the upstream fix and subsequent library update? They've already confirmed the bug (snowballstem/snowball#204).
