Do not search for exhaustive SNIPPET. #697
`MSet::snippet` is more complex than it seems. Xapian runs a fairly complex algorithm to find the best subset of the text to select. It does this by calculating the score/ranking of each term in the text. To do so, it has to evaluate the terms in the context of the whole mset, and so it loads a lot of data from the database.
The perfect is the enemy of the good. By removing the `SNIPPET_EXHAUSTIVE` flag, Xapian evaluates less and returns far more quickly (https://xapian.org/docs/apidoc/html/classXapian_1_1MSet.html#a4797ae2295f88e49a9f76e3b89c21d88aea6a34a9c66720a44d5969ed47ca8edb).
The generated snippet is different, but still valid.
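To make the trade-off above concrete, here is a minimal toy sketch of "exhaustive" versus "good enough" window selection. It is not Xapian's algorithm: the scoring (counting query-term hits), the `pickWindow` helper, the `WindowChoice` struct and the `goodEnough` threshold are all hypothetical names and assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Result of a window search: where the chosen window starts, its score,
// and how many candidate windows were scored before returning.
struct WindowChoice {
    std::size_t start = 0;
    std::size_t score = 0;
    std::size_t evaluated = 0;
};

// Toy scoring (an assumption, not Xapian's model): the number of tokens
// in [start, start + width) that are query terms.
std::size_t scoreWindow(const std::vector<std::string>& tokens,
                        const std::set<std::string>& queryTerms,
                        std::size_t start, std::size_t width) {
    std::size_t score = 0;
    for (std::size_t i = start; i < start + width && i < tokens.size(); ++i) {
        if (queryTerms.count(tokens[i]) > 0) ++score;
    }
    return score;
}

// Hypothetical helper: pick a window of `width` tokens.
// exhaustive == true  -> score every window and keep the best one.
// exhaustive == false -> stop at the first window whose score reaches `goodEnough`.
WindowChoice pickWindow(const std::vector<std::string>& tokens,
                        const std::set<std::string>& queryTerms,
                        std::size_t width, bool exhaustive,
                        std::size_t goodEnough = 1) {
    assert(width > 0 && "window width must be positive");
    WindowChoice best;
    if (tokens.empty()) return best;
    // Index of the last window start; a short text yields a single window.
    const std::size_t last = tokens.size() > width ? tokens.size() - width : 0;
    for (std::size_t start = 0; start <= last; ++start) {
        const std::size_t score = scoreWindow(tokens, queryTerms, start, width);
        ++best.evaluated;
        if (score > best.score) {
            best.start = start;
            best.score = score;
        }
        if (!exhaustive && best.score >= goodEnough) break;  // good enough: stop early
    }
    return best;
}
```

With `exhaustive` set to `false`, the toy search may return a lower-scoring window, but it scores far fewer candidates, which mirrors the "different, but still valid" result described above.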
Codecov Report
@@ Coverage Diff @@
## master #697 +/- ##
=======================================
Coverage 84.61% 84.61%
=======================================
Files 98 98
Lines 4308 4310 +2
Branches 1873 1869 -4
=======================================
+ Hits 3645 3647 +2
Misses 662 662
Partials 1 1
Great! This deserves a dedicated (patch) release IMO.
Good catch!
It seems `SNIPPET_BACKGROUND_MODEL` matters even more for snippet generation speed. It asks Xapian to also compute a score for non-query terms. By removing it, we compute scores only for query terms, which is far faster.
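The flag handling discussed above can be sketched as plain bitmask arithmetic. This is a self-contained illustration, not the actual patch: the constants in the `sketch` namespace are stand-ins, their numeric values are assumptions, and the `without` helper is hypothetical. Real code would use `Xapian::MSet::SNIPPET_BACKGROUND_MODEL` and `Xapian::MSet::SNIPPET_EXHAUSTIVE` from `<xapian.h>` and pass the result as the `flags` argument of `MSet::snippet`. The sketch also assumes that both flags are enabled by default.

```cpp
#include <cassert>

namespace sketch {

// Stand-in flag constants for illustration only; the numeric values are
// assumptions. Real code should use the Xapian::MSet::SNIPPET_* constants.
constexpr unsigned SNIPPET_BACKGROUND_MODEL = 1u << 0;
constexpr unsigned SNIPPET_EXHAUSTIVE       = 1u << 1;

// Assumed default when the caller passes no flags: both options enabled.
constexpr unsigned DEFAULT_FLAGS = SNIPPET_BACKGROUND_MODEL | SNIPPET_EXHAUSTIVE;

// Hypothetical helper: clear the given bits, leaving all other bits untouched.
constexpr unsigned without(unsigned flags, unsigned toRemove) {
    return flags & ~toRemove;
}

// Walk through the two steps discussed in this thread.
inline void demo() {
    // Step 1: drop the exhaustive search; only the background model is left.
    const unsigned step1 = without(DEFAULT_FLAGS, SNIPPET_EXHAUSTIVE);
    assert(step1 == SNIPPET_BACKGROUND_MODEL);
    // Step 2: also drop the background model; only query terms are scored.
    const unsigned step2 = without(step1, SNIPPET_BACKGROUND_MODEL);
    assert(step2 == 0u);
}

}  // namespace sketch
```

Clearing bits from the default, rather than hard-coding a numeric value, keeps the intent readable and leaves any other flag that may be set untouched.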
While working on kiwix/libkiwix#769 (which is about suggestions), I found this important improvement (to search).
On my computer, with a ZIM file on an external USB drive (for slow I/O) and a clean filesystem cache, a search for `home` drops from 87s to 15s (!!).
Generated snippets are different, but the quality doesn't seem really impacted:
Original snippet (slow):
New snippet (quick):