Use libzim for search #769
Comments
This clearly seems the path to follow, but for the moment it is blocked by (at least) kiwix/kiwix-build#503 |
It depends whether and how search results are returned from the backend. It's not a great experience currently when using Kiwix Serve. See kiwix/libkiwix#769 and kiwix/libkiwix#785. Progressive display of results makes a big difference to the sensation of responsiveness in the current app, especially for long searches (and especially given that the first few results usually contain what you are looking for). |
@Jaifroid Not sure I understand you properly. Both tickets reference libkiwix and features kiwix-js won't rely on: respectively the kiwix-serve UI and the Opensearch Multizim API. Or is this ticket about libkiwix rather than libzim? Or do you suspect underlying weaknesses? |
@kelson42 I was thinking of this explanation from mgautierfr on one of those threads:
It seems this relates to the rate at which the backend returns results from Xapian. It remains to be seen how the balance is made between returning title search results quickly and returning slower Xapian searches, but I suspect we'll want some element of progression still. |
I feel uncomfortable starting with a strategic discussion, which is the topic of this ticket, and ending up discussing a very detailed aspect, which is where we are now. Not that the concern about this particular point is illegitimate, since the devil can be in the details. But we should not run the second in place of the first, IMO. I see no fundamental reason why we could not ultimately close kiwix/libkiwix#769, so you can wait until we close it if you feel this particular ticket puts the whole strategic discussion at stake. In my opinion, it should not. |
@kelson42 I think we're on topic. This issue is about replacing (or augmenting) our current title search with libzim Xapian search. You can test the libzim search here: https://mossroy.github.io/libzim_wasm/index.html . In full English Wikipedia it is a lot slower than current title search, but you get search from the full text index. The ideal situation would probably be to display any title search results straight away (because if a title exists corresponding to what the user searched for, it is probably what they want), and then integrate the Xapian full-text search results once they come in. Then the user won't feel like they are waiting too long for simple searches, but they will be prepared to wait for more obscure searches. This probably wouldn't be difficult to integrate. |
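A minimal sketch of the progressive display described in this comment, under stated assumptions: `titleSearch`, `fullTextSearch` and `render` are hypothetical placeholders, not functions from Kiwix JS or libzim. Each search is assumed to return a Promise that resolves to an array of results.

```javascript
// Hypothetical sketch: titleSearch, fullTextSearch and render are placeholders
// supplied by the caller, not part of the Kiwix JS or libzim APIs.
async function progressiveSearch(query, titleSearch, fullTextSearch, render) {
    // Start both searches at once so the slow one does not delay the fast one.
    const titlePromise = titleSearch(query);
    // If the full-text search fails, fall back to an empty list so the
    // title results stay on screen.
    const fullTextPromise = fullTextSearch(query).catch(() => []);

    // Show title matches as soon as they are available.
    const titleResults = await titlePromise;
    render(titleResults, { complete: false });

    // Add the full-text matches once they come in.
    const fullTextResults = await fullTextPromise;
    const combined = titleResults.concat(fullTextResults);
    render(combined, { complete: true });
    return combined;
}
```

The `complete` flag is only one way of telling the UI that more results may still arrive, for example to keep a spinner visible during long searches.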
Also relevant to this issue: Xapian search doesn't support split-ZIM archives, if I understand correctly, whereas we currently support such archives. Either we have to turn off Xapian search if the archive is split, or else we have to deprecate split-ZIM support. |
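The first option (turning off Xapian search for split archives) could be sketched as below. This assumes that split parts follow the usual `.zimaa`, `.zimab`, ... naming; the function names are hypothetical and do not come from the Kiwix JS code base.

```javascript
// Assumption: a split ZIM part has a two-letter suffix after ".zim",
// e.g. "archive.zimaa", "archive.zimab".
function isSplitZimPart(filename) {
    return /\.zim[a-z]{2}$/i.test(filename);
}

// Hypothetical policy: only offer full-text search for a single, unsplit file.
function canUseFullTextSearch(filenames) {
    return filenames.length === 1 && !isSplitZimPart(filenames[0]);
}
```

Title search would remain available in both cases, so split archives would lose nothing they have today.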
It seems to me that now that kiwix/libkiwix#769 is closed, and we can reliably reproduce the Emscripten/wasm build of libzim with openzim/javascript-libzim#14, it might be a good test of the libzim build to begin to augment our title search with results returned from the libzim worker in a progressive manner. The idea is that the algorithm should return our (very fast, near-instant) title search results first, while it is waiting for libzim/Xapian to return any full text search results. Once they come in, they would be inserted into the results already displayed, taking care not to display duplicates. Search integration would only work if the following are true:
I think that this can be integrated relatively easily and without the user having to turn on libzim reading in the UI. It would provide something of an opportunity to road-test reading archives with libzim without relying on it for mission-critical functions, so we can iron out any issues such as memory leaks, etc. |
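The duplicate-free merge mentioned in this comment could look roughly like the sketch below. The result shape (`{ path, title }`) and the function name are assumptions made for illustration. Appending the new results after the displayed ones is just one possible placement.

```javascript
// Hypothetical result shape: { path, title }. These field names are
// assumptions for this sketch, not taken from Kiwix JS or libzim.
function mergeWithoutDuplicates(displayed, incoming) {
    // Paths already on screen (from the fast title search).
    const seen = new Set(displayed.map((result) => result.path));
    const merged = displayed.slice();
    for (const result of incoming) {
        // Skip anything the user can already see.
        if (!seen.has(result.path)) {
            seen.add(result.path);
            merged.push(result);
        }
    }
    return merged;
}
```

Keying on the entry path rather than the title avoids dropping distinct articles that happen to share a title.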
@Jaifroid I would recommend treating suggestions (title search) and (full-text) search in different tickets. They are quite different technically and from a user perspective, and the current Kiwix JS code is not at the same stage for each (it exists for suggestions, but not for search). First things first: using libkiwix for suggestions seems a good next step, but I see:
|
@kelson42 OK, thanks for the advice. I'm at the stage of integrating the libzim worker for search, after which I'll be in a position to test the relative performance of the title search and full-text search code. I can't throw away the old code, because libzim only works in browsers that support WASM, but if libzim is performant enough, I could switch over fully to using it for those browsers that support it. |
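Keeping the old code for browsers without WASM implies some kind of feature detection. A minimal sketch, assuming the choice is made once at startup, follows. The `'libzim'` and `'legacy'` labels and both function names are hypothetical.

```javascript
// scope is the global object to inspect (for example `self` in a worker
// or `window` in a page); passing it in keeps the check easy to test.
function supportsWasm(scope) {
    const wasm = scope.WebAssembly;
    return typeof wasm === 'object' && wasm !== null &&
        typeof wasm.instantiate === 'function';
}

// Hypothetical backend labels: 'libzim' for the WASM worker,
// 'legacy' for the existing title search code.
function chooseSearchBackend(scope) {
    return supportsWasm(scope) ? 'libzim' : 'legacy';
}
```

Support for `WebAssembly` alone does not guarantee that the libzim build loads or performs well, so a runtime fallback to the legacy path would probably still be needed.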
Issue closed in #935. |
The prototype shows that it's possible, using Xapian full-text search.
This would be a great feature, and one that kiwix-js is missing today.
If libzim search is fast enough, we should probably remove the algorithm that handles the progressive display of search results.
The libzim API provides pagination, which should be useful for us.
Ideally, the WebWorker would keep the Search instance (see https://libzim.readthedocs.io/en/stable/api/classzim_1_1Search.html) so that the search is not run again each time the user paginates. But this is not critical: kiwix-serve has restarted the search on each pagination for many years without complaints.
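One way the worker could keep a search alive across pages is sketched below. `createSearch` is a hypothetical factory standing in for whatever runs the query against libzim. The object it returns is assumed to expose a `getResults(start, count)` method, which is a guess at a paging-style interface rather than the actual binding.

```javascript
// Hypothetical sketch: createSearch(query) is assumed to run the query once
// and return an object with getResults(start, count).
class SearchSessionCache {
    constructor(createSearch) {
        this.createSearch = createSearch;
        this.query = null;
        this.search = null;
    }

    getPage(query, pageIndex, pageSize) {
        // Only run a new search when the query text changes.
        if (query !== this.query) {
            this.search = this.createSearch(query);
            this.query = query;
        }
        // Paginating reuses the stored search instead of starting over.
        return this.search.getResults(pageIndex * pageSize, pageSize);
    }
}
```

Holding a single session keeps memory use bounded, at the cost of re-running the search when the user goes back to an earlier query.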