Use libzim for search #769

Closed
mossroy opened this issue Oct 29, 2021 · 12 comments

@mossroy
Contributor

mossroy commented Oct 29, 2021

The prototype shows that it's possible, using xapian full-text search.
This would be a great feature, one that kiwix-js is missing today.

We should probably remove the algorithm that handles the progressive display of search results, if it's fast enough.

The libzim API provides pagination, which should be useful for us.
Ideally, the WebWorker would keep the Search instance (see https://libzim.readthedocs.io/en/stable/api/classzim_1_1Search.html) so that the search is not run again each time the user paginates. But it's not critical: kiwix-serve has restarted the search on each pagination for many years without complaints.
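
As a rough illustration of keeping the search alive between pages, here is a minimal sketch. It assumes a hypothetical `startSearch(query)` function that returns an object with a `getResults(start, count)` method; this is a stand-in for illustration and not the actual javascript-libzim API. `SearchSession` is likewise an invented name.

```javascript
// Hypothetical stand-in for a libzim-backed search (NOT the real API).
// It counts how many times a search is actually started.
let searchesStarted = 0;
function startSearch(query) {
  searchesStarted++;
  const all = Array.from({ length: 25 }, (_, i) => `${query} result ${i + 1}`);
  return {
    // Hands out a range of results, like a Search object that supports pagination
    getResults: (start, count) => all.slice(start, start + count)
  };
}

// Keeps the most recent search so that paginating does not restart it
class SearchSession {
  constructor(searchFn) {
    this.searchFn = searchFn;
    this.query = null;
    this.search = null;
  }

  getPage(query, pageIndex, pageSize) {
    // Only start a new search when the query text changes
    if (query !== this.query) {
      this.search = this.searchFn(query);
      this.query = query;
    }
    return this.search.getResults(pageIndex * pageSize, pageSize);
  }
}

const session = new SearchSession(startSearch);
const page1 = session.getPage('paris', 0, 10); // starts the search
const page3 = session.getPage('paris', 2, 10); // reuses the same search
```

Inside a real WebWorker, `getPage` would presumably be called from the `onmessage` handler, with the session object living for as long as the worker does.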

@kelson42
Collaborator

This clearly seems the path to follow, but for the moment it is blocked by (at least) kiwix/kiwix-build#503.

@Jaifroid
Member

> We should probably remove the algorithm that handles the progressive display of search results, if it's fast enough.

It depends on whether and how search results are returned from the backend. It's not a great experience currently when using Kiwix Serve. See kiwix/libkiwix#769 and kiwix/libkiwix#785. Progressive display of results makes a big difference to the sensation of responsiveness in the current app, especially for long searches (and especially given that the first few results usually contain what you are looking for).

@kelson42
Collaborator

@Jaifroid Not sure I understand you properly. Both tickets reference libkiwix and features kiwix-js won't rely on: respectively, the kiwix-serve UI and the Opensearch Multizim API. Or is this ticket about libkiwix, not libzim? Or do you suspect underlying weaknesses?

@Jaifroid
Member

@kelson42 I was thinking of this explanation from mgautierfr on one of those threads:

> Xapian doesn't load all the database in memory. It loads a first part, start interpret it and load another part and so.... So the total time is probably less than 7 seconds but you need several seconds just for the IO. During this time, the frontend is waiting a answer and do not show the dropdown. It appears many second later, when the search complete and the frontend has results to show.

It seems this relates to the rate at which the backend returns results from Xapian. It remains to be seen how the balance will be struck between returning title search results quickly and returning slower Xapian searches, but I suspect we'll still want some element of progression.

@kelson42
Collaborator

I feel uncomfortable starting a strategic discussion, which is the topic of this ticket, and ending up discussing a very detailed aspect, which is where we are now.

Not that the concern about this particular topic is illegitimate, particularly because the devil can be in the details. But we should not run the second in place of the first, IMO.

I see no fundamental reason why we should not or could not ultimately close kiwix/libkiwix#769, so you can wait until we close it if you feel this particular ticket puts the whole strategic discussion at stake. My opinion is that it should not.

@Jaifroid
Member

@kelson42 I think we're on topic. This issue is about replacing (or augmenting) our current title search with libzim Xapian search. You can test the libzim search here: https://mossroy.github.io/libzim_wasm/index.html . On the full English Wikipedia it is a lot slower than the current title search, but it searches the full-text index. The ideal situation would probably be to display any title search results straight away (because if a title exists corresponding to what the user searched for, it is probably what they want), and then integrate the Xapian full-text search results once they come in. Then the user won't feel like they are waiting too long for simple searches, but they will be prepared to wait for more obscure searches. This probably wouldn't be difficult to integrate.

@Jaifroid
Member

Also relevant to this issue: Xapian search doesn't support split-ZIM archives, if I understand correctly, whereas we currently support such archives. Either we have to turn off Xapian search if the archive is split, or else we have to deprecate split-ZIM support.

@Jaifroid
Member

Jaifroid commented Nov 24, 2022

It seems to me that now that kiwix/libkiwix#769 is closed, and we can reliably reproduce the Emscripten/wasm build of libzim with openzim/javascript-libzim#14, it might be a good test of the libzim build to begin to augment our title search with results returned from the libzim worker in a progressive manner. The idea is that the algorithm should return our (very fast, near-instant) title search results first, while it is waiting for libzim/Xapian to return any full text search results. Once they come in, they would be inserted into the results already displayed, taking care not to display duplicates.
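
The progressive approach described above could be sketched roughly as follows. All names are hypothetical: `titleSearch` and `fullTextSearch` stand for whatever functions return promises of results, each result is assumed to carry a unique `path`, and `render` is whatever redraws the list. For simplicity the sketch appends new full-text results after the title results rather than interleaving them.

```javascript
// Merge incoming full-text results into the already displayed title results,
// skipping any entry whose path is already shown. Does not mutate its inputs.
function mergeResults(displayed, incoming) {
  const seen = new Set(displayed.map((r) => r.path));
  const merged = displayed.slice();
  for (const r of incoming) {
    if (!seen.has(r.path)) {
      seen.add(r.path);
      merged.push(r);
    }
  }
  return merged;
}

// Show title results as soon as they arrive, then add full-text results.
async function progressiveSearch(query, titleSearch, fullTextSearch, render) {
  // Start the slow search immediately; treat a failure as "no extra results"
  const fullTextPromise = fullTextSearch(query).catch(() => []);
  const titleResults = await titleSearch(query);
  render(titleResults); // first, near-instant display
  const merged = mergeResults(titleResults, await fullTextPromise);
  render(merged); // second display, duplicates removed
  return merged;
}

// Example data for the merge step
const shown = [{ path: 'A/Paris', title: 'Paris' }];
const late = [
  { path: 'A/Paris', title: 'Paris' },
  { path: 'A/Eiffel_Tower', title: 'Eiffel Tower' }
];
const combined = mergeResults(shown, late);
```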

Search integration would only work if the following are true:

  • We are in ServiceWorker mode
  • We are using WASM (not ASM)
  • The ZIM archive has a Xapian full-text search index
  • The archive is not split (this needs to be reviewed)
  • Possibly only if title search has not produced more than params.maxSearchResultsSize
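
These conditions could be gathered into a single check, sketched below. Every field name except `maxSearchResultsSize` (which comes from `params.maxSearchResultsSize` above) is a hypothetical placeholder rather than an existing kiwix-js property, and the last condition is only the "possibly" one from the list.

```javascript
// Returns true only when every listed precondition for libzim search holds.
// Field names are illustrative assumptions, not real kiwix-js state.
function canUseLibzimSearch(state) {
  return state.serviceWorkerMode === true && // we are in ServiceWorker mode
    state.wasmSupported === true && // WASM is in use, not ASM
    state.hasFulltextIndex === true && // archive has a Xapian full-text index
    state.isSplitArchive === false && // archive is not split (to be reviewed)
    // Possibly: title search has not produced more than the maximum
    state.titleResultCount <= state.maxSearchResultsSize;
}

const exampleState = {
  serviceWorkerMode: true,
  wasmSupported: true,
  hasFulltextIndex: true,
  isSplitArchive: false,
  titleResultCount: 3,
  maxSearchResultsSize: 50
};
const eligible = canUseLibzimSearch(exampleState);
```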

I think that this can be integrated relatively easily and without the user having to turn on libzim reading in the UI. It would provide something of an opportunity to road-test reading archives with libzim without relying on it for mission-critical functions, so we can iron out any issues such as memory leaks, etc.

@kelson42
Collaborator

kelson42 commented Nov 26, 2022

@Jaifroid I would recommend treating suggestions (title search) and (full-text) search in different tickets. They are pretty different technically and from a user perspective, and the current Kiwix JS code is not at the same stage for each (existing for suggestions, nonexistent for search).

First things first: using libkiwix for suggestions seems a good next step, but I see:

  • No reason to mix suggestions results with search results
  • No reason to not just use libkiwix search/suggestion results and throw away the old code

@Jaifroid
Member

@kelson42 OK, thanks for the advice. I'm at the stage of integrating the libzim worker for search, and then I'll be in a position to test the relative performance of the title search vs full-text search code. I can't throw away the old code because libzim only works in browsers that support WASM, but if libzim is performant enough, I could switch over fully to using it for those browsers that support it.
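
A minimal sketch of that fallback decision, assuming a simple feature test for WebAssembly. `chooseSearchBackend` and the `'libzim'` / `'legacy'` labels are invented for illustration, and `libzimFastEnough` stands for the outcome of the performance testing mentioned above.

```javascript
// Feature test: true when the runtime exposes the standard WebAssembly object
function supportsWasm() {
  return typeof WebAssembly === 'object' &&
    typeof WebAssembly.instantiate === 'function';
}

// Keep the old search code as the fallback; only use libzim when the browser
// supports WASM and libzim has proved fast enough.
function chooseSearchBackend(wasmSupported, libzimFastEnough) {
  return wasmSupported && libzimFastEnough ? 'libzim' : 'legacy';
}

const backend = chooseSearchBackend(supportsWasm(), true);
```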

@Jaifroid
Member

Issue closed in #935.

Jaifroid modified the milestones: v4.0, v3.7 Jan 3, 2023