Put search result with exact title in first position. #766
Comments
See kiwix/kiwix-android#2033 and kiwix/kiwix-android#2035 which I created 3 years ago. |
@mgautierfr We need concrete examples and explanations about why this is not the case today. We should talk about "suggestions", as this is what this is about. If the example is "apple", then the only way I see is to add a layer on top of Xapian, and I'm against it because it has been done at least twice (even by @mgautierf) and it brought more problems than benefits. I'm not in favour of repeating the same errors over time. |
By "accident", Kiwix JS does something like this. The accident is that we only very recently got full-text searching (thanks to the libzim WASM), so I grafted ft search on top of existing title search. Because ft search is considerably slower than title search, we get search results coming in a two-stage process: exact prefix matching first (pseudo-case-insensitive), and then a few seconds later, the ft results (which are pruned to remove any duplicates before displaying them). NB We can't currently provide any "snippets", because that part of the API isn't yet bound to JavaScript. (It might be too slow, anyway.) |
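The two-stage display described in the comment above can be sketched roughly as follows. This is only an illustration in Python, not Kiwix JS code: the function name `merge_results` and the plain lists of titles are assumptions, and the default limit of 30 is the figure quoted elsewhere in this thread.

```python
def merge_results(title_hits, fulltext_hits, limit=30):
    """Show title-prefix hits first, then full-text hits not already shown.

    Hypothetical helper: both inputs are plain lists of article titles.
    """
    seen = set()
    merged = []
    for title in list(title_hits) + list(fulltext_hits):
        if len(merged) >= limit:
            break
        if title in seen:
            continue  # prune duplicates that a previous stage already produced
        seen.add(title)
        merged.append(title)
    return merged
```

Because the prefix hits are placed before the full-text hits, an exact title match found by the prefix stage keeps its position when the slower results arrive.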
@Jaifroid I can hardly believe that doing what you describe implements this feature, because ft search does not implement it either. |
kiwix-android search feels slower than kiwix-js anyway. It spins for noticeably longer than on desktop.
Perhaps something else in kiwix-js is implementing this. On kiwix-js I can actually find what I'm looking for in the first result, while on kiwix-android I have to scroll and look through a bunch of random results |
You've read it wrong I believe. @Jaifroid said there is a Title-prefix search displayed (sort of like suggestions) while the FT search is requested in the background and once FT results are ready, those are added to the page (removing the entries that were already there from the prefix search). |
Regardless of how practical it is to implement, I support the feature request, as this is IMO a very common scenario: you type a request, you get the suggestions, but they don't give you exactly what you wanted. So you type |
I guess we do need a proper specification of the problem. Kiwix Desktop (and Kiwix Serve) seem to do a version of prefix matching if you enter more than a single word, but we get a slightly unintuitive list of results for single words. I compared searching for "caribbean basin" in Kiwix JS and Kiwix Desktop (see top screenshot, full English Wikipedia) -- almost exactly the same results for the title search (outlined in red). But with "apple" we get a very different search result order, with the first result matching the fruit being the one outlined in red in each case (bottom screenshot). To be clear, Kiwix JS title search is not intelligent or weighted in any way: it merely does a binary search on as many upper-case and lower-case variants of the entered prefix as it can, and gathers anything that matches the prefix. It then fills up the rest of the space (up to the max search results requested, default 30, but user-selectable) with full-text search results (from which duplicates are removed). |
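A minimal sketch of that kind of title search, assuming the titles are available as a list sorted in plain code-point order. The helper names `case_variants` and `prefix_search`, and the particular set of case variants tried, are illustrative assumptions and not the actual Kiwix JS implementation.

```python
from bisect import bisect_left


def case_variants(prefix):
    """Return a few common case variants of the typed prefix, without duplicates."""
    candidates = [prefix, prefix.lower(), prefix.upper(), prefix.capitalize()]
    return list(dict.fromkeys(candidates))  # dict keys keep insertion order


def prefix_search(sorted_titles, prefix, limit=30):
    """Binary-search each case variant and gather the titles that start with it."""
    results = []
    seen = set()
    for variant in case_variants(prefix):
        # Jump to the first title that is >= the variant, then walk forward
        # while the titles still start with that variant.
        i = bisect_left(sorted_titles, variant)
        while (i < len(sorted_titles) and len(results) < limit
               and sorted_titles[i].startswith(variant)):
            title = sorted_titles[i]
            if title not in seen:
                seen.add(title)
                results.append(title)
            i += 1
    return results
```

Nothing here is weighted: the order of the output is simply the order in which the variants are tried, followed by title order within each variant. A title that only contains the search term somewhere in the middle is never found this way.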
@rgaudin Honestly, I have no real clue what this ticket is about, as there is no concrete example of input/output... If this is not provided, I will close the ticket, as I cannot follow what all this is about. |
My initial idea was about search for term. If you search for "Apple" on wikipedia_en_all, you have this list (https://library.kiwix.org/viewer#search?content=wikipedia_en_all_maxi_2023-02&pattern=apple):
The idea is to "move" the "Apple" result (the article with a title equal (case insensitive) to the search term) to the top of the list. How the "move" is implemented is still open to discussion. It could be a specific criterion in Xapian that gives the highest score to the "Apple" article, or it could be the libzim iterator starting with "Apple" and then continuing with the classic Xapian results (skipping the "Apple" article in them), or libkiwix itself inserting the result in the HTML page (maybe in a specific section), or ... But as @Jaifroid suggests in the last comment, we could also do the same for suggestions. This could be compared with kiwix/libkiwix#748. We were redirecting directly to the exact title article in case of search. Now we are not redirecting, but at least we could put the exact title article first. |
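One of the options above, reordering an existing result list so that the exact (case-insensitive) title comes first, could look roughly like this. It is a sketch only: `promote_exact_title` is a hypothetical name, the results are assumed to be a plain list of titles, and it only helps when the exact title is already somewhere in the list (the iterator-based and Xapian-scoring options are not covered).

```python
def promote_exact_title(results, query):
    """Move the title equal to the query (ignoring case) to the first position."""
    wanted = query.casefold()
    for i, title in enumerate(results):
        if title.casefold() == wanted:
            # Keep the relative order of every other result unchanged.
            return [title] + results[:i] + results[i + 1:]
    return list(results)  # no exact title in the list: leave the order as-is
```

For example, `promote_exact_title(["Apple Inc.", "Apple pie", "Apple"], "apple")` returns `["Apple", "Apple Inc.", "Apple pie"]`.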
@mgautierfr To me, while the ticket seems obvious for suggestions, it sounds far less obvious for ft search. If I ft search "Verdun", I would kind of expect "Battle of Verdun" as first result, but if I search a suggestion, I would kind of expect "Verdun" as first result. In both cases, it is the job of Xapian to deliver things properly... I see no fundamental reason it could not. |
What happens for ZIMs that don't have a Xapian index? Presumably fallback to binary search of Directory Entry titles. |
I think there are two distinct discussions here: what we'd want to get and how to implement it. It's usually more efficient to define the former first and then try to reconcile with the second. Away from all technical considerations, I believe if there is an entry matching the exact search query, it should be highlighted. It can be the first result or a different card or anything that tells the user “you've requested this, we have it”.
So it's reasonable to assume that a suggested Entry can be considered, but the user would like more details before discarding it. In terms of UX, if that matching Entry is a redirect, I think I'd even want to have something like “Le great XXX (redirection from XXX)”. I'd be careful with examples (in this ticket! Not in others related to improving search), as you seem to incorporate cultural background into them. We can design various scoring mechanisms so that we influence the sorting of search results. In your example, on WPEN that battle article is not the first result. Verdun, the city, is. WPFR is similar but it could be different. That's a discussion about sorting and it's not what this ticket is about. This ticket is about a UX improvement: asserting that the exact search query has a matching result, which could be highlighted. I agree the ticket title is a bit incorrect as it suggests a technical solution. |
This topic is not a priority, considering we don't produce this kind of ZIM file. That said, considering the logic of dichotomic (binary) search, this should already be the case IMHO. |
I think it's important to have a concrete example. It's impossible to objectively measure whether the bug is fixed or not without a test case.
No, it shouldn't be a different card. On desktop, I want to just press the enter key without looking. On mobile, I want to tap the first search result row with my eyes closed.
Simple Wikipedia
I will be using https://library.kiwix.org/viewer#wikipedia_en_simple_all_mini_2023-03/A/Main_Page .
Example 1: apple
Expected behavior
Actual Behavior
Example 2: mountain
Expected behavior
Actual Behavior
Example 3: library
Expected behavior
Actual Behavior
Wiktionary
This bug is worse with Wiktionary, which is what I mainly use Kiwix for, but it has fewer users than Wikipedia. In Wiktionary, the exact result doesn't even appear first. I will use
Example 4: des
Expected behavior
Actual Behavior
Example 5: que
Expected behavior
Actual Behavior
The more intelligent suggestion behavior from https://simple.wikipedia.org/wiki/Main_Page that uses statistics is also good. |
This ticket is a response to #653 (comment) stating that we need other implementation ideas in order to discuss the need for a feature.
I don't see why we should have "Battle of Verdun" as first result. Interestingly, search on This leads me to think that the "natural" (relevance) sorting of Wikipedia gives a lot of importance to the exactness of the title, but this is not the only criterion used to select the first result. @danielzgtg Your example seems to be based on suggestions. Is that right? Your example with Wiktionary is interesting. As we stem the words, we have all titles |
The expected order listed above is, in each case, the order given by binary search of the title order list of directory entries, augmented by testing for several common case variations. So, when entering This algorithm is highly effective for Wikipedia/Wiktionary, but _almost useless_ for any ZIM where the alphabetical title order is meaningless (in a Stack Exchange ZIM, the title of many articles/questions will begin with "What...", and the key word will be buried somewhere in the title). The reason it is highly effective for Wikipedia/Wiktionary is that editors of articles add lots of redirects from common search terms (often including common misspellings and common case variants) to the underlying article. So, we effectively have a "pre-weighted" and augmented alphabetical search index. It makes sense to leverage this, if possible. |
I'm fine with Wikipedia doing that because pressing enter will go to the exact search result if found. However someone declined my suggestion for adding this at kiwix/kiwix-android#2033 (comment) , so I need the exact search result at the top.
Exactly as Jaifroid described for kiwix-js.
I never thought of that. But that should be done together with some kind of intelligent ranking feature. The ranking should pay less attention to stopwords and more attention to highly ranked questions/answers. Anyway, that would be more complicated to implement than the change described in this GitHub issue.
This behaviour from kiwix-android makes the app hard to use. Therefore, we should implement the original request in this GitHub issue. |
When a user searches for a term, we should put the article with the exact same title first in the results.
(Same for suggestions.)
See comments in #653