Better handling of multi search #729
Codecov Report
@@            Coverage Diff             @@
##           master     #729      +/-   ##
==========================================
+ Coverage   61.97%   63.39%   +1.41%
==========================================
  Files          58       59       +1
  Lines        3887     4051     +164
  Branches     2103     2192      +89
==========================================
+ Hits         2409     2568     +159
- Misses       1477     1481       +4
- Partials        1        2       +1
Since @kelson42 requested my review on this WIP PR, I started looking at the changes, but I soon realized that I was missing some context. The outcome of my first iteration is a few low-value comments on the first couple of commits. It would be very helpful if a high-level description of the use model and the functional enhancement sought by this PR were provided.
This pull request has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.
@veloman-yunkan There are still a few unit tests missing, but it is ready for re-review. Please review it as a new PR, as I've changed a few things when rebasing on master and it was difficult to do fixup commits. PR description updated.
This is only part of the review (I only had time to skim through the first several commits).
try {
  const char* envString = std::getenv(name);
  if (envString == nullptr) {
    throw std::runtime_error("Environment variable not set");
Include the name of the environment variable in the error message
This exception is never thrown to the caller, as it is caught by the following catch(...).
We have to provide a string, so we use it as "documentation", but it is useless to generate one.
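To make the exchange above concrete, here is a minimal sketch of the pattern being discussed: a helper that falls back to a default when the variable is unset or unparsable. The name `getEnvOr` and its template signature are assumptions made for illustration, not the PR's actual code. The message includes the variable name, as the reviewer asked, even though the `catch (...)` swallows it.

```cpp
#include <cstdlib>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (not the PR's code): read an environment variable and
// convert it to T, returning defaultValue on any failure.
template <typename T>
T getEnvOr(const char* name, const T& defaultValue)
{
  try {
    const char* envString = std::getenv(name);
    if (envString == nullptr) {
      // Caught by the catch (...) below, so the message only documents the case.
      throw std::runtime_error(std::string("Environment variable not set: ") + name);
    }
    std::istringstream stream(envString);
    T value;
    if (!(stream >> value)) {
      throw std::runtime_error(std::string("Cannot parse environment variable: ") + name);
    }
    return value;
  } catch (...) {
    return defaultValue;
  }
}
```

Because the exception never escapes, an early `return defaultValue;` would behave identically; the throw mainly serves as in-code documentation, which is the author's point.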
src/library.cpp
Outdated
bool Library::removeBookById(const std::string& id)
{
  std::lock_guard<std::mutex> lock(m_mutex);
  mp_impl->m_bookDB->delete_document("Q" + id);
  dropReader(id);
  dropCache(id);
  return mp_impl->m_books.erase(id) == 1;
}
Shouldn't the cache size be updated here too? If not, a comment must be added saying that this is intentionally not done.
I don't think the cache size needs to be updated here. I've added a comment.
src/tools/lrucache.h
Outdated
@@ -138,12 +138,18 @@ class lru_cache {
    return _cache_items_map.size();
  }

  size_t set_max_size(size_t new_size) {
I know that this class is a hybrid of a snake with a camel, but I wonder if your choice of the style here was deliberate :)
Now getting serious. If the cache's current size exceeds the new value of max size, shouldn't it be truncated immediately? Though, a deeper question is - do we really need a dynamic cache size at all (i.e. is linking the cache size to the actual amount of data a good idea)?
> I know that this class is a hybrid of a snake with a camel, but I wonder if your choice of the style here was deliberate :)
Yes, indeed.
> Now getting serious. If the cache's current size exceeds the new value of max size, shouldn't it be truncated immediately? Though, a deeper question is - do we really need a dynamic cache size at all (i.e. is linking the cache size to the actual amount of data a good idea)?
The real question is: what is a good default value for the cache size?
For a use case like library.kiwix.org, where we have a lot of ZIM files, we probably want a large cache.
But on a small server running on a Raspberry Pi, we want a small cache.
Using a percentage of the number of books is a heuristic that takes this into account (although not perfect, like all heuristics).
Before this PR, the cache was created after the library was populated, so we could calculate the cache size once. But as we now add books to the library after the cache creation, we need to increase the cache size as we add books.
Reducing the actual cache size seems less important. Either it is not a problem, or it was already a problem when we increased the cache size (and so the user should have set a fixed value corresponding to their use case).
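For reference, the reviewer's suggestion (truncate immediately when the limit shrinks) together with the percentage heuristic could be sketched as follows. `SimpleLru` is a hypothetical stand-in written for this illustration, not `kiwix::lru_cache`, and the 10% default in `cacheSizeFor` is an assumed value; the thread does not state the actual percentage.

```cpp
#include <algorithm>
#include <cstddef>
#include <list>
#include <unordered_map>
#include <utility>

// Minimal LRU cache sketch (hypothetical, not kiwix::lru_cache).
template <typename Key, typename Value>
class SimpleLru {
 public:
  explicit SimpleLru(std::size_t max_size) : max_size_(max_size) {}

  // Insert or refresh an entry, making it the most recently used one.
  void put(const Key& key, const Value& value) {
    auto it = index_.find(key);
    if (it != index_.end()) {
      items_.erase(it->second);
      index_.erase(it);
    }
    items_.emplace_front(key, value);
    index_[key] = items_.begin();
    trim();
  }

  bool exists(const Key& key) const { return index_.count(key) != 0; }
  std::size_t size() const { return index_.size(); }

  // Returns the previous limit. When the limit shrinks, the least recently
  // used entries are evicted immediately.
  std::size_t set_max_size(std::size_t new_size) {
    const std::size_t previous = max_size_;
    max_size_ = new_size;
    trim();
    return previous;
  }

 private:
  void trim() {
    while (items_.size() > max_size_) {
      index_.erase(items_.back().first);
      items_.pop_back();
    }
  }

  std::size_t max_size_;
  std::list<std::pair<Key, Value>> items_;
  std::unordered_map<Key, typename std::list<std::pair<Key, Value>>::iterator> index_;
};

// Assumed heuristic: a percentage of the book count, never below 1.
inline std::size_t cacheSizeFor(std::size_t bookCount, std::size_t percent = 10) {
  return std::max<std::size_t>(1, bookCount * percent / 100);
}
```

With such a cache, a library would call something like `cache.set_max_size(cacheSizeFor(bookCount))` after each addition or removal; whether shrinking is worth the extra eviction work is exactly the open question in the thread.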
A second set of review comments covering the next few commits.
This completes the first pass over the entire PR. But I think that with the big picture of the PR now in my head I will make another pass even before any of my comments is addressed.
@veloman-yunkan I should have handled all your numerous remarks. Ready for another review pass.
The prefix will be used to parse a "query to select books" in different contexts. For now we have only one context: selecting books for the catalog search. But we will also want to select books to do a full-text search on them (this will be done in a later commit).
`selectBooks` allows us to parse a query in a "standard" way to get the book(s) on which the user wants to work.
This introduces an intermediate mustache object to store information about the request made by the user.
We are currently limiting to 5, but this will be changed in the next commit.
The default value is 0, which means no limit.
- Adapt lrucache.cpp for the right include path and use `kiwix::lru_cache` instead of `zim::lru_cache`.
- Add missing `#include <set>` in lrucache.h
When ConcurrentCache stores a shared_ptr, a shared_ptr may still be in use after the ConcurrentCache has dropped it. When we "recreate" a value to put in the cache, we don't want to actually recreate it, but to copy the shared_ptr that is still in use. To do so, we use an (unlimited) store of weak_ptr (aka `WeakStore`). Every created shared_ptr added to the cache also has a weak_ptr reference stored in the WeakStore, and we check the WeakStore before creating the value.
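The idea in this commit message can be sketched roughly as below. This is an illustration under assumptions: the class layout and the `getOrCreate` method name are hypothetical and are not taken from the PR's `WeakStore`.

```cpp
#include <map>
#include <memory>
#include <mutex>

// Hypothetical sketch of a weak_ptr store: it never keeps values alive by
// itself, it only remembers values that someone else still holds.
template <typename Key, typename Value>
class WeakStore {
 public:
  // Return the shared_ptr still in use for key if there is one; otherwise
  // call create() and remember a weak reference to the new value.
  template <typename Create>
  std::shared_ptr<Value> getOrCreate(const Key& key, Create create) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = store_.find(key);
    if (it != store_.end()) {
      if (auto alive = it->second.lock()) {
        return alive;  // still referenced somewhere: reuse, do not recreate
      }
    }
    std::shared_ptr<Value> created = create();
    store_[key] = created;  // stored as weak_ptr, so it can still expire
    return created;
  }

 private:
  std::mutex mutex_;
  std::map<Key, std::weak_ptr<Value>> store_;
};
```

A bounded cache sitting in front of such a store can evict freely: as long as a caller holds the `shared_ptr`, the next lookup gets the same object back instead of a second copy.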
libzim's search is not thread-safe (mainly because Xapian is not). So we must protect our search objects from multi-threaded calls. The best way to do this is to associate a mutex with the `zim::Searcher` and lock the searcher each time we access an object derived from the searcher (search, results, iterator, ...).
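One way to picture "associate a mutex with the searcher" is a small wrapper that only exposes the protected object while the lock is held. The `Locked` template and its `with` method are hypothetical names chosen for this sketch; in the PR the protected type would be the searcher, but any type works here so the example compiles without libzim.

```cpp
#include <mutex>
#include <utility>

// Hypothetical wrapper: pairs a non-thread-safe object with a mutex.
template <typename T>
class Locked {
 public:
  explicit Locked(T value) : value_(std::move(value)) {}

  // Run f(value) while holding the mutex and return its result. Anything
  // derived from the value must be used inside f, not kept after it returns.
  template <typename F>
  auto with(F&& f) -> decltype(f(std::declval<T&>())) {
    std::lock_guard<std::mutex> lock(mutex_);
    return f(value_);
  }

 private:
  std::mutex mutex_;
  T value_;
};
```

The design constraint is the one the commit message states: the lock has to cover not only the search call but also every use of the results and iterators obtained from it.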
Providing the core part of the query explicitly in the search results testsuite test data.
Note that some tests are failing and will be fixed in the next commits.
The request_context can now take a filter to select arguments to keep in the query string.
We have to reuse the query the user gives us to generate the pagination links. At the search result rendering step we don't have access to the query object. The best place to know which arguments are used to select books (and so which arguments to keep in the pagination links) is when we parse the query to select books. Fix tests (pagination links) with a book selector other than "books.id=" (pattern=jazz&books.query.lang=eng).
Fix tests with a query string needing URL encoding (pattern=jazz&books.query.title=Ray%20Charles).
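The two commits above (filtering the arguments kept in the query string, then URL-encoding them) can be illustrated with a small sketch. The function names `urlEncode` and `buildQuery`, the `Args` alias, and the filter signature are assumptions for this example and do not reflect the actual `request_context` API.

```cpp
#include <cctype>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using Args = std::vector<std::pair<std::string, std::string>>;

// Percent-encode everything except RFC 3986 unreserved characters.
inline std::string urlEncode(const std::string& text) {
  static const char hex[] = "0123456789ABCDEF";
  std::string out;
  for (unsigned char c : text) {
    if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
      out += static_cast<char>(c);
    } else {
      out += '%';
      out += hex[c >> 4];
      out += hex[c & 0x0F];
    }
  }
  return out;
}

// Rebuild a query string from the arguments accepted by keep
// (hypothetical signature, not the request_context API).
inline std::string buildQuery(const Args& args,
                              const std::function<bool(const std::string&)>& keep) {
  std::string query;
  for (const auto& arg : args) {
    if (!keep(arg.first)) continue;
    if (!query.empty()) query += '&';
    query += urlEncode(arg.first) + '=' + urlEncode(arg.second);
  }
  return query;
}
```

A pagination link would then be built from this filtered query plus its own paging arguments, so that paging state from the current request is not carried over into the next link.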
LGTM
Based on #724
The filtering of books to use in the search is made using new query-string parameters:
- `books.id` to specify the id of a book to use (may be provided several times to select several books).
- `books.name` to specify the name of a book to use (may be provided several times to select several books).
- `content`, same as `books.name`. Kept for compatibility. `content` can be provided only once.
- `books.filter.foo` to do a search on the books using the `foo` criterion. Available criteria are the same as those used to search books in the OPDS stream.
This PR now integrates #730, as both PRs must be merged together to have something coherent.
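As a rough illustration of the parameters listed in the description, the sketch below sorts query arguments into a selection structure. `BookSelection` and `parseSelection` are hypothetical names, and rejecting a repeated `content` with an exception is an assumption: the description only says it can be provided once, not how a violation is handled.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result of parsing the book-selection parameters.
struct BookSelection {
  std::vector<std::string> ids;                // from books.id
  std::vector<std::string> names;              // from books.name and content
  std::map<std::string, std::string> filters;  // books.filter.<criterion>
};

inline BookSelection parseSelection(
    const std::vector<std::pair<std::string, std::string>>& args) {
  const std::string filterPrefix = "books.filter.";
  BookSelection selection;
  bool contentSeen = false;
  for (const auto& arg : args) {
    const std::string& name = arg.first;
    if (name == "books.id") {
      selection.ids.push_back(arg.second);
    } else if (name == "books.name") {
      selection.names.push_back(arg.second);
    } else if (name == "content") {
      // Assumption: a second content argument is treated as an error.
      if (contentSeen) throw std::invalid_argument("content can be provided only once");
      contentSeen = true;
      selection.names.push_back(arg.second);
    } else if (name.compare(0, filterPrefix.size(), filterPrefix) == 0) {
      selection.filters[name.substr(filterPrefix.size())] = arg.second;
    }
    // Other arguments (e.g. pattern) are not part of the book selection.
  }
  return selection;
}
```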