Add tasks for multiple negated keywords and their optimized versions #258
Conversation
I'm unclear on your goal here. Since these queries only have negated clauses, they do not match anything. Are you focusing on the rewrite() cost?
One of our engineers in Amazon Product Search found that replacing a SHOULD clause having disjunctive negated keywords, e.g.:
@jpountz Do you mean to update the queries to something like
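For reference, the two query shapes being compared in this thread, `(-A -B -C -D)` versus `-(A B C D)`, can be built with Lucene's BooleanQuery API roughly as follows. This is a minimal sketch: the field and term names are placeholders, and the `MatchAllDocsQuery` positive clause is an assumption added because a purely negated BooleanQuery matches nothing (as noted above).

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class NegatedQueryForms {
  // Form 1: (-A -B -C -D) -- one MUST_NOT clause per keyword.
  static Query multipleMustNot(String field, String... keywords) {
    BooleanQuery.Builder b = new BooleanQuery.Builder();
    b.add(new MatchAllDocsQuery(), Occur.MUST); // positive clause; pure negation matches nothing
    for (String kw : keywords) {
      b.add(new TermQuery(new Term(field, kw)), Occur.MUST_NOT);
    }
    return b.build();
  }

  // Form 2: -(A B C D) -- a single MUST_NOT over one disjunction.
  static Query singleNegatedDisjunction(String field, String... keywords) {
    BooleanQuery.Builder inner = new BooleanQuery.Builder();
    for (String kw : keywords) {
      inner.add(new TermQuery(new Term(field, kw)), Occur.SHOULD);
    }
    BooleanQuery.Builder outer = new BooleanQuery.Builder();
    outer.add(new MatchAllDocsQuery(), Occur.MUST);
    outer.add(inner.build(), Occur.MUST_NOT);
    return outer.build();
  }
}
```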
OK, while sanity-checking the benchmark I found a mistake in the test I performed. For
Here we need to test the performance of query 1 against 5, or 2 against 5. I'll update the query task file and post the new results shortly. Thoughts on whether this makes sense?
I ran the benchmarks again with the modified unoptimized queries (just changed

Run 1:
Run 3:
Am I misreading your benchmark results? The Pct diff looks as if it is in the -1% to 2% range, and the p-values indicate no significant differences. For significance, we'd expect to see p-values < 0.05.
I was thinking more of writing the non-optimized queries as
Thanks for digging on this @shubhamvishu! +1 to get a

I thought Lucene would not even run pure negated queries anymore, so I'm confused how the benchy is even running, on either the baseline or the candidate.

Also, note that the QPS in your latest run are absurdly high (7K or 8K QPS) and not really believable/trustworthy. I'd rather see at most maybe a few hundred QPS :)
@msokolov Here I'm only comparing the raw QPS numbers for the 2 tasks, so baseline and candidate are exactly the same within one task or row (2 different representations of one query are executed by the 2 tasks). Hence we can safely ignore the p-value and pct diff.
This also seems like one option, and I ran benchmarks for it as well, but rewriting this way into a single negated clause does not seem to help in the benchmarks. I'll post the results shortly.
@jpountz I was actually thinking of the opposite case, i.e. rewriting multiple
I randomly sampled

Results for:
I'm still not happy with the 2-3K QPS :) Something seems amiss.

The benchmark writes detailed "results" files for each iteration -- can you peek at those and confirm your two forms of the same query are in fact getting the same top N hits? That would be a quick sanity check.

It might be that BMW is kicking in in some cases and not others? Maybe we can forcefully disable BMW if so -- we are trying to test the full iteration cost here (e.g. for a query that wants to do facet counting).

I think your case 2 above is a more "typical" comparison. Case 1 is causing
I suspect it's because the only scoring query produces a constant score. So in practice, Lucene will stop processing the query as soon as it collects 1,000 hits (
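For context, a sketch of how that threshold surfaces in the collector API, assuming a Lucene 8/9-style `TopScoreDocCollector` and a `searcher`/`query` already in scope; the 1,000 is the default total-hits threshold being discussed:

```java
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.TopScoreDocCollector;

// totalHitsThreshold = 1000: once 1,000 hits are collected, the collector
// only needs competitive hits. With a constant-scoring query nothing new
// can ever be competitive, so scorers may skip the rest of the index.
TopScoreDocCollector collector = TopScoreDocCollector.create(10, 1000);
searcher.search(query, collector);
TopDocs topDocs = collector.topDocs();
```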
Thanks @jpountz -- that makes sense. Since we are trying to measure whether the intersection / union cost of |
I did a grep in the benchmark logs. Here is what I got:
@mikemccand I see one of the queries for
My understanding here was that we are trying to rewrite the query in such a way that the 2 queries produce the same result but one is more optimized (we could confirm the overlap numbers). So why do we still care about running
Indeed, the 1001+ means BMW applied (see this nice blog post about it, thank you @jpountz!). So this means your first test (

(I do wonder why it's 1001+ and not 1000+, but probably there is a harmless ob1 somewhere, heh.)

Note that it's not stopping early -- the BMW optimization uses skipping to jump through the postings blocks to find possibly competitive (might crack into the top K) hits. It is an admissible optimization, meaning it is 100% correct (unless we have exciting bugs) and will always give the same top K hits as if the whole index was checked.

In your case the scores are quite synthetic (just 1f from
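(Side note on the display: the "+" corresponds to Lucene reporting the hit count as a lower bound rather than an exact value; a minimal sketch assuming the Lucene 8+ `TotalHits` API:)

```java
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.TotalHits;

TopDocs topDocs = searcher.search(query, 10);
if (topDocs.totalHits.relation == TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO) {
  // Counting stopped once remaining hits became non-competitive (e.g. BMW
  // skipping): the value is only a lower bound, rendered like "1001+".
  System.out.println(topDocs.totalHits.value + "+ hits (lower bound)");
} else {
  System.out.println(topDocs.totalHits.value + " hits (exact)");
}
```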
Hmm that's odd. Each specific query should have been executed N times by M threads, so you should see N * M instances of it in each iteration's results file. Can you post one of your results files for each of baseline and candidate?
OK, indeed that is the goal here (rewriting queries to a faster form). I guess I feel like case 2) is the more common real-world example? The "search for stuff but filter stuff out" use case.
Makes sense! Is there any straightforward way to disable BMW for
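(One knob that should work, if I'm not mistaken: raising the total-hits threshold to `Integer.MAX_VALUE` forces exact hit counting, so no hit is ever non-competitive and the skipping never engages. A sketch under that assumption:)

```java
// Exact counting: with an unreachable threshold, every hit stays
// "competitive" for counting purposes, so BMW-style skipping cannot apply.
TopScoreDocCollector collector = TopScoreDocCollector.create(10, Integer.MAX_VALUE);
searcher.search(query, collector);
```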
@mikemccand I uploaded one log file each for baseline and candidate in the recent commit (under

If we grep the log files for the text "OrNegated2Terms " or "OrNegated2TermsOpt ", I see the two tasks using different queries, though that query indeed has N*M (20) instances in the single log file. But what about the other queries of the task? If I'm right, there are 500 queries for each task, so I expected to find all of them, or at least more, but it looks like one query (random, out of the 500?) is picked for each iteration (out of 20) and runs 20 (N*M) times, i.e. 20 different queries in total? This is why I wasn't able to compare 1-1 for a single query. Very likely I'm misinterpreting something here.
I agree! The 2nd case looks like a more common use case compared to the 1st.
I ran benchmarks again and this time increased
These gains indeed look correct (identical hit counts from the X and XOpt tasks) and significant, especially for the N-term OR cases! Thanks @shubhamvishu. I thought we had a Lucene issue to explore having Lucene rewrite BQ clauses, e.g.
Actually, I'm not sure how the

Maybe we first explore rewriting the
So I changed the new tasks to count-only tasks, as was pointed out as an alternative in #265, and pushed the tasks file I used and the log files. For

Note: I tried writing
I don't remember coming across any open issue in the Lucene queue for this. I was thinking of doing that once we consolidate the findings here based on the benchmark results, and then pursuing it in Lucene. Or should I open one right away?
If the above approach of disabling BMW using |
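For reference, count-only execution sidesteps the issue entirely, since every matching document must be visited. A minimal sketch using `IndexSearcher.count` (the actual luceneutil task plumbing differs; the helper and its names below are just for illustration):

```java
import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

// Count-only: all matches are visited, so top-k skipping (BMW) never
// applies and the full iteration cost of the query is measured.
static void compareCounts(IndexSearcher searcher, Query unoptimized, Query optimized)
    throws IOException {
  int c1 = searcher.count(unoptimized);
  int c2 = searcher.count(optimized);
  // Sanity check: both forms should match exactly the same documents.
  if (c1 != c2) {
    throw new AssertionError("hit counts differ: " + c1 + " vs " + c2);
  }
}
```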
Did you mean that seemed NOT to do what we want? Likely this is a bug in

But also in your
Well,

Can you post one of your results files for each of baseline and candidate? I'm still confused why you see only one instance of each task in the benchmark output files... and could you also post your
@mikemccand Yes, it seems something is wrong with the way the task is parsed. I checked the query passed to
Here is how a task is mapped to a Lucene query and its equivalent hits:
The issue here seems to be with the
Great! So we could hope for some QPS improvement with this rewrite change, I guess. If the testing so far seems convincing enough, I could go ahead and prepare a PR for this in Apache Lucene, with the benchmarking methodology of maxing out
I have also attached both log files in the latest commit on this PR (they're under the folder
I'm running the benchmarks using
Hi @mikemccand @jpountz, I'm looking for a review of the above results. Any thoughts?
Looking for some thoughts here. cc @mikemccand @jpountz
Added 2 new tasks for multi-keyword negated queries, for performance comparison:
(-A -B -C -D)
-(A B C D)
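(For illustration only: the task names below are the ones added in this PR, but the terms are hypothetical placeholders, and the exact tasks-file syntax should be checked against the committed file:)

```
OrNegatedHighHigh: (-term1 -term2 -term3 -term4)
OrNegatedHighHighOpt: -(term1 term2 term3 term4)
```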
UPDATE: See the comments below for new, updated results.
Benchmark results:

Observation: OrNegatedHighHighOpt is more than 100% faster than OrNegatedHighHigh; or conversely, the current implementation is >50% slower.

Full results:
I'm working on fixing this with BQ.rewrite. I'll open an issue/PR in Lucene once I have something concrete, or I'll create one straight away. We can use the tasks added in this PR to test whether the rewrite change makes the performance of queries with many negated keywords comparable to their optimized version.
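A rough sketch of what such a rewrite could look like — this is not the actual BooleanQuery.rewrite change, just an illustration (against a Lucene 9.x-style API) of folding several MUST_NOT clauses into a single MUST_NOT over a disjunction:

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;

// Illustrative only: fold N MUST_NOT clauses into one MUST_NOT(SHOULD(...)).
static BooleanQuery mergeMustNotClauses(BooleanQuery in) {
  List<Query> negated = new ArrayList<>();
  BooleanQuery.Builder out = new BooleanQuery.Builder();
  out.setMinimumNumberShouldMatch(in.getMinimumNumberShouldMatch());
  for (BooleanClause clause : in.clauses()) {
    if (clause.getOccur() == Occur.MUST_NOT) {
      negated.add(clause.getQuery());
    } else {
      out.add(clause); // keep positive clauses as-is
    }
  }
  if (negated.size() > 1) {
    BooleanQuery.Builder disjunction = new BooleanQuery.Builder();
    for (Query q : negated) {
      disjunction.add(q, Occur.SHOULD);
    }
    out.add(disjunction.build(), Occur.MUST_NOT);
  } else if (negated.size() == 1) {
    out.add(negated.get(0), Occur.MUST_NOT);
  }
  return out.build();
}
```

Since MUST_NOT clauses do not contribute to scoring, the two forms should match and score identically; an actual implementation inside BooleanQuery.rewrite would also need to respect equals/hashCode and rewrite-loop conventions.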
@mikemccand: If this change looks good, maybe we could also add negated query tasks to tasks files other than wikimediumall?