[GITHUB-11915] Make Lucene smarter about long runs of matches via new API on DISI #12194

Open
wants to merge 21 commits into
base: main
from

Conversation

zacharymorn
Contributor

@zacharymorn zacharymorn commented Mar 8, 2023

This PR adds a new API to DISI to find or estimate long runs of matches, which higher-level code can then use to skip over a range of matches and speed up certain query tasks.

@zacharymorn zacharymorn requested a review from jpountz March 8, 2023 02:58
@zacharymorn zacharymorn changed the title [Discussion Only] Make Lucene smarter about long runs of matches via new API on DISI [GITHUB-11915] [Discussion Only] Make Lucene smarter about long runs of matches via new API on DISI Mar 8, 2023

while (++wordIndex < numWords) {
word = bits[wordIndex];
if (Long.bitCount(word) != Long.SIZE) { // there are 0s in this word
Member

do you really need to call bitcount instruction and compare to 64, or can you just do:

if (word != -1L) { // there are 0s in this word

Contributor Author

Ah yes! I wonder why I didn't think of it in the first place...will update them in the next commit

Contributor

There's also a test missing; see the comment below.

Contributor Author

Updated this one and a few more places to not use Long#bitCount.
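
To illustrate the `word != -1L` suggestion above, here is a minimal, self-contained sketch of a next-clear-bit scan over a `long[]`. It is not the code from this PR: the class name `NextClearBitSketch`, the method signature, and the `numBits` parameter are assumptions made for the example.

```java
final class NextClearBitSketch {
  /**
   * Returns the index of the first clear bit at or after {@code index},
   * or {@code numBits} if every remaining bit is set.
   * Assumes bits.length is at least ceil(numBits / 64).
   */
  static int nextClearBit(long[] bits, int numBits, int index) {
    if (index >= numBits) {
      return numBits;
    }
    int wordIndex = index >> 6;
    // Treat the bits below `index` in the first word as set so they are skipped.
    // Java masks a long shift distance to its low 6 bits, so this is index % 64.
    long word = bits[wordIndex] | ((1L << index) - 1);
    while (true) {
      if (word != -1L) { // there is at least one 0 in this word
        int candidate = (wordIndex << 6) + Long.numberOfTrailingZeros(~word);
        return Math.min(candidate, numBits);
      }
      if (++wordIndex == bits.length) {
        return numBits;
      }
      word = bits[wordIndex];
    }
  }
}
```

Comparing against `-1L` is a single comparison, whereas `Long.bitCount(word) != Long.SIZE` first counts the set bits and then compares; both detect the same condition.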

@uschindler
Contributor

There are no tests about correctness of those new BitSet methods for any implementation (Fixed, Sparse,...). Would it be possible to add them?

@zacharymorn
Contributor Author

> There are no tests about correctness of those new BitSet methods for any implementation (Fixed, Sparse,...). Would it be possible to add them?

Thanks for the review and comment, @uschindler! This PR is currently in draft state to facilitate discussion and make sure I'm on the right track. I will definitely add dedicated tests before converting it to a formal PR.

@zacharymorn zacharymorn marked this pull request as draft March 8, 2023 08:31
Contributor

@jpountz jpountz left a comment

Thanks for looking into it! Did you manage to observe some speedups with this change? You explored implementing this new API in several different places: BitSetIterator, doc-value iterator, postings, etc. and it's already a bit exhausting to review and will get worse when we add more tests. I think it would be helpful if we focused on a single thing for the initial PR that focuses on proving that this API is a good addition, adds good testing (e.g. enhancing AssertingScorer, CheckIndex and other similar classes to check that it is implemented correctly), and then implement the new API on other implementations of DocIdSetIterator in follow-up PRs.


/**
* Returns the next doc ID that may not be a match. Note that this API will essentially provide
* the following two guarantees:
Contributor

I would also add that it's illegal to call this method once the iterator is exhausted.

Contributor Author

Yeah, I have been wondering how to handle this case properly, hence the multiple NO_MORE_DOCS returned below. By making it illegal to call this method once the iterator is exhausted, do you mean we need to throw an exception here? As this condition may be pretty common, I'm wondering if we could give callers an easier way to detect and handle it. Is it OK if we always return the last non-matching doc (or maxDoc) in this scenario, and warn callers to check whether the iterator's current doc is NO_MORE_DOCS to detect it?

Contributor Author

While on this topic, I'm also wondering if we should adjust the behavior of BitSet#nextSetBit, as it returns NO_MORE_DOCS when there are no more set bits (as opposed to maxDoc):

/**
 * Returns the index of the first set bit starting at the index specified. {@link
 * DocIdSetIterator#NO_MORE_DOCS} is returned if there are no more set bits.
 */
public abstract int nextSetBit(int index);

In contrast, Java's BitSet#nextSetBit would return -1 in such a scenario.

Contributor Author

Never mind on the BitSet question above, I got myself confused.

Contributor

> By making it illegal to call this method once the iterator is exhausted, do you mean we need to throw an exception here?

No, I was thinking of just documenting the behavior as undefined, and adding assertions in AssertingScorer.

> Is it OK if we always return the last non-matching doc (or maxDoc) in this scenario, and warn callers to check whether the iterator's current doc is NO_MORE_DOCS to detect it?

Yes, exactly. This is the same for nextDoc or advance, it is illegal to call these methods when the current doc is NO_MORE_DOCS.

Contributor Author

Makes sense!
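
The contract agreed on in this thread (undefined behavior once the iterator is exhausted, with checks living in an asserting wrapper) could be sketched roughly as below. This is an illustration only: `RunIterator`, `ArrayIterator`, and `Asserting` are hypothetical stand-ins rather than Lucene classes, and the wrapper throws `IllegalStateException` here so the check is visible without enabling JVM assertions.

```java
final class ExhaustedContractSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Hypothetical stand-in for DocIdSetIterator plus the proposed method. */
  interface RunIterator {
    int docID();
    int nextDoc();
    int peekNextNonMatchingDocID();
  }

  /** Iterates a sorted array of doc IDs; runs are found by a linear scan. */
  static final class ArrayIterator implements RunIterator {
    private final int[] docs;
    private int i = -1;

    ArrayIterator(int... docs) {
      this.docs = docs;
    }

    public int docID() {
      return i < 0 ? -1 : (i < docs.length ? docs[i] : NO_MORE_DOCS);
    }

    public int nextDoc() {
      i++;
      return docID();
    }

    // Behavior is undefined unless the iterator is positioned on a doc.
    public int peekNextNonMatchingDocID() {
      int j = i;
      while (j + 1 < docs.length && docs[j + 1] == docs[j] + 1) {
        j++;
      }
      return docs[j] + 1;
    }
  }

  /** Wrapper that rejects calls made after the iterator is exhausted. */
  static final class Asserting implements RunIterator {
    private final RunIterator in;

    Asserting(RunIterator in) {
      this.in = in;
    }

    public int docID() {
      return in.docID();
    }

    public int nextDoc() {
      checkNotExhausted("nextDoc");
      return in.nextDoc();
    }

    public int peekNextNonMatchingDocID() {
      checkNotExhausted("peekNextNonMatchingDocID");
      return in.peekNextNonMatchingDocID();
    }

    private void checkNotExhausted(String method) {
      if (in.docID() == NO_MORE_DOCS) {
        throw new IllegalStateException(method + " called on an exhausted iterator");
      }
    }
  }
}
```

Callers would check `docID() != NO_MORE_DOCS` before peeking, the same way they already must before calling `nextDoc()` or `advance()`.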

@@ -82,6 +82,11 @@ public int advance(int target) throws IOException {
return doc;
}

@Override
public int peekNextNonMatchingDocID() {
return NO_MORE_DOCS;
Contributor

It should return maxDoc, since maxDoc would not be a match.

Contributor Author

Updated.


@Override
public int advance(int target) throws IOException {
return reqApproximation.advance(target);
Contributor

In general we like advance(docID()+1) to not perform too differently from nextDoc(), maybe we should have the peekNextNonMatchingDocID logic here too?

Contributor Author

Updated.
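
One way to keep `advance(docID() + 1)` and `nextDoc()` on the same code path is to route both through a single helper that jumps past excluded runs. The sketch below is a simplified illustration, not the PR's `ReqExclScorer`: `RunOfDocs`, `ArrayDocs`, and `ReqExcl` are hypothetical classes, the excluded clause is a single exact run, and no two-phase iteration is involved.

```java
final class ReqExclSkipSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Hypothetical excluded clause: a single run of matching docs [from, to). */
  static final class RunOfDocs {
    private final int from;
    private final int to;

    RunOfDocs(int from, int to) {
      this.from = from;
      this.to = to;
    }

    boolean matches(int doc) {
      return doc >= from && doc < to;
    }

    /** Only meaningful when matches(doc) is true for the doc being tested. */
    int peekNextNonMatchingDocID() {
      return to;
    }
  }

  /** Required clause over a sorted doc ID array; counts advance calls. */
  static final class ArrayDocs {
    private final int[] docs;
    private int i = -1;
    int calls = 0;

    ArrayDocs(int[] docs) {
      this.docs = docs;
    }

    int docID() {
      return i < 0 ? -1 : (i < docs.length ? docs[i] : NO_MORE_DOCS);
    }

    int advance(int target) {
      calls++;
      do {
        i++; // linear scan for brevity
      } while (i < docs.length && docs[i] < target);
      return docID();
    }
  }

  /** Required-minus-excluded iterator; nextDoc and advance share the skip logic. */
  static final class ReqExcl {
    private final ArrayDocs req;
    private final RunOfDocs excl;

    ReqExcl(ArrayDocs req, RunOfDocs excl) {
      this.req = req;
      this.excl = excl;
    }

    int nextDoc() {
      return toNonExcluded(req.advance(req.docID() + 1));
    }

    int advance(int target) {
      return toNonExcluded(req.advance(target));
    }

    private int toNonExcluded(int doc) {
      while (doc != NO_MORE_DOCS && excl.matches(doc)) {
        // Jump the required clause past the whole excluded run in one call.
        doc = req.advance(excl.peekNextNonMatchingDocID());
      }
      return doc;
    }
  }
}
```

With 20 required docs and an excluded run of `[5, 15)`, this sketch makes 12 `advance` calls on the required clause instead of the 21 a doc-by-doc loop would need.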

@zacharymorn
Contributor Author

Thanks @jpountz for the review and comment!

> Did you manage to observe some speedups with this change?

So far I have only been able to run wikimedium10m, and I see that the implementation has a slowdown of around 10% (listed below) for the full-text boolean queries OrXXXNotYYYY, due to the changes in ReqExclScorer and Lucene90PostingsReader (the facet tasks don't seem to exercise the changes, so their differences should just be random fluctuation). I'm still searching for existing benchmark tasks that can measure these targeted use cases:

> Is it actually common to have long runs of matches? For full-text indexes, maybe not so much, only stop words may have runs of adjacent matches. For string fields, this may happen if the field has a default value that is the value of most documents in the collection. Also it's possible for users to use index sorting in order to cluster similar documents together, which increases the likelihood to have long runs of adjacent matches.

Do you have any pointer which benchmark task I could potentially use? If there isn't one available, I could try to add some next.

                            Task    QPS baseline      StdDev   QPS my_modified_version      StdDev                Pct diff p-value
     BrowseRandomLabelTaxoFacets       39.06     (49.1%)       33.42     (45.6%)  -14.4% ( -73% -  157%) 0.335
            BrowseDateTaxoFacets       34.22     (29.5%)       30.49     (27.9%)  -10.9% ( -52% -   65%) 0.229
       BrowseDayOfYearTaxoFacets       34.30     (29.4%)       30.57     (27.9%)  -10.9% ( -52% -   65%) 0.230
                   OrNotHighHigh      426.33      (2.7%)      388.27      (1.6%)   -8.9% ( -12% -   -4%) 0.000
                   OrHighNotHigh      621.42      (2.7%)      573.32      (2.0%)   -7.7% ( -12% -   -3%) 0.000
                    OrHighNotMed      616.08      (3.8%)      573.20      (2.7%)   -7.0% ( -13% -    0%) 0.000
                    OrHighNotLow      562.98      (4.0%)      525.80      (3.3%)   -6.6% ( -13% -    0%) 0.000
                    OrNotHighMed      712.40      (2.6%)      672.88      (2.4%)   -5.5% ( -10% -    0%) 0.000
            HighTermTitleBDVSort       19.04      (7.9%)       18.73      (8.3%)   -1.6% ( -16% -   15%) 0.534
            HighIntervalsOrdered        1.89     (12.3%)        1.86     (15.0%)   -1.6% ( -25% -   29%) 0.719
           BrowseMonthTaxoFacets       32.23     (33.9%)       31.77     (33.6%)   -1.4% ( -51% -  100%) 0.895
                    OrNotHighLow     1719.50      (3.8%)     1696.85      (4.6%)   -1.3% (  -9% -    7%) 0.326
               HighTermTitleSort      202.79      (2.9%)      200.20      (2.8%)   -1.3% (  -6% -    4%) 0.155
                     AndHighHigh       51.74      (5.8%)       51.08      (5.4%)   -1.3% ( -11% -   10%) 0.475
                          Fuzzy1       59.04      (2.7%)       58.36      (3.2%)   -1.2% (  -6% -    4%) 0.214
                         MedTerm     1364.68      (4.4%)     1349.31      (3.4%)   -1.1% (  -8% -    6%) 0.362
                        Wildcard      314.79      (2.8%)      311.35      (3.5%)   -1.1% (  -7% -    5%) 0.277
                         LowTerm     2087.86      (3.2%)     2065.24      (3.8%)   -1.1% (  -7% -    6%) 0.334
             MedIntervalsOrdered       22.66      (8.6%)       22.42     (10.4%)   -1.0% ( -18% -   19%) 0.730
                        PKLookup      331.54      (2.9%)      328.12      (2.6%)   -1.0% (  -6% -    4%) 0.242
             LowIntervalsOrdered      161.90      (9.5%)      160.23     (11.6%)   -1.0% ( -20% -   22%) 0.758
                          Fuzzy2      100.43      (1.5%)       99.40      (3.0%)   -1.0% (  -5% -    3%) 0.169
                         Respell       88.01      (2.0%)       87.27      (2.4%)   -0.8% (  -5% -    3%) 0.223
            BrowseDateSSDVFacets        4.89     (21.4%)        4.85     (20.2%)   -0.8% ( -34% -   51%) 0.905
     BrowseRandomLabelSSDVFacets       19.23      (7.1%)       19.09      (6.2%)   -0.7% ( -13% -   13%) 0.728
                      AndHighMed      114.49      (5.6%)      113.77      (5.0%)   -0.6% ( -10% -   10%) 0.708
                         Prefix3      376.91      (1.4%)      374.65      (2.5%)   -0.6% (  -4% -    3%) 0.348
               HighTermMonthSort     4250.83      (4.2%)     4227.31      (3.6%)   -0.6% (  -7% -    7%) 0.653
                       OrHighMed      209.50      (6.2%)      208.61      (3.4%)   -0.4% (  -9% -    9%) 0.787
                       LowPhrase       89.33      (3.0%)       88.96      (2.2%)   -0.4% (  -5% -    4%) 0.623
       BrowseDayOfYearSSDVFacets       24.82     (10.9%)       24.75     (11.2%)   -0.3% ( -20% -   24%) 0.940
         AndHighMedDayTaxoFacets      158.35      (1.6%)      158.08      (1.8%)   -0.2% (  -3% -    3%) 0.756
                        HighTerm     2076.24      (3.7%)     2074.83      (2.9%)   -0.1% (  -6% -    6%) 0.949
        AndHighHighDayTaxoFacets       14.81      (2.4%)       14.81      (2.9%)   -0.0% (  -5% -    5%) 0.992
                    HighSpanNear       11.02      (2.0%)       11.02      (2.4%)    0.0% (  -4% -    4%) 0.951
                     LowSpanNear      178.01      (1.7%)      178.17      (1.8%)    0.1% (  -3% -    3%) 0.864
                       OrHighLow      473.27      (6.2%)      473.94      (3.3%)    0.1% (  -8% -   10%) 0.929
                      TermDTSort      230.93      (4.7%)      231.36      (3.3%)    0.2% (  -7% -    8%) 0.885
                 MedSloppyPhrase       20.69      (3.0%)       20.76      (3.0%)    0.3% (  -5% -    6%) 0.721
                     MedSpanNear       80.38      (2.2%)       80.66      (2.1%)    0.3% (  -3% -    4%) 0.618
                       MedPhrase       53.03      (1.8%)       53.23      (1.8%)    0.4% (  -3% -    3%) 0.520
                      AndHighLow     2127.04      (4.1%)     2136.55      (3.4%)    0.4% (  -6% -    8%) 0.706
           HighTermDayOfYearSort      594.24      (6.9%)      597.03      (6.3%)    0.5% ( -11% -   14%) 0.822
                      OrHighHigh       53.90      (5.7%)       54.21      (4.0%)    0.6% (  -8% -   10%) 0.709
            MedTermDayTaxoFacets       41.47      (1.1%)       41.75      (2.8%)    0.7% (  -3% -    4%) 0.311
                      HighPhrase      119.60      (2.0%)      120.53      (1.8%)    0.8% (  -2% -    4%) 0.195
                 LowSloppyPhrase      166.00      (5.7%)      167.36      (5.5%)    0.8% (  -9% -   12%) 0.644
                HighSloppyPhrase       43.02      (5.5%)       43.40      (5.2%)    0.9% (  -9% -   12%) 0.610
           BrowseMonthSSDVFacets       24.48      (8.6%)       24.85     (11.6%)    1.5% ( -17% -   23%) 0.638
          OrHighMedDayTaxoFacets        7.71      (3.7%)        7.86      (6.0%)    1.9% (  -7% -   12%) 0.218
                          IntNRQ      115.47     (14.3%)      118.04     (13.2%)    2.2% ( -22% -   34%) 0.610

> You explored implementing this new API in several different places: BitSetIterator, doc-value iterator, postings, etc. and it's already a bit exhausting to review and will get worse when we add more tests. I think it would be helpful if we focused on a single thing for the initial PR that focuses on proving that this API is a good addition, adds good testing, and then implement the new API on other implementations of DocIdSetIterator in follow-up PRs.

For sure. Once I'm able to benchmark this and observe good speed up & we are good with the API, I will break up this PR into smaller pieces.

Note: currently these two randomized tests fail due to the implementation in IndexedDISI for doc values:

1. gradlew :lucene:grouping:test --tests "org.apache.lucene.search.grouping.TestDoubleRangeGroupSelector.testGroupHeads" -Ptests.jvms=6 "-Ptests.jvmargs=-XX:TieredStopAtLevel=1 -XX:+UseParallelGC -XX:ActiveProcessorCount=1" -Ptests.seed=62C97A77054E21AA -Ptests.gui=false -Ptests.file.encoding=ISO-8859-1

2. gradlew :lucene:grouping:test --tests "org.apache.lucene.search.grouping.TestLongRangeGroupSelector.testGroupHeads" -Ptests.jvms=6 "-Ptests.jvmargs=-XX:TieredStopAtLevel=1 -XX:+UseParallelGC -XX:ActiveProcessorCount=1" -Ptests.seed=62C97A77054E21AA -Ptests.gui=false -Ptests.file.encoding=ISO-8859-1
@gsmiller
Contributor

gsmiller commented Mar 9, 2023

Fun! Thanks for picking this up @zacharymorn. What about updating FixedBitSet#or(disi) to use this? That's used when rewriting MultiTermQuery instances, and I would think we'd see a performance improvement to prefix3 and wildcard benchmark tasks. I guess the thinking there would be to "flip" an entire long to -1L at once if a run of 64 docs is included in a dense DISI sent to the or (and then subsequently advance beyond the "dense run" in the DISI).

This idea will probably have diminishing returns after #12055, since we now prioritize only building the more sparse iterators into the bitset upfront, but it could still help. If you really want to look for impact, try switching RegexpQuery and PrefixQuery to use CONSTANT_SCORE_REWRITE instead of CONSTANT_SCORE_BLENDED_REWRITE (in their constructors). That should highlight an impact in the benchmark.
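
The "flip an entire long to -1L at once" idea could look roughly like the sketch below, which ORs an iterator into a `long[]` one run at a time. It is a hypothetical illustration, not the change made in this PR or its follow-up: `RunIterator`, `RunsIterator`, `setRange`, and `or` are names invented here, with `setRange` doing a word-wise range fill.

```java
import java.util.Arrays;

final class OrWithRunsSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Hypothetical stand-in for DocIdSetIterator plus the proposed method. */
  interface RunIterator {
    int nextDoc();

    int advance(int target);

    /** Docs from the current doc up to (excluding) the result all match. */
    int peekNextNonMatchingDocID();
  }

  /** Iterator over sorted, disjoint, non-empty [from, to) runs of doc IDs. */
  static final class RunsIterator implements RunIterator {
    private final int[][] runs;
    private int r = 0;
    private int doc = -1;

    RunsIterator(int[][] runs) {
      this.runs = runs;
    }

    public int nextDoc() {
      return advance(doc + 1);
    }

    public int advance(int target) {
      while (r < runs.length && runs[r][1] <= target) {
        r++;
      }
      doc = (r == runs.length) ? NO_MORE_DOCS : Math.max(target, runs[r][0]);
      return doc;
    }

    public int peekNextNonMatchingDocID() {
      return runs[r][1]; // undefined once exhausted
    }
  }

  /** Sets bits [from, to), filling whole 64-bit words with -1L where possible. */
  static void setRange(long[] bits, int from, int to) {
    if (from >= to) {
      return;
    }
    int startWord = from >> 6;
    int endWord = (to - 1) >> 6;
    long startMask = -1L << from; // bits (from % 64)..63 of the first word
    long endMask = -1L >>> -to; // bits 0..((to - 1) % 64) of the last word
    if (startWord == endWord) {
      bits[startWord] |= (startMask & endMask);
      return;
    }
    bits[startWord] |= startMask;
    Arrays.fill(bits, startWord + 1, endWord, -1L);
    bits[endWord] |= endMask;
  }

  /** ORs every doc of the iterator into bits, one run at a time. */
  static void or(long[] bits, RunIterator it) {
    int doc = it.nextDoc();
    while (doc != NO_MORE_DOCS) {
      int end = it.peekNextNonMatchingDocID();
      setRange(bits, doc, end);
      doc = it.advance(end); // skip past the run instead of visiting each doc
    }
  }
}
```

Whether this wins in practice depends on how cheaply the iterator can report its runs, which is the open question in this thread.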

@gsmiller
Contributor

Oh and to clarify on my above comment, I’m not trying to create “scope creep” here. In fact, +1 to Adrien’s comment on limiting the scope of the initial PR. Just trying to suggest areas that might be good candidates for seeing initial impact.

@zacharymorn
Contributor Author

Thanks @gsmiller for your review and suggestions!

> What about updating FixedBitSet#or(disi) to use this? That's used when rewriting MultiTermQuery instances, and I would think we'd see a performance improvement to prefix3 and wildcard benchmark tasks. I guess the thinking there would be to "flip" an entire long to -1L at once if a run of 64 docs is included in a dense DISI sent to the or (and then subsequently advance beyond the "dense run" in the DISI).
>
> This idea will probably have diminishing returns after #12055, since we now prioritize only building the more sparse iterators into the bitset upfront, but it could still help. If you really want to look for impact, try switching RegexpQuery and PrefixQuery to use CONSTANT_SCORE_REWRITE instead of CONSTANT_SCORE_BLENDED_REWRITE (in their constructors). That should highlight an impact in the benchmark.

I think @mdmarshmallow might be working on this as per #11915 (comment). As part of this PR, I've also added a new BitSet#nextClearBit API, just like the one in JDK's BitSet, which might be useful here as well.

@mdmarshmallow
Contributor

> I think @mdmarshmallow might be working on this as per #11915 (comment). As part of this PR, I've also added a new BitSet#nextClearBit API, just like the one in JDK's BitSet, which might be useful here as well.

Yeah I am working on it! Hopefully should be done by today as it's not that big of a change :)

@gsmiller
Contributor

Woohoo! Thanks @zacharymorn / @mdmarshmallow! I suspect we may not really see any benefit though if the DISI can only expose the next non-matching doc within its current block. I think the real advantage here would come from being able to actually skip blocks in the DISI, which would rely on knowing that there are actually multiple consecutive "dense" blocks in a DISI. But... if our only way to know there are multiple consecutive "dense" blocks involves decoding those blocks anyway, maybe there isn't much gain to be had? Hmm... not sure. Excited to see what we learn though!

@zacharymorn
Contributor Author

Thanks @mdmarshmallow for working on it! Btw, I just pushed a commit (f78182b) that fixed some bugs identified by randomized tests, you may want to pull that for your work especially if you are using BitSet#nextClearBit method.

> I suspect we may not really see any benefit though if the DISI can only expose the next non-matching doc within its current block. I think the real advantage here would come from being able to actually skip blocks in the DISI, which would rely on knowing that there are actually multiple consecutive "dense" blocks in a DISI. But... if our only way to know there are multiple consecutive "dense" blocks involves decoding those blocks anyway, maybe there isn't much gain to be had? Hmm... not sure. Excited to see what we learn though!

@gsmiller Yeah, I'm guessing that as well, especially for postings and sparse / dense blocks, as it would take at least one pass to identify the next candidate. I have also tried caching the result, and would like to see whether that helps and how it performs under benchmark.

@mdmarshmallow
Contributor

Yeah I was doing some of my own debugging and saw some of those issues. I think this fixed a decent amount of the issues I was seeing but I'm still seeing problems with some tests. I'm not sure if it's an issue with my code or not though, so I'll need to dig a bit deeper.

…able. Caller should handle its result accordingly.
@mdmarshmallow
Contributor

Ok so I did some more investigation, I think there might be a bug with Lucene90PostingsReader.BlockDocsEnum#peekNextNonMatchingDocID. I haven't looked super deeply into it yet, but I can post the patch/seed/test if you want to look into it as well @zacharymorn.

@zacharymorn
Contributor Author

> Ok so I did some more investigation, I think there might be a bug with Lucene90PostingsReader.BlockDocsEnum#peekNextNonMatchingDocID. I haven't looked super deeply into it yet, but I can post the patch/seed/test if you want to look into it as well @zacharymorn.

Hmm, that seems strange, as I would think this API won't have a direct impact on FixedBitSet#or. Where do you see the error? But yeah, feel free to upload a patch/test, or post a comment in the diff for changes that you suspect might be wrong.

@mdmarshmallow
Contributor

Sorry for the delayed response, so I saw it in my optimized version of BitSet#or that does use the peekNextNonMatchingDocID API. I also found the bug, it turns out we just weren't resetting the lastNonMatchingDoc in the reset() function. I created a pull request with my BitSet#or changes and the (one-liner) bug fix.

I'm also planning on benchmarking this and will post here when I do.

@mdmarshmallow
Contributor

mdmarshmallow commented Mar 15, 2023

I tested with wikimedium10m. Looks like my change caused the Prefix3 test to slow down.. not sure why.

Edit: OK, so it turns out I hadn't rebased the changes. These are the actual perf numbers (I deleted the old ones to avoid cluttering up the comments):

                            Task    QPS baseline      StdDev   QPS my_modified_version      StdDev                Pct diff p-value
     BrowseRandomLabelTaxoFacets       14.18     (20.1%)       13.26     (22.7%)   -6.5% ( -41% -   45%) 0.337
                    OrNotHighMed      227.42      (4.3%)      213.71      (3.1%)   -6.0% ( -12% -    1%) 0.000
                   OrHighNotHigh      238.76      (3.5%)      224.45      (4.0%)   -6.0% ( -13% -    1%) 0.000
                   OrNotHighHigh      200.78      (3.2%)      188.91      (4.2%)   -5.9% ( -12% -    1%) 0.000
       BrowseDayOfYearTaxoFacets       22.11     (23.5%)       20.84     (27.1%)   -5.7% ( -45% -   58%) 0.475
           BrowseMonthTaxoFacets       20.08     (19.9%)       18.93     (28.2%)   -5.7% ( -44% -   52%) 0.457
                    OrHighNotMed      248.31      (3.5%)      235.74      (4.6%)   -5.1% ( -12% -    3%) 0.000
            BrowseDateTaxoFacets       20.79     (22.9%)       19.74     (26.7%)   -5.1% ( -44% -   57%) 0.520
                         Prefix3      212.93      (1.3%)      202.71      (1.7%)   -4.8% (  -7% -   -1%) 0.000
                    OrHighNotLow      202.41      (3.4%)      192.74      (4.8%)   -4.8% ( -12% -    3%) 0.000
                        Wildcard       28.39      (3.2%)       27.37      (2.8%)   -3.6% (  -9% -    2%) 0.000
                     AndHighHigh       50.48      (6.0%)       48.76      (5.1%)   -3.4% ( -13% -    8%) 0.055
            BrowseDateSSDVFacets        2.99     (16.5%)        2.92     (17.2%)   -2.3% ( -30% -   37%) 0.662
               HighTermTitleSort       72.55      (6.5%)       70.89      (3.5%)   -2.3% ( -11% -    8%) 0.167
                HighSloppyPhrase       14.07      (1.7%)       13.76      (4.5%)   -2.2% (  -8% -    4%) 0.042
                          IntNRQ       62.38      (4.3%)       61.02      (5.0%)   -2.2% ( -10% -    7%) 0.137
       BrowseDayOfYearSSDVFacets       11.06     (15.8%)       10.82     (13.2%)   -2.1% ( -26% -   31%) 0.642
                        HighTerm      320.61      (3.8%)      314.19      (3.1%)   -2.0% (  -8% -    5%) 0.067
                      AndHighMed      162.13      (4.1%)      158.93      (3.5%)   -2.0% (  -9% -    5%) 0.103
                    OrNotHighLow      860.40      (3.2%)      846.81      (4.3%)   -1.6% (  -8% -    6%) 0.185
                         MedTerm      340.36      (4.3%)      335.26      (3.1%)   -1.5% (  -8% -    6%) 0.204
                      OrHighHigh       23.30      (3.7%)       22.96      (2.9%)   -1.5% (  -7% -    5%) 0.154
                 MedSloppyPhrase       88.56      (1.8%)       87.31      (1.9%)   -1.4% (  -5% -    2%) 0.018
                       OrHighMed       93.85      (3.7%)       92.65      (3.7%)   -1.3% (  -8% -    6%) 0.271
            HighIntervalsOrdered       21.52      (5.3%)       21.26      (6.7%)   -1.2% ( -12% -   11%) 0.540
                          Fuzzy2       55.74      (2.0%)       55.21      (1.5%)   -1.0% (  -4% -    2%) 0.080
             MedIntervalsOrdered       34.75      (4.9%)       34.43      (6.2%)   -0.9% ( -11% -   10%) 0.600
           HighTermDayOfYearSort      214.31      (2.3%)      212.66      (2.2%)   -0.8% (  -5% -    3%) 0.284
                       MedPhrase      154.98      (3.1%)      153.88      (3.1%)   -0.7% (  -6% -    5%) 0.468
                      TermDTSort       91.27      (2.8%)       90.63      (2.9%)   -0.7% (  -6% -    5%) 0.430
             LowIntervalsOrdered       29.95      (3.3%)       29.74      (4.4%)   -0.7% (  -8% -    7%) 0.574
                        PKLookup      143.41      (3.7%)      142.45      (2.4%)   -0.7% (  -6% -    5%) 0.496
                     LowSpanNear       29.74      (3.7%)       29.56      (4.3%)   -0.6% (  -8% -    7%) 0.630
                          Fuzzy1       93.80      (2.0%)       93.26      (2.6%)   -0.6% (  -5% -    4%) 0.427
                       OrHighLow      308.94      (3.4%)      307.45      (2.6%)   -0.5% (  -6% -    5%) 0.616
                      HighPhrase      170.54      (2.2%)      169.72      (2.0%)   -0.5% (  -4% -    3%) 0.467
                 LowSloppyPhrase       19.79      (1.3%)       19.71      (1.8%)   -0.4% (  -3% -    2%) 0.434
                       LowPhrase       81.06      (3.1%)       80.78      (3.3%)   -0.3% (  -6% -    6%) 0.733
                    HighSpanNear       17.28      (2.5%)       17.23      (2.9%)   -0.3% (  -5% -    5%) 0.708
               HighTermMonthSort     1708.53      (3.5%)     1705.62      (4.1%)   -0.2% (  -7% -    7%) 0.888
                         LowTerm      472.42      (4.8%)      471.79      (4.5%)   -0.1% (  -8% -    9%) 0.928
     BrowseRandomLabelSSDVFacets        6.68      (2.7%)        6.68      (2.6%)   -0.0% (  -5% -    5%) 0.959
                     MedSpanNear       58.66      (1.9%)       58.77      (1.8%)    0.2% (  -3% -    3%) 0.762
           BrowseMonthSSDVFacets       10.94      (1.5%)       10.96      (0.7%)    0.2% (  -1% -    2%) 0.533
            MedTermDayTaxoFacets       15.84      (4.2%)       15.87      (3.6%)    0.2% (  -7% -    8%) 0.849
                         Respell       40.39      (2.7%)       40.63      (2.3%)    0.6% (  -4% -    5%) 0.449
        AndHighHighDayTaxoFacets        9.34      (3.2%)        9.41      (3.3%)    0.7% (  -5% -    7%) 0.473
         AndHighMedDayTaxoFacets       80.88      (3.9%)       81.77      (3.1%)    1.1% (  -5% -    8%) 0.323
          OrHighMedDayTaxoFacets        7.78      (4.1%)        7.87      (5.1%)    1.2% (  -7% -   10%) 0.428
            HighTermTitleBDVSort       11.87      (8.2%)       12.01      (8.0%)    1.2% ( -13% -   18%) 0.651
                      AndHighLow      722.76      (3.5%)      731.28      (3.8%)    1.2% (  -5% -    8%) 0.306

There doesn't seem to be much improvement either, but I checked the CPU profiles and didn't see very many samples of BitSet#or, so that makes sense. When I took a diff, I saw that the BitSet#or calls increased by 20% (not sure why) but the nextDoc() calls only increased by 4%, so some hits are being skipped, which means the change seems to be doing something.

@jpountz
Contributor

jpountz commented Mar 15, 2023

> Do you have any pointer which benchmark task I could potentially use? If there isn't one available, I could try to add some next.

Maybe we could try to leverage the geonames dataset (there's a few benchmarks for it in lucene-util), which has a few low-cardinality fields like the time zone or country. Then enable index sorting on these fields. And make sure we're getting performance when using the time zone in required or excluded clauses?

@zacharymorn
Contributor Author

> Sorry for the delayed response, so I saw it in my optimized version of BitSet#or that does use the peekNextNonMatchingDocID API. I also found the bug, it turns out we just weren't resetting the lastNonMatchingDoc in the reset() function. I created a pull request with my BitSet#or changes and the (one-liner) bug fix.
>
> I'm also planning on benchmarking this and will post here when I do.

Thanks @mdmarshmallow for the PR and benchmark results. I have left a few comments there. I feel the reason you are not seeing much change in the benchmark could be that the implementation in BitSet#or might have been overridden by subclasses, and/or the implementations in the subclasses did not utilize their specialized data structures?

@zacharymorn
Contributor Author

> Maybe we could try to leverage the geonames dataset (there's a few benchmarks for it in lucene-util), which has a few low-cardinality fields like the time zone or country. Then enable index sorting on these fields. And make sure we're getting performance when using the time zone in required or excluded clauses?

Thanks @jpountz for the suggestion! I actually did something similar by modifying the benchmarking code to support sorting the index by the month field, which is a KeywordField, and creating a corresponding query with the following syntax to exercise the ReqExclScorer - Posting - DocValue path that I have changed:

ReqExclWithDocValues: newbenchmarktest//names docvalues:-month:[Jan TO Aug]

However, while I was able to run this query, the test actually failed when verifying top matches and scores between the baseline and the modified code. Upon further digging, I realized there is a potential issue with this API idea: Lucene does a lot of two-phase iteration, and a two-phase iterator's approximation may provide a superset of the actual matches. If we were to use this API to find and skip over a bunch of doc IDs from the approximation, wouldn't the result be inaccurate?

For example, in the skippingReqApproximation DISI below, which I put inside ReqExclScorer, exclApproximation.peekNextNonMatchingDocID() may report an inaccurate, longer run of matches, because the approximation provides a superset of the matches. If we were to confirm the boundary by actually checking matches on its iterator, we would basically fall back to a linear scan of the iterator, which defeats the purpose of this new API?

final DocIdSetIterator skippingReqApproximation =
        new DocIdSetIterator() {
          @Override
          public int docID() {}

          @Override
          public int nextDoc() throws IOException {}

          @Override
          public int advance(int target) throws IOException {
            // This exclNonMatchingDoc could be inaccurate, as exclApproximation provides
            // a superset of the matches of its underlying iterator.
            int exclNonMatchingDoc = exclApproximation.peekNextNonMatchingDocID();

            if (exclApproximation.docID() < target
                && target < exclNonMatchingDoc) {
              return reqApproximation.advance(exclNonMatchingDoc);
            } else {
              return reqApproximation.advance(target);
            }
          }

          @Override
          public long cost() {}
        };

Please let me know if you have any suggestion on this.
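
The concern above can be reproduced with a toy model, sketched below under stated assumptions: the required clause matches every doc in `[0, maxDoc)`, the excluded clause's approximation is one run `[approxFrom, approxTo)`, and a `confirmed` predicate plays the role of the two-phase `matches()` check. The class and method names are hypothetical and none of this is Lucene code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

final class ApproximationSkipPitfall {
  /** Docs kept when each excluded candidate is verified by the confirmed check. */
  static List<Integer> exact(int maxDoc, int approxFrom, int approxTo, IntPredicate confirmed) {
    List<Integer> kept = new ArrayList<>();
    for (int doc = 0; doc < maxDoc; doc++) {
      boolean inApproximation = doc >= approxFrom && doc < approxTo;
      boolean excluded = inApproximation && confirmed.test(doc);
      if (!excluded) {
        kept.add(doc);
      }
    }
    return kept;
  }

  /** Docs kept when the whole approximation run is skipped without verification. */
  static List<Integer> skippingOnApproximation(int maxDoc, int approxFrom, int approxTo) {
    List<Integer> kept = new ArrayList<>();
    int doc = 0;
    while (doc < maxDoc) {
      if (doc >= approxFrom && doc < approxTo) {
        doc = approxTo; // jump to the approximation's "next non-matching" doc
      } else {
        kept.add(doc);
        doc++;
      }
    }
    return kept;
  }
}
```

With `maxDoc = 8`, an approximation run of `[2, 6)`, and only even docs confirmed, the exact result keeps docs 3 and 5 while the skipping variant drops them, which matches the mismatch in top hits described above.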

@zacharymorn
Contributor Author

Hmm, note that the actual QPS is varying quite a bit every time. In your luceneutil run, are you fixing the random seed so the same queries are used every time?

Yeah indeed. I didn't fix the random seed during my luceneutil runs, and thus the results vary a lot as they may depend on the index and queries under test.

It is odd that PKLookup performance drops too.

I ran a few more tests for this and have some interesting findings:

No changes (comparing baseline with baseline):

Task: AndHighNotMonth: +its -monthPostings:apr #  freq=1160703

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                 AndHighNotMonth       62.41      (9.4%)       62.45      (7.7%)    0.1% ( -15% -   18%) 0.979
                        PKLookup      176.62     (28.2%)      177.03     (33.2%)    0.2% ( -47% -   85%) 0.981
Task: AndHighNotMonth: +its -monthPostings:apr #  freq=1160703

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                        PKLookup      175.53     (25.3%)      166.38     (26.2%)   -5.2% ( -45% -   62%) 0.522
                 AndHighNotMonth       60.36     (17.1%)       62.29      (9.0%)    3.2% ( -19% -   35%) 0.459

PKLookup seems to vary a lot as well, even when there are no changes.

With changes (comparing modified with baseline), while also varying the task query:

Task: AndHighNotMonth: +its -monthPostings:apr #  freq=1160703

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                        PKLookup      182.12     (28.8%)       84.26      (8.4%)  -53.7% ( -70% -  -23%) 0.000
                 AndHighNotMonth       64.22      (8.4%)      160.51     (59.1%)  149.9% (  75% -  237%) 0.000
Task: AndHighNotMonth: +its -monthPostings:jan #  freq=1160703

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                        PKLookup       81.40     (17.4%)       91.44     (45.6%)   12.3% ( -43% -   91%) 0.258
                 AndHighNotMonth      116.74      (9.2%)      160.54     (45.8%)   37.5% ( -15% -  101%) 0.000
Task: AndHighNotMonth: +its -monthPostings:may #  freq=1160703

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                        PKLookup       80.18      (6.3%)       74.90      (9.4%)   -6.6% ( -20% -    9%) 0.009
                 AndHighNotMonth       92.19     (12.6%)      144.56     (23.6%)   56.8% (  18% -  106%) 0.000

No task, with only PKLookup run:

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                        PKLookup      128.55     (27.2%)      142.59     (36.9%)   10.9% ( -41% -  103%) 0.286

In addition, I noticed that adding the -Xbatch JVM argument actually makes the -50% slowdown go away (and also boosts PKLookup's QPS):

localconstants.py

if 'JAVA_EXE' not in globals():
    JAVA_EXE = 'java'
if 'JAVAC_EXE' not in globals():
    JAVAC_EXE = 'javac'
if 'JAVA_COMMAND' not in globals():
    JAVA_COMMAND = '%s -Xbatch' % JAVA_EXE
Task: AndHighNotMonth: +its -monthPostings:apr #  freq=1160703

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                        PKLookup      328.59     (10.2%)      347.16      (7.2%)    5.7% ( -10% -   25%) 0.043
                 AndHighNotMonth       60.21      (5.4%)      160.46     (41.8%)  166.5% ( 113% -  225%) 0.000

I suspect it is indeed JVM compilation that's causing the difference. Below is the full JVM command line, produced with the modified localconstants.py above and printed by the benchmark, in case it is useful:

java -Xbatch -XX:StartFlightRecording=dumponexit=true,maxsize=250M,settings=/Users/xichen/IdeaProjects/benchmarks/util/src/python/profiling.jfc,filename=/Users/xichen/IdeaProjects/benchmarks/logs/bench-search-baseline_vs_patch-my_modified_version-19.jfr -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -classpath /Users/xichen/IdeaProjects/benchmarks/lucene_candidate/lucene/core/build/libs/lucene-core-10.0.0-SNAPSHOT.jar:/Users/xichen/IdeaProjects/benchmarks/lucene_candidate/lucene/sandbox/build/classes/java/main:/Users/xichen/IdeaProjects/benchmarks/lucene_candidate/lucene/misc/build/classes/java/main:/Users/xichen/IdeaProjects/benchmarks/lucene_candidate/lucene/facet/build/classes/java/main:/Users/xichen/IdeaProjects/benchmarks/lucene_candidate/lucene/analysis/common/build/classes/java/main:/Users/xichen/IdeaProjects/benchmarks/lucene_candidate/lucene/analysis/icu/build/classes/java/main:/Users/xichen/IdeaProjects/benchmarks/lucene_candidate/lucene/queryparser/build/classes/java/main:/Users/xichen/IdeaProjects/benchmarks/lucene_candidate/lucene/grouping/build/classes/java/main:/Users/xichen/IdeaProjects/benchmarks/lucene_candidate/lucene/suggest/build/classes/java/main:/Users/xichen/IdeaProjects/benchmarks/lucene_candidate/lucene/highlighter/build/classes/java/main:/Users/xichen/IdeaProjects/benchmarks/lucene_candidate/lucene/codecs/build/classes/java/main:/Users/xichen/IdeaProjects/benchmarks/lucene_candidate/lucene/queries/build/classes/java/main:/Users/xichen/.gradle/caches/modules-2/files-2.1/com.carrotsearch/hppc/0.9.1/4bf4c51e06aec600894d841c4c004566b20dd357/hppc-0.9.1.jar:/Users/xichen/IdeaProjects/benchmarks/util/lib/HdrHistogram.jar:/Users/xichen/IdeaProjects/benchmarks/util/build perf.SearchPerfTest -dirImpl MMapDirectory -indexPath 
/Users/xichen/IdeaProjects/benchmarks/indices/wikimedium10m.lucene_baseline.facets.taxonomy:Date.taxonomy:Month.taxonomy:DayOfYear.sortedset:Date.sortedset:Month.sortedset:DayOfYear.Lucene90.Lucene90.dvfields.sort=month:custom.nd10M -facets taxonomy:Date;Date -facets taxonomy:Month;Month -facets taxonomy:DayOfYear;DayOfYear -facets sortedset:Date;Date -facets sortedset:Month;Month -facets sortedset:DayOfYear;DayOfYear -analyzer StandardAnalyzer -taskSource /Users/xichen/IdeaProjects/benchmarks/util/tasks/wikimedium.10M.nostopwords.tasks -searchThreadCount 2 -taskRepeatCount 20 -field body -tasksPerCat 1 -staticSeed -2249101 -seed -4093553 -similarity BM25Similarity -commit multi -hiliteImpl FastVectorHighlighter -log /Users/xichen/IdeaProjects/benchmarks/logs/baseline_vs_patch.my_modified_version.19 -topN 100 -pk

In terms of code, PKLookup will execute this section of modified code when it's doing doc enumeration, but reverting the changes there didn't solve the issue.

@mikemccand
Member

PKLookup seems to vary a lot as well, even when there are no changes.

I wonder if luceneutil maybe has a bug where PKLookup task is not using the specified random seed to derive which IDs it looks up? Indeed I have seen it be noisy in the past (not just for you)...

In addition, I noticed that adding the -Xbatch JVM argument actually makes the -50% slowdown go away (and also boosts PKLookup's QPS):

Thanks for testing this. We've debated the merits of disabling background compilation (-Xbatch) in the past, but decided it's too risky since nobody actually runs this way in production so the results would not necessarily reflect production impact. It is indeed an interesting data point and does seem to point to "hotspot compilation noise" as the source of the wide differences.

Though, I would also expect that as you vary the particular query (apr, jan, may on the negated clause) that the gains should be quite different? It depends heavily on how the postings fall into long runs or not in the index? Though, the line file docs for luceneutil are randomly sorted, so there should not be a correlation by time with Lucene's docid.

In terms of code, PKLookup will execute this section of modified code when it's doing doc enumeration, but reverting the changes there didn't solve the issue.

OK thanks for testing.

I think net/net we can conclude that this is all noise and should not block this great change! The speedups for some cases are astounding!

@zacharymorn
Contributor Author

Thanks @mikemccand for the additional context!

Thanks for testing this. We've debated the merits of disabling background compilation (-Xbatch) in the past, but decided it's too risky since nobody actually runs this way in production so the results would not necessarily reflect production impact.

+1

Though, I would also expect that as you vary the particular query (apr, jan, may on the negated clause) that the gains should be quite different? It depends heavily on how the postings fall into long runs or not in the index? Though, the line file docs for luceneutil are randomly sorted, so there should not be a correlation by time with Lucene's docid.

The benchmark was run with the index sorted by month, and the performance gains do vary by query (149.9% for apr, 56.8% for may, and 37.5% for jan).
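For reference, the percentages quoted from the tables are consistent with a plain relative difference of the mean QPS values. The small sketch below recomputes them from the AndHighNotMonth rows; the class and method names are hypothetical, and the formula is an assumption inferred from the reported numbers rather than taken from the luceneutil source.

```java
import java.util.Locale;

/** Recomputes the "Pct diff" column from baseline and candidate QPS; names are hypothetical. */
final class PctDiffSketch {

  /** Relative change of the candidate QPS versus the baseline QPS, in percent. */
  static double pctDiff(double baselineQps, double candidateQps) {
    return 100.0 * (candidateQps - baselineQps) / baselineQps;
  }

  public static void main(String[] args) {
    // AndHighNotMonth rows from the three sorted-index runs above.
    System.out.println(String.format(Locale.ROOT, "apr: %.1f%%", pctDiff(64.22, 160.51))); // apr: 149.9%
    System.out.println(String.format(Locale.ROOT, "may: %.1f%%", pctDiff(92.19, 144.56))); // may: 56.8%
    System.out.println(String.format(Locale.ROOT, "jan: %.1f%%", pctDiff(116.74, 160.54))); // jan: 37.5%
  }
}
```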

@zacharymorn zacharymorn changed the title [GITHUB-11915] [Discussion Only] Make Lucene smarter about long runs of matches via new API on DISI [GITHUB-11915] Make Lucene smarter about long runs of matches via new API on DISI Apr 13, 2023
@zacharymorn zacharymorn marked this pull request as ready for review April 13, 2023 04:52
@zacharymorn
Contributor Author

Hi @jpountz @mikemccand @rmuir @uschindler @gsmiller, I have added some tests over the last few days and believe this PR is now ready for review. Could you please take a look and let me know if you have any suggestions? I'm not entirely sure about my approach for conjunctions and for leveraging skip data, by the way, and am open to alternatives!

@jpountz
Contributor

jpountz commented Jun 8, 2023

The logic looks pretty good to me overall. I like that we're seeing good speedups when using FILTER/MUST_NOT clauses on postings that have long runs!

It'd be good to understand if we can reduce the overhead for the case when the optimization doesn't kick in, as I'd expect this case to remain the most common one.

I'm also curious to get other people's take on whether the additional API is worth the performance benefits. I personally like it because it allows having first-class filters (filters on fields that are part of the index sort) that perform much more efficiently than regular filters. This can be especially useful when managing multiple tenants within the same index, e.g. multiple categories of an e-commerce catalog.

@zacharymorn
Contributor Author

Thanks @jpountz for the review!

It'd be good to understand if we can reduce the overhead for the case when the optimization doesn't kick in, as I'd expect this case to remain the most common one.

Yes indeed. Aside from further optimizing the logic itself, I have been thinking about two potential approaches, but they each have some pros / cons:

  1. Add a flag that lets the application control whether this new optimization / API should be used.
  2. Automatically pause the use of this optimization, and re-check after a certain number of docs, if the code detects consecutive small skips between API calls.

What do you think about these approaches?
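As a rough illustration of the second approach, the sketch below tracks consecutive small skips and pauses the peek for a fixed number of docs once too many occur in a row. It is only a sketch under assumptions: the class name, the three thresholds, and the method names are all hypothetical and do not correspond to code in this PR.

```java
/** Hypothetical throttle for the run-skipping optimization; not code from this PR. */
final class SkipThrottleSketch {
  private final int minUsefulSkip; // skips shorter than this count as "small"
  private final int maxSmallSkips; // consecutive small skips tolerated before pausing
  private final int pauseDocs; // number of docs to advance before re-checking

  private int consecutiveSmall = 0;
  private int pausedUntilDoc = -1; // the optimization is paused for docs below this ID

  SkipThrottleSketch(int minUsefulSkip, int maxSmallSkips, int pauseDocs) {
    this.minUsefulSkip = minUsefulSkip;
    this.maxSmallSkips = maxSmallSkips;
    this.pauseDocs = pauseDocs;
  }

  /** Whether the caller should bother peeking for a run at this doc. */
  boolean shouldPeek(int currentDoc) {
    return currentDoc >= pausedUntilDoc;
  }

  /** Records the outcome of one peek: the run starting at currentDoc ended at runEnd. */
  void recordSkip(int currentDoc, int runEnd) {
    if (runEnd - currentDoc >= minUsefulSkip) {
      consecutiveSmall = 0; // a useful skip resets the counter
      return;
    }
    consecutiveSmall++;
    if (consecutiveSmall >= maxSmallSkips) {
      pausedUntilDoc = currentDoc + pauseDocs; // pause, then re-check later
      consecutiveSmall = 0;
    }
  }
}
```

A caller would wrap each peek in `if (throttle.shouldPeek(doc))` and report the result through `recordSkip`, so that postings without long runs pay for only a bounded number of wasted peeks per pause interval.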

@jpountz
Contributor

jpountz commented Jun 21, 2023

I'd rather avoid introducing a flag, but your second idea sounds interesting. Maybe one way to implement it would be to introduce a bulk scorer for conjunctions, split the doc ID space into something like 128 equal windows, and only check peekNextNonMatchingDocID when moving to a new window. This would only ignore queries whose runs of matches are less than 1% of the total segment size, which wouldn't be very useful to speed things up anyway?
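A minimal sketch of the windowing arithmetic, assuming the segment's doc ID space is `[0, maxDoc)`. The class and method names are hypothetical, and this only shows when a peek would be triggered, not the bulk scorer itself.

```java
/** Hypothetical helper that decides when to peek for a run; not code from this PR. */
final class WindowedPeekSketch {
  static final int NUM_WINDOWS = 128;

  private final int windowSize;
  private int lastWindow = -1; // no window visited yet

  WindowedPeekSketch(int maxDoc) {
    this.windowSize = windowSize(maxDoc);
  }

  /** Size of each window: ceil(maxDoc / NUM_WINDOWS), but at least 1 doc. */
  static int windowSize(int maxDoc) {
    long size = ((long) maxDoc + NUM_WINDOWS - 1) / NUM_WINDOWS;
    return (int) Math.max(1L, size);
  }

  /** Returns true only the first time a doc from a different window than the last one is seen. */
  boolean enteringNewWindow(int doc) {
    int window = doc / windowSize;
    if (window != lastWindow) {
      lastWindow = window;
      return true;
    }
    return false;
  }
}
```

With 128 windows, each window covers 1/128 (about 0.8%) of the segment, so the peek runs at most once per window that is visited, and runs shorter than a window may go unnoticed, which matches the "less than 1%" trade-off described above.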

@zacharymorn
Contributor Author

Thanks @jpountz for the feedback! On the second approach, I was actually thinking of something simpler, such as 219beab. I ran the benchmark tests after the change and got the following results:

Index with sorting

Result 1:

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
               AndMedFilterMonth     2142.90      (7.1%)     1996.67      (5.0%)   -6.8% ( -17% -    5%) 0.000
        AndHighHighDayTaxoFacets       13.99      (2.3%)       13.08      (2.3%)   -6.5% ( -10% -   -1%) 0.000
               HighTermTitleSort      131.92      (2.5%)      124.03      (2.8%)   -6.0% ( -10% -    0%) 0.000
         AndHighMedDayTaxoFacets       91.57      (2.0%)       86.87      (3.2%)   -5.1% ( -10% -    0%) 0.000
           HighTermDayOfYearSort      235.50      (4.8%)      227.18      (4.1%)   -3.5% ( -11% -    5%) 0.012
            MedTermDayTaxoFacets       72.51      (2.3%)       70.17      (2.5%)   -3.2% (  -7% -    1%) 0.000
               HighTermMonthSort     3126.27      (3.2%)     3032.93      (3.8%)   -3.0% (  -9% -    4%) 0.007
                        Wildcard      280.45      (5.7%)      272.30      (5.7%)   -2.9% ( -13% -    8%) 0.107
          OrHighMedDayTaxoFacets       22.17      (2.5%)       21.55      (2.4%)   -2.8% (  -7% -    2%) 0.000
            HighTermTitleBDVSort       23.23      (2.9%)       22.59      (2.9%)   -2.7% (  -8% -    3%) 0.003
                      TermDTSort      198.40      (9.4%)      194.24      (9.1%)   -2.1% ( -18% -   18%) 0.474
             MedIntervalsOrdered       53.97     (13.6%)       52.88     (12.8%)   -2.0% ( -24% -   28%) 0.629
                     MedSpanNear      172.23      (2.7%)      168.92      (2.4%)   -1.9% (  -6% -    3%) 0.018
           BrowseMonthSSDVFacets       20.44      (5.9%)       20.09      (6.0%)   -1.7% ( -12% -   10%) 0.360
                         Prefix3      879.87      (4.4%)      866.61      (4.5%)   -1.5% (  -9% -    7%) 0.284
                    HighSpanNear       50.36      (2.3%)       49.60      (1.9%)   -1.5% (  -5% -    2%) 0.022
                     LowSpanNear      372.08      (2.3%)      366.84      (1.7%)   -1.4% (  -5% -    2%) 0.030
                      AndHighLow     1852.63      (3.8%)     1829.62      (4.0%)   -1.2% (  -8% -    6%) 0.318
             LowIntervalsOrdered        9.32     (15.4%)        9.21     (15.4%)   -1.2% ( -27% -   35%) 0.811
            HighIntervalsOrdered        1.55     (20.5%)        1.53     (20.6%)   -1.1% ( -35% -   50%) 0.872
       BrowseDayOfYearSSDVFacets       14.47      (4.8%)       14.33      (5.4%)   -1.0% ( -10% -    9%) 0.524
                    OrNotHighLow     1128.41      (4.1%)     1117.38      (3.9%)   -1.0% (  -8% -    7%) 0.438
                   OrHighNotHigh      492.96      (3.9%)      488.47      (4.3%)   -0.9% (  -8% -    7%) 0.486
                       MedPhrase      372.00      (2.7%)      368.75      (3.0%)   -0.9% (  -6% -    4%) 0.325
                    OrNotHighMed      628.90      (2.8%)      623.55      (2.5%)   -0.9% (  -5% -    4%) 0.310
                    OrHighNotLow      791.85      (3.8%)      785.12      (4.5%)   -0.9% (  -8% -    7%) 0.520
                   OrNotHighHigh      447.73      (2.9%)      444.15      (3.1%)   -0.8% (  -6% -    5%) 0.399
                      OrHighHigh       45.13      (4.0%)       44.78      (4.5%)   -0.8% (  -8% -    8%) 0.575
                    OrHighNotMed      543.72      (3.6%)      539.69      (4.2%)   -0.7% (  -8% -    7%) 0.545
                        HighTerm      946.86      (4.2%)      942.67      (3.8%)   -0.4% (  -8% -    7%) 0.727
                         LowTerm     1446.44      (5.6%)     1442.00      (5.9%)   -0.3% ( -11% -   11%) 0.866
                         MedTerm     1064.71      (3.8%)     1061.77      (3.7%)   -0.3% (  -7% -    7%) 0.815
                          Fuzzy1       78.98      (6.5%)       78.77      (6.2%)   -0.3% ( -12% -   13%) 0.896
       BrowseDayOfYearTaxoFacets       14.60      (3.5%)       14.56      (3.0%)   -0.2% (  -6% -    6%) 0.808
                       OrHighLow      383.35      (3.5%)      382.47      (4.1%)   -0.2% (  -7% -    7%) 0.848
                      AndHighMed      419.38      (3.9%)      418.65      (3.4%)   -0.2% (  -7% -    7%) 0.879
                 LowSloppyPhrase       45.89      (1.9%)       45.85      (1.8%)   -0.1% (  -3% -    3%) 0.895
            BrowseDateTaxoFacets       19.56      (2.2%)       19.55      (2.0%)   -0.1% (  -4% -    4%) 0.925
                       OrHighMed       53.51      (4.0%)       53.49      (3.8%)   -0.1% (  -7% -    8%) 0.965
                 MedSloppyPhrase       77.05      (1.8%)       77.03      (2.0%)   -0.0% (  -3% -    3%) 0.962
                     AndHighHigh      119.95      (3.4%)      119.92      (3.2%)   -0.0% (  -6% -    6%) 0.982
           BrowseMonthTaxoFacets       18.33      (2.3%)       18.35      (2.3%)    0.1% (  -4% -    4%) 0.911
                          IntNRQ      119.39     (22.2%)      119.66     (22.4%)    0.2% ( -36% -   57%) 0.974
                       LowPhrase      105.06      (3.4%)      105.34      (4.0%)    0.3% (  -6% -    7%) 0.824
                        PKLookup      282.66      (3.6%)      283.54      (3.4%)    0.3% (  -6% -    7%) 0.782
            BrowseDateSSDVFacets        5.42      (7.9%)        5.45      (7.3%)    0.5% ( -13% -   16%) 0.839
                      HighPhrase       72.39      (2.6%)       72.78      (2.6%)    0.5% (  -4% -    5%) 0.515
                HighSloppyPhrase        1.41      (5.7%)        1.42      (3.5%)    0.9% (  -7% -   10%) 0.567
                         Respell       95.35      (5.5%)       96.22      (5.3%)    0.9% (  -9% -   12%) 0.593
                          Fuzzy2      102.31      (9.0%)      103.34      (9.6%)    1.0% ( -16% -   21%) 0.731
                  AndMedNotMonth      873.00      (3.1%)      984.05      (3.3%)   12.7% (   6% -   19%) 0.000
              AndHighFilterMonth      362.01      (2.8%)      484.47      (6.0%)   33.8% (  24% -   43%) 0.000
                 AndHighNotMonth      253.17      (1.8%)      596.55     (10.5%)  135.6% ( 121% -  150%) 0.000

Result 2:

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
         AndHighMedDayTaxoFacets       66.69      (1.9%)       60.98      (4.1%)   -8.6% ( -14% -   -2%) 0.000
                         Prefix3      419.25      (4.7%)      387.06      (4.0%)   -7.7% ( -15% -    1%) 0.000
        AndHighHighDayTaxoFacets       38.85      (1.6%)       36.34      (3.7%)   -6.5% ( -11% -   -1%) 0.000
                    OrNotHighMed      418.16      (2.1%)      395.85      (3.8%)   -5.3% ( -11% -    0%) 0.000
                      TermDTSort      151.83      (3.8%)      143.84      (3.5%)   -5.3% ( -12% -    2%) 0.000
               AndMedFilterMonth     1849.70      (4.7%)     1753.86      (5.3%)   -5.2% ( -14% -    5%) 0.001
               HighTermMonthSort     3079.34      (3.5%)     2932.00      (4.4%)   -4.8% ( -12% -    3%) 0.000
               HighTermTitleSort      161.65     (10.3%)      154.99      (7.7%)   -4.1% ( -20% -   15%) 0.153
          OrHighMedDayTaxoFacets        3.72      (3.8%)        3.59      (2.9%)   -3.5% (  -9% -    3%) 0.001
           HighTermDayOfYearSort      407.83      (3.0%)      394.86      (4.0%)   -3.2% (  -9% -    3%) 0.005
            MedTermDayTaxoFacets       69.43      (2.4%)       67.48      (2.0%)   -2.8% (  -7% -    1%) 0.000
                    OrNotHighLow     1816.25      (3.9%)     1769.50      (3.9%)   -2.6% (  -9% -    5%) 0.037
            HighTermTitleBDVSort       36.84      (2.1%)       36.03      (1.0%)   -2.2% (  -5% -    0%) 0.000
                        Wildcard      162.28      (6.6%)      158.82      (6.4%)   -2.1% ( -14% -   11%) 0.297
                         LowTerm     1036.19      (5.4%)     1019.64      (5.9%)   -1.6% ( -12% -   10%) 0.373
                    HighSpanNear       36.59      (5.4%)       36.09      (5.5%)   -1.4% ( -11% -   10%) 0.428
                     LowSpanNear      103.37      (1.7%)      102.13      (1.9%)   -1.2% (  -4% -    2%) 0.035
                     MedSpanNear       51.35      (1.6%)       50.86      (1.8%)   -0.9% (  -4% -    2%) 0.083
                       OrHighLow      660.72      (3.4%)      656.05      (3.2%)   -0.7% (  -7% -    6%) 0.494
             MedIntervalsOrdered       21.71      (7.6%)       21.57      (7.6%)   -0.6% ( -14% -   15%) 0.794
            HighIntervalsOrdered       12.02      (9.4%)       11.96      (9.2%)   -0.5% ( -17% -   19%) 0.854
             LowIntervalsOrdered       55.04      (7.4%)       54.74      (7.4%)   -0.5% ( -14% -   15%) 0.819
                   OrNotHighHigh      571.44      (2.6%)      568.54      (3.2%)   -0.5% (  -6% -    5%) 0.581
                         MedTerm     1102.15      (4.2%)     1097.46      (4.4%)   -0.4% (  -8% -    8%) 0.755
                      HighPhrase      108.28      (3.3%)      107.92      (2.9%)   -0.3% (  -6% -    6%) 0.736
                          IntNRQ       81.85      (9.6%)       81.66      (9.6%)   -0.2% ( -17% -   21%) 0.941
                   OrHighNotHigh      401.85      (3.9%)      401.24      (4.3%)   -0.2% (  -8% -    8%) 0.907
                HighSloppyPhrase       26.71      (4.1%)       26.67      (3.8%)   -0.1% (  -7% -    8%) 0.908
                         Respell      110.22      (5.5%)      110.08      (5.1%)   -0.1% ( -10% -   11%) 0.939
                    OrHighNotMed      635.92      (3.7%)      635.34      (4.1%)   -0.1% (  -7% -    8%) 0.942
                 LowSloppyPhrase       11.69      (4.1%)       11.68      (3.9%)   -0.1% (  -7% -    8%) 0.951
                          Fuzzy2       35.65      (7.7%)       35.65      (7.3%)   -0.0% ( -13% -   16%) 0.995
                       OrHighMed      225.90      (2.9%)      225.90      (3.1%)    0.0% (  -5% -    6%) 0.998
                        PKLookup      282.08      (3.1%)      282.19      (3.6%)    0.0% (  -6% -    6%) 0.969
            BrowseDateSSDVFacets        5.43      (9.8%)        5.44      (9.7%)    0.1% ( -17% -   21%) 0.981
                      AndHighMed      139.86      (2.5%)      139.97      (2.9%)    0.1% (  -5% -    5%) 0.929
                      OrHighHigh       42.67      (2.4%)       42.71      (2.7%)    0.1% (  -4% -    5%) 0.923
                        HighTerm     1193.77      (4.6%)     1194.76      (4.8%)    0.1% (  -8% -   10%) 0.956
                     AndHighHigh       72.75      (2.0%)       72.84      (3.0%)    0.1% (  -4% -    5%) 0.876
                       MedPhrase       17.85      (2.8%)       17.88      (2.5%)    0.2% (  -5% -    5%) 0.833
            BrowseDateTaxoFacets       19.36      (2.5%)       19.40      (2.2%)    0.2% (  -4% -    5%) 0.794
                      AndHighLow     1612.73      (3.5%)     1616.44      (3.2%)    0.2% (  -6% -    7%) 0.826
                 MedSloppyPhrase       73.06      (2.5%)       73.26      (2.5%)    0.3% (  -4% -    5%) 0.740
       BrowseDayOfYearTaxoFacets       14.68      (4.7%)       14.72      (4.3%)    0.3% (  -8% -    9%) 0.833
       BrowseDayOfYearSSDVFacets       14.23      (7.7%)       14.27      (8.0%)    0.3% ( -14% -   17%) 0.900
           BrowseMonthTaxoFacets       18.14      (2.6%)       18.20      (2.5%)    0.3% (  -4% -    5%) 0.695
                    OrHighNotLow      532.83      (4.4%)      535.13      (5.2%)    0.4% (  -8% -   10%) 0.778
                       LowPhrase       79.15      (2.1%)       79.50      (2.3%)    0.4% (  -3% -    4%) 0.528
                          Fuzzy1      103.09      (7.8%)      104.65      (7.6%)    1.5% ( -12% -   18%) 0.536
           BrowseMonthSSDVFacets       19.62      (7.3%)       19.94     (12.2%)    1.6% ( -16% -   22%) 0.607
                  AndMedNotMonth      891.86      (2.8%)     1048.85      (5.4%)   17.6% (   9% -   26%) 0.000
              AndHighFilterMonth      685.23      (2.7%)      835.74      (6.1%)   22.0% (  12% -   31%) 0.000
                 AndHighNotMonth       76.06      (0.9%)      657.26     (50.3%)  764.1% ( 706% -  822%) 0.000

Index without sorting

Result 1:

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                         Prefix3      620.25      (3.8%)      572.56      (3.2%)   -7.7% ( -14% -    0%) 0.000
                        Wildcard       94.62      (6.0%)       87.96      (5.2%)   -7.0% ( -17% -    4%) 0.000
               HighTermTitleSort      185.23     (15.8%)      172.39     (15.9%)   -6.9% ( -33% -   29%) 0.167
        AndHighHighDayTaxoFacets       13.33      (3.2%)       12.53      (3.2%)   -6.0% ( -11% -    0%) 0.000
               AndMedFilterMonth      780.10      (3.2%)      750.30      (3.5%)   -3.8% ( -10% -    2%) 0.000
         AndHighMedDayTaxoFacets       78.48      (1.9%)       75.69      (2.5%)   -3.6% (  -7% -    0%) 0.000
          OrHighMedDayTaxoFacets        9.81      (6.9%)        9.48      (4.9%)   -3.4% ( -14% -    9%) 0.077
            MedTermDayTaxoFacets       91.99      (2.5%)       89.02      (2.2%)   -3.2% (  -7% -    1%) 0.000
              AndHighFilterMonth      458.58      (4.4%)      444.81      (4.4%)   -3.0% ( -11% -    6%) 0.032
            HighTermTitleBDVSort       28.60      (2.0%)       27.83      (1.3%)   -2.7% (  -5% -    0%) 0.000
                          IntNRQ       84.30      (9.3%)       82.10     (11.6%)   -2.6% ( -21% -   20%) 0.430
               HighTermMonthSort     2392.61      (7.2%)     2330.41      (5.4%)   -2.6% ( -14% -   10%) 0.198
                    OrNotHighLow     1069.43      (3.6%)     1042.84      (4.4%)   -2.5% ( -10% -    5%) 0.050
             LowIntervalsOrdered      227.98      (2.7%)      222.64      (4.2%)   -2.3% (  -8% -    4%) 0.035
                      TermDTSort      212.12      (3.3%)      207.21      (3.9%)   -2.3% (  -9% -    5%) 0.044
            HighIntervalsOrdered       38.94      (1.8%)       38.11      (3.3%)   -2.1% (  -7% -    3%) 0.011
                    OrNotHighMed      593.33      (3.4%)      581.36      (3.7%)   -2.0% (  -8% -    5%) 0.074
             MedIntervalsOrdered       21.30      (1.3%)       20.88      (2.9%)   -2.0% (  -6% -    2%) 0.005
                     MedSpanNear       25.77      (2.5%)       25.41      (2.4%)   -1.4% (  -6% -    3%) 0.062
                    HighSpanNear       17.69      (2.7%)       17.45      (2.7%)   -1.4% (  -6% -    4%) 0.107
                     LowSpanNear      128.36      (1.4%)      126.90      (1.3%)   -1.1% (  -3% -    1%) 0.008
                   OrNotHighHigh      548.99      (3.4%)      545.38      (2.9%)   -0.7% (  -6% -    5%) 0.508
           HighTermDayOfYearSort      424.25      (5.8%)      421.73      (3.8%)   -0.6% (  -9% -    9%) 0.703
                   OrHighNotHigh      465.31      (3.7%)      462.73      (4.0%)   -0.6% (  -7% -    7%) 0.648
                        PKLookup      292.46      (4.0%)      290.85      (3.8%)   -0.6% (  -7% -    7%) 0.652
                      HighPhrase      170.22      (3.6%)      169.76      (3.5%)   -0.3% (  -7% -    7%) 0.808
           BrowseMonthTaxoFacets       40.22     (23.2%)       40.14     (23.1%)   -0.2% ( -37% -   60%) 0.979
            BrowseDateTaxoFacets       45.02     (10.1%)       44.99     (10.0%)   -0.1% ( -18% -   22%) 0.982
       BrowseDayOfYearTaxoFacets       45.36      (7.5%)       45.33      (7.3%)   -0.1% ( -13% -   15%) 0.978
                 LowSloppyPhrase       17.25      (5.0%)       17.25      (4.9%)   -0.0% (  -9% -   10%) 0.999
                          Fuzzy1       83.95      (4.1%)       84.05      (4.2%)    0.1% (  -7% -    8%) 0.928
       BrowseDayOfYearSSDVFacets       26.61     (26.3%)       26.64     (26.3%)    0.1% ( -41% -   71%) 0.988
                       MedPhrase       47.60      (3.3%)       47.66      (3.3%)    0.1% (  -6% -    7%) 0.900
                 MedSloppyPhrase       25.42      (4.8%)       25.45      (4.8%)    0.1% (  -9% -   10%) 0.930
                    OrHighNotMed      533.40      (3.8%)      534.12      (3.9%)    0.1% (  -7% -    8%) 0.912
                       LowPhrase       72.95      (2.5%)       73.07      (2.5%)    0.2% (  -4% -    5%) 0.836
                      AndHighMed      331.56      (5.4%)      332.11      (5.5%)    0.2% ( -10% -   11%) 0.923
                         LowTerm     1279.72      (7.0%)     1282.14      (5.1%)    0.2% ( -11% -   13%) 0.922
                         Respell       94.36      (6.2%)       94.70      (5.9%)    0.4% ( -11% -   13%) 0.850
           BrowseMonthSSDVFacets       26.44     (25.9%)       26.53     (25.7%)    0.4% ( -40% -   70%) 0.964
                HighSloppyPhrase       38.93      (4.1%)       39.09      (4.0%)    0.4% (  -7% -    8%) 0.750
                  AndMedNotMonth      891.59      (3.9%)      895.75      (2.6%)    0.5% (  -5% -    7%) 0.657
                      AndHighLow     1747.72      (3.8%)     1756.48      (4.2%)    0.5% (  -7% -    8%) 0.693
                     AndHighHigh       77.63      (4.2%)       78.03      (4.1%)    0.5% (  -7% -    9%) 0.697
                    OrHighNotLow      646.76      (4.5%)      650.22      (4.7%)    0.5% (  -8% -   10%) 0.712
                          Fuzzy2       69.85      (7.1%)       70.28      (7.4%)    0.6% ( -12% -   16%) 0.789
                       OrHighMed      251.43      (4.0%)      252.98      (4.3%)    0.6% (  -7% -    9%) 0.637
            BrowseDateSSDVFacets        5.52      (2.8%)        5.56      (4.1%)    0.7% (  -6% -    7%) 0.534
                       OrHighLow      509.73      (5.2%)      516.04      (4.7%)    1.2% (  -8% -   11%) 0.432
                        HighTerm      716.67      (5.0%)      725.95      (4.2%)    1.3% (  -7% -   10%) 0.372
                         MedTerm     1009.75      (4.5%)     1023.79      (4.1%)    1.4% (  -6% -   10%) 0.306
                      OrHighHigh       40.11      (3.8%)       40.67      (3.4%)    1.4% (  -5% -    8%) 0.217
                 AndHighNotMonth      443.44      (4.1%)      467.71      (3.5%)    5.5% (  -2% -   13%) 0.000

Result 2:

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                         Prefix3      379.27      (4.2%)      350.63      (3.5%)   -7.6% ( -14% -    0%) 0.000
               HighTermMonthSort     2644.50     (10.0%)     2467.84      (7.9%)   -6.7% ( -22% -   12%) 0.020
                        Wildcard      111.47      (5.8%)      105.48      (5.2%)   -5.4% ( -15% -    6%) 0.002
               HighTermTitleSort      163.42      (8.4%)      155.35      (6.4%)   -4.9% ( -18% -   10%) 0.036
         AndHighMedDayTaxoFacets      106.98      (2.2%)      101.81      (3.1%)   -4.8% (  -9% -    0%) 0.000
                      TermDTSort      200.05      (4.0%)      192.23      (6.4%)   -3.9% ( -13% -    6%) 0.021
        AndHighHighDayTaxoFacets       39.09      (1.8%)       37.65      (1.5%)   -3.7% (  -6% -    0%) 0.000
              AndHighFilterMonth      205.53      (2.8%)      198.29      (3.4%)   -3.5% (  -9% -    2%) 0.000
            HighTermTitleBDVSort       10.04      (2.4%)        9.74      (2.1%)   -3.0% (  -7% -    1%) 0.000
            MedTermDayTaxoFacets       89.94      (2.0%)       87.38      (1.8%)   -2.8% (  -6% -    1%) 0.000
           HighTermDayOfYearSort      418.97      (6.0%)      408.95      (6.9%)   -2.4% ( -14% -   11%) 0.241
               AndMedFilterMonth      639.08      (2.2%)      624.53      (2.5%)   -2.3% (  -6% -    2%) 0.002
          OrHighMedDayTaxoFacets        7.17      (4.0%)        7.02      (4.4%)   -2.1% ( -10% -    6%) 0.112
                 AndHighNotMonth      606.76      (4.7%)      597.90      (4.7%)   -1.5% ( -10% -    8%) 0.328
                    OrNotHighMed      624.52      (2.5%)      616.48      (3.1%)   -1.3% (  -6% -    4%) 0.149
                         MedTerm      883.90      (5.7%)      872.74      (4.5%)   -1.3% ( -10% -    9%) 0.436
                  AndMedNotMonth     1015.43      (3.5%)     1003.04      (2.7%)   -1.2% (  -7% -    5%) 0.216
                     MedSpanNear       11.88      (0.9%)       11.75      (0.8%)   -1.1% (  -2% -    0%) 0.000
                      OrHighHigh       42.30      (4.6%)       41.85      (3.4%)   -1.1% (  -8% -    7%) 0.406
                    OrHighNotMed      573.99      (4.2%)      568.16      (4.5%)   -1.0% (  -9% -    8%) 0.463
             MedIntervalsOrdered       38.88      (6.8%)       38.50      (6.9%)   -1.0% ( -13% -   13%) 0.656
                        HighTerm     1015.56      (5.6%)     1005.98      (4.9%)   -0.9% ( -10% -   10%) 0.571
             LowIntervalsOrdered      154.19      (8.0%)      152.80      (8.2%)   -0.9% ( -15% -   16%) 0.724
                    OrHighNotLow      844.90      (4.9%)      837.29      (4.9%)   -0.9% ( -10% -    9%) 0.561
                     LowSpanNear       78.74      (4.1%)       78.11      (4.1%)   -0.8% (  -8% -    7%) 0.541
                       OrHighMed      153.42      (4.0%)      152.21      (3.4%)   -0.8% (  -7% -    6%) 0.504
                    HighSpanNear        2.96      (0.7%)        2.94      (0.7%)   -0.8% (  -2% -    0%) 0.001
                       MedPhrase      376.73      (2.0%)      374.16      (1.9%)   -0.7% (  -4% -    3%) 0.272
            HighIntervalsOrdered       29.22      (7.3%)       29.03      (7.4%)   -0.7% ( -14% -   15%) 0.769
                   OrHighNotHigh      498.90      (4.1%)      496.29      (4.3%)   -0.5% (  -8% -    8%) 0.695
                 LowSloppyPhrase      132.03      (3.4%)      131.46      (3.1%)   -0.4% (  -6% -    6%) 0.674
                        PKLookup      293.57      (3.3%)      292.52      (3.3%)   -0.4% (  -6% -    6%) 0.735
                         LowTerm     1274.81      (7.5%)     1270.44      (6.3%)   -0.3% ( -13% -   14%) 0.876
       BrowseDayOfYearSSDVFacets       25.40     (20.8%)       25.33     (20.8%)   -0.3% ( -34% -   52%) 0.968
                      AndHighMed      185.94      (4.1%)      185.51      (4.1%)   -0.2% (  -8% -    8%) 0.859
                      AndHighLow     1532.23      (4.7%)     1529.57      (4.7%)   -0.2% (  -9% -    9%) 0.908
                   OrNotHighHigh      433.52      (3.2%)      433.05      (3.2%)   -0.1% (  -6% -    6%) 0.914
                    OrNotHighLow     2173.75      (2.4%)     2171.48      (4.1%)   -0.1% (  -6% -    6%) 0.922
                HighSloppyPhrase       60.06      (4.6%)       60.02      (4.1%)   -0.1% (  -8% -    8%) 0.966
                          IntNRQ       96.82      (0.8%)       96.78      (0.8%)   -0.0% (  -1% -    1%) 0.867
            BrowseDateTaxoFacets       43.83     (12.3%)       43.82     (12.3%)   -0.0% ( -21% -   27%) 0.992
           BrowseMonthSSDVFacets       25.69     (20.6%)       25.69     (20.6%)   -0.0% ( -34% -   51%) 0.998
                 MedSloppyPhrase      160.44      (3.8%)      160.44      (3.6%)   -0.0% (  -7% -    7%) 0.998
           BrowseMonthTaxoFacets       35.59     (27.4%)       35.60     (27.3%)    0.0% ( -42% -   75%) 0.996
                          Fuzzy1       86.58      (7.6%)       86.62      (6.8%)    0.1% ( -13% -   15%) 0.981
                       OrHighLow     1145.19      (3.4%)     1146.01      (3.5%)    0.1% (  -6% -    7%) 0.948
            BrowseDateSSDVFacets        5.47      (2.3%)        5.48      (2.3%)    0.1% (  -4% -    4%) 0.917
       BrowseDayOfYearTaxoFacets       44.34     (11.1%)       44.38     (11.3%)    0.1% ( -20% -   25%) 0.976
                         Respell       90.09      (4.5%)       90.25      (4.4%)    0.2% (  -8% -    9%) 0.899
                     AndHighHigh      113.61      (3.3%)      113.84      (3.5%)    0.2% (  -6% -    7%) 0.849
                      HighPhrase      103.91      (4.5%)      104.21      (4.5%)    0.3% (  -8% -    9%) 0.837
                       LowPhrase      103.85      (3.5%)      104.28      (3.2%)    0.4% (  -6% -    7%) 0.697
                          Fuzzy2       92.09      (8.4%)       92.55     (10.2%)    0.5% ( -16% -   20%) 0.865

Overall, the change avoids a severe performance drop when the index is unsorted, with only about a 5% slowdown on the AndMedFilterMonth task.

However, I also noticed that AndMedFilterMonth was negatively impacted even with a sorted index, so I changed the task definitions below to use the same set of months as AndHighFilterMonth, to see whether query / index characteristics were affecting the result.

--- a/tasks/wikimedium.10M.nostopwords.tasks
+++ b/tasks/wikimedium.10M.nostopwords.tasks
@@ -17381,8 +17381,8 @@ AndHighFilterMonth: +united +filter=monthPostings:feb #  freq=1185528
 AndHighFilterMonth: +year +filter=monthPostings:mar #  freq=1098425
 AndHighFilterMonth: +its +filter=monthPostings:apr #  freq=1160703
 AndHighFilterMonth: +but +filter=monthPostings:may #  freq=1484398
-AndMedFilterMonth: +mostly +filter=monthPostings:jun #  freq=89401
-AndMedFilterMonth: +interview +filter=monthPostings:jul #  freq=94736
-AndMedFilterMonth: +9 +filter=monthPostings:aug #  freq=541405
-AndMedFilterMonth: +hard +filter=monthPostings:sep #  freq=92045
-AndMedFilterMonth: +bay +filter=monthPostings:oct #  freq=117167
\ No newline at end of file
+AndMedFilterMonth: +mostly +filter=monthPostings:jan #  freq=89401
+AndMedFilterMonth: +interview +filter=monthPostings:feb #  freq=94736
+AndMedFilterMonth: +9 +filter=monthPostings:mar #  freq=541405
+AndMedFilterMonth: +hard +filter=monthPostings:apr #  freq=92045
+AndMedFilterMonth: +bay +filter=monthPostings:may #  freq=117167

and got the following results:

Index with sorting

Result 1:

                            Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
                         Prefix3      283.26      (5.8%)      253.88      (4.9%)  -10.4% ( -19% -    0%) 0.000
        AndHighHighDayTaxoFacets       20.44      (2.7%)       19.15      (2.4%)   -6.3% ( -11% -   -1%) 0.000
               HighTermTitleSort      120.21      (4.1%)      114.08      (3.7%)   -5.1% ( -12% -    2%) 0.000
         AndHighMedDayTaxoFacets      170.82      (3.3%)      162.82      (3.8%)   -4.7% ( -11% -    2%) 0.000
               HighTermMonthSort     3189.35      (3.2%)     3057.15      (3.0%)   -4.1% ( -10% -    2%) 0.000
            MedTermDayTaxoFacets       53.47      (2.7%)       51.33      (2.3%)   -4.0% (  -8% -    1%) 0.000
                      TermDTSort      195.70      (3.6%)      188.61      (3.8%)   -3.6% ( -10% -    4%) 0.002
           HighTermDayOfYearSort      373.45      (2.5%)      360.41      (3.2%)   -3.5% (  -9% -    2%) 0.000
                    OrNotHighLow     1406.59      (3.4%)     1357.71      (3.8%)   -3.5% ( -10% -    3%) 0.002
          OrHighMedDayTaxoFacets       19.41      (5.5%)       18.90      (4.5%)   -2.7% ( -11% -    7%) 0.092
                        Wildcard      147.47      (6.6%)      144.35      (6.7%)   -2.1% ( -14% -   11%) 0.314
                 LowSloppyPhrase      351.23      (4.5%)      344.15      (4.5%)   -2.0% ( -10% -    7%) 0.156
            HighTermTitleBDVSort       30.79      (4.5%)       30.22      (4.2%)   -1.8% ( -10% -    7%) 0.177
            BrowseDateSSDVFacets        5.64      (7.3%)        5.55     (10.2%)   -1.7% ( -17% -   17%) 0.553
                         LowTerm     1412.43      (5.4%)     1393.26      (4.4%)   -1.4% ( -10% -    8%) 0.382
                    HighSpanNear       35.54      (1.5%)       35.08      (1.1%)   -1.3% (  -3% -    1%) 0.003
            HighIntervalsOrdered        6.83     (13.5%)        6.74     (13.5%)   -1.3% ( -24% -   29%) 0.764
                       OrHighLow      715.40      (3.0%)      706.45      (4.5%)   -1.3% (  -8% -    6%) 0.304
                    OrNotHighMed      667.12      (2.4%)      659.49      (2.9%)   -1.1% (  -6% -    4%) 0.173
             LowIntervalsOrdered       16.55      (9.1%)       16.37      (9.1%)   -1.1% ( -17% -   18%) 0.707
                     MedSpanNear      137.56      (2.3%)      136.08      (2.1%)   -1.1% (  -5% -    3%) 0.125
                     LowSpanNear      126.90      (1.7%)      125.66      (1.4%)   -1.0% (  -3% -    2%) 0.046
                       MedPhrase      259.38      (5.0%)      257.29      (5.4%)   -0.8% ( -10% -   10%) 0.625
             MedIntervalsOrdered       94.29     (13.2%)       93.61     (13.3%)   -0.7% ( -24% -   29%) 0.863
                    OrHighNotLow      736.08      (4.6%)      731.29      (4.4%)   -0.7% (  -9% -    8%) 0.645
                        PKLookup      277.80      (4.1%)      276.28      (4.9%)   -0.5% (  -9% -    8%) 0.702
                   OrNotHighHigh      505.64      (3.7%)      503.51      (3.3%)   -0.4% (  -7% -    6%) 0.701
                          Fuzzy2       85.81      (4.3%)       85.49      (7.0%)   -0.4% ( -11% -   11%) 0.837
       BrowseDayOfYearSSDVFacets       14.25      (6.6%)       14.20      (6.0%)   -0.4% ( -12% -   13%) 0.853
                      AndHighMed      123.40      (3.7%)      122.95      (3.0%)   -0.4% (  -6% -    6%) 0.729
                      HighPhrase      105.62      (2.9%)      105.25      (3.0%)   -0.4% (  -6% -    5%) 0.704
                HighSloppyPhrase       25.14      (5.8%)       25.06      (5.8%)   -0.3% ( -11% -   11%) 0.862
                   OrHighNotHigh      488.98      (3.4%)      487.73      (3.4%)   -0.3% (  -6% -    6%) 0.812
                         MedTerm     2015.74      (3.9%)     2010.82      (3.7%)   -0.2% (  -7% -    7%) 0.839
                     AndHighHigh       83.67      (3.7%)       83.48      (3.0%)   -0.2% (  -6% -    6%) 0.831
                          IntNRQ      101.13      (6.3%)      100.91      (6.3%)   -0.2% ( -12% -   13%) 0.914
                       LowPhrase       39.11      (3.0%)       39.03      (3.3%)   -0.2% (  -6% -    6%) 0.845
                 MedSloppyPhrase      106.20      (2.6%)      106.07      (2.4%)   -0.1% (  -5% -    5%) 0.875
                          Fuzzy1       79.48      (7.4%)       79.42      (7.4%)   -0.1% ( -13% -   15%) 0.974
            BrowseDateTaxoFacets       19.43      (2.5%)       19.42      (2.2%)   -0.1% (  -4% -    4%) 0.938
                         Respell       83.42      (7.4%)       83.45      (7.9%)    0.0% ( -14% -   16%) 0.987
                        HighTerm      819.26      (5.4%)      820.06      (4.9%)    0.1% (  -9% -   10%) 0.952
                      OrHighHigh       34.32      (4.6%)       34.36      (3.9%)    0.1% (  -8% -    9%) 0.940
                       OrHighMed       92.58      (3.8%)       92.67      (3.1%)    0.1% (  -6% -    7%) 0.924
                    OrHighNotMed      572.17      (4.5%)      572.87      (4.5%)    0.1% (  -8% -    9%) 0.931
           BrowseMonthTaxoFacets       18.19      (2.6%)       18.24      (2.4%)    0.3% (  -4% -    5%) 0.709
       BrowseDayOfYearTaxoFacets       14.46      (3.5%)       14.55      (3.0%)    0.7% (  -5% -    7%) 0.531
           BrowseMonthSSDVFacets       20.07      (7.4%)       20.23      (9.7%)    0.8% ( -15% -   19%) 0.766
                      AndHighLow     1259.85      (4.0%)     1273.60      (3.8%)    1.1% (  -6% -    9%) 0.373
                  AndMedNotMonth     1075.74      (3.8%)     1172.98      (3.2%)    9.0% (   1% -   16%) 0.000
               AndMedFilterMonth     1140.24      (6.0%)     1289.34      (5.3%)   13.1% (   1% -   25%) 0.000
              AndHighFilterMonth      467.70      (3.7%)      596.35      (4.9%)   27.5% (  18% -   37%) 0.000
                 AndHighNotMonth      364.62      (3.0%)      627.22      (7.8%)   72.0% (  59% -   85%) 0.000

Result 2:

                            Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
                         Prefix3      378.42      (4.0%)      353.70      (3.3%)   -6.5% ( -13% -    0%) 0.000
         AndHighMedDayTaxoFacets       46.31      (2.7%)       43.95      (3.0%)   -5.1% ( -10% -    0%) 0.000
                      TermDTSort      195.09      (4.0%)      186.41      (4.1%)   -4.5% ( -12% -    3%) 0.001
               HighTermTitleSort      110.04      (3.9%)      105.25      (4.3%)   -4.4% ( -12% -    4%) 0.001
        AndHighHighDayTaxoFacets       28.36      (2.3%)       27.13      (2.2%)   -4.3% (  -8% -    0%) 0.000
                        Wildcard      220.25      (4.3%)      212.79      (3.8%)   -3.4% ( -11% -    4%) 0.008
                       OrHighLow      430.09      (5.2%)      420.41      (6.3%)   -2.3% ( -13% -    9%) 0.218
           HighTermDayOfYearSort      322.71      (3.0%)      315.77      (3.1%)   -2.2% (  -7% -    4%) 0.025
            HighTermTitleBDVSort       31.31      (2.2%)       30.70      (1.6%)   -2.0% (  -5% -    1%) 0.001
            MedTermDayTaxoFacets       91.69      (3.4%)       89.93      (2.8%)   -1.9% (  -7% -    4%) 0.050
                    OrHighNotLow      542.35      (6.1%)      532.91      (4.4%)   -1.7% ( -11% -    9%) 0.301
          OrHighMedDayTaxoFacets        8.60      (4.3%)        8.46      (3.7%)   -1.7% (  -9% -    6%) 0.188
                    HighSpanNear       48.01      (1.6%)       47.32      (1.3%)   -1.4% (  -4% -    1%) 0.002
                    OrHighNotMed      847.43      (4.6%)      835.55      (4.1%)   -1.4% (  -9% -    7%) 0.310
                         MedTerm      962.30      (6.4%)      950.63      (5.4%)   -1.2% ( -12% -   11%) 0.517
                          Fuzzy1       81.00      (6.5%)       80.08      (6.0%)   -1.1% ( -12% -   12%) 0.566
                   OrNotHighHigh      552.98      (3.7%)      546.73      (3.0%)   -1.1% (  -7% -    5%) 0.286
                     MedSpanNear       12.59      (0.8%)       12.45      (0.7%)   -1.1% (  -2% -    0%) 0.000
                         LowTerm     1599.39      (6.3%)     1582.98      (3.8%)   -1.0% ( -10% -    9%) 0.536
                      OrHighHigh       45.70      (4.6%)       45.27      (4.6%)   -1.0% (  -9% -    8%) 0.512
                    OrNotHighLow     1445.93      (3.8%)     1433.84      (3.9%)   -0.8% (  -8% -    7%) 0.494
                   OrHighNotHigh      561.36      (4.2%)      557.15      (3.0%)   -0.7% (  -7% -    6%) 0.514
               HighTermMonthSort     2730.23      (3.9%)     2711.82      (3.6%)   -0.7% (  -7% -    7%) 0.566
             MedIntervalsOrdered       80.30      (5.8%)       79.80      (5.8%)   -0.6% ( -11% -   11%) 0.736
                          Fuzzy2       87.54      (6.0%)       87.09      (5.7%)   -0.5% ( -11% -   11%) 0.780
            HighIntervalsOrdered        6.28      (5.5%)        6.25      (5.5%)   -0.5% ( -10% -   11%) 0.776
                         Respell       71.34      (4.4%)       71.00      (4.3%)   -0.5% (  -8% -    8%) 0.727
                       MedPhrase      214.30      (4.9%)      213.27      (4.7%)   -0.5% (  -9% -    9%) 0.754
             LowIntervalsOrdered       10.88      (5.0%)       10.83      (5.1%)   -0.5% ( -10% -   10%) 0.769
                    OrNotHighMed      503.93      (3.9%)      501.66      (3.0%)   -0.5% (  -7% -    6%) 0.683
                          IntNRQ       53.53      (5.3%)       53.33      (4.6%)   -0.4% (  -9% -   10%) 0.810
                        HighTerm      788.21      (6.7%)      785.26      (5.1%)   -0.4% ( -11% -   12%) 0.843
                       LowPhrase      581.98      (4.8%)      579.89      (5.2%)   -0.4% (  -9% -   10%) 0.822
                       OrHighMed      265.72      (3.6%)      264.97      (3.6%)   -0.3% (  -7% -    7%) 0.803
                        PKLookup      281.97      (2.5%)      281.25      (2.3%)   -0.3% (  -4% -    4%) 0.736
                     LowSpanNear       97.42      (1.4%)       97.18      (1.8%)   -0.2% (  -3% -    2%) 0.620
                 MedSloppyPhrase       57.32      (3.4%)       57.19      (2.9%)   -0.2% (  -6% -    6%) 0.816
                HighSloppyPhrase       30.68      (2.5%)       30.63      (2.6%)   -0.1% (  -5% -    5%) 0.865
                 LowSloppyPhrase       32.06      (2.8%)       32.02      (2.2%)   -0.1% (  -5% -    5%) 0.871
           BrowseMonthSSDVFacets       20.53      (4.6%)       20.52      (4.7%)   -0.1% (  -9% -    9%) 0.956
       BrowseDayOfYearSSDVFacets       14.53      (2.6%)       14.52      (2.5%)   -0.0% (  -5% -    5%) 0.963
       BrowseDayOfYearTaxoFacets       14.63      (2.8%)       14.66      (2.9%)    0.2% (  -5% -    6%) 0.808
            BrowseDateTaxoFacets       19.45      (2.4%)       19.51      (2.2%)    0.3% (  -4% -    5%) 0.671
           BrowseMonthTaxoFacets       18.23      (2.5%)       18.31      (2.5%)    0.4% (  -4% -    5%) 0.611
                      HighPhrase      161.02      (4.2%)      162.00      (3.8%)    0.6% (  -7% -    8%) 0.634
                     AndHighHigh       75.68      (3.6%)       76.22      (3.8%)    0.7% (  -6% -    8%) 0.540
                      AndHighLow     1653.05      (4.5%)     1665.61      (3.5%)    0.8% (  -6% -    9%) 0.550
                      AndHighMed      317.63      (4.7%)      321.08      (3.9%)    1.1% (  -7% -   10%) 0.428
            BrowseDateSSDVFacets        5.47     (12.7%)        5.53     (14.0%)    1.1% ( -22% -   31%) 0.792
                  AndMedNotMonth      907.12      (5.8%)     1006.04      (5.5%)   10.9% (   0% -   23%) 0.000
               AndMedFilterMonth      826.00      (3.8%)     1014.21      (6.3%)   22.8% (  12% -   34%) 0.000
              AndHighFilterMonth      484.41      (3.8%)      618.97      (5.9%)   27.8% (  17% -   39%) 0.000
                 AndHighNotMonth       76.61      (1.0%)      690.62     (48.2%)  801.4% ( 745% -  858%) 0.000

Index without sorting

Result 1:

                            Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
                        Wildcard       35.96      (7.1%)       32.01      (4.6%)  -11.0% ( -21% -    0%) 0.000
                         Prefix3      560.43      (3.0%)      526.43      (3.1%)   -6.1% ( -11% -    0%) 0.000
               AndMedFilterMonth      264.61      (2.4%)      249.82      (2.9%)   -5.6% ( -10% -    0%) 0.000
         AndHighMedDayTaxoFacets       61.28      (1.6%)       58.14      (3.0%)   -5.1% (  -9% -    0%) 0.000
        AndHighHighDayTaxoFacets       24.07      (2.2%)       22.87      (2.5%)   -5.0% (  -9% -    0%) 0.000
               HighTermTitleSort      229.96      (3.0%)      220.60      (3.1%)   -4.1% (  -9% -    2%) 0.000
              AndHighFilterMonth      294.96      (3.8%)      283.28      (4.5%)   -4.0% ( -11% -    4%) 0.002
          OrHighMedDayTaxoFacets       11.38      (4.1%)       10.93      (3.8%)   -3.9% ( -11% -    4%) 0.002
           HighTermDayOfYearSort      412.03      (3.4%)      397.78      (2.8%)   -3.5% (  -9% -    2%) 0.000
                      TermDTSort      226.90      (2.8%)      219.54      (2.7%)   -3.2% (  -8% -    2%) 0.000
            MedTermDayTaxoFacets       60.64      (3.1%)       58.70      (2.5%)   -3.2% (  -8% -    2%) 0.000
            HighTermTitleBDVSort       21.84      (4.4%)       21.15      (4.5%)   -3.2% ( -11% -    5%) 0.023
                    OrNotHighLow     1026.62      (3.2%)      994.09      (3.8%)   -3.2% (  -9% -    3%) 0.004
                    OrNotHighMed      545.06      (2.5%)      530.44      (4.1%)   -2.7% (  -8% -    3%) 0.012
            HighIntervalsOrdered       44.88      (6.0%)       44.07      (5.3%)   -1.8% ( -12% -   10%) 0.315
             LowIntervalsOrdered      108.36      (3.1%)      106.48      (3.8%)   -1.7% (  -8% -    5%) 0.110
                     LowSpanNear       49.51      (1.9%)       48.78      (2.0%)   -1.5% (  -5% -    2%) 0.016
               HighTermMonthSort     2328.38      (7.3%)     2294.68      (6.9%)   -1.4% ( -14% -   13%) 0.518
                          Fuzzy1      150.33      (6.2%)      148.18      (6.3%)   -1.4% ( -13% -   11%) 0.467
                    HighSpanNear       29.48      (1.2%)       29.09      (1.1%)   -1.3% (  -3% -    1%) 0.000
             MedIntervalsOrdered      179.17      (5.8%)      176.92      (5.5%)   -1.3% ( -11% -   10%) 0.484
                     MedSpanNear       15.41      (2.1%)       15.24      (2.0%)   -1.1% (  -5% -    3%) 0.086
                   OrHighNotHigh      525.87      (4.7%)      520.47      (3.8%)   -1.0% (  -9% -    7%) 0.449
                       OrHighLow      567.25      (4.0%)      563.56      (4.5%)   -0.7% (  -8% -    8%) 0.627
                 AndHighNotMonth      585.89      (5.8%)      582.21      (5.9%)   -0.6% ( -11% -   11%) 0.736
                         LowTerm     1183.22      (5.6%)     1176.56      (4.9%)   -0.6% ( -10% -   10%) 0.735
                  AndMedNotMonth      860.00      (4.3%)      855.40      (4.2%)   -0.5% (  -8% -    8%) 0.689
                         Respell       73.80      (5.1%)       73.55      (5.1%)   -0.3% ( -10% -   10%) 0.833
                      HighPhrase       80.32      (2.8%)       80.13      (3.1%)   -0.2% (  -5% -    5%) 0.797
                        PKLookup      288.52      (4.4%)      287.84      (4.8%)   -0.2% (  -8% -    9%) 0.869
                   OrNotHighHigh      512.10      (4.5%)      510.97      (3.9%)   -0.2% (  -8% -    8%) 0.868
                        HighTerm      655.55      (7.1%)      654.16      (6.5%)   -0.2% ( -12% -   14%) 0.922
                         MedTerm     1090.42      (6.0%)     1088.33      (5.3%)   -0.2% ( -10% -   11%) 0.915
                HighSloppyPhrase        8.83      (3.0%)        8.81      (3.2%)   -0.1% (  -6% -    6%) 0.884
            BrowseDateSSDVFacets        5.36      (6.7%)        5.36      (6.7%)   -0.1% ( -12% -   14%) 0.956
       BrowseDayOfYearTaxoFacets       45.49      (5.4%)       45.47      (5.3%)   -0.0% ( -10% -   11%) 0.986
                    OrHighNotMed      602.62      (4.7%)      602.56      (4.8%)   -0.0% (  -9% -    9%) 0.995
                 LowSloppyPhrase       13.54      (1.9%)       13.54      (1.9%)    0.0% (  -3% -    3%) 0.989
            BrowseDateTaxoFacets       45.52      (5.4%)       45.53      (5.4%)    0.0% ( -10% -   11%) 0.992
                 MedSloppyPhrase       22.22      (2.2%)       22.23      (2.2%)    0.1% (  -4% -    4%) 0.941
           BrowseMonthTaxoFacets       37.52     (24.4%)       37.56     (24.5%)    0.1% ( -39% -   64%) 0.987
                       LowPhrase       81.07      (1.9%)       81.19      (2.2%)    0.1% (  -3% -    4%) 0.826
       BrowseDayOfYearSSDVFacets       26.54     (27.0%)       26.59     (26.7%)    0.2% ( -42% -   73%) 0.982
           BrowseMonthSSDVFacets       25.92     (26.6%)       25.98     (26.2%)    0.2% ( -41% -   72%) 0.977
                       MedPhrase      128.80      (1.7%)      129.15      (2.0%)    0.3% (  -3% -    4%) 0.635
                      OrHighHigh       49.81      (3.4%)       49.95      (3.6%)    0.3% (  -6% -    7%) 0.787
                     AndHighHigh       78.84      (3.7%)       79.09      (3.8%)    0.3% (  -6% -    8%) 0.784
                    OrHighNotLow      729.06      (5.0%)      732.65      (5.2%)    0.5% (  -9% -   11%) 0.761
                       OrHighMed      117.86      (3.2%)      118.47      (3.2%)    0.5% (  -5% -    7%) 0.611
                          Fuzzy2      104.46      (5.4%)      105.20      (5.9%)    0.7% ( -10% -   12%) 0.690
                          IntNRQ       87.18      (2.2%)       87.88      (1.2%)    0.8% (  -2% -    4%) 0.142
                      AndHighLow     1166.89      (3.4%)     1183.39      (3.3%)    1.4% (  -5% -    8%) 0.179
                      AndHighMed      367.61      (4.9%)      373.01      (5.2%)    1.5% (  -8% -   12%) 0.355

Result 2:

                            Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
                         Prefix3      147.12      (6.2%)      129.27      (4.0%)  -12.1% ( -21% -   -2%) 0.000
               AndMedFilterMonth      271.72      (3.3%)      257.32      (3.4%)   -5.3% ( -11% -    1%) 0.000
         AndHighMedDayTaxoFacets      142.77      (2.1%)      135.63      (3.0%)   -5.0% (  -9% -    0%) 0.000
               HighTermTitleSort      202.56      (2.2%)      194.53      (3.0%)   -4.0% (  -8% -    1%) 0.000
        AndHighHighDayTaxoFacets       39.70      (0.9%)       38.17      (1.2%)   -3.8% (  -5% -   -1%) 0.000
              AndHighFilterMonth      207.32      (4.1%)      199.45      (3.7%)   -3.8% ( -11% -    4%) 0.002
           HighTermDayOfYearSort      428.44      (5.8%)      413.79      (5.2%)   -3.4% ( -13% -    8%) 0.050
          OrHighMedDayTaxoFacets        7.64      (9.2%)        7.41      (7.3%)   -3.0% ( -17% -   14%) 0.249
                      TermDTSort      194.56      (4.1%)      188.70      (3.8%)   -3.0% ( -10% -    5%) 0.016
            MedTermDayTaxoFacets       93.00      (1.7%)       90.24      (1.3%)   -3.0% (  -5% -    0%) 0.000
       BrowseDayOfYearTaxoFacets       46.35      (0.6%)       45.27      (8.7%)   -2.3% ( -11% -    7%) 0.234
                    OrNotHighMed      696.99      (3.3%)      681.56      (3.0%)   -2.2% (  -8% -    4%) 0.027
                          Fuzzy1       96.77      (7.0%)       94.90      (8.2%)   -1.9% ( -16% -   14%) 0.423
            BrowseDateSSDVFacets        5.37      (6.3%)        5.27      (8.8%)   -1.8% ( -15% -   14%) 0.449
               HighTermMonthSort     2656.62      (5.9%)     2610.82      (5.8%)   -1.7% ( -12% -   10%) 0.352
                     MedSpanNear      177.92      (1.9%)      174.98      (2.0%)   -1.7% (  -5% -    2%) 0.008
                    OrHighNotMed      637.79      (4.7%)      627.39      (4.7%)   -1.6% ( -10% -    8%) 0.274
            HighTermTitleBDVSort       35.13      (1.6%)       34.58      (1.9%)   -1.6% (  -5% -    1%) 0.005
                 AndHighNotMonth      602.88      (6.3%)      593.45      (6.2%)   -1.6% ( -13% -   11%) 0.426
                    HighSpanNear        6.98      (1.4%)        6.88      (1.5%)   -1.4% (  -4% -    1%) 0.002
                   OrHighNotHigh      442.50      (4.3%)      437.29      (4.1%)   -1.2% (  -9% -    7%) 0.380
                   OrNotHighHigh      634.69      (4.0%)      627.34      (4.8%)   -1.2% (  -9% -    8%) 0.409
                    OrHighNotLow      518.26      (4.7%)      513.28      (5.1%)   -1.0% ( -10% -    9%) 0.537
                     LowSpanNear       16.59      (1.2%)       16.44      (0.9%)   -0.9% (  -2% -    1%) 0.004
           BrowseMonthTaxoFacets       42.47     (18.0%)       42.10     (19.3%)   -0.9% ( -32% -   44%) 0.881
             LowIntervalsOrdered      196.39      (6.2%)      194.69      (6.6%)   -0.9% ( -12% -   12%) 0.669
                    OrNotHighLow     1353.91      (4.2%)     1342.18      (4.3%)   -0.9% (  -9% -    8%) 0.522
                  AndMedNotMonth      975.83      (6.2%)      967.70      (5.7%)   -0.8% ( -11% -   11%) 0.658
            HighIntervalsOrdered       26.70      (7.2%)       26.48      (7.5%)   -0.8% ( -14% -   15%) 0.729
                         LowTerm     1227.46      (5.2%)     1217.96      (5.9%)   -0.8% ( -11% -   10%) 0.658
                        Wildcard      314.75      (5.1%)      312.45      (5.1%)   -0.7% ( -10% -    9%) 0.648
                       LowPhrase      110.34      (5.4%)      109.54      (5.4%)   -0.7% ( -10% -   10%) 0.674
             MedIntervalsOrdered       66.86     (10.3%)       66.39     (10.9%)   -0.7% ( -19% -   22%) 0.834
            BrowseDateTaxoFacets       45.81      (5.6%)       45.50      (8.6%)   -0.7% ( -14% -   14%) 0.766
                      AndHighMed      354.45      (4.5%)      352.40      (4.9%)   -0.6% (  -9% -    9%) 0.697
                     AndHighHigh       91.95      (3.7%)       91.49      (3.6%)   -0.5% (  -7% -    7%) 0.665
                          Fuzzy2       83.49      (7.0%)       83.14      (7.1%)   -0.4% ( -13% -   14%) 0.852
                       OrHighMed      203.18      (3.6%)      202.61      (3.1%)   -0.3% (  -6% -    6%) 0.793
                          IntNRQ      100.90     (11.1%)      100.67     (11.0%)   -0.2% ( -20% -   24%) 0.948
                       MedPhrase       22.39      (2.7%)       22.34      (2.7%)   -0.2% (  -5% -    5%) 0.804
                      HighPhrase      324.29      (3.2%)      323.68      (2.5%)   -0.2% (  -5% -    5%) 0.838
                        PKLookup      293.51      (3.5%)      293.02      (3.3%)   -0.2% (  -6% -    6%) 0.878
                        HighTerm     1125.52      (4.8%)     1124.54      (5.0%)   -0.1% (  -9% -   10%) 0.955
                      OrHighHigh       70.53      (5.3%)       70.47      (5.6%)   -0.1% ( -10% -   11%) 0.965
           BrowseMonthSSDVFacets       27.34     (30.0%)       27.32     (29.9%)   -0.1% ( -46% -   85%) 0.994
                         Respell       75.69      (5.6%)       75.65      (5.5%)   -0.1% ( -10% -   11%) 0.975
       BrowseDayOfYearSSDVFacets       27.94     (29.7%)       27.93     (29.8%)   -0.0% ( -45% -   84%) 0.997
                 MedSloppyPhrase        5.29      (4.0%)        5.29      (3.9%)   -0.0% (  -7% -    8%) 0.997
                HighSloppyPhrase       41.15      (2.4%)       41.18      (2.8%)    0.1% (  -4% -    5%) 0.932
                      AndHighLow     3122.83      (5.9%)     3125.14      (5.3%)    0.1% ( -10% -   11%) 0.966
                 LowSloppyPhrase       36.10      (2.4%)       36.17      (2.4%)    0.2% (  -4% -    5%) 0.808
                       OrHighLow      426.69      (5.1%)      428.21      (4.9%)    0.4% (  -9% -   10%) 0.820
                         MedTerm      909.25      (5.3%)      912.72      (4.8%)    0.4% (  -9% -   11%) 0.811

AndMedFilterMonth now shows a modest improvement with the same sorted index and code. So, as expected, index and query characteristics also play a big part in the benchmark results.


With regard to the bulk scorer approach, it is interesting! However, if we were to split the doc ID space into fixed-size windows and only call peekNextNonMatchingDocID when advancing to a new window, I feel it might not be as effective, since for an unsorted index the matching doc IDs might have large gaps between them to begin with. My implementation above temporarily pauses calling peekNextNonMatchingDocID for the next 128 docs encountered, so it should be more aggressive about skipping the calls.
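
The pause described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the class name `PeekBackoffSketch`, the `MIN_USEFUL_RUN` threshold, and the rule that a short run triggers the pause are all assumptions. `java.util.BitSet.nextClearBit` stands in for `peekNextNonMatchingDocID` on the filter, and a sorted `int[]` stands in for the lead iterator.

```java
import java.util.BitSet;

/** Illustrative only: a conjunction loop that backs off from peeking after an unhelpful peek. */
final class PeekBackoffSketch {
  /** Number of docs to process before peeking again (the 128 mentioned in the comment above). */
  static final int PAUSE_DOCS = 128;
  /** Hypothetical threshold: runs shorter than this are treated as not worth the peek. */
  static final int MIN_USEFUL_RUN = 8;

  /**
   * Intersects sorted lead doc IDs with a filter.
   *
   * @return {number of matches, number of peek calls}
   */
  static int[] intersect(int[] leadDocs, BitSet filter) {
    int matches = 0;
    int peekCalls = 0;
    int runEnd = 0; // exclusive end of the currently known run of filter matches
    int pause = 0;  // docs left to process before peeking is allowed again
    for (int doc : leadDocs) {
      if (doc < runEnd) {
        matches++; // inside a known run: no need to consult the filter
        continue;
      }
      boolean match = filter.get(doc);
      if (match) {
        matches++;
      }
      if (pause > 0) {
        pause--; // still backing off: fall back to plain per-doc checks
        continue;
      }
      if (match) {
        peekCalls++;
        // Stand-in for peekNextNonMatchingDocID(): first doc >= doc that the filter rejects.
        runEnd = filter.nextClearBit(doc);
        if (runEnd - doc < MIN_USEFUL_RUN) {
          pause = PAUSE_DOCS; // the peek did not pay off, so stop peeking for a while
        }
      }
    }
    return new int[] {matches, peekCalls};
  }
}
```

With a filter that matches one long contiguous range, a single peek covers every subsequent lead doc in that range. With a filter whose matches are scattered, the loop peeks only occasionally and otherwise behaves like an ordinary per-doc intersection.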

@jpountz
Contributor

jpountz commented Jun 28, 2023

if we were to split the window based on certain size and only call peekNextNonMatchingDocID when advancing to a new window, I felt it might not be as effective, since for unsorted index, the matching doc ids might have large gaps between ids to begin with? My implementation above would temporarily pause calling peekNextNonMatchingDocID for the next 128 docs encountered, so it should be more aggressive in terms of skipping the calls.

You are right, sparse queries may be affected.


github-actions bot commented Jan 8, 2024

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Jan 8, 2024
@jpountz
Contributor

jpountz commented Mar 19, 2024

Sorry for the long time without a reply. I had some hesitation about moving this change forward since it's a big API change and I didn't see much appeal for it (besides you and me, I guess). But I'd also like to move sparse/zone indexing forward, and it would need the same API at search time, so I'm now keen on moving this forward. If you're interested in updating this branch, I'll be interested in reviewing.

Since we last worked on this branch, conjunctions introduced a BulkScorer: ConjunctionBulkScorer. IMO it would be a better place to take advantage of this new API since it can more naturally check the value of peekNextNonMatchingDocID() every N docs without adding per-doc overhead.
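
A rough sketch of the "check every N docs" idea, for illustration only. It does not use Lucene's actual `ConjunctionBulkScorer`; the class name `WindowedPeekSketch` and the window size of 2048 are assumptions, and `java.util.BitSet.nextClearBit` stands in for `peekNextNonMatchingDocID()` on the filter clause.

```java
import java.util.BitSet;

/** Illustrative only: one peek per window instead of one check per doc. */
final class WindowedPeekSketch {
  /** Hypothetical window size (the "N" above); not taken from Lucene. */
  static final int WINDOW = 2048;

  /**
   * Intersects sorted lead doc IDs (all less than maxDoc) with a filter, window by window.
   *
   * @return {number of matches, number of per-doc filter checks}
   */
  static int[] intersect(int[] leadDocs, BitSet filter, int maxDoc) {
    int matches = 0;
    int filterChecks = 0;
    int i = 0;
    for (int min = 0; min < maxDoc && i < leadDocs.length; min += WINDOW) {
      int max = Math.min(min + WINDOW, maxDoc);
      // Single peek for the whole window: does the filter match every doc in [min, max)?
      boolean allMatch = filter.nextClearBit(min) >= max;
      while (i < leadDocs.length && leadDocs[i] < max) {
        if (allMatch) {
          matches++; // the filter is known to match: skip the per-doc check
        } else {
          filterChecks++;
          if (filter.get(leadDocs[i])) {
            matches++;
          }
        }
        i++;
      }
    }
    return new int[] {matches, filterChecks};
  }
}
```

The cost of peeking is paid once per window rather than once per doc, so windows that the filter fully covers need no per-doc filter checks, while other windows fall back to the usual per-doc intersection.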

@github-actions github-actions bot removed the Stale label Mar 20, 2024

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Apr 20, 2024