Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Order of logic condition alternates query result #7525

Closed
jetlime opened this issue May 11, 2023 · 3 comments
Closed

[BUG] Order of logic condition alternates query result #7525

jetlime opened this issue May 11, 2023 · 3 comments
Labels
bug Something isn't working Search Search query, autocomplete ...etc

Comments

@jetlime
Copy link

jetlime commented May 11, 2023

Describe the bug
On a get query, I do not observe the same output in function of the order of logic conditions. In the case the condition (exists:process.pid) preceeds process.pid= 1782, the query returns no single result. In the contrary, we obtain a set of hits, as it should be.

To Reproduce
Steps to reproduce the behavior:

  1. Query which is not returning any hits
GET raw-*/_search?size=9
  {"_source": {"includes": ["*"]}, "query": {"query_string": {"query":"(_exists_:process.pid) AND process.pid= 1782 AND (@timestamp:[\"2023-05-09T12:02:41.718Z\" TO \"2023-05-11T12:07:41.718Z\"])"}}}
}
  1. We obtain no results,
{
  "took": 1129,
  "timed_out": false,
  "_shards": {
    "total": 11,
    "successful": 11,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 0,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  }
}
  1. Modify the order of the logical conditions, to obtain the correct query result,
GET raw-*/_search?size=9
  {"_source": {"includes": ["*"]}, "query": {"query_string": {"query": "process.pid= 1782 AND (_exists_:process.pid) AND (@timestamp:[\"2023-05-09T12:02:41.718Z\" TO \"2023-05-11T12:07:41.718Z\"])"}}}
}

Expected behavior

Both queries shall return the below defined answer. However only the second one returns the expected behavior.

{
  "took": 622,
  "timed_out": false,
  "_shards": {
    "total": 11,
    "successful": 11,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 7521,
      "relation": "eq"
    },
    "max_score": 7.1885996,
    "hits": [{result object}]
}

Host/Environment (please complete the following information):

  • Opensearch Version: 2.6.0
@jetlime jetlime added bug Something isn't working untriaged labels May 11, 2023
@dblock
Copy link
Member

dblock commented May 11, 2023

Interesting. Want to try to add a failing YAML REST test for this?

@andrross andrross added the Search Search query, autocomplete ...etc label May 30, 2023
@sejli sejli removed the untriaged label Jun 6, 2023
@msfroh
Copy link
Collaborator

msfroh commented Jun 20, 2023

Wrote a quick, throwaway unit test (in QueryStringQueryBuilderTests) to see what the two queries parse to:

    public void testToQueryTermOrdering() throws Exception {

        Query query1 = queryStringQuery("(_exists_:process.pid) AND process.pid= 1782 AND (@timestamp:[\\\"2023-05-09T12:02:41.718Z\\\" TO \\\"2023-05-11T12:07:41.718Z\\\"])").toQuery(createShardContext());

        Query query2 = queryStringQuery("process.pid= 1782 AND (_exists_:process.pid) AND (@timestamp:[\\\"2023-05-09T12:02:41.718Z\\\" TO \\\"2023-05-11T12:07:41.718Z\\\"])").toQuery(createShardContext());

        assertEquals(query1, query2);
    }

The actual parsing results are broken, because the unit test mapping doesn't include process.pid or @timestamp.

The problem here seems to be the faulty syntax to match the process whose pid is 1782. The proper syntax for that is process.pid:1782, with : instead of = and no space before 1782.

Both queries end up parsing the following four clauses:

  1. process.pid= (as a term)
  2. 1782 (as a term)
  3. _exists_:process.pid
  4. @timestamp:[\"2023-05-09T12:02:41.718Z\" TO \"2023-05-11T12:07:41.718Z\"]

The difference between the two queries is that the first one, I think because it sees the "AND" early on and then sees the space between two terms, ends up marking all 4 clauses as required. The process.pid= term doesn't match anything. The second query sees the space between the first two terms (process.pid= and 1782) and implicitly inserts "OR" between them, so the process.pid= term is marked as optional. The remaining three terms are marked as required.

It would be great if end users could see the Lucene query parsed from their OpenSearch queries, like you can with debug=query in Solr. That would make these kinds of things easier to debug. We have an issue suggesting that at #6794.

@msfroh
Copy link
Collaborator

msfroh commented Jun 20, 2023

I'm going to close this, since mixing explicit and implicit Boolean operators is a longstanding source of undefined behavior. In this case, I believe the issue was just incorrect syntax around the process.pid:1782 clause.

@jetlime -- feel free to reopen if you think there's a bigger issue at play. Thanks!

@msfroh msfroh closed this as completed Jun 20, 2023
@github-project-automation github-project-automation bot moved this from 🆕 New to ✅ Done in Search Project Board Jun 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Search Search query, autocomplete ...etc
Projects
Archived in project
Development

No branches or pull requests

5 participants