[BUG] Sorting on integer_range field types can fail with out of bounds exceptions #12263

Open
Scrambles56 opened this issue Feb 8, 2024 · 2 comments
Labels: bug, Search

@Scrambles56

Describe the bug

Sorting search results on an integer_range field gives inconsistent results: some searches succeed, but when a specific document is in the results, the search fails with the following exception:

{
  "error": {
    "root_cause": [],
    "type": "search_phase_execution_exception",
    "reason": "",
    "phase": "fetch",
    "grouped": true,
    "failed_shards": [],
    "caused_by": {
      "type": "array_index_out_of_bounds_exception",
      "reason": "Index 5 out of bounds for length 5"
    }
  },
  "status": 500
}

Related component

Search

To Reproduce

  1. Spin up a clean OpenSearch instance (v2.11.0).
  2. Create an index as follows:
PUT /listings
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "listing_id": {
        "type": "keyword"
      },
      "version": {
        "type": "integer"
      },
      "title": {
        "type": "text"
      },
      "description": {
        "type": "text"
      },
      "supplier_name": {
        "type": "text"
      },
      "categories": {
        "properties": {
          "id": {
            "type": "keyword"
          },
          "name": {
            "type": "keyword"
          }
        }
      },
      "variant_types": {
        "type": "keyword"
      },
      "price_range": {
        "type": "integer_range"
      },
      "compare_at_price": {
        "type": "integer"
      },
      "location_types": {
        "type": "keyword"
      },
      "location": {
        "type": "geo_point"
      }
    }
  }
}
  3. Index a document with the following details:
POST /listings/_doc
{
  "listing_id": "lst_clscbqyvf00060104t1339786",
  "version": 1,
  "title": "Breathwork Session",
  "description": "asd",
  "supplier_name": "Impact Life Coach and Trauma Therapy",
  "categories": [
    {
      "name": "Breathwork",
      "id": "cat_clm4cjukp000a3b6h7qi9k5i5"
    },
    {
      "name": "Mind-Body",
      "id": "cat_clm4chyb600023b6hv8zhwty5"
    }
  ],
  "variant_types": [
    "Pass"
  ],
  "price_range": {
    "gte": 300,
    "lte": 500
  },
  "token_currency": "nztoken",
  "primary_image_url": "https://google.com/",
  "location_types": []
}
  4. Attempt a search:
POST /listings/_search?typed_keys=true
{
  "from": 0,
  "query": {
    "bool": {
      "must": [
        {
          "terms": {
            "categories.id": [
              "cat_clm4cjukp000a3b6h7qi9k5i5"
            ]
          }
        }
      ]
    }
  },
  "size": 10,
  "sort": [
    {
      "price_range": {
        "order": "asc"
      }
    }
  ]
}

Observe: the search fails with the error shown above.

  5. Reset your environment by running steps 1 and 2 again.
  6. Index a document as follows (note the different price_range.lte value):
POST /listings/_doc
{
  "listing_id": "lst_clscbqyvf00060104t1339786",
  "version": 1,
  "title": "Breathwork Session",
  "description": "asd",
  "supplier_name": "Impact Life Coach and Trauma Therapy",
  "categories": [
    {
      "name": "Breathwork",
      "id": "cat_clm4cjukp000a3b6h7qi9k5i5"
    },
    {
      "name": "Mind-Body",
      "id": "cat_clm4chyb600023b6hv8zhwty5"
    }
  ],
  "variant_types": [
    "Pass"
  ],
  "price_range": {
    "gte": 300,
    "lte": 490
  },
  "token_currency": "nztoken",
  "primary_image_url": "https://google.com/",
  "location_types": []
}
  7. Perform the search again:
POST /listings/_search?typed_keys=true
{
  "from": 0,
  "query": {
    "bool": {
      "must": [
        {
          "terms": {
            "categories.id": [
              "cat_clm4cjukp000a3b6h7qi9k5i5"
            ]
          }
        }
      ]
    }
  },
  "size": 10,
  "sort": [
    {
      "price_range": {
        "order": "asc"
      }
    }
  ]
}

Observe: the search succeeds.

Expected behavior

Option 1:
Sorting on integer_range should be unsupported, and all searches attempting to do so should fail with a clear error message.
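A minimal client-side sketch of the behavior Option 1 asks for, in Python: before sending a search, check each sort clause against the index mapping and reject range-typed fields with a clear message. The helper names (`field_type`, `check_sort`) and the error text are hypothetical; a real fix would live in OpenSearch's server-side range field type, not in the client.

```python
from typing import Optional

# Range field types from the OpenSearch mapping docs. Per the explanation in
# this thread, sorting on range fields is not intended behavior.
RANGE_TYPES = {
    "integer_range", "long_range", "float_range",
    "double_range", "date_range", "ip_range",
}


def field_type(properties: dict, path: str) -> Optional[str]:
    """Resolve a dotted path such as "categories.id" against mapping properties."""
    node: dict = {"properties": properties}
    for part in path.split("."):
        child = node.get("properties", {}).get(part)
        if child is None:
            return None  # unmapped field, or a special key such as "_score"
        node = child
    return node.get("type")


def check_sort(properties: dict, sort: list) -> None:
    """Raise ValueError if any sort clause targets a range-typed field."""
    for clause in sort:
        # A clause is either a bare field name or {"field": {...options...}}.
        field = clause if isinstance(clause, str) else next(iter(clause))
        ftype = field_type(properties, field)
        if ftype in RANGE_TYPES:
            raise ValueError(
                f"cannot sort on [{field}]: field type [{ftype}] does not support sorting"
            )
```

Calling `check_sort(mapping["properties"], request["sort"])` with the mapping and search body from the reproduction steps would raise on `price_range` before the request ever reaches the cluster.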

Option 2:
Sorting on integer_range should allow you to specify an anchor point to sort on (e.g. min, max, median).
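To illustrate what Option 2 would mean, here is a small Python sketch that sorts documents by one chosen point of a range value. The anchor names and `sort_by_range` are hypothetical (OpenSearch has no such sort option), and it assumes every document has the field with both `gte` and `lte` bounds, as in the reproduction documents. With only two bounds, "median" reduces to the midpoint.

```python
from typing import Callable, Dict, List

# Hypothetical anchor names; not an existing OpenSearch sort option.
ANCHORS: Dict[str, Callable[[dict], float]] = {
    "min": lambda r: r["gte"],
    "max": lambda r: r["lte"],
    "mid": lambda r: (r["gte"] + r["lte"]) / 2,  # midpoint of the two bounds
}


def sort_by_range(docs: List[dict], field: str, anchor: str = "min",
                  order: str = "asc") -> List[dict]:
    """Sort documents by one anchor point of a range value (gte/lte bounds assumed)."""
    key = ANCHORS[anchor]
    return sorted(docs, key=lambda d: key(d[field]), reverse=(order == "desc"))


docs = [
    {"id": "a", "price_range": {"gte": 300, "lte": 500}},
    {"id": "b", "price_range": {"gte": 200, "lte": 560}},
    {"id": "c", "price_range": {"gte": 350, "lte": 420}},
]
print([d["id"] for d in sort_by_range(docs, "price_range", "min")])  # ['b', 'a', 'c']
print([d["id"] for d in sort_by_range(docs, "price_range", "mid")])  # ['b', 'c', 'a']
```

The three anchors can order the same documents differently, which is why the anchor would need to be explicit. The same anchor values could also be computed at index time and stored in an ordinary integer field (for example a hypothetical price_min), which OpenSearch can already sort on.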

Additional Details

Plugins
N/A

Screenshots
N/A

Host/Environment:

  • Docker image: opensearchproject/opensearch:2.11.0
@Scrambles56 added the bug and untriaged labels on Feb 8, 2024
github-actions bot added the Search label on Feb 8, 2024
@nknize
Collaborator

nknize commented Feb 8, 2024

From a Slack discussion (Slack history is purged after 90 days), so I'm including the explanation below:

Oye! tl;dr: sorting by range fields is unexpected behavior.

Looks like Elastic mucked that one up pretty good. I should've explicitly stated this when I wrote the blog post years ago. I initially removed doc value support for RangeFields when I first added the field to Elasticsearch, only because we didn't have any aggregation support for range fields. Doc values were added back not long after to boost query performance using IndexOrDocValuesQuery, but the nasty side effect is that sort also uses doc values, and no guard rails were included in that commit.

So here's what's happening: the integer range encoding of the value to doc values is variable-length (to save space on disk, since S3 is expensive 🙂). When the DocValueFormat instance is pulled from RangeFieldType.docValueFormat, it's just the default RAW formatter, which doesn't take the RangeType into consideration, so it blindly tries to decode the encoded range into a nonsensical string using BytesRef.utf8ToString. As expected, the values aren't UTF-8, so UnicodeUtil#UTF8toUTF16 trips a byte boundary assertion (if you're running with assertions enabled) and nasty unexpected behaviors ensue 😕
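To make the failure mode above concrete, here is a simplified Python model of a UTF-8 decoder that trusts the lead byte to say how many continuation bytes follow and never checks that they exist. It is loosely modeled on the branch structure of Lucene's UnicodeUtil.UTF8toUTF16 but is not a port of it (surrogate handling is dropped and code points are returned directly). The 5-byte input is made up for illustration and is not the actual doc-values encoding of any range in this issue. When a non-UTF-8 buffer ends in a lead byte, the decoder reads past the end: Python raises IndexError where Java raises ArrayIndexOutOfBoundsException.

```python
from typing import List


def naive_utf8_decode(buf: bytes) -> List[int]:
    """Decode UTF-8 to code points without bounds-checking continuation bytes."""
    out: List[int] = []
    i = 0
    while i < len(buf):
        b = buf[i]
        i += 1
        if b < 0xC0:
            # ASCII, or a stray continuation byte accepted silently
            out.append(b)
        elif b < 0xE0:
            # 2-byte sequence: reads buf[i] even if it does not exist
            out.append(((b & 0x1F) << 6) | (buf[i] & 0x3F))
            i += 1
        elif b < 0xF0:
            # 3-byte sequence: reads buf[i] and buf[i + 1] unchecked
            out.append(((b & 0x0F) << 12) | ((buf[i] & 0x3F) << 6) | (buf[i + 1] & 0x3F))
            i += 2
        else:
            # 4-byte sequence: reads three continuation bytes unchecked
            out.append(((b & 0x07) << 18) | ((buf[i] & 0x3F) << 12)
                       | ((buf[i + 1] & 0x3F) << 6) | (buf[i + 2] & 0x3F))
            i += 3
    return out


bad = bytes([0x01, 0x02, 0x03, 0x04, 0xF4])  # made-up bytes, not a real range encoding
try:
    naive_utf8_decode(bad)  # the read at index 5 of a 5-byte buffer fails
except IndexError as exc:
    print("naive decoder:", exc)
try:
    bad.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict decoder:", exc.reason)
```

Python's built-in strict decoder rejects the same bytes with a UnicodeDecodeError instead of reading out of bounds. Either way, the bytes were never text, which is why the explanation points at the RAW formatter rather than at the decoder.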

@peternied
Member

[Triage - attendees 1 2 3 4 5 6 7 8]
@Scrambles56 Thanks for filing this issue. We look forward to a pull request to address it.

@getsaurabh02 moved this from 🆕 New to Later (6 months plus) in Search Project Board on Aug 15, 2024
Projects
Status: Later (6 months plus)
Development

No branches or pull requests

3 participants