What is the bug?
PPL supports syntax for sorting numerically, lexicographically, or by IP address (e.g. sort num(field_name), sort str(field_name), and sort ip(field_name), respectively), but this syntax has no effect on the resulting sort order. It should be removed.
This syntax appears to have been superseded by the addition of data types: sorting is always done according to the field's data type (numerical data types are sorted numerically, strings lexicographically) rather than as specified by str, num, or ip, so these keywords have no effect on the result. Moreover, this syntax is not mentioned in the user documentation for either OpenSearch SQL or Spark, and neither code base implements the functionality; the specified "sort type" is simply ignored.
How can one reproduce the bug?
Steps to reproduce the behaviour:
1. Create a data set with a numerical field.
2. Sort using the str keyword (i.e. sort str(field_name)).
3. Observed: the syntax is valid, but sorting is still numerical.
What is the expected behaviour?
As a user, you would likely expect the result to be sorted as specified (i.e. numerically, lexicographically, or by IP address), but the actual behaviour is to always sort by the field's data type.
If a user still wants to sort a numerical field lexicographically (for example) once this syntax is removed, they can do so by first casting the field to a string data type and then sorting on the result.
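To make the difference between the two orderings concrete, here is a minimal, language-neutral sketch in Python (not PPL). The sample values are hypothetical and only stand in for a numerical field. It shows the order a sort by data type produces and the order a user would get by converting the values to strings first.

```python
# Hypothetical values standing in for a numerical field in the data set.
values = [2, 10, 1, 33, 4]

# Sorting by the field's data type (numerical); per this report, this is
# the order produced regardless of the str/num/ip keyword.
numeric_order = sorted(values)

# Converting each value to a string before sorting gives lexicographic order,
# which is what a user writing sort str(field_name) would likely expect.
lexicographic_order = sorted(str(v) for v in values)

print(numeric_order)
print(lexicographic_order)
```

In the lexicographic result, "10" sorts before "2" because strings are compared character by character, which is why the two orderings diverge as soon as values have different numbers of digits.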
What is your host/environment?
N/A
Do you have any screenshots?
None
Do you have any additional context?
Related to #3145 (Add IP data type) and opensearch-project/opensearch-spark#963 (same issue for Spark).