[BUG] Outdated PPL Sorting Syntax #3180

Open
currantw opened this issue Dec 2, 2024 · 0 comments
Labels
bug Something isn't working untriaged

currantw commented Dec 2, 2024

What is the bug?

PPL supports syntax for sorting numerically, lexicographically, or by IP address (e.g. sort num(field_name), sort str(field_name), and sort ip(field_name), respectively) -- but this syntax has no effect on the resulting sort order. It should be removed.

This syntax appears to have been superseded by the addition of data types: sorting is always done according to the field's data type (i.e. numerical data types are sorted numerically, strings are sorted lexicographically), rather than as specified by str, num, or ip (i.e. these keywords have no effect on the result). Moreover, this syntax is not mentioned in the user documentation for either OpenSearch SQL or Spark, and the functionality is not actually implemented in either codebase. The specified "sort type" is simply ignored.

How can one reproduce the bug?

Steps to reproduce the behaviour:

  1. Create a data set with a numerical field
  2. Sort using the str keyword (i.e. sort str(field_name))
  3. Observed: the syntax is accepted as valid, but sorting is still numerical

What is the expected behaviour?

As a user, you would likely expect the result to be sorted as specified (i.e. numerically, lexicographically, or by IP address), but the actual behaviour is to always sort by the field's data type.

If a user still wants to sort a numerical field lexicographically (for example) once this syntax is removed, they can do so by first casting the field to a string data type before sorting it.
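To illustrate why the cast matters, here is a minimal Python sketch (not PPL, and not taken from either codebase) of the difference between sorting by a field's numeric type and sorting the same values as strings. The sample values and variable names are hypothetical.

```python
# Hypothetical values standing in for a numerical field.
values = [10, 9, 100, 2]

# Sorting by the field's data type: numeric comparison.
numeric_order = sorted(values)

# Sorting after "casting" each value to a string: lexicographic comparison,
# where "100" sorts before "2" because '1' < '2'.
lexicographic_order = sorted(str(v) for v in values)

print(numeric_order)        # [2, 9, 10, 100]
print(lexicographic_order)  # ['10', '100', '2', '9']
```

The two orders differ, which is the distinction sort str(field_name) suggests it will make but currently does not.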

What is your host/environment?

N/A

Do you have any screenshots?

None

Do you have any additional context?

Related to #3145 (Add IP data type) and opensearch-project/opensearch-spark#963 (same issue for Spark).
