Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improvements to "from" operator #5378

Merged
merged 3 commits into from
Oct 28, 2024
Merged

improvements to "from" operator #5378

merged 3 commits into from
Oct 28, 2024

Conversation

mccanne
Copy link
Collaborator

@mccanne mccanne commented Oct 27, 2024

This commit improves the from operator and brings it into closer alignment with SQL. We also deprecated pool, get, and file though they can still be used but appear as "from" when running super compile.

Targets of the from operator are now self describing and names are interpreted as files when running locally (super -c) and pools when running on a lake (super db). URLs continue to work in both cases.

Const expressions continue to work but now need to be enclosed in brackets, e.g., "from [poolConst]". This makes easy to distinguish a file reference e.g., "from foo.json", from a record deref. e.g., "from [this.pool]" Currently these expressions must evaluate at compile time, but this sets us up to configure "from" to have a parent and make this all dynamic. Note you can also now say 'from ["a.json", "b.json"]', which will do a merge of the scans (not sequential eval).

The semantic pass now does existence checks of files and pools so "super compile" tests needed to be updated to "touch" files etc.

We will update docs to reflect these changes after we add the parent concept to "from" and have a bit of experience with the new syntax and semantics.

@mccanne mccanne changed the title improvements to from operator improvements to "from" operator Oct 27, 2024
Base automatically changed from ast-name to main October 27, 2024 18:53
This commit improves the from operator and brings it into closer alignment
with SQL.  We also deprecated pool, get, and file though they can still
be used but appear as "from" when running super compile.

Targets of the from operator are now self describing and names are
interpreted as files when running locally (super -c) and pools when
running on a lake (super db).  URLs continue to work in both cases.

Const expressions continue to work but now need to be enclosed
in brackets, e.g., "from [poolConst]".  This makes easy
to distinguish a file reference e.g., "from foo.json", from a
record deref. e.g., "from [this.pool]"  Currently these expressions
must evaluate at compile time, but this sets us up to configure
"from" to have a parent and make this all dynamic. Note you can also
now say 'from ["a.json", "b.json"]', which will do a merge of
the scans (not sequential eval).

The semantic pass now does existence checks of files and pools
so "super compile" tests needed to be updated to "touch" files etc.

We will update docs to reflect these changes after we add the parent
concept to "from" and have a bit of experience with the new syntax and
semantics.
compiler/ast/ast.go Outdated Show resolved Hide resolved
compiler/ast/ast.go Outdated Show resolved Hide resolved
compiler/ast/unpack.go Outdated Show resolved Hide resolved
Kind string `json:"kind" unpack:""`
Path string `json:"path"`
Format string `json:"format"`
SortKeys order.SortKeys `json:"sort_keys"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's reason enough to keep this (we can just add it back if we ever want it) but Parquet has support for sorted data: https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L757-L770

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the system should be able to use sort keys from file objects but this doesn't belong in the query language. Instead this should be part of the interface between the compiler and the file objects as it is with the pool adaptor.

compiler/parser/parser.peg Outdated Show resolved Hide resolved
names := make([]string, 0, len(vals))
for _, val := range vals {
if super.TypeUnder(val.Type()) != super.TypeString {
a.error(entity.Expr, fmt.Errorf("from expression requires a string but encountered: %s", zson.String(val)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error messages read poorly with this colon.

Suggested change
a.error(entity.Expr, fmt.Errorf("from expression requires a string but encountered: %s", zson.String(val)))
a.error(entity.Expr, fmt.Errorf("from expression requires a string but encountered %s", zson.String(val)))

compiler/semantic/op.go Outdated Show resolved Hide resolved
compiler/semantic/op.go Outdated Show resolved Hide resolved
compiler/semantic/op.go Outdated Show resolved Hide resolved
compiler/parser/parser.peg Outdated Show resolved Hide resolved
zfmt/ast.go Outdated Show resolved Hide resolved
zfmt/ast.go Outdated Show resolved Hide resolved
zfmt/ast.go Outdated Show resolved Hide resolved
compiler/parser/parser.peg Show resolved Hide resolved
compiler/parser/parser.peg Show resolved Hide resolved
compiler/parser/parser.peg Show resolved Hide resolved
Copy link
Member

@nwt nwt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@mccanne mccanne merged commit eebbb56 into main Oct 28, 2024
4 checks passed
@mccanne mccanne deleted the from-any branch October 28, 2024 13:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants