-
Notifications
You must be signed in to change notification settings - Fork 67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improvements to "from" operator #5378
Conversation
This commit improves the from operator and brings it into closer alignment with SQL. We also deprecated pool, get, and file though they can still be used but appear as "from" when running super compile. Targets of the from operator are now self describing and names are interpreted as files when running locally (super -c) and pools when running on a lake (super db). URLs continue to work in both cases. Const expressions continue to work but now need to be enclosed in brackets, e.g., "from [poolConst]". This makes easy to distinguish a file reference e.g., "from foo.json", from a record deref. e.g., "from [this.pool]" Currently these expressions must evaluate at compile time, but this sets us up to configure "from" to have a parent and make this all dynamic. Note you can also now say 'from ["a.json", "b.json"]', which will do a merge of the scans (not sequential eval). The semantic pass now does existence checks of files and pools so "super compile" tests needed to be updated to "touch" files etc. We will update docs to reflect these changes after we add the parent concept to "from" and have a bit of experience with the new syntax and semantics.
Kind string `json:"kind" unpack:""` | ||
Path string `json:"path"` | ||
Format string `json:"format"` | ||
SortKeys order.SortKeys `json:"sort_keys"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's reason enough to keep this (we can just add it back if we ever want it) but Parquet has support for sorted data: https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L757-L770
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the system should be able to use sort keys from file objects but this doesn't belong in the query language. Instead this should be part of the interface between the compiler and the file objects as it is with the pool adaptor.
compiler/semantic/op.go
Outdated
names := make([]string, 0, len(vals)) | ||
for _, val := range vals { | ||
if super.TypeUnder(val.Type()) != super.TypeString { | ||
a.error(entity.Expr, fmt.Errorf("from expression requires a string but encountered: %s", zson.String(val))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error messages read poorly with this colon.
a.error(entity.Expr, fmt.Errorf("from expression requires a string but encountered: %s", zson.String(val))) | |
a.error(entity.Expr, fmt.Errorf("from expression requires a string but encountered %s", zson.String(val))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This commit improves the from operator and brings it into closer alignment with SQL. We also deprecated pool, get, and file though they can still be used but appear as "from" when running super compile.
Targets of the from operator are now self describing and names are interpreted as files when running locally (super -c) and pools when running on a lake (super db). URLs continue to work in both cases.
Const expressions continue to work but now need to be enclosed in brackets, e.g., "from [poolConst]". This makes easy to distinguish a file reference e.g., "from foo.json", from a record deref. e.g., "from [this.pool]" Currently these expressions must evaluate at compile time, but this sets us up to configure "from" to have a parent and make this all dynamic. Note you can also now say 'from ["a.json", "b.json"]', which will do a merge of the scans (not sequential eval).
The semantic pass now does existence checks of files and pools so "super compile" tests needed to be updated to "touch" files etc.
We will update docs to reflect these changes after we add the parent concept to "from" and have a bit of experience with the new syntax and semantics.